🗺️ Use a tree to match routes #4857

ryanflorence · 2022-12-13T19:15:38Z

Discussed in #4786

^{Originally posted by pcattori December 6, 2022}

The Representation Principle: Once a problem is described using an appropriate representation, the problem is almost solved
-- Patrick H. Winston

tl;dr: Let's just copy what routers like DNS do!

Routes are inherently hierarchical as each / represents a subpath.
A tree directly models these hierarchies.
Ubiquitous, reliable, and performant routers use trees to match routes.
For example, DNS uses uses trie (a specific type of tree, also known as a prefix tree) to store and search domain names efficiently.

The main benefit of representing routes as a tree is that our route matching will be simpler, leading to more robust and predictable matching. By robustness, I mean that redundant or ambiguous routes will not be representable in a tree like they are now.
And by predictable, I mean that it will be simpler to explain and understand why the route was chosen as the best match for an incoming path.

Additionally, all matching will run at optimal $O(input)$, so we should see also see perf improvements, especially at larger scales.

Robust matching

Currently, /one/:two and /one/:three can coexist as routes and we arbitrarily (but consistently) choose one of these to have a higher priority.
While we could prune these routes manually, a better solution is to never allow redundant routes like these to coexist in the first place.

Encoding routes into a tree would ensure that we never have redundant routes, as redundant routes would occupy the same edge.
We can even warn or error when an ambiguous route is being added to the tree: WARN: '/one/:three' is ambiguous since '/one/:two' is already a route.

Simple, fast, deterministic matching

Trees natively support $O(input)$ lookups where $input$ is the number of nested segments of the incoming path.
We would then do one tree lookup to match an incoming path to a route.

Note that this is indeed optimal because we are using input to represent the size of the input, not the number of nodes in the tree. So this would be $O(log(n))$ where $n$ is the number of nodes in the tree if the tree is balanced.

Mutually exclusive sibling branches?

One decision we'll have to make is if dynamic segments are automatically mutually exclusive of their sibling paths or not.
For example, if we have routes /one/two and /one/:param1/three, the tree would look like:

Can we tell at the /one node whether to go down the two branch or the :param1 branch?
If :param1 is modeled to be exclusive of its siblings (e.g. two), then yes!
We'd know to go down the two branch.
Then we'd conclude that there is no valid match for /one/two/three

If we don't assume mutual exclusivity of sibling branches, then we'd have to traverse both two and :param1.
Like before, the two branch would not yield a valid match, but in this case :param1 branch would yield a match for /one/:param1/three.¹

In this case, we could have gotten multiple valid matches if we replaced /one/two with /one/two/three.
For now, we can pick the "best" match out of these via the existing scoring of routes, though we can improve on this as well.

Changing routes at runtime

If we represent our routes as a tree, we'll be able to use standard tree algorithms to efficiently match paths to a route.
Additionally, tries support $O(input)$ insertions and deletions, so we can efficiently support route modifications at runtime if needed.

Longest common static prefix

By default, lookups in our route tree would prefer static paths over dynamic paths so that /one matches /one over /:param.
This mirrors the "longest common prefix" matching found in DNS and other routers.

However, if we decide not to make dynamic segments mutually exclusive of their siblings, then we'd need to still pick the "best" match from among the valid matches.

A simple method to extend "longest common prefix" is to prefer "longest common static prefix".

For example, consider these two routes:

/one/two/:foo
/one/:bar/three

Which one should /one/two/three match?
Because the longest common static prefix for each is:

/one/two/:foo -> /one/two
/one/:bar/three -> /one

So /one/two/:foo should be chosen.
Note that we are measuring "longest" in terms of number of segments, not characters.

For two matching routes with the same number of static segments, we keep going until a tie is broken where one route has a static segment and one route has a dynamic segment.

For example, matching the path /one/two/baz/three to the routes:

/one/two/:bar1/:bar2' -> /one/two/:/:
/one/two/:foo/three -> /one/two/:/three

where we use : to denote a dynamic segment so that its easy to compare the routes.
Both match the same up to /one/two/:, but then /one/two/:foo/three matches statically while the other route matches dynamically.
So /one/two/:foo/three should be chosen.

Currently, we use scoring scheme with arbitrary weights to rank valid matches.
Switching to longest common static prefix would be a breaking change.
We can still switch internally to a tree for matching and use the existing scoring to pick between valid matches returned by the tree lookup to vastly simplify and speed up our implementation without any breaking changes.

I think longest common static prefix captures most people's intuitions about how matching should work better than our scoring scheme and is much more easily explained.
I also think that our scoring was trying to capture the same intuition in a less robust way.
So its likely that virtually all use-cases would not be breaking and I would bet that any use-cases that are breaking are probably behaving unpredictably for users today.

So we could philosophically call this a bug fix to our current routing rather than a breaking change.
That's probably what I would do, but I realize this could be contentious.

To make it less contentious, we could try see how closely it matches existing usage in our test suite, examples, and existing Remix projects.
But we can also treat this as a breaking change if we want to be super risk-averse.

Again, with context, I think I can convince you that this really is a bug fix, but let's talk about it 😁

For large paths with many dynamic segments, lookup would take $O(2^dynamic)$ where $dynamic$ is the number of dynamic segments. Compare that to the exponentially faster $O(input)$ when dynamic segments are mutually exclusive to their siblings. Note that $O(2^dynamic)$ is the best we can hope to do in the worst case as there are $2^dynamic$ possible combinations for a path with dynamic segments. But this should be a rare case as we don't expect multiple routes of the form /:a1/:b2/:c3/:d4/:e5/:f6/some-static-segment-at-the-end to be a common use-case. ↩

The text was updated successfully, but these errors were encountered:

jacob-ebey · 2023-01-14T11:42:42Z

I did an initial pass at this here: https://github.com/jacob-ebey/router-trie
With a visualization of how it matches here: https://friendly-meerkat-d51e84.netlify.app

gyx1000 · 2023-07-07T08:30:05Z

Hello,

To improve my understanding of react-router and how routeMatching works, I've modified the way routes are matched.
The modification doesn't take into account all the above points, but it gives an idea of the performance gain.

I preferred to run the DFS iteratively, to avoid the extra cost that a recursive approach might incur. (Honestly, I don't think it has much impact).

Of course, all current 1476 unit tests pass 😅.

To test performance, I created about 20 routes with a maximum depth of 4 nodes.

console.time('matchRoutes');
for (let i=0; i<=100_000; ++i) {
  matchRoutes(router.routes, '/');
}
console.timeEnd('matchRoutes');

Time for matchRoutes /
- actual algorithm: 16.155s
- tree algorithm: 1.607s

Time for matchRoutes /blog/10/edit
- actual algorithm: 9.879s
- tree algorithm: 2.636s

So as not to pollute the react-router project, I've chosen to run the PR on my fork.

matchRoutes tree test

Thank you very much for the react-router and remix-run projects. 🙏

mortterna · 2024-07-10T10:07:55Z

Is this improved route matching algorithm still in works to make its way to the main? In my benchmarks it's relevant contributor especially for mobile to the INP metric.

kiliman · 2024-07-10T12:23:45Z

I agree. I hope they do implement this at some point. This will open up opportunities to do advanced matching, like regex and partial segment params.

ryanflorence added the proposal:remix Proposals from the Remix team. label Dec 13, 2022

ryanflorence added this to Roadmap Dec 13, 2022

ryanflorence moved this to Backlog in Roadmap Dec 13, 2022

ryanflorence changed the title ~~Use a tree to match routes~~ 🗺️ Use a tree to match routes Dec 13, 2022

ryanflorence moved this from Backlog to Planned in Roadmap Dec 13, 2022

ryanflorence moved this from Planned to Backlog in Roadmap Dec 13, 2022

MichaelDeBoey added the 🗺 Roadmap label Jan 22, 2023

github-actions bot added the needs-response We need a response from the original author about this issue/PR label Apr 30, 2023

remix-run deleted a comment from github-actions bot Apr 30, 2023

machour removed the needs-response We need a response from the original author about this issue/PR label Apr 30, 2023

pcattori mentioned this issue May 4, 2023

Dynamic route not working with optional segment #6021

Closed

1 task

brophdawg11 added the feat:routing label May 5, 2023

brophdawg11 closed this as completed Aug 3, 2023

github-project-automation bot moved this from Backlog to Merged in Roadmap Aug 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🗺️ Use a tree to match routes #4857

🗺️ Use a tree to match routes #4857

ryanflorence commented Dec 13, 2022

Robust matching

Simple, fast, deterministic matching

Mutually exclusive sibling branches?

Changing routes at runtime

Longest common static prefix

jacob-ebey commented Jan 14, 2023

gyx1000 commented Jul 7, 2023

mortterna commented Jul 10, 2024

kiliman commented Jul 10, 2024

🗺️ Use a tree to match routes #4857

🗺️ Use a tree to match routes #4857

Comments

ryanflorence commented Dec 13, 2022

Discussed in #4786

Robust matching

Simple, fast, deterministic matching

Mutually exclusive sibling branches?

Changing routes at runtime

Longest common static prefix

Footnotes

jacob-ebey commented Jan 14, 2023

gyx1000 commented Jul 7, 2023

mortterna commented Jul 10, 2024

kiliman commented Jul 10, 2024