Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

🗺️ Use a tree to match routes #4857

Closed
ryanflorence opened this issue Dec 13, 2022 Discussed in #4786 · 4 comments
Closed

🗺️ Use a tree to match routes #4857

ryanflorence opened this issue Dec 13, 2022 Discussed in #4786 · 4 comments
Labels
feat:routing proposal:remix Proposals from the Remix team.

Comments

@ryanflorence
Copy link
Member

Discussed in #4786

Originally posted by pcattori December 6, 2022

The Representation Principle: Once a problem is described using an appropriate representation, the problem is almost solved
-- Patrick H. Winston

tl;dr: Let's just copy what routers like DNS do!

Routes are inherently hierarchical as each / represents a subpath.
A tree directly models these hierarchies.
Ubiquitous, reliable, and performant routers use trees to match routes.
For example, DNS uses uses trie (a specific type of tree, also known as a prefix tree) to store and search domain names efficiently.

The main benefit of representing routes as a tree is that our route matching will be simpler, leading to more robust and predictable matching. By robustness, I mean that redundant or ambiguous routes will not be representable in a tree like they are now.
And by predictable, I mean that it will be simpler to explain and understand why the route was chosen as the best match for an incoming path.

Additionally, all matching will run at optimal $O(input)$, so we should see also see perf improvements, especially at larger scales.

Robust matching

Currently, /one/:two and /one/:three can coexist as routes and we arbitrarily (but consistently) choose one of these to have a higher priority.
While we could prune these routes manually, a better solution is to never allow redundant routes like these to coexist in the first place.

Encoding routes into a tree would ensure that we never have redundant routes, as redundant routes would occupy the same edge.
We can even warn or error when an ambiguous route is being added to the tree: WARN: '/one/:three' is ambiguous since '/one/:two' is already a route.

Simple, fast, deterministic matching

Trees natively support $O(input)$ lookups where $input$ is the number of nested segments of the incoming path.
We would then do one tree lookup to match an incoming path to a route.

Note that this is indeed optimal because we are using input to represent the size of the input, not the number of nodes in the tree. So this would be $O(log(n))$ where $n$ is the number of nodes in the tree if the tree is balanced.

Mutually exclusive sibling branches?

One decision we'll have to make is if dynamic segments are automatically mutually exclusive of their sibling paths or not.
For example, if we have routes /one/two and /one/:param1/three, the tree would look like:

Screen Shot 2022-12-06 at 5 14 53 PM

Can we tell at the /one node whether to go down the two branch or the :param1 branch?
If :param1 is modeled to be exclusive of its siblings (e.g. two), then yes!
We'd know to go down the two branch.
Then we'd conclude that there is no valid match for /one/two/three

If we don't assume mutual exclusivity of sibling branches, then we'd have to traverse both two and :param1.
Like before, the two branch would not yield a valid match, but in this case :param1 branch would yield a match for /one/:param1/three.1

In this case, we could have gotten multiple valid matches if we replaced /one/two with /one/two/three.
For now, we can pick the "best" match out of these via the existing scoring of routes, though we can improve on this as well.

Changing routes at runtime

If we represent our routes as a tree, we'll be able to use standard tree algorithms to efficiently match paths to a route.
Additionally, tries support $O(input)$ insertions and deletions, so we can efficiently support route modifications at runtime if needed.

Longest common static prefix

By default, lookups in our route tree would prefer static paths over dynamic paths so that /one matches /one over /:param.
This mirrors the "longest common prefix" matching found in DNS and other routers.

However, if we decide not to make dynamic segments mutually exclusive of their siblings, then we'd need to still pick the "best" match from among the valid matches.

A simple method to extend "longest common prefix" is to prefer "longest common static prefix".

For example, consider these two routes:

  • /one/two/:foo
  • /one/:bar/three

Which one should /one/two/three match?
Because the longest common static prefix for each is:

  • /one/two/:foo -> /one/two
  • /one/:bar/three -> /one

So /one/two/:foo should be chosen.
Note that we are measuring "longest" in terms of number of segments, not characters.

For two matching routes with the same number of static segments, we keep going until a tie is broken where one route has a static segment and one route has a dynamic segment.

For example, matching the path /one/two/baz/three to the routes:

  • /one/two/:bar1/:bar2' -> /one/two/:/:
  • /one/two/:foo/three -> /one/two/:/three

where we use : to denote a dynamic segment so that its easy to compare the routes.
Both match the same up to /one/two/:, but then /one/two/:foo/three matches statically while the other route matches dynamically.
So /one/two/:foo/three should be chosen.


Currently, we use scoring scheme with arbitrary weights to rank valid matches.
Switching to longest common static prefix would be a breaking change.
We can still switch internally to a tree for matching and use the existing scoring to pick between valid matches returned by the tree lookup to vastly simplify and speed up our implementation without any breaking changes.

I think longest common static prefix captures most people's intuitions about how matching should work better than our scoring scheme and is much more easily explained.
I also think that our scoring was trying to capture the same intuition in a less robust way.
So its likely that virtually all use-cases would not be breaking and I would bet that any use-cases that are breaking are probably behaving unpredictably for users today.

So we could philosophically call this a bug fix to our current routing rather than a breaking change.
That's probably what I would do, but I realize this could be contentious.

To make it less contentious, we could try see how closely it matches existing usage in our test suite, examples, and existing Remix projects.
But we can also treat this as a breaking change if we want to be super risk-averse.

Again, with context, I think I can convince you that this really is a bug fix, but let's talk about it 😁

Footnotes

  1. For large paths with many dynamic segments, lookup would take $O(2^dynamic)$ where $dynamic$ is the number of dynamic segments. Compare that to the exponentially faster $O(input)$ when dynamic segments are mutually exclusive to their siblings. Note that $O(2^dynamic)$ is the best we can hope to do in the worst case as there are $2^dynamic$ possible combinations for a path with dynamic segments. But this should be a rare case as we don't expect multiple routes of the form /:a1/:b2/:c3/:d4/:e5/:f6/some-static-segment-at-the-end to be a common use-case.

@ryanflorence ryanflorence added the proposal:remix Proposals from the Remix team. label Dec 13, 2022
@ryanflorence ryanflorence moved this to Backlog in Roadmap Dec 13, 2022
@ryanflorence ryanflorence changed the title Use a tree to match routes 🗺️ Use a tree to match routes Dec 13, 2022
@ryanflorence ryanflorence moved this from Backlog to Planned in Roadmap Dec 13, 2022
@ryanflorence ryanflorence moved this from Planned to Backlog in Roadmap Dec 13, 2022
@jacob-ebey
Copy link
Member

I did an initial pass at this here: https://github.com/jacob-ebey/router-trie
With a visualization of how it matches here: https://friendly-meerkat-d51e84.netlify.app

@github-actions github-actions bot added the needs-response We need a response from the original author about this issue/PR label Apr 30, 2023
@remix-run remix-run deleted a comment from github-actions bot Apr 30, 2023
@machour machour removed the needs-response We need a response from the original author about this issue/PR label Apr 30, 2023
@gyx1000
Copy link
Contributor

gyx1000 commented Jul 7, 2023

Hello,

To improve my understanding of react-router and how routeMatching works, I've modified the way routes are matched.
The modification doesn't take into account all the above points, but it gives an idea of the performance gain.

I preferred to run the DFS iteratively, to avoid the extra cost that a recursive approach might incur. (Honestly, I don't think it has much impact).

Of course, all current 1476 unit tests pass 😅.

To test performance, I created about 20 routes with a maximum depth of 4 nodes.

console.time('matchRoutes');
for (let i=0; i<=100_000; ++i) {
  matchRoutes(router.routes, '/');
}
console.timeEnd('matchRoutes');

Time for matchRoutes /
- actual algorithm: 16.155s
- tree algorithm: 1.607s

Time for matchRoutes /blog/10/edit
- actual algorithm: 9.879s
- tree algorithm: 2.636s

So as not to pollute the react-router project, I've chosen to run the PR on my fork.

matchRoutes tree test

Thank you very much for the react-router and remix-run projects. 🙏

@github-project-automation github-project-automation bot moved this from Backlog to Merged in Roadmap Aug 3, 2023
@mortterna
Copy link

Is this improved route matching algorithm still in works to make its way to the main? In my benchmarks it's relevant contributor especially for mobile to the INP metric.

@kiliman
Copy link
Collaborator

kiliman commented Jul 10, 2024

I agree. I hope they do implement this at some point. This will open up opportunities to do advanced matching, like regex and partial segment params.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
feat:routing proposal:remix Proposals from the Remix team.
Projects
Archived in project
Development

No branches or pull requests

8 participants