Redirect client to better server #274

Closed
oliverhausler opened this issue Oct 11, 2020 · 10 comments

Comments

@oliverhausler

The NATS protocol lets clients connect to an arbitrary server, which is simple and generic, but often there is a better server (closer geolocation, the same server where a channel is currently published, etc.).

A lot of traffic could be offloaded from one's own cluster if there were a mechanism that tells a client to connect to another, better-suited server for a given subscription.

@LaPetiteSouris
Contributor

I think it is somewhat related to this feature

If you want to subscribe for read-only purposes, for now you can try using ReadISRReplica. This may not be precisely what you want, but it can help fan out messages and reduce the workload on the leader.

@oliverhausler
Author

@LaPetiteSouris Yes, I was close to adding it to #219, but it is not exactly the same, and a server in a close geo-location is not necessarily the best server. It depends on the use case, I guess.

If you think of chat rooms, geo-proximity or a geo-cluster is most probably the way to go. But this can also be different. Think of something like YouTube [I know it's different, but I am only looking at the load distribution here], where videos are stored on a certain server. When a user subscribes to watch a video, the best server is probably the one that has the video stored locally, not the one closest to the user. Only when the same video is served to many users simultaneously may geo-proximity win (and probably not for the data provider, only when looking at the network load).

Generally speaking, this is something many people get wrong. Everybody talks about edge, but having a client contact an edge server is only useful when the data is stored at the edge. As soon as the edge server must pull the data over a long distance (worse still if several round trips are required per data request), having the client connect to a server close to the data often wins, and even a local cluster in a single geolocation probably wins.

@tylertreat
Member

I think this would be a great feature. Would need to think through how to implement properly, especially since it depends heavily on the use case as you mention.

> If you want to subscribe for read-only purposes, for now you can try using ReadISRReplica.

My thought was to eventually extend ReadISRReplica to make it smarter, e.g. by subscribing from the closest geo replica. This feature might play into how that would work.

@oliverhausler
Author

It could even be a simple client feature, where the server sends a "redirect recommendation" with a list of better servers. It would then be up to the client to either disconnect or not.
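
A minimal sketch of how such a client-side decision might look. Everything here is hypothetical: `RedirectRecommendation`, `RedirectPolicy`, and `FollowFirstAllowed` are made-up names for illustration and are not part of the Liftbridge API or protocol.

```go
package main

// RedirectRecommendation is a hypothetical message a server could send to a
// client. It is an assumption for illustration, not an existing API type.
type RedirectRecommendation struct {
	Stream     string   // subscription the advice applies to
	Candidates []string // better-suited server addresses, best first
}

// RedirectPolicy lets the application decide whether to follow a
// recommendation. It returns the address to use and whether to reconnect.
type RedirectPolicy func(current string, rec RedirectRecommendation) (target string, follow bool)

// FollowFirstAllowed returns a policy that moves to the first recommended
// server found in the allow list and otherwise keeps the current connection.
func FollowFirstAllowed(allowed map[string]bool) RedirectPolicy {
	return func(current string, rec RedirectRecommendation) (string, bool) {
		for _, addr := range rec.Candidates {
			if addr == current {
				// The current server ranks above every remaining candidate: stay.
				return current, false
			}
			if allowed[addr] {
				return addr, true
			}
		}
		return current, false
	}
}
```

Keeping the decision in a policy function leaves the server's message purely advisory: a client that ignores it simply keeps its existing connection.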

@tylertreat
Member

It might make sense to piggyback this information on the FetchMetadata endpoint.

@LaPetiteSouris
Contributor

LaPetiteSouris commented Oct 15, 2020

> It might make sense to piggyback this information on the FetchMetadata endpoint.

In that case the server would need geolocation awareness.

Otherwise, we could end up with a simple load-balancing module. In fact, I think it is hard to define a strict, limited rule set for deciding the "best" server. IMO, the rules used to score a server in the cluster may differ greatly between scenarios. Thus, a server judged "best" or "next best" for one use case may not be even close to "good" for another.

Agreed that the current way of choosing a server to subscribe to has a lot of room for improvement, but I think it would be nice to start with something generic and simple enough to implement as a first step.

Some rules for scoring a server in the new load-balancing module might be:

  • Round robin: go to the next server in the list.
  • Least response time.
  • Least connections. (This may be suitable for FetchMetadata)
  • Geolocation distance

With that in mind, I do not know whether it is actually Metadata or AggregatedMetrics we should focus on. If it is the latter, then issue #222 is related.

@tylertreat
Member

@LaPetiteSouris Agreed. Also, I would like to revisit how the client does connection management to leverage gRPC's components like Resolver and Picker. These are more extensible for allowing different implementations for how connections are selected/balanced (see liftbridge-io/go-liftbridge#89).

@oliverhausler
Author

oliverhausler commented Oct 15, 2020

@LaPetiteSouris Agreed. Two more possible rules:

  • Closest to publisher
  • Cheapest (servers have lower or higher data cost, based on location and provider)
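
These two rules could plug into a generic ranking step, sketched below. The `Candidate` fields (`HopsToPublisher`, `CostPerGB`) and the `Scorer` interface are hypothetical names chosen for illustration; how a server would learn its distance to the publisher or its data cost is left open.

```go
package main

import "sort"

// Candidate carries hypothetical per-server attributes for these two rules.
type Candidate struct {
	Addr            string
	HopsToPublisher int     // assumed network distance to the stream's publisher
	CostPerGB       float64 // assumed data cost, in an arbitrary currency
}

// Scorer assigns a score to a candidate; lower is better.
type Scorer interface {
	Score(c Candidate) float64
}

// ClosestToPublisher prefers servers near the publisher.
type ClosestToPublisher struct{}

func (ClosestToPublisher) Score(c Candidate) float64 { return float64(c.HopsToPublisher) }

// Cheapest prefers servers with the lowest data cost.
type Cheapest struct{}

func (Cheapest) Score(c Candidate) float64 { return c.CostPerGB }

// Rank returns a copy of candidates ordered best first under s.
// Candidates with equal scores keep their input order.
func Rank(candidates []Candidate, s Scorer) []Candidate {
	ranked := append([]Candidate(nil), candidates...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return s.Score(ranked[i]) < s.Score(ranked[j])
	})
	return ranked
}
```

Returning a ranked list rather than a single winner gives the client fallbacks if the top choice is unreachable.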

@LaPetiteSouris
Contributor

> @LaPetiteSouris Agreed. Also, I would like to revisit how the client does connection management to leverage gRPC's components like Resolver and Picker. These are more extensible for allowing different implementations for how connections are selected/balanced (see liftbridge-io/go-liftbridge#89).

A prerequisite for that issue would be for the server to expose the required information first, e.g. geolocation, number of connections, etc.

@LaPetiteSouris
Contributor

LaPetiteSouris commented Dec 7, 2021

I suggest we close this one. Although liftbridge-io/go-liftbridge#114 is a bit rudimentary, it already provides this feature. Let's open another issue if needed in the future.
