Write trin orchestrator logic #89
Conversation
Can you check how much more work it would be to get discovery to only start up once?
"test_historyNetwork" => PortalEndpoint { | ||
kind: PortalEndpointKind::DummyHistoryNetworkData, | ||
resp: resp_tx, | ||
}, |
Is this just a temporary thing to test that we can successfully dispatch to both subnets?
If so, maybe add a comment saying something like that, with a TODO to remove it when we have tests in CI that confirm it another way.
trin-history/src/lib.rs (outdated)
```rust
}

impl HistoryRequestHandler {
    pub async fn process_network_requests(mut self) {
```
"network" requests makes me think peer to peer. But if I'm reading this right, it's all about serving the json-rpc requests, yeah? Another option is something like:
```diff
-pub async fn process_network_requests(mut self) {
+pub async fn handle_client_queries(mut self) {
```
In my head it was something along the lines of "process these requests (from the jsonrpc handler) for data from your (sub)network". But, I like your suggestion and updated it. The fewer times we use "network" the better.
trin-core/src/portalnet/protocol.rs (outdated)
```rust
self.history_tx.as_ref().unwrap().send(message).unwrap();
let response = match resp_rx.recv().await.unwrap() {
```
I'm not really clear: when might this panic in any of these 3 `unwrap()`s? If it's remotely possible, then we need some handling.
(and the same below, of course)
Yup, for sure there's a lot of error handling that I just kinda glossed over so far. I'll go back and implement some handling
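For context, here is a minimal sketch (not the PR's actual code) of how those three `unwrap()`s could be replaced with explicit handling, assuming tokio's unbounded mpsc channels and a made-up `HistoryMessage` type:

```rust
// Sketch only: the channel shapes and `HistoryMessage` are assumptions for
// illustration, not trin's real types. Each of the three failure points
// (missing channel, closed receiver, closed sender) becomes an error string
// instead of a panic.
use tokio::sync::mpsc;

struct HistoryMessage;

async fn proxy_to_history(
    history_tx: Option<&mpsc::UnboundedSender<HistoryMessage>>,
    resp_rx: &mut mpsc::UnboundedReceiver<Result<String, String>>,
) -> Result<String, String> {
    // 1. The history network might not be enabled at all.
    let tx = history_tx.ok_or("history network is not enabled")?;
    // 2. The handler task might have shut down, closing the receiver.
    tx.send(HistoryMessage)
        .map_err(|_| "history handler has shut down".to_string())?;
    // 3. The handler might drop the response sender without replying.
    match resp_rx.recv().await {
        Some(Ok(value)) => Ok(value),
        Some(Err(msg)) => Err(msg),
        None => Err("response channel closed before a reply arrived".to_string()),
    }
}
```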
```diff
@@ -51,13 +62,7 @@ pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
    tokio::spawn(async move {
        let (mut p2p, events) = PortalnetProtocol::new(portalnet_config).await.unwrap();
```
Inside `PortalnetProtocol` it's launching discv5 right now. So currently, it would be launching discovery 3 different times. The direction I would head is to start discovery with trin-core, and pass a handle into both `trin-state` and `trin-history` so that neither of them try to start up their own discovery instance.
> Can you check how much more work it would be to get discovery to only start up once?

Hmm, my understanding might be wrong here? But this is in the `main()` function, which will only be called if somebody chooses to run this binary by itself (`cargo run -p trin-state`). In which case, wouldn't it want to boot up its own base discv5? I guess maybe there's some more thinking to be done about the "solo binary" story, do we want to support it, and be a bit more explicit about the functionality it would support.
When booting trin via `cargo run`, inside `trin/src/lib.rs` we only call `initialize()` for each subnet, not their `main()`.
Ah yes, I had missed that. But... then aren't there other parts of the `PortalnetProtocol` class that should be run that aren't being run? Like creating `PortalnetEvents` and running `tokio::spawn(events.process_discv5_requests());`. That's the code that reacts to the inbound talkreq's, and serves the overlay protocol to peers.
Yeah, for sure those are needed - but to keep things simple, I just ignored it for this pr 😅 - tracking it here
```diff
@@ -97,6 +123,54 @@ impl JsonRpcHandler {
                .collect();
            let _ = cmd.resp.send(Ok(Value::Array(routing_table_info)));
        }
        DummyHistoryNetworkData => {
```
@carver Any thoughts on this? Imo it's gross to read - but it does handle all the error cases correctly. It wasn't obvious to me how to break it out into a simpler function - but I'm happy to spend some more time cleaning it up.
Yeah, at the very least, I'd probably put all of this in its own method. I think it will be more obvious how to generalize as we add more requests that go to the same subnet. (But it does seem like passing in just a couple variables could mean being able to put most of this into a single shared method like `proxy_query_to_subnet(subnet_tx, msg_kind) -> response`.)
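A rough sketch of what that shared helper could look like, using simplified stand-ins for the PR's endpoint and channel types (the real signatures may well differ):

```rust
// Sketch only: `PortalEndpoint` and `PortalEndpointKind` appear in the PR's
// diff, but the field types and response shape here are simplified
// assumptions. The helper builds the endpoint message, forwards it over the
// subnet's channel, and awaits the reply.
use tokio::sync::mpsc;

enum PortalEndpointKind {
    DummyHistoryNetworkData,
    DummyStateNetworkData,
}

struct PortalEndpoint {
    kind: PortalEndpointKind,
    resp: mpsc::UnboundedSender<Result<String, String>>,
}

async fn proxy_query_to_subnet(
    subnet_tx: &mpsc::UnboundedSender<PortalEndpoint>,
    kind: PortalEndpointKind,
) -> Result<String, String> {
    let (resp_tx, mut resp_rx) = mpsc::unbounded_channel();
    subnet_tx
        .send(PortalEndpoint { kind, resp: resp_tx })
        .map_err(|_| "subnet handler is not running".to_string())?;
    resp_rx
        .recv()
        .await
        .unwrap_or_else(|| Err("subnet handler dropped the response channel".to_string()))
}
```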
src/lib.rs (outdated)
```rust
use trin_history::HistoryRequestHandler;
use trin_state::StateRequestHandler;

pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
```
If this is the top-level orchestrator, would it be better to move this into `main.rs` and remove this `lib.rs` file? Based on the motivation, nothing else would need to call the `main` function of the orchestrator.
I guess the trin root crate is considered a binary right now, so it makes sense to move it into `main.rs`, but I think we can keep the `lib.rs` file in case we want to extract some of the code outside of the main file.
At the moment, I'd prefer removing `lib.rs` entirely for now, under the assumption that if/when the time comes that it's obvious we want to extract some code from `main.rs`, you can always create it later.
As far as the database stuff goes, my assumption is that we would require multiple database instances because we would need to manage the storage capacity for each subnetwork individually. For example, a user may wish to allocate …
Hey, thanks for thinking about the orchestration logic. I really like the idea of using one main json-rpc handler as an entry point to different network specific handlers.
My concern about the current json-rpc implementation is that we may need to drop it soon 😄. I'm currently playing with paritytech's json-rpc library and I'm going to implement it in the testing framework branch. I guess next week we will have more clarity on whether we want to switch to this library or not (it looks promising so far: better error handling, logging, and of course less boilerplate code).
I guess we should also think about how we are going to orchestrate network-specific discv5/overlay protocol instances, talkreq handlers, and DBs. This probably also includes updating the CLI parser to handle args for more than one network.
src/lib.rs (outdated)
```rust
let mut state_handler: Option<StateRequestHandler> = None;
let mut history_handler: Option<HistoryRequestHandler> = None;

if trin_config.networks.iter().any(|val| val == "history") {
```
I think we can combine this and the following iteration on `trin_config.networks` with something like:
```rust
for network in trin_config.networks.iter() {
    match network { ...
```
I did like this idea, but in the end I prefer something more along the lines of what @jacobkaufmann suggested here since it gets rid of initializing empty mutable variables. I wasn't able to devise a way to combine both of your refactors (aka a more explicit iteration and no dead variable instantiation) but if you see something I missed lmk.
trin-core/src/cli.rs (outdated)
```rust
#[structopt(
    long = "networks",
    help = "Comma-separated list of which portal networks to activate",
    default_value = "history,state",
```
I would set this as a constant and use `default_value(DEFAULT_NETWORKS)`.
I'm not sure whether it would be better to instead use activation flags for each desired network. Of course this creates more CLI arguments, and committing to this pattern would require a new CLI argument for each subnetwork. However, it aligns with the mental model I have for a portal client, where the user may toggle each subnetwork on and off, and it simplifies the code that interprets the values for these arguments.
> it would be better to instead use activation flags for each desired network

I think we can address this in a separate issue if it seems important to you.
Yeah, this is an interesting question. Imo it's somewhat out of scope for this pr, since it's a relatively simple change that can be implemented as various use cases arise and justify it. I created an issue here to track this conversation - and just going with `DEFAULT_NETWORKS` for now.
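For reference, a hypothetical sketch of the per-network activation-flag alternative discussed above, using StructOpt boolean flags (the flag names are invented for illustration, not part of trin's CLI):

```rust
// Sketch only: one boolean flag per subnetwork instead of a comma-separated
// `--networks` list. Flag names are placeholders.
use structopt::StructOpt;

#[derive(StructOpt, Debug)]
struct TrinConfig {
    /// Enable the history subnetwork.
    #[structopt(long = "enable-history")]
    enable_history: bool,

    /// Enable the state subnetwork.
    #[structopt(long = "enable-state")]
    enable_state: bool,
}

fn main() {
    let config = TrinConfig::from_args();
    println!("{:?}", config);
}
```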
src/lib.rs (outdated)
```rust
    portalnet_config.bootnode_enrs
);

let (mut p2p, events) = PortalnetProtocol::new(portalnet_config).await.unwrap();
```
This seems to run only one instance of the overlay network protocol and one talkreq handler. I think we should run one instance per network?
Yeah, this is where a bigger refactor needs to happen, where `PortalnetProtocol` becomes split into `StatePortalProtocol::new(discovery, portalnet_config)` and `HistoryPortalProtocol::new(discovery, portalnet_config)`, so that discovery is run once and directly used by each subnet.
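A speculative sketch of that direction, with placeholder types standing in for the discv5 wrapper and config, just to show the shape of sharing one discovery handle across both constructors:

```rust
// Sketch only: `Discovery`, `PortalnetConfig`, and the constructors are
// placeholders for whatever the real refactor settles on.
use std::sync::Arc;

struct Discovery;          // stand-in for the discv5 wrapper
struct PortalnetConfig;    // stand-in for the shared config

struct StatePortalProtocol { discovery: Arc<Discovery> }
struct HistoryPortalProtocol { discovery: Arc<Discovery> }

impl StatePortalProtocol {
    fn new(discovery: Arc<Discovery>, _config: &PortalnetConfig) -> Self {
        Self { discovery }
    }
}

impl HistoryPortalProtocol {
    fn new(discovery: Arc<Discovery>, _config: &PortalnetConfig) -> Self {
        Self { discovery }
    }
}

fn main() {
    let config = PortalnetConfig;
    // Discovery is started exactly once by the orchestrator...
    let discovery = Arc::new(Discovery);
    // ...and each subnetwork protocol reuses the same handle.
    let _state = StatePortalProtocol::new(Arc::clone(&discovery), &config);
    let _history = HistoryPortalProtocol::new(Arc::clone(&discovery), &config);
}
```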
Yup, this pr is not by any means intended to be conclusive - just laying the basic blueprint for the orchestrator. So I'm only running the base discv5 protocol here. There is still a lot of open questions / implementation work to be done regarding what the orchestrator is responsible for and how it initializes each subnet. With that in mind, I tried to keep this pr fairly simple - and created an issue to track the conversation and implementation.
src/lib.rs (outdated)
```rust
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("Launching trin");

    let trin_config = TrinConfig::new();
```
My understanding is that we want to run the different networks on separate UDP ports. If that is the case, we need to extend the CLI parser (StructOpt) to support command line args for more than one network.
> My understanding is that we want to run the different networks on separate UDP ports.
This doesn't sound right to me. The subnets would all be running off of the same discv5 layer, which is what has opened the UDP port. Then the overlay messages are all happening over talkreq over the shared discv5.
Tracking this discussion here
src/lib.rs (outdated)
```rust
if trin_config.networks.iter().any(|val| val == "state") {
    let (raw_state_tx, state_rx) = mpsc::unbounded_channel::<StateNetworkEndpoint>();
    state_tx = Some(raw_state_tx);
    state_handler = match trin_state::initialize(state_rx) {
        Ok(val) => Some(val),
        Err(msg) => panic!("Error while initializing state network: {:?}", msg),
    }
}
```
Perhaps the following is a cleaner way than to declare the `mut` variables above?
```diff
-if trin_config.networks.iter().any(|val| val == "state") {
-    let (raw_state_tx, state_rx) = mpsc::unbounded_channel::<StateNetworkEndpoint>();
-    state_tx = Some(raw_state_tx);
-    state_handler = match trin_state::initialize(state_rx) {
-        Ok(val) => Some(val),
-        Err(msg) => panic!("Error while initializing state network: {:?}", msg),
-    }
-}
+let (state_tx, state_handler) = if trin_config.networks.iter().any(|val| val == "state") {
+    let (state_tx, state_rx) = mpsc::unbounded_channel::<StateNetworkEndpoint>();
+    let state_handler = match trin_state::initialize(state_rx) {
+        Ok(val) => val,
+        Err(msg) => panic!("Error while initializing state network: {:?}", msg),
+    };
+    (Some(state_tx), Some(state_handler))
+} else {
+    (None, None)
+};
```
Similarly for history.
I think I liked this one even better:
```rust
for network in trin_config.networks.iter() {
    match network { ...
```
It's very possible I'm missing how to accomplish this, but I wasn't able to find a way to use the `iter()` strategy @ogenev suggested without instantiating `None` `mut` variables in advance. Imo those `mut` variables are grosser than the clunky iteration handling - but I have no strong feelings towards either approach.
On the topic of storage, my thoughts are that we have separate database instances per network, each abstracted by a separate PortalStorage instance (#69) per network. My main reasoning is simplicity. If we have every network's data combined in one DB instance, we'd need to store metadata on each key to denote which network it belongs to. The PortalStorage struct would need to get more complex in order to keep track of everything for every network at the same time... or we could have multiple PortalStorage instances interacting with a single database instance, in which case we need to manage locking etc. Debugging storage functionality also gets more complex with everything in one RocksDB. Is there a benefit to combining the databases that I might be missing? I'm only seeing the extra complexity. That said, I have yet to look into the effects of multiple DBs vs 1 DB on disk i/o, so that might be what I'm missing.
I think most users won't want to manage the storage space of each subnetwork individually (they probably won't even know the internal detail that there are multiple subnetworks). Think of a typical bittorrent user, and how little they probably know about the details of the protocol. Even if we decide to implement this per-network limit, I'm sure we could find other ways than running multiple DBs. So I think we should at least remain open to the idea of piping everything into a single DB (which I expect to have performance benefits due to I/O serialization).
In theory, the work of swapping the json-rpc library should be able to happen after this refactor, no? I would really like to avoid pausing work on this PR.
Yup, that's exactly the issue. Some client (Erigon?) had to do a big rewrite to a single DB in order to avoid causing big I/O delays by trying to hammer the disk a bunch in parallel from different threads. FWIW, I think we're going to have to deal with locking logic anyway, even within a single subnet.
Moved the ongoing db discussion here since Imo it's a bit out of scope of this pr.
Thanks everyone for the feedback! This pr was really intended to remain simple and just lay out the basic blueprint for the orchestrator. I did a lot of hand-waving across a bunch of important questions & implementation details, but the goal of this pr is not to solve those all right now. These questions still deserve some discussion imo, and rather than housing them all inside this PR I've created tracking issues...
Imo all of these issues are implementation details and aren't inherently conflicting with the orchestrator blueprint. It establishes a common ground & jsonrpc server between the subnets, and is flexible enough to accommodate whatever we decide for the various open questions. But if you feel differently, please respond and explain why.
Sure, swapping json-rpc shouldn't be a blocker for this PR.
trin-core/src/cli.rs (outdated)
```diff
@@ -9,6 +9,7 @@ use structopt::StructOpt;
const DEFAULT_WEB3_IPC_PATH: &str = "/tmp/trin-jsonrpc.ipc";
const DEFAULT_WEB3_HTTP_PORT: &str = "8545";
const DEFAULT_DISCOVERY_PORT: &str = "9000";
const DEFAULT_SUBNETWORKS: &str = "history,gossip";
```
```diff
-const DEFAULT_SUBNETWORKS: &str = "history,gossip";
+const DEFAULT_SUBNETWORKS: &str = "history,state";
```
```rust
if let Some(handler) = state_handler {
    tokio::spawn(handler.handle_client_queries());
}
if let Some(handler) = history_handler {
    tokio::spawn(handler.handle_client_queries());
}
```
If you feel like pursuing it, the direction I'd take is something like building a vector of subnet handlers and looping through them to spawn. 👌🏻 if you want to merge without trying it though.
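If someone does pursue it, a loose sketch of that shape might look like the following, with stand-in handler types rather than trin's real ones:

```rust
// Sketch only: box each enabled subnetwork's query loop as a future and
// spawn them in one place. `StateRequestHandler` / `HistoryRequestHandler`
// here are placeholders mirroring the names in the PR.
use std::future::Future;
use std::pin::Pin;

struct StateRequestHandler;
struct HistoryRequestHandler;

impl StateRequestHandler {
    async fn handle_client_queries(self) { /* serve state queries */ }
}
impl HistoryRequestHandler {
    async fn handle_client_queries(self) { /* serve history queries */ }
}

#[tokio::main]
async fn main() {
    let state_handler = Some(StateRequestHandler);
    let history_handler = Some(HistoryRequestHandler);

    // Collect each enabled subnetwork's task as a boxed future...
    let mut subnet_tasks: Vec<Pin<Box<dyn Future<Output = ()> + Send>>> = Vec::new();
    if let Some(handler) = state_handler {
        subnet_tasks.push(Box::pin(handler.handle_client_queries()));
    }
    if let Some(handler) = history_handler {
        subnet_tasks.push(Box::pin(handler.handle_client_queries()));
    }
    // ...then spawn them all in one loop.
    for task in subnet_tasks {
        tokio::spawn(task);
    }
}
```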
"trin-state", | ||
"trin-history", | ||
"trin-core" | ||
] |
What is the motivation for making `trin-core` a dependency and `trin-history` and `trin-state` members of the workspace? Are `trin-core`, `trin-history`, and `trin-state` packages or crates, making the entire directory a workspace?
Overall my goal was to allow the CLI API written out in this pr's description. But, it's possible that I didn't choose the best configuration setup here. There's no need to expose `trin-core` to the user, since it's all internal code. But, I wanted to expose the ability for users to run the `trin-history` and `trin-state` packages w/o the common `trin` orchestrator logic, if that's what they want to do for whatever reason. Did you see something I missed in this current setup?
Haha no, that makes total sense. I am just confused by all of Cargo's terminology on crates, workspaces, and packages, but what you mentioned helps make things clearer for me.
```diff
@@ -219,6 +219,8 @@ fn handle_request(
            "result": "trin 0.0.1-alpha",
        })
        .to_string()),
        "test_historyNetwork" => dispatch_portal_request(obj, portal_tx),
        "test_stateNetwork" => dispatch_portal_request(obj, portal_tx),
```
Is there a cleaner way to combine the "test_historyNetwork", "test_stateNetwork", and the one below this, since they all call the same function?
Yeah, there is for sure. But, it's not obvious to me currently the best way to clean this up, and I think the optimal refactor will emerge as we add more and more endpoints. I'm partial to leaving as-is for now, and just leaving a comment that refactoring should be handled soon.
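One option, shown here as a minimal self-contained sketch rather than the PR's actual routing code, is to merge the duplicate arms with an or-pattern:

```rust
// Sketch only: `route` stands in for the real request dispatch, just to show
// how two JSON-RPC method names that share a handler can collapse into a
// single match arm.
fn route(method: &str) -> &'static str {
    match method {
        "test_historyNetwork" | "test_stateNetwork" => "dispatch_portal_request",
        "web3_clientVersion" => "client_version",
        _ => "unsupported",
    }
}

fn main() {
    assert_eq!(route("test_historyNetwork"), "dispatch_portal_request");
    assert_eq!(route("test_stateNetwork"), "dispatch_portal_request");
}
```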
This pr is motivated by some questions I had while getting the trin testnet up & running. It's totally possible that I completely misunderstood some design aspects, so please ask questions / point out design flaws made here. This pr is meant to provide more of an organizational blueprint of how we will build out `trin`.

Some questions I had about the current trin architecture... (… `eth_getBalance`, `eth_call`, etc.) After some discussion with @carver, we came to the conclusion that there is a need for some orchestrator process to handle coordination.

In this pr, Trin is to act as an orchestrator for the sub-networks (history, gossip, state). (Side note, @carver: we discussed building a `trin-client` crate to handle this, which was how I started, but in the end it seemed redundant not to put that logic in the root `trin` project, but I'm open to changing my mind - note that `trin-core` still exists for common logic.)

`trin`:
- `trin-core`: Home for any logic that is shared amongst the sub-networks.
- Subnetworks (`trin-history`, `trin-state`, `trin-gossip`)

Command line API:
- `cargo run` => starts base trin orchestrator and all available networks by default
- `cargo run -- --networks history` => starts base trin orchestrator with just the history subnetwork
- `cargo run -p trin-history` => starts only the history package (i.e. this should be minimally useful, it won't have a jsonrpc server but will connect to the history network) (sidenote: why anyone would want to run just the binary, I'm not really sure, so is this even a use case we want to support?)

Poorly drawn picture:


Open questions.