-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
feat(controller): routes, handling function backbone and errors management #16
Conversation
Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>
Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>
ping @sameo (sorry we can't add you as reviewer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits. Overall it looks good, but please:
- Squash your cleanup/fixes/linting commits (e.g. 7568cfe)
- Add some actual commit message (not just a subject) to your commits. They are doing some meaningful changes, and the commit messages should describe them.
controller/src/main.rs
Outdated
pub struct SchedulerService {} | ||
pub struct MySchedulingService {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I much prefer SchedulerService
than MySchedulingService
tbh. Even simply Scheduler
would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pub enum WorkloadRegistry { | ||
Docker, | ||
Podman, | ||
Ghcr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is Ghcr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ghcr is the github registry it is very useful to keep an image linked to a repo
status_code: 0, | ||
message: "Workload 2 is running".to_string(), | ||
status_code: 1, | ||
message: "Your workload is RUNNING".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, it's not really useful to pass a string message back as the status. The status should have all necessary information for the caller/receiver to build any logging message it wants to display. Something to be fixed later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this message is based on the protobug from the scheduler team we will have to check with them to remove it there
controller/src/main.rs
Outdated
// Attendre 10 secondes avant le prochain envoi | ||
sleep(Duration::from_secs(10)).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? And also, please don't use french comments if the rest of the code is using english.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done removed since it was not useful
controller/src/main.rs
Outdated
// // Initialize logger | ||
// SETUP LOGGING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we changing perfectly fine comments to new, upper case ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
controller/src/main.rs
Outdated
|
||
// Spawn the HTTP server as a tokio task | ||
let http_thread = task::spawn(async move { | ||
info!("Running http here: {}", http_addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info!("Running http here: {}", http_addr); | |
log!("HTTP server running at {}", http_addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
controller/src/routes/workloads.rs
Outdated
cpu: Some(1_i32), | ||
memory: Some(1_i32), | ||
disk: Some(1_i32), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are those default values coming from? If you want to build a default Resources
instance, it should implement the Default
trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>
Signed-off-by: Charley <charley.geoffroy@etu.umontpellier.fr>
Signed-off-by: Charley <charley.geoffroy@etu.umontpellier.fr>
Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase your PR on top of the main one and we'll merge it.
This follows internal discussion and ideas on dev-sys-do#13. Logging on the scheduler is now managed by the `tracing` library, instead of `env_logger`, which allows for richer and better traceability by: - adding the concept of events, which are akin to log lines - adding the concept of spans, which are context-bound time frames in which multiple events happen - normalize how variables are emitted in events In that regard, the trace layer from `tower-http` has been added to the gRPC server as a middleware, which will: - emit an event on each request and response (in the `DEBUG` log level), with info such as the status code or latency - automatically set up a span for our own events Signed-off-by: Kuruyia <8174691+kuruyia@users.noreply.github.com>
This adds the implementation for the different services referenced in the `agent.proto` to the `orka-scheduler` crate. # What changes The following changes have been made to the `agent.proto` file: * the `ConnectionResponse` has been dropped in favour of `Empty`, using the gRPC status code to communicate the result instead * a `DisconnectionNotice` has been added with an `id` field to identify which Agent is disconnecting * an `id` field has been added to the `NodeStatus` arguments to identify which Agent is being updated Signed-off-by: BioTheWolff <47079795+BioTheWolff@users.noreply.github.com>
This adds the implementation for the different services referenced in the `agent.proto` to the `orka-scheduler` crate. # Implementations ## Managers The Scheduler now features a Node Agent Manager, responsible for the management and storage of the registered Agents. Custom errors have been implemented for better readability. Each Node Agent contains the following members: * a heartbeat, in order for the Scheduler to know which Agents are alive and responding; * CPU metrics, which for now contain only the CPU load percentage; * Memory metrics, which include total memory, and available memory, both in bytes. ## Node agent lifecycle ### Joining the cluster The Node Agent can connect to the Scheduler, then get registered against the Scheduler by giving its ID. If an Agent is already registered with this ID, the response status will be `ALREADY_EXISTS` (status code `6`). The Scheduler is not responsible for Agent ID generation. However, we do advise: * to generate the ID for each Agent process, rather than for each Node the Agent is installed onto; * to generate an ID using an already-established format that will lead to a unique ID, such as UUIDv4. **Note**: Agents are trusted by default because TLS is used for authenticating and securing the connection. If a user disables TLS, it is their responsibility to ensure that Agents can be trusted. ### Leaving the cluster Upon a graceful exit, the Agent must send a disconnection notice to the Scheduler. This notice is always answered with an empty response and an OK status code, and the Agent can ignore this response. ## Node agent status updates Right after its registration, the Agent must call the node status update endpoint. This will open a stream that the Agent must try to keep open for its entire lifetime. Periodically, the Agent sends a status update, containing the metrics of the Node it is installed on. Those are listed in the [Managers section](#managers). This also allows the Scheduler to ensure the Agent is still alive and well. Should the endpoint be called by an unregistered or unknown Agent, the stream will be closed with a response status of `NOT_FOUND` (status code `5`). Signed-off-by: BioTheWolff <47079795+BioTheWolff@users.noreply.github.com>
done let me know if i did the right thing |
You did, thanks. |
routes, handling function backbone and errors management
The goal here is to provide the backbone of the rest API to then only have to implement it.
There are some TODOs in the code to split the work efficiently with my team.
Featuring
thiserror
andanyhow
Routes
GET workloads/
POST workloads/
GET workloads/:id
DELETE workloads/:id
GET instances/
POST instances/
GET instances/:id
DELETE instances/:id