feat(controller): routes, handling function backbone and errors management #16

thomas-mauran · 2023-08-22T16:36:27Z

routes, handling function backbone and errors management

The goal here is to provide the backbone of the rest API to then only have to implement it.
There are some TODOs in the code to split the work efficiently with my team.

Featuring

error handling using thiserror and anyhow
the following routes and their handling function definition

Routes

GET workloads/
POST workloads/
GET workloads/:id
DELETE workloads/:id

GET instances/
POST instances/
GET instances/:id
DELETE instances/:id

Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>

thomas-mauran · 2023-08-26T16:55:29Z

ping @sameo (sorry we can't add you as reviewer)

sameo

Some nits. Overall it looks good, but please:

Squash your cleanup/fixes/linting commits (e.g. 7568cfe)
Add some actual commit message (not just a subject) to your commits. They are doing some meaningful changes, and the commit messages should describe them.

sameo · 2023-08-27T08:59:44Z

controller/src/main.rs

-pub struct SchedulerService {}
+pub struct MySchedulingService {}


I much prefer SchedulerServicethan MySchedulingService tbh. Even simply Schedulerwould be fine.

sameo · 2023-08-27T09:01:47Z

controller/src/types/workload_request.rs

+pub enum WorkloadRegistry {
+    Docker,
+    Podman,
+    Ghcr,


What is Ghcr?

Ghcr is the github registry it is very useful to keep an image linked to a repo

sameo · 2023-08-27T09:04:56Z

controller/src/main.rs

-                    status_code: 0,
-                    message: "Workload 2 is running".to_string(),
+                    status_code: 1,
+                    message: "Your workload is RUNNING".to_string(),


FWIW, it's not really useful to pass a string message back as the status. The status should have all necessary information for the caller/receiver to build any logging message it wants to display. Something to be fixed later.

this message is based on the protobug from the scheduler team we will have to check with them to remove it there

sameo · 2023-08-27T09:06:00Z

controller/src/main.rs

+                // Attendre 10 secondes avant le prochain envoi
+                sleep(Duration::from_secs(10)).await;


Why? And also, please don't use french comments if the rest of the code is using english.

done removed since it was not useful

sameo · 2023-08-27T09:06:35Z

controller/src/main.rs

-    // // Initialize logger
+    // SETUP LOGGING


Why are we changing perfectly fine comments to new, upper case ones?

sameo · 2023-08-27T09:07:26Z

controller/src/main.rs


    // Spawn the HTTP server as a tokio task
    let http_thread = task::spawn(async move {
+        info!("Running http here: {}", http_addr);


Suggested change

info!("Running http here: {}", http_addr);

log!("HTTP server running at {}", http_addr);

sameo · 2023-08-27T09:08:27Z

controller/src/routes/workloads.rs

+            cpu: Some(1_i32),
+            memory: Some(1_i32),
+            disk: Some(1_i32),


Where are those default values coming from? If you want to build a default Resourcesinstance, it should implement the Defaulttrait.

Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>

Signed-off-by: Charley <charley.geoffroy@etu.umontpellier.fr>

Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>

thomas-mauran · 2023-08-27T14:12:10Z

Some nits. Overall it looks good, but please:

1. Squash your cleanup/fixes/linting commits (e.g. [7568cfe](https://github.com/dev-sys-do/orka/commit/7568cfe690a92120acd528e449be7f09760f6f76))

2. Add some actual commit message (not just a subject) to your commits. They are doing some meaningful changes, and the commit messages should describe them.

done

sameo

Please rebase your PR on top of the main one and we'll merge it.

This follows internal discussion and ideas on dev-sys-do#13. Logging on the scheduler is now managed by the `tracing` library, instead of `env_logger`, which allows for richer and better traceability by: - adding the concept of events, which are akin to log lines - adding the concept of spans, which are context-bound time frames in which multiple events happen - normalize how variables are emitted in events In that regard, the trace layer from `tower-http` has been added to the gRPC server as a middleware, which will: - emit an event on each request and response (in the `DEBUG` log level), with info such as the status code or latency - automatically set up a span for our own events Signed-off-by: Kuruyia <8174691+kuruyia@users.noreply.github.com>

This adds the implementation for the different services referenced in the `agent.proto` to the `orka-scheduler` crate. # What changes The following changes have been made to the `agent.proto` file: * the `ConnectionResponse` has been dropped in favour of `Empty`, using the gRPC status code to communicate the result instead * a `DisconnectionNotice` has been added with an `id` field to identify which Agent is disconnecting * an `id` field has been added to the `NodeStatus` arguments to identify which Agent is being updated Signed-off-by: BioTheWolff <47079795+BioTheWolff@users.noreply.github.com>

This adds the implementation for the different services referenced in the `agent.proto` to the `orka-scheduler` crate. # Implementations ## Managers The Scheduler now features a Node Agent Manager, responsible for the management and storage of the registered Agents. Custom errors have been implemented for better readability. Each Node Agent contains the following members: * a heartbeat, in order for the Scheduler to know which Agents are alive and responding; * CPU metrics, which for now contain only the CPU load percentage; * Memory metrics, which include total memory, and available memory, both in bytes. ## Node agent lifecycle ### Joining the cluster The Node Agent can connect to the Scheduler, then get registered against the Scheduler by giving its ID. If an Agent is already registered with this ID, the response status will be `ALREADY_EXISTS` (status code `6`). The Scheduler is not responsible for Agent ID generation. However, we do advise: * to generate the ID for each Agent process, rather than for each Node the Agent is installed onto; * to generate an ID using an already-established format that will lead to a unique ID, such as UUIDv4. **Note**: Agents are trusted by default because TLS is used for authenticating and securing the connection. If a user disables TLS, it is their responsibility to ensure that Agents can be trusted. ### Leaving the cluster Upon a graceful exit, the Agent must send a disconnection notice to the Scheduler. This notice is always answered with an empty response and an OK status code, and the Agent can ignore this response. ## Node agent status updates Right after its registration, the Agent must call the node status update endpoint. This will open a stream that the Agent must try to keep open for its entire lifetime. Periodically, the Agent sends a status update, containing the metrics of the Node it is installed on. Those are listed in the [Managers section](#managers). This also allows the Scheduler to ensure the Agent is still alive and well. Should the endpoint be called by an unregistered or unknown Agent, the stream will be closed with a response status of `NOT_FOUND` (status code `5`). Signed-off-by: BioTheWolff <47079795+BioTheWolff@users.noreply.github.com>

thomas-mauran · 2023-08-27T17:06:47Z

Please rebase your PR on top of the main one and we'll merge it.

done let me know if i did the right thing

sameo · 2023-08-28T05:45:50Z

Please rebase your PR on top of the main one and we'll merge it.

done let me know if i did the right thing

You did, thanks.

thomas-mauran added 2 commits August 20, 2023 10:55

feat(workload): setup workload function deserialize

3f37b40

Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>

refactor(post_workload): proper error and files refactor

4cbfacc

Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>

sameo reviewed Aug 27, 2023

View reviewed changes

thomas-mauran and others added 4 commits August 27, 2023 16:10

feat(routes): setting up routes and handling functions

d445b37

Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>

feat(subtypes): add sub-types to workload struct

0121544

Signed-off-by: Charley <charley.geoffroy@etu.umontpellier.fr>

improvement(workload): add info log

efc7050

Signed-off-by: Charley <charley.geoffroy@etu.umontpellier.fr>

fix(workloadKind): remove baremetal

5136fa8

Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>

thomas-mauran force-pushed the main branch from ffe84db to 5136fa8 Compare August 27, 2023 14:11

sameo approved these changes Aug 27, 2023

View reviewed changes

Kuruyia and others added 3 commits August 27, 2023 19:05

sameo merged commit 4b43991 into dev-sys-do:main Aug 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(controller): routes, handling function backbone and errors management #16

feat(controller): routes, handling function backbone and errors management #16

thomas-mauran commented Aug 22, 2023

thomas-mauran commented Aug 26, 2023

sameo left a comment

sameo Aug 27, 2023

thomas-mauran Aug 27, 2023

sameo Aug 27, 2023

thomas-mauran Aug 27, 2023

sameo Aug 27, 2023

thomas-mauran Aug 27, 2023

sameo Aug 27, 2023

thomas-mauran Aug 27, 2023

sameo Aug 27, 2023

thomas-mauran Aug 27, 2023

sameo Aug 27, 2023

thomas-mauran Aug 27, 2023

sameo Aug 27, 2023

thomas-mauran Aug 27, 2023

thomas-mauran commented Aug 27, 2023

sameo left a comment

thomas-mauran commented Aug 27, 2023

sameo commented Aug 28, 2023

		pub struct SchedulerService {}
		pub struct MySchedulingService {}

		// Attendre 10 secondes avant le prochain envoi
		sleep(Duration::from_secs(10)).await;

	info!("Running http here: {}", http_addr);
	log!("HTTP server running at {}", http_addr);

feat(controller): routes, handling function backbone and errors management #16

feat(controller): routes, handling function backbone and errors management #16

Conversation

thomas-mauran commented Aug 22, 2023

routes, handling function backbone and errors management

Featuring

Routes

thomas-mauran commented Aug 26, 2023

sameo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thomas-mauran commented Aug 27, 2023

sameo left a comment

Choose a reason for hiding this comment

thomas-mauran commented Aug 27, 2023

sameo commented Aug 28, 2023