Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat(controller): routes, handling function backbone and errors management #16

Merged
merged 9 commits into from
Aug 28, 2023

Conversation

thomas-mauran
Copy link
Contributor

routes, handling function backbone and errors management

The goal here is to provide the backbone of the rest API to then only have to implement it.
There are some TODOs in the code to split the work efficiently with my team.

Featuring

  • error handling using thiserror and anyhow
  • the following routes and their handling function definition

Routes

GET workloads/
POST workloads/
GET workloads/:id
DELETE workloads/:id

GET instances/
POST instances/
GET instances/:id
DELETE instances/:id

Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>
Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>
@thomas-mauran
Copy link
Contributor Author

ping @sameo (sorry we can't add you as reviewer)

Copy link
Contributor

@sameo sameo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits. Overall it looks good, but please:

  1. Squash your cleanup/fixes/linting commits (e.g. 7568cfe)
  2. Add some actual commit message (not just a subject) to your commits. They are doing some meaningful changes, and the commit messages should describe them.

Comment on lines 19 to 25
pub struct SchedulerService {}
pub struct MySchedulingService {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I much prefer SchedulerServicethan MySchedulingService tbh. Even simply Schedulerwould be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pub enum WorkloadRegistry {
Docker,
Podman,
Ghcr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is Ghcr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ghcr is the github registry it is very useful to keep an image linked to a repo

status_code: 0,
message: "Workload 2 is running".to_string(),
status_code: 1,
message: "Your workload is RUNNING".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it's not really useful to pass a string message back as the status. The status should have all necessary information for the caller/receiver to build any logging message it wants to display. Something to be fixed later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this message is based on the protobug from the scheduler team we will have to check with them to remove it there

Comment on lines 67 to 68
// Attendre 10 secondes avant le prochain envoi
sleep(Duration::from_secs(10)).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? And also, please don't use french comments if the rest of the code is using english.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done removed since it was not useful

Comment on lines 84 to 87
// // Initialize logger
// SETUP LOGGING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing perfectly fine comments to new, upper case ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// Spawn the HTTP server as a tokio task
let http_thread = task::spawn(async move {
info!("Running http here: {}", http_addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
info!("Running http here: {}", http_addr);
log!("HTTP server running at {}", http_addr);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 62 to 64
cpu: Some(1_i32),
memory: Some(1_i32),
disk: Some(1_i32),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are those default values coming from? If you want to build a default Resourcesinstance, it should implement the Defaulttrait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

thomas-mauran and others added 4 commits August 27, 2023 16:10
Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>
Signed-off-by: Charley <charley.geoffroy@etu.umontpellier.fr>
Signed-off-by: Charley <charley.geoffroy@etu.umontpellier.fr>
Signed-off-by: Mauran <thomas.mauran@etu.umontpellier.fr>
@thomas-mauran
Copy link
Contributor Author

Some nits. Overall it looks good, but please:

1. Squash your cleanup/fixes/linting commits (e.g. [7568cfe](https://github.com/dev-sys-do/orka/commit/7568cfe690a92120acd528e449be7f09760f6f76))

2. Add some actual commit message (not just a subject) to your commits. They are doing some meaningful changes, and the commit messages should describe them.

done

Copy link
Contributor

@sameo sameo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase your PR on top of the main one and we'll merge it.

Kuruyia and others added 3 commits August 27, 2023 19:05
This follows internal discussion and ideas on dev-sys-do#13.

Logging on the scheduler is now managed by the `tracing` library, instead of `env_logger`, which allows for richer and better traceability by:

- adding the concept of events, which are akin to log lines
- adding the concept of spans, which are context-bound time frames in which multiple events happen
- normalize how variables are emitted in events

In that regard, the trace layer from `tower-http` has been added to the gRPC server as a middleware, which will:

- emit an event on each request and response (in the `DEBUG` log level), with info such as the status code or latency
- automatically set up a span for our own events

Signed-off-by: Kuruyia <8174691+kuruyia@users.noreply.github.com>
This adds the implementation for the different services referenced in the `agent.proto` to the `orka-scheduler` crate.

# What changes
The following changes have been made to the `agent.proto` file:
* the `ConnectionResponse` has been dropped in favour of `Empty`, using the gRPC status code to communicate the result instead
* a `DisconnectionNotice` has been added with an `id` field to identify which Agent is disconnecting
* an `id` field has been added to the `NodeStatus` arguments to identify which Agent is being updated

Signed-off-by: BioTheWolff <47079795+BioTheWolff@users.noreply.github.com>
This adds the implementation for the different services referenced in the `agent.proto` to the `orka-scheduler` crate.

# Implementations
## Managers
The Scheduler now features a Node Agent Manager, responsible for the management and storage of the registered Agents. Custom errors have been implemented for better readability.

Each Node Agent contains the following members:
* a heartbeat, in order for the Scheduler to know which Agents are alive and responding;
* CPU metrics, which for now contain only the CPU load percentage;
* Memory metrics, which include total memory, and available memory, both in bytes.

## Node agent lifecycle
### Joining the cluster
The Node Agent can connect to the Scheduler, then get registered against the Scheduler by giving its ID. If an Agent is already registered with this ID, the response status will be `ALREADY_EXISTS` (status code `6`).

The Scheduler is not responsible for Agent ID generation. However, we do advise:
* to generate the ID for each Agent process, rather than for each Node the Agent is installed onto;
* to generate an ID using an already-established format that will lead to a unique ID, such as UUIDv4.

**Note**: Agents are trusted by default because TLS is used for authenticating and securing the connection. If a user disables TLS, it is their responsibility to ensure that Agents can be trusted.

### Leaving the cluster
Upon a graceful exit, the Agent must send a disconnection notice to the Scheduler. This notice is always answered with an empty response and an OK status code, and the Agent can ignore this response.

## Node agent status updates
Right after its registration, the Agent must call the node status update endpoint. This will open a stream that the Agent must try to keep open for its entire lifetime.

Periodically, the Agent sends a status update, containing the metrics of the Node it is installed on. Those are listed in the [Managers section](#managers). This also allows the Scheduler to ensure the Agent is still alive and well.

Should the endpoint be called by an unregistered or unknown Agent, the stream will be closed with a response status of `NOT_FOUND` (status code `5`).

Signed-off-by: BioTheWolff <47079795+BioTheWolff@users.noreply.github.com>
@thomas-mauran
Copy link
Contributor Author

Please rebase your PR on top of the main one and we'll merge it.

done let me know if i did the right thing

@sameo
Copy link
Contributor

sameo commented Aug 28, 2023

Please rebase your PR on top of the main one and we'll merge it.

done let me know if i did the right thing

You did, thanks.

@sameo sameo merged commit 4b43991 into dev-sys-do:main Aug 28, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants