Enhance Raft Cluster Management with Health Checks, Dynamic Peer Management, and Security #642

Open
4 of 16 tasks
sinadarbouy opened this issue Dec 20, 2024 · 1 comment
Labels
epic To be broken down into multiple tasks

Comments

@sinadarbouy
Collaborator

sinadarbouy commented Dec 20, 2024

Description:

Our current Raft implementation lacks health checks, dynamic peer management, scale handling, and security controls. The following features need to be implemented:

  1. Raft Health Check Integration
    • Add Raft-specific health checks to the existing health check endpoint
    • Include leader election status in health checks
    • Add cluster state validation in health checks
    • Expose metrics about Raft cluster health
  2. Dynamic Peer Management
    • Implement gRPC endpoints for peer management:
      • AddPeer endpoint for adding new nodes to the cluster
      • RemovePeer endpoint for graceful node removal
      • Status endpoint to get current cluster membership
    • Add validation to ensure only leader nodes can modify cluster membership
    • Implement retry mechanism for failed peer additions
    • Add logging and monitoring for peer management operations
  3. Scale Management
    • Implement automated peer discovery during scale-up
    • Add graceful shutdown procedure during scale-down
  4. Security Improvements
    • Implement mTLS for gRPC communication between nodes
    • Implement token-based authentication for cluster management operations
    • Add audit logging for all cluster membership changes

Technical Considerations

  • The health check should indicate if the node is part of a stable cluster
  • Only the leader should be able to modify cluster membership
  • Authentication should be required for all cluster management operations
  • Scale operations should maintain cluster consistency
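
As a rough illustration of the first consideration, the sketch below shows one way a health endpoint might decide whether a node is part of a stable cluster. The `RaftStatus` fields, the `Evaluate` function, and the stability rule (a known leader plus a reachable majority of voters) are assumptions made for this example; the actual criteria are still to be decided.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// RaftStatus is a hypothetical snapshot of a node's view of the cluster.
type RaftStatus struct {
	State      string // e.g. "Leader", "Follower", "Candidate"
	LeaderAddr string // empty when no leader is known
	Voters     int    // voting members in the current configuration
	Reachable  int    // voters this node can currently reach, including itself
}

// HealthReport is the JSON body returned by the health endpoint.
type HealthReport struct {
	Healthy bool   `json:"healthy"`
	State   string `json:"state"`
	Leader  string `json:"leader"`
	Reason  string `json:"reason,omitempty"`
}

// Evaluate applies the assumed stability rule: a leader is known and a
// majority of voters is reachable.
func Evaluate(s RaftStatus) HealthReport {
	r := HealthReport{State: s.State, Leader: s.LeaderAddr}
	switch {
	case s.LeaderAddr == "":
		r.Reason = "no leader elected"
	case s.Voters == 0:
		r.Reason = "empty cluster configuration"
	case s.Reachable < s.Voters/2+1:
		r.Reason = "quorum not reachable"
	default:
		r.Healthy = true
	}
	return r
}

// HealthHandler serves the report and returns 503 when the node is unhealthy.
func HealthHandler(status func() RaftStatus) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		report := Evaluate(status())
		w.Header().Set("Content-Type", "application/json")
		if !report.Healthy {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		_ = json.NewEncoder(w).Encode(report)
	}
}

func main() {
	stable := RaftStatus{State: "Follower", LeaderAddr: "10.0.0.1:2223", Voters: 3, Reachable: 2}
	electing := RaftStatus{State: "Candidate", Voters: 3, Reachable: 1}
	fmt.Printf("%+v\n", Evaluate(stable))
	fmt.Printf("%+v\n", Evaluate(electing))
}
```

Keeping `Evaluate` separate from the HTTP handler means the same rule can back both the existing health check endpoint and any exported metrics.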
@mostafa
Member

mostafa commented Dec 22, 2024

@sinadarbouy All the functions in the Raft library have a WithLibrary variant that accepts a custom logger, which avoids mixing separate log formats in the output. We also have an hc_log_adapter interface that translates between hclog and zerolog; it is also used in the plugins.

@mostafa mostafa moved this from ✨ New to 📋 Backlog in GatewayD Core Public Roadmap Dec 27, 2024
Projects
Status: 📋 Backlog
Development

No branches or pull requests

3 participants