Note: This is alpha software
This controller discovers tagged NAT Instances and Routing Tables, filters them down to the healthy NAT Instances, and allocates egress routes through the available NAT Instances for each Routing Table. To ensure that only one router updates the routes, instances are sorted by launch date and the oldest healthy instance is considered the leader.
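The leader rule can be sketched in Python as below. This is an illustration, not the controller's code. `NatInstance`, `elect_leader` and `is_leader` are hypothetical names, and the `healthy` flag is assumed to come from a separate health check. EC2 reports the launch date in the `LaunchTime` field of DescribeInstances. Breaking launch-time ties on instance ID is an added assumption so that every router reaches the same answer from the same data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NatInstance:
    # Hypothetical record built from DescribeInstances output plus a health result.
    instance_id: str
    zone: str
    launch_time: datetime
    healthy: bool


def elect_leader(instances):
    """Return the oldest healthy instance, or None if no instance is healthy."""
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        return None
    # Oldest launch time wins; instance ID breaks ties (assumption).
    return min(healthy, key=lambda i: (i.launch_time, i.instance_id))


def is_leader(self_id, instances):
    """True if the instance running this router is the elected leader."""
    leader = elect_leader(instances)
    return leader is not None and leader.instance_id == self_id
```

Each router can evaluate `is_leader` with its own instance ID and skip route updates when it returns `False`.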
The following tags are expected on both EC2 Instance and Routing Table resources:

| Key | Description | Default |
|---|---|---|
| `aws-nat-router/id` | Identifies which controller owns the resource, so multiple controllers can each watch their own set of resources | `squid` |
| `aws-nat-router/zone` | Simplifies zone lookup of the Instance / Routing Table | - |
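A rough Python sketch of how these tags could drive discovery. The filter dictionaries follow the `Filters` shape used by the EC2 DescribeInstances and DescribeRouteTables APIs, where `vpc-id` and `tag:<key>` are standard filter names. `describe_filters`, `matches` and `zone_of` are hypothetical helpers, and using `squid` when no id is configured is our reading of the Default column.

```python
ID_TAG = "aws-nat-router/id"
ZONE_TAG = "aws-nat-router/zone"
DEFAULT_ID = "squid"  # assumption: mirrors the Default column above


def tag_map(resource):
    """Turn an EC2 tag list ([{"Key": ..., "Value": ...}]) into a dict."""
    return {tag["Key"]: tag["Value"] for tag in resource.get("Tags", [])}


def describe_filters(vpc_id, router_id=DEFAULT_ID):
    """Server-side filters for DescribeInstances / DescribeRouteTables."""
    return [
        {"Name": "vpc-id", "Values": [vpc_id]},
        {"Name": "tag:" + ID_TAG, "Values": [router_id]},
    ]


def matches(resource, router_id=DEFAULT_ID):
    """Client-side equivalent of the tag filter, for already-fetched data."""
    return tag_map(resource).get(ID_TAG) == router_id


def zone_of(resource):
    """Zone from the aws-nat-router/zone tag, or None if the tag is missing."""
    return tag_map(resource).get(ZONE_TAG)
```

With boto3 the filters would be passed as, for example, `ec2.describe_route_tables(Filters=describe_filters(vpc_id))`.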
Currently, the router prefers to allocate a NAT Instance in the same zone as the Routing Table. If there is no healthy NAT Instance in that zone, it allocates whichever NAT Instance, in any zone, currently has the fewest Routing Tables. If there are multiple healthy NAT Instances in a zone, it tries to spread the Routing Tables equally across all available NAT Instances.
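The allocation rules above could look roughly like this Python sketch. `allocate` is a hypothetical function working on plain dictionaries. Handling same-zone tables in a first pass and cross-zone leftovers in a second pass, and breaking load ties on instance ID, are assumptions made to keep the result balanced and deterministic.

```python
def allocate(route_tables, instances):
    """Map each Routing Table to a healthy NAT Instance.

    route_tables: {route_table_id: zone}
    instances:    {healthy_nat_instance_id: zone}
    Returns {route_table_id: nat_instance_id}. Tables are left out only
    when there are no healthy instances at all.
    """
    load = {iid: 0 for iid in instances}
    allocation = {}

    def pick(candidates):
        # Fewest allocated tables wins; instance ID breaks ties deterministically.
        best = min(candidates, key=lambda iid: (load[iid], iid))
        load[best] += 1
        return best

    leftovers = []
    # Pass 1: tables with a healthy NAT Instance in their own zone.
    for rtb in sorted(route_tables):
        local = [iid for iid, zone in instances.items() if zone == route_tables[rtb]]
        if local:
            allocation[rtb] = pick(local)
        else:
            leftovers.append(rtb)

    # Pass 2: remaining tables go to the least-loaded instance in any zone.
    for rtb in leftovers:
        if load:
            allocation[rtb] = pick(load.keys())
    return allocation
```

In this sketch a table in a zone without a healthy NAT Instance lands on whichever instance has the fewest tables after the same-zone pass.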
`aws-nat-router` should run on each NAT Instance and requires the following IAM permissions:

```hcl
actions = [
  "ec2:DescribeInstances",
  "ec2:DescribeRouteTables",
  "ec2:CreateRoute",
  "ec2:ReplaceRoute",
  "ec2:ModifyInstanceAttribute", # to disable SourceDestCheck on Instances launched through ASGs
]
```
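Both `ec2:CreateRoute` and `ec2:ReplaceRoute` are needed because a Routing Table may or may not already have an egress route. A Python sketch of that decision, assuming the egress route is the default route `0.0.0.0/0` and that `route_table` is one entry of a DescribeRouteTables response. `plan_route_update` is a hypothetical helper; the parameter names it returns match the EC2 CreateRoute / ReplaceRoute request fields.

```python
DEFAULT_ROUTE = "0.0.0.0/0"  # assumption: egress is the default route


def plan_route_update(route_table, nat_instance_id, destination=DEFAULT_ROUTE):
    """Return ("CreateRoute" | "ReplaceRoute", params), or None if nothing to do."""
    params = {
        "RouteTableId": route_table["RouteTableId"],
        "DestinationCidrBlock": destination,
        "InstanceId": nat_instance_id,
    }
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") == destination:
            if route.get("InstanceId") == nat_instance_id:
                return None  # already routed through the allocated instance
            return ("ReplaceRoute", params)  # route exists with another target
    return ("CreateRoute", params)  # no egress route yet
```

With boto3, the returned pair maps onto `ec2.create_route(**params)` or `ec2.replace_route(**params)`.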
A more complete Instance Role setup would look like this:

```hcl
resource "aws_iam_instance_profile" "router" {
  name = "nat-router-role"
  role = "${aws_iam_role.router.name}"
}

resource "aws_iam_role" "router" {
  name               = "nat-router-role"
  assume_role_policy = "${data.aws_iam_policy_document.assume_ec2_role.json}"
}

data "aws_iam_policy_document" "assume_ec2_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy" "ec2_router" {
  name   = "nat-router-role-ec2"
  role   = "${aws_iam_role.router.name}"
  policy = "${data.aws_iam_policy_document.ec2_router.json}"
}

data "aws_iam_policy_document" "ec2_router" {
  statement {
    sid = "1"

    actions = [
      "ec2:DescribeInstances",
      "ec2:DescribeRouteTables",
      "ec2:CreateRoute",
      "ec2:ReplaceRoute",
      "ec2:ModifyInstanceAttribute",
    ]

    resources = [
      "*",
    ]
  }
}
```
Systemd Unit

```ini
[Unit]
Description=AWS NAT Router
Documentation=https://github.com/so0k/aws-nat-router
Requires=network-online.target
After=network-online.target

[Service]
# -z: only download if the remote file is newer than the local file's modification time (mtime)
ExecStartPre=/usr/bin/curl -sLo /usr/local/bin/aws-nat-router \
  -z /usr/local/bin/aws-nat-router \
  https://github.com/so0k/aws-nat-router/releases/download/0.1.4/aws-nat-router
ExecStartPre=/usr/bin/chmod +x /usr/local/bin/aws-nat-router
Environment=LOG_LEVEL=INFO
ExecStart=/usr/local/bin/aws-nat-router \
  --vpc-id ${vpc_id} \
  --cluster-id ${cluster_id} \
  --ec2-election \
  --timeout 500ms \
  --interval 5s
Restart=always
RestartSec=10
# how long (seconds) systemd waits for start-up, including ExecStartPre, before marking the unit as failed
TimeoutStartSec=20

[Install]
WantedBy=multi-user.target
```
`runOnce` implementation:
- Discover Tagged Instances
- Discover Tagged Routing Tables
- Implement TCP HealthCheck
- Filter down to only Healthy NAT Instances
- Implement `PreventSourceDestCheck`
- Allocate Routing Tables to Instances
- Update Routing Tables with allocations
- Implement recovery actions (Restart or Terminate unhealthy nodes)
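The TCP HealthCheck and filtering steps could be sketched in Python as follows, assuming a NAT Instance counts as healthy when a TCP connection to a service port completes within the timeout. Which port to probe is not specified here, so it is a required parameter. `tcp_healthy` and `filter_healthy` are hypothetical names; the 0.5 s default mirrors the `--timeout 500ms` flag in the unit file.

```python
import socket


def tcp_healthy(host, port, timeout=0.5):
    """True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and socket timeouts.
        return False


def filter_healthy(instances, port, timeout=0.5):
    """Keep only the instances ({instance_id: private_ip}) that pass the TCP check."""
    return {iid: ip for iid, ip in instances.items() if tcp_healthy(ip, port, timeout)}
```

Checks run one after another here. With many NAT Instances, a thread pool would keep a round within the interval.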
Controller implementation:
- Use AWS credentials from command-line args / env vars or the EC2 instance role
- Take region / vpc-id / cluster-id arguments for discovery
- Take interval arguments and loop `runOnce` on that interval
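The interval loop can be sketched as follows. `run_forever` is a hypothetical helper. Logging and continuing after a failed run, rather than exiting, is an assumption, and `max_runs` / `sleep` exist only to make the sketch testable. With `--interval 5s`, `interval` would be `5.0`.

```python
import logging
import time


def run_forever(run_once, interval, max_runs=None, sleep=time.sleep):
    """Call run_once roughly every `interval` seconds; return the number of runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            run_once()
        except Exception:
            # Log and keep looping so a transient AWS error doesn't stop the router.
            logging.exception("runOnce failed")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Sleep for whatever is left of the interval after this run.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval - elapsed))
    return runs
```

Subtracting the run time keeps runs starting roughly every `interval` seconds instead of drifting by the duration of each run.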
Deployment:
The controller is meant to run on EC2 Instances before Kubernetes is bootstrapped, so Docker / Kubernetes cannot be used as the deployment mechanism.
- Add GitHub release to CircleCI
- Add sample systemd unit file

Based on AWS `nat_monitor.sh`.