This codebase contains scripts that I have found helpful while working on multiple projects:
- Video-on-Demand: I created a multi-zone video server using a custom managed K8s cluster and AWS load balancers.
- ML Training Infra: Scripts I found helpful while setting up distributed (DDP and FSDP) training jobs.