These are research papers, white papers, documentation, and blog posts that I have found extremely useful in my self-study. They range from theoretical concepts to practical, day-to-day software and systems knowledge. Please submit a PR if you have any suggestions :)
- Operating Systems: Three Easy Pieces
- System Design Primer
- Designing Data-Intensive Applications
- Database Internals
- Distributed Systems
- Distributed Systems YouTube Course by Chris Colohan
- Borg, Omega, and Kubernetes
- Linux CGroups Documentation
- Docker Internals
- Green Threads
- Linux Internals Blog
- eBPF
- Time, Clocks, and the Ordering of Events in a Distributed System (Lamport Clocks)
- Virtual Time and Global States of Distributed Systems (Vector Clocks)
- Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services (CAP)
- The Part-Time Parliament (original Paxos paper)
- Paxos Made Simple
- In Search of an Understandable Consensus Algorithm (Raft)
- MapReduce: Simplified Data Processing on Large Clusters
- The Google File System
- ZooKeeper: Wait-free coordination for Internet-scale systems
- The Design of a Practical System for Fault-tolerant Virtual Machines
- Dynamo: Amazon’s Highly Available Key-value Store
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (Spark)
- Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore (Spinnaker)
- Bitcoin: A Peer-to-Peer Electronic Cash System
- Frangipani: A Scalable Distributed File System
- Scaling Distributed Machine Learning with the Parameter Server
- Kafka: a Distributed Messaging System for Log Processing
- The Chubby lock service for loosely-coupled distributed systems
- Bigtable: A Distributed Storage System for Structured Data
- Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- Amazon DynamoDB (USENIX ATC 2022)
- What’s Really New with NewSQL?
- Spanner: Google's Globally Distributed Database
- CockroachDB: The Resilient Geo-Distributed SQL Database
- A fast learning algorithm for deep belief nets
- Generative Adversarial Nets
- Attention Is All You Need
- Machine Learning Operations (MLOps): Overview, Definition, and Architecture
- The C10K problem
- NGINX vs. Apache: Our View of a Decade-Old Question
- Basic NGINX
- SREcon: Performance Checklists for SREs 2016