# Site Reliability Engineer guide

📚 A collection of books, research papers, videos, and articles for mastering Site Reliability Engineering.

## Books

### SRE

- [ ] Site Reliability Engineering: [How Google Runs Production Systems](https://sre.google/sre-book/table-of-contents/)
- [ ] [The Site Reliability Workbook: Practical Ways to Implement SRE](https://sre.google/workbook/table-of-contents/)
- [ ] [Building Secure & Reliable Systems](https://static.googleusercontent.com/media/sre.google/en//static/pdf/building_secure_and_reliable_systems.pdf)

### Kubernetes platform and applications

- [ ] Docker: Up & Running
- [ ] Kubernetes: Up and Running by Brendan Burns, Kelsey Hightower, and Joe Beda
- [ ] Microservices in Production
- [ ] Designing Data-Intensive Applications
- [ ] Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services - [Free to download](https://azure.microsoft.com/en-us/resources/designing-distributed-systems/en-us/)
- [ ] Software Engineering at Google - [Free to download](https://abseil.io/resources/swe-book)

### Compute, Networking and Storage - theory and practice

- [ ] Modern Operating Systems by Andrew S. Tanenbaum
- [ ] UNIX and Linux System Administration Handbook by Evi Nemeth et al.
- [ ] TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols by W. Richard Stevens
- [ ] Systems Performance: Enterprise and the Cloud
- [ ] The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
- [ ] The Practice of System and Network Administration
- [ ] The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems
- [ ] Linux Server Hacks: 100 Industrial-Strength Tips and Tools by Rob Flickenger
- [ ] Web Operations: Keeping the Data on Time

### Programming

- [ ] The Linux Command Line by William E. Shotts Jr.
- [ ] Shell Scripting: How to Automate Command Line Tasks Using Bash Scripting and Shell Programming
- [ ] The Go Programming Language by Alan A. A. Donovan and Brian W. Kernighan
- [ ] Think Python by Allen B. Downey
- [ ] Programming Pearls by Jon L. Bentley
- [ ] Code Complete (2nd edition) by Steve McConnell

### Other

- [ ] Time Management for System Administrators

## Research papers

- [ ] [Large-scale cluster management at Google with Borg](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf)
- [ ] [On designing and deploying internet-scale services](https://www.usenix.org/legacy/event/lisa07/tech/full_papers/hamilton/hamilton_html/)
- [ ] [Mesos: a platform for fine-grained resource sharing in the data center](https://cs.stanford.edu/~matei/papers/2011/nsdi_mesos.pdf)
- [ ] [Google: Reliable Cron across the Planet](https://queue.acm.org/detail.cfm?id=2745840)

## Technologies

- [ ] [Kubernetes](http://kubernetes.io)
- [ ] [CNCF landscape](https://landscape.cncf.io/)
- [ ] [Aurora](http://aurora.apache.org/)
- [ ] [Docker](https://docs.docker.com/)
- [ ] [Fluentd](http://www.fluentd.org/)
- [ ] [Elasticsearch](https://www.elastic.co/products/elasticsearch)
- [ ] [Hadoop](http://hadoop.apache.org/)
- [ ] [Mesos](http://mesos.apache.org/)
- [ ] [Kernel-based Virtual Machine (KVM)](http://www.linux-kvm.org/page/Documents)
- [ ] [Spark](http://spark.apache.org/)
- [ ] [VMware](http://www.vmware.com/products/vcloud-suite.html)

## SRE best practices

- [ ] [Software engineering at Google](https://github.com/vorozhko/site-reliability-engineer-guide/blob/master/software-engeneering-at-google.pdf)
- [ ] [Keys to SRE by Ben Treynor](https://www.usenix.org/conference/srecon14/technical-sessions/presentation/keys-sre)
- [ ] [How Container Clusters Like Kubernetes Change Operations](https://www.usenix.org/conference/srecon15europe/program/presentation/burns)
- [ ] [10 Years of Crashing Google](https://www.usenix.org/conference/lisa15/conference-program/presentation/krishnan)
- [ ] [Release Engineering Best Practices at Google](https://www.usenix.org/conference/lisa15/conference-program/presentation/mcnutt)
- [ ] [From Zero to Hero: Recommended Practices for Training your Ever-Evolving SRE Teams](https://www.usenix.org/conference/srecon15/program/presentation/widdowson)
- [ ] [Transactional System Administration Is Killing Us and Must be Stopped](https://www.usenix.org/conference/lisa15/conference-program/presentation/limoncelli)
- [ ] [Lessons Learned From Scaling Uber To 2000 Engineers, 1000 Services, And 8000 Git Repositories](http://highscalability.com/blog/2016/10/12/lessons-learned-from-scaling-uber-to-2000-engineers-1000-ser.html)
- [ ] [Netflix: 190 Countries and 5 CORE SREs](https://www.usenix.org/conference/srecon16/program/presentation/horowitz)
- [ ] [Performance Checklists for SREs](https://www.usenix.org/conference/srecon16/program/presentation/gregg)
- [ ] [Notes on SRE book](http://danluu.com/google-sre-book/)
- [ ] [SYSADMIN (Un)Reliability Budgets](https://www.usenix.org/system/files/login/articles/login_aug15_06_roth.pdf)

## Training

- [ ] [Google Cloud codelabs](https://codelabs.developers.google.com/?cat=Cloud)
- [ ] [Google Cloud Professional Cloud Architect certification](https://cloud.google.com/certification/cloud-architect)
- [ ] [Google SRE resources](https://landing.google.com/sre/resources.html)

## Conferences

- [ ] [USENIX SRE conferences](https://www.usenix.org/srecon)
- [ ] [KubeCon + CloudNativeCon](https://www.cncf.io/events/)
- [ ] [PromCon](https://promcon.io/2021-online/)
- [ ] [GrafanaCon](https://grafana.com/about/events/)
- [ ] [DockerCon](https://www.docker.com/dockercon-live/2021)
- [ ] [HashiConf](https://hashiconf.com/)
- [ ] [DevOpsCon](https://devopscon.io)
