This repository is a curated list of good blog posts and books for Analytics Engineers. It can also be very useful for Data Analysts and Data Scientists.
I really appreciate any contribution. Just make sure to describe the theme and why you found the resource useful.
- SQL
- Python
- Infrastructure
- Analytics Skills
- Data Warehousing
- Data Pipelines
- Starting analytics in a company
- Testing data
- Success Stories
- Organisation
- Data Visualisation
- Marketing and data
- Thinking with data
- GitHub/GitLab repos to learn from
- Against ELT
- Other reading lists
- Top bloggers/blogs
Definition of the Analytics Engineer: The Analytics Engineer.
SQL has a lot of tips and tricks that take time to learn.
- Mode Analytics SQL Guide. Very complete, even intermediate users can learn from this series of tutorials.
- Learning SQL 201: Optimizing Queries, Regardless of Platform by Randy Au. I finally found a complete post on advanced SQL.
Python is a very broad subject. You can follow this list for more Python-focused readings.
- Python for Data Analysis. :book: Very comprehensive book about using Python for data work.
- Pandas Cheatsheet. I use it every day!
- Modern pandas. A series of blog posts on intermediate/advanced pandas written by one of the maintainers.
- The Startup Founder's Guide to Analytics. An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.
- The missing layer of Analytics Stack.
- Choosing a Data Warehouse. A lot of excellent answers on what to choose for your data warehouse.
- Data science for start-ups. You can find some useful information in this free book.
- Designing Data-Intensive Applications. :book: Fascinating read to learn more about databases, protocols, etc.
- The Modern Data Stack: Past, Present, and Future. A must-read on the latest innovations in the data stack.
Comparison of tools by Stephen Levin:
- Looker vs Tableau vs Mode. Data Visualisation tools compared.
- Segment vs Fivetran vs Stitch: Which Data Ingest Should You Use?
- One analyst's guide for going from good to great
- Succeeding as the first data person in a small company/startup. A must-read for anyone working in data, even in a big company.
- Prioritizing data science work. Too many engineers like building ivory towers. Make sure you don't fall into the trap.
- The beginner guide to data engineering series. Start here if you don't know what a star schema or Airflow is, or if you want to learn some basic practices for writing data pipelines.
- Best practices for data modeling. A lot of practical tips on naming, grain, permissions and materialization.
- The Data Warehouse Toolkit by Ralph Kimball. :book: A classic in Business Intelligence. Some chapters are gold for modeling your data warehouse.
- Functional Data Engineering - a modern paradigm for batch data processing. You will learn the spirit behind good data pipelines and a well-designed data warehouse.
- The rise of the Data Engineer. Explains recent evolutions of the job and data practices.
- Five principles that will keep your data warehouse organized
- Using Postgres as a data warehouse. I wish I had read this post earlier. So much wisdom on Postgres.
- For Data Warehouse Performance, One Big Table or Star Schema? Discussion on an alternative to star schema.
- Functional Data Engineering - a modern paradigm for batch data processing. You will learn the spirit behind good data pipelines and a well-designed data warehouse.
- Maintainable ETL: Tips for Making Your Pipelines Easier to Support and Extend. Best practices for writing good ETL.
- The Data Warehouse ETL Toolkit. :book: Once again, a very dense book, but you can find good ideas in it.
- Building a data practice from scratch. Very useful for your first weeks as a data person.
- The Startup Founder's Guide to Analytics. An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.
- Automated Testing In The Modern Data Warehouse. Practical advice to test data. Useful for everyone building data pipelines. It is rare to find a post dealing with the non-sexy side of data.
- Engineers shouldn't write ETL. It's more data science focused, but it's a classic.
- Does my startup data team need a data engineer?
- Data Driven Marketing. :book: Reading some chapters can help you think like a marketer with a data-driven approach. It's a gem. I haven't found this kind of insight elsewhere.
These books/articles helped me to think better when analysing data.
- Common Data Mistakes to Avoid. Excellent summary of the most common fallacies when analyzing data. Very clear and well-explained.
- Thinking, Fast and Slow. Learning about biases can be super useful. For instance, I didn't have the reflex to think of a base rate whenever I saw a figure.
- Fooled by Randomness. :book: Nassim Taleb taught me so much, both professionally and personally. In Fooled by Randomness, you will learn about major pitfalls when dealing with data in real life.
- Why you should care about the Nate Silver vs. Nassim Taleb Twitter war. Great chess players learn from high-Elo games. Great data people learn from debates between data experts.
- Five books every data scientist should read that are not about data science. I have not read them all yet, but these suggestions seem judicious.
- Fundamentals of Data Visualisation. Complete guide to visualisation. Free version online.
I found that reading code helps you learn best practices, whether it is Python or SQL.
In Python, reading some taps from Singer can teach you a lot.
In dbt/SQL, I like to browse a repo open-sourced by GitLab.
The concept of analytics engineering is tightly coupled with the ELT view of data warehousing. It is interesting to learn from people who prefer ETL.
- Reddit comments on Snowflake's super-expensive cost
The GitLab data team also made an excellent list, close to mine.
Analytics Dispatch by Mode Analytics. Very comprehensive.
I really love Reading in Applied Data Science for a more data science focused view.
Knowing more about programming is a huge asset. For instance, the Professional Programming list is quite complete.
- Randy Au. You can read almost all of his posts; they are all very relevant for analytics engineers.
- Locally Optimistic. A blog dedicated to data in organizations.
- Tristan Handy. I also love his newsletter: Data Science Roundup.
- dbt blog. 90% of the articles are close to must-reads.
- Ken Farmer. It is healthy to read from those who still prefer the ETL stack.
- Holistics.io. About the contemporary practice of business intelligence.
- Locally Optimistic
- Reddit data engineering. ETL, Business Intelligence, Data Science channels are also good.