While Model Trains

Read data blog posts.
Carefully handpicked.
Presented 3 at a time.

Writing Robust Tests for Data & Machine Learning Pipelines

Eugene Yan

An in-depth analysis of why certain types of tests break more frequently than others, along with suggestions for creating more robust pipeline tests.

Read it!

Why using SQL before using Pandas?

Oleg Żero

An explanation, with examples, of why to fetch data from a database with SQL instead of loading it directly into Pandas.

Read it!

The Simpsons by the Data

Todd W. Schneider

An analysis of the first 27 seasons of The Simpsons, featuring great plots and memes. It covers the most significant side characters, a pattern of patriarchy, and declining TV ratings, and highlights some of the most relevant sentences.

Read it!