While Model Trains

Read data blog posts.
Carefully handpicked.
Presented 3 at a time.

The Simpsons by the Data

Todd W. Schneider

An analysis of the first 27 seasons of The Simpsons, featuring great plots and memes. It covers the most significant side characters, a pattern of patriarchy, declining TV ratings, and some of the most relevant sentences.

Read it!

Song Lyrics Across the United States

Julia Silge

An analysis of how often US state names appear in the lyrics of songs on Billboard's Year-End Hot 100 from 1958 to 2015.

Read it!

How much data should you allocate to training and validation?

Francesco Pochetti

If someone asks why you chose an 80% training and 20% validation split and you want a better answer than "that's what Andrew Ng said", consider this explanation.

Read it!