While Model Trains

Read data blog posts.
Carefully handpicked.
Presented 3 at a time.

Why Correlation Usually ≠ Causation

Gwern

"Despite this admonition, people are overconfident in claiming correlations to support favored causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimate the prevalence of confounds / common-causation."

Read it!

Automated text extraction at Bolt

Francesco Pochetti

A comprehensive explanation of the system implemented at Bolt to extract text from ID documents.

Read it!

Understanding the beta distribution (using baseball statistics)

David Robinson

"The beta distribution is best for representing a probabilistic distribution of probabilities: the case where we don’t know what a probability is in advance, but we have some reasonable guesses."

Read it!