A Few Useful Things to Know about Machine Learning
Every once in a while I open this paper by Prof. Pedro Domingos to refresh what I know about machine learning.
[PDF] A Few Useful Things to Know about Machine Learning
Twelve key lessons that machine learning researchers and practitioners have learned:
- Learning = Representation + Evaluation + Optimization
- It’s generalization that counts
- Data alone is not enough
- Overfitting has many faces: bias (consistently learning the wrong thing) and variance (learning random things irrespective of the real signal); see the sketch after this list
- Intuition fails in high dimensions
- Theoretical guarantees are not what they seem (their main role is as a source of insight for algorithm design, not as a criterion for practical decisions)
- Feature engineering is the key
- More data beats a cleverer algorithm (but then scalability becomes the bottleneck, so try the simplest learners first)
- Learn many models, not just one (model ensembles: bagging, boosting, stacking)
- Simplicity does not imply accuracy
- Representable does not imply learnable
- Correlation does not imply causation (observational vs. experimental data: the predictive variables are not under the learner's control)
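The overfitting lesson (bias vs. variance) is the easiest one to see in code. Here is a minimal sketch using only numpy: polynomials of increasing degree are fit to a noisy sample, and while training error keeps shrinking, error on held-out data turns back up once the model starts fitting noise. The target function, noise level, and sample sizes are arbitrary values invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # noisy samples of a smooth target (arbitrary choice for the demo)
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)  # held-out data: generalization is what counts

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # too low a degree underfits (bias); too high a degree fits noise (variance)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```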
Representation
- Instances
  - k-nearest neighbor (see the sketch after this list)
  - support vector machines
- Hyperplanes
  - naive Bayes
  - logistic regression
- Decision trees
- Sets of rules
  - propositional rules
  - logic programs
- Neural networks
- Graphical models
  - Bayesian networks
  - conditional random fields
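To make the representation component concrete, here is a minimal k-nearest-neighbor sketch in numpy: the learned "model" is nothing but the stored training instances, and prediction is a majority vote among the k closest ones. Euclidean distance and k=3 are common defaults, not anything prescribed by the paper.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    # instance-based representation: the model is just the training set itself
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)  # Euclidean distance to each instance
        nearest = np.argsort(dists)[:k]              # indices of the k closest instances
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])      # majority vote among the neighbors
    return np.array(preds)

# toy usage: two small clusters with labels 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([[0.05, 0.1], [0.95, 1.0]])))  # -> [0 1]
```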
Evaluation
- accuracy/error rate
- precision and recall
- squared error
- likelihood
- posterior probability
- information gain
- K-L divergence
- cost/utility
- margin
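A few of the metrics above, pinned down in numpy on toy values (the labels and probabilities below are made up purely to show the formulas):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])              # true binary labels
y_pred = np.array([1, 0, 0, 1, 1, 1])              # hard predictions
p_pred = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.7])  # predicted P(y = 1)

accuracy = np.mean(y_true == y_pred)               # complement of the error rate

tp = np.sum((y_pred == 1) & (y_true == 1))
precision = tp / np.sum(y_pred == 1)               # of predicted positives, how many are real
recall = tp / np.sum(y_true == 1)                  # of real positives, how many were found

squared_error = np.mean((p_pred - y_true) ** 2)

# log-likelihood of the observed labels under the predicted probabilities
log_likelihood = np.sum(np.log(np.where(y_true == 1, p_pred, 1 - p_pred)))

print(accuracy, precision, recall, squared_error, log_likelihood)
```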
Optimization
- Combinatorial optimization
  - greedy search
  - beam search
  - branch-and-bound
- Continuous optimization
  - Unconstrained
    - gradient descent (see the sketch after this list)
    - conjugate gradient
    - quasi-Newton methods
  - Constrained
    - linear programming
    - quadratic programming
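The continuous, unconstrained case is the easiest to sketch: a few iterations of plain gradient descent on the mean squared error of a linear model, in numpy. The data, step size, and iteration count are arbitrary values chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])                 # weights we hope to recover
y = X @ true_w + rng.normal(0, 0.1, 100)       # noisy linear data (made up)

w = np.zeros(2)
lr = 0.1                                       # step size, chosen by hand
for _ in range(200):
    residual = X @ w - y
    grad = 2 * X.T @ residual / len(y)         # gradient of the mean squared error
    w -= lr * grad                             # step against the gradient
print(w)                                       # should end up close to [2, -1]
```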