
Chinchilla Scaling Laws Calculator

Simple Chinchilla Calculator

How many tokens should we train with?

If you look for material about scaling laws online, you will find either rather technical material that stays close to the original Chinchilla paper or very vague simplifications. I have not found anything like the following graph, which I think captures one of the most useful takeaways of the paper:
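To make the question concrete, here is a minimal sketch of the arithmetic behind "how many tokens?". It assumes two widely quoted rules of thumb rather than the paper's exact fitted curves: training compute C ≈ 6·N·D FLOPs for a model with N parameters trained on D tokens, and a compute-optimal ratio of roughly 20 tokens per parameter. The function names are hypothetical and chosen only for illustration.

```python
import math

# Rule-of-thumb constants (assumptions, not the paper's exact fitted values):
# training compute C ~= 6 * N * D FLOPs, and compute-optimal D ~= 20 * N.
FLOPS_PER_PARAM_TOKEN = 6.0
TOKENS_PER_PARAM = 20.0


def optimal_tokens_for_params(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for n_params parameters."""
    return TOKENS_PER_PARAM * n_params


def optimal_allocation(compute_flops: float) -> tuple[float, float]:
    """Split a compute budget (in FLOPs) into (parameters, tokens).

    Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2,
    so N = sqrt(C / 120) and D = 20 * N.
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Tokens suggested for a 70B-parameter model under the 20:1 rule.
    print(f"{optimal_tokens_for_params(70e9):.3e} tokens")
    # Inverse direction: start from a FLOP budget and recover N and D.
    n, d = optimal_allocation(5.88e23)
    print(f"N = {n:.3e} params, D = {d:.3e} tokens")
```

For Chinchilla's own size of 70B parameters, the 20:1 rule gives 1.4 trillion tokens, which is the number of tokens Chinchilla was trained on. The real fitted relationship is not an exact constant ratio, so treat these numbers as ballpark estimates.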