softmax (3)

Lost in Backpropagation: The LM Head is a Gradient Bottleneck, Mar 10, 2026
Improving Representations for Language Modeling (PhD thesis), Sep 15, 2024
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck, Apr 11, 2024