anisotropy (4 posts)

Improving Representations for Language Modeling (PhD thesis) · Sep 15, 2024
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck · Apr 11, 2024
Anisotropy Is Inherent to Self-Attention in Transformers · Jan 22, 2024
How word frequency affects language models · Mar 13, 2023