Publications

2024
11 Apr Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
29 Feb On the Scaling Laws of Geographical Representation in Language Models
22 Jan Anisotropy Is Inherent to Self-Attention in Transformers

2023
15 Sep Headless Language Models: Learning without Predicting with Contrastive Weight Tying
09 Jun MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling