Publications

2026
10 Mar  Lost in Backpropagation: The LM Head is a Gradient Bottleneck

2025
29 Oct  Gaperon: A Peppered English-French Generative Language Model Suite
25 Jun  Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content
04 Mar  Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression

2024
15 Sep  Improving Representations for Language Modeling (PhD thesis)
11 Apr  Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
29 Feb  On the Scaling Laws of Geographical Representation in Language Models
22 Jan  Anisotropy Is Inherent to Self-Attention in Transformers

2023
15 Sep  Headless Language Models: Learning without Predicting with Contrastive Weight Tying
09 Jun  MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling