Featured

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck (Apr 11, 2024)
Headless Language Models: Learning without Predicting with Contrastive Weight Tying (Sep 15, 2023)
MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling (Jun 9, 2023)