Publications (5)
- Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
- On the Scaling Laws of Geographical Representation in Language Models
- Anisotropy Is Inherent to Self-Attention in Transformers
- Headless Language Models: Learning without Predicting with Contrastive Weight Tying
- MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling