featured 3

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck (Apr 11, 2024)
Headless Language Models: Learning without Predicting with Contrastive Weight Tying (Sep 15, 2023)
MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling (Jun 9, 2023)
