Headless Language Models: Learning without Predicting with Contrastive Weight Tying (Sep 15, 2023)