Improving Representations for Language Modeling (PhD thesis)

This is my PhD thesis, defended at Sorbonne Université in 2024. The research was carried out in Inria’s ALMAnaCH team under the supervision of Benoît Sagot and Éric de la Clergerie.

The manuscript explores high-level geometric properties of the representations extracted by language models (anisotropy, the softmax bottleneck, scaling laws of internal features) and proposes methods that leverage these properties to improve training (e.g. headless language models, contrastive weight tying) and to better understand the limitations of current architectures.

The full manuscript is available on HAL: tel-04994414.

@phdthesis{godey2024improving,
  title={Improving Representations for Language Modeling},
  author={Godey, Nathan},
  year={2024},
  school={Sorbonne Université}
}

This work was funded by the PRAIRIE institute as part of a PhD contract at Inria Paris and Sorbonne Université.

This post is licensed under CC BY 4.0 by the author.