data-curation (2)
Gaperon: A Peppered English-French Generative Language Model Suite (Oct 29, 2025)
Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content (Jun 25, 2025)