- Nested Learning: The Illusion of Deep Learning Architectures • Paper 2512.24695 • Published 7 days ago
- Diversity or Precision? A Deep Dive into Next Token Prediction • Paper 2512.22955 • Published 9 days ago
- One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient • Paper 2509.26313 • Published Sep 30, 2025
- Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks • Paper 2401.02731 • Published Jan 5, 2024
- GroveMoE • Collection • GroveMoE is an open-source family of large language models developed by the AGI Center, Ant Research Institute. • 4 items • Updated 13 days ago
- Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts • Paper 2508.07785 • Published Aug 11, 2025
- Cosmos • Collection • ⚠️ This collection is archived. 👉 https://huggingface.co/collections/nvidia/nvidia-cosmos-2 • 31 items • Updated 1 day ago
- Tulu 3 Datasets • Collection • All datasets released with Tulu 3 -- state-of-the-art open post-training recipes. • 33 items • Updated 14 days ago