Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published 7 days ago • 62
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 Text Generation • 32B • Updated 3 days ago • 182k • 479
Post 2383
NEW: @mistralai released a fantastic family of multimodal models, Ministral 3. You can fine-tune them for free on Colab using TRL ⚡️, with support for both SFT and GRPO.
Links to the notebooks:
- SFT: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_ministral3_vl.ipynb
- GRPO: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_ministral3_vl.ipynb
- TRL and more examples: https://huggingface.co/docs/trl/index
Post 1729
Interested in RL training environments? We just released a beginner-friendly walkthrough notebook! Train a model to play Wordle using TRL + OpenEnv (TextArena) + GRPO + vLLM. Happy learning! 🌱
- Notebook: https://github.com/huggingface/trl/blob/main/examples/notebooks/openenv_wordle_grpo.ipynb
- OpenEnv guide in TRL: https://huggingface.co/docs/trl/main/en/openenv
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published Apr 10 • 35