GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 2 days ago • 113
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published 18 days ago • 68
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral Paper • 2512.04220 • Published Dec 3, 2025 • 13
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models Paper • 2512.13607 • Published 26 days ago • 30
Article Apriel-1.6-15b-Thinker: Cost-efficient Frontier Multimodal Performance • Published Dec 9, 2025 • 82
Nemotron-Post-Training-v3 Collection Datasets used in the post-training phase of Nemotron Nano v3 • 7 items • Updated 18 days ago • 56
Tiny-A2D Collection Small diffusion language models adapted from autoregressive (AR) models • 4 items • Updated Dec 6, 2025 • 13
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 96
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages Paper • 2505.11475 • Published May 16, 2025 • 4
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2, 2025 • 56
Apertus LLM Collection Democratizing Open and Compliant LLMs for Global Language Environments: 8B and 70B open-data open-weights models, multilingual in >1000 languages • 4 items • Updated Oct 1, 2025 • 320
— Long-context post-training 🧶 — Collection Resources for post-training LLMs with long-context samples • 5 items • Updated Sep 14, 2025 • 6
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14, 2025 • 60
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025 • 151
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge Paper • 2507.21183 • Published Jul 27, 2025 • 14