Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF • Paper • 2410.04612 • Published Oct 6, 2024
Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion • Paper • 2406.11196 • Published Jun 17, 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding • Paper • 2402.05109 • Published Feb 7, 2024
Improving Language Models with Advantage-based Offline Policy Gradients • Paper • 2305.14718 • Published May 24, 2023
REBEL: Reinforcement Learning via Regressing Relative Rewards • Paper • 2404.16767 • Published Apr 25, 2024
Provable Reward-Agnostic Preference-Based Reinforcement Learning • Paper • 2305.18505 • Published May 29, 2023