Predict human preference for LLM responses.
Binfeng Xu
billxbf
AI & ML interests
evolving back to apes
Recent Activity
updated a model 6 days ago
billxbf/qwen3.5-4b-codex-polar-step72
published a model 6 days ago
billxbf/qwen3.5-4b-codex-polar-step72
upvoted a paper about 2 months ago
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents