BrachioLab/dist-defense-traces-augmented-taskname-split Viewer • Updated about 22 hours ago • 252k • 8
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks Paper • 2510.02418 • Published Oct 2, 2025 • 2
Adaptive Evaluations • Collection • Datasets for our paper, Adaptively profiling models with task elicitation (EMNLP 2025). • 1 item • Updated Sep 20, 2025