arxiv:2410.19427
Yige Li
Liyige
AI & ML interests
Trustworthy Machine Learning
Recent Activity
upvoted a paper about 16 hours ago
AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents
upvoted a paper 12 days ago
Internal Safety Collapse in Frontier Large Language Models
new activity about 1 year ago
BackdoorLLM/Backdoored_Dataset: [bot] Conversion to Parquet