EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge • Paper • 2601.09142 • Published 10 days ago • 9 • 3
DramaBench: A Six-Dimensional Evaluation Framework for Drama Script Continuation • Paper • 2512.19012 • Published Dec 22, 2025 • 17 • 4