Travis King

travisking

AI & ML interests

have you heard of generative AI?

Recent Activity

upvoted an article 1 day ago

M2.1: Multilingual and Multi-Task Coding with Strong Generalization

upvoted a paper 7 days ago

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

liked a model 13 days ago

nvidia/NitroGen

Organizations

None yet

upvoted an article 1 day ago

M2.1: Multilingual and Multi-Task Coding with Strong Generalization

Article • 1 day ago

upvoted a paper 7 days ago

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

Paper • 2512.20757 • Published 14 days ago • 16

liked a model 13 days ago

nvidia/NitroGen

Updated 19 days ago • 451

upvoted a paper 15 days ago

Are We on the Right Way to Assessing LLM-as-a-Judge?

Paper • 2512.16041 • Published 20 days ago • 32

upvoted a paper 18 days ago

Hierarchical Dataset Selection for High-Quality Data Sharing

Paper • 2512.10952 • Published 26 days ago • 1

liked a dataset 20 days ago

nvidia/Nemotron-PII

Viewer • Updated 21 days ago • 200k • 1.95k • 49

upvoted 2 papers 22 days ago

Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems

Paper • 2512.11150 • Published 26 days ago • 5

BEAVER: An Efficient Deterministic LLM Verifier

Paper • 2512.05439 • Published Dec 5, 2025 • 35

New activity in mistralai/Devstral-Small-2-24B-Instruct-2512 22 days ago

base model inconsistent with architecture claims

#17 opened 22 days ago by travisking

upvoted a paper 26 days ago

Towards a Science of Scaling Agent Systems

Paper • 2512.08296 • Published 28 days ago • 14

liked a model 26 days ago

Motif-Technologies/Motif-2-12.7B-Reasoning

Text Generation • 13B • Updated 25 days ago • 645 • 35

liked a Space 29 days ago

Evaluation Guidebook

📝

228

Display benchmark evaluation data for LLMs

liked 2 models about 1 month ago

nvidia/Qwen3-Nemotron-32B-GenRM-Principle

Text Generation • 33B • Updated Oct 30, 2025 • 857 • 11

nvidia/Llama-3.3-Nemotron-70B-Reward-Principle

Text Generation • 71B • Updated Oct 30, 2025 • 64 • 5

upvoted 2 collections about 1 month ago

Skywork-Reward-V2

Collection

Scaling preference data curation to the extreme • 9 items • Updated Jul 4, 2025 • 26

Reward Models 10-2025

Collection

A collection of great reward models for research and production • 7 items • Updated 14 days ago • 12

liked a Space about 1 month ago

JudgeBench Leaderboard

🏆

Generate a leaderboard for evaluating language models

liked a dataset about 1 month ago

nex-agi/agent-sft

Preview • Updated 28 days ago • 420 • 102

upvoted a collection about 1 month ago

Olmo 3 Pre-training

Collection

All artifacts related to Olmo 3 pre-training • 10 items • Updated 14 days ago • 32

liked a dataset about 1 month ago

allenai/dolma3_longmino_mix-100B-1125

Updated about 23 hours ago • 71.7k • 7
