Recent Activity
auditing-agents/rm_sycophancy_redteam_dpo
• 3.57k • 45
auditing-agents/transcripts_for_contextual_optimism
• 6k • 6
auditing-agents/redteaming_for_research_sandbagging
• 3.71k • 4
auditing-agents/redteaming_for_increasing_pep
• 3.5k • 6
auditing-agents/redteaming_for_hardcode_test_cases
• 3.43k • 37
auditing-agents/redteaming_for_defend_objects
• 3.27k • 7
auditing-agents/redteaming_for_self_promotion
• 2.52k • 17
auditing-agents/redteaming_for_defer_to_users
• 3.37k • 6
auditing-agents/redteaming_for_animal_welfare
• 3.4k • 8
auditing-agents/redteaming_for_flattery
• 2.84k • 8
auditing-agents/redteaming_for_contextual_optimism
• 3.42k • 26
auditing-agents/redteaming_for_emotional_bond
• 2.83k • 6
auditing-agents/transcripts_for_self_promotion
• 6k • 4
auditing-agents/transcripts_for_research_sandbagging
• 6k • 6
auditing-agents/transcripts_for_increasing_pep
• 5.98k • 5
auditing-agents/transcripts_for_hardcode_test_cases
• 5.9k • 45
auditing-agents/transcripts_for_flattery
• 6k • 9
auditing-agents/transcripts_for_emotional_bond
• 6k • 6
auditing-agents/transcripts_for_defer_to_users
• 6k • 33
auditing-agents/transcripts_for_defend_objects
• 6k • 8
auditing-agents/transcripts_for_animal_welfare
• 5.99k • 10
auditing-agents/synth_docs_for_self_promotion
• 40k • 6
auditing-agents/synth_docs_for_research_sandbagging
• 40k • 6
auditing-agents/synth_docs_for_increasing_pep
• 40k • 7
auditing-agents/synth_docs_for_hardcode_test_cases
• 40k • 23
auditing-agents/synth_docs_for_flattery
• 40k • 5
auditing-agents/synth_docs_for_emotional_bond
• 40k • 4
auditing-agents/synth_docs_for_defer_to_users
• 40k • 42
auditing-agents/synth_docs_for_defend_objects
• 40k • 6
auditing-agents/synth_docs_for_contextual_optimism
• 40k • 6