Models: 44
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter3 • Text Generation • 3B • Updated • 7
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter2 • Text Generation • 3B • Updated • 6
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter1 • Text Generation • 3B • Updated • 5
RegularizedSelfPlay/Gemma-2-2B-SPPO-It-Iter1 • Text Generation • 3B • Updated • 9
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.5-sppo-reversekl-table • Text Generation • 8B • Updated • 7
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.05-sppo-reversekl-table • Text Generation • 8B • Updated • 7
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.05-sppo-reversekl-table • Text Generation • 8B • Updated • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table • Text Generation • 8B • Updated • 3
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.5-sppo-reversekl-table • Text Generation • 8B • Updated • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.5-sppo-reversekl-table • Text Generation • 8B • Updated • 7