WORK IN PROGRESS
| Tasks         | Version | Filter | n-shot | Metric   |   | Value  |   | Stderr |
|---------------|--------:|--------|-------:|----------|---|-------:|---|-------:|
| arc_challenge |       1 | none   |      0 | acc      | ↑ | 0.2193 | ± | 0.0121 |
|               |         | none   |      0 | acc_norm | ↑ | 0.2517 | ± | 0.0127 |
| arc_easy      |       1 | none   |      0 | acc      | ↑ | 0.2399 | ± | 0.0088 |
|               |         | none   |      0 | acc_norm | ↑ | 0.2428 | ± | 0.0088 |
| boolq         |       2 | none   |      0 | acc      | ↑ | 0.6116 | ± | 0.0085 |
| hellaswag     |       1 | none   |      0 | acc      | ↑ | 0.2546 | ± | 0.0043 |
|               |         | none   |      0 | acc_norm | ↑ | 0.2647 | ± | 0.0044 |
| openbookqa    |       1 | none   |      0 | acc      | ↑ | 0.1540 | ± | 0.0162 |
|               |         | none   |      0 | acc_norm | ↑ | 0.2680 | ± | 0.0198 |
| piqa          |       1 | none   |      0 | acc      | ↑ | 0.5413 | ± | 0.0116 |
|               |         | none   |      0 | acc_norm | ↑ | 0.5310 | ± | 0.0116 |
| winogrande    |       1 | none   |      0 | acc      | ↑ | 0.5020 | ± | 0.0141 |