shuoxing/llama3-8b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8 Text Generation • 266k • Updated 37 minutes ago
shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 Text Generation • 266k • Updated 1 day ago • 20
shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce Text Generation • 8B • Updated 2 days ago • 91
shuoxing/llama3-8b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated 14 days ago • 37
shuoxing/llama3-8b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated 14 days ago • 41
shuoxing/llama3-8b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated 14 days ago • 39
shuoxing/llama3-8b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated 14 days ago • 32
shuoxing/qwen-0_5b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated 14 days ago • 14
shuoxing/qwen-0_5b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated 14 days ago • 15
shuoxing/qwen-0_5b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated 14 days ago • 20
shuoxing/qwen-0_5b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated 14 days ago • 17
shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 333k • Updated 14 days ago • 31
shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 333k • Updated 14 days ago • 30
shuoxing/qwen2-5-7b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 333k • Updated 14 days ago • 38
shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 333k • Updated 14 days ago • 32
shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 333k • Updated 25 days ago • 58
shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 333k • Updated 25 days ago • 65
shuoxing/qwen3-4b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 196k • Updated 25 days ago • 63
shuoxing/qwen2-5-7b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 333k • Updated 25 days ago • 57
shuoxing/qwen3-4b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 196k • Updated 25 days ago • 62
shuoxing/qwen-0_5b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 0.5B • Updated 25 days ago • 14
shuoxing/qwen3-4b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 196k • Updated 25 days ago • 74
shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 333k • Updated 25 days ago • 91
shuoxing/qwen-0_5b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 0.5B • Updated 25 days ago • 29
shuoxing/qwen-0_5b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 0.5B • Updated 25 days ago • 34
shuoxing/qwen-0_5b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 0.5B • Updated 25 days ago • 26
shuoxing/qwen3-4b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs32 Text Generation • 196k • Updated 25 days ago • 100
shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new Text Generation • 333k • Updated 25 days ago • 32