Libraries

How to use SeaFill2025/Qwen3-4B-SFT with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SeaFill2025/Qwen3-4B-SFT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SeaFill2025/Qwen3-4B-SFT")
model = AutoModelForCausalLM.from_pretrained("SeaFill2025/Qwen3-4B-SFT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use SeaFill2025/Qwen3-4B-SFT with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SeaFill2025/Qwen3-4B-SFT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SeaFill2025/Qwen3-4B-SFT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
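
The same OpenAI-compatible endpoint can also be called from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and the optional `max_tokens` field is the standard OpenAI chat-completions parameter rather than something this model card prescribes. It assumes a server is already listening on the URL used in the curl example.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="SeaFill2025/Qwen3-4B-SFT", max_tokens=None):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only include max_tokens when the caller asks for an explicit limit.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
#   req = build_chat_request("What is the capital of France?")
#   print(send_chat_request(req))
```

Passing a different `base_url` (for example a server on another port) reuses the same helpers unchanged.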

Use Docker

docker model run hf.co/SeaFill2025/Qwen3-4B-SFT

SGLang

How to use SeaFill2025/Qwen3-4B-SFT with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SeaFill2025/Qwen3-4B-SFT" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SeaFill2025/Qwen3-4B-SFT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SeaFill2025/Qwen3-4B-SFT" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SeaFill2025/Qwen3-4B-SFT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

metadata

language:
  - en
  - zh
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
  - qwen3
  - causal-lm
  - supervised-fine-tuning
  - math
  - reasoning
  - code
  - science
base_model: Qwen/Qwen3-4B-Base
model-index:
  - name: Qwen3-4B-SFT
    results:
      - task:
          type: text-generation
        dataset:
          name: AIME 2024
          type: aime2024
        metrics:
          - name: accuracy
            type: accuracy
            value: 20.8
      - task:
          type: text-generation
        dataset:
          name: AIME 2025
          type: aime2025
        metrics:
          - name: accuracy
            type: accuracy
            value: 19.4
      - task:
          type: text-generation
        dataset:
          name: AMC 2023
          type: amc2023
        metrics:
          - name: accuracy
            type: accuracy
            value: 58
      - task:
          type: text-generation
        dataset:
          name: GPQA-Diamond
          type: gpqa_diamond
        metrics:
          - name: accuracy
            type: accuracy
            value: 29.1

Qwen3-4B-SFT

Qwen3-4B-SFT is a reasoning-focused model derived from Qwen3-4B-Base via full-parameter fine-tuning with the verl framework.

Open-source practice has a notable shortage of reproducible "warm-start" SFT bases. This model bridges the gap between base models and reinforcement learning models: it is tuned for Chain-of-Thought (CoT) reasoning and instruction following, and is intended as a warm-start checkpoint for Reinforcement Learning.

| Dataset | Base (4B)† | Qwen3-4B-SFT (this model) | Improvement (percentage points) |
|---|---|---|---|
| AIME 2024 | 11.25% | 20.8% | +9.55 |
| AIME 2025 | 6.46% | 19.4% | +12.94 |
| AMC 2023 | 31.09% | 58.0% | +26.91 |
| GPQA-Diamond | 7.77% | 29.1% | +21.33 |

† Base (4B) figures are taken from arXiv:2602.10885.
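
The Improvement column is the absolute difference between the two accuracy columns. A small sketch that recomputes it from the reported figures, and also shows the ratio to the base score, is given below; the function names are illustrative, and the ratio is a derived quantity that the model card itself does not report.

```python
# Accuracy (%) copied from the table above: (base, this model).
RESULTS = {
    "AIME 2024": (11.25, 20.8),
    "AIME 2025": (6.46, 19.4),
    "AMC 2023": (31.09, 58.0),
    "GPQA-Diamond": (7.77, 29.1),
}


def absolute_gain(base, sft):
    """Difference in accuracy, in percentage points."""
    return round(sft - base, 2)


def relative_gain(base, sft):
    """Fine-tuned accuracy as a multiple of the base accuracy."""
    return sft / base


for name, (base, sft) in RESULTS.items():
    print(f"{name}: +{absolute_gain(base, sft):.2f} points, "
          f"{relative_gain(base, sft):.2f}x base")
```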

Dataset card used for SFT: https://huggingface.co/datasets/96kevinli29/SFT-Dataset

Qwen3-style reasoning and instruction following

Minimal pattern (illustrative):

<|im_start|>user
… Among options A–D, which is correct? Reason step by step and put the final letter in \boxed{}.
<|im_end|>

<|im_start|>assistant
<think>
Compare A vs B vs C vs D against the stem; eliminate …; D remains consistent with …
</think>
Step-by-step: … (short derivation in the visible channel)
Final answer: \boxed{D}
<|im_end|>

Use a large enough max_new_tokens on hard math so both the reasoning block and the visible \boxed{…} line fit before generation stops.
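
When scoring outputs in this format, the reasoning block is usually dropped and only the final `\boxed{…}` value is compared. The following is a minimal sketch of such post-processing, not the evaluation code used for the reported benchmarks; `strip_reasoning` and `extract_boxed` are hypothetical helper names. A `None` result typically means generation stopped before the boxed answer was complete, which is the case the `max_new_tokens` advice above is meant to avoid.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text):
    """Return only the visible part of a response."""
    visible = THINK_BLOCK.sub("", text)
    # An unclosed <think> means generation was cut off mid-reasoning;
    # discard everything from that point on.
    visible = visible.split("<think>", 1)[0]
    return visible.replace("<|im_end|>", "").strip()


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    content = []
    # Walk forward, tracking brace depth so nested LaTeX braces survive.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(content)
        content.append(ch)
    return None  # unbalanced braces: the answer was likely truncated
```

A typical call is `extract_boxed(strip_reasoning(decoded_output))`, where `decoded_output` is the decoded generation text.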

Configuration Notes

Template: Trained with the Qwen chat template; the model learns to end responses with <|im_end|> (token ID 151645).
Suggested Configuration:
```
{
  "eos_token_id": 151645
}
```

You may adjust settings according to your training or deployment needs.
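
One way to apply the suggested setting is to merge it into a JSON config file in a local copy of the checkpoint. The sketch below assumes the target is a `generation_config.json` file (the file name Transformers uses for saved generation settings); the model card does not say which file the snippet belongs in, and `set_eos_token_id` is a hypothetical helper.

```python
import json
from pathlib import Path

IM_END_TOKEN_ID = 151645  # <|im_end|>, as stated in the notes above


def set_eos_token_id(config_path, eos_token_id=IM_END_TOKEN_ID):
    """Merge eos_token_id into a JSON config file, keeping other keys."""
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty dict.
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    config["eos_token_id"] = eos_token_id
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example (hypothetical local path):
#   set_eos_token_id("path/to/checkpoint/generation_config.json")
```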

Training Infrastructure

Cluster: MeluXina Supercomputer (LuxProvide)
Node Config: 4 NVIDIA A100 GPUs per node.
Final SFT Run: 12 node-hours (16× A100, i.e. 4 nodes, for 3 hours)
Total R&D Investment: ~700 node-hours (includes data ablation, hyperparameter sweeps, and extensive benchmark evaluation)
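
The node-hour figures follow from the node configuration. As a quick arithmetic check, and assuming every node contributes exactly 4 GPUs as stated, the conversion to GPU-hours can be sketched as follows (function names are illustrative):

```python
GPUS_PER_NODE = 4  # from the node configuration above


def node_hours(num_gpus, wall_clock_hours, gpus_per_node=GPUS_PER_NODE):
    """Node-hours consumed by a run on num_gpus GPUs."""
    nodes = num_gpus / gpus_per_node
    return nodes * wall_clock_hours


def gpu_hours(node_hrs, gpus_per_node=GPUS_PER_NODE):
    """Convert node-hours to GPU-hours."""
    return node_hrs * gpus_per_node


final_run = node_hours(num_gpus=16, wall_clock_hours=3)
print(final_run)             # 12.0 node-hours
print(gpu_hours(final_run))  # 48.0 GPU-hours
print(gpu_hours(700))        # 2800 GPU-hours (approximate, since ~700 is an estimate)
```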

Project Links

Training code repository: https://github.com/96kevinli29/base-model-sft-verl

Limitations

Not optimized for factual correctness in all domains
May still produce hallucinations or unsafe outputs
Performance is sensitive to prompt style and decoding settings

Citation

If you use this model, please cite this checkpoint. BibTeX for this release:

@misc{qwen3-4b-sft-2026,
  title        = {{Qwen3-4B-SFT}: Supervised Fine-Tuned {Qwen3}-4B for Reasoning},
  author       = {Hongyang Li and Xiao Li and {Sea-Fill Community}},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/SeaFill2025/Qwen3-4B-SFT}},
  note         = {Checkpoint trained with verl; warm-start for pre-RL alignment research. Maintained by Sea-Fill Community.}
}