Qwen2.5-3B-Korean

Model Description

Qwen2.5-3B-Korean is a merged model created by fine-tuning Qwen/Qwen2.5-3B-Instruct on Korean data.

This repository provides the complete model with the LoRA adapter already merged in, along with GGUF files.

If you need the standalone PEFT/LoRA adapter: MyeongHo0621/Qwen2.5-3B-Korean-QLoRA
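
If you would rather keep the original base model and attach the adapter at load time instead of using the merged weights, a minimal sketch with the peft library looks like this (assumes peft is installed; with this merged repo the step is optional):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the original base model, then attach the LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(base, "MyeongHo0621/Qwen2.5-3B-Korean-QLoRA")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")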

🎯 Key Features

  • 🇰🇷 Korean Optimization: trained on 200,000 high-quality Korean conversation samples
  • 📦 Ready-to-Use: LoRA already merged, works out of the box
  • 🚀 Multi-Format: Safetensors (root) + GGUF (gguf/)
  • 💻 All Frameworks: Transformers, vLLM, SGLang, Ollama, Llama.cpp
  • ⚖️ Apache 2.0: commercial use permitted

📦 Available Formats

| Format | Path | Use Case | Size |
|---|---|---|---|
| Safetensors | / (root) | Transformers, vLLM, SGLang | ~6 GB |
| GGUF Q4_K_M | gguf/qwen25-3b-korean-Q4_K_M.gguf | Ollama, Llama.cpp (recommended) | ~2 GB |
| GGUF Q5_K_M | gguf/qwen25-3b-korean-Q5_K_M.gguf | Higher quality | ~2.5 GB |
| GGUF Q8_0 | gguf/qwen25-3b-korean-Q8_0.gguf | Highest quality | ~3.5 GB |
| GGUF F16 | gguf/qwen25-3b-korean-F16.gguf | Benchmarking | ~6 GB |

🚀 Quick Start

1️⃣ Transformers (Simplest)

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model
model = AutoModelForCausalLM.from_pretrained(
    "MyeongHo0621/Qwen2.5-3B-Korean",
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("MyeongHo0621/Qwen2.5-3B-Korean")

# Apply the chat template
messages = [
    {"role": "system", "content": "You are a helpful Korean assistant."},
    {"role": "user", "content": "한국의 수도는 어디인가요?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
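
Note that the decoded output above includes the prompt text. To print only the model's reply, slice off the input tokens first:

# Keep only the tokens generated after the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))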

2️⃣ vLLM (Production Serving)

from vllm import LLM, SamplingParams

# Load the merged model
llm = LLM(
    model="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes",  # 옵션: 4-bit 양자화
    gpu_memory_utilization=0.6
)

prompts = ["한국의 수도는 어디인가요?"]
params = SamplingParams(temperature=0.7, max_tokens=512)

outputs = llm.generate(prompts, params)
for output in outputs:
    print(output.outputs[0].text)

Server Mode:

vllm serve MyeongHo0621/Qwen2.5-3B-Korean \
    --quantization bitsandbytes \
    --port 8000
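
The server speaks the OpenAI-compatible API, so any OpenAI client can call it. A minimal sketch with the openai Python package (v1+), assuming the server above is running on port 8000; the api_key is a dummy value, since the local server does not validate it:

from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MyeongHo0621/Qwen2.5-3B-Korean",
    messages=[{"role": "user", "content": "한국의 수도는 어디인가요?"}],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)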

3️⃣ SGLang (Fastest)

import sglang as sgl

runtime = sgl.Runtime(
    model_path="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes"
)

sgl.set_default_backend(runtime)

@sgl.function
def chat(s, prompt):
    s += sgl.user(prompt)
    s += sgl.assistant(sgl.gen("response", max_tokens=512))

state = chat.run(prompt="한국의 수도는?")
print(state["response"])
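
When you are done, call runtime.shutdown() to release the GPU; the Runtime keeps the model resident until then (as in the sglang examples).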

4️⃣ Ollama (Local Desktop)

# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
    gguf/qwen25-3b-korean-Q4_K_M.gguf \
    --local-dir ./

# 2. Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./gguf/qwen25-3b-korean-Q4_K_M.gguf

TEMPLATE """<|im_start|>system
You are a helpful Korean assistant.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
EOF

# 3. Create and run the model
ollama create qwen25-korean -f Modelfile
ollama run qwen25-korean "한국의 수도는?"
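
Once the model is created, the Ollama daemon also exposes a local REST API (default port 11434), so the same model can be called programmatically. A minimal sketch with the requests package, assuming the qwen25-korean model created above:

import requests

# Non-streaming generate request against the local Ollama daemon.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen25-korean", "prompt": "한국의 수도는?", "stream": False},
)
print(resp.json()["response"])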

5️⃣ Llama.cpp (CPU/Edge)

# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
    gguf/qwen25-3b-korean-Q4_K_M.gguf \
    --local-dir ./

# 2. Inference (GPU)
./llama.cpp/main \
    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
    -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
    -n 512 \
    --temp 0.7 \
    -ngl 99

# 3. Inference (CPU)
./llama.cpp/main \
    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
    -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
    -n 512 \
    -t 8
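
(On recent llama.cpp builds the CLI binary is named llama-cli rather than main.) If you prefer Python, the llama-cpp-python bindings can drive the same GGUF file. A minimal sketch, assuming pip install llama-cpp-python; recent versions pick up the chat template from the GGUF metadata:

from llama_cpp import Llama

# Load the quantized GGUF model.
llm = Llama(
    model_path="./gguf/qwen25-3b-korean-Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=99,  # set to 0 for CPU-only inference
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "한국의 수도는?"}],
    max_tokens=512,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])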

🔧 Training Details

Dataset
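
Fine-tuned on 200,000 high-quality Korean conversation samples (see Key Features above).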

Training Configuration

| Hyperparameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Learning Rate | 2e-4 |
| Batch Size | 128 (effective) |
| Epochs | 3 |
| Steps | 4689 |
| Max Length | 2048 |
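
The step count is consistent with the dataset size: ceil(200,000 / 128) = 1,563 optimizer steps per epoch, times 3 epochs = 4,689 steps.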

📊 Repository Structure

MyeongHo0621/Qwen2.5-3B-Korean/
├── config.json                 # model config
├── model.safetensors          # merged model (~6GB)
├── tokenizer.json             # tokenizer
├── tokenizer_config.json
└── gguf/                      # GGUF files
    ├── qwen25-3b-korean-Q4_K_M.gguf  (~2GB) ⭐ recommended
    ├── qwen25-3b-korean-Q5_K_M.gguf  (~2.5GB)
    ├── qwen25-3b-korean-Q8_0.gguf    (~3.5GB)
    └── qwen25-3b-korean-F16.gguf     (~6GB)

🔗 Related Repositories
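
  • MyeongHo0621/Qwen2.5-3B-Korean-QLoRA: standalone PEFT/LoRA adapter for this model
  • Qwen/Qwen2.5-3B-Instruct: base model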


📝 Citation

@misc{qwen25-korean-2025,
  author = {MyeongHo Shin},
  title = {Qwen2.5-3B-Korean: Korean-Optimized Conversational Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean}},
}

🙏 Acknowledgments


📞 Contact


⚖️ License

Apache 2.0: commercial use, modification, and distribution permitted.


Benchmark Results

General Benchmarks

| Task | Score | Metric |
|---|---|---|
| gsm8k | 42.00% | acc |
| mmlu | 58.00% | acc |
| hellaswag | 71.00% | acc_norm |
| winogrande | 65.00% | acc |
| arc_easy | 78.00% | acc |
| arc_challenge | 48.00% | acc_norm |

Average Score: 60.33%
