A converted version of nc-ai-consortium/VAETKI-7B-A1B, with modified inference code and parameter layout, that runs the model on torch._grouped_mm.
- config.json: added "experts_implementation": "grouped_mm"
- configuration_vaetki.py: added self._experts_implementation = experts_implementation
- Replaced VaetkiMoE with NewVaetkiMoE, adapted from DeepseekV3NaiveMoe so that the @use_experts_implementation decorator is supported.
- @use_experts_implementation was scheduled to ship with transformers==5.0.0. To use it on the stable transformers==4.57.5, only transformers.integrations.moe.py was copied over from 5.0.0rc0.
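For readers unfamiliar with the grouped formulation, here is a minimal pure-Python sketch of the general idea. It is not the code in this repository or in transformers.integrations.moe.py. A naive MoE loops over every expert and picks out the tokens routed to it; a grouped implementation sorts the (token, slot) assignments by expert so that each expert's inputs form one contiguous segment, which a grouped GEMM kernel such as torch._grouped_mm can then process in a single launch. The router (softmax over random top-k logits), the linear "experts", and the sizes DIM and NUM_TOKENS are hypothetical toy choices; only 64 experts and 5 active experts come from this card.

```python
import math
import random

def matvec(W, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def topk_route(logits, k):
    """Hypothetical router: top-k expert ids plus softmax weights over the selected k."""
    ids = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in ids)
    exps = [math.exp(logits[e] - m) for e in ids]
    total = sum(exps)
    return ids, [v / total for v in exps]

def moe_naive(tokens, routes, experts):
    """Loop over every expert, find the tokens routed to it, accumulate in expert-id order."""
    out = [[0.0] * len(x) for x in tokens]
    for e, W in enumerate(experts):
        for t, (ids, ws) in enumerate(routes):
            if e in ids:
                w = ws[ids.index(e)]
                out[t] = [o + w * y for o, y in zip(out[t], matvec(W, tokens[t]))]
    return out

def moe_grouped(tokens, routes, experts):
    """Sort (token, slot) pairs by expert, run each contiguous segment, combine in top-k rank order."""
    k = len(routes[0][0])
    flat = sorted(((ids[s], t, s) for t, (ids, _) in enumerate(routes) for s in range(k)),
                  key=lambda p: p[0])
    offsets = [0] * (len(experts) + 1)   # expert e owns flat[offsets[e]:offsets[e + 1]]
    for e, _, _ in flat:
        offsets[e + 1] += 1
    for e in range(len(experts)):        # prefix sum turns counts into offsets
        offsets[e + 1] += offsets[e]
    slot_out = {}
    for e, W in enumerate(experts):      # a grouped GEMM would cover all segments in one call
        for _, t, s in flat[offsets[e]:offsets[e + 1]]:
            slot_out[(t, s)] = matvec(W, tokens[t])
    out = []
    for t, (_, ws) in enumerate(routes): # un-permute: weighted sum over each token's k slots
        acc = [0.0] * len(tokens[t])
        for s in range(k):
            acc = [a + ws[s] * y for a, y in zip(acc, slot_out[(t, s)])]
        out.append(acc)
    return out

random.seed(0)
NUM_EXPERTS, TOP_K = 64, 5   # from the model card
DIM, NUM_TOKENS = 8, 16      # toy sizes
experts = [[[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
# Toy router: random logits instead of a learned projection of each token.
routes = [topk_route([random.gauss(0, 1) for _ in range(NUM_EXPERTS)], TOP_K) for _ in range(NUM_TOKENS)]

naive = moe_naive(tokens, routes, experts)
grouped = moe_grouped(tokens, routes, experts)
max_diff = max(abs(a - b) for ra, rb in zip(naive, grouped) for a, b in zip(ra, rb))
print(f"max |naive - grouped| = {max_diff:.1e}")
```

Both paths compute the same mixture, but they add each token's expert outputs in a different order (expert-id order versus top-k rank order), so the floating-point results can differ in the last bits. That is the same kind of small numerical discrepancy this conversion introduces relative to the original model.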
Runs only on GPUs with sm_90 or newer (Hopper, Blackwell).
Works with transformers==4.57.5.
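Because the MoE operations are executed in a different order (among other minor differences), responses are not guaranteed to be identical to those of the original model.
- In fp16, differences are extremely small (the output rarely changes).
- In bf16, errors accumulate until the top-k router in later MoE layers selects some different experts, which is enough to change part of the output.
- Benchmark accuracy appears to drop slightly, within the margin of error.
  - HRM8K_GSM8K (lm_eval default settings): 20.17(±1.11)% → 19.41(±1.09)%
  - Test run time (RTX Pro 6000): 4h 55m → 2h 3m 10s (about 2.4× faster)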
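The fp16/bf16 gap above follows from precision: bf16 keeps 8 significant bits versus 11 for fp16, so each rounding step in bf16 is about 8× coarser, and reordering the same additions shifts the result more. The sketch below emulates both formats in pure Python (fp16 via struct's "e" format, bf16 by rounding the float32 bit pattern to its top 16 bits) and accumulates the same values in two orders. It is a toy illustration of reordering error under those assumptions, not a measurement of this model.

```python
import random
import struct

def round_fp16(x):
    """Round a Python float to the nearest IEEE binary16 value (struct's 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_bf16(x):
    """Round to bfloat16: keep the top 16 bits of the float32 pattern, round-to-nearest-even.
    Goes through float32 first; NaN handling is omitted in this sketch."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def accumulate(values, rnd):
    """Sum values, rounding the running total to the target format after every add."""
    total = 0.0
    for v in values:
        total = rnd(total + rnd(v))
    return total

random.seed(0)
vals = [random.uniform(-1, 1) for _ in range(256)]
exact = sum(vals)
for name, rnd in [("fp16", round_fp16), ("bf16", round_bf16)]:
    forward = accumulate(vals, rnd)
    reverse = accumulate(vals[::-1], rnd)
    print(f"{name}: forward={forward:.6f} reversed={reverse:.6f} exact={exact:.6f}")
```

Expect the two bf16 totals to disagree with each other and with the exact sum noticeably more than the fp16 totals do; in the model, deviations of that size can flip a near-tie in the top-k router.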
Running the model requires attn_implementation="flash_attention_2".
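The requirements above (sm_90+ GPU, transformers==4.57.5, FlashAttention 2) can be checked before downloading the weights. A minimal preflight sketch, assuming the flash-attn package is importable as flash_attn; the function names are hypothetical and not part of this repository:

```python
import importlib.metadata
import importlib.util

def supports_grouped_mm(capability):
    """True for CUDA compute capability sm_90 or newer (Hopper is 9.0, Blackwell reports 10.x or 12.x)."""
    return tuple(capability) >= (9, 0)

def preflight():
    """Return a list of problems with the local environment; an empty list means all checks passed."""
    problems = []
    try:
        tf_version = importlib.metadata.version("transformers")
        if tf_version != "4.57.5":
            problems.append(f"transformers=={tf_version} (this conversion was run on 4.57.5)")
    except importlib.metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    if importlib.util.find_spec("flash_attn") is None:
        problems.append('flash_attn not found (needed for attn_implementation="flash_attention_2")')
    try:
        import torch
    except ImportError:
        problems.append("torch is not installed")
    else:
        if not torch.cuda.is_available():
            problems.append("no CUDA device visible")
        else:
            major, minor = torch.cuda.get_device_capability()
            if not supports_grouped_mm((major, minor)):
                problems.append(f"GPU is sm_{major}{minor}; sm_90 or newer is required")
    return problems

for problem in preflight():
    print("WARNING:", problem)
```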

from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is needed because the repo ships its own modeling/configuration code
tok = AutoTokenizer.from_pretrained("werty1248/VAETKI-7B-A1B-groupMM-converted",
                                     trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("werty1248/VAETKI-7B-A1B-groupMM-converted",
                                              trust_remote_code=True,
                                              attn_implementation="flash_attention_2",  # required
                                              torch_dtype="bfloat16").to('cuda')

# Korean version of the GSM8K "Janet's ducks" problem
text = "Janet의 오리는 하루에 16개의 알을 낳습니다. 그녀는 매일 아침으로 3개를 먹고, 친구들을 위해 머핀을 구울 때 4개를 사용합니다. 남은 계란은 매일 농산물 시장에 서 신선한 오리 알 하나당 2달러에 판매합니다. 그녀는 매일 농산물 시장에서 얼마를 버나요?"

input_ids = tok.apply_chat_template([{"role": "user", "content": text}],
                                    return_tensors='pt', add_generation_prompt=True)

# do_sample=True so that temperature and top_p actually take effect
output = model.generate(input_ids.to('cuda'), max_new_tokens=2048,
                        do_sample=True, temperature=0.7, top_p=0.9)

print(tok.decode(output[0]))

# Example output (sampled, so your output will vary):
"""
<|role_start|>system<|role_end|>
You are VAETKI(배키, Vertical AI Engine for Transformation of Key Industries), an AI assistant created by NC AI(엔씨에이아이), ETRI(한국전자통신연구원, Electronics and Telecommunications Research Institute), and KU(고려대학교, Korea University).
<|role_start|>user<|role_end|>
Janet의 오리는 하루에 16개의 알을 낳습니다. 그녀는 매일 아침으로 3개를 먹고, 친구들을 위해 머핀을 구울 때 4개를 사용합니다. 남은 계란은 매일 농산물 시장에 서 신선한 오리 알 하나당 2달러에 판매합니다. 그녀는 매일 농산물 시장에서 얼마를 버나요?
<|role_start|>assistant<|role_end|>

<think>
 먼저, Janet의 하루 계란 생산량을 계산해야 합니다. 오리는 하루에 16개의 알을 낳으므로, 매일 아침으로 3개를 먹고, 친구들을 위해 머핀을 구울 때 4개를 사용합니다. 따라서 하루에 사용하는 계란은 3 + 4 = 7개입니다.

그러면 남은 계란은 16 - 7 = 9개입니다.

이 계란은 매일 농산물 시장에 서 신선한 오리 알 하나당 2달러에 판매합니다. 따라서 하루에 버는 금액은 9 * 2 = 18달러입니다.

답은 18달러입니다.
</think>
 Janet은 하루에 16개의 알을 낳으므로, 아침으로 3개, 머핀으로 4개를 사용하면 총 7개를 사용합니다. 따라서 남은 계란은 16 - 7 = 9개입니다.
이 계란은 매일 농산물 시장에서 2달러에 판매하므로, 하루에 버는 금액은 9 × 2 = 18달러입니다.

**답: 18달러**<|END|>
"""

1. VAETKI-7B-A1B Highlights

VAETKI-7B-A1B is a small language model developed by NC-AI, designed especially for inference efficiency. The VAETKI series adopts a Mixture-of-Experts (MoE) architecture to balance performance and computational cost.

2. Model Overview

VAETKI-7B-A1B has the following features:

Type: Causal (Auto-regressive) Language Model
Architecture: Transformers, MoE (Mixture of Experts)
Developed by: NC-AI
Training Stage: Pretraining & Post-training
Number of Parameters: 7.25B in total and 1.2B activated
Number of Parameters (Non-Embedding): 6.8B
Number of Layers: 24
Number of Attention Heads: 12
Number of Experts: 64
Number of Activated Experts: 5
Context Length: 16k tokens
Vocabulary Size: 126k
Languages: Korean, English, Chinese, and Japanese
License: MIT
Related URLs: https://github.com/wbl-ncai/VAETKI/
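To put the parameter counts above in proportion, the arithmetic below uses only the figures listed. About 16.6% of the weights are used per token even though only 7.8% of the routed experts fire; the gap is presumably the always-active non-expert weights (attention, embeddings, router), whose exact breakdown this card does not give.

```python
total_params = 7.25e9          # "7.25B in total"
active_params = 1.2e9          # "1.2B activated"
non_embedding_params = 6.8e9   # "Number of Parameters (Non-Embedding)"
num_experts, active_experts = 64, 5

print(f"active parameters per token: {active_params / total_params:.1%}")                 # 16.6%
print(f"routed experts active per token: {active_experts / num_experts:.1%}")             # 7.8%
print(f"embedding parameters: {(total_params - non_embedding_params) / 1e9:.2f}B")        # 0.45B
```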

For more details, please refer to our Technical Report.

3. How to Use

See the Quickstart for more details.

4. Training Details

Training Data

Due to training time and resource constraints, only 1.86 trillion tokens from the available data sources were used for pre-training.

Training Procedure

Hardware
- Platform: Naver Cloud MLX Platform
- GPUs: NVIDIA H100 80GB HBM3 × 256
Software: The model architecture configuration, training loop, checkpointing, and distributed optimization logic were implemented based on Megatron-Core v0.14, with selective modifications to accommodate experimental requirements.
Hyperparameters

Hyperparameters	Value
Learning rate	2e-4 → 1e-5
Batch size	8.1M → 32.4M Tokens
Context Length	4096 → 16384
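Both the batch size and the context length grow by 4× over training. Assuming the start and end values of each row refer to the same points in training and that batches are packed with full-length sequences (neither is stated in this card), the number of sequences per batch stays at roughly 2,000:

```python
# Start and end values from the hyperparameter table; "M" = million tokens.
schedule = {
    "start": {"batch_tokens": 8.1e6, "context": 4096},
    "end": {"batch_tokens": 32.4e6, "context": 16384},
}
for stage, cfg in schedule.items():
    seqs = cfg["batch_tokens"] / cfg["context"]
    print(f"{stage}: ~{seqs:.0f} sequences of {cfg['context']} tokens per batch")
```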

5. Evaluation Results

We evaluate VAETKI-7B-A1B on various benchmarks and compare it with two other MoE models of similar size, as shown below. All three models were evaluated under the same experimental setup to ensure a fair and consistent comparison.

Language	Tasks	Benchmark (Metric)	Granite-4.0-H-Tiny	OLMoE-1B-7B-0125-Instruct	VAETKI-7B-A1B
		Architecture	MoE	MoE	MoE
		# Total Params	7B	7B	7B
		# Activated Params	1B	1.3B	1.2B
		# Pre-trained Tokens	23T	4.07T	1.86T
Korean	General	KMMLU-Pro	27.1	15.1	24.2
	General	CLIcK	47.7	28.2	40.6
	General	KoBALT	12.1	7.7	11.9
	Reasoning	HRM8K	39.7	3.0	26.5
English	General	MMLU-Pro	41.9	14.0	34.6
	Reasoning	GPQA-Diamond	27.8	30.3	27.2
	Reasoning	MATH500	62.4	25.0	61.8
	Reasoning	IFBench	21.2	18.1	20.9

6. Limitations

Limitations: This model may produce inaccurate or incomplete outputs, including hallucinated content, particularly for ambiguous prompts or tasks requiring high factual accuracy. It may have limitations in complex multi-step reasoning, precise mathematical computation, and strict correctness in code generation. The model does not have the ability to independently verify information.
(Potential) Biases: The training data may contain social or cultural biases, which can be reflected in the model’s outputs. Despite mitigation efforts, biases related to gender, ethnicity, nationality, or religion may still occur.
Out-of-Scope Use: This model is not designed for use in safety-critical or regulated domains, such as medical, legal, financial, or military applications. It should not be relied upon for decisions where errors could lead to harm.

7. License

This model repository is licensed under the MIT License. The use of VAETKI models is subject to the Model License. For information on third-party open-source software and data licenses used in this model, please refer to the NOTICE.md file.

8. Citation

@misc{ncai2025vaetkitechnicalreport,
      title={VAETKI Technical Report}, 
      author={NC-AI Consortium},
      year={2025},
      eprint={xxxx.xxxxx},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/xxxx.xxxxx}, 
}

9. Contact

If you would like to leave a message or have any questions, please contact us at wbl.ncai.hf@gmail.com.
