N2048M committed
Commit e82f81e · verified · 1 Parent(s): 1d0e455

Refresh model card: dllm-hub style + arXiv 2604.26951 + PKU-YuanGroup/TIDE link + logo

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +76 -30
  3. logo.gif +3 -0
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ logo.gif filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,58 +1,104 @@
  ---
  license: apache-2.0
  library_name: transformers
  tags:
- - diffusion-language-model
- - distillation
  - dllm
- - qwen3
  - bd3lm
- base_model: dllm-collection/Qwen3-0.6B-diffusion-bd3lm-v0.1
  ---

  # distill-WeDLM-TIDE_Shared

- This is the **TIDE-Shared (native, Pipeline B)** student checkpoint from
- *Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models*.
- It is a 0.6B Qwen3-BD3LM diffusion language model distilled in the **Shared-Tokenizer (WeDLM → BD3LM)** pipeline.
- Native variant for the shared-tokenizer pipeline; TIDAL + CompDemo.

- - Code: https://github.com/Nobody-Zhang/dllm_release
- - Project page: https://pku-yuangroup.github.io/TIDE-Page/
- - Paper: https://arxiv.org/abs/XXXX.XXXXX

- ## Loading

- This checkpoint uses custom modeling code (`A2DQwen3LMHeadModel`) registered via the `auto_map` field in `config.json`. Pass `trust_remote_code=True`:

  ```python
- from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

  repo = "TIDE-dllm/distill-WeDLM-TIDE_Shared"
- cfg = AutoConfig.from_pretrained(repo, trust_remote_code=True)
- tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
- model = AutoModelForMaskedLM.from_pretrained(repo, trust_remote_code=True)
  ```

- For sampling and evaluation, install the companion library:

- ```bash
- git clone https://github.com/Nobody-Zhang/dllm_release && cd dllm_release
- pip install -e .
  ```

- See the GitHub README for full sampling, evaluation, and distillation instructions.

  ## Citation

  ```bibtex
- @misc{tide2026,
-   title = {Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models},
-   author = {Zhang, Gongbo and Wang, Wen and Tian, Ye and Yuan, Li},
-   year = {2026},
-   eprint = {XXXX.XXXXX},
-   archivePrefix = {arXiv},
-   primaryClass = {cs.CL},
-   url = {https://arxiv.org/abs/XXXX.XXXXX},
  }
  ```

  ---
  license: apache-2.0
  library_name: transformers
+ pipeline_tag: text-generation
+ base_model: dllm-collection/Qwen3-0.6B-diffusion-bd3lm-v0.1
  tags:
+ - diffusion
  - dllm
  - bd3lm
+ - distillation
  ---

+ <center> <div style="text-align: center;"> <img src="logo.gif" width="300" />
+ </div> </center>
+
  # distill-WeDLM-TIDE_Shared

+ This model was introduced in the paper [Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models](https://huggingface.co/papers/2604.26951). It is the **native (paper-best) variant** for its pipeline.

+ `distill-WeDLM-TIDE_Shared` is a 0.6B diffusion language model distilled from WeDLM-8B-Instruct (8B dense) into the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) student in the **Shared-Tokenizer (Pipeline B)** setting of the TIDE framework. It is the native variant for the shared-tokenizer pipeline, trained with **TIDAL + CompDemo** over forward KL.

+ ## Model Overview
+
+ - **Method**: TIDE — [Reverse CALM / TIDAL / CompDemo](https://arxiv.org/abs/2604.26951) (cross-architecture distillation for diffusion LMs)
+ - **Framework**: [TIDE / dLLM](https://github.com/PKU-YuanGroup/TIDE)
+ - **Student (initialization)**: [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) (BD3LM, block_size=32)
+ - **Teacher**: [`tencent/WeDLM-8B-Instruct`](https://huggingface.co/tencent/WeDLM-8B-Instruct)
+ - **Distillation mode**: `--distill_mode taid_aligned --use_comp_demo True`
+ - **Datasets**: [tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture), [smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), [opc-sft-stage1](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage1) and [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2) — same composition as the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) base. Pre-tokenized for this teacher in [`TIDE-dllm/distill_wedlm_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_wedlm_sft).
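
The bullets above name the ingredients of the recipe but not the objective itself. As a point of reference, the snippet below is a minimal, self-contained sketch of a forward-KL loss computed only on masked positions under a shared vocabulary. The tensor shapes, the masking rate, and the omission of the TIDAL schedule and CompDemo terms are all simplifications; treat it as an illustration of "forward KL over masked positions", not as the training code in the TIDE repo.

```python
# Minimal sketch: forward KL(teacher || student) on the positions the student
# must denoise. Shapes, masking rate, and vocab size are illustrative only.
import torch
import torch.nn.functional as F

def forward_kl_on_masked(teacher_logits, student_logits, masked):
    """teacher_logits, student_logits: (batch, seq, vocab) over a shared vocabulary.
    masked: (batch, seq) bool, True where the student saw a mask token."""
    log_p = F.log_softmax(teacher_logits, dim=-1)   # teacher distribution
    log_q = F.log_softmax(student_logits, dim=-1)   # student distribution
    kl = (log_p.exp() * (log_p - log_q)).sum(-1)    # per-position KL(p || q)
    return (kl * masked).sum() / masked.sum().clamp(min=1)

# Toy stand-ins for one training step.
B, L, V = 2, 16, 1024
teacher_logits = torch.randn(B, L, V)
student_logits = torch.randn(B, L, V, requires_grad=True)
masked = torch.rand(B, L) < 0.5                     # positions noised for the student
loss = forward_kl_on_masked(teacher_logits, student_logits, masked)
loss.backward()
```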
+
+ ## Installation
+
+ ```shell
+ pip install torch transformers accelerate
+ ```

+ ## Quick Start
+
+ > [!NOTE]
+ > This checkpoint is fully compatible with the BD3LM `generate(...)` routine published with [`dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) — only the model name changes.

  ```python
+ import torch
+ from transformers import AutoModelForMaskedLM, AutoTokenizer

  repo = "TIDE-dllm/distill-WeDLM-TIDE_Shared"
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ model = AutoModelForMaskedLM.from_pretrained(
+     repo, dtype=torch.bfloat16, trust_remote_code=True,
+ ).to(device).eval()
+ tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
+
+ prompts = [
+     [
+         {"role": "system", "content": "You are a helpful AI assistant."},
+         {"role": "user", "content": "Implement a DFS traversal in Python with clear inline comments."},
+     ],
+ ]
+ encoded = [tokenizer.apply_chat_template(m, add_generation_prompt=True, tokenize=True, enable_thinking=False) for m in prompts]
+ # ... use the same `generate()` function as in dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1.
  ```
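
The released `generate()` helper itself lives in the dllm-hub / TIDE code and is not reproduced here. As a rough, illustrative picture of what block-diffusion decoding does, the sketch below (continuing from the snippet above) appends a block of mask tokens and repeatedly commits the model's most confident prediction, which is the intuition behind `low_confidence` remasking. It assumes the tokenizer exposes a `mask_token_id` and uses a simplistic one-token-per-step schedule; use the official routine linked in the note above for real generation.

```python
# Illustrative only: a naive confidence-based unmasking loop, not the released
# BD3LM `generate()` routine. `mask_token_id` and the one-token-per-step
# schedule are assumptions made for this sketch.
import torch

mask_id = tokenizer.mask_token_id                      # assumption: the diffusion mask token
block_size = 32                                        # matches the student's BD3LM block size

prompt_ids = torch.tensor(encoded[0], device=device)
ids = torch.cat([prompt_ids,
                 torch.full((block_size,), mask_id, device=device)]).unsqueeze(0)

with torch.no_grad():
    for _ in range(block_size):                        # fill one position per iteration
        still_masked = ids[0] == mask_id
        if not still_masked.any():
            break
        logits = model(input_ids=ids).logits[0]        # (seq_len, vocab_size)
        conf, pred = logits.softmax(-1).max(-1)        # per-position confidence and argmax token
        conf = conf.masked_fill(~still_masked, float("-inf"))
        pos = conf.argmax()                            # most confident masked position
        ids[0, pos] = pred[pos]                        # commit it; the rest stay masked

print(tokenizer.decode(ids[0, prompt_ids.numel():], skip_special_tokens=True))
```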

+ ## Command-Line Interface
+
+ For an interactive demo (visualised iterative denoising), use the script in the [TIDE / dLLM repo](https://github.com/PKU-YuanGroup/TIDE):

+ ```shell
+ python -u examples/a2d/bd3lm/chat.py \
+     --model_name_or_path TIDE-dllm/distill-WeDLM-TIDE_Shared \
+     --chat_template True --block_size 32 --remasking low_confidence \
+     --steps 256 --max_new_tokens 256
  ```

+ ## Reproducing this checkpoint
+
+ ```shell
+ git clone https://github.com/PKU-YuanGroup/TIDE && cd TIDE
+ pip install -e . && git submodule update --init --recursive
+ pip install -e "lm-evaluation-harness[ifeval,math]" && pip install -e "tokenkit[full]"
+
+ # Download the pre-tokenized SFT mixture for this teacher
+ huggingface-cli download TIDE-dllm/distill_wedlm_sft --repo-type dataset \
+     --local-dir data/distill_wedlm_sft
+
+ bash scripts/distill_wedlm.sh \
+     --data_path data/distill_wedlm_sft \
+     --distill_mode taid_aligned --use_comp_demo True \
+     --num_gpus 8
+ ```
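
Before launching the full run, it can be handy to peek at the downloaded mixture. The snippet below only prints the splits, column names, and one example via the `datasets` library; it assumes nothing about the dataset's field layout beyond what the repo actually ships.

```python
# Quick sanity check of the pre-tokenized SFT mixture (requires `pip install datasets`).
from datasets import load_dataset

ds = load_dataset("TIDE-dllm/distill_wedlm_sft")
print(ds)                                   # available splits and row counts
first_split = next(iter(ds.values()))
print(first_split.column_names)             # schema as shipped; no fields assumed here
print(first_split[0])                       # one pre-tokenized example
```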

  ## Citation

  ```bibtex
+ @misc{zhang2026turningtidecrossarchitecturedistillation,
+   title={Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models},
+   author={Gongbo Zhang and Wen Wang and Ye Tian and Li Yuan},
+   year={2026},
+   eprint={2604.26951},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2604.26951},
  }
  ```
logo.gif ADDED

Git LFS Details

  • SHA256: be7bf6dd8cae18da51d0f500bf2deb43cdefa73edee233f9b6d9caf4cf24cda2
  • Pointer size: 131 Bytes
  • Size of remote file: 913 kB