Instructions to use z-lab/Qwen3.5-27B-DFlash with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use z-lab/Qwen3.5-27B-DFlash with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="z-lab/Qwen3.5-27B-DFlash", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("z-lab/Qwen3.5-27B-DFlash", trust_remote_code=True)
model = AutoModel.from_pretrained("z-lab/Qwen3.5-27B-DFlash", trust_remote_code=True)


vLLM

How to use z-lab/Qwen3.5-27B-DFlash with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "z-lab/Qwen3.5-27B-DFlash"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "z-lab/Qwen3.5-27B-DFlash",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
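
The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the base URL assumes the server started above is listening on port 8000.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    req = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

With a server running, `complete("http://localhost:8000", "z-lab/Qwen3.5-27B-DFlash", "Once upon a time,")` should return the continuation as a string.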

Use Docker

docker model run hf.co/z-lab/Qwen3.5-27B-DFlash

SGLang

How to use z-lab/Qwen3.5-27B-DFlash with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "z-lab/Qwen3.5-27B-DFlash" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "z-lab/Qwen3.5-27B-DFlash",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "z-lab/Qwen3.5-27B-DFlash" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "z-lab/Qwen3.5-27B-DFlash",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'


Does FP8 work for the base model, or is the 16-bit 27B required?

by unoid - opened Mar 31

Discussion

unoid

Mar 31

Running vLLM with DFlash on the FP8 quant of the 27B, 15 speculative tokens averages a very low acceptance rate of ~12%. With 8 speculative tokens it is around 25-30%. Performance at 8 is on par with MTP=3.

jianchen0311

Z Lab org Mar 31

edited Mar 31

I believe this draft model can also be used with Qwen3.5-27B-FP8. I benchmarked it with both the BF16 target model and the FP8 target model on HumanEval, and the acceptance lengths are very close.

Here are the Qwen3.5-27B results on vLLM.

Successful requests:                     164       
Failed requests:                         0         
Maximum request concurrency:             1         
Benchmark duration (s):                  389.19    
Total input tokens:                      24600     
Total generated tokens:                  165775    
Request throughput (req/s):              0.42      
Output token throughput (tok/s):         425.95    
Peak output token throughput (tok/s):    57.00     
Peak concurrent requests:                3.00      
Total token throughput (tok/s):          489.16    
---------------Time to First Token----------------
Mean TTFT (ms):                          66.36     
Median TTFT (ms):                        65.70     
P99 TTFT (ms):                           84.54     
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2.27      
Median TPOT (ms):                        2.18      
P99 TPOT (ms):                           3.68      
---------------Inter-token Latency----------------
Mean ITL (ms):                           18.30     
Median ITL (ms):                         18.35     
P99 ITL (ms):                            20.29     
---------------Speculative Decoding---------------
Acceptance rate (%):                     47.24     
Acceptance length:                       8.09      
Drafts:                                  20503     
Draft tokens:                            307545    
Accepted tokens:                         145292    
Per-position acceptance (%):
  Position 0:                            92.54     
  Position 1:                            82.47     
  Position 2:                            72.99     
  Position 3:                            64.77     
  Position 4:                            57.66     
  Position 5:                            51.52     
  Position 6:                            46.24     
  Position 7:                            41.87     
  Position 8:                            37.78     
  Position 9:                            34.20     
  Position 10:                           31.07     
  Position 11:                           28.10     
  Position 12:                           25.33     
  Position 13:                           22.50     
  Position 14:                           19.59

Here are the Qwen3.5-27B-FP8 results:

Successful requests:                     164       
Failed requests:                         0         
Maximum request concurrency:             1         
Benchmark duration (s):                  395.08    
Total input tokens:                      24600     
Total generated tokens:                  165556    
Request throughput (req/s):              0.42      
Output token throughput (tok/s):         419.05    
Peak output token throughput (tok/s):    57.00     
Peak concurrent requests:                3.00      
Total token throughput (tok/s):          481.31    
---------------Time to First Token----------------
Mean TTFT (ms):                          91.81     
Median TTFT (ms):                        66.50     
P99 TTFT (ms):                           127.55    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          2.23      
Median TPOT (ms):                        2.10      
P99 TPOT (ms):                           3.75      
---------------Inter-token Latency----------------
Mean ITL (ms):                           18.16     
Median ITL (ms):                         17.93     
P99 ITL (ms):                            20.07     
---------------Speculative Decoding---------------
Acceptance rate (%):                     46.52     
Acceptance length:                       7.98      
Drafts:                                  20754     
Draft tokens:                            311310    
Accepted tokens:                         144822    
Per-position acceptance (%):
  Position 0:                            92.42     
  Position 1:                            82.01     
  Position 2:                            72.59     
  Position 3:                            63.95     
  Position 4:                            56.51     
  Position 5:                            50.32     
  Position 6:                            45.27     
  Position 7:                            40.82     
  Position 8:                            36.81     
  Position 9:                            33.44     
  Position 10:                           30.36     
  Position 11:                           27.66     
  Position 12:                           24.73     
  Position 13:                           21.82     
  Position 14:                           19.10
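
The headline speculative-decoding figures in both reports appear to follow directly from the raw counters. Below is a minimal sketch, assuming acceptance rate = accepted tokens / draft tokens and acceptance length = 1 + accepted tokens / drafts (the extra 1 being the token the target model contributes on every verification step). The function name `spec_decode_summary` is hypothetical and not part of vLLM.

```python
def spec_decode_summary(drafts, draft_tokens, accepted_tokens):
    """Derive summary metrics from the three raw counters in the report."""
    return {
        # Draft tokens proposed per speculation step.
        "tokens_per_draft": draft_tokens / drafts,
        # Share of proposed draft tokens that the target model accepted.
        "acceptance_rate_pct": 100.0 * accepted_tokens / draft_tokens,
        # Accepted draft tokens per step, plus one token from the target model.
        "acceptance_length": 1.0 + accepted_tokens / drafts,
    }


# Counters copied from the two benchmark reports above.
bf16 = spec_decode_summary(drafts=20503, draft_tokens=307545, accepted_tokens=145292)
fp8 = spec_decode_summary(drafts=20754, draft_tokens=311310, accepted_tokens=144822)

for name, s in (("BF16", bf16), ("FP8", fp8)):
    print(
        f"{name}: {s['tokens_per_draft']:.0f} tokens/draft, "
        f"{s['acceptance_rate_pct']:.2f}% accepted, "
        f"length {s['acceptance_length']:.2f}"
    )
# BF16: 15 tokens/draft, 47.24% accepted, length 8.09
# FP8: 15 tokens/draft, 46.52% accepted, length 7.98
```

Under these assumptions the FP8 target loses about 0.7 percentage points of acceptance rate and about 0.1 tokens of acceptance length relative to BF16 on this run.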

unoid

Mar 31

Interesting, it must be a misconfiguration on my SM120 RTX 6000 Blackwell with the vLLM cu130 nightly.

jianchen0311

Z Lab org Mar 31

As DFlash was just merged into vLLM, there are probably some issues. I will try to run on RTX 6000 Blackwell to see if I can reproduce your problem 👀

hampsonw

Apr 1

edited Apr 2

Similarly, I'm interested in whether it's possible to use the ParoQuant model, z-lab/Qwen3.5-27B-PARO, instead of either BF16 or FP8.

I run a 2x3090 setup and am wondering if anyone in the community has tried this, or if Ampere in general has been tested.

unoid

Apr 3

edited Apr 3

Tested again on the vLLM 18.2rc1 cu130 nightly, on an RTX 6000 Blackwell.

vllm/vllm-openai:cu130-nightly \
  /models/Qwen3.5-27B-FP8 \
  --async-scheduling \
  --quantization fp8 \
  --served-model-name Qwen3.5 \
  --tensor-parallel-size 1 \
  --dtype auto \
  --kv-cache-dtype auto \
  --trust-remote-code \
  --gpu-memory-utilization 0.92 \
  --max-num-seqs 32 \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --max-num-batched-tokens 16384 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --speculative-config '{"method": "dflash", "model": "/models/Qwen3.5-27B-DFlash", "num_speculative_tokens": 8}' \
  --max-model-len 262144

Acceptance is still averaging ~20%. I tried --max-num-batched-tokens at both 8192 and 16384, with and without multimodal enabled.

Are there still PRs from z-lab pending merge to master?
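
When sweeping num_speculative_tokens as in this thread, it can help to generate the --speculative-config argument rather than hand-edit the JSON. This is a small sketch: the helper name `speculative_config_arg` is hypothetical, and only the flag name and the three JSON keys are taken from the command above.

```python
import json
import shlex


def speculative_config_arg(draft_model, num_speculative_tokens, method="dflash"):
    """Return a shell-safe --speculative-config argument for `vllm serve`."""
    config = {
        "method": method,
        "model": draft_model,
        "num_speculative_tokens": num_speculative_tokens,
    }
    # shlex.quote wraps the JSON in single quotes so the shell passes it through intact.
    return "--speculative-config " + shlex.quote(json.dumps(config))


# One argument per draft length tried in this thread.
for k in (8, 15):
    print(speculative_config_arg("/models/Qwen3.5-27B-DFlash", k))
```

For k=8 this reproduces the argument used in the command above.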

matichon

Apr 7

I can confirm the same behaviour as unoid.
Running on H100, CUDA 13.1, vLLM 0.19.1rc1.dev70+g8060bb033 (built from source).

CUDA_VISIBLE_DEVICES=0 vllm serve /share_weight/Qwen3.5-27B-FP8 \
--served-model-name Qwen3.5-27B-FP8 --host 0.0.0.0 --port 9810 \
--tensor-parallel-size 1 --speculative-config '{"method": "dflash", "model": "/share_weight/Qwen3.5-27B-DFlash", "num_speculative_tokens": 8}' \
--max-num-batched-tokens 32768 --max-model-len 220000 \
--reasoning-parser qwen3  --enable-auto-tool-choice --tool-call-parser qwen3_coder \
--chat-template /share_weight/Qwen3.5-27B-FP8/chat_template.jinja

Acceptance averaging ~20%

jianchen0311

Z Lab org Apr 7

That's interesting. I tested on B200 and FP8 seems to work well. Let me test on H100.

JDWarner

Apr 8

edited Apr 8

On DGX Spark (a different Blackwell part with an SM version similar to the RTX Pro 6000's; Spark is 12.1a), I am able to get this working in vLLM 0.19, but I similarly see relatively low acceptance at n=15. My work has some specialty vocab, so I'm seeing under 20%, usually 12-18% acceptance, with near-zero beyond position 8. Some of this is probably due to the documents and their complexity. My guess is that this has a similar root cause.

It's still a decent throughput gain. Realizing the full potential would be incredible! I'd gladly test anything you like.

Edit: In case it might help, I am using the official Qwen FP8 quant with the --speculative-config suggested, flash_attn, and --max-num-batched-tokens 32768. I also have prefix caching enabled as well as reasoning and tool calling.

JDWarner

Apr 8

In case it helps, these lines from the startup log seemed a little odd, as if it thought it was an EAGLE model but with strange features. It does work though:

(EngineCore pid=160) INFO 04-08 16:41:56 [eagle.py:1395] Detected EAGLE model without its own embed_tokens in the checkpoint. Sharing target model embedding weights with the draft model.
(EngineCore pid=160) INFO 04-08 16:41:56 [eagle.py:1450] Detected EAGLE model without its own lm_head in the checkpoint. Sharing target model lm_head weights with the draft model.
(EngineCore pid=160) INFO 04-08 16:41:56 [gpu_model_runner.py:4797] Using auxiliary layers from speculative config: (1, 16, 31, 46, 61)

matichon

Apr 10

Quick feedback:
--no-enable-prefix-caching
This flag helps boost the acceptance rate from <20% to 30-35%.

jianchen0311

Z Lab org Apr 10

@matichon Thanks for the information! That’s interesting. I would have thought prefix caching shouldn’t directly affect the acceptance rate. I need to take a closer look at this.

JDWarner

Apr 12

I think there were some bugs in vLLM. The bugs may not have been with DFlash but rather quite possibly the model outputs or maybe Flash Attention 2. Regardless, I rebuilt yesterday and tested with FP8 and the int4-AutoRound quants on DGX Spark.

Where I was seeing really poor acceptance beyond position 2, now in benchmarks (especially for coding tasks) I see throughput of up to ~70 tok/s. That's incredible on this hardware. It isn't all that high - but even for complex analysis of scientific documents it is a boost over the built-in MTP.

Initially I ran it with --no-enable-prefix-caching per @matichon above, but just finished testing with prefix caching enabled again, and the acceptance rates and throughput are stable. Again, it feels like a bug has been patched.

SongXiaoMao

Apr 14

I run the FP16 27B on 4x3090 and get a low acceptance rate.

docker rm -f $(docker ps -aq)
docker run -d \
  --gpus all \
  --memory 32g \
  --memory-swap 64g \
  --shm-size 32g \
  -p 8000:8000 \
  -v /home/cheng/model/Qwen3.5-27B:/model \
  -v /home/cheng/model/Qwen3.5-27B-DFlash:/draft-model \
  -v /home/cheng/vllm_cache:/root/.cache/vllm \
  --ipc=host \
  --name vllm \
  --env VLLM_USE_FLASHINFER_SAMPLER=1 \
  --env OMP_NUM_THREADS=2 \
  --env VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1 \
  --env PYTORCH_ALLOC_CONF=expandable_segments:True \
  --env HF_HUB_OFFLINE=1 \
  --env VLLM_ENGINE_ITERATION_TIMEOUT_S=1800 \
  --env VLLM_ENGINE_READY_TIMEOUT_S=1800 \
  --env VLLM_RPC_TIMEOUT=1800000 \
  --env VLLM_EXECUTE_MODEL_TIMEOUT_SECONDS=1800 \
  --env VLLM_LOG_STATS_INTERVAL=1.0 \
  --env LD_LIBRARY_PATH='/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/lib/x86_64-linux-gnu' \
  vllm/vllm-openai:nightly \
  /model \
  --served-model-name Qwen3.5-27B \
  --mm-encoder-attn-backend TORCH_SDPA \
  --dtype auto \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --gpu-memory-utilization 0.90 \
  --disable-custom-all-reduce \
  --max-model-len 131072 \
  --max-num-seqs 10 \
  --tensor-parallel-size 4 \
  --limit-mm-per-prompt '{"image": 30, "video": 0}' \
  --async-scheduling \
  --default-chat-template-kwargs '{"enable_thinking": false}' \
  --generation-config vllm \
  --speculative-config '{"method": "dflash", "model": "/draft-model", "num_speculative_tokens": 5}' \
  --host 0.0.0.0

matichon

Apr 15

Quick summary

On the GSM8K benchmark with a concurrency of 1, I achieved an average acceptance length of 7–8 tokens at a 40% acceptance rate.
However, using my own random chat inputs in the first turn, the acceptance length dropped to approximately 3 tokens with a 20% acceptance rate.

In comparison, using the native Multi-Token Prediction (MTP) with num_speculative_tokens set to 5,
my own inputs achieved about 3.5 tokens at a 60% acceptance rate.

I suspect this discrepancy stems from the robustness of the training data distribution.
Hopefully, the poor performance on the custom input is simply due to using an incorrect checkpoint for the DFlash model.

num-seqs    concurrent    tok/s
1           1             200+
1           2             200+
16          1             200+
16          2             100+

Environment

H100 80GB SXM
CUDA 12.8
vLLM 0.19.1rc1.dev297+g799973af4

16 num-seqs

Concurrent 1

python -m dflash.benchmark --backend vllm     --base-url http://localhost:9810 --model Qwen3.5-27B-FP8 --dataset gsm8k --num-prompts 128 --concurrency 1 --enable-thinking

(APIServer pid=8529) INFO 04-15 07:47:30 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 7.42, Accepted throughput: 230.47 tokens/s, Drafted throughput: 574.33 tokens/s, Accepted: 2305 tokens, Drafted: 5744 tokens, Per-position acceptance rate: 0.933, 0.805, 0.716, 0.621, 0.507, 0.443, 0.412, 0.370, 0.334, 0.292, 0.240, 0.209, 0.175, 0.139, 0.120, 0.103, Avg Draft acceptance rate: 40.1%
(APIServer pid=8529) INFO:     127.0.0.1:56872 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=8529) INFO 04-15 07:47:40 [loggers.py:271] Engine 000: Avg prompt throughput: 13.0 tokens/s, Avg generation throughput: 247.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.3%, Prefix cache hit rate: 0.0%
(APIServer pid=8529) INFO 04-15 07:47:40 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 6.75, Accepted throughput: 210.98 tokens/s, Drafted throughput: 587.14 tokens/s, Accepted: 2110 tokens, Drafted: 5872 tokens, Per-position acceptance rate: 0.883, 0.744, 0.605, 0.534, 0.452, 0.403, 0.349, 0.324, 0.270, 0.237, 0.213, 0.188, 0.161, 0.147, 0.131, 0.109, Avg Draft acceptance rate: 35.9%
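
These SpecDecoding lines are easier to compare across runs once parsed. Below is a minimal sketch that assumes the line format shown in this post; the regex and the name `parse_spec_metrics` are illustrative and not part of vLLM. It also cross-checks two relations that the numbers in this thread seem consistent with: mean acceptance length is roughly 1 plus the sum of the per-position rates, and the average draft acceptance rate is accepted / drafted.

```python
import re

# Matches the fields of a "SpecDecoding metrics" log line as printed in this thread.
SPEC_LINE = re.compile(
    r"Mean acceptance length: (?P<length>[\d.]+), .*?"
    r"Accepted: (?P<accepted>\d+) tokens, "
    r"Drafted: (?P<drafted>\d+) tokens, "
    r"Per-position acceptance rate: (?P<per_pos>[\d., ]+?), "
    r"Avg Draft acceptance rate: (?P<rate>[\d.]+)%"
)


def parse_spec_metrics(line):
    """Return the parsed metrics as a dict, or None if the line does not match."""
    m = SPEC_LINE.search(line)
    if m is None:
        return None
    per_pos = [float(x) for x in m["per_pos"].split(", ")]
    accepted, drafted = int(m["accepted"]), int(m["drafted"])
    return {
        "reported_length": float(m["length"]),
        "reported_rate_pct": float(m["rate"]),
        "accepted": accepted,
        "drafted": drafted,
        "per_position": per_pos,
        # Cross-checks recomputed from the other fields on the same line.
        "length_from_positions": 1.0 + sum(per_pos),
        "rate_from_counters_pct": 100.0 * accepted / drafted,
    }


# First metrics line from the 16 num-seqs, concurrency 1 run.
sample = (
    "(APIServer pid=8529) INFO 04-15 07:47:30 [metrics.py:101] SpecDecoding metrics: "
    "Mean acceptance length: 7.42, Accepted throughput: 230.47 tokens/s, "
    "Drafted throughput: 574.33 tokens/s, Accepted: 2305 tokens, Drafted: 5744 tokens, "
    "Per-position acceptance rate: 0.933, 0.805, 0.716, 0.621, 0.507, 0.443, 0.412, "
    "0.370, 0.334, 0.292, 0.240, 0.209, 0.175, 0.139, 0.120, 0.103, "
    "Avg Draft acceptance rate: 40.1%"
)
parsed = parse_spec_metrics(sample)
print(len(parsed["per_position"]), round(parsed["length_from_positions"], 2),
      round(parsed["rate_from_counters_pct"], 1))
# 16 7.42 40.1
```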

Concurrent 2

python -m dflash.benchmark --backend vllm     --base-url http://localhost:9810 --model Qwen3.5-27B-FP8 --dataset gsm8k --num-prompts 128 --concurrency 2 --enable-thinking

(APIServer pid=9685) INFO 04-15 07:50:34 [loggers.py:271] Engine 000: Avg prompt throughput: 35.4 tokens/s, Avg generation throughput: 189.6 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.1%, Prefix cache hit rate: 0.0%
(APIServer pid=9685) INFO 04-15 07:50:34 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 2.73, Accepted throughput: 120.08 tokens/s, Drafted throughput: 1113.44 tokens/s, Accepted: 1201 tokens, Drafted: 11136 tokens, Per-position acceptance rate: 0.227, 0.201, 0.172, 0.151, 0.136, 0.128, 0.116, 0.101, 0.089, 0.080, 0.072, 0.066, 0.059, 0.050, 0.040, 0.036, Avg Draft acceptance rate: 10.8%
(APIServer pid=9685) INFO:     127.0.0.1:49702 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=9685) INFO:     127.0.0.1:49714 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=9685) INFO 04-15 07:50:44 [loggers.py:271] Engine 000: Avg prompt throughput: 19.8 tokens/s, Avg generation throughput: 187.1 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.1%, Prefix cache hit rate: 0.0%
(APIServer pid=9685) INFO 04-15 07:50:44 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 2.66, Accepted throughput: 116.68 tokens/s, Drafted throughput: 1123.04 tokens/s, Accepted: 1167 tokens, Drafted: 11232 tokens, Per-position acceptance rate: 0.362, 0.283, 0.219, 0.171, 0.141, 0.117, 0.081, 0.067, 0.054, 0.047, 0.036, 0.027, 0.023, 0.017, 0.010, 0.007, Avg Draft acceptance rate: 10.4%

1 num-seqs

Concurrent 1

python -m dflash.benchmark --backend vllm     --base-url http://localhost:9810 --model Qwen3.5-27B-FP8 --dataset gsm8k --num-prompts 128 --concurrency 1 --enable-thinking

(APIServer pid=10840) INFO 04-15 07:54:48 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 5.84, Accepted throughput: 177.49 tokens/s, Drafted throughput: 587.16 tokens/s, Accepted: 1775 tokens, Drafted: 5872 tokens, Per-position acceptance rate: 0.858, 0.711, 0.553, 0.482, 0.406, 0.346, 0.283, 0.243, 0.193, 0.150, 0.131, 0.114, 0.104, 0.095, 0.084, 0.082, Avg Draft acceptance rate: 30.2%
(APIServer pid=10840) INFO:     127.0.0.1:43544 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=10840) INFO 04-15 07:54:58 [loggers.py:271] Engine 000: Avg prompt throughput: 7.8 tokens/s, Avg generation throughput: 255.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.6%, Prefix cache hit rate: 0.0%
(APIServer pid=10840) INFO 04-15 07:54:58 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 6.96, Accepted throughput: 219.29 tokens/s, Drafted throughput: 588.77 tokens/s, Accepted: 2193 tokens, Drafted: 5888 tokens, Per-position acceptance rate: 0.905, 0.766, 0.668, 0.568, 0.476, 0.421, 0.370, 0.334, 0.277, 0.231, 0.201, 0.185, 0.160, 0.152, 0.133, 0.111, Avg Draft acceptance rate: 37.2%

Concurrent 2

python -m dflash.benchmark --backend vllm     --base-url http://localhost:9810 --model Qwen3.5-27B-FP8 --dataset gsm8k --num-prompts 128 --concurrency 2 --enable-thinking

(APIServer pid=10840) INFO 04-15 07:52:38 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 6.63, Accepted throughput: 202.57 tokens/s, Drafted throughput: 575.90 tokens/s, Accepted: 2026 tokens, Drafted: 5760 tokens, Per-position acceptance rate: 0.903, 0.775, 0.636, 0.533, 0.489, 0.425, 0.358, 0.286, 0.267, 0.228, 0.194, 0.156, 0.131, 0.103, 0.081, 0.064, Avg Draft acceptance rate: 35.2%
(APIServer pid=10840) INFO:     127.0.0.1:45262 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=10840) INFO:     127.0.0.1:45278 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=10840) INFO 04-15 07:52:48 [loggers.py:271] Engine 000: Avg prompt throughput: 19.8 tokens/s, Avg generation throughput: 254.4 tokens/s, Running: 1 reqs, Waiting: 1 reqs, GPU KV cache usage: 9.3%, Prefix cache hit rate: 0.0%
(APIServer pid=10840) INFO 04-15 07:52:48 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 7.13, Accepted throughput: 218.77 tokens/s, Drafted throughput: 571.12 tokens/s, Accepted: 2188 tokens, Drafted: 5712 tokens, Per-position acceptance rate: 0.899, 0.796, 0.686, 0.594, 0.515, 0.457, 0.403, 0.356, 0.289, 0.244, 0.210, 0.188, 0.154, 0.129, 0.112, 0.098, Avg Draft acceptance rate: 38.3%

Johnasson

16 days ago

After updating to vLLM 0.21+, it seems to work pretty well on FP8 (on 0.19 and earlier it seemed to crash).

sdd5125

16 days ago

What was your average draft acceptance rate?

matichon

15 days ago

edited 15 days ago

vLLM 0.21.0
Image Tag : vllm/vllm-openai:v0.21.0-cu129-ubuntu2404@sha256:cba2cabc5ca33baf0bc4776ed2896fe4c8d8b7be7fbbeca88bc63217d07ad320

(APIServer pid=2568) INFO:     172.18.0.19:52452 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=2568) INFO 05-21 07:47:53 [loggers.py:271] Engine 000: Avg prompt throughput: 1600.4 tokens/s, Avg generation throughput: 30.7 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 25.4%
(APIServer pid=2568) INFO 05-21 07:47:53 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 1.87, Accepted throughput: 14.30 tokens/s, Drafted throughput: 262.38 tokens/s, Accepted: 143 tokens, Drafted: 2624 tokens, Per-position acceptance rate: 0.427, 0.232, 0.122, 0.067, 0.018, 0.006, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, Avg Draft acceptance rate: 5.4%
(APIServer pid=2568) INFO 05-21 07:48:03 [loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 25.4%
