Instructions to use Qwen/Qwen3.6-27B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Qwen/Qwen3.6-27B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3.6-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.6-27B")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3.6-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Qwen/Qwen3.6-27B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Qwen/Qwen3.6-27B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3.6-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
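
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000`; the helper names `build_payload` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same chat-completions body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("http://localhost:8000", build_payload("Qwen/Qwen3.6-27B", "Describe this image in one sentence.", image_url))` should return the model's reply as a string, where `image_url` is any publicly reachable image.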

Use Docker

docker model run hf.co/Qwen/Qwen3.6-27B

SGLang

How to use Qwen/Qwen3.6-27B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen3.6-27B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3.6-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/Qwen3.6-27B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use Qwen/Qwen3.6-27B with Docker Model Runner:
```
docker model run hf.co/Qwen/Qwen3.6-27B
```

Qwen3.6 seem to fake record too often

#18

by anitman - opened Apr 25

Discussion

anitman

Apr 25

•

edited Apr 25

The model's hallucination rate is extremely high.

The Test Case: Hermes Agent Logic
I conducted a comparison using a Q8_0 quantization for Qwen and a UD_Q4_K_XL for MiniMax. The task was simple:

Setup: In a Hermes Agent session, the model is instructed to read a directory based on a JSON record.

Action: Depending on its "mood," it must select and send a specific GIF from that directory according to the JSON record.

The Results
MiniMax-M2.7 (MoE): Despite being an MoE model with only about 10B active parameters and running at a lower-precision quantization (Q4), it never failed this task.

Qwen 3.6 27B: Even at the higher-precision quantization (Q8), it failed about half of the time. It repeatedly "hallucinated" a JSON record that didn't exist and then attempted to send a non-existent GIF from that imaginary file, resulting in backend errors in the Hermes Agent.

Both models had the same instructions in the memory record, yet the results were very different.
Really annoying.
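
One way to keep a hallucinated record from reaching the backend is to validate the selection on the tool side before anything is sent. The sketch below is only an illustration under assumptions: the record layout (a JSON object mapping a mood to a file name) and the function name `resolve_gif` are hypothetical, and none of this is Hermes Agent's actual API.

```python
import json
import tempfile
from pathlib import Path


def resolve_gif(record_path: Path, gif_dir: Path, mood: str) -> Path:
    """Return the GIF path for `mood`, or raise if the record or file is missing.

    Assumed (hypothetical) record layout: {"happy": "happy.gif", ...}.
    """
    if not record_path.is_file():
        raise FileNotFoundError(f"record not found: {record_path}")
    record = json.loads(record_path.read_text(encoding="utf-8"))
    if mood not in record:
        raise KeyError(f"mood {mood!r} is not in the record")
    gif_path = (gif_dir / record[mood]).resolve()
    # Reject entries that escape the directory or point at a missing file.
    if gif_dir.resolve() not in gif_path.parents or not gif_path.is_file():
        raise FileNotFoundError(f"GIF not found in {gif_dir}: {record[mood]}")
    return gif_path


# Demo with a throwaway directory: one real GIF, one record entry with no file.
with tempfile.TemporaryDirectory() as tmp:
    gif_dir = Path(tmp)
    (gif_dir / "happy.gif").write_bytes(b"GIF89a")
    record_path = gif_dir / "record.json"
    record_path.write_text(json.dumps({"happy": "happy.gif", "sad": "sad.gif"}))
    print(resolve_gif(record_path, gif_dir, "happy").name)
    try:
        resolve_gif(record_path, gif_dir, "sad")
    except FileNotFoundError as error:
        print("refused:", error)
```

A check like this does not make the model hallucinate less; it only turns an invented file into an explicit error that can be returned to the model instead of a backend failure.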

It seems that the model was never trained with a specific constitution as a guideline, even though Anthropic has published its Constitutional AI training method.

anitman changed discussion title from "Qwen3.6 seem to" to "Qwen3.6 seem to fake record too often" Apr 25

KristofeAI

Apr 26

Try an officially vetted model from Alibaba. The quants need to be made correctly, or they could interfere with the new architecture in these models. If you don’t have the hardware to support that, you can try OpenRouter.

zhangbo2008

Apr 28

What about the FP16 version?

anitman

Apr 28

Try an officially vetted model from Alibaba. The quants need to be made correctly, or they could interfere with the new architecture in these models. If you don’t have the hardware to support that, you can try OpenRouter.

That's actually not the case. I've tried the official implementation via vLLM in WSL, as well as the Unsloth quants from Q8_0 to UD_Q4_XL. Infinite tool-calling loops and fake tool calls happened all the time during agentic use. None of this happened with MiniMax M2.7. Qwen failed to create a simple cron job in Hermes Agent that checks ticket availability every 20 minutes: it kept trying to add the schedule to the exec but failed every time. This is a simple task, TBH, and the problem was solved instantly by switching to MiniMax.
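
Loops like the ones described above can at least be cut short on the agent side. The following is a minimal sketch of such a guard, assuming the agent loop can inspect each proposed tool call before running it; the class name `ToolLoopGuard`, its limits, and the tool names in the usage note are hypothetical and not taken from Hermes Agent.

```python
import json
from collections import Counter


class ToolLoopGuard:
    """Stop an agent loop when it repeats the same tool call or runs too long."""

    def __init__(self, max_steps: int = 20, max_repeats: int = 3) -> None:
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = Counter()

    def allow(self, tool_name: str, arguments: dict) -> bool:
        """Record one proposed tool call and say whether it should run."""
        self.steps += 1
        # Canonical JSON so argument order does not hide a repeated call.
        key = json.dumps([tool_name, arguments], sort_keys=True)
        self.seen[key] += 1
        return self.steps <= self.max_steps and self.seen[key] <= self.max_repeats
```

In use, the loop would call something like `guard.allow("cronjob", {"every": "20m"})` before each tool execution and abort, or hand control back to the user, once it returns `False`. This limits the damage from a loop but does not address why the model loops in the first place.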

anitman

Apr 28

What about the FP16 version?

I'm afraid it's not related to quantization. If it were, you couldn't explain why a Q8_0 quant has far more tool-calling errors than a Q4_K_M quant; that doesn't make any sense.
