Instructions for using Qwen/Qwen3.6-27B with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Qwen/Qwen3.6-27B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3.6-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.6-27B")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3.6-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- AMD Developer Cloud
- Local Apps
- vLLM
How to use Qwen/Qwen3.6-27B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Qwen/Qwen3.6-27B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
Use Docker
docker model run hf.co/Qwen/Qwen3.6-27B
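For calling the server from Python instead of curl, here is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on its default port 8000 (pass `base_url` to target the SGLang server on port 30000 instead) and sends the same OpenAI-style payload as the curl example. The helper names `build_payload` and `chat` are hypothetical, not part of vLLM or SGLang.

```python
import json
import urllib.request

# Assumption: the server from `vllm serve "Qwen/Qwen3.6-27B"` runs locally on the default port.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "Qwen/Qwen3.6-27B"


def build_payload(prompt: str, image_url: str) -> dict:
    """Build the same OpenAI-style chat payload that the curl command sends."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(prompt: str, image_url: str, base_url: str = BASE_URL) -> str:
    """POST the payload and return the assistant's reply text."""
    data = json.dumps(build_payload(prompt, image_url)).encode("utf-8")
    req = urllib.request.Request(
        base_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the text in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("Describe this image in one sentence.", "<image URL>")` returns the model's one-sentence description as a string.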
- SGLang
How to use Qwen/Qwen3.6-27B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Qwen/Qwen3.6-27B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen3.6-27B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
- Docker Model Runner
How to use Qwen/Qwen3.6-27B with Docker Model Runner:
docker model run hf.co/Qwen/Qwen3.6-27B
Qwen3.6 seems to fake records too often
The model's hallucination rate is extremely high.
The Test Case: Hermes Agent Logic
I conducted a comparison using a Q8_0 quantization for Qwen and a UD_Q4_K_XL for MiniMax. The task was simple:
Setup: In a Hermes Agent session, the model is instructed to read a directory based on a JSON record.
Action: Depending on its "mood," it must select and send a specific GIF from that directory according to the JSON record.
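The thread doesn't show the record format, so the following is only a sketch of the lookup this task asks for. It assumes a hypothetical JSON file that maps each mood to a GIF filename stored in the same directory, and the helper name `resolve_gif` is invented for illustration. It fails loudly when the cited record or GIF does not exist (the failure mode reported below for Qwen 3.6) instead of handing a made-up path to the agent backend.

```python
import json
from pathlib import Path


def resolve_gif(record_path: Path, mood: str) -> Path:
    """Return the GIF that the record assigns to `mood`, rejecting anything invented.

    Hypothetical record format: a JSON object mapping mood -> GIF filename,
    with the GIFs stored next to the record file.
    """
    if not record_path.is_file():
        # The model cited a record that does not exist on disk.
        raise FileNotFoundError(f"record not found: {record_path}")
    record = json.loads(record_path.read_text(encoding="utf-8"))
    if mood not in record:
        raise KeyError(f"no GIF listed for mood {mood!r}")
    gif = record_path.parent / record[mood]
    if not gif.is_file():
        # The record exists but points at a GIF that is not in the directory.
        raise FileNotFoundError(f"GIF listed in record is missing: {gif}")
    return gif
```

Checking the model's chosen record and GIF this way before sending also gives a simple pass/fail signal per run, which is one way to count how often a model invents records.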
The Results
MiniMax-M2.7 (MoE): Despite being an MoE model with only about 10B active parameters and running on lower quantization (Q4), it never failed this task.
Qwen 3.6 27B: Even at the highest precision I tested (Q8_0), it failed half of the time. In the failing runs, it "hallucinated" a JSON record that didn't exist and then attempted to send a non-existent GIF from that imaginary file, resulting in backend errors in the Hermes Agent.
Same instructions in the memory record, very different results.
Really annoying.
It seems that the model was never trained with a specific constitution as a guideline, even though Anthropic has published the Constitutional AI training method.
Try an officially vetted model from Alibaba. The quants need to be made correctly, or they could interfere with the new architecture in these models. If you don't have the hardware to support that, you can try OpenRouter.
What about the FP16 version?
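To give a rough sense of the hardware an FP16 run implies compared with Q8_0, here is a weight-only memory estimate. It is a sketch under stated assumptions: the parameter count is taken nominally from the model name (27B), and it ignores the KV cache, activations, and any tensors that quantized files keep at higher precision. The Q8_0 figure follows from the GGUF block layout: 32 int8 weights plus one FP16 scale per block.

```python
# Weight-only memory estimate; ignores KV cache, activations, and any
# tensors that quantized files keep at higher precision.
PARAMS = 27e9  # nominal parameter count from the model name (approximate)

# Bits per weight:
#   FP16/BF16: 16 bits.
#   Q8_0 (GGUF): blocks of 32 int8 weights plus one FP16 scale,
#   i.e. (32 * 8 + 16) / 32 = 8.5 bits per weight.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,
}


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: {bpw:.1f} bits/weight -> ~{weight_gib(PARAMS, bpw):.1f} GiB")
```

By this estimate FP16 needs roughly twice the memory of Q8_0 for the weights alone, before any context is allocated.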
> Try an officially vetted model from Alibaba. The quants need to be made correctly, or they could interfere with the new architecture in these models. If you don't have the hardware to support that, you can try OpenRouter.
That's not the case, actually. I've tried the official implementation via vLLM in WSL, as well as Unsloth quants from Q8_0 to UD_Q4_XL. Infinite tool-calling loops and fake tool calls happened all the time during agentic use. None of this happened with MiniMax M2.7. It fails to create a simple cron job in Hermes Agent that checks ticket availability every 20 minutes: it tended to add the schedule to the exec call and failed every time. This is a simple task, and I solved the problem instantly by switching to MiniMax.
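The thread doesn't show how Hermes Agent expects a scheduled job to be specified, so this sketch uses standard five-field crontab syntax as an assumption. In that syntax "every 20 minutes" is `*/20 * * * *`, and the schedule and the command are separate parts of the entry; the failure described above amounts to the schedule ending up inside the command. The helper names and the script path in the usage example are hypothetical.

```python
def split_cron_line(line: str) -> tuple[str, str]:
    """Split a standard five-field crontab line into (schedule, command)."""
    fields = line.split(None, 5)  # five schedule fields, then the rest is the command
    if len(fields) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    return " ".join(fields[:5]), fields[5]


def minutes_for(field: str) -> list[int]:
    """Expand a cron minute field of the form '*', '*/N', or a single number."""
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        return list(range(0, 60, int(field[2:])))
    return [int(field)]
```

For example, `split_cron_line("*/20 * * * * /opt/check_tickets.sh")` returns `("*/20 * * * *", "/opt/check_tickets.sh")`, and `minutes_for("*/20")` returns `[0, 20, 40]`, i.e. three runs per hour.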
> What about the FP16 version?
I'm afraid it's not related to quantization: you can't explain why a Q8_0 quant would have far more tool-calling errors than a Q4_K_M quant. It doesn't make any sense.
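The argument rests on the fact that, at the level of weight rounding, an 8-bit quant is much finer-grained than a 4-bit one. The sketch below illustrates that gap on synthetic Gaussian weights. The 8-bit path follows the idea behind GGUF Q8_0 (one scale per block of 32, scale = max|x| / 127, round to nearest); the 4-bit path is a plain symmetric scheme used only for comparison, NOT the k-quant layout that Q4_K_M actually uses. It says nothing directly about tool-calling behavior.

```python
import random


def absmax_roundtrip(block: list[float], bits: int) -> list[float]:
    """Symmetric absmax quantize-then-dequantize one block of weights."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    amax = max(abs(x) for x in block)
    scale = amax / qmax if amax else 1.0
    # Round each weight to the nearest integer step, then scale back.
    return [round(x / scale) * scale for x in block]


def mean_abs_error(weights: list[float], bits: int, block_size: int = 32) -> float:
    """Average absolute reconstruction error over all blocks."""
    err = 0.0
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        restored = absmax_roundtrip(block, bits)
        err += sum(abs(a - b) for a, b in zip(block, restored))
    return err / len(weights)


rng = random.Random(0)  # fixed seed so runs are reproducible
weights = [rng.gauss(0.0, 0.02) for _ in range(32 * 256)]
print(f"8-bit mean abs error: {mean_abs_error(weights, 8):.2e}")
print(f"4-bit mean abs error: {mean_abs_error(weights, 4):.2e}")
```

The 8-bit step is about 18 times smaller than the 4-bit step (127 vs. 7 levels per sign), so its rounding error is correspondingly smaller; if quantization noise alone drove the failures, Q8_0 would be expected to do better, not worse.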