Libraries

How to use typhoon-ai/typhoon-ocr1.5-3b-qat with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="typhoon-ai/typhoon-ocr1.5-3b-qat")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("typhoon-ai/typhoon-ocr1.5-3b-qat")
model = AutoModelForImageTextToText.from_pretrained("typhoon-ai/typhoon-ocr1.5-3b-qat")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use typhoon-ai/typhoon-ocr1.5-3b-qat with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "typhoon-ai/typhoon-ocr1.5-3b-qat"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "typhoon-ai/typhoon-ocr1.5-3b-qat",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
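
The curl call above sends a remote image with a generic captioning prompt. For OCR on a local page, the same OpenAI-compatible endpoint can be called from Python with the image embedded as a base64 data URL. The following is a minimal sketch, not an official client: `build_ocr_request` and `run_ocr` are hypothetical helper names, the `max_tokens` value is an assumption, the server address is taken from the vLLM command above, and `OCR_PROMPT` is a shortened placeholder that should be replaced with the full prompt from the Prompting section.

```python
import base64
import json
import urllib.request

MODEL_ID = "typhoon-ai/typhoon-ocr1.5-3b-qat"
# Placeholder only: substitute the full prompt from the Prompting section.
OCR_PROMPT = "Extract all text from the image."


def build_ocr_request(image_bytes, mime_type="image/png", prompt=OCR_PROMPT, max_tokens=4096):
    """Build an OpenAI-style chat completion payload with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,  # assumed budget; tune for long pages
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def run_ocr(image_path, base_url="http://localhost:8000/v1"):
    """Send one local image to the running server and return the model's text."""
    with open(image_path, "rb") as f:
        payload = build_ocr_request(f.read())
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),  # a request body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(run_ocr("page.png"))` would print the extracted Markdown for a local file named `page.png` (a hypothetical path).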

Use Docker

docker model run hf.co/typhoon-ai/typhoon-ocr1.5-3b-qat

SGLang

How to use typhoon-ai/typhoon-ocr1.5-3b-qat with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "typhoon-ai/typhoon-ocr1.5-3b-qat" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "typhoon-ai/typhoon-ocr1.5-3b-qat",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "typhoon-ai/typhoon-ocr1.5-3b-qat" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "typhoon-ai/typhoon-ocr1.5-3b-qat",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use typhoon-ai/typhoon-ocr1.5-3b-qat with Docker Model Runner:
```
docker model run hf.co/typhoon-ai/typhoon-ocr1.5-3b-qat
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Typhoon-OCR-1.5-3B-QAT

A quantization-aware trained (QAT) version of Typhoon OCR v1.5, designed for robust and efficient on-device vision-language OCR in English and Thai.
This release maintains strong accuracy while significantly improving performance when running under low-bit quantization (e.g., 4-bit), making it ideal for lightweight environments.

This model is released in bfloat16 and is intended to be used as the pre-quantization base before converting to low-bit formats.
For the 4-bit model, please use the Ollama build here:
https://ollama.com/scb10x/typhoon-ocr1.5-3b

QAT is applied on top of Qwen2.5-VL-3B, enabling improved stability and reduced degradation when deployed below 16-bit precision.
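
This card does not describe the training recipe, so the following is only a generic illustration of the idea behind QAT, not the method used for this model. During QAT the forward pass typically "fake-quantizes" weights: they are rounded to a low-bit integer grid and mapped back to floats, so the network learns to tolerate the rounding error it will see after real quantization. The sketch assumes symmetric per-tensor 4-bit quantization; `fake_quantize` is a hypothetical name, and the gradient handling used in training (commonly a straight-through estimator) is not shown.

```python
def fake_quantize(weights, num_bits=4):
    """Round weights to a symmetric signed integer grid, then map back to floats."""
    qmin = -(2 ** (num_bits - 1))      # -8 for 4-bit
    qmax = 2 ** (num_bits - 1) - 1     # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)           # nothing to scale
    scale = max_abs / qmax             # float value of one integer step
    quantized = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


weights = [0.70, -0.33, 0.12, 0.02]
simulated = fake_quantize(weights)
errors = [abs(w - s) for w, s in zip(weights, simulated)]
print("simulated 4-bit weights:", simulated)
print("largest rounding error:", max(errors))
```

The rounding error per weight is bounded by half a quantization step, which is the perturbation a QAT-trained model has already been exposed to.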

4-bit Ollama version: https://ollama.com/scb10x/typhoon-ocr1.5-3b
Base FP16 model: https://huggingface.co/scb10x/typhoon-ocr1.5-2b

Highlights

Quantization-Aware Training (QAT): Maintains strong OCR accuracy even under aggressive quantization.
Optimized for On-Device Inference: Faster and more consistent performance on low-resource hardware.
Enhanced Handwriting & Form Parsing: Retains the v1.5 improvements in handling handwritten notes, forms, irregular layouts, and structured documents.
Supports Text-Rich & Image-Rich Documents: Effective on tables, diagrams, annotated pages, charts, receipts, and dense reports.
Thai + English Multilingual OCR: Trained for reliable extraction across bilingual real-world documents.

Intended Use

This is a task-specific OCR model and is intended to be used only with the prompt format given under Prompting.
It does not provide general visual question answering (VQA) capability or safety guardrails.
Hallucinations can still occur, so validate outputs before relying on them in production.
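
One cheap validation step is a structural check on the returned Markdown. The sketch below counts opening and closing tags for the paired elements named in the prompt (`<table>`, `<figure>`, `<page_number>`) and reports any mismatch, which often indicates truncated or malformed output. `find_unbalanced_tags` is a hypothetical helper; passing this check does not show that the extracted text is correct.

```python
import re

# Paired tags that the prompt asks the model to emit.
PAIRED_TAGS = ("table", "figure", "page_number")


def find_unbalanced_tags(markdown):
    """Return {tag: (open_count, close_count)} for tags whose counts differ."""
    problems = {}
    for tag in PAIRED_TAGS:
        opens = len(re.findall(rf"<{tag}\b[^>]*>", markdown))
        closes = len(re.findall(rf"</{tag}>", markdown))
        if opens != closes:
            problems[tag] = (opens, closes)
    return problems


truncated = "<table><tr><td>Total</td></tr>"
print(find_unbalanced_tags(truncated))
```

An empty dictionary means the tag counts match; anything else is a signal to re-run the page or send it for manual review.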

Quick Links

Demo: https://ocr.opentyphoon.ai
Code / Examples: https://github.com/scb-10x/typhoon-ocr
Release Blog: https://opentyphoon.ai/blog/en/typhoon-ocr-release

Prompting

prompt = """Extract all text from the image.

Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.

Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap any clearly defined visual areas (e.g. charts, diagrams, pictures) in:

<figure>
Describe the image's main elements (people, objects, text), note any contextual clues (place, event, culture), mention visible text and its meaning, provide deeper analysis when relevant (especially for financial charts, graphs, or documents), comment on style or architecture if relevant, then give a concise overall summary. Describe in Thai.
</figure>

- Page Numbers: Wrap page numbers in <page_number>...</page_number> (e.g., <page_number>14</page_number>).
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes."""
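
Because the prompt fixes the output markup, downstream code can pull structured pieces out of the result with simple pattern matching. The helpers below are a minimal sketch with hypothetical names; they assume the model follows the tag conventions in the prompt exactly and do not handle nested or malformed tags.

```python
import re


def extract_page_numbers(markdown):
    """Return the contents of every <page_number>...</page_number> element."""
    return re.findall(r"<page_number>\s*(.*?)\s*</page_number>", markdown, flags=re.DOTALL)


def extract_figures(markdown):
    """Return the description text inside every <figure>...</figure> element."""
    matches = re.findall(r"<figure>(.*?)</figure>", markdown, flags=re.DOTALL)
    return [m.strip() for m in matches]


def strip_figures(markdown):
    """Remove figure descriptions, leaving only the transcribed text."""
    return re.sub(r"<figure>.*?</figure>\s*", "", markdown, flags=re.DOTALL)


sample = "# Report\n<figure>\nbar chart of revenue\n</figure>\nTotal: 100\n<page_number>14</page_number>"
print(extract_page_numbers(sample))
print(extract_figures(sample))
print(strip_figures(sample))
```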

Quickstart (Ollama)

ollama run scb10x/typhoon-ocr1.5-3b
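
Once the model has been pulled, it can also be called programmatically through Ollama's local REST API. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and uses the `/api/generate` endpoint with a base64-encoded image; `build_ollama_request` and `ocr_with_ollama` are hypothetical helper names, and the prompt argument should be the full prompt from the Prompting section.

```python
import base64
import json
import urllib.request

OLLAMA_MODEL = "scb10x/typhoon-ocr1.5-3b"


def build_ollama_request(image_bytes, prompt):
    """Build a non-streaming /api/generate payload with one attached image."""
    return {
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return a single JSON object instead of chunks
    }


def ocr_with_ollama(image_path, prompt, host="http://localhost:11434"):
    """Send one local image to Ollama and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_ollama_request(f.read(), prompt)
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["response"]
```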

Support & Community

Twitter: https://twitter.com/opentyphoon
Discord: https://discord.gg/us5gAYmrxw

Citation

If you use Typhoon OCR or Typhoon models, please cite:

@misc{typhoon2,
  title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
  author={Kunat Pipatanakul et al.},
  year={2024},
  eprint={2412.13702},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{nonesung2025thaiocrbench,
  title={ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai},
  author={Surapon Nonesung et al.},
  year={2025},
  eprint={2511.04479},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Safetensors

Model size: 4B params
Tensor type: BF16
