Instructions to use typhoon-ai/typhoon-ocr1.5-3b-qat with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use typhoon-ai/typhoon-ocr1.5-3b-qat with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="typhoon-ai/typhoon-ocr1.5-3b-qat")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("typhoon-ai/typhoon-ocr1.5-3b-qat")
model = AutoModelForImageTextToText.from_pretrained("typhoon-ai/typhoon-ocr1.5-3b-qat")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use typhoon-ai/typhoon-ocr1.5-3b-qat with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "typhoon-ai/typhoon-ocr1.5-3b-qat"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "typhoon-ai/typhoon-ocr1.5-3b-qat",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/typhoon-ai/typhoon-ocr1.5-3b-qat
- SGLang
How to use typhoon-ai/typhoon-ocr1.5-3b-qat with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "typhoon-ai/typhoon-ocr1.5-3b-qat" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "typhoon-ai/typhoon-ocr1.5-3b-qat",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "typhoon-ai/typhoon-ocr1.5-3b-qat" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "typhoon-ai/typhoon-ocr1.5-3b-qat",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
- Docker Model Runner
How to use typhoon-ai/typhoon-ocr1.5-3b-qat with Docker Model Runner:
docker model run hf.co/typhoon-ai/typhoon-ocr1.5-3b-qat
Typhoon-OCR-1.5-3B-QAT
A quantization-aware trained (QAT) version of Typhoon OCR v1.5, designed for robust and efficient on-device vision-language OCR in English and Thai.
This release maintains strong accuracy while significantly improving performance when running under low-bit quantization (e.g., 4-bit), making it ideal for lightweight environments.
This model is released in bfloat16 and is intended to be used as the pre-quantization base before converting to low-bit formats.
For the 4-bit model, please use the Ollama build here:
https://ollama.com/scb10x/typhoon-ocr1.5-3b
QAT is applied on top of Qwen2.5-VL-3B, enabling improved stability and reduced degradation when deployed below 16-bit precision.
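To make the idea of "degradation below 16-bit precision" concrete, here is a minimal sketch of the quantize-then-dequantize ("fake quantization") step that QAT schemes commonly simulate during training. This card does not describe the actual training recipe, so the symmetric per-tensor scheme, the function name `fake_quantize`, and the example weights are all assumptions for illustration only.

```python
def fake_quantize(weights, bits=4):
    """Round weights to a signed `bits`-bit grid, then map them back to floats.

    Hypothetical helper: symmetric, per-tensor scaling is an assumption,
    not the scheme used to train this model.
    """
    qmax = 2 ** (bits - 1) - 1   # largest integer code, 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # smallest integer code, -8 for 4-bit
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Quantize: divide by the scale, round to the nearest integer, clamp to range
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    # Dequantize: these are the values the network effectively computes with
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


# Illustrative weights only
weights = [1.4, -0.62, 0.26, 0.0]
codes, approx, scale = fake_quantize(weights, bits=4)
for w, c, a in zip(weights, codes, approx):
    print(f"weight={w:+.2f} code={c:+d} dequantized={a:+.2f} error={abs(w - a):.2f}")
```

The rounding error per weight is at most half the scale. Training with this rounding in the forward pass lets the weights adapt to it, which is the general motivation behind QAT.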
4-bit Ollama version: https://ollama.com/scb10x/typhoon-ocr1.5-3b
Base FP16 model: https://huggingface.co/scb10x/typhoon-ocr1.5-2b
Highlights
- Quantization-Aware Training (QAT): Maintains strong OCR accuracy even under aggressive quantization.
- Optimized for On-Device Inference: Faster and more consistent performance on low-resource hardware.
- Enhanced Handwriting & Form Parsing: Retains the v1.5 improvements in handling handwritten notes, forms, irregular layouts, and structured documents.
- Supports Text-Rich & Image-Rich Documents: Effective on tables, diagrams, annotated pages, charts, receipts, and dense reports.
- Thai + English Multilingual OCR: Trained for reliable extraction across bilingual real-world documents.
Intended Use
This is a task-specific OCR model and is intended to be used only with the provided prompt format.
It is not trained for general visual question answering (VQA) and does not include safety guardrails.
Some hallucination may still occur, so users should validate outputs before relying on them in production.
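As one possible starting point for output validation, the sketch below runs a few cheap structural checks on a model response: balanced `<table>`, `<figure>`, and `<page_number>` tags (the tags named in this card's prompt format), balanced `$$` delimiters, and non-empty output. The function name `find_structural_issues` is hypothetical, and these checks only catch malformed structure, not incorrect text.

```python
import re

# Tags that the prompt format in this card asks the model to emit
TAGS = ("table", "figure", "page_number")


def find_structural_issues(text):
    """Return a list of human-readable structural problems (empty if none found)."""
    issues = []
    if not text.strip():
        issues.append("empty output")
    for tag in TAGS:
        # Count opening tags (with or without attributes) and closing tags
        opens = len(re.findall(rf"<{tag}\b[^>]*>", text))
        closes = text.count(f"</{tag}>")
        if opens != closes:
            issues.append(f"<{tag}>: {opens} opening vs {closes} closing tags")
    # Block equations use paired $$ delimiters, so the count should be even
    if text.count("$$") % 2 != 0:
        issues.append("unbalanced $$ block-math delimiters")
    return issues


print(find_structural_issues("<table><tr><td>1</td></tr></table>"))
print(find_structural_issues("<figure>chart with no closing tag"))
```

A response that passes these checks can still contain hallucinated text, so a review step against the source page remains necessary for production use.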
Quick Links
- Demo: https://ocr.opentyphoon.ai
- Code / Examples: https://github.com/scb-10x/typhoon-ocr
- Release Blog: https://opentyphoon.ai/blog/en/typhoon-ocr-release
Prompting
prompt = """Extract all text from the image.
Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.
Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap any clearly defined visual areas (e.g. charts, diagrams, pictures) in:
<figure>
Describe the image's main elements (people, objects, text), note any contextual clues (place, event, culture), mention visible text and its meaning, provide deeper analysis when relevant (especially for financial charts, graphs, or documents), comment on style or architecture if relevant, then give a concise overall summary. Describe in Thai.
</figure>
- Page Numbers: Wrap page numbers in <page_number>...</page_number> (e.g., <page_number>14</page_number>).
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes."""
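Because the prompt above asks for tagged output, downstream code can pull the tagged parts out with simple pattern matching. The following is a minimal sketch, assuming the model follows the formatting rules exactly and emits `<table>` without attributes; the function name `split_ocr_output` and the sample string are hypothetical.

```python
import re

# Patterns for the tags requested in the formatting rules
PAGE_RE = re.compile(r"<page_number>(.*?)</page_number>", re.DOTALL)
FIGURE_RE = re.compile(r"<figure>(.*?)</figure>", re.DOTALL)
TABLE_RE = re.compile(r"<table>.*?</table>", re.DOTALL)


def split_ocr_output(markdown):
    """Collect page numbers, figure descriptions, tables, and checkbox counts."""
    return {
        "page_numbers": [p.strip() for p in PAGE_RE.findall(markdown)],
        "figures": [f.strip() for f in FIGURE_RE.findall(markdown)],
        "tables": TABLE_RE.findall(markdown),
        "checked": markdown.count("☑"),
        "unchecked": markdown.count("☐"),
    }


# Made-up sample in the shape the prompt requests
sample = (
    "# Form\n"
    "☑ Yes ☐ No\n"
    "<table><tr><td>1</td></tr></table>\n"
    "<figure>\nbar chart\n</figure>\n"
    "<page_number>14</page_number>"
)
print(split_ocr_output(sample))
```

Regular expressions are enough for this flat tag layout; nested tables or tags with attributes would call for a real HTML parser instead.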
Quickstart (Ollama)
ollama run scb10x/typhoon-ocr1.5-3b
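Once the model has been pulled, it can also be called programmatically. The sketch below uses only the Python standard library and assumes an Ollama server on its default local address (`http://localhost:11434`) and its `/api/generate` endpoint, which accepts base64-encoded images. The helper names `build_payload` and `run_ocr` are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "scb10x/typhoon-ocr1.5-3b"


def build_payload(image_bytes, prompt, model=MODEL):
    """Build the JSON body for a single non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def run_ocr(image_path, prompt, url=OLLAMA_URL):
    """Send one page image to the local Ollama server and return the model text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

A call would look like `run_ocr("page.png", prompt)`, where `prompt` is the string from the Prompting section and `page.png` is a placeholder path to a page image.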
Support & Community
- Twitter: https://twitter.com/opentyphoon
- Discord: https://discord.gg/us5gAYmrxw
Citation
If you use Typhoon OCR or Typhoon models, please cite:
@misc{typhoon2,
title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
author={Kunat Pipatanakul et al.},
year={2024},
eprint={2412.13702},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{nonesung2025thaiocrbench,
title={ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai},
author={Surapon Nonesung et al.},
year={2025},
eprint={2511.04479},
archivePrefix={arXiv},
primaryClass={cs.CL}
}