Instructions to use google/gemma-3-4b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-3-4b-it with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-4b-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/gemma-3-4b-it")
model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-3-4b-it with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-3-4b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/google/gemma-3-4b-it
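The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the previous step is listening on `localhost:8000`. The helper names `build_payload` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "google/gemma-3-4b-it"


def build_payload(prompt, image_url, model=MODEL_ID):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload, base_url="http://localhost:8000"):
    """POST the payload to the chat completions endpoint and return the parsed JSON.

    base_url is an assumption: it must match the host and port the server was started on.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `chat(build_payload("Describe this image in one sentence.", image_url))` should return the response as a dictionary. Passing a different `base_url` points the same helper at another OpenAI-compatible server.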
- SGLang
How to use google/gemma-3-4b-it with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-3-4b-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-3-4b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
- Docker Model Runner
How to use google/gemma-3-4b-it with Docker Model Runner:
docker model run hf.co/google/gemma-3-4b-it
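The vLLM and SGLang servers above are both called through an OpenAI-compatible API, so their replies should follow the chat completions shape, with the generated text under `choices[0].message.content`. Below is a small sketch of pulling that text out of a parsed response. The `extract_reply` helper and the sample body are made up for illustration; a real response carries more fields.

```python
import json


def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hypothetical response body, trimmed to the fields used here.
sample = json.loads(
    '{"choices": [{"index": 0, "message": {"role": "assistant", "content": "A statue."}}]}'
)
print(extract_reply(sample))
```

Raising on an empty `choices` list keeps a malformed or error response from surfacing later as an unrelated `IndexError`.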
- Update README.md (#89, opened 23 days ago by Mohamed0mer)
- I built a unified wrapper for llmcompressor, llama.cpp & coremltools. Looking for LLM users to help me break it! (#88, opened 2 months ago by kinderasteroid; 1 comment)
- Finetuning Code Link In Native PyTorch (#87, opened 2 months ago by dutta18; 1 comment)
- anis (#86, opened 2 months ago by Aniskhan29)
- anis (#85, opened 2 months ago by Aniskhan29)
- 📋 Documentation Enhancement Suggestion (#84, opened 3 months ago by CroviaTrust)
- 📋 Documentation Enhancement Suggestion (#83, opened 3 months ago by CroviaTrust)
- 📋 Documentation Enhancement Suggestion (#82, opened 3 months ago by CroviaTrust; 2 comments)
- Request: DOI (#80, opened 5 months ago by Salehsst; 1 comment)
- Add Artificial Analysis evaluations for gemma-3-4b (#79, opened 5 months ago by davidlms)
- Add model-index with comprehensive benchmark evaluations (#78, opened 5 months ago by davidlms)
- Token Count Calculation in SFT Data Distribution Curation (#76, opened 6 months ago by tcy006; 2 comments)
- LoRA/PEFT can’t target nn.Parameter vision→text projection in Gemma-3 VLM — why use nn.Parameter instead of nn.Linear(bias=False)? (#74, opened 7 months ago by alexanderyj; 1 comment)
- 4bit Quantization Failure for gemma 4b (#73, opened 7 months ago by shabha7092; 5 comments)
- Gemma full (#72, opened 8 months ago by RamarajuK)
- Inference Issue with Multi GPU's (#70, opened 9 months ago by m-nameer; 1 comment)
- <pad> <pad> <pad> fix (#69, opened 9 months ago by test53100)
- processor does not have a chat template (#68, opened 9 months ago by Prabhjot410; 1 comment)
- onnx model (#67, opened 9 months ago by Gerald001; 1 comment)
- Parameters are contradictory use_cache=False (#66, opened 9 months ago by lemon0703; 1 comment)
- Re: failing structured output generation (#63, opened 9 months ago by msi-sbraun-11; 2 comments)
- Cannot recreate benchmark results (#61, opened 10 months ago by alternis; reactions: ➕ 1; 3 comments)
- The example code provided (without modifications) produces gibberish (#60, opened 10 months ago by zqQH; 3 comments)
- Update README.md (#57, opened 11 months ago by dj1507)
- Missing values in 4B, 12B config.json (#56, opened 11 months ago by depasquale; 2 comments)
- Update preprocessor_config.json (#54, opened 11 months ago by nm-research)
- use model run failed (#53, opened 11 months ago by pb-1; 1 comment)
- Bug report: torch._dynamo.exc.Unsupported: Unexpected type in sourceless builder transformers.models.gemma3.configuration_gemma3.Gemma3TextConfig (#51, opened 12 months ago by YovelRVeritex; 8 comments)
- Update config.json (#50, opened 12 months ago by olegshulyakov)
- Update README.md (#49, opened about 1 year ago by VANNVISAL)
- Error in loading Model (#46, opened about 1 year ago by sans07; 2 comments)
- gemma3-1b-it (#45, opened about 1 year ago by TAENNY)
- Explicitly include num_attention_heads and num_key_value_heads in text_config for Gemma3 4B (#44, opened about 1 year ago by ebsmothers; 1 comment)
- Deployment issues on sagemaker (#43, opened about 1 year ago by tychen6677; 2 comments)
- Please open-source Gemini 1.5 Flash (#42, opened about 1 year ago by drguolai; reactions: 👍 3; 4 comments)
- CUDA error: device-side assert triggered (#41, opened about 1 year ago by ArshiaSoori; 2 comments)
- Low GPU Utilization during inference? (#39, opened about 1 year ago by BagelBig; 1 comment)
- Object detection capabilities ? (#38, opened about 1 year ago by syrineM; 2 comments)
- SigLIP or SigLIP2 encoder? (#37, opened about 1 year ago by orrzohar; 6 comments)
- text-generation (#36, opened about 1 year ago by rakmik; 1 comment)
- I'm releasing the speech version of Gemma-3! (#35, opened about 1 year ago by junnei; reactions: ❤️👍 6; 8 comments)
- Update README.md (#34, opened about 1 year ago by sachin7yadava)
- Issue with vLLM Deployment of gemma-3-4b-it on Tesla T4 - No Output (#33, opened about 1 year ago by twodaix; 4 comments)
- Batch processing on a GPU? (#32, opened about 1 year ago by buckeye17-bah; 4 comments)
- Please release also Gemini Flash 1.5 weights. (#31, opened about 1 year ago by ZeroWw; 2 comments)
- VRAM not freed during long generations (Gemma, max_new_tokens=3000) (#29, opened about 1 year ago by Nessit; 4 comments)