Instructions to use google/gemma-3-270m-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use google/gemma-3-270m-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-270m-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use google/gemma-3-270m-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-3-270m-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-270m-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
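The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(prompt, model="google/gemma-3-270m-it",
                       base_url="http://localhost:8000"):
    """Build (but do not send) the POST request that the curl example issues."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request to a running server and return the reply text.

    Assumes an OpenAI-compatible response body.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` would send the request. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.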

Use Docker

docker model run hf.co/google/gemma-3-270m-it

SGLang

How to use google/gemma-3-270m-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-3-270m-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-270m-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-3-270m-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-270m-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use google/gemma-3-270m-it with Docker Model Runner:
```
docker model run hf.co/google/gemma-3-270m-it
```

Trouble Reproducing gemma-3-270m-it IFEval Score

by fongya - opened Aug 16, 2025

Discussion

fongya

Aug 16, 2025

I'm trying to verify my setup by reproducing the IFEval benchmark score for gemma-3-270m-it. The official score is 51.2%, but across multiple runs my accuracy is only 20-27%.

I am using the following settings:

temperature=1.0
top_p=0.95
top_k=64
min_p=0.0

Am I missing something? I suspect there's a misconfiguration somewhere in my setup.
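For readers unfamiliar with these four knobs, here is a rough pure-Python sketch of what each one does to a next-token distribution. It is an illustration only: the function name `filter_distribution` is hypothetical, it assumes `temperature > 0`, and real backends (Transformers, vLLM, llama.cpp) may apply the filters in a different order or renormalize between steps.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return {token_index: probability} after temperature, top-k, top-p and min-p.

    Simplified illustration; assumes temperature > 0.
    """
    # Temperature rescales logits before softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for numerical stability
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # Rank tokens from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)

    # top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept

    # min-p: drop tokens whose probability is below min_p times the top probability.
    if min_p > 0.0:
        threshold = min_p * probs[ranked[0]]
        ranked = [i for i in ranked if probs[i] >= threshold]

    # Renormalize the surviving candidates so they sum to 1.
    mass = sum(probs[i] for i in ranked)
    return {i: probs[i] / mass for i in ranked}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up values for illustration
sampled = filter_distribution(toy_logits, temperature=1.0, top_k=64, top_p=0.95, min_p=0.0)
```

On these toy logits the settings above leave three candidate tokens to sample from, so repeated runs can produce different completions and therefore different scores.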

beyoung

Aug 16, 2025 (edited Aug 16, 2025)

+1. I cannot reproduce the IFEval score either; my evaluation results are around 26%.

yousef1727

Aug 16, 2025 (edited Aug 16, 2025)

I'm using temperature=0.2 and getting better results.

fongya

Aug 16, 2025

With these settings:

temperature=0.2
top_p=0.95
top_k=64
min_p=0.0

I got 27.9% on IFEval. That is a slight improvement, but still well below the reported 51.2%.

yousef1727

Aug 17, 2025 (edited Aug 17, 2025)

I honestly don’t know. Maybe try 0.0 or 0.1 😅. Good luck.

Try this?

temperature = 0.1 // less random token picking
top_p = 0.95
top_k = 64
min_p = 0.25 // drop tokens below 25% of the top token's probability (was 0.0)

I’m really curious about the IFEval score.

By the way, I’m using llama.cpp. I forgot to mention that last time.

codelion

Aug 18, 2025 (edited Aug 18, 2025)

I am also having trouble replicating the reported results. I am using the standard lm_eval harness and get the results below; the biggest gap is in IFEval (measured with the inst_level_loose_acc metric).

Gemma 3 270M IT - Actual Results vs Google's Reported Baseline

| Benchmark | n-shot | Actual Results | Google Reported | Delta | Match Status |
|---|---|---|---|---|---|
| HellaSwag | 0-shot | 33.5% | 37.7% | -4.2% | ❌ Lower |
| PIQA | 0-shot | 65.6% | 66.2% | -0.6% | ✅ Close |
| ARC-c | 0-shot | 24.5% | 28.2% | -3.7% | ❌ Lower |
| WinoGrande | 0-shot | 53.2% | 52.3% | +0.9% | ✅ Close |
| BIG-Bench Hard | 3-shot | 26.8% | 26.7% | +0.1% | ✅ Match |
| IFEval (inst_level) | 0-shot | 37.7% | 51.2% | -13.5% | ⚠️ Gap |
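One thing worth checking when comparing IFEval numbers is which aggregate is being quoted. IFEval is scored at both prompt level (a prompt counts only if every instruction in it is followed) and instruction level (each instruction counts separately), and lm_eval reports strict and loose variants of each. The thread does not say which variant the reported 51.2% refers to. The sketch below, with a hypothetical helper name and made-up pass/fail data, shows how far the two aggregates can diverge on the same outputs.

```python
def ifeval_accuracies(results):
    """Aggregate per-instruction pass/fail flags into two accuracy figures.

    results: one list of booleans per prompt, one boolean per instruction.
    """
    # Prompt level: a prompt passes only if all of its instructions pass.
    prompt_hits = sum(all(flags) for flags in results)
    # Instruction level: every instruction is counted on its own.
    inst_hits = sum(sum(flags) for flags in results)
    inst_total = sum(len(flags) for flags in results)
    return {
        "prompt_level_acc": prompt_hits / len(results),
        "inst_level_acc": inst_hits / inst_total,
    }


# Hypothetical outcomes for four prompts (not real model output).
toy_results = [[True], [True, False], [False, False, True], [True, True]]
scores = ifeval_accuracies(toy_results)
```

On this toy data the prompt-level accuracy is 0.5 while the instruction-level accuracy is 0.625, so two people evaluating identical generations can quote noticeably different scores.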

wy11ing

Aug 27, 2025

I am facing this trouble too! I guess I didn't use the same command as the official evaluation.
This is the command I used:

lm_eval --model hf --model_args pretrained=google/gemma-3-270m-it --tasks leaderboard_ifeval --device cuda:0 --use_cache ./eval_cache/google_gemma-3-270m-it --apply_chat_template --fewshot_as_multiturn --batch_size auto --log_samples --output_path ./eval_out/ --trust_remote_code

In fact, I have trouble reproducing scores not only for this model but also for others, such as gpt-oss-20B. I checked the GitHub community discussions about IFEval, and some people said batch_size may also affect the result.
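For easier comparison between setups, the command above can be written as an argument list with one flag per line. This is only a sketch: the helper names are hypothetical, and the `--gen_kwargs` values in the usage note are illustrative rather than the settings used for the official score, which the thread does not state.

```python
import shlex

# The command quoted above, split into one flag per line.
LM_EVAL_CMD = [
    "lm_eval",
    "--model", "hf",                                         # Hugging Face backend
    "--model_args", "pretrained=google/gemma-3-270m-it",
    "--tasks", "leaderboard_ifeval",
    "--device", "cuda:0",
    "--use_cache", "./eval_cache/google_gemma-3-270m-it",    # cache model responses
    "--apply_chat_template",                                 # wrap prompts in the chat template
    "--fewshot_as_multiturn",
    "--batch_size", "auto",
    "--log_samples",                                         # keep per-sample outputs
    "--output_path", "./eval_out/",
    "--trust_remote_code",
]


def as_shell(cmd):
    """Render an argument list as a single shell command string."""
    return shlex.join(cmd)


def with_gen_kwargs(cmd, **gen_kwargs):
    """Return a copy of cmd with generation settings passed via --gen_kwargs."""
    pairs = ",".join(f"{key}={value}" for key, value in gen_kwargs.items())
    return cmd + ["--gen_kwargs", pairs]
```

For example, `subprocess.run(with_gen_kwargs(LM_EVAL_CMD, temperature=0.2, top_p=0.95), check=True)` would run the same evaluation with explicit sampling settings. Keeping `--log_samples` on makes it possible to inspect individual failures afterwards.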

codelion

Aug 27, 2025

The batch_size cannot explain such a large gap. We should usually be able to reproduce results to within 3-5% of the reported numbers. The surprising thing is that the gap only appears on this benchmark.
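That observation can be checked directly against the numbers in the earlier results table. The sketch below copies the measured and reported scores from that table and flags benchmarks whose gap exceeds a tolerance; treating "3-5%" as percentage points is an assumption, and the function name is hypothetical.

```python
# (benchmark, measured %, reported %) copied from the results table in this thread.
ROWS = [
    ("HellaSwag", 33.5, 37.7),
    ("PIQA", 65.6, 66.2),
    ("ARC-c", 24.5, 28.2),
    ("WinoGrande", 53.2, 52.3),
    ("BIG-Bench Hard", 26.8, 26.7),
    ("IFEval (inst_level)", 37.7, 51.2),
]


def outside_tolerance(rows, tolerance_points=5.0):
    """Return (benchmark, delta) pairs whose absolute gap exceeds the tolerance.

    Deltas are measured minus reported, in percentage points.
    """
    flagged = []
    for name, measured, reported in rows:
        delta = round(measured - reported, 1)  # round away floating-point noise
        if abs(delta) > tolerance_points:
            flagged.append((name, delta))
    return flagged


flagged_at_5 = outside_tolerance(ROWS, tolerance_points=5.0)
flagged_at_3 = outside_tolerance(ROWS, tolerance_points=3.0)
```

At a 5-point tolerance only IFEval is flagged. At 3 points, HellaSwag and ARC-c are flagged as well, though their gaps are far smaller than the IFEval one.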

Renu11

Google org Aug 28, 2025

Thanks for the heads-up. We have forwarded your feedback about the gemma-3-270m-it IFEval score discrepancy to the engineering team for a full investigation. We appreciate you bringing this to our attention.
