Trouble Reproducing gemma-3-270m-it IFEval Score
I'm trying to verify my setup by reproducing the IFEval benchmark score for gemma-3-270m-it. The official score is 51.2%, but my accuracy is only between 20% and 27% across multiple runs.
I am using the following settings:
temperature=1.0
top_p=0.95
top_k=64
min_p=0.0
Am I missing something? I suspect there's a misconfiguration somewhere in my setup.
+1. I cannot reproduce the IFEval score either; my evaluation results are around 26%.
I'm working with temperature=0.2, and the results are better.
With
temperature=0.2
top_p=0.95
top_k=64
min_p=0.0
I got 27.9% on IFEval. It's a slight improvement, but there's still a large gap to the reported 51.2%.
I honestly don't know. Maybe try 0.0 or 0.1 😅. Good luck.
Try this?
temperature = 0.1 // less random token picking
top_p = 0.95
top_k = 64
min_p = 0.25 // minimum token probability, relative to the most likely token
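To make the effect of these four settings concrete, here is a minimal pure-Python sketch of how temperature, top_k, top_p, and min_p narrow the set of candidate tokens. The function name `filter_distribution`, the toy logits, and the order in which the filters run are assumptions for illustration only; llama.cpp, Transformers, and other backends may order or renormalize these steps differently.

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return renormalized probabilities of the tokens that survive the filters.

    Assumed order: temperature, top_k, top_p, min_p. For simplicity, top_p is
    measured on the probabilities before top_k renormalization.
    Temperature must be greater than 0 in this sketch.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # Rank token ids from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)

    # top_k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token in ranked:
        kept.append(token)
        cumulative += probs[token]
        if cumulative >= top_p:
            break

    # min_p: drop tokens below min_p times the probability of the top token.
    threshold = min_p * probs[kept[0]]
    kept = [t for t in kept if probs[t] >= threshold]

    # Renormalize what is left so it sums to 1.
    mass = sum(probs[t] for t in kept)
    return {t: probs[t] / mass for t in kept}

# Hypothetical logits for a four-token vocabulary.
toy_logits = [2.0, 1.0, 0.5, 0.0]
for temp, mp in [(1.0, 0.0), (0.1, 0.0), (1.0, 0.25)]:
    survivors = filter_distribution(toy_logits, temperature=temp,
                                    top_k=64, top_p=0.95, min_p=mp)
    print(f"temperature={temp} min_p={mp} -> {len(survivors)} candidate token(s)")
```

On this toy input, lowering the temperature or raising min_p both shrink the candidate set, which is why either change makes generation closer to greedy decoding.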
I’m really curious about the IFEval score.
By the way, I’m using llama.cpp. I forgot to mention that last time.
I am also having trouble replicating the results reported. I am using the standard lm_eval harness. I get the following results, and the biggest gap is in IFEval (with inst_level_loose_acc metric).
Gemma 3 270M IT - Actual Results vs Google's Reported Baseline
| Benchmark | n-shot | Actual Results | Google Reported | Delta | Match Status |
|---|---|---|---|---|---|
| HellaSwag | 0-shot | 33.5% | 37.7% | -4.2% | ❌ Lower |
| PIQA | 0-shot | 65.6% | 66.2% | -0.6% | ✅ Close |
| ARC-c | 0-shot | 24.5% | 28.2% | -3.7% | ❌ Lower |
| WinoGrande | 0-shot | 53.2% | 52.3% | +0.9% | ✅ Close |
| BIG-Bench Hard | 3-shot | 26.8% | 26.7% | +0.1% | ✅ Match |
| IFEval (inst_level) | 0-shot | 37.7% | 51.2% | -13.5% | ⚠️ Gap |
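One thing worth checking when comparing IFEval numbers is which aggregate is being compared. IFEval is usually reported at both the prompt level (a prompt counts only if every instruction in it is followed) and the instruction level (each instruction counts separately), each in a strict and a loose variant; the thread does not say which of these the reported 51.2% corresponds to. The sketch below shows only how the two aggregation levels differ. The function name `ifeval_accuracies` and the toy results are hypothetical, and the strict/loose response checks themselves are not implemented here.

```python
def ifeval_accuracies(results):
    """Aggregate per-instruction pass/fail flags into two accuracy levels.

    `results` holds one list per prompt, with one boolean per instruction
    in that prompt (True means the instruction was followed).
    """
    # A prompt passes only if all of its instructions were followed.
    prompt_hits = sum(all(flags) for flags in results)
    # Instruction-level accuracy pools every instruction across prompts.
    inst_hits = sum(sum(flags) for flags in results)
    inst_total = sum(len(flags) for flags in results)
    return {
        "prompt_level_acc": prompt_hits / len(results),
        "inst_level_acc": inst_hits / inst_total,
    }

# Hypothetical results for four prompts with 1, 2, 3, and 2 instructions.
toy_results = [[True], [True, False], [False, False, True], [True, True]]
print(ifeval_accuracies(toy_results))
```

On the same set of model responses the two levels can differ by a wide margin, so a comparison is only meaningful when both sides use the same one.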
I am facing this trouble too! I guess I didn't use the same command as the official evaluation.
This is the command I used:
lm_eval --model hf --model_args pretrained=google/gemma-3-270m-it --tasks leaderboard_ifeval --device cuda:0 --use_cache ./eval_cache/google_gemma-3-270m-it --apply_chat_template --fewshot_as_multiturn --batch_size auto --log_samples --output_path ./eval_out/ --trust_remote_code
In fact, I have trouble reproducing results not only for this model but also for others such as gpt-oss-20B. I checked the GitHub community discussions about IFEval, and some people said batch_size may also affect the result...
The batch_size cannot explain such a large gap; usually we should be able to reproduce results within 3-5% of the reported numbers. What is surprising is that it seems to happen only on this benchmark.
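As a rough sanity check on the "within 3-5%" expectation, the sketch below estimates how much a measured accuracy can move from sampling noise alone, using a normal-approximation 95% interval. It assumes IFEval has about 541 prompts and treats each prompt as an independent pass/fail trial; both are simplifications, and the helper name `accuracy_margin` is hypothetical.

```python
import math

def accuracy_margin(accuracy, n_items, z=1.96):
    """Half-width of a normal-approximation 95% interval for an accuracy."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_items)

# Assumed IFEval prompt count; adjust if your copy of the dataset differs.
N_PROMPTS = 541

# Accuracies quoted in this thread: reported, lm_eval result, llama.cpp result.
for measured in (0.512, 0.377, 0.27):
    margin = accuracy_margin(measured, N_PROMPTS)
    print(f"accuracy {measured:.3f}: +/- {margin * 100:.1f} points")
```

Under these assumptions the margin comes out at roughly 4 points, so noise of that size is expected, while gaps of 13 points or more are unlikely to be explained by run-to-run variation alone.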
Thanks for the heads-up. We have forwarded your feedback about the gemma-3-270m-it IFEval score discrepancy to the engineering team for a full investigation. We appreciate you bringing this to our attention.