Instructions to use google/gemma-3n-E4B-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use google/gemma-3n-E4B-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3n-E4B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/gemma-3n-E4B-it")
model = AutoModelForImageTextToText.from_pretrained("google/gemma-3n-E4B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
HuggingChat
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use google/gemma-3n-E4B-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-3n-E4B-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3n-E4B-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/google/gemma-3n-E4B-it

SGLang

How to use google/gemma-3n-E4B-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-3n-E4B-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3n-E4B-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-3n-E4B-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3n-E4B-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use google/gemma-3n-E4B-it with Docker Model Runner:
```
docker model run hf.co/google/gemma-3n-E4B-it
```

TGI fails, saying to upgrade the transformers version

by Tollring - opened Jun 27, 2025

Discussion

Tollring

Jun 27, 2025

I tried to run this on Hugging Face hosted TGI. It fails with an error saying to upgrade transformers.

Do I need to copy the repo and then add a requirements.txt file with the transformers version,
or are you going to fix it?

BalakrishnaCh

Google org Jun 27, 2025

Hi @Tollring ,

Welcome to the Google Gemma family of open-source models. If you would like to run the model by downloading it to your local machine, you have to upgrade to the latest version of transformers by running !pip install -U transformers. The newly released Gemma models don't support older versions of transformers.

Please try and let me know if any additional assistance is required.

Thanks.
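
Before loading the model, it can help to confirm that the installed transformers is recent enough. The sketch below is illustrative only. The helper names are hypothetical, and the 4.53.0 minimum is an assumption about when Gemma 3n support landed, so please check the model card for the authoritative value. Suffixes such as .dev0 are ignored, which is a simplification.

```python
import re
from importlib import metadata

# Assumed minimum release with Gemma 3n support; verify against the model card.
ASSUMED_MIN_TRANSFORMERS = "4.53.0"


def release_tuple(version: str) -> tuple:
    """Return the leading numeric part of a version string as a 3-tuple of ints.

    Suffixes such as ".dev0" or "rc1" are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad short versions such as "4.53" so they compare equal to "4.53.0".
    return parts + (0,) * max(0, 3 - len(parts))


def meets_minimum(installed: str, minimum: str = ASSUMED_MIN_TRANSFORMERS) -> bool:
    """True if the installed release is at least the minimum release."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_version(package: str):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed; run: pip install -U transformers")
    elif meets_minimum(found):
        print(f"transformers {found} looks recent enough")
    else:
        print(f"transformers {found} is older than {ASSUMED_MIN_TRANSFORMERS}; "
              "run: pip install -U transformers")
```

On a hosted runtime where packages cannot be upgraded directly, the same check at least shows which version the container ships with.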

Ethan-pooh

Jun 27, 2025

env: transformers==4.54.0.dev0

error:

RuntimeError Traceback (most recent call last)
Cell In[11], line 10
7 # model_id = "/data/bangguo/fastvla/google/gemma-3n-E4B-it"
8 model_id = "google/gemma-3n-e4b-it"
---> 10 model = Gemma3nForConditionalGeneration.from_pretrained(model_id, device_map="cuda", torch_dtype=torch.bfloat16,).eval()
12 processor = AutoProcessor.from_pretrained(model_id)

File ~/miniconda3/envs/fastvla-gemma/lib/python3.10/site-packages/transformers/modeling_utils.py:311, in restore_default_torch_dtype.<locals>._wrapper(*args, **kwargs)
309 old_dtype = torch.get_default_dtype()
310 try:
--> 311 return func(*args, **kwargs)
312 finally:
313 torch.set_default_dtype(old_dtype)

File ~/miniconda3/envs/fastvla-gemma/lib/python3.10/site-packages/transformers/modeling_utils.py:4760, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, weights_only, *model_args, **kwargs)
4752 config = cls._autoset_attn_implementation(
4753 config,
4754 torch_dtype=torch_dtype,
4755 device_map=device_map,
4756 )
4758 with ContextManagers(model_init_context):
4759 # Let's make sure we don't run the init function of buffer modules
-> 4760 model = cls(config, *model_args, **model_kwargs)
4762 # Make sure to tie the weights correctly
4763 model.tie_weights()

File ~/miniconda3/envs/fastvla-gemma/lib/python3.10/site-packages/transformers/models/gemma3n/modeling_gemma3n.py:2196, in Gemma3nForConditionalGeneration.__init__(self, config)
2194 def __init__(self, config: Gemma3nConfig):
2195 super().__init__(config)
-> 2196 self.model = Gemma3nModel(config)
2197 self.lm_head = nn.Linear(config.text_config.hidden_size, config.text_config.vocab_size, bias=False)
2198 self.post_init()

File ~/miniconda3/envs/fastvla-gemma/lib/python3.10/site-packages/transformers/models/gemma3n/modeling_gemma3n.py:1948, in Gemma3nModel.__init__(self, config)
1946 def __init__(self, config: Gemma3nConfig):
1947 super().__init__(config)
-> 1948 self.vision_tower = AutoModel.from_config(config=config.vision_config)
1949 self.vocab_size = config.text_config.vocab_size
1951 language_model = AutoModel.from_config(config=config.text_config)

File ~/miniconda3/envs/fastvla-gemma/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:456, in _BaseAutoModelClass.from_config(cls, config, **kwargs)
454 elif type(config) in cls._model_mapping.keys():
455 model_class = _get_model_class(config, cls._model_mapping)
--> 456 return model_class._from_config(config, **kwargs)
458 raise ValueError(
459 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
460 f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
461 )

File ~/miniconda3/envs/fastvla-gemma/lib/python3.10/site-packages/transformers/modeling_utils.py:2208, in PreTrainedModel._from_config(cls, config, **kwargs)
2205 model = cls(config, **kwargs)
2207 else:
-> 2208 model = cls(config, **kwargs)
2210 # restore default dtype if it was modified
2211 if dtype_orig is not None:

File ~/miniconda3/envs/fastvla-gemma/lib/python3.10/site-packages/transformers/models/timm_wrapper/modeling_timm_wrapper.py:120, in TimmWrapperModel.__init__(self, config)
118 # using num_classes=0 to avoid creating classification head
119 extra_init_kwargs = config.model_args or {}
--> 120 self.timm_model = timm.create_model(config.architecture, pretrained=False, num_classes=0, **extra_init_kwargs)
121 self.post_init()

File ~/miniconda3/envs/fastvla-gemma/lib/python3.10/site-packages/timm/models/_factory.py:122, in create_model(model_name, pretrained, pretrained_cfg, pretrained_cfg_overlay, checkpoint_path, cache_dir, scriptable, exportable, no_jit, **kwargs)
119 pretrained_cfg = pretrained_tag
121 if not is_model(model_name):
--> 122 raise RuntimeError('Unknown model (%s)' % model_name)
124 create_fn = model_entrypoint(model_name)
125 with set_layer_config(scriptable=scriptable, exportable=exportable, no_jit=no_jit):

RuntimeError: Unknown model (mobilenetv5_300m_enc)


It seems that the code for the vision encoder mobilenetv5_300m_enc has not been uploaded to transformers?

Daviduche03

Jun 28, 2025

# Uninstall and update timm

!pip uninstall -y timm
!pip install timm --upgrade

BalakrishnaCh

Google org Sep 1, 2025

Hi @Ethan-pooh ,

Please upgrade timm to the latest version; that should resolve the mobilenetv5_300m_enc issue. Please let me know if further assistance is required.

!pip uninstall -y timm
!pip install -U timm

Thanks.
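
To check whether the upgrade took effect, one option is to ask timm directly whether it knows the architecture named in the error. This is a minimal sketch, not an official diagnostic. The helper names timm_knows and advice are hypothetical, and it relies on timm.is_model, the same registry lookup that appears in the traceback inside create_model.

```python
ARCHITECTURE = "mobilenetv5_300m_enc"  # name taken from the RuntimeError in this thread


def timm_knows(architecture: str):
    """Return True/False if timm is importable, or None if timm is not installed."""
    try:
        import timm
    except ImportError:
        return None
    # is_model checks timm's model registry, as create_model does before building.
    return bool(timm.is_model(architecture))


def advice(known) -> str:
    """Turn the lookup result into a short, human-readable hint."""
    if known is None:
        return "timm is not installed; run: pip install -U timm"
    if known:
        return "timm recognizes the architecture"
    return "timm does not recognize the architecture; try: pip install -U timm"


if __name__ == "__main__":
    print(advice(timm_knows(ARCHITECTURE)))
```

If the lookup still fails after upgrading, restarting the notebook kernel may be needed so that the new timm version is actually imported.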
