Unexpected Leading sos/eos Tokens at Start of Generation

#105

by E1eMental - opened May 28, 2025

edited May 28, 2025

Issue Description

When testing florence-2-large, I observed unexpected token generation behavior. Specifically:

Decoding starts by sending decoder_start_token_id=2 (which is also the eos_token_id).
The model then generates three tokens with index 0 (sos_token) before outputting any valuable tokens.
Notably, sos_token_id=0 and eos_token_id=2, yet generation starts from the eos_token_id rather than the sos_token_id.
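To make the observation concrete, here is a minimal sketch in plain Python that counts the leading sos tokens in a generated id sequence and strips them before decoding. The helper names and the ids 51 and 52 are hypothetical placeholders for "valuable" tokens, not real Florence-2 output; only the special ids 0 and 2 come from the description above.

```python
def count_leading(token_ids, token_id, skip=1):
    """Count consecutive occurrences of `token_id` after skipping the first `skip` positions."""
    count = 0
    for tok in token_ids[skip:]:
        if tok != token_id:
            break
        count += 1
    return count


def strip_leading_special(token_ids, special_ids=(0, 2)):
    """Drop the decoder start token and any run of special tokens before the first content token."""
    start = 0
    while start < len(token_ids) and token_ids[start] in special_ids:
        start += 1
    return token_ids[start:]


# Hypothetical sequence matching the description:
# decoder start token 2, three sos tokens (0), two content tokens, then eos (2).
generated = [2, 0, 0, 0, 51, 52, 2]

print(count_leading(generated, 0))
print(strip_leading_special(generated))
```

This only works around the symptom after generation; it does not explain why the leading tokens are produced.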

This leads to two questions:

Why does generation begin with the eos_token_id instead of the sos_token_id?
Why are multiple leading tokens with index 0 (the sos_token_id) generated before the actual output?

Finetuning & Label Construction

While finetuning Florence-2 on a custom task (using https://huggingface.co/blog/finetune-florence2), I discovered the following during debugging:

The target label sequences used during training begin with a 0 (sos_token_id): [0, valuable_tokens, 2].
As a result, the model is forced to generate a "junk" 0 token at the start of each sequence, and the loss is calculated on this token as well.
This might explain why the model always generates sos (0) tokens at the beginning during inference.
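The following sketch shows how labels of this shape would line up with the decoder inputs, assuming Florence-2's language model follows the BART-style convention in Hugging Face Transformers, where decoder inputs are the labels shifted right by one position and prefixed with decoder_start_token_id. That assumption is not confirmed in this thread. This is a plain-Python illustration rather than the library's tensor implementation, pad_token_id=1 is the BART default and is assumed here, and 51 and 52 are hypothetical content ids.

```python
def shift_tokens_right(labels, pad_token_id=1, decoder_start_token_id=2):
    """Build decoder inputs by shifting labels one step right (BART-style teacher forcing)."""
    shifted = [decoder_start_token_id] + list(labels[:-1])
    # Positions masked with -100 in the labels are replaced by the pad id in the inputs.
    return [pad_token_id if tok == -100 else tok for tok in shifted]


# Labels as described above: [0, valuable_tokens, 2]
labels = [0, 51, 52, 2]
decoder_input_ids = shift_tokens_right(labels)

# At each step the decoder sees `inp` and is trained to predict `target`.
for step, (inp, target) in enumerate(zip(decoder_input_ids, labels)):
    print(f"step {step}: input={inp} -> target={target}")
```

Under this assumption, the first training step is "given 2, predict 0", which would be consistent with one leading 0 at inference when decoding starts from token 2. It would not by itself account for three consecutive 0 tokens.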

Questions & Request

Why is token 2 (eos_token_id) used as the start of the sequence for decoding, rather than 0 (sos_token_id)?
Is my assumption about the leading 0 token during training correct?
If so, could you retrain the model (or release a new checkpoint) without this issue, so that generation is not forced to start with a junk token?
