| Model | Resolution | Trainable Params | Data | VQAv2 | Nocaps | TextVQA | TextCaps |
|---|---|---|---|---|---|---|---|
| MoE-LLaVA-1.6B×4 | 336 | 2.9B | 5.9M | 76.0 | - | 47.8 | - |
| MoE-LLaVA-2.7B×4 | 336 | 5.3B | 5.9M | 77.1 | - | 50.2 | - |
| moondream1 | 384 | 1.86B | 3.9M | 74.7 | - | 35.6 | - |
| moondream2 | 384 | 1.86B | - | 77.7 | 92.5 | 49.7 | 120.2 |
| Pheye-x4 🤗 | 448 | 295M | 2.9M | 75.2 | 110.1 | 45.9 | 106.4 |
| Pheye-x4 🤗 | 672 | 295M | 2.9M | 75.5 | 110.8 | 49.2 | 111.9 |
| Pheye-x2 🤗 | 448 | 578M | 2.9M | 76.0 | 111.8 | 47.3 | 108.9 |
| Pheye-x2 🤗 | 672 | 578M | 2.9M | 76.4 | 110.5 | 50.5 | 115.9 |

| Image | Example |
|---|---|
| (image not available) | **How much do these popcorn packets weigh?** These popcorn packets weigh 90g each. **What brand are these popcorn packets?** These popcorn packets are from the brand "Magic Pop". |
| (image not available) | **Can I pet that dog?** No, you cannot pet the dog in the image. It is a black bear, not a dog. |
| (image not available) | **Why is the person wearing protective gear?** The person in the image is wearing protective gear, such as a beekeeper's suit and a helmet, to ensure their safety while working with bees. Beekeeping involves handling bees and their hives, which can be potentially dangerous due to the risk of bee stings. The protective gear helps to minimize the risk of bee stings and other bee-related injuries, allowing the beekeeper to carry out their work safely and effectively. |

* Generated by Pheye-x2-672
To generate a sample response from a prompt, use `generate.py`.
This requires Python 3.11 or later. Start by cloning the repo and creating a virtual environment with the necessary packages:

    git clone https://github.com/miguelscarv/pheye.git
    cd pheye
    python3 -m venv venv
    source venv/bin/activate
    pip3 install -r requirements.txt

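If the install step fails, it may be worth confirming that the interpreter inside the virtual environment meets the Python 3.11 requirement. The snippet below is a minimal sketch of such a check; the helper name `python_is_supported` is hypothetical and is not part of the pheye repository.

```python
import sys

# Minimum version stated in this README.
MIN_VERSION = (3, 11)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if (major, minor) of version_info is at least `minimum`.

    Hypothetical helper for illustration; not part of the pheye repo.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


# Report on the interpreter that is running this snippet.
current = f"{sys.version_info.major}.{sys.version_info.minor}"
if python_is_supported():
    print(f"Python {current} meets the {MIN_VERSION[0]}.{MIN_VERSION[1]}+ requirement.")
else:
    print(f"Python {current} is older than {MIN_VERSION[0]}.{MIN_VERSION[1]}; consider upgrading.")
```

Running it with the `python3` from the activated environment shows which interpreter the later commands will use.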
Then run generate.py:

    python3 generate.py --image_path images/dog_flower.jpg --prompt "What is the dog holding in its mouth?" --device cuda

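To ask the same question about several images, the command above could be wrapped in a small driver script. The following is a sketch under a few assumptions: it is run from the repository root, `generate.py` accepts only the three flags shown above (`--image_path`, `--prompt`, `--device`), and the images are `.jpg` files. The names `build_command` and `run_batch` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_command(image_path, prompt, device="cuda", script="generate.py"):
    """Build the argument list for one generate.py call.

    Uses only the flags shown in this README; any other flags or
    device values are assumptions and are not covered here.
    """
    return [
        sys.executable, script,
        "--image_path", str(image_path),
        "--prompt", prompt,
        "--device", device,
    ]


def run_batch(image_dir, prompt, device="cuda", dry_run=True):
    """Call generate.py once per .jpg file in image_dir (hypothetical helper).

    With dry_run=True the commands are only printed, not executed.
    Returns the list of commands that were built.
    """
    images = sorted(Path(image_dir).glob("*.jpg"))
    commands = [build_command(path, prompt, device) for path in images]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if generate.py fails.
            subprocess.run(command, check=True)
    return commands
```

For example, `run_batch("images", "Describe this image.", dry_run=False)` would invoke the script once per image. Each call starts a new process and therefore reloads the model, which is simple but slow for large batches.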
This implementation was inspired by OpenFlamingo's repository.