Add vLLM offline inference example for Qwen3.5 -- Added in README.md

#73

This PR adds a Python-based offline inference example using vLLM to the Qwen3.5 model card.

Motivation

The current vLLM section only documents API-based serving via CLI. However, many users rely on programmatic (offline) inference for local experimentation, pipelines, and research workflows. This addition improves usability by covering that use case.

Changes

  • Added a new subsection: "Offline Inference (Python)" under the vLLM section
  • Included:
    • Installation requirements (vllm, transformers, qwen-vl-utils)
    • End-to-end example for:
      • Text-only inference
      • Multimodal (image) inference
  • Demonstrates usage of:
    • LLM and SamplingParams from vLLM
    • AutoProcessor from Transformers
    • process_vision_info for multimodal inputs

Notes

  • The example uses VLLM_WORKER_MULTIPROC_METHOD=spawn for compatibility
  • Keeps consistency with existing Qwen3.5 + vLLM multimodal processing pipeline

This makes the model card more complete by supporting both serving and offline inference workflows.

hrithiksagar-bgen changed pull request title from Add vLLM offline inference example for Qwen3.5-VL -- Added in README.md to Add vLLM offline inference example for Qwen3.5 -- Added in README.md
Ready to merge
This branch is ready to get merged automatically.

Sign up or log in to comment