Models

110

Active filters: NVFP4

nvidia/Gemma-4-26B-A4B-NVFP4

Text Generation • 14B • Updated May 11 • 1.34M • 120

lightx2v/Wan2.2-NVFP4-Sparse

Updated 4 days ago • 4.46k • 41

nvidia/Gemma-4-31B-IT-NVFP4

Text Generation • 21B • Updated 9 days ago • 2.5M • 539

michaelw9999/Qwen3.6-27B-NVFP4-MTP-GGUF

27B • Updated Jun 6 • 88.3k • 50

nvidia/MiniMax-M3-NVFP4

Text Generation • 247B • Updated 26 days ago • 404k • 64

s-batman/Agents-A1-NVFP4-MTP-GGUF

Image-Text-to-Text • 36B • Updated 21 days ago • 18.4k • 15

nvidia/Qwen3.5-122B-A10B-NVFP4

Text Generation • 65B • Updated Jun 2 • 212k • 43

nvidia/diffusiongemma-26B-A4B-it-NVFP4

Text Generation • 14B • Updated 18 days ago • 1.74M • 108

AxionML/Qwen3.5-9B-NVFP4

Image-Text-to-Text • 7B • Updated Mar 3 • 104k • 19

nvidia/MiniMax-M2.7-NVFP4

Text Generation • 116B • Updated Apr 24 • 66.2k • 65

nvidia/DeepSeek-V4-Flash-NVFP4

Text Generation • 167B • Updated Jun 15 • 1.23M • 72

FreedomAISVR/Devstral-Small-2-NVFP4-GGUF

Image-Text-to-Text • 24B • Updated Jun 16 • 309 • 1

nvidia/Mistral-Medium-3.5-128B-NVFP4

Text Generation • 84B • Updated 22 days ago • 24.8k • 28

LLMWildling/Nemotron-175b-A13b-Coder-NVFP4

Text Generation • 104B • Updated 10 days ago • 589 • 1

apolloparty/LFM2-350M-NVFP4A16

Text Generation • 0.2B • Updated Jul 12, 2025 • 13

apolloparty/LFM2-700M-NVFP4A16

Text Generation • 0.5B • Updated Jul 12, 2025 • 14

apolloparty/LFM2-1.2B-NVFP4A16

Text Generation • 0.7B • Updated Jul 12, 2025 • 22 • 1

MrVolts/Qwen3-30B-A3B-Thinking-2507-NVFP4

Text Generation • 17B • Updated Sep 15, 2025 • 21

nvidia/DeepSeek-V3.1-NVFP4

Text Generation • 394B • Updated Jan 13 • 45.5k • 15

nvidia/Qwen3-Next-80B-A3B-Instruct-NVFP4

Text Generation • Updated Feb 9 • 40.7k • 41

nvidia/Qwen3-Next-80B-A3B-Thinking-NVFP4

Text Generation • Updated Feb 9 • 2.24k • 64

lightx2v/Wan-NVFP4

Updated Dec 23, 2025 • 338 • 71

nvidia/DeepSeek-V3.2-NVFP4

Text Generation • 394B • Updated Jan 21 • 8.53k • 15

nvidia/Qwen3-235B-A22B-Thinking-2507-NVFP4

Text Generation • 120B • Updated Jan 30 • 1.36k • 8

nvidia/Qwen3-235B-A22B-Instruct-2507-NVFP4

Text Generation • 120B • Updated Jan 30 • 14.3k • 10

tokenlabsdotrun/Llama-3.1-8B-ModelOpt-NVFP4

5B • Updated Jan 15 • 32 • 2

tokenlabsdotrun/Llama-3.1-8B-ModelOpt-NVFP4-QAT

5B • Updated Jan 21 • 6

vincentzed-hf/Qwen3-Coder-Next-NVFP4

Text Generation • Updated Feb 16 • 407 • 7

surogate/Qwen3-Next-80B-A3B-Thinking-NVFP4

Text Generation • Updated Feb 11 • 9

surogate/Qwen3-Next-80B-A3B-Instruct-NVFP4

Text Generation • Updated Feb 11 • 10