| --- | |
| language: | |
| - en | |
| tags: | |
| - falcon3 | |
| - falcon3_mamba | |
| - falcon_mamba | |
| --- | |
| # Falcon3-Mamba-7B-Base | |
| The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. | |
| This repository contains the **Falcon3-Mamba-7B**. Compared to similar SSM-based models of the same size, it achieved state-of-the-art results at the time of release on reasoning, language understanding, instruction following, code, and mathematics tasks. | |
| Falcon3-Mamba-7B-Base supports a context length of up to 32K tokens and was trained mainly on an English corpus. | |
| ## Model Details | |
| - Architecture (same as [Falcon-Mamba-7b](https://huggingface.co/tiiuae/falcon-mamba-7b)) | |
|   - Mamba1-based causal decoder-only architecture trained on a causal language modeling task (i.e., predict the next token). | |
|   - 64 decoder blocks | |
|   - width: 4096 | |
|   - state dimension: 16 | |
|   - 32k context length | |
|   - 65k vocabulary size | |
| - Continually pretrained from Falcon Mamba 7B on an additional 1500 gigatokens of data comprising web, code, STEM, and high-quality data. | |
| - Post-trained on 1.2 million samples of STEM, conversations, code, and safety. | |
| - Developed by [Technology Innovation Institute](https://www.tii.ae) | |
| - License: TII Falcon-LLM License 2.0 | |
| - Model Release Date: December 2024 | |
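The architecture figures above imply a recurrent state whose size does not grow with the number of tokens processed. The sketch below works through that arithmetic. The block count, width, and state dimension come from the list above; the expansion factor of 2 and the convolution kernel width of 4 are assumptions (the usual Mamba1 defaults) and are not stated in this card, and `MambaShape` is a hypothetical helper introduced only for illustration. Some implementations cache one fewer convolution column, so treat the result as an estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MambaShape:
    """Shape hyperparameters for a Mamba1 decoder stack (illustrative helper)."""

    n_layers: int = 64   # model card: 64 decoder blocks
    d_model: int = 4096  # model card: width
    d_state: int = 16    # model card: state dimension
    expand: int = 2      # ASSUMPTION: Mamba1 default expansion factor
    d_conv: int = 4      # ASSUMPTION: Mamba1 default conv kernel width

    @property
    def d_inner(self) -> int:
        # Inner channel count of each block.
        return self.expand * self.d_model


def recurrent_state_elements(shape: MambaShape) -> int:
    """Cached values per sequence; independent of how many tokens were seen."""
    ssm_state = shape.d_inner * shape.d_state   # one (d_inner, d_state) matrix per block
    conv_state = shape.d_inner * shape.d_conv   # rolling window for the causal conv
    return shape.n_layers * (ssm_state + conv_state)


def recurrent_state_mib(shape: MambaShape, bytes_per_element: int = 2) -> float:
    """State size in MiB, assuming 2 bytes per value (e.g. bfloat16)."""
    return recurrent_state_elements(shape) * bytes_per_element / 2**20


shape = MambaShape()
print(recurrent_state_elements(shape))  # 10485760
print(recurrent_state_mib(shape))       # 20.0
```

Under these assumptions the per-sequence state is about 20 MiB whether the prompt holds 100 tokens or the full 32k context, because nothing in the calculation depends on sequence length.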
| ## Getting started | |
| <details> | |
| <summary> Click to expand </summary> | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model_name = "tiiuae/Falcon3-Mamba-7B-Base" | |
| # Load the weights in their stored dtype; device_map="auto" requires the accelerate package. | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_name, | |
|     torch_dtype="auto", | |
|     device_map="auto" | |
| ) | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| prompt = "How many hours in one day?" | |
| messages = [ | |
|     {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."}, | |
|     {"role": "user", "content": prompt} | |
| ] | |
| # Render the conversation as a single prompt string using the tokenizer's chat template. | |
| text = tokenizer.apply_chat_template( | |
|     messages, | |
|     tokenize=False, | |
|     add_generation_prompt=True | |
| ) | |
| model_inputs = tokenizer([text], return_tensors="pt").to(model.device) | |
| generated_ids = model.generate( | |
|     **model_inputs, | |
|     max_new_tokens=1024 | |
| ) | |
| # generate() returns prompt + completion; keep only the newly generated tokens. | |
| generated_ids = [ | |
|     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) | |
| ] | |
| response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] | |
| print(response) | |
| ``` | |
| </details> | |
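The snippet above slices each output sequence by the length of its prompt before decoding, because the generated sequences include the prompt tokens followed by the new tokens. A minimal sketch of that step on plain Python lists is shown here; `strip_prompt_tokens` is a hypothetical helper name and the token IDs are made up for illustration.

```python
from typing import List, Sequence


def strip_prompt_tokens(
    input_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the echoed prompt from each generated sequence.

    Each output is assumed to start with the corresponding input, so
    slicing from len(input) onward leaves only the newly generated tokens.
    """
    return [list(out[len(inp):]) for inp, out in zip(input_ids, output_ids)]


# Made-up token IDs: a three-token prompt followed by two generated tokens.
prompt_batch = [[11, 12, 13]]
generated_batch = [[11, 12, 13, 501, 502]]

new_tokens = strip_prompt_tokens(prompt_batch, generated_batch)
print(new_tokens)  # [[501, 502]]
```

Decoding only the trimmed IDs keeps the system and user messages out of the printed response.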
| <br> | |
| ## Benchmarks | |
| The following table reports results from our internal benchmarking pipeline: | |
| <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;"> | |
| <colgroup> | |
| <col style="width: 10%;"> | |
| <col style="width: 10%;"> | |
| <col style="width: 7%;"> | |
| <col style="width: 7%;"> | |
| <col style="width: 7%;"> | |
| <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;"> | |
| </colgroup> | |
| <thead> | |
| <tr> | |
| <th>Category</th> | |
| <th>Benchmark</th> | |
| <th>Zamba2-7B</th> | |
| <th>Llama-3.1-8B</th> | |
| <th>Falcon-Mamba-7B</th> | |
| <th>Falcon3-Mamba-7B-Base</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr> | |
| <td rowspan="3">General</td> | |
| <td>MMLU (5-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>MMLU-PRO (5-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td rowspan="2">Math</td> | |
| <td>GSM8K (5-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>MATH Lvl-5 (4-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td rowspan="4">Reasoning</td> | |
| <td>ARC Challenge (25-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>GPQA (0-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>MuSR (0-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>BBH (3-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td rowspan="4">Commonsense Understanding</td> | |
| <td>PIQA (0-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>SciQ (0-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>Winogrande (0-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| <tr> | |
| <td>OpenbookQA (0-shot)</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| <td>-</td> | |
| </tr> | |
| </tbody> | |
| </table> | |
| ## Citation | |
| If the Falcon3 family of models was helpful to your work, feel free to cite us. | |
| ``` | |
| @misc{Falcon3, | |
| title = {The Falcon 3 family of Open Models}, | |
| author = {TII Team}, | |
| month = {December}, | |
| year = {2024} | |
| } | |
| ``` | |
| ``` | |
| @article{zuo2024falcon, | |
| title={Falcon mamba: The first competitive attention-free 7b language model}, | |
| author={Zuo, Jingwei and Velikanov, Maksim and Rhaiem, Dhia Eddine and Chahed, Ilyas and Belkada, Younes and Kunsch, Guillaume and Hacid, Hakim}, | |
| journal={arXiv preprint arXiv:2410.05355}, | |
| year={2024} | |
| } | |
| ``` |