FalconLLM committed on May 31, 2023

Commit 7205e7e (1 parent: 8fac8c1)

Update license information to Apache 2.0

Files changed (1)

README.md +3 -7

@@ -9,7 +9,7 @@ license: apache-2.0
 # ✨ Falcon-40B-Instruct
-**Falcon-40B-Instruct is a 40B parameters causal decoder-only model built by [TII](https://www.tii.ae) based on [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) and finetuned on a mixture of [Baize](https://github.com/project-baize/baize-chatbot). It is made available under the [TII Falcon LLM License](https://huggingface.co/tiiuae/falcon-40b-instruct/blob/main/LICENSE.txt).**
+**Falcon-40B-Instruct is a 40B parameters causal decoder-only model built by [TII](https://www.tii.ae) based on [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) and finetuned on a mixture of [Baize](https://github.com/project-baize/baize-chatbot). It is made available under the Apache 2.0 license.**
 *Paper coming soon 😊.*
@@ -63,7 +63,7 @@ for seq in sequences:
 - **Developed by:** [https://www.tii.ae](https://www.tii.ae);
 - **Model type:** Causal decoder-only;
 - **Language(s) (NLP):** English and French;
-- **License:** [TII Falcon LLM License](https://huggingface.co/tiiuae/falcon-7b-instruct/blob/main/LICENSE.txt);
+- **License:** Apache 2.0;
 - **Finetuned from model:** [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
 ### Model Source
@@ -178,11 +178,7 @@ Falcon-40B-Instruct was trained a custom distributed training codebase, Gigatron
 ## License
-Falcon-40B-Instruct is made available under the [TII Falcon LLM License](https://huggingface.co/tiiuae/falcon-40b-instruct/blob/main/LICENSE.txt). Broadly speaking,
-* You can freely use our models for research and/or personal purpose;
-* You are allowed to share and build derivatives of these models, but you are required to give attribution and to share-alike with the same license;
-* For commercial use, you are exempt from royalties payment if the attributable revenues are inferior to $1M/year, otherwise you should enter in a commercial agreement with TII.
+Falcon-40B-Instruct is made available under the Apache 2.0 license.
 ## Contact
 falconllm@tii.ae