Instructions for using meta-llama/Meta-Llama-3-8B with libraries and local apps.

Libraries

How to use meta-llama/Meta-Llama-3-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

Local Apps

vLLM

How to use meta-llama/Meta-Llama-3-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "meta-llama/Meta-Llama-3-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
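
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="meta-llama/Meta-Llama-3-8B",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the continuation as a string. Passing `base_url="http://localhost:30000"` would target a server on port 30000 instead.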

Use Docker

docker model run hf.co/meta-llama/Meta-Llama-3-8B

SGLang

How to use meta-llama/Meta-Llama-3-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "meta-llama/Meta-Llama-3-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "meta-llama/Meta-Llama-3-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use meta-llama/Meta-Llama-3-8B with Docker Model Runner:
```
docker model run hf.co/meta-llama/Meta-Llama-3-8B
```

Here's how to fine-tune llama-3 8b. ♾️

#37

by Ateeqq - opened Apr 19, 2024

Discussion

Ateeqq

Apr 19, 2024

Here's a conceptual overview of how to fine-tune Llama-3 8B.

For a practical tutorial, see: https://exnrt.com/blog/ai/finetune-llama3-8b/

1. Preparation:

Data Acquisition:
- Identify your specific task for fine-tuning.
- Gather a high-quality dataset relevant to your task. This dataset should be large enough and well-structured for effective training.
Environment Setup:
- Install the necessary libraries, such as transformers and datasets, plus trl if you plan to use SFTTrainer and, optionally, unsloth for faster, more memory-efficient fine-tuning.
- Ensure you have access to a powerful computing environment with GPUs for faster training.
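
As a rough illustration of what a "well-structured" dataset can look like, the sketch below validates instruction/response pairs, holds out a validation split, and writes JSON Lines files. The field names `instruction` and `response`, the 10% split, and the helper names are assumptions made for this example. Llama-3 itself does not require any particular schema.

```python
import json
import random
from pathlib import Path

REQUIRED_FIELDS = ("instruction", "response")  # hypothetical schema


def validate(records):
    """Keep only records where every required field is non-empty text."""
    clean = []
    for record in records:
        if all(isinstance(record.get(field), str) and record[field].strip()
               for field in REQUIRED_FIELDS):
            clean.append(record)
    return clean


def split(records, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a validation set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Files written this way can be read back with `datasets.load_dataset("json", data_files=...)`.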

2. Model Selection and Preprocessing:

Choose the Model:
- Select the Llama-3 8B model from the Hugging Face Hub or a similar repository.
- Consider loading the model in 4-bit precision (load_in_4bit=True, which relies on the bitsandbytes library) to reduce memory use, if your hardware supports it.
Data Preprocessing:
- Preprocess your dataset according to the model's requirements. This might involve cleaning, tokenizing, and formatting the data appropriately.
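
A sketch of how the loading and preprocessing steps above might look with transformers. It assumes records with `instruction` and `response` fields (a hypothetical schema) and a made-up prompt template. The `nf4` quantization type and bfloat16 compute dtype are illustrative choices, not requirements. The heavy imports sit inside `load_model_4bit`, so the formatting helper can be tried without a GPU.

```python
# Hypothetical template; any consistent layout can work for a base model.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def format_example(record, eos_token="<|end_of_text|>"):
    """Render one record as a single training string ending in EOS."""
    return PROMPT_TEMPLATE.format(**record) + eos_token


def load_model_4bit(model_id="meta-llama/Meta-Llama-3-8B"):
    """Load the tokenizer and a 4-bit quantized model (needs a GPU and bitsandbytes)."""
    # Imported lazily so format_example works without these dependencies.
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The base tokenizer ships without a padding token; reusing EOS is a
    # common workaround.
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_config, device_map="auto"
    )
    return tokenizer, model


def tokenize(record, tokenizer, max_length=1024):
    """Tokenize a formatted record, truncating to max_length tokens."""
    text = format_example(record, eos_token=tokenizer.eos_token)
    return tokenizer(text, truncation=True, max_length=max_length)
```

Appending the EOS token to each example teaches the model where a response ends. Without it, generations from the fine-tuned model tend to run on.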

3. Fine-tuning Process:

Define Training Arguments:
- Set hyperparameters like learning rate, batch size, and number of training epochs using TrainingArguments from transformers.
Fine-tuning Technique:
- Choose a fine-tuning technique:
  - Supervised Fine-tuning (SFT): Train the model on examples that pair each input with the desired output. This is the usual approach when you have labeled data, for tasks such as question answering or text classification.
  - Reinforcement Learning from Human Feedback (RLHF): Optimize the model against human preference judgments, typically through a reward model trained on them. This can help on tasks where a single correct label is hard to define.
Training Loop:
- Implement a training loop that feeds your preprocessed data to the model and optimizes its parameters according to the chosen fine-tuning technique. Helper classes such as SFTTrainer from the TRL library, or Trainer from transformers, handle this loop for you.
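
The steps above can be sketched with the plain transformers Trainer, which TRL's SFTTrainer builds on. The hyperparameter values and the helper names `optimizer_steps` and `train` are placeholders for illustration, not recommendations. The sketch assumes the model, tokenizer, and an already tokenized dataset are passed in. A model loaded in 4-bit generally cannot be trained this way without adding trainable adapters, which the sketch does not cover.

```python
import math


def optimizer_steps(n_examples, per_device_batch_size, grad_accum_steps,
                    n_devices=1, epochs=1):
    """Approximate number of optimizer updates for a run."""
    effective_batch = per_device_batch_size * grad_accum_steps * n_devices
    return math.ceil(n_examples / effective_batch) * epochs


def train(model, tokenizer, train_dataset, output_dir="llama3-8b-finetuned"):
    """Run causal-LM fine-tuning on an already tokenized dataset."""
    # Imported lazily; requires transformers and a GPU-backed PyTorch install.
    from transformers import (DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,  # effective batch of 16 per device
        learning_rate=2e-5,
        num_train_epochs=1,
        logging_steps=10,
        bf16=True,  # assumes bfloat16-capable hardware
    )
    # mlm=False makes the collator build labels from input_ids, so the loss
    # is ordinary next-token prediction.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, data_collator=collator)
    trainer.train()
    trainer.save_model(output_dir)
    return trainer
```

`optimizer_steps` is only a planning aid: it helps judge whether a given batch size and epoch count give the model a reasonable number of updates before you commit GPU time.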

4. Evaluation and Refinement:

Evaluate Performance:
- After training, assess the model's performance on a separate validation dataset relevant to your task. Metrics used for evaluation will depend on the specific task (e.g., accuracy for classification, BLEU score for machine translation).
Refine the Model:
- Analyze the evaluation results. If performance is unsatisfactory, consider:
  - Adjusting hyperparameters.
  - Collecting more data.
  - Trying a different fine-tuning technique.
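
A small sketch of the evaluation step, using only the standard library. `accuracy` suits classification-style outputs, `perplexity` converts a mean per-token cross-entropy loss into a more readable number, and `improved` is a hypothetical helper for deciding whether further training is still paying off.

```python
import math


def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty validation set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def perplexity(mean_loss):
    """Perplexity from a mean per-token cross-entropy loss (natural log)."""
    return math.exp(mean_loss)


def improved(history, patience=2):
    """True if validation loss improved within the last `patience` evaluations."""
    if len(history) <= patience:
        return True
    return min(history[-patience:]) < min(history[:-patience])
```

When training with the transformers Trainer, the `eval_loss` value returned by `trainer.evaluate()` can be passed to `perplexity`.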

5. Deployment:

Once satisfied with the model's performance, you can deploy it for real-world use in your application. This might involve integrating it into a web service or mobile app.

Additional Considerations:

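Computational Resources: Fine-tuning a model the size of Llama-3 8B is computationally expensive. At 16-bit precision the 8 billion weights alone take roughly 16 GB, before counting gradients, optimizer state, and activations. Ensure you have sufficient GPU memory and compute for training.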
Data Quality: The quality and relevance of your dataset significantly impact the fine-tuning outcome. Focus on gathering high-quality data specific to your task.
Ethical Considerations: Be mindful of potential biases in your data and the model's outputs. Consider implementing safeguards to mitigate bias and ensure responsible use of the fine-tuned model.

Hwer

Apr 19, 2024

Thanks, didn't expect to randomly learn this looking at the community posts.

Ateeqq changed discussion title from "Here's how to fine-tune." to "Here's how to fine-tune llama-3 8b. ♾️" on Apr 21, 2024

teddyyyy123

Apr 23, 2024

What happened to the plain HF transformers.Trainer() API? All I see now uses the TRL library.

Ateeqq

Apr 23, 2024

edited Apr 23, 2024

> What happened to the plain HF transformers.Trainer() API? All I see now uses the TRL library.

I am currently working on it (with the health dataset) but facing a CUDA error. Hopefully, it will be resolved soon.

Here's the Colab Notebook: https://colab.research.google.com/drive/1TUa9J2J_1Sj-G7mQHX45fKzZtnW3s1vj?usp=sharing

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
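
A possible first debugging step, following the hint in the message itself. One frequent cause of device-side asserts in language-model training is a token id that falls outside the embedding table, for example after adding tokens to the tokenizer without resizing the model's embeddings. Whether that is the cause in the notebook above is not known. The helper name `out_of_range_ids` is hypothetical.

```python
import os

# Must be set before CUDA is initialised, so do this before importing torch.
# Kernels then run synchronously and the stack trace points at the real call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def out_of_range_ids(batches, vocab_size, ignore_index=None):
    """Return (batch, position, id) for ids a table of vocab_size cannot index."""
    problems = []
    for batch_idx, ids in enumerate(batches):
        for pos, token_id in enumerate(ids):
            if ignore_index is not None and token_id == ignore_index:
                continue  # e.g. -100 marks label positions excluded from the loss
            if not 0 <= token_id < vocab_size:
                problems.append((batch_idx, pos, token_id))
    return problems
```

The `vocab_size` to compare against is the model's embedding size, available as `model.get_input_embeddings().num_embeddings`. If it is smaller than `len(tokenizer)`, calling `model.resize_token_embeddings(len(tokenizer))` may resolve the mismatch.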

IndrasMirror

Apr 25, 2024

Does anyone know how I should structure my dataset to best fine-tune the model?
