Instructions to use meta-llama/Meta-Llama-3-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use meta-llama/Meta-Llama-3-8B with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use meta-llama/Meta-Llama-3-8B with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm
    # Start the vLLM server:
    vllm serve "meta-llama/Meta-Llama-3-8B"
    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "meta-llama/Meta-Llama-3-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
    docker model run hf.co/meta-llama/Meta-Llama-3-8B
- SGLang
How to use meta-llama/Meta-Llama-3-8B with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang
    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "meta-llama/Meta-Llama-3-8B" \
      --host 0.0.0.0 \
      --port 30000
    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "meta-llama/Meta-Llama-3-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "meta-llama/Meta-Llama-3-8B" \
        --host 0.0.0.0 \
        --port 30000
    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "meta-llama/Meta-Llama-3-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use meta-llama/Meta-Llama-3-8B with Docker Model Runner:
docker model run hf.co/meta-llama/Meta-Llama-3-8B
Here's how to fine-tune Llama-3 8B. ♾️
Here's a theoretical explanation of how to Fine-Tune Llama-3 8B:
For a practical tutorial, see: https://exnrt.com/blog/ai/finetune-llama3-8b/
1. Preparation:
- Data Acquisition:
- Identify your specific task for fine-tuning.
- Gather a high-quality dataset relevant to your task. This dataset should be large enough and well-structured for effective training.
- Environment Setup:
- Install the necessary libraries, such as transformers and datasets, and optionally unsloth for faster, more memory-efficient fine-tuning of Llama-3.
- Ensure you have access to a computing environment with one or more GPUs; training on CPU alone is impractically slow for a model of this size.
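A quick way to confirm the environment before starting is to check that the libraries named above can be found. The sketch below uses only the standard library; the helper name `missing_packages` is made up for illustration, and it assumes each package's import name matches its install name, which holds for transformers, datasets, and unsloth.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # unsloth is optional, so it is reported separately from the required set.
    required = ["transformers", "datasets"]
    optional = ["unsloth"]
    print("missing required:", missing_packages(required))
    print("missing optional:", missing_packages(optional))
```

Anything reported as missing can then be installed with pip before moving on.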
2. Model Selection and Preprocessing:
- Choose the Model:
- Select the Llama-3 8B model from the Hugging Face Hub or a similar repository.
- Consider using the 4-bit version (load_in_4bit=True) for memory efficiency if supported by your hardware.
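To make the memory argument concrete, here is a rough weights-only estimate. It assumes a round 8 billion parameters and ignores quantization metadata, activations, gradients, and optimizer state, so real usage will be higher. The second function sketches one way to request 4-bit loading through transformers' `BitsAndBytesConfig`; it assumes the bitsandbytes and accelerate packages, a CUDA GPU that supports bfloat16, and approved access to the gated repository. Both function names are illustrative.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


def load_llama3_4bit(model_id="meta-llama/Meta-Llama-3-8B"):
    """Load the model with 4-bit weights (needs transformers, bitsandbytes, a GPU)."""
    # Imported lazily so the estimate above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat quantization
        bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the matmuls
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the layers
    )
    return model, tokenizer


N_PARAMS = 8e9  # assumption: round figure for an "8B" model
print(weight_memory_gb(N_PARAMS, 16))  # 16.0
print(weight_memory_gb(N_PARAMS, 4))   # 4.0
```

By this estimate, 4-bit weights take roughly a quarter of the space of 16-bit weights, which is what lets the model fit on a single consumer GPU.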
- Data Preprocessing:
- Preprocess your dataset according to the model's requirements. This might involve cleaning, tokenizing, and formatting the data appropriately.
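As an illustration of the cleaning and formatting step, the sketch below turns instruction/response records into a single "text" field per example. The template, the field names `instruction` and `response`, and the sample records are assumptions for illustration, not something Llama-3 requires. The end-of-sequence string is passed in explicitly; in practice it is safer to read it from `tokenizer.eos_token` than to hard-code it.

```python
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}{eos}"
)


def clean(text):
    """Collapse runs of whitespace (including newlines) and strip the ends."""
    return " ".join(text.split())


def format_example(record, eos_token):
    """Turn one raw record into a single training string."""
    return {
        "text": PROMPT_TEMPLATE.format(
            instruction=clean(record["instruction"]),
            response=clean(record["response"]),
            eos=eos_token,
        )
    }


raw = [
    {"instruction": "  What is  fine-tuning? ", "response": "Further training on task data."},
    {"instruction": "Name one GPU vendor.", "response": ""},
]

# Drop records with an empty field, then format the rest.
kept = [r for r in raw if clean(r["instruction"]) and clean(r["response"])]
formatted = [format_example(r, eos_token="<|end_of_text|>") for r in kept]
print(formatted[0]["text"])
```

Appending the end-of-sequence token matters for a base model: without it, the fine-tuned model has no signal for where a response should stop.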
3. Fine-tuning Process:
- Define Training Arguments:
- Set hyperparameters like learning rate, batch size, and number of training epochs using TrainingArguments from transformers.
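A sketch of how these hyperparameters interact: the two helpers work out the effective batch size and an approximate number of optimizer steps, and `build_training_arguments` shows where the same values would go in `TrainingArguments`. The numbers are placeholders rather than recommendations, the helper names are made up, and the step count is an approximation that may differ slightly from what the Trainer reports.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, n_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * n_devices


def total_optimizer_steps(n_examples, effective_batch, epochs):
    """Approximate optimizer steps for the run, counting a final partial batch."""
    return math.ceil(n_examples / effective_batch) * epochs


def build_training_arguments(output_dir="outputs"):
    """Example hyperparameters; the values are placeholders, not recommendations."""
    from transformers import TrainingArguments  # lazy import: needs transformers + torch

    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=2e-4,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        logging_steps=10,
    )


batch = effective_batch_size(per_device_batch=2, grad_accum_steps=4)
print(batch)                                  # 8
print(total_optimizer_steps(1000, batch, 1))  # 125
```

Gradient accumulation is the usual way to reach a larger effective batch when GPU memory only allows a small per-device batch.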
- Fine-tuning Technique:
- Choose a fine-tuning technique:
- Supervised Fine-tuning (SFT): Train the model on your dataset using labeled examples where the desired outputs are provided. This is a common approach for tasks like text classification or question answering.
- Reinforcement Learning from Human Feedback (RLHF): Use human preference judgments to guide the model's learning process. This can be helpful for tasks where defining clear labels is difficult.
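One practical difference between the two techniques is the shape of the data each one consumes. The sketch below assumes the common convention of prompt/completion records for SFT and prompt/chosen/rejected records for human preference data; the class names and the sample pair are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SFTExample:
    """Supervised fine-tuning: a prompt with the desired output."""
    prompt: str
    completion: str


@dataclass
class PreferencePair:
    """Preference data: two candidate outputs ranked by a human."""
    prompt: str
    chosen: str
    rejected: str


def to_sft_example(pair):
    """Reuse the preferred answer of a preference pair as an SFT target."""
    return SFTExample(prompt=pair.prompt, completion=pair.chosen)


pair = PreferencePair(
    prompt="Summarize: The cat sat on the mat.",
    chosen="A cat sat on a mat.",
    rejected="Cats are animals.",
)
print(to_sft_example(pair))
```

SFT needs only the desired output, while preference data also records what a worse answer looks like, which is the extra signal RLHF-style methods learn from.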
- Training Loop:
- Implement a training loop that feeds your preprocessed data to the model and optimizes its parameters based on the chosen fine-tuning technique. A helper class such as SFTTrainer from the TRL library can streamline training.
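A minimal sketch of the training step with TRL's `SFTTrainer`, which subclasses `transformers.Trainer` and so accepts the same `TrainingArguments`. It assumes the dataset has a "text" column, that transformers and trl are installed, and that the GPU has enough memory for the chosen model. TRL's constructor arguments have changed between releases (older versions expect `dataset_text_field="text"` to be passed explicitly), so check the call against the installed version. The function name `run_sft` and the hyperparameter values are placeholders.

```python
def run_sft(model_id, train_dataset, output_dir="outputs"):
    """Fine-tune `model_id` on a dataset that has a "text" column.

    Nothing is imported or downloaded until the function is called.
    """
    from transformers import AutoModelForCausalLM, TrainingArguments
    from trl import SFTTrainer

    model = AutoModelForCausalLM.from_pretrained(model_id)
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
    )
    trainer = SFTTrainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()                 # runs the forward/backward/optimizer-step loop
    trainer.save_model(output_dir)  # writes the fine-tuned weights to disk
    return trainer
```

The trainer owns the loop itself, so there is no need to write batching, backpropagation, or checkpointing by hand.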
4. Evaluation and Refinement:
- Evaluate Performance:
- After training, assess the model's performance on a separate validation dataset relevant to your task. Metrics used for evaluation will depend on the specific task (e.g., accuracy for classification, BLEU score for machine translation).
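For a classification-style task, the evaluation described here can be sketched in plain Python: hold out a validation split, then score predictions by exact-match accuracy. The helper names, the 10% default split, and the sample labels are assumptions for illustration.

```python
import random


def train_validation_split(examples, validation_fraction=0.1, seed=0):
    """Shuffle a copy of `examples` and return (train, validation) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        raise ValueError("no examples to evaluate")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


train, val = train_validation_split(range(20), validation_fraction=0.2)
print(len(train), len(val))  # 16 4
print(accuracy(["yes", "no", "no", "yes"], ["yes", "no", "yes", "yes"]))  # 0.75
```

The validation examples must stay out of training entirely; otherwise the score overstates how well the model generalizes.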
- Refine the Model:
- Analyze the evaluation results. If performance is unsatisfactory, consider:
- Adjusting hyperparameters.
- Collecting more data.
- Trying a different fine-tuning technique.
5. Deployment:
- Once satisfied with the model's performance, you can deploy it for real-world use in your application. This might involve integrating it into a web service or mobile app.
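One way an application might talk to the deployed model, assuming it is served behind an OpenAI-compatible completions endpoint like the vLLM server shown earlier on this page (port 8000, route /v1/completions). The model name below is hypothetical, and the sketch uses only the standard library.

```python
import json
import urllib.request


def build_completion_request(prompt, model, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.5,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, model, base_url="http://localhost:8000"):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, model, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# "my-org/llama3-8b-finetuned" is a hypothetical model name.
req = build_completion_request("Once upon a time,", "my-org/llama3-8b-finetuned")
print(req.full_url)
```

Calling `complete(...)` requires a running server; building the request separately makes the payload easy to inspect or test without one.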
Additional Considerations:
- Computational Resources: Fine-tuning large models like Llama-3 8B can be computationally expensive. Ensure you have access to sufficient resources (GPUs, memory) for training.
- Data Quality: The quality and relevance of your dataset significantly impact the fine-tuning outcome. Focus on gathering high-quality data specific to your task.
- Ethical Considerations: Be mindful of potential biases in your data and the model's outputs. Consider implementing safeguards to mitigate bias and ensure responsible use of the fine-tuned model.
Thanks, didn't expect to randomly learn this looking at the community posts.
What happened to the plain HF transformers.Trainer() API? All I see now uses the TRL library.
I am currently working on it (with the health dataset) but facing a CUDA error. Hopefully, it will be resolved soon.
Here's the Colab Notebook: https://colab.research.google.com/drive/1TUa9J2J_1Sj-G7mQHX45fKzZtnW3s1vj?usp=sharing
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
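The message itself points at the first debugging step: rerun with CUDA_LAUNCH_BLOCKING=1 so the stack trace lines up with the failing call. One frequent cause of a device-side assert, though not necessarily the cause here, is an index that falls outside an embedding table, for example a token ID at or above the vocabulary size after adding tokens without resizing the embeddings. The sketch below sets the variable from Python and checks a list of IDs; the helper name and the numbers are hypothetical.

```python
import os

# Must be set before CUDA is initialized, so do it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def out_of_range_ids(ids, size):
    """Return (position, id) pairs where id is not a valid index into `size` rows."""
    return [(pos, i) for pos, i in enumerate(ids) if not 0 <= i < size]


# Hypothetical numbers: an embedding table with 10 rows and one bad token ID.
print(out_of_range_ids([1, 5, 10, 3], size=10))  # [(2, 10)]
```

Running the failing batch on CPU is another quick check, since the same problem there usually surfaces as a readable index error. The helper is meant for input IDs; label tensors may legitimately contain the ignore index -100.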
Does anyone know how I should structure my dataset to best fine-tune the model?