Exam-corrector: A Fine-tuned Llama 3 8B Model
Overview
Exam-corrector is a fine-tuned version of the Llama 3 8B model, adapted to grade written exam answers. It grades each student answer by comparing it against a model answer according to a set of predefined instructions. Fine-tuning was performed with LoRA (Low-Rank Adaptation).
Model Description
Exam-corrector is designed to provide consistent and fair grading for written answers in exams. It takes both a model answer (the best answer) and a student answer as inputs and returns a grade along with a brief explanation.
Instructions
The grading process follows these detailed instructions:
- The input always consists of two components: the Model Answer and the Student Answer.
- The Model Answer is used solely as a reference and does not receive any marks.
- Grades are assigned to the Student Answer based on its alignment with the Model Answer.
- Full marks are given to Student Answers that convey the complete meaning of the Model Answer, even if different words are used.
- Incomplete or irrelevant information results in deducted marks based on the answer's quality and completeness.
- A consistent marking technique is used to ensure the same answers always receive the same marks.
- Questions with no answer receive zero marks.
- Each grade comes with a one-line brief explanation of the mark.
Input Format
Model Answer:
{model_answer}
Student Answer:
{student_answer}
Output Format
Response:
{grade} {explanation}
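The two formats above can be wired together in a few lines. The following is a minimal sketch rather than the author's code: the helper names `build_prompt` and `parse_response` are hypothetical, the exact whitespace of the training prompt is an assumption, and the parser assumes the grade is the first whitespace-separated token of the response, which the model card does not specify.

```python
# Hypothetical helpers; the template layout is inferred from the
# Input Format and Output Format sections, not taken from training code.
PROMPT_TEMPLATE = (
    "Model Answer:\n{model_answer}\n"
    "Student Answer:\n{student_answer}\n"
    "Response:\n"
)


def build_prompt(model_answer: str, student_answer: str) -> str:
    """Fill the input template with a reference answer and a student answer."""
    return PROMPT_TEMPLATE.format(
        model_answer=model_answer.strip(),
        student_answer=student_answer.strip(),
    )


def parse_response(generated: str) -> tuple[str, str]:
    """Split generated text into (grade, explanation).

    ASSUMPTION: the grade is the first whitespace-separated token
    after the last "Response:" marker.
    """
    text = generated.rsplit("Response:", 1)[-1].strip()
    parts = text.split(None, 1)
    grade = parts[0] if parts else ""
    explanation = parts[1].strip() if len(parts) > 1 else ""
    return grade, explanation


if __name__ == "__main__":
    print(build_prompt("Water boils at 100 °C at sea level.", "It boils at 100 degrees."))
    # Illustrative string only, not real model output.
    print(parse_response("Response:\n4/5 Correct value but missing the pressure condition."))
```

Splitting on the last `Response:` marker lets the parser accept either the full decoded sequence (prompt plus completion) or the completion alone.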
Training Details
This model was fine-tuned using the LoRA (Low-Rank Adaptation) technique. The helper below reports how many of the model's parameters are trainable; its output for this model follows the code.
def print_number_of_trainable_model_parameters(model):
    # Count every parameter, and separately those that receive gradients.
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(model))
trainable model parameters: 167772160
all model parameters: 4708372480
percentage of trainable model parameters: 3.56%
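The reported counts can be cross-checked with simple arithmetic. The sketch below is an illustration under stated assumptions, not a description of the actual training configuration: it assumes LoRA adapters of rank 64 on all seven linear projections of every decoder layer, and 4-bit storage that packs two weights into each stored element (as bitsandbytes does). Neither is stated in the model card; they are simply one configuration that reproduces the numbers above. The layer dimensions are those of the published Meta-Llama-3-8B architecture.

```python
# Meta-Llama-3-8B architecture dimensions.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 dims per head
LAYERS = 32
VOCAB = 128256

# (in_features, out_features) of the linear projections in one decoder layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

RANK = 64  # ASSUMPTION: not stated in the model card


def lora_params(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each adapted projection."""
    per_layer = sum(rank * (i + o) for i, o in PROJECTIONS.values())
    return per_layer * LAYERS


def stored_params(rank: int, packing: int = 2) -> int:
    """Element count as numel() would report it with packed 4-bit weights.

    ASSUMPTION: only the decoder projections are quantized, with `packing`
    weights per stored element; embeddings, lm_head and norms stay unpacked.
    """
    linear = sum(i * o for i, o in PROJECTIONS.values()) * LAYERS
    embeddings = 2 * VOCAB * HIDDEN           # input embeddings + untied lm_head
    norms = LAYERS * 2 * HIDDEN + HIDDEN      # two RMSNorms per layer + final norm
    return linear // packing + embeddings + norms + lora_params(rank)


trainable = lora_params(RANK)
total = stored_params(RANK)
print(f"trainable: {trainable}")                      # 167772160
print(f"all:       {total}")                          # 4708372480
print(f"percent:   {100 * trainable / total:.2f}%")   # 3.56%
```

Under these assumptions the "all model parameters" figure counts packed storage elements, which is why it is well below the roughly 8 billion weights of the base model.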
Usage
To use this model for grading student answers, you can load it from Hugging Face and pass the appropriate inputs as shown in the example prompt.
Example
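from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MohamedMotaz/Examination-llama-8b-4bit")
model = AutoModelForCausalLM.from_pretrained("MohamedMotaz/Examination-llama-8b-4bit")

# Prompt template assembled from the Input Format and Output Format sections;
# the exact whitespace used during fine-tuning is an assumption.
prompt = "Model Answer:\n{model_answer}\nStudent Answer:\n{student_answer}\nResponse:\n"

model_answer = "The process of photosynthesis involves converting light energy into chemical energy."
student_answer = "Photosynthesis is when plants turn light into energy."

inputs = prompt.format(model_answer=model_answer, student_answer=student_answer)
input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(model.device)
# Limit the completion length; the expected output is a grade and a one-line explanation.
outputs = model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)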
Conclusion
Exam-corrector automates the grading of written exam answers by comparing each student answer against a model answer, with the goal of consistent and fair evaluation. Feel free to fine-tune it further or adapt it to other grading tasks.
Contact
For any issues, questions, or contributions, please reach out to me on LinkedIn.
Model tree for MohamedMotaz/Examination-llama-8b-4bit
Base model
meta-llama/Meta-Llama-3-8B