Instructions for using MohamedMotaz/Examination-llama-8b-4bit with libraries and local apps.

Libraries

How to use MohamedMotaz/Examination-llama-8b-4bit with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MohamedMotaz/Examination-llama-8b-4bit")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MohamedMotaz/Examination-llama-8b-4bit")
model = AutoModelForCausalLM.from_pretrained("MohamedMotaz/Examination-llama-8b-4bit")

Local Apps

vLLM

How to use MohamedMotaz/Examination-llama-8b-4bit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MohamedMotaz/Examination-llama-8b-4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MohamedMotaz/Examination-llama-8b-4bit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
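
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the default base URL assumes the vLLM server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "MohamedMotaz/Examination-llama-8b-4bit"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. For the SGLang server, pass `base_url="http://localhost:30000"` instead.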

Use Docker

docker model run hf.co/MohamedMotaz/Examination-llama-8b-4bit

SGLang

How to use MohamedMotaz/Examination-llama-8b-4bit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MohamedMotaz/Examination-llama-8b-4bit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MohamedMotaz/Examination-llama-8b-4bit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MohamedMotaz/Examination-llama-8b-4bit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MohamedMotaz/Examination-llama-8b-4bit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Unsloth Studio

How to use MohamedMotaz/Examination-llama-8b-4bit with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MohamedMotaz/Examination-llama-8b-4bit to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MohamedMotaz/Examination-llama-8b-4bit to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MohamedMotaz/Examination-llama-8b-4bit to start chatting

Load model with FastModel

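# Install first (run in a shell, not in Python): pip install unsloth
from unsloth import FastModel

# Load the model and its tokenizer with a 2048-token context window
model, tokenizer = FastModel.from_pretrained(
    model_name="MohamedMotaz/Examination-llama-8b-4bit",
    max_seq_length=2048,
)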

Docker Model Runner
How to use MohamedMotaz/Examination-llama-8b-4bit with Docker Model Runner:
```
docker model run hf.co/MohamedMotaz/Examination-llama-8b-4bit
```

Exam-corrector: A Fine-tuned Llama 3 8B Model

Overview

Exam-corrector is a fine-tuned version of the Llama 3 8B model (base model: meta-llama/Meta-Llama-3-8B), adapted to grade written exam answers. It grades a student answer by comparing it against a model answer under a set of predefined instructions. Fine-tuning was performed with LoRA (Low-Rank Adaptation).

Model Description

Exam-corrector is designed to provide consistent and fair grading for written answers in exams. It takes both a model answer (the best answer) and a student answer as inputs and returns a grade along with a brief explanation.

Instructions

The grading process follows these detailed instructions:

The input always consists of two components: the Model Answer and the Student Answer.
The Model Answer is used solely as a reference and does not receive any marks.
Grades are assigned to the Student Answer based on its alignment with the Model Answer.
Full marks are given to Student Answers that convey the complete meaning of the Model Answer, even if different words are used.
Incomplete or irrelevant information results in deducted marks based on the answer's quality and completeness.
A consistent marking technique is used to ensure the same answers always receive the same marks.
Questions with no answer receive zero marks.
Each grade comes with a one-line brief explanation of the mark.

Input Format

Model Answer:

{model_answer}

Student Answer:

{student_answer}
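
A minimal sketch of filling in this input format from Python. The helper name `build_prompt` is hypothetical, and the exact whitespace and the trailing `Response:` cue are assumptions: the card lists the field labels but not the literal template used during training.

```python
# Assumed template: field labels as listed in this card, separated by blank lines.
PROMPT_TEMPLATE = (
    "Model Answer:\n\n{model_answer}\n\n"
    "Student Answer:\n\n{student_answer}\n\n"
    "Response:\n"
)


def build_prompt(model_answer: str, student_answer: str) -> str:
    """Return the grading prompt for one (model answer, student answer) pair."""
    return PROMPT_TEMPLATE.format(
        model_answer=model_answer.strip(),
        # An empty student answer is kept as-is so the model can assign zero marks.
        student_answer=student_answer.strip(),
    )


example = build_prompt(
    "The process of photosynthesis involves converting light energy into chemical energy.",
    "Photosynthesis is when plants turn light into energy.",
)
print(example)
```

Named placeholders keep the two answers from being swapped by accident, which matters here because only the student answer is graded.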

Output Format

Response:

{grade} {explanation}
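
A minimal sketch of splitting this output into its two parts. It assumes the grade is the first whitespace-delimited token after the last `Response:` marker and that the explanation is the rest of that line; the card does not specify the grade notation, so the `8/10` used in the example is hypothetical, as are the names `Grade` and `parse_response`.

```python
from typing import NamedTuple, Optional


class Grade(NamedTuple):
    grade: str
    explanation: str


def parse_response(generated_text: str) -> Optional[Grade]:
    """Extract (grade, explanation) from the model output, or None if empty."""
    # Keep only the text after the last "Response:" marker; if the marker is
    # absent, rpartition returns the whole string as the tail.
    tail = generated_text.rpartition("Response:")[2].strip()
    if not tail:
        return None
    parts = tail.split(maxsplit=1)
    grade = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # The explanation is described as one line, so drop anything after it.
    explanation = rest.splitlines()[0].strip() if rest else ""
    return Grade(grade, explanation)


print(parse_response("Response:\n\n8/10 Covers the main idea but omits chemical energy."))
```

Returning `None` for empty output lets the caller distinguish a missing response from a zero mark.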

Training Details

This model was fine-tuned using the LoRA (Low-Rank Adaptation) technique. Below is a function to print the number of trainable parameters in the model:

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    # Use real newlines ("\n") so the three figures print on separate lines
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(model))

trainable model parameters: 167772160
all model parameters: 4708372480
percentage of trainable model parameters: 3.56%
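
As a sanity check on these figures, the sketch below counts the parameters LoRA would add to a Llama 3 8B model. The rank (64) and the set of adapted projection layers are assumptions: the card states neither, but this configuration reproduces the reported trainable count exactly. The layer shapes are those of the public Llama 3 8B configuration.

```python
# Llama 3 8B layer shapes (from the public model configuration)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 dims per head
NUM_LAYERS = 32

# (in_features, out_features) of each linear layer assumed to carry an adapter
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) per adapted layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in TARGETS.values())
    return NUM_LAYERS * per_layer


trainable = lora_param_count(rank=64)  # assumed rank
reported_total = 4708372480            # "all model parameters" reported above

print(trainable)
print(f"{100 * trainable / reported_total:.2f}%")
```

The reported total is well below 8B because 4-bit quantized weights are stored in packed form, so `param.numel()` counts fewer elements than the unquantized model has.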

Usage

To use this model for grading student answers, you can load it from Hugging Face and pass the appropriate inputs as shown in the example prompt.

Example

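from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "MohamedMotaz/Examination-llama-8b-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prompt template following the Input Format section. The exact template used
# in training is not given in this card, so the spacing here is an assumption.
prompt = "Model Answer:\n\n{}\n\nStudent Answer:\n\n{}\n\nResponse:\n"

model_answer = "The process of photosynthesis involves converting light energy into chemical energy."
student_answer = "Photosynthesis is when plants turn light into energy."

inputs = prompt.format(model_answer, student_answer)
input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(model.device)

# Greedy decoding keeps grades repeatable; max_new_tokens is an assumed budget.
outputs = model.generate(input_ids, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(response)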

Conclusion

Exam-corrector automates the grading of written exam answers against a model answer, with the aim of consistent and fair evaluation. Feel free to fine-tune it further or adapt it to other grading tasks.

Contact

For any issues, questions, or contributions, please reach out to me on LinkedIn.

Model details

Format: Safetensors
Model size: 8B params
Tensor type: F16, F32
Base model: meta-llama/Meta-Llama-3-8B (this model is a quantized derivative)