Libraries

How to use malteos/hermeo-7b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="malteos/hermeo-7b")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("malteos/hermeo-7b")
model = AutoModelForCausalLM.from_pretrained("malteos/hermeo-7b")

Local Apps

vLLM

How to use malteos/hermeo-7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "malteos/hermeo-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "malteos/hermeo-7b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
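For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch rather than part of the original instructions: the helper names `build_completion_request` and `complete` are hypothetical, and it assumes the server started above listens on `http://localhost:8000` and answers with the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, model="malteos/hermeo-7b",
                             max_tokens=512, temperature=0.5):
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-compatible response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("http://localhost:8000", "Once upon a time,")` should return the generated continuation. Because only the base URL changes, the same helper should work against other OpenAI-compatible endpoints as well.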

Use Docker

docker model run hf.co/malteos/hermeo-7b

SGLang

How to use malteos/hermeo-7b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "malteos/hermeo-7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "malteos/hermeo-7b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "malteos/hermeo-7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "malteos/hermeo-7b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'


Hermes + Leo = Hermeo

Hermeo-7B

A German-English language model merged from DPOpenHermes-7B-v2 and leo-mistral-hessianai-7b-chat using mergekit. Both base models are fine-tuned versions of Mistral-7B-v0.1.

Model details

Merged from: leo-mistral-hessianai-7b-chat and DPOpenHermes-7B-v2
Model type: Causal decoder-only transformer language model
Languages: English and German
License: Apache 2.0

How to use

You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='malteos/hermeo-7b')
>>> set_seed(42)
>>> generator("Hallo, Ich bin ein Sprachmodell,", max_length=40, num_return_sequences=1)
[{'generated_text': 'Hallo, Ich bin ein Sprachmodell, das dir bei der Übersetzung von Texten zwischen Deutsch und Englisch helfen kann. Wenn du mir einen Text in Deutsch'}]

Acknowledgements

This model release is heavily inspired by Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp.
Thanks to the authors of the base models: Mistral, LAION, HessianAI, Open Access AI Collective, @teknium, @bjoernp.
The German evaluation used the datasets and scripts from @bjoernp.
The evaluation ran on computing resources from DFKI's PEGASUS cluster.

Evaluation

The evaluation follows the methodology of the Open LLM Leaderboard.

German benchmarks

German tasks:	MMLU-DE	Hellaswag-DE	ARC-DE	Average
Models / Few-shots:	(5 shots)	(10 shots)	(24 shots)
7B parameters
llama-2-7b	0.400	0.513	0.381	0.431
leo-hessianai-7b	0.400	0.609	0.429	0.479
bloom-6b4-clp-german	0.274	0.550	0.351	0.392
mistral-7b	0.524	0.588	0.473	0.528
leo-mistral-hessianai-7b	0.481	0.663	0.485	0.543
leo-mistral-hessianai-7b-chat	0.458	0.617	0.465	0.513
DPOpenHermes-7B-v2	0.517	0.603	0.515	0.545
hermeo-7b (this model)	0.511	0.668	0.528	0.569
13B parameters
llama-2-13b	0.469	0.581	0.468	0.506
leo-hessianai-13b	0.486	0.658	0.509	0.551
70B parameters
llama-2-70b	0.597	0.674	0.561	0.611
leo-hessianai-70b	0.653	0.721	0.600	0.658

English benchmarks

English tasks:	MMLU	Hellaswag	ARC	Average
Models / Few-shots:	(5 shots)	(10 shots)	(24 shots)
llama-2-7b	0.466	0.786	0.530	0.594
leolm-hessianai-7b	0.423	0.759	0.522	0.568
bloom-6b4-clp-german	0.264	0.525	0.328	0.372
mistral-7b	0.635	0.832	0.607	0.691
leolm-mistral-hessianai-7b	0.550	0.777	0.518	0.615
hermeo-7b (this model)	0.601	0.821	0.620	0.681
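The Average column in both tables appears to be the unweighted mean of the three task scores, rounded to three decimals. The source does not state this, but the numbers are consistent with it. As a quick check, the sketch below recomputes the hermeo-7b averages from the rows above; the helper name `average_score` is hypothetical.

```python
def average_score(task_scores):
    """Unweighted mean of per-task scores, rounded to three decimals."""
    return round(sum(task_scores) / len(task_scores), 3)


# hermeo-7b rows copied from the German and English tables.
german = {"MMLU-DE": 0.511, "Hellaswag-DE": 0.668, "ARC-DE": 0.528}
english = {"MMLU": 0.601, "Hellaswag": 0.821, "ARC": 0.620}

german_avg = average_score(list(german.values()))
english_avg = average_score(list(english.values()))
print(german_avg, english_avg)
```

The two results match the reported averages of 0.569 (German) and 0.681 (English).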

Prompting / Prompt Template

Prompt dialogue template (ChatML format):

"""
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""

The model input can contain multiple conversation turns between user and assistant, for example:

<|im_start|>user
{prompt 1}<|im_end|>
<|im_start|>assistant
{reply 1}<|im_end|>
<|im_start|>user
{prompt 2}<|im_end|>
<|im_start|>assistant
(...)
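The template above can be assembled programmatically. The following is a minimal sketch, not the author's code: `build_chatml_prompt` is a hypothetical helper, and it assumes that each turn is closed by `<|im_end|>` followed by a newline and that the prompt ends with an open assistant turn for the model to complete.

```python
def build_chatml_prompt(messages, system_message=None):
    """Render (role, content) turns into a ChatML prompt string.

    Hypothetical helper: assumes every turn ends with <|im_end|> plus a
    newline, and that the prompt finishes with an open assistant turn.
    """
    parts = []
    if system_message is not None:
        parts.append(f"<|im_start|>system\n{system_message}<|im_end|>\n")
    for role, content in messages:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    ("user", "Was ist die Hauptstadt von Deutschland?"),
    ("assistant", "Die Hauptstadt von Deutschland ist Berlin."),
    ("user", "And what is the capital of France?"),
]
prompt = build_chatml_prompt(
    conversation, system_message="You are a helpful assistant."
)
print(prompt)
```

The resulting string can be passed as the prompt to the text-generation pipeline or as the `prompt` field of a completions request. Stopping generation at `<|im_end|>` is likely the natural choice, since that token closes each turn in this format.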

License

Apache 2.0
