RZ412/PokerBench
Llama 3.1 8B Instruct fine-tuned with LoRA for poker decision-making, trained on the PokerBench dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("YiPz/llama3-8b-pokerbench-sft")
tokenizer = AutoTokenizer.from_pretrained("YiPz/llama3-8b-pokerbench-sft")
messages = [
{"role": "system", "content": "You are an expert poker player. Respond with your action in <action></action> tags."},
{"role": "user", "content": "Your poker scenario..."}
]
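# Tokenize the chat; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True, return_dict=True)
# temperature only takes effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))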
Actions are returned in <action></action> tags:
<action>fold</action>
<action>call</action>
<action>check</action>
<action>raise 15</action>
<action>bet 10</action>
Quantized GGUF versions for llama.cpp/Ollama: YiPz/llama3-8b-pokerbench-sft-gguf
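The <action></action> output format shown above lends itself to simple post-processing. The following is a minimal sketch of one way to extract the action from the generated text. The helper name parse_action and the regular expression are illustrative assumptions and not part of the model or its repository, and the pattern only covers the five action forms listed above.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: one of the five listed verbs, optionally followed by an amount.
ACTION_RE = re.compile(
    r"<action>\s*(fold|call|check|raise|bet)(?:\s+(\d+(?:\.\d+)?))?\s*</action>",
    re.IGNORECASE,
)

def parse_action(text: str) -> Optional[Tuple[str, Optional[float]]]:
    """Return (verb, amount) for the first well-formed action tag, or None."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    verb, amount = match.group(1), match.group(2)
    return verb.lower(), float(amount) if amount is not None else None
```

For example, parse_action("<action>raise 15</action>") yields the verb "raise" with amount 15.0, while text without a well-formed tag yields None, so a caller can fall back to a default action or retry generation.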
Subject to Llama 3 license.