gemma3-270m-pred-sft

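A lightweight Gemma 3 270M model (SFT version) optimized for Japanese IME use, that is, keyboard predictive conversion.
Given input text containing the boundary marker [---], it generates the words that follow the boundary as the predictive-conversion output.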

Base model: google/gemma-3-270m-it
Training: SFT (supervised fine-tuning)
Role: baseline for gemma3-270m-pred-dpo (the comparison target for the DPO version)

The DPO version is available here:
katsukiono/gemma3-270m-pred-dpo

Distributed files

Transformers (Hugging Face)

model.safetensors and related files at the repository root (Transformers format)

GGUF (under `gguf/`)

gguf/gemma3-270m-pred-sft-f16.gguf
gguf/gemma3-270m-pred-sft-q4_k_m.gguf

Input format (important)

The fixed prompt is as follows (keep it in Japanese, exactly as written).

キーボードの予測変換として[---]に続く言葉を予測変換してください。[---]より前はこれまでのユーザー入力です。
ユーザー入力と予測変換の間には境界 [---]を入れてください。

ーーーー以下が予測変換対象ーーーー

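In English, the fixed prompt reads roughly: "As keyboard predictive conversion, predict the words that follow [---]. Everything before [---] is the user's input so far. Insert the boundary [---] between the user input and the prediction." The separator line means "the prediction target follows".
After the line ーーーー以下が予測変換対象ーーーー, append the user input, the boundary [---], and the beginning of the uncommitted text (the partially typed word).
The output is expected to be a single line containing [---] (in practice, using only the first line is simple and stable).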
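The input format can be assembled programmatically. The following is a minimal sketch, not the author's code: `FIXED_PROMPT`, `BOUNDARY`, and `build_prompt` are hypothetical names, and rejecting inputs that already contain the boundary marker is an assumption added here for safety.

```python
# Fixed instruction text, copied verbatim from the model card.
FIXED_PROMPT = (
    "キーボードの予測変換として[---]に続く言葉を予測変換してください。"
    "[---]より前はこれまでのユーザー入力です。\n"
    "ユーザー入力と予測変換の間には境界 [---]を入れてください。\n"
    "\n"
    "ーーーー以下が予測変換対象ーーーー\n"
)
BOUNDARY = "[---]"


def build_prompt(committed: str, partial: str) -> str:
    """Hypothetical helper: fixed prompt + user input + boundary + partial word."""
    # Assumption: a literal boundary marker in the input would make the
    # output ambiguous to parse, so it is rejected up front.
    if BOUNDARY in committed or BOUNDARY in partial:
        raise ValueError("input must not contain the boundary marker")
    return f"{FIXED_PROMPT}{committed}{BOUNDARY}{partial}"


prompt = build_prompt("これに関してはどんな", "もて")
print(prompt.splitlines()[-1])  # これに関してはどんな[---]もて
```

The resulting string is what goes into the user turn of the chat template.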

Example

Input (user):

キーボードの予測変換として[---]に続く言葉を予測変換してください。[---]より前はこれまでのユーザー入力です。
ユーザー入力と予測変換の間には境界 [---]を入れてください。

ーーーー以下が予測変換対象ーーーー
これに関してはどんな[---]もて

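Output (model):

これに関してはどんな[---]問題/でしょ/う/か/？

Splitting the text after "[---]" on / yields a list of candidates.
For IME integration, the simplest approach is to extract only the part after [---] and pass it to the UI.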
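One way an IME UI might present the split segments is as progressively longer completions. This is a sketch under that assumption only; the model card does not prescribe how segments are shown, and `cumulative_completions` is a hypothetical helper name.

```python
from itertools import accumulate

BOUNDARY = "[---]"


def cumulative_completions(line: str) -> list[str]:
    """Hypothetical helper: turn 'a/b/c' after the boundary into ['a', 'ab', 'abc']."""
    # Keep only the text after the first boundary marker.
    _, _, tail = line.partition(BOUNDARY)
    # Split on "/" and drop empty segments.
    segments = [s for s in tail.split("/") if s]
    # Concatenate segments step by step to get growing completions.
    return list(accumulate(segments))


print(cumulative_completions("これに関してはどんな[---]問題/でしょ/う/か/？"))
# ['問題', '問題でしょ', '問題でしょう', '問題でしょうか', '問題でしょうか？']
```

If the boundary is missing, `partition` leaves the tail empty and the function returns an empty list.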

Usage (Transformers)

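import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "katsukiono/gemma3-270m-pred-sft"

# Load the tokenizer; fall back to EOS as the pad token if none is defined.
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# Load the model in its stored dtype and place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
).eval()

# Fixed prompt (kept verbatim in Japanese) followed by the prediction target line.
prompt = """キーボードの予測変換として[---]に続く言葉を予測変換してください。[---]より前はこれまでのユーザー入力です。
ユーザー入力と予測変換の間には境界 [---]を入れてください。

ーーーー以下が予測変換対象ーーーー
これに関してはどんな[---]もて
"""

# Wrap the prompt as a single user turn and render it with the chat template.
chat = [{"role": "user", "content": prompt}]
text = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# Greedy decoding, so the output is deterministic.
with torch.inference_mode():
    out = model.generate(
        **inputs,
        do_sample=False,
        max_new_tokens=96,
    )

# Decode only the newly generated tokens and keep the first line.
gen = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
lines = gen.splitlines()
print(lines[0].strip() if lines else "")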

Usage (GGUF / llama.cpp)

./llama-cli \
  -m gguf/gemma3-270m-pred-sft-q4_k_m.gguf \
  -p "(fixed prompt above + input)" \
  -n 96 \
  --temp 0

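Recommended post-processing (operational notes)

Generation may span multiple lines, so the simplest approach is to use only the first line.
The output may occasionally omit "[---]", so checking for the presence of [---] is recommended in production.
Example of obtaining the candidate list:
- Split the output with split("[---]", 1), then split the second part with split("/") and drop empty elements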
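The notes above can be combined into a single post-processing step. This is a minimal sketch rather than the author's implementation: `extract_candidates` is a hypothetical name, and returning `None` when the boundary is missing is one possible way to signal a failed generation.

```python
from typing import Optional

BOUNDARY = "[---]"


def extract_candidates(raw: str) -> Optional[list[str]]:
    """Hypothetical helper: first line -> boundary check -> candidate list."""
    # Use only the first line of the generation (leading blank lines are skipped).
    lines = raw.strip().splitlines()
    if not lines:
        return None
    first = lines[0].strip()
    # The boundary may occasionally be missing; report that as None.
    if BOUNDARY not in first:
        return None
    # Split once on the boundary, then split the tail on "/" and drop empties.
    tail = first.split(BOUNDARY, 1)[1]
    return [c.strip() for c in tail.split("/") if c.strip()]


print(extract_candidates("これに関してはどんな[---]問題/でしょ/う/か/？"))
# ['問題', 'でしょ', 'う', 'か', '？']
```

A caller can treat `None` as "no suggestion" and, for example, retry or fall back to another suggestion source.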

License / terms of use

This model is based on google/gemma-3-270m-it.
Before using it, review the base model's terms (the Gemma terms of use / license).


Safetensors

Model size: 0.3B params
Tensor type: BF16
