Instructions to use McaTech/Nonet with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use McaTech/Nonet with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="McaTech/Nonet",
    filename="ChatNONET-135m-tuned-q8_0.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
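The llama-cpp-python call shown above returns an OpenAI-style completion dict rather than a plain string. A minimal sketch of extracting the answer text (the response below is illustrative, with a made-up answer):

```python
# Illustrative OpenAI-style response, in the shape that
# llama-cpp-python's create_chat_completion returns
# (the answer content here is made up):
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris"},
            "finish_reason": "stop",
        }
    ],
}

# The generated text lives under choices[0]["message"]["content"]:
answer = response["choices"][0]["message"]["content"]
print(answer)  # → Paris
```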
- Local Apps
- llama.cpp
How to use McaTech/Nonet with llama.cpp:
Install with Homebrew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf McaTech/Nonet:Q8_0

# Run inference directly in the terminal:
llama-cli -hf McaTech/Nonet:Q8_0
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf McaTech/Nonet:Q8_0

# Run inference directly in the terminal:
llama-cli -hf McaTech/Nonet:Q8_0
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf McaTech/Nonet:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf McaTech/Nonet:Q8_0
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf McaTech/Nonet:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf McaTech/Nonet:Q8_0
```
Use Docker
```shell
docker model run hf.co/McaTech/Nonet:Q8_0
```
- LM Studio
- Jan
- vLLM
How to use McaTech/Nonet with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "McaTech/Nonet"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "McaTech/Nonet",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/McaTech/Nonet:Q8_0
```
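The same request the curl example sends can be made from Python's standard library; a minimal sketch against the vLLM server started above (assumes it is running on localhost:8000; `build_payload` and `ask` are illustrative helper names, not part of vLLM):

```python
import json
import urllib.request

def build_payload(question: str) -> dict:
    """OpenAI-style chat request body, matching the curl example above."""
    return {
        "model": "McaTech/Nonet",
        "messages": [{"role": "user", "content": question}],
    }

def ask(question: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST one chat request to the running server and return the answer."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the same helper works against any of the local servers on this page by changing `base_url`.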
- Ollama
How to use McaTech/Nonet with Ollama:
```shell
ollama run hf.co/McaTech/Nonet:Q8_0
```
- Unsloth Studio
How to use McaTech/Nonet with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for McaTech/Nonet to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for McaTech/Nonet to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required.
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for McaTech/Nonet to start chatting
```
- Docker Model Runner
How to use McaTech/Nonet with Docker Model Runner:
```shell
docker model run hf.co/McaTech/Nonet:Q8_0
```
- Lemonade
How to use McaTech/Nonet with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull McaTech/Nonet:Q8_0
```
Run and chat with the model
```shell
lemonade run user.Nonet-Q8_0
```
List all available models
```shell
lemonade list
```
NONET
NONET is a family of offline, quantized large language models fine-tuned for question answering with direct, concise answers. Designed for local execution using llama.cpp, NONET is available in multiple sizes and optimized for Android or Python-based environments.
Model Details
Model Description
NONET is intended for lightweight offline use, particularly on local devices like mobile phones or single-board computers. The models have been fine-tuned for direct-answer QA and quantized to int8 (q8_0) using llama.cpp.
| Model Name | Base Model | Size |
|---|---|---|
| ChatNONET-135m-tuned-q8_0.gguf | Smollm | 135M |
| ChatNONET-300m-tuned-q8_0.gguf | Smollm | 300M |
| ChatNONET-1B-tuned-q8_0.gguf | LLaMA 3.2 | 1B |
| ChatNONET-3B-tuned-q8_0.gguf | LLaMA 3.2 | 3B |
- Developed by: McaTech (Michael Cobol Agan)
- Model type: Causal decoder-only transformer
- Languages: English
- License: Apache 2.0
- Finetuned from:
- Smollm (135M, 300M variants)
- LLaMA 3.2 (1B, 3B variants)
Uses
Direct Use
- Offline QA chatbot
- Local assistants (no internet required)
- Embedded Android or Python apps
Out-of-Scope Use
- Long-form text generation
- Tasks requiring real-time web access
- Creative storytelling or coding tasks
Bias, Risks, and Limitations
NONET may reproduce biases present in its base models or fine-tuning data. Outputs should not be relied upon for sensitive or critical decisions.
Recommendations
- Validate important responses
- Choose model size based on your device capability
- Avoid over-reliance for personal or legal advice
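As a rough rule of thumb for matching model size to device capability: a q8_0 GGUF stores about one byte per weight, so the file (and the RAM it needs, before context overhead) is roughly the parameter count in bytes. A sketch of that estimate (the 1.2 overhead factor is an assumption for scales and KV cache, not a measured value):

```python
# Rough q8_0 memory estimate: ~1 byte per parameter for weights,
# times a fudge factor for per-block scales and the KV cache.
# Parameter counts are taken from the model table above.
PARAMS = {
    "ChatNONET-135m-tuned-q8_0.gguf": 135e6,
    "ChatNONET-300m-tuned-q8_0.gguf": 300e6,
    "ChatNONET-1B-tuned-q8_0.gguf": 1e9,
    "ChatNONET-3B-tuned-q8_0.gguf": 3e9,
}

def approx_ram_gb(n_params: float, overhead: float = 1.2) -> float:
    """Approximate RAM in GB: 1 byte/param times an overhead factor."""
    return n_params * overhead / 1e9

for name, n in PARAMS.items():
    print(f"{name}: ~{approx_ram_gb(n):.2f} GB")
```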
How to Get Started with the Model
For Android Devices
- Try the Android app from my GitHub: Download ChatNONET APK
You can also build llama.cpp yourself and run the model:
```shell
# Clone llama.cpp and build it (current llama.cpp uses CMake)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-cli

# Run the model in conversation mode
./build/bin/llama-cli -m ./ChatNONET-300m-tuned-q8_0.gguf -p "You are ChatNONET AI assistant." -cnv
```
Training Details
- Finetuning Goal: Direct-answer question answering
- Precision: FP16 mixed precision
- Frameworks: PyTorch, Transformers, Bitsandbytes
- Quantization: int8 GGUF (q8_0) via llama.cpp
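The q8_0 scheme can be illustrated in a few lines. This is a simplified sketch of ggml's q8_0 format, in which each block of 32 weights shares one scale and each weight is stored as a signed 8-bit integer (ggml stores the scale as fp16; that detail is omitted here):

```python
# Simplified q8_0 round trip for one block of 32 float weights:
# scale d = max(|x|) / 127, quantized q = round(x / d) as int8,
# dequantized x' = q * d.

def quantize_q8_0(block):
    amax = max(abs(x) for x in block)
    d = amax / 127 if amax else 0.0
    q = [round(x / d) if d else 0 for x in block]
    return d, q

def dequantize_q8_0(d, q):
    return [d * v for v in q]

block = [(-1) ** i * (i / 31) for i in range(32)]  # toy weights in [-1, 1]
d, q = quantize_q8_0(block)
recon = dequantize_q8_0(d, q)

# Rounding to the nearest int8 step bounds the error by d / 2:
max_err = max(abs(a - b) for a, b in zip(block, recon))
print(f"max abs error: {max_err:.5f}")
```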
Evaluation
- Evaluated internally on short QA prompts
- Capable of direct factual or logical answers
- Larger models perform better on reasoning tasks
Technical Specifications
Architecture:
- Smollm (135M, 300M)
- LLaMA 3.2 (1B, 3B)
Format: GGUF
Quantization: q8_0 (int8)
Deployment: Mobile (Android) and desktop via llama.cpp
Citation
```bibtex
@misc{chatnonet2025,
  title={ChatNONET: Offline Quantized Q&A Models},
  author={Michael Cobol Agan},
  year={2025},
  note={\url{https://huggingface.co/McaTech/Nonet}},
}
```
Contact
- Author: Michael Cobol Agan (McaTech)
- Facebook: FB Profile