How to use from llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf CBrootA/Qwen-MediCare-BD:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf CBrootA/Qwen-MediCare-BD:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf CBrootA/Qwen-MediCare-BD:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf CBrootA/Qwen-MediCare-BD:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf CBrootA/Qwen-MediCare-BD:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf CBrootA/Qwen-MediCare-BD:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf CBrootA/Qwen-MediCare-BD:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf CBrootA/Qwen-MediCare-BD:Q4_K_M
Use Docker
docker model run hf.co/CBrootA/Qwen-MediCare-BD:Q4_K_M
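
Once llama-server is running, any OpenAI-style client can talk to it. The following is a minimal sketch using only the Python standard library. It assumes the server listens on its default local address (http://localhost:8080) and exposes the /v1/chat/completions route; the helper names build_chat_payload and ask are hypothetical and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port (8080).
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_payload(question, max_tokens=256):
    """Build an OpenAI-style chat completion request body."""
    return {
        "messages": [
            {"role": "system", "content": "You are a medical assistant for Bangladesh."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
    }


def ask(question, url=SERVER_URL, timeout=120):
    """POST the question to the server and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        data = json.load(reply)
    return data["choices"][0]["message"]["content"]
```

With the server started as shown above, calling ask("What are dengue symptoms?") should return the model's answer as a string. If the server was started on a different host or port, pass the matching url argument.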

🏥 Qwen-MediCare-BD

Bangladesh's First Offline Medical AI Assistant

Model Description

Qwen-MediCare-BD-3B is a medical language model fine-tuned from Qwen2.5-3B-Instruct on Bangladesh-specific medical data. It is designed to provide accurate medical information fully offline, which makes it well suited to regions with limited internet connectivity.

Key Features

  • 🇧🇩 Bangladesh-specific: Includes local diseases, drugs, and medical context
  • 📱 Mobile-ready: Quantized to Q4_K_M (1.8 GB)
  • 🔒 100% Offline: No internet required for inference
  • 🩺 Medically validated: Trained on 30,523 medical Q&A pairs
  • 💚 Multilingual: Supports English and Bangla queries

Model Variants

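| Variant       | Size   | Format      | Use Case            |
|---------------|--------|-------------|---------------------|
| Full Model    | 6.2 GB | Safetensors | Training/Research   |
| Q4_K_M        | 1.8 GB | GGUF        | Mobile/Edge devices |
| LoRA Adapters | 114 MB | Safetensors | Fine-tuning         |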
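
As a rough sanity check on the sizes above, the average storage per parameter can be estimated from file size and parameter count. This is only a sketch: it assumes the rounded figure of 3B parameters and treats 1 GB as 10^9 bytes, so the results are approximate. The function name bits_per_weight is hypothetical.

```python
def bits_per_weight(file_size_gb, params_billions):
    """Average storage per parameter, in bits (1 GB taken as 10**9 bytes)."""
    return file_size_gb * 8 / params_billions


# Assumption: rounded parameter count; the exact figure is slightly different.
PARAMS_B = 3.0

for name, size_gb in [("Full Model", 6.2), ("Q4_K_M", 1.8)]:
    print(f"{name}: {bits_per_weight(size_gb, PARAMS_B):.1f} bits per weight")
```

Under these assumptions the full model comes out at about 16.5 bits per weight, close to the nominal 16 bits of F16 (the excess most likely reflects the rounded parameter count), and the Q4_K_M file at about 4.8 bits per weight.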

Quick Start

Using Transformers

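from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "CBrootA/Qwen-MediCare-BD"

# 4-bit loading requires the bitsandbytes package and a CUDA-capable GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a medical assistant for Bangladesh."},
    {"role": "user", "content": "What are dengue symptoms?"},
]

# add_generation_prompt=True appends the assistant header so the model
# writes a reply instead of continuing the user turn.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)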
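
To make the chat template step less opaque, here is a simplified sketch of the prompt string it produces. It assumes this model keeps the ChatML layout used by the Qwen2.5 base models; the real template also handles tool calls and a default system prompt, which are omitted here. The function render_chatml is a hypothetical illustration, not a Transformers API.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages as a ChatML prompt string (simplified)."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model generates the reply next.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a medical assistant for Bangladesh."},
    {"role": "user", "content": "What are dengue symptoms?"},
]
prompt = render_chatml(messages)
print(prompt)
```

The same string can be useful when passing a raw prompt to a runtime that does not apply the template itself. For real use, prefer tokenizer.apply_chat_template, which reads the template shipped with the model.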

Model size: 3B params (Safetensors, tensor type F16)

Model tree for CBrootA/Qwen-MediCare-BD

Base model: Qwen/Qwen2.5-3B
