| --- |
| title: NurseSim Triage |
| emoji: 🏥 |
| colorFrom: blue |
| colorTo: indigo |
| sdk: docker |
| pinned: false |
| --- |
| |
| # NurseSim-RL: A Healthcare Agent Environment for Clinical Triage |
|
|
| [AgentBeats Benchmark](https://agentbeats.dev/ClinyQAi/nursesim-triage) |
| [AgentX-AgentBeats Competition](https://rdi.berkeley.edu/agentx-agentbeats) |
| [Model on Hugging Face](https://huggingface.co/NurseCitizenDeveloper/NurseSim-Triage-Llama-3.2-3B) |
| [W&B Training Runs](https://wandb.ai/mrlincs-nursing-citizen-development/huggingface) |
| [License](LICENSE) |
|
|
| > **OpenEnv Challenge Entry** | Berkeley RDI AgentX-AgentBeats Competition |
| > A Gymnasium-compatible RL environment for training AI agents to perform clinical triage using the Manchester Triage System (MTS). |
|
|
|
|
| ## Overview |
|
|
| **NurseSim-RL** simulates the decision-making process of a Triage Nurse in an Accident & Emergency (A&E) department. The agent must assess patients based on their chief complaint and vital signs, then assign an appropriate triage category (1-5) according to the Manchester Triage System. |
|
|
| ### Key Features |
| - **Gymnasium-Compatible:** Standard RL interface for easy integration. |
| - **Expanded Dataset:** Trained on **2,100+** synthetic patient scenarios across all 5 MTS categories. |
| - **Safety-Aware Rewards:** Heavy penalties for under-triaging critical patients. |
| - **Fine-Tuned Agent:** Llama 3.2 3B trained with Unsloth (4-bit QLoRA) - **60% accuracy validated**. |
| - **NEW: Semantic RL Mode:** NurseEmbed-powered text embeddings for language-conditioned agents. |
| - **Age-Aware Triage:** Demographic parsing for accurate risk stratification. |
| - **A2A Protocol:** Agent-to-Agent evaluation via AgentBeats platform. |
| - **Docker Deployment:** Fully containerized for reproducibility. |
| - **Dual Mode:** Runs as an interactive demo (Gradio) or as an API server (A2A). |
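
The safety-aware reward listed above can be pictured with a small asymmetric scoring function. This is a minimal sketch only: the function name `triage_reward` and both penalty weights are hypothetical and are not taken from the NurseSim-RL source. It relies on one fact from the Manchester Triage System: category 1 is the most urgent, so assigning a larger number than the true category is under-triage.

```python
def triage_reward(true_category: int, assigned_category: int,
                  under_triage_penalty: float = 2.0,
                  over_triage_penalty: float = 0.5) -> float:
    """Score one triage decision (illustrative weights, not the project's).

    MTS categories run from 1 (most urgent) to 5 (least urgent), so an
    assigned number larger than the true one means under-triage.
    """
    for value in (true_category, assigned_category):
        if value not in range(1, 6):
            raise ValueError(f"MTS category must be 1-5, got {value}")

    gap = assigned_category - true_category
    if gap == 0:
        return 1.0                           # exact match
    if gap > 0:
        return -under_triage_penalty * gap   # under-triage: heavy penalty
    return -over_triage_penalty * (-gap)     # over-triage: light penalty


print(triage_reward(1, 1))  # 1.0
print(triage_reward(1, 3))  # -4.0
print(triage_reward(3, 1))  # -1.0
```

With these assumed weights, missing a category 1 patient by two levels costs four times as much as over-triaging by the same margin.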
|
|
| ## Quick Start |
|
|
| ### Run with Docker |
|
|
| ```bash |
| # Pull the image |
| docker pull nursecitizendeveloper/nursesim-triage:latest |
| |
| # Run in demo mode (Gradio UI) |
| docker run -p 7860:7860 nursecitizendeveloper/nursesim-triage:latest |
| |
| # Run in A2A mode (API only) |
| docker run -e MODE=a2a -p 7860:7860 nursecitizendeveloper/nursesim-triage:latest |
| ``` |
|
|
| ### Test the A2A Endpoint |
|
|
| ```bash |
| # Health check |
| curl https://nursecitizendeveloper-nursesim-triage-demo.hf.space/health |
| |
| # Get agent card |
| curl https://nursecitizendeveloper-nursesim-triage-demo.hf.space/.well-known/agent-card.json |
| |
| # Submit a task |
| curl -X POST https://nursecitizendeveloper-nursesim-triage-demo.hf.space/process-task \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "complaint": "Chest pain", |
| "vitals": { |
| "heart_rate": 110, |
| "blood_pressure": "90/60", |
| "spo2": 94, |
| "temperature": 37.2 |
| } |
| }' |
| ``` |
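
The same task submission can be sketched in Python using only the standard library. The helper names `build_task_payload` and `submit_task` are hypothetical, and the sketch assumes the endpoint replies with a JSON body, which the curl example does not confirm. The payload shape and URL are copied from the commands above.

```python
import json
import urllib.request

BASE_URL = "https://nursecitizendeveloper-nursesim-triage-demo.hf.space"


def build_task_payload(complaint, heart_rate, blood_pressure, spo2, temperature):
    """Assemble the request body in the shape used by the curl example."""
    return {
        "complaint": complaint,
        "vitals": {
            "heart_rate": heart_rate,
            "blood_pressure": blood_pressure,
            "spo2": spo2,
            "temperature": temperature,
        },
    }


def submit_task(payload, base_url=BASE_URL, timeout=30):
    """POST the payload to /process-task; assumes the reply is JSON."""
    request = urllib.request.Request(
        url=f"{base_url}/process-task",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_task_payload("Chest pain", 110, "90/60", 94, 37.2)
print(json.dumps(payload, indent=2))
# Uncomment to call the live endpoint (needs network access):
# print(submit_task(payload))
```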
|
|
| ## Project Structure |
|
|
| ``` |
| NurseSim-RL/ |
| ├── nursesim_rl/ # Core environment package |
| │   ├── __init__.py |
| │   ├── TriageEnv.py # Gymnasium environment |
| │   └── PatientGenerator.py # Synthetic patient generation |
| ├── notebooks/ |
| │   └── NurseSim_RL_Unsloth_Training.ipynb # Training notebook |
| ├── data/ |
| │   ├── train.jsonl # Training dataset (500 examples) |
| │   └── val.jsonl # Validation dataset (100 examples) |
| ├── app.py # Gradio demo application |
| ├── Dockerfile # For reproducibility |
| ├── requirements.txt |
| └── README.md |
| ``` |
|
|
| ## Quick Start |
|
|
| ### Installation |
|
|
| ```bash |
| git clone https://github.com/NurseCitizenDeveloper/NurseSim-RL.git |
| cd NurseSim-RL |
| pip install -r requirements.txt |
| ``` |
|
|
| ### Using the Environment |
|
|
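| ```python |
| import gymnasium as gym |
| # Importing the package is assumed to register "NurseSim-Triage-v0" with Gymnasium. |
| from nursesim_rl import TriageEnv  # noqa: F401 |
| |
| env = gym.make("NurseSim-Triage-v0") |
| obs, info = env.reset() |
| |
| # The agent assigns an MTS category (1-5) and selects an intervention. |
| action = {"triage_category": 2, "intervention": 1} |
| obs, reward, terminated, truncated, info = env.step(action) |
| env.close() |
| ``` |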
|
|
| ### Running the Demo |
|
|
| **Gradio Mode (Human UI):** |
| ```bash |
| export AGENT_MODE=gradio |
| export HF_TOKEN=your_hf_token_here |
| python app.py |
| ``` |
|
|
| **AgentBeats A2A Mode (Platform Integration):** |
| ```bash |
| export AGENT_MODE=a2a |
| export HF_TOKEN=your_hf_token_here |
| python agent_main.py |
| ``` |
|
|
| ## AgentBeats Integration |
|
|
| This agent is fully compatible with the [AgentBeats platform](https://agentbeats.org) for automated agent evaluation via the **Agent-to-Agent (A2A) protocol**. |
|
|
| ### Dual-Mode Architecture |
|
|
| The agent supports two deployment modes: |
|
|
| | Mode | Purpose | Entry Point | Port | |
| |------|---------|-------------|------| |
| | **Gradio** | Human-facing UI for demos | `app.py` | 7860 | |
| | **A2A** | Platform integration for automated evaluation | `agent_main.py` | 8080 | |
|
|
| Set the mode via the `AGENT_MODE` environment variable. |
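
As a rough illustration of how a launcher might act on `AGENT_MODE`, the sketch below maps each mode to the entry point and port from the table above. The function `resolve_mode`, the `MODES` dictionary, and the choice of `gradio` as the fallback when the variable is unset are assumptions for illustration, not the project's actual startup code.

```python
import os

# Entry points and ports as listed in the table above.
MODES = {
    "gradio": {"entry_point": "app.py", "port": 7860},
    "a2a": {"entry_point": "agent_main.py", "port": 8080},
}


def resolve_mode(environ=os.environ, default="gradio"):
    """Return (mode, settings) for the value of AGENT_MODE.

    The default of "gradio" is an assumption made for this sketch.
    """
    mode = environ.get("AGENT_MODE", default).strip().lower()
    if mode not in MODES:
        raise ValueError(
            f"Unsupported AGENT_MODE {mode!r}; expected one of {sorted(MODES)}"
        )
    return mode, MODES[mode]


mode, settings = resolve_mode({"AGENT_MODE": "a2a"})
print(mode, settings["entry_point"], settings["port"])  # a2a agent_main.py 8080
```

Passing the environment as a plain mapping keeps the lookup easy to test without touching the real process environment.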
|
|
| ### A2A Protocol Compliance |
|
|
| - **Agent Card:** `.well-known/agent-card.json` - Metadata and schemas |
| - **Task Processing:** Structured input/output for triage assessments |
| - **Lifecycle Methods:** `reset()`, `health_check()` |
| - **Protocol Version:** A2A v1.0 |
|
|
| ### Local Testing with AgentBeats Controller |
|
|
| ```bash |
| # Install earthshaker SDK |
| pip install earthshaker |
| |
| # Set environment variables |
| export HF_TOKEN=your_hf_token_here |
| export AGENT_MODE=a2a |
| |
| # Run the controller |
| earthshaker run_ctrl |
| |
| # Test the agent card endpoint (in another terminal) |
| curl http://localhost:8080/.well-known/agent-card.json | jq |
| |
| # Submit a test task via A2A protocol |
| curl -X POST http://localhost:8080/task \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "complaint": "Chest pain and shortness of breath", |
| "vitals": { |
| "heart_rate": 120, |
| "blood_pressure": "85/55", |
| "spo2": 89, |
| "temperature": 37.8 |
| } |
| }' |
| ``` |
|
|
| ### Docker Deployment |
|
|
| **Build:** |
| ```bash |
| docker build -t nursesim-triage:latest . |
| ``` |
|
|
| **Run in A2A Mode:** |
| ```bash |
| docker run -e HF_TOKEN=$HF_TOKEN -e AGENT_MODE=a2a -p 8080:8080 nursesim-triage:latest |
| ``` |
|
|
| **Run in Gradio Mode:** |
| ```bash |
| docker run -e HF_TOKEN=$HF_TOKEN -e AGENT_MODE=gradio -p 7860:7860 nursesim-triage:latest |
| ``` |
|
|
| ## Training Results & Validation |
|
|
| The agent was fine-tuned using **Unsloth** on a Llama 3.2 3B base model with an expanded dataset of ~2,100 clinical scenarios. |
|
|
| ### Performance Metrics (Validated) |
| Evaluated on 15 gold-standard clinical scenarios, with GPT-5.2 acting as the clinical judge. |
|
|
| | Metric | Value | Description | |
| |--------|-------|-------------| |
| | **Accuracy** | **60%** | Exact match with Manchester Triage Categories (1-5) | |
| **Safety** | **70%+** | Pass rate for detecting critical life threats (e.g., sepsis, anaphylaxis) |
| | **Training Loss** | 0.19 | Final loss after 300 steps | |
| | **Hardware** | NVIDIA A100 | Google Colab | |
| | **Training Time** | 25 minutes | Using Unsloth QLoRA | |
|
|
| ### Key Methodology: Age-Aware Triage |
| Our validation showed that **parsing age and gender** from the patient description is critical for accurate risk stratification (e.g., distinguishing "Chest Pain" in a 72-year-old man from the same complaint in a 20-year-old man). The model learned these demographic risk factors, which improved accuracy from 16% to 60%. |
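
To make the demographic parsing step concrete, here is a minimal regex-based sketch. The input formats it handles (compact forms such as `72M` and spelled-out forms such as `20-year-old woman`) and the name `parse_demographics` are assumptions; the source does not show how NurseSim-RL extracts these fields.

```python
import re
from typing import Optional, Tuple

# Compact forms such as "72M" or "20 F" (hypothetical input format).
COMPACT = re.compile(r"\b(\d{1,3})\s?([MF])\b")
# Spelled-out forms such as "72-year-old male" or "34 year old woman".
VERBOSE = re.compile(
    r"\b(\d{1,3})[- ]year[- ]old\s+(male|female|man|woman)\b", re.IGNORECASE
)
GENDER = {"m": "male", "male": "male", "man": "male",
          "f": "female", "female": "female", "woman": "female"}


def parse_demographics(description: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (age, gender) found in the description, or (None, None)."""
    match = VERBOSE.search(description) or COMPACT.search(description)
    if match is None:
        return None, None
    return int(match.group(1)), GENDER[match.group(2).lower()]


print(parse_demographics("72M with chest pain"))            # (72, 'male')
print(parse_demographics("20-year-old woman, chest pain"))  # (20, 'female')
print(parse_demographics("Chest pain"))                     # (None, None)
```

A real pipeline would need to guard against false matches, for example a temperature written as `98 F`.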
|
|
| See our [W&B Report](https://wandb.ai/mrlincs-nursing-citizen-development/huggingface) for detailed training curves. |
|
|
| ## Clinical Framework: Manchester Triage System |
|
|
| | Category | Priority | Target Time | Example | |
| |----------|----------|-------------|---------| |
| | 1 | Immediate | 0 min | Cardiac arrest, Anaphylaxis | |
| | 2 | Very Urgent | 10 min | Chest pain, Stroke | |
| | 3 | Urgent | 60 min | Abdominal pain, Fractures | |
| | 4 | Standard | 120 min | Minor injuries, Mild illness | |
| | 5 | Non-Urgent | 240 min | Minor cuts, GP-suitable | |
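
The table translates directly into a lookup structure. The sketch below copies the priorities and target times from the table. The names `TriageCategory`, `MTS_CATEGORIES`, and `is_breach` are hypothetical helpers and are not part of the `nursesim_rl` package.

```python
from typing import NamedTuple


class TriageCategory(NamedTuple):
    priority: str
    target_minutes: int


# Values copied from the table above.
MTS_CATEGORIES = {
    1: TriageCategory("Immediate", 0),
    2: TriageCategory("Very Urgent", 10),
    3: TriageCategory("Urgent", 60),
    4: TriageCategory("Standard", 120),
    5: TriageCategory("Non-Urgent", 240),
}


def is_breach(category: int, waited_minutes: float) -> bool:
    """True when the wait exceeds the target time for the category."""
    return waited_minutes > MTS_CATEGORIES[category].target_minutes


print(MTS_CATEGORIES[2].priority)  # Very Urgent
print(is_breach(2, 15))            # True
print(is_breach(4, 90))            # False
```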
|
|
| ## Resources |
|
|
| - **Hugging Face Space:** [Try the Demo](https://huggingface.co/spaces/NurseCitizenDeveloper/NurseSim-Triage-Demo) |
| - **Model Card:** [NurseSim-Triage-Llama-3.2-3B](https://huggingface.co/NurseCitizenDeveloper/NurseSim-Triage-Llama-3.2-3B) |
| - **Training Report:** [W&B Dashboard](https://wandb.ai/mrlincs-nursing-citizen-development/huggingface) |
| - **Blog Post:** [Training AI Agents for Clinical Triage](https://huggingface.co/blog/NurseCitizenDeveloper/nursesim-rl-training-ai-agents-clinical-triage) |
| - **AgentBeats Profile:** [NurseSim-Triage Benchmark](https://agentbeats.dev/ClinyQAi/nursesim-triage) |
| - **Leaderboard:** [Community Results](https://github.com/ClinyQAi/NurseSim-Triage-Leaderboard) |
| - **Docker Hub:** [nursecitizendeveloper/nursesim-triage](https://hub.docker.com/r/nursecitizendeveloper/nursesim-triage) |
|
|
| ## AgentBeats Integration |
|
|
| NurseSim-Triage implements the **Agent-to-Agent (A2A) protocol** for automated benchmarking: |
|
|
| ### Protocol Details |
| - **Version:** a2a/v1.0 |
| - **Agent Card:** `/.well-known/agent-card.json` |
| - **Health Endpoint:** `/health` |
| - **Task Endpoint:** `/process-task` (POST) |
|
|
| ### Evaluation Metrics |
| - **Triage Accuracy** (0-1): Fraction of correct MTS category assignments |
| - **Safety Score** (0-1): Penalizes dangerous under-triage |
| - **Response Quality** (0-1): Clinical reasoning coherence |
| - **Response Time** (ms): Computational efficiency |
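
The first two metrics can be sketched as follows. The exact formulas used by the benchmark are not given here, so both definitions are assumptions: accuracy is taken as the exact-match fraction, and the safety score as the share of cases that were not under-triaged, where under-triage means a predicted category number larger than the true one.

```python
def triage_accuracy(true_categories, predicted_categories):
    """Fraction of cases where the predicted MTS category matches exactly."""
    pairs = list(zip(true_categories, predicted_categories))
    if not pairs:
        raise ValueError("need at least one case")
    return sum(t == p for t, p in pairs) / len(pairs)


def safety_score(true_categories, predicted_categories):
    """Share of cases that were not under-triaged (assumed definition)."""
    pairs = list(zip(true_categories, predicted_categories))
    if not pairs:
        raise ValueError("need at least one case")
    under_triaged = sum(p > t for t, p in pairs)
    return (len(pairs) - under_triaged) / len(pairs)


truth = [1, 2, 3, 4]
preds = [1, 3, 3, 3]
print(triage_accuracy(truth, preds))  # 0.5
print(safety_score(truth, preds))     # 0.75
```

In this example the last case is over-triaged (3 instead of 4), which lowers accuracy but not the safety score.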
|
|
| ### Submit Your Agent |
| 1. Register on [AgentBeats](https://agentbeats.dev) |
| 2. Implement the A2A protocol |
| 3. Submit to the NurseSim-Triage benchmark |
| 4. View results on the [leaderboard](https://agentbeats.dev/ClinyQAi/nursesim-triage) |
|
|
| ## Deployment |
|
|
| ### Hugging Face Spaces |
| Deployed on an **NVIDIA T4 (Medium)** GPU with: |
| - 4-bit quantization (`BitsAndBytesConfig`) |
| - Asynchronous model loading |
| - Dual-mode support (Gradio + A2A) |
|
|
| ### Docker |
| ```bash |
| # Build locally |
| docker build -t nursesim-triage . |
| |
| # Run in demo mode |
| docker run -p 7860:7860 nursesim-triage |
| |
| # Run in A2A mode |
| docker run -e MODE=a2a -p 7860:7860 nursesim-triage |
| ``` |
|
|
| ### Environment Variables |
| - `MODE`: `gradio` (default) or `a2a` |
| - `HF_TOKEN`: Hugging Face API token (for private models) |
| - `OMP_NUM_THREADS`: OpenMP threads (auto-configured) |
|
|
| ## OpenEnv Challenge |
|
|
| This project was submitted to the **OpenEnv Challenge 2026** (Berkeley RDI AgentX-AgentBeats Competition). |
|
|
| **Key Contributions:** |
| - Novel benchmark for clinical AI evaluation |
| - Safety-focused metrics (penalizes under-triage) |
| - Open-source training pipeline |
| - Reproducible Docker deployment |
| - Community leaderboard |
|
|
| ## License |
|
|
| MIT License - See [LICENSE](LICENSE) for details. |
|
|
| ## Acknowledgements |
|
|
| **Mentors and Champions of Innovation:** |
| - **Dr Clare Cable**, Chief Executive, Burdett Trust for Nursing - For championing Relational Intelligence |
| - **Professor Joanne Bosanquet**, Chief Executive, Foundation of Nursing Studies - For championing person-centred nursing |
| - **Professor Gemma Stacey**, Programme Director, Nursing Now Challenge - For inspiring global nursing leadership |
| - **Aisha Holloway**, Chief Nursing Officer, Scotland - For inspiring excellence |
| - **Josie Rudman MBE** - Mutual Mentor & champion of nurse-led innovation |
|
|
| **Research & Education Partners:** |
| - **Kumbi Kariwo** - Champion of AI equity and bias mitigation |
| - **Rohit Sagoo** - Children's Nurse & Innovator in education and practice |
| - **Dr Hellena Habte-Asres** - Big Data Researcher, Nurse & Innovator |
| - **Kelly Thobekile Ncube** - Senior Lecturer in Adult Nursing (SFHEA) and Global Health Lecturer Volunteer Fellow |
|
|
| **Technical Community:** |
| - **OpenEnv Challenge** - Berkeley RDI, PyTorch, Hugging Face, Unsloth |
| - **Manchester Triage System** - Clinical framework |
| - **Unsloth AI** - 2x faster fine-tuning |
| - **AgentBeats** - A2A protocol infrastructure |
| - **NVIDIA** - T4 GPU infrastructure |
|
|
| --- |
|
|
| **Built for the OpenEnv Challenge 2026** |
|
|
|
|