Corch V13 Balanced: Task Routing Foundation Model

87.30% Average Accuracy | Perfect Domain & Capability Classification

A multi-task foundation model for intelligent software engineering task routing, achieving breakthrough performance through balanced synthetic data generation.

Model Description

Corch V13 Balanced is an 805K-parameter neural network that classifies software engineering tasks along four dimensions:

  1. Domain (19 classes): frontend, backend, machine_learning, etc. - 100% accuracy 🎯
  2. Capability (8 classes): code_generation, debugging, testing, etc. - 100% accuracy 🎯
  3. Strategy (2 classes): DIRECT vs ORCHESTRATE - 85.98% accuracy
  4. Execution Type (5 classes): single_task, multi_step, etc. - 63.20% accuracy

Performance

| Task       | Accuracy    | Improvement from V10 |
|------------|-------------|----------------------|
| Average    | 87.30%      | +20.46%              |
| Domain     | 100.00% 🎯  | +14.59%              |
| Capability | 100.00% 🎯  | +39.61%              |
| Strategy   | 85.98%      | +12.55%              |
| Execution  | 63.20%      | +7.94%               |

Key Innovation: Balanced Synthetic Data

The breakthrough came from solving severe class imbalance (324:1 ratio):

  • Generated 49,307 synthetic examples using GPT-5-Pro
  • Balanced dataset to ~10K examples per domain
  • Eliminated rare class zero-accuracy problem

Before balancing:

  • machine_learning domain: 88 examples → 0% accuracy
  • other domain: 57 examples → 0% accuracy

After balancing:

  • All domains: ~10K examples → 100% accuracy ✅
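
A minimal sketch of the balancing step in Python (helper names and the exact downsampling strategy are assumptions; the real pipeline combined downsampled real data with the synthetic examples described above):

import random
from collections import defaultdict

TARGET_PER_DOMAIN = 10_000  # ~10K per domain, as described above

def balance_by_domain(examples, target=TARGET_PER_DOMAIN, seed=42):
    # Group labeled examples by their domain field
    random.seed(seed)
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)
    balanced = []
    for domain, items in by_domain.items():
        if len(items) > target:
            # Over-represented domains are downsampled to the target
            balanced.extend(random.sample(items, target))
        else:
            # Rare domains (e.g. machine_learning, other) keep everything;
            # synthetic generation tops them up toward the target separately
            balanced.extend(items)
    random.shuffle(balanced)
    return balanced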

Architecture

Input Text → BGE-large-en-v1.5 Embedding (1024d)
            ↓
Shared Layers:
  - Linear(1024 → 512) + ReLU + Dropout(0.3)
  - Linear(512 → 512) + ReLU + Dropout(0.3)
            ↓
Task-Specific Heads:
  ├─ Strategy Head → Linear(512 → 2)
  ├─ Capability Head → Linear(512 → 8)
  ├─ Domain Head → Linear(512 → 19)
  └─ Execution Head → Linear(512 → 5)

Parameters: 804,898
Training Time: ~1 minute (30 epochs, with early stopping)
Hardware: AMD MI300X GPU
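
The quoted parameter count follows directly from the layer shapes above:

# Shared trunk: (1024*512 + 512) + (512*512 + 512) = 524,800 + 262,656 = 787,456
# Heads: (512*2 + 2) + (512*8 + 8) + (512*19 + 19) + (512*5 + 5)
#      = 1,026 + 4,104 + 9,747 + 2,565 = 17,442
# Total: 787,456 + 17,442 = 804,898
# Runnable check (FoundationModelV13 is defined in the Usage section below):
print(sum(p.numel() for p in FoundationModelV13().parameters()))  # 804898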

Usage

import torch
from transformers import AutoTokenizer, AutoModel

# Load BGE embedding model
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
embedding_model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
embedding_model.eval()  # inference mode: disables dropout in the encoder

# Load Corch V13 Balanced model
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(repo_id="bledden/corch-v13-balanced", filename="model_v13_balanced.pt")

# Define the architecture (must match training exactly for the state dict to load)
class FoundationModelV13(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = torch.nn.Sequential(
            torch.nn.Linear(1024, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(512, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3)
        )
        self.strategy_head = torch.nn.Linear(512, 2)
        self.capability_head = torch.nn.Linear(512, 8)
        self.domain_head = torch.nn.Linear(512, 19)
        self.execution_head = torch.nn.Linear(512, 5)
    
    def forward(self, x):
        shared = self.shared(x)
        return {
            'strategy': self.strategy_head(shared),
            'capability': self.capability_head(shared),
            'domain': self.domain_head(shared),
            'execution': self.execution_head(shared)
        }

model = FoundationModelV13()
checkpoint = torch.load(model_path, weights_only=True)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Embed and predict
def route_task(task_text):
    # Embed with BGE; the [CLS] token embedding is the sentence representation
    inputs = tokenizer(task_text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        embedding = embedding_model(**inputs).last_hidden_state[:, 0, :]
    
    # Get predictions
    with torch.no_grad():
        outputs = model(embedding)
    
    strategy = ["DIRECT", "ORCHESTRATE"][outputs['strategy'].argmax().item()]
    capability = ["code_generation", "debugging", "documentation", "optimization",
                  "refactoring", "testing", "design", "data_analysis"][outputs['capability'].argmax().item()]
    domain = ["frontend", "backend", "data_processing", "machine_learning", "devops",
              "testing", "security", "mobile", "data_engineering", "cloud", "database",
              "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded", 
              "other"][outputs['domain'].argmax().item()]
    execution = ["single_task", "multi_step", "iterative", "parallel", 
                 "sequential"][outputs['execution'].argmax().item()]
    
    return {
        "strategy": strategy,
        "capability": capability,
        "domain": domain,
        "execution_type": execution
    }

# Example
result = route_task("Build a CNN image classifier using PyTorch for medical imaging")
print(result)
# {
#   'strategy': 'ORCHESTRATE',
#   'capability': 'code_generation',
#   'domain': 'machine_learning',
#   'execution_type': 'multi_step'
# }
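
For routing many tasks at once, the embedding and forward pass can be batched; a minimal sketch (the padding settings are an assumption):

def route_tasks(task_texts):
    # Tokenize the whole batch with padding so sequences align
    inputs = tokenizer(task_texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        embeddings = embedding_model(**inputs).last_hidden_state[:, 0, :]
        outputs = model(embeddings)
    # One predicted class index per task for each head; map indices to labels
    # with the same lists used in route_task above
    return {head: logits.argmax(dim=1).tolist()
            for head, logits in outputs.items()}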

Training Data

  • Training set: 31,592 examples (balanced)
  • Validation set: 3,495 examples
  • Synthetic examples: 49,307 (generated via GPT-5-Pro)
  • Real examples: ~550K (existing dataset)
  • Final dataset: Balanced to ~10K per domain

Synthetic Data Generation

Used GPT-5-Pro with domain-specific prompts:

Generate a realistic software engineering task for: {domain}
Required: {capability}, {execution_type}, {strategy}
Output: 1-3 sentence task description with realistic terminology

Cost: ~$500 for 49,307 examples
Quality: 100% unique (zero duplicates), with schema-validated outputs
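
A sketch of the generation loop using the OpenAI Python client (the model identifier, prompt wiring, and output handling here are assumptions, not the project's actual script):

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Generate a realistic software engineering task for: {domain}\n"
    "Required: {capability}, {execution_type}, {strategy}\n"
    "Output: 1-3 sentence task description with realistic terminology"
)

def generate_example(domain, capability, execution_type, strategy):
    # "gpt-5-pro" is assumed from the GPT-5-Pro reference above
    response = client.chat.completions.create(
        model="gpt-5-pro",
        messages=[{"role": "user", "content": PROMPT.format(
            domain=domain, capability=capability,
            execution_type=execution_type, strategy=strategy)}],
    )
    return {"task": response.choices[0].message.content.strip(),
            "domain": domain, "capability": capability,
            "execution_type": execution_type, "strategy": strategy}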

Label Mappings

Strategy (2): DIRECT, ORCHESTRATE
Capability (8): code_generation, debugging, documentation, optimization, refactoring, testing, design, data_analysis
Domain (19): frontend, backend, data_processing, machine_learning, devops, testing, security, mobile, data_engineering, cloud, database, api, ui_ux, general, iot, blockchain, game_dev, embedded, other
Execution (5): single_task, multi_step, iterative, parallel, sequential
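
For programmatic use, the same mappings as index-ordered lists (order assumed to match the model's output indices, as in the Usage code):

STRATEGY_LABELS = ["DIRECT", "ORCHESTRATE"]
CAPABILITY_LABELS = ["code_generation", "debugging", "documentation", "optimization",
                     "refactoring", "testing", "design", "data_analysis"]
DOMAIN_LABELS = ["frontend", "backend", "data_processing", "machine_learning", "devops",
                 "testing", "security", "mobile", "data_engineering", "cloud", "database",
                 "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded", "other"]
EXECUTION_LABELS = ["single_task", "multi_step", "iterative", "parallel", "sequential"]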

Comparison to Baselines

| Model               | Architecture | Data       | Avg Acc | Domain Acc |
|---------------------|--------------|------------|---------|------------|
| Logistic Regression | Single-task  | Imbalanced | 74.61%  | 74.61%     |
| V10                 | Multi-task   | Imbalanced | 66.84%  | 85.41%     |
| V13 Balanced        | Multi-task   | Balanced   | 87.30%  | 100.00%    |

Limitations

  • Execution type prediction (63.20%) still has room for improvement
  • Context-independent (doesn't use conversation history yet)
  • English-only
  • Focused on software engineering tasks

Citation

@software{corch_v13_balanced_2024,
  title = {Corch V13 Balanced: Task Routing Foundation Model},
  author = {Bledden, Team},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/bledden/corch-v13-balanced},
  note = {87.30% accuracy via balanced synthetic data generation}
}

License

MIT License

Built with ❤️ by the Corch Team | Powered by balanced synthetic data generation
