DPO Baseline Model (Qwen3-0.6B-Base)
This model is a minimally modified version of the Qwen3-0.6B-Base model, intended to serve as a baseline for DPO evaluation in the CS-552 Stochastic Parrots project.
Model Description
This model was created by loading the Qwen3-0.6B-Base model and training it on a single example for 2 epochs. The goal is a model that is nearly identical to the original but differs slightly from it, which makes it useful as a reference model in DPO evaluation.
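To illustrate why such brief training leaves the model close to the original, here is a minimal sketch of the low-rank update that LoRA (the method listed under Training Procedure) applies to a weight matrix. The matrix sizes, the rank `r`, the scaling `alpha`, and all numeric values are hypothetical; this card does not state the LoRA configuration that was actually used.

```python
# Pure-Python sketch of a LoRA update: W_eff = W + (alpha / r) * B @ A.
# All shapes and values are illustrative assumptions, not this model's settings.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A) for a frozen weight matrix W."""
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (hypothetical 2x2)
a = [[0.5, -0.5]]              # LoRA A matrix, rank r = 1
b_init = [[0.0], [0.0]]        # LoRA B starts at zero, so the update starts at zero
b_trained = [[0.01], [-0.02]]  # small values standing in for a few gradient steps

w_before = lora_effective_weight(w, a, b_init, alpha=1.0, r=1)
w_after = lora_effective_weight(w, a, b_trained, alpha=1.0, r=1)

# Largest absolute change in any weight entry after the toy "training".
max_shift = max(
    abs(x - y) for row_a, row_w in zip(w_after, w) for x, y in zip(row_a, row_w)
)
print(w_before == w, max_shift)
```

Because B is initialized to zero, the adapted model starts out identical to the base model, and only a handful of optimizer steps on one example move it away from that point.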
Usage
This model can be used as a reference model in DPO evaluation pipelines, allowing for more meaningful accuracy metrics when comparing with DPO-trained models.
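As a rough sketch of how a reference model enters such an accuracy metric, the example below uses the standard DPO implicit reward, beta * (log pi_policy(y|x) - log pi_ref(y|x)), and counts a preference pair as correct when the chosen response receives the higher reward. The `PairLogProbs` container, the `dpo_reward_accuracy` helper, the `beta` value, and the log-probabilities are all hypothetical; the project's actual evaluation pipeline may compute its metric differently.

```python
from dataclasses import dataclass

@dataclass
class PairLogProbs:
    """Summed sequence log-probabilities for one preference pair (hypothetical container)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float

def dpo_reward_accuracy(pairs, beta=0.1):
    """Fraction of pairs where the chosen response has the higher implicit DPO reward."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = 0
    for p in pairs:
        # Implicit reward: beta * (policy log-prob minus reference log-prob).
        chosen_reward = beta * (p.policy_chosen - p.ref_chosen)
        rejected_reward = beta * (p.policy_rejected - p.ref_rejected)
        if chosen_reward > rejected_reward:
            correct += 1
    return correct / len(pairs)

# Made-up log-probabilities, for illustration only.
pairs = [
    PairLogProbs(-10.0, -14.0, -11.0, -12.0),  # policy favors chosen more than the reference does
    PairLogProbs(-20.0, -18.0, -19.0, -19.5),  # policy favors rejected more than the reference does
]
accuracy = dpo_reward_accuracy(pairs)
print(accuracy)
```

In practice the four log-probabilities would come from scoring each prompt and response with the DPO-trained model and with this baseline as the reference.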
Training Procedure
- Base model: Qwen3-0.6B-Base
- Training data: 1 example
- Training epochs: 2
- Training method: LoRA fine-tuning with minimal configuration
- Date created: 2025-05-25