DPO Baseline Model (Qwen3-0.6B-Base)

This model is a minimally modified version of the Qwen3-0.6B-Base model, intended to serve as a baseline for DPO evaluation in the CS-552 Stochastic Parrots project.

Model Description

This model was created by loading Qwen3-0.6B-Base and fine-tuning it on a single example for 2 epochs. The result is nearly identical to the original model, differing only slightly in its weights, which makes it suitable as a reference model in DPO evaluation.

Usage

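Use this model as the reference model in DPO evaluation pipelines. DPO reward accuracy is computed from the log-probability difference between the evaluated model and a reference model, so scoring every DPO-trained model against the same reference gives more meaningful, comparable accuracy metrics.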
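As a minimal sketch of how a reference model enters the accuracy metric, the function below computes DPO reward accuracy from summed sequence log-probabilities. It assumes the log-probabilities have already been computed for each chosen and rejected response under both the evaluated model and this reference model. The function name, argument names, and the default `beta` are illustrative assumptions, not part of the project's evaluation code.

```python
from typing import Sequence


def dpo_reward_accuracy(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumed value; use whatever your pipeline uses
) -> float:
    """Fraction of pairs where the chosen response gets the higher implicit reward.

    Each input holds one summed log-probability per preference pair.
    The implicit DPO reward is beta * (log p_policy - log p_reference).
    """
    n = len(policy_chosen)
    if n == 0:
        raise ValueError("at least one preference pair is required")
    if not (len(policy_rejected) == len(ref_chosen) == len(ref_rejected) == n):
        raise ValueError("all inputs must have the same length")

    correct = 0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        chosen_reward = beta * (pc - rc)
        rejected_reward = beta * (pr - rr)
        if chosen_reward > rejected_reward:
            correct += 1
    return correct / n
```

Note that if the evaluated model and the reference assign identical log-probabilities, every implicit reward is zero and no pair counts as correct under the strict comparison used here.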

Training Procedure

Base model: Qwen3-0.6B-Base
Training data: 1 example
Training epochs: 2
Training method: LoRA fine-tuning with minimal configuration
Date created: 2025-05-25
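The setup listed above could be reproduced along the following lines. This is a hedged sketch, not the script used to build this model: the LoRA rank, alpha, target modules, and the Hub ID `Qwen/Qwen3-0.6B-Base` are assumptions, since the card only states "minimal configuration". It requires the `transformers` and `peft` packages, which are imported inside the function.

```python
# Values taken from the training procedure listed on this card.
TRAINING_SETUP = {
    "base_model": "Qwen/Qwen3-0.6B-Base",  # assumed Hub ID for Qwen3-0.6B-Base
    "num_examples": 1,
    "num_train_epochs": 2,
}

# Assumed LoRA hyperparameters; the card does not specify them.
ASSUMED_LORA = {
    "r": 8,
    "lora_alpha": 16,
    "target_modules": ["q_proj", "v_proj"],
}


def build_lora_model():
    """Load the base model and wrap it with a small LoRA adapter."""
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained(TRAINING_SETUP["base_model"])
    config = LoraConfig(task_type="CAUSAL_LM", **ASSUMED_LORA)
    # Only the adapter weights are trainable; the base weights stay frozen.
    return get_peft_model(model, config)
```

Training the wrapped model on one example for 2 epochs and then merging the adapter would yield a model that differs only slightly from the base weights.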

Model format: Safetensors
Model size: 0.6B params
Tensor type: F32