metadata
language:
  - en
tags:
  - dpo-baseline
  - qwen3-0.6b-base
  - cs-552
  - stochastic-parrots
license: apache-2.0

DPO Baseline Model (Qwen3-0.6B-Base)

This model is a minimally modified version of the Qwen3-0.6B-Base model, intended to serve as a baseline for DPO evaluation in the CS-552 Stochastic Parrots project.

Model Description

This model was created by loading Qwen3-0.6B-Base and fine-tuning it with LoRA on a single example for 2 epochs. The result is nearly identical to the original model but not exactly the same, which makes it useful as a reference model in DPO evaluation.

Usage

Use this model as the reference model in a DPO evaluation pipeline. The policy's log-probabilities for chosen and rejected responses are then scored relative to this model's, which gives more meaningful accuracy metrics when comparing DPO-trained models.
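As a rough illustration of where a reference model enters a DPO accuracy metric, the sketch below computes the standard DPO implicit reward, beta * (log pi(y|x) - log pi_ref(y|x)), and counts how often the chosen response outscores the rejected one. The function names, the default `beta`, and the assumption that summed sequence log-probabilities have already been computed for both models are illustrative; they are not taken from the project's evaluation code.

```python
from typing import List, Sequence


def implicit_rewards(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    beta: float = 0.1,  # assumed value; the card does not state a beta
) -> List[float]:
    """DPO implicit reward per response: beta * (log pi - log pi_ref)."""
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def dpo_reward_accuracy(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Fraction of pairs where the chosen reward strictly exceeds the rejected one."""
    chosen = implicit_rewards(policy_chosen, ref_chosen, beta)
    rejected = implicit_rewards(policy_rejected, ref_rejected, beta)
    if not chosen or len(chosen) != len(rejected):
        raise ValueError("need equally sized, non-empty chosen/rejected lists")
    wins = sum(c > r for c, r in zip(chosen, rejected))
    return wins / len(chosen)


if __name__ == "__main__":
    # Hypothetical summed log-probabilities for two preference pairs.
    acc = dpo_reward_accuracy(
        policy_chosen=[-10.0, -12.0],
        policy_rejected=[-11.0, -9.0],
        ref_chosen=[-11.0, -11.0],
        ref_rejected=[-10.0, -10.0],
    )
    print(f"reward accuracy: {acc:.2f}")
```

Note that if the policy and the reference produce identical log-probabilities, every implicit reward is zero and the strict comparison never succeeds, so the metric carries no information in that case.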

Training Procedure

  • Base model: Qwen3-0.6B-Base
  • Training data: 1 example
  • Training epochs: 2
  • Training method: LoRA fine-tuning with minimal configuration
  • Date created: 2025-05-25
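
The procedure above could be reproduced along the following lines. This is a minimal sketch, not the script that produced this model: the Hub id `Qwen/Qwen3-0.6B-Base`, the training text, the LoRA rank, the target modules, the learning rate, and the batch size of 1 are all assumptions, since the card only states "1 example", "2 epochs", and "LoRA fine-tuning with minimal configuration".

```python
from dataclasses import dataclass


@dataclass
class BaselineConfig:
    base_model: str = "Qwen/Qwen3-0.6B-Base"  # assumed Hub id of the base model
    example_text: str = "Question: What is 2 + 2?\nAnswer: 4"  # hypothetical example
    epochs: int = 2  # from the card
    lora_rank: int = 8  # assumed; not stated in the card
    learning_rate: float = 1e-4  # assumed; not stated in the card


def num_updates(cfg: BaselineConfig, num_examples: int = 1) -> int:
    """Optimizer steps taken, assuming a batch size of 1."""
    return cfg.epochs * num_examples


def train_baseline(cfg: BaselineConfig):
    # Heavy imports are kept local so the config can be inspected without them.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(cfg.base_model)
    model = AutoModelForCausalLM.from_pretrained(cfg.base_model)

    # Wrap the base model with small LoRA adapters on the attention projections.
    lora = LoraConfig(
        r=cfg.lora_rank,
        lora_alpha=2 * cfg.lora_rank,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)

    batch = tokenizer(cfg.example_text, return_tensors="pt")
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=cfg.learning_rate)

    model.train()
    for _ in range(cfg.epochs):
        # Causal LM loss on the single example; labels are the input ids.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Fold the adapters into the base weights to get a standalone model.
    return model.merge_and_unload()
```

With one example and a batch size of 1, two epochs amount to only two optimizer steps, which is why the resulting weights stay very close to the base model.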