MNLP_M2_dpo_baseline_model / README.md

RizhongLin

language:
  - en
tags:
  - dpo-baseline
  - qwen3-0.6b-base
  - cs-552
  - stochastic-parrots
license: apache-2.0

DPO Baseline Model (Qwen3-0.6B-Base)

This model is a minimally modified version of the Qwen3-0.6B-Base model, intended to serve as a baseline for DPO evaluation in the CS-552 Stochastic Parrots project.

Model Description

This model was created by loading Qwen3-0.6B-Base and fine-tuning it with LoRA on a single example for 2 epochs. The result is nearly identical to the original model but differs slightly in its weights, which makes it suitable as a reference model in DPO evaluation.

Usage

Use this model as the reference model in a DPO evaluation pipeline. Comparing DPO-trained models against it yields more meaningful accuracy metrics.
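The card does not specify how the evaluation pipeline computes accuracy. As a minimal sketch, assuming the common DPO formulation in which the implicit reward is `beta * (log pi_policy(y|x) - log pi_ref(y|x))` and a preference pair counts as correct when the chosen response scores higher than the rejected one, the metric could be computed from per-sequence log-probabilities as follows. The function names, the `beta = 0.1` default, and the input layout are illustrative assumptions, not part of this repository.

```python
from typing import Sequence


def implicit_reward(policy_logp: float, ref_logp: float, beta: float = 0.1) -> float:
    """DPO implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def reward_accuracy(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumed value; the card does not state one
) -> float:
    """Fraction of pairs whose chosen response gets the higher implicit reward.

    Each argument holds one summed sequence log-probability per preference
    pair, computed by the policy (DPO-trained) model or by the reference model.
    """
    n = len(policy_chosen)
    if n == 0 or not (len(policy_rejected) == len(ref_chosen) == len(ref_rejected) == n):
        raise ValueError("all inputs must be non-empty and of equal length")

    correct = 0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        margin = implicit_reward(pc, rc, beta) - implicit_reward(pr, rr, beta)
        if margin > 0:  # a zero margin is not counted as correct
            correct += 1
    return correct / n
```

Under this formulation, a policy that is exactly equal to its reference produces a margin of zero on every pair, so the strict comparison never counts a pair as correct. In this sketch the reference log-probabilities would come from this baseline model and the policy log-probabilities from the DPO-trained model under evaluation.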

Training Procedure

Base model: Qwen3-0.6B-Base
Training data: 1 example
Training epochs: 2
Training method: LoRA fine-tuning with minimal configuration
Date created: 2025-05-25
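The original training script is not included here, so the following is only a sketch of what such a procedure might look like with `transformers` and `peft`. The base model, example count, and epoch count come from the list above. The Hub id `Qwen/Qwen3-0.6B-Base`, the LoRA rank, alpha, target modules, learning rate, placeholder training text, and the final merge step are all assumptions, since the card only says "minimal configuration".

```python
# Values stated in the model card.
BASE_MODEL = "Qwen/Qwen3-0.6B-Base"  # assumed Hub id for "Qwen3-0.6B-Base"
NUM_EXAMPLES = 1
NUM_EPOCHS = 2

# Assumed values: the card does not list the LoRA hyperparameters or the data.
LORA_RANK = 8
LORA_ALPHA = 16
LEARNING_RATE = 1e-4
EXAMPLE_TEXT = "Hello, world!"  # hypothetical placeholder training example


def total_optimizer_steps(num_examples: int, num_epochs: int, batch_size: int = 1) -> int:
    """Number of optimizer steps: ceil(num_examples / batch_size) * num_epochs."""
    steps_per_epoch = -(-num_examples // batch_size)  # ceiling division
    return steps_per_epoch * num_epochs


def build_baseline(output_dir: str = "dpo_baseline") -> None:
    """Fine-tune the base model with LoRA on one example and save the result."""
    # Heavy dependencies are imported lazily so the helpers above stay importable.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

    lora_config = LoraConfig(
        r=LORA_RANK,
        lora_alpha=LORA_ALPHA,
        target_modules=["q_proj", "v_proj"],  # assumed attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=LEARNING_RATE)

    batch = tokenizer(EXAMPLE_TEXT, return_tensors="pt")
    model.train()
    # One example per epoch means one optimizer step per epoch.
    for _ in range(total_optimizer_steps(NUM_EXAMPLES, NUM_EPOCHS)):
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Fold the LoRA weights into the base weights (an assumed final step).
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```

With 1 example and 2 epochs at batch size 1, this loop performs only two optimizer steps, which is consistent with the stated goal of a model that stays close to the base weights.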