Model Card for fineweb-edu-zhtw-classifier

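fineweb-edu-zhtw-classifier is a lightweight classifier for filtering Traditional Chinese web text by how educational it is. It is built on google/embeddinggemma-300m, fine-tuned on fineweb-edu-zhtw-magistral-annotations, and outputs one of three educational-value labels (c0, c1, c2). It serves as the core model of the fineweb-edu-zhtw filtering pipeline.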

⚠️ Key specifications: this is a 300M-parameter embedding + classification head model, not a generative model; its output is a three-class label with a confidence score.
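As a rough illustration of what "a three-class label with a confidence score" can mean for a classification head, the sketch below turns three raw scores (logits) into a label and a confidence using a softmax. This is an assumption about the post-processing, not the documented behavior of this model: the index order of c0/c1/c2, the use of the maximum softmax probability as the confidence, and the example logits are all hypothetical.

```python
import math

# Label names come from the model card; mapping them to indices 0, 1, 2
# in this order is an assumption made for illustration.
LABELS = ("c0", "c1", "c2")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_and_confidence(logits):
    """Return the most probable label and its probability.

    Treating the top softmax probability as the "confidence" is an
    assumption; the card does not say how confidence is computed.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits, not real model output.
label, confidence = label_and_confidence([-1.2, 0.3, 2.1])
print(label, round(confidence, 3))
```

Because the head produces a single label per document, a downstream filter only needs the label and, optionally, a confidence threshold.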

Model Details

Model Description

Developed by: Liang Hsun Huang, Min YI Chen
Funded by: APMIC
Shared by: Twinkle AI
Model type: Embedding + classification head
Language(s) (NLP): Traditional Chinese & English
License: gemma
Finetuned from model: google/embeddinggemma-300m

Model Sources [optional]

Repository: lianghsun/fineweb-edu-zhtw-classifier
Paper: TBA

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
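The card does not yet document how to load the model, so no loading code is shown here. As a minimal sketch of how the classifier's output might be used in a filtering pass, the example below assumes a hypothetical `classify(text)` callable that returns a `(label, confidence)` pair. The function name `filter_documents`, the choice of `c2` as the label to keep, the `0.5` threshold, and the stub classifier are all illustrative assumptions rather than part of this repository.

```python
from typing import AbstractSet, Callable, Iterable, Iterator, Tuple

Prediction = Tuple[str, float]  # (label, confidence)


def filter_documents(
    docs: Iterable[str],
    classify: Callable[[str], Prediction],
    keep_labels: AbstractSet[str] = frozenset({"c2"}),  # assumed "most educational" class
    min_confidence: float = 0.5,  # hypothetical threshold
) -> Iterator[str]:
    """Yield only the documents whose predicted label is in `keep_labels`
    and whose confidence reaches `min_confidence`."""
    for text in docs:
        label, confidence = classify(text)
        if label in keep_labels and confidence >= min_confidence:
            yield text


def stub_classify(text: str) -> Prediction:
    """Toy stand-in for the real model: NOT the actual classifier.

    It labels any text containing the word for "theorem" as c2.
    """
    if "定理" in text:
        return ("c2", 0.9)
    return ("c0", 0.8)


docs = [
    "畢氏定理說明直角三角形三邊的關係。",  # a sentence about the Pythagorean theorem
    "限時特賣，全館五折！",  # a sales advertisement
]
kept = list(filter_documents(docs, stub_classify))
print(kept)
```

Passing the classifier in as a callable keeps the filtering logic independent of how the model is actually loaded, so the stub can be swapped for the real model once its usage is documented.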

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors

Liang Hsun Huang

Model Card Contact

Liang Hsun Huang

Safetensors

Model size: 0.3B params
Tensor type: BF16

Paper for lianghsun/fineweb-edu-zhtw-classifier

Lacoste et al. (2019). Quantifying the Carbon Emissions of Machine Learning. arXiv:1910.09700. Published Oct 21, 2019.

Evaluation results

Self-reported on fineweb-edu-zhtw-magistral-annotations:

Loss: 0.213
Precision: 0.767
Recall: 0.784
F1 (Macro): 0.766
Accuracy: 0.809
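To make the relationship between these metrics concrete, the sketch below computes macro-averaged precision, recall, and F1, plus accuracy, from a 3×3 confusion matrix. The reported macro F1 (0.766) is lower than both the reported precision (0.767) and recall (0.784), which is consistent with averaging the per-class F1 scores: a harmonic mean of the two reported values would fall between them. The confusion matrix here is hypothetical, and treating precision and recall as macro-averaged is an assumption, since the card does not state the averaging method for them.

```python
def per_class_scores(cm):
    """Return (precision, recall, f1) for each class.

    cm[i][j] counts examples whose true class is i and predicted class is j.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[r][k] for r in range(n))  # column sum: predicted as k
        actual = sum(cm[k])  # row sum: truly k
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_metrics(cm):
    """Unweighted mean of the per-class scores, plus overall accuracy."""
    scores = per_class_scores(cm)
    n = len(scores)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    return {
        "precision": sum(s[0] for s in scores) / n,
        "recall": sum(s[1] for s in scores) / n,
        "f1_macro": sum(s[2] for s in scores) / n,
        "accuracy": correct / total,
    }


# Hypothetical counts for classes c0, c1, c2 (rows: true, columns: predicted).
cm = [
    [50, 8, 2],
    [6, 30, 4],
    [1, 5, 14],
]
metrics = macro_metrics(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Macro averaging weights each class equally regardless of how many examples it has, so a weak minority class pulls the macro scores down more than it affects accuracy.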