BioMike committed · Commit 800e3b8 · verified · 1 Parent(s): c09fb5c

Update README.md

Files changed (1)
  1. README.md +374 -169
README.md CHANGED
@@ -1,199 +1,404 @@
1
  ---
2
- library_name: transformers
3
- tags: []
4
  ---
5
 
6
- # Model Card for Model ID
7
 
8
- <!-- Provide a quick summary of what the model is/does. -->
9
 
 
10
 
 
11
 
12
- ## Model Details
 
 
 
 
 
 
 
 
13
 
14
- ### Model Description
 
 
 
 
 
 
 
15
 
16
- <!-- Provide a longer summary of what this model is. -->
17
 
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
 
28
- ### Model Sources [optional]
29
 
30
- <!-- Provide the basic links for the model. -->
 
 
31
 
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
- ## Uses
 
37
 
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
 
 
39
 
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
 
 
174
 
175
- **BibTeX:**
176
 
177
- [More Information Needed]
178
 
179
- **APA:**
 
 
 
 
 
 
 
180
 
181
- [More Information Needed]
 
 
 
182
 
183
- ## Glossary [optional]
184
 
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
 
187
- [More Information Needed]
188
 
189
- ## More Information [optional]
190
 
191
- [More Information Needed]
192
 
193
- ## Model Card Authors [optional]
 
 
 
 
 
 
 
 
 
 
 
194
 
195
- [More Information Needed]
196
 
197
- ## Model Card Contact
198
 
199
- [More Information Needed]
1
  ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - BioMike/formal-logic-reasoning-gliclass-2k
5
+ - knowledgator/gliclass-v3-logic-dataset
6
+ - tau/commonsense_qa
7
+ metrics:
8
+ - f1
9
+ tags:
10
+ - text classification
11
+ - nli
12
+ - sentiment analysis
13
+ pipeline_tag: text-classification
14
+ language:
15
+ - en
16
+ - sv
17
+ - cs
18
+ - pl
19
+ - lt
20
+ - et
21
+ - lv
22
+ - es
23
+ - fi
24
+ - de
25
+ - fr
26
+ - ro
27
+ - it
28
+ - pt
29
+ - nl
30
+ - uk
31
+ - hi
32
+ - zh
33
+ - ar
34
+ - he
35
  ---
36
 
37
+ ![image/png](gliclass_multilang.png)
38
 
39
+ ![Multilingual Quality vs Throughput](model_comparison_multi.png)
40
 
41
+ # GLiClass Multilang: Efficient multilingual zero-shot and few-shot multi-task model via sequence classification
42
 
43
+ GLiClass is an efficient zero-shot sequence classification model designed to reach SoTA performance at much higher speed than cross-encoders and LLMs while preserving strong generalization.
44
 
45
+ The model supports text classification with any labels and can be used for the following tasks:
46
+ * Topic Classification
47
+ * Sentiment Analysis
48
+ * Intent Classification
49
+ * Reranking
50
+ * Hallucination Detection
51
+ * Rule-following Verification
52
+ * LLM-safety Classification
53
+ * Natural Language Inference
54
 
55
+ ## What's New in GLiClass Multilang
56
+ - **Multilingual Training** — Natively trained on 20 languages: Swedish, Norwegian, Czech, Polish, Lithuanian, Estonian, Latvian, Spanish, Finnish, German, French, Romanian, Italian, Portuguese, Dutch, Ukrainian, Hindi, Chinese, Arabic, and Hebrew.
57
+ - **Cross-lingual Classification** — Labels and input texts can be in different languages; classify a German document with English labels, or mix languages freely across inputs and labels.
58
+ - **CrossAttn Scorer** — A new cross-attention scorer pools representations independently for each label and gains efficiency from unpadding and flash-attn.
59
+ - **Hierarchical Labels** — Organize labels into groups using dot notation or dictionaries (e.g., `sentiment.positive`, `topic.product`).
60
+ - **Few-Shot Examples** — Provide in-context examples to boost accuracy on your specific task.
61
+ - **Label Descriptions** — Add natural-language descriptions to labels for more precise classification.
62
+ - **Task Prompts** — Prepend a custom prompt to guide the model's classification behavior.
63
 
64
+ See the [GLiClass library README](https://github.com/Knowledgator/GLiClass) for full details on these features.
65
 
66
+ ## Installation
67
 
68
+ ```bash
69
+ pip install gliclass
70
+ ```
 
 
 
 
71
 
72
+ ## Quick Start
73
 
74
+ ```python
75
+ from gliclass import GLiClassModel, ZeroShotClassificationPipeline
76
+ from transformers import AutoTokenizer
77
 
78
+ model = GLiClassModel.from_pretrained("knowledgator/gliclass-multilang-mini")
79
+ tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-multilang-mini")
80
+ pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')
81
 
82
+ text = "NASA launched a new Mars rover to search for signs of ancient life."
83
+ labels = ["space", "politics", "sports", "technology", "health"]
84
 
85
+ results = pipeline(text, labels, threshold=0.5)[0]
86
+ for r in results:
87
+     print(r["label"], "=>", r["score"])
88
+ ```
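The snippet above hard-codes `device='cuda:0'`. On a machine without a GPU, a fallback along these lines may help. This is a minimal sketch: `pick_device` is a hypothetical helper, not part of the `gliclass` API, and it assumes the pipeline accepts any device string that PyTorch understands, such as `'cpu'`.

```python
def pick_device(preferred: str = "cuda:0") -> str:
    """Return `preferred` if CUDA is usable, otherwise fall back to "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# Assumed usage, mirroring the Quick Start call:
# ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device=device)
```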
89
 
90
+ ---
91
+ ## Multilingual & Cross-lingual Capabilities
92
+
93
+ The model is natively trained on 20 languages, and labels and input texts do not need to share a language.
94
+
95
+ **Same language (German):**
96
+ ```python
97
+ from gliclass import GLiClassModel, ZeroShotClassificationPipeline
98
+ from transformers import AutoTokenizer
99
+
100
+ model = GLiClassModel.from_pretrained("knowledgator/gliclass-multilang-mini")
101
+ tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-multilang-mini")
102
+ pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')
103
+
104
+ text = "Die NASA hat einen neuen Mars-Rover gestartet, um nach Spuren alten Lebens zu suchen."
105
+ labels = ["Weltraum", "Politik", "Sport", "Technologie", "Gesundheit"]
106
+ results = pipeline(text, labels, threshold=0.5)[0]
107
+ for r in results:
108
+     print(r["label"], "=>", r["score"])
109
+ ```
110
+
111
+ **Cross-lingual (French text, English labels):**
112
+ ```python
113
+ text = "Le gouvernement français a annoncé de nouvelles mesures économiques."
114
+ labels = ["economy", "politics", "sports", "technology"]
115
+ results = pipeline(text, labels, threshold=0.5)[0]
116
+ for r in results:
117
+     print(r["label"], "=>", r["score"])
118
+ ```
119
+
120
+ **Cross-lingual (Arabic text, English labels):**
121
+ ```python
122
+ text = "أطلقت ناسا مركبة جديدة للمريخ للبحث عن آثار الحياة القديمة."
123
+ labels = ["space", "politics", "sports", "technology"]
124
+ results = pipeline(text, labels, threshold=0.5)[0]
125
+ for r in results:
126
+     print(r["label"], "=>", r["score"])
127
+ ```
128
+
129
+ **Cross-lingual (English text, Spanish labels):**
130
+ ```python
131
+ text = "NASA launched a new Mars rover to search for signs of ancient life."
132
+ labels = ["espacio", "política", "deportes", "tecnología", "salud"]
133
+ results = pipeline(text, labels, threshold=0.5)[0]
134
+ for r in results:
135
+     print(r["label"], "=>", r["score"])
136
+ ```
137
+
138
+ <details>
139
+ <summary>General Examples</summary>
140
+
141
+ ### 1. Topic Classification
142
+
143
+ ```python
144
+ text = "NASA launched a new Mars rover to search for signs of ancient life."
145
+ labels = ["space", "politics", "sports", "technology", "health"]
146
+
147
+ results = pipeline(text, labels, threshold=0.5)[0]
148
+ for r in results:
149
+     print(r["label"], "=>", r["score"])
150
+ ```
151
+
152
+ #### With hierarchical labels
153
+
154
+ ```python
155
+ hierarchical_labels = {
156
+     "science": ["space", "biology", "physics"],
157
+     "society": ["politics", "economics", "culture"]
158
+ }
159
+
160
+ results = pipeline(text, hierarchical_labels, threshold=0.5)[0]
161
+ for r in results:
162
+     print(r["label"], "=>", r["score"])
163
+ # e.g. science.space => 0.95
164
+ ```
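To make the dot notation concrete, the sketch below expands a label dictionary into flat `group.label` names such as `science.space`. `flatten_labels` is a hypothetical helper written for illustration only; the library may build these names differently internally.

```python
from typing import Dict, List


def flatten_labels(hierarchy: Dict[str, List[str]]) -> List[str]:
    """Expand {"group": ["a", "b"]} into ["group.a", "group.b"]."""
    return [
        f"{group}.{label}"
        for group, labels in hierarchy.items()
        for label in labels
    ]


hierarchical_labels = {
    "science": ["space", "biology", "physics"],
    "society": ["politics", "economics", "culture"],
}
flat_labels = flatten_labels(hierarchical_labels)
print(flat_labels)
```

Flat names like these are the form in which hierarchical results are reported in the example output above (`science.space`).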
165
+
166
+ ### 2. Sentiment Analysis
167
+
168
+ ```python
169
+ text = "The food was excellent but the service was painfully slow."
170
+ labels = ["positive", "negative", "neutral"]
171
+
172
+ results = pipeline(text, labels, threshold=0.5)[0]
173
+ for r in results:
174
+     print(r["label"], "=>", r["score"])
175
+ ```
176
+
177
+ #### With a task prompt
178
+
179
+ ```python
180
+ results = pipeline(
181
+     text, labels,
182
+     prompt="Classify the sentiment of this restaurant review:",
183
+     threshold=0.5
184
+ )[0]
185
+ ```
186
+
187
+ ### 3. Intent Classification
188
+
189
+ ```python
190
+ text = "Can you set an alarm for 7am tomorrow?"
191
+ labels = ["set_alarm", "play_music", "get_weather", "send_message", "set_reminder"]
192
+
193
+ results = pipeline(text, labels, threshold=0.5)[0]
194
+ for r in results:
195
+     print(r["label"], "=>", r["score"])
196
+ ```
197
+
198
+ ### 4. Natural Language Inference
199
+
200
+ Represent your premise as the text and the hypothesis as a label. The model works best with a single hypothesis at a time.
201
+
202
+ ```python
203
+ text = "The cat slept on the windowsill all afternoon."
204
+ labels = ["The cat was awake and playing outside."]
205
+
206
+ results = pipeline(text, labels, threshold=0.0)[0]
207
+ print(results)
208
+ # Low score → contradiction
209
+ ```
210
+
211
+ ### 5. Reranking
212
+
213
+ Score query–passage relevance by treating passages as texts and the query as the label:
214
+
215
+ ```python
216
+ query = "How to train a neural network?"
217
+ passages = [
218
+     "Backpropagation is the key algorithm for training deep neural networks.",
219
+     "The stock market rallied on strong earnings reports.",
220
+     "Gradient descent optimizes model weights during training.",
221
+ ]
222
+
223
+ for passage in passages:
224
+     score = pipeline(passage, [query], threshold=0.0)[0][0]["score"]
225
+     print(f"{score:.3f} {passage[:60]}")
226
+ ```
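To turn the per-passage scores into a ranking, sort them in descending order. The sketch below does not load the model: `rank_passages` and `score_fn` are hypothetical names, and `toy_score` is a stand-in word-overlap scorer used only so the example runs without weights. In practice `score_fn` would wrap the pipeline call from the loop above.

```python
from typing import Callable, List, Tuple


def rank_passages(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Score every passage against the query and return (score, passage), best first."""
    scored = [(score_fn(passage, query), passage) for passage in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(passage: str, query: str) -> float:
    """Stand-in scorer: fraction of query words that also occur in the passage."""
    query_words = set(query.lower().replace("?", "").split())
    passage_words = set(passage.lower().replace(".", "").split())
    return len(query_words & passage_words) / len(query_words)


query = "How to train a neural network?"
passages = [
    "Backpropagation is the key algorithm for training deep neural networks.",
    "The stock market rallied on strong earnings reports.",
    "Gradient descent optimizes model weights during training.",
]
ranked = rank_passages(query, passages, toy_score)
for score, passage in ranked:
    print(f"{score:.3f} {passage[:60]}")

# With the real model, score_fn could be (an assumption based on the loop above):
# lambda passage, query: pipeline(passage, [query], threshold=0.0)[0][0]["score"]
```

Keeping the scorer as a parameter makes it easy to swap the toy function for the real pipeline without touching the ranking logic.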
227
 
228
+ ### 6. Rule-following Verification
229
 
230
+ Include the domain and rules as part of the text:
231
 
232
+ ```python
233
+ text = (
234
+     "Domain: e-commerce product reviews\n"
235
+     "Rule: No promotion of illegal activity.\n"
236
+     "Text: The software is okay, but search for 'productname_patch_v2.zip' "
237
+     "to unlock all features for free."
238
+ )
239
+ labels = ["follows_guidelines", "violates_guidelines"]
240
 
241
+ results = pipeline(text, labels, threshold=0.0)[0]
242
+ for r in results:
243
+     print(r["label"], "=>", r["score"])
244
+ ```
245
 
246
+ </details>
247
 
248
+ ---
249
 
250
+ ## Benchmarks
251
 
252
+ ### Model Overview
253
 
254
+ Summary across all evaluated multilingual-capable models (zero-shot, no fine-tuning). Speed averaged over all label counts and text lengths at batch_size=8 on NVIDIA RTX PRO 6000 Blackwell.
255
 
256
+ | Model | Params | English avg F1 | Multilingual avg F1 | Throughput (samp/s, bs=8) |
257
+ |---|---:|---:|---:|---:|
258
+ | [multilang‑ultra](https://huggingface.co/knowledgator/gliclass-multilang-ultra) | ~1,720M | **0.7212** | **0.5599** | 200.7 |
259
+ | [multilang‑mini](https://huggingface.co/knowledgator/gliclass-multilang-mini) | ~288M | 0.6827 | 0.5378 | **513.4** |
260
+ | [multilang‑edge](https://huggingface.co/knowledgator/gliclass-multilang-edge) | ~140M | 0.6196 | 0.3959 | **553.6** |
261
+ | [instruct‑large](https://huggingface.co/knowledgator/gliclass-instruct-large-v1.0) | ~435M | 0.7199 | — | 293.9 |
262
+ | [instruct‑base](https://huggingface.co/knowledgator/gliclass-instruct-base-v1.0) | ~184M | 0.6525 | — | 521.9 |
263
+ | [gliner2‑large‑v1](https://huggingface.co/fastino/gliner2-large-v1) | 340M | 0.6774 | — | 122.5 |
264
+ | [gliner2‑multi‑v1](https://huggingface.co/fastino/gliner2-multi-v1) | ~278M | 0.6387 | 0.4659 | 200.2 |
265
+ | [gliner2‑base‑v1](https://huggingface.co/fastino/gliner2-base-v1) | ~184M | 0.6336 | — | 224.0 |
266
+ | [bge‑m3‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/bge-m3-zeroshot-v2.0) | 568M | 0.5927 | 0.5225 | 208.7 |
267
+ | [mDeBERTa‑mnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 300M | 0.5340 | 0.3926 | 160.6 |
268
 
269
+ > Multilingual avg F1 is the mean of 6 dataset-level scores (GermEval2017, MASSIVE, PolygloToxicityPrompts, SIB-200, TextDetox, TweetSentiment). Models without multilingual results (—) were only evaluated on English datasets.
270
 
271
+ ---
272
 
273
+ F1 scores on zero-shot text classification (no fine-tuning on these datasets):
274
+
275
+ **Table A: GLiClass Multilang (macro F1)**
276
+
277
+ | Dataset | [multilang‑ultra](https://huggingface.co/knowledgator/gliclass-multilang-ultra) | [multilang‑mini](https://huggingface.co/knowledgator/gliclass-multilang-mini) | [multilang‑edge](https://huggingface.co/knowledgator/gliclass-multilang-edge) |
278
+ |---|---|---|---|
279
+ | CR | 0.9226 | 0.9042 | 0.8852 |
280
+ | sst2 | 0.9065 | 0.8810 | 0.8276 |
281
+ | sst5 | 0.3049 | 0.2806 | 0.3047 |
282
+ | 20_newsgroups | 0.5238 | 0.4242 | 0.3522 |
283
+ | spam | 0.9625 | 0.9385 | 0.6787 |
284
+ | financial_phrasebank | 0.8724 | 0.7156 | 0.7446 |
285
+ | imdb | 0.9330 | 0.9011 | 0.8730 |
286
+ | ag_news | 0.7454 | 0.7545 | 0.7338 |
287
+ | emotion | 0.4825 | 0.4655 | 0.4267 |
288
+ | cap_sotu | 0.4385 | 0.4087 | 0.3516 |
289
+ | rotten_tomatoes | 0.8413 | 0.8236 | 0.7044 |
290
+ | massive | 0.6483 | 0.5853 | 0.5649 |
291
+ | banking | 0.6492 | 0.5853 | 0.5788 |
292
+ | snips | 0.8653 | 0.8900 | 0.6487 |
293
+ | **AVERAGE** | **0.7212** | **0.6827** | **0.6196** |
294
+
295
+ **Table B: Baselines (macro F1)**
296
+
297
+ | Dataset | [gliner2‑large‑v1](https://huggingface.co/fastino/gliner2-large-v1) | [gliner2‑multi‑v1](https://huggingface.co/fastino/gliner2-multi-v1) | [gliner2‑base‑v1](https://huggingface.co/fastino/gliner2-base-v1) | [bge‑m3‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/bge-m3-zeroshot-v2.0) | [mDeBERTa‑mnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) |
298
+ |---|---|---|---|---|---|
299
+ | CR | 0.9117 | 0.8785 | 0.8783 | 0.9041 | 0.8956 |
300
+ | sst2 | 0.8911 | 0.8568 | 0.8737 | 0.9257 | 0.8516 |
301
+ | sst5 | 0.4462 | 0.3784 | 0.4100 | 0.2931 | 0.3023 |
302
+ | 20_newsgroups | 0.5163 | 0.3668 | 0.4608 | 0.4161 | 0.2080 |
303
+ | spam | 0.3558 | 0.5986 | 0.3843 | 0.4410 | 0.4980 |
304
+ | financial_phrasebank | 0.8330 | 0.7372 | 0.7225 | 0.5040 | 0.4444 |
305
+ | imdb | 0.9170 | 0.8934 | 0.8982 | 0.8730 | 0.8264 |
306
+ | ag_news | 0.7029 | 0.7403 | 0.7193 | 0.6870 | 0.6547 |
307
+ | emotion | 0.5233 | 0.4666 | 0.4577 | 0.4530 | 0.4055 |
308
+ | cap_sotu | 0.4387 | 0.3972 | 0.3831 | 0.4720 | 0.3390 |
309
+ | rotten_tomatoes | 0.7909 | 0.7210 | 0.6979 | 0.8130 | 0.6931 |
310
+ | massive | 0.5897 | 0.4721 | 0.5403 | 0.4140 | 0.2527 |
311
+ | banking | 0.6885 | 0.6390 | 0.6709 | 0.3870 | 0.3796 |
312
+ | snips | 0.8788 | 0.7954 | 0.7731 | 0.7149 | 0.7245 |
313
+ | **AVERAGE** | **0.6774** | **0.6387** | **0.6336** | **0.5927** | **0.5340** |
314
+
315
+ **Table C: GLiClass-V1 Multitask (macro F1)**
316
+
317
+ | Dataset | [instruct‑large‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-large-v1.0) | [instruct‑base‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-base-v1.0) | [edge‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-edge-v1.0) |
318
+ |---|---|---|---|
319
+ | CR | 0.9066 | 0.8922 | 0.7933 |
320
+ | sst2 | 0.9154 | 0.9198 | 0.7577 |
321
+ | sst5 | 0.3387 | 0.2266 | 0.2163 |
322
+ | 20_newsgroups | 0.5577 | 0.5189 | 0.2555 |
323
+ | spam | 0.9790 | 0.9380 | 0.7609 |
324
+ | financial_phrasebank | 0.8289 | 0.5217 | 0.3905 |
325
+ | imdb | 0.9397 | 0.9364 | 0.8159 |
326
+ | ag_news | 0.7521 | 0.6978 | 0.6043 |
327
+ | emotion | 0.4473 | 0.4454 | 0.2941 |
328
+ | cap_sotu | 0.4327 | 0.4579 | 0.2380 |
329
+ | rotten_tomatoes | 0.8491 | 0.8458 | 0.5455 |
330
+ | massive | 0.5824 | 0.4757 | 0.2090 |
331
+ | banking | 0.6987 | 0.6072 | 0.4635 |
332
+ | snips | 0.8509 | 0.6515 | 0.5461 |
333
+ | **AVERAGE** | **0.7199** | **0.6525** | **0.4922** |
334
+
335
+ ### Multilingual Benchmarks
336
+
337
+ Macro F1 averaged per dataset across all evaluated languages:
338
+
339
+ | Dataset | [multilang‑ultra](https://huggingface.co/knowledgator/gliclass-multilang-ultra) | [multilang‑mini](https://huggingface.co/knowledgator/gliclass-multilang-mini) | [multilang‑edge](https://huggingface.co/knowledgator/gliclass-multilang-edge) | [gliner2‑multi‑v1](https://huggingface.co/fastino/gliner2-multi-v1) | [bge‑m3‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/bge-m3-zeroshot-v2.0) | [mDeBERTa‑mnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) |
340
+ |---|---|---|---|---|---|---|
341
+ | germeval2017 | 0.4647 | 0.4826 | 0.4094 | 0.4223 | 0.4503 | 0.2849 |
342
+ | massive | 0.5635 | 0.4925 | 0.2853 | 0.3625 | 0.4646 | 0.2427 |
343
+ | polyglot_toxicity | 0.7367 | 0.7110 | 0.4474 | 0.6630 | 0.6809 | 0.5698 |
344
+ | sib200 | 0.1935 | 0.1921 | 0.1492 | 0.1750 | 0.1891 | 0.1476 |
345
+ | textdetox | 0.7428 | 0.7313 | 0.5811 | 0.5912 | 0.7510 | 0.6490 |
346
+ | tweet_sentiment | 0.6579 | 0.6171 | 0.5030 | 0.5814 | 0.5991 | 0.4615 |
347
+ | **AVERAGE** | **0.5599** | **0.5378** | **0.3959** | **0.4659** | **0.5225** | **0.3926** |
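As a sanity check on how the AVERAGE row is formed, the snippet below recomputes it for multilang‑mini as the unweighted mean of the six dataset-level scores copied from the table above.

```python
from statistics import mean

# Dataset-level macro F1 for multilang-mini, copied from the table above.
mini_scores = {
    "germeval2017": 0.4826,
    "massive": 0.4925,
    "polyglot_toxicity": 0.7110,
    "sib200": 0.1921,
    "textdetox": 0.7313,
    "tweet_sentiment": 0.6171,
}

multilingual_avg = mean(mini_scores.values())
print(f"{multilingual_avg:.4f}")  # agrees with the reported 0.5378
```

Because the mean is unweighted, a low-scoring dataset such as sib200 pulls the average down as much as any other dataset, regardless of its size.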
348
+
349
+ Per-language macro F1 (16-language fair comparison on massive + sib200):
350
+
351
+ | Language | [multilang‑ultra](https://huggingface.co/knowledgator/gliclass-multilang-ultra) | [multilang‑mini](https://huggingface.co/knowledgator/gliclass-multilang-mini) | [multilang‑edge](https://huggingface.co/knowledgator/gliclass-multilang-edge) | [gliner2‑multi‑v1](https://huggingface.co/fastino/gliner2-multi-v1) | [bge‑m3‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/bge-m3-zeroshot-v2.0) | [mDeBERTa‑mnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) |
352
+ |---|---|---|---|---|---|---|
353
+ | arabic | 0.3210 | 0.3043 | 0.1843 | 0.2394 | 0.2862 | 0.1567 |
354
+ | chinese | 0.3888 | 0.3636 | 0.2724 | 0.2947 | 0.3459 | 0.2356 |
355
+ | dutch | 0.3949 | 0.3587 | 0.2660 | 0.2828 | 0.3284 | 0.2146 |
356
+ | finnish | 0.3632 | 0.3174 | 0.1172 | 0.2704 | 0.3357 | 0.1884 |
357
+ | french | 0.3965 | 0.3679 | 0.2963 | 0.2946 | 0.3396 | 0.1978 |
358
+ | german | 0.3654 | 0.3457 | 0.2532 | 0.2767 | 0.3164 | 0.1966 |
359
+ | hebrew | 0.3521 | 0.3206 | 0.1271 | 0.2641 | 0.3287 | 0.1796 |
360
+ | hindi | 0.3934 | 0.3529 | 0.1877 | 0.0817 | 0.3240 | 0.1986 |
361
+ | italian | 0.3919 | 0.3474 | 0.2604 | 0.2891 | 0.3146 | 0.1976 |
362
+ | latvian | 0.3643 | 0.3165 | 0.1205 | 0.2741 | 0.3163 | 0.1774 |
363
+ | norwegian | 0.3770 | 0.3489 | 0.2043 | 0.2803 | 0.3382 | 0.1965 |
364
+ | polish | 0.3961 | 0.3577 | 0.2112 | 0.2814 | 0.3225 | 0.1981 |
365
+ | portuguese | 0.4008 | 0.3482 | 0.2798 | 0.3057 | 0.3346 | 0.1936 |
366
+ | romanian | 0.3740 | 0.3204 | 0.2210 | 0.2831 | 0.3291 | 0.1944 |
367
+ | spanish | 0.3921 | 0.3535 | 0.2905 | 0.2924 | 0.3371 | 0.1918 |
368
+ | swedish | 0.3863 | 0.3547 | 0.2121 | 0.2799 | 0.3317 | 0.2019 |
369
+ | **AVERAGE** | **0.3786** | **0.3424** | **0.2190** | **0.2681** | **0.3268** | **0.1950** |
370
+
371
+ ## Throughput
372
+
373
+ ![English Quality vs Throughput](model_comparison_en.png)
374
+
375
+ Throughput (samples/sec), batch_size=8, GPU: NVIDIA RTX PRO 6000 Blackwell. Averaged over text lengths (64 / 256 / 512 tokens).
376
+
377
+ | Model | 1 label | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | **avg** |
378
+ |---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
379
+ | [multilang‑ultra](https://huggingface.co/knowledgator/gliclass-multilang-ultra) | 308.2 | 302.5 | 281.8 | 266.3 | 235.9 | 190.5 | 125.2 | 64.7 | 31.5 | **200.7** |
380
+ | [multilang‑mini](https://huggingface.co/knowledgator/gliclass-multilang-mini) | 708.4 | 703.9 | 692.5 | 664.2 | 618.1 | 518.1 | 396.1 | 221.2 | 98.2 | **513.4** |
381
+ | [multilang‑edge](https://huggingface.co/knowledgator/gliclass-multilang-edge) | 697.0 | 699.7 | 689.5 | 671.0 | 637.7 | 553.3 | 469.8 | 345.2 | 219.2 | **553.6** |
382
+ | [instruct‑large](https://huggingface.co/knowledgator/gliclass-instruct-large-v1.0) | 397.2 | 393.1 | 386.6 | 374.2 | 351.1 | 313.3 | 223.8 | 142.2 | 63.2 | **293.9** |
383
+ | [instruct‑base](https://huggingface.co/knowledgator/gliclass-instruct-base-v1.0) | 708.0 | 707.5 | 693.5 | 666.4 | 616.7 | 526.5 | 405.5 | 248.1 | 124.9 | **521.9** |
384
+ | [gliner2‑large‑v1](https://huggingface.co/fastino/gliner2-large-v1) | 165.6 | 165.2 | 157.1 | 155.6 | 142.1 | 122.1 | 98.6 | 65.6 | 31.0 | **122.5** |
385
+ | [gliner2‑multi‑v1](https://huggingface.co/fastino/gliner2-multi-v1) | 270.4 | 267.9 | 264.6 | 257.3 | 237.2 | 200.0 | 159.2 | 96.8 | 48.4 | **200.2** |
386
+ | [gliner2‑base‑v1](https://huggingface.co/fastino/gliner2-base-v1) | 296.8 | 293.2 | 287.8 | 278.9 | 262.0 | 229.4 | 180.1 | 121.3 | 66.2 | **224.0** |
387
+ | [bge‑m3‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/bge-m3-zeroshot-v2.0) | 940.0 | 474.7 | 238.4 | 112.9 | 58.3 | 28.9 | 14.4 | 7.2 | 3.7 | **208.7** |
388
+ | [mDeBERTa‑mnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 717.5 | 364.5 | 183.1 | 91.8 | 45.7 | 22.8 | 11.4 | 5.7 | 3.0 | **160.6** |
389
+
390
+ > NLI models (bge-m3, mDeBERTa) run one forward pass per label, so their throughput falls roughly in inverse proportion to label count (about half each time the label count doubles). GLiClass and GLiNER2 encode all labels in a single pass, so their throughput declines far more slowly.
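The scaling behaviour can be checked against the bge‑m3 row of the table. If each label costs one forward pass, throughput at n labels should be close to the 1-label throughput divided by n. The sketch below compares that simple model (an assumption for illustration, not a measured law) with the reported numbers.

```python
# Reported bge-m3-zeroshot-v2.0 throughput (samples/sec) per label count,
# copied from the throughput table above.
measured = {1: 940.0, 2: 474.7, 4: 238.4, 8: 112.9, 16: 58.3,
            32: 28.9, 64: 14.4, 128: 7.2, 256: 3.7}

base = measured[1]
relative_errors = {}
for n_labels, actual in measured.items():
    predicted = base / n_labels  # one forward pass per label
    relative_errors[n_labels] = abs(predicted - actual) / actual
    print(f"{n_labels:>3} labels: predicted {predicted:7.1f}, measured {actual:7.1f}")

max_error = max(relative_errors.values())
print(f"largest relative deviation: {max_error:.1%}")
```

By contrast, multilang‑mini goes from 708.4 to 98.2 samples/sec over the same range, a drop of about 7x rather than about 250x.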
391
+
392
+ ## Citation
393
+
394
+ ```bibtex
395
+ @misc{stepanov2025gliclassgeneralistlightweightmodel,
396
+ title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks},
397
+ author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
398
+ year={2025},
399
+ eprint={2508.07662},
400
+ archivePrefix={arXiv},
401
+ primaryClass={cs.LG},
402
+ url={https://arxiv.org/abs/2508.07662},
403
+ }
404
+ ```