# Deep Learning Emotion Classification - Code Explanation

This document provides a detailed line-by-line explanation of the `main.ipynb` notebook, which implements a multi-label emotion classification system using the DeBERTa transformer model with K-Fold cross-validation.

---

## Section 1: Imports & Setup

### Lines 18-36: Import Statements

```python
import numpy as np
import pandas as pd
```
- **numpy**: Used for numerical operations, array manipulation, and random seed setting
- **pandas**: Used for data loading and manipulation (CSV files, DataFrames)

```python
import torch
import torch.nn as nn
```
- **torch**: PyTorch deep learning framework for tensor operations and model training
- **torch.nn**: Neural network modules including loss functions

```python
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score
```
- **StratifiedKFold**: Creates k-fold splits while maintaining class distribution in each fold
- **f1_score**: Calculates F1 metric for evaluation (harmonic mean of precision and recall)

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    get_linear_schedule_with_warmup,
    AutoConfig
)
```
- **AutoTokenizer**: Automatically loads the appropriate tokenizer for the specified model
- **AutoModelForSequenceClassification**: Pre-trained transformer model for classification tasks
- **get_linear_schedule_with_warmup**: Learning rate scheduler with warmup and linear decay
- **AutoConfig**: Model configuration loader

```python
from torch.optim import AdamW
```
- **AdamW**: Adam optimizer with decoupled weight decay (generally preferred over standard Adam for fine-tuning transformers)

```python
from torch.cuda.amp import autocast, GradScaler
```
- **autocast**: Enables automatic mixed precision (AMP) to speed up training
- **GradScaler**: Scales gradients for mixed precision training to prevent underflow

```python
import gc
import warnings
import os
```
- **gc**: Garbage collection to free up memory
- **warnings**: To suppress warning messages
- **os**: For file system operations and environment variables

```python
warnings.filterwarnings("ignore")
```
- Suppresses all warning messages for cleaner output

---

## Section 2: Configuration

### Lines 52-68: Configuration Class

```python
class Config:
    SEED = 42
```
- Sets random seed for reproducibility across all random operations

```python
    LABELS = ["anger", "fear", "joy", "sadness", "surprise"]
```
- Defines the 5 emotion labels for multi-label classification

```python
    MODEL_NAME = "microsoft/deberta-v3-base"
```
- Specifies the pre-trained model (DeBERTa v3 base, roughly 184M parameters including its large embedding layer; strong performance on many NLU benchmarks)

```python
    MAX_LEN = 128
```
- Maximum sequence length for tokenization (sequences longer than this are truncated)

```python
    BATCH_SIZE = 16
```
- Number of samples processed together in one forward/backward pass

```python
    EPOCHS = 4
```
- Number of complete passes through the training dataset

```python
    LR = 1.5e-5
```
- Learning rate (1.5 × 10⁻⁵) - small value typical for fine-tuning transformers

```python
    WEIGHT_DECAY = 0.01
```
- Weight decay strength (decoupled L2-style regularization in AdamW) to discourage overfitting

```python
    WARMUP_RATIO = 0.1
```
- Fraction of training steps used for learning rate warmup (10% of total steps)

```python
    N_FOLDS = 5
```
- Number of folds for K-Fold cross-validation

```python
    TRAIN_CSV = "/kaggle/input/2025-sep-dl-gen-ai-project/train.csv"
    TEST_CSV = "/kaggle/input/2025-sep-dl-gen-ai-project/test.csv"
```
- Paths to training and test datasets (Kaggle environment paths)

```python
    SUBMISSION_PATH = "submission.csv"
```
- Output file for predictions

```python
CONFIG = Config()
```
- Creates a global instance of the configuration class

---

## Section 3: Seed & Device Setup

### Lines 84-93: Reproducibility and Device Selection

```python
def set_seed(seed=CONFIG.SEED):
    np.random.seed(seed)
```
- Sets numpy's random seed for reproducible random number generation

```python
    torch.manual_seed(seed)
```
- Sets PyTorch's random seed for CPU operations

```python
    torch.cuda.manual_seed_all(seed)
```
- Sets PyTorch's random seed for all GPU devices

```python
    os.environ['PYTHONHASHSEED'] = str(seed)
```
- Sets the `PYTHONHASHSEED` environment variable so Python's hash randomization is fixed (this mainly affects subprocesses started afterwards)

```python
set_seed()
```
- Calls the seed setting function

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
```
- Checks if GPU is available; uses GPU if available, otherwise falls back to CPU
- Prints the device being used for training

---

## Section 4: Utility Functions

### Lines 109-115: `ensure_text_column` Function

```python
def ensure_text_column(df: pd.DataFrame) -> pd.DataFrame:
    if "text" in df.columns:
        return df
```
- Checks if DataFrame already has a "text" column; if yes, returns unchanged

```python
    for c in ["comment_text", "sentence", "content", "review"]:
        if c in df.columns:
            return df.rename(columns={c: "text"})
```
- Searches for common alternative text column names
- Renames the first matching column to "text" for standardization

```python
    raise ValueError("No text column found. Add/rename your text column to 'text'.")
```
- Raises an error if no text column is found
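
A minimal usage sketch of the renaming behavior; the DataFrame below is invented for illustration:

```python
import pandas as pd

# Hypothetical frame whose text lives in a "comment_text" column
df = pd.DataFrame({"comment_text": ["I am thrilled!", "This is awful."]})

df = ensure_text_column(df)   # renames "comment_text" -> "text"
print(df.columns.tolist())    # ['text']
```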

### Lines 117-126: `tune_thresholds` Function

```python
def tune_thresholds(y_true: np.ndarray, y_prob: np.ndarray) -> np.ndarray:
    th = np.zeros(y_true.shape[1], dtype=np.float32)
```
- Creates array to store optimal threshold for each label (initialized to 0)
- Multi-label classification requires separate thresholds per label

```python
    for j in range(y_true.shape[1]):
        best_t, best_f1 = 0.5, -1
```
- Iterates through each label
- Initializes best threshold to 0.5 (default) and best F1 to -1

```python
        for t in np.linspace(0.1, 0.9, 17):
```
- Tests 17 threshold values evenly spaced between 0.1 and 0.9

```python
            f1 = f1_score(y_true[:, j], (y_prob[:, j] >= t).astype(int), zero_division=0)
```
- Calculates F1 score for current label and threshold
- Converts probabilities to binary predictions using threshold

```python
            if f1 > best_f1:
                best_f1, best_t = f1, t
```
- Updates best threshold if current F1 is better

```python
        th[j] = best_t
    return th
```
- Stores optimal threshold for each label and returns the array
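
A minimal sketch of how the function is called, using random stand-in arrays rather than real model outputs (assumes `tune_thresholds` from above is in scope):

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, 5))   # fake 0/1 targets, 5 labels
y_prob = rng.random(size=(200, 5))           # fake sigmoid probabilities

th = tune_thresholds(y_true, y_prob)         # one tuned threshold per label
print(th)                                    # five values between 0.1 and 0.9
```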

### Lines 128-141: `get_optimizer_params` Function

```python
def get_optimizer_params(model, lr, weight_decay):
    param_optimizer = list(model.named_parameters())
```
- Gets all model parameters with their names

```python
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
```
- Lists parameters that should NOT have weight decay applied
- Bias and LayerNorm parameters are typically trained without weight decay

```python
    optimizer_parameters = [
        {
            "params": [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
            "weight_decay": weight_decay,
        },
```
- First parameter group: all parameters EXCEPT bias and LayerNorm
- These parameters will have weight decay applied

```python
        {
            "params": [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]
```
- Second parameter group: only bias and LayerNorm parameters
- These parameters have weight decay set to 0.0

```python
    return optimizer_parameters
```
- Returns grouped parameters for differential weight decay
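
These groups are consumed by AdamW in Section 7; a condensed sketch of that hand-off, assuming `model` and `CONFIG` are already in scope:

```python
from torch.optim import AdamW

# Group 0 (most weights) decays at CONFIG.WEIGHT_DECAY; group 1 (bias/LayerNorm) does not.
groups = get_optimizer_params(model, CONFIG.LR, CONFIG.WEIGHT_DECAY)
optimizer = AdamW(groups, lr=CONFIG.LR)
```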

---

## Section 5: Dataset Class

### Lines 157-180: `EmotionDS` Class

```python
class EmotionDS(torch.utils.data.Dataset):
    def __init__(self, df, tokenizer, max_len, is_test=False):
```
- Custom PyTorch Dataset class for emotion classification
- `is_test` flag indicates whether this is test data (no labels)

```python
        self.texts = df["text"].tolist()
```
- Extracts text data as a Python list

```python
        self.is_test = is_test
        if not is_test:
            self.labels = df[CONFIG.LABELS].values.astype(np.float32)
```
- Stores test flag
- If training data, extracts multi-label targets as float32 array

```python
        self.tok = tokenizer
        self.max_len = max_len
```
- Stores tokenizer and max length for later use

```python
    def __len__(self):
        return len(self.texts)
```
- Returns dataset size (required by PyTorch)

```python
    def __getitem__(self, i):
        enc = self.tok(
            self.texts[i],
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors="pt",
        )
```
- Tokenizes the text at index `i`
- **truncation**: Cuts text longer than max_len
- **padding**: Pads shorter sequences to max_len
- **return_tensors="pt"**: Returns PyTorch tensors

```python
        item = {k: v.squeeze(0) for k, v in enc.items()}
```
- Removes the batch dimension (1, seq_len) → (seq_len)
- Returns dict with keys: input_ids, attention_mask, token_type_ids (if applicable)

```python
        if not self.is_test:
            item["labels"] = torch.tensor(self.labels[i])
        return item
```
- Adds labels to the item dict if training data
- Returns the complete item
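
A quick sketch of what one item looks like; the one-row DataFrame below is invented for illustration and assumes `EmotionDS` and `CONFIG` are in scope:

```python
import pandas as pd
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)

# Tiny made-up frame with a text column plus the five label columns
toy = pd.DataFrame({
    "text": ["I can't believe it, this is wonderful!"],
    "anger": [0], "fear": [0], "joy": [1], "sadness": [0], "surprise": [1],
})

ds = EmotionDS(toy, tok, CONFIG.MAX_LEN)
item = ds[0]
print(item["input_ids"].shape)   # torch.Size([128]) due to padding="max_length"
print(item["labels"])            # tensor([0., 0., 1., 0., 1.])
```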

---

## Section 6: Training & Validation Helper Functions

### Lines 196-213: `train_one_epoch` Function

```python
def train_one_epoch(model, loader, optimizer, scheduler, scaler, criterion):
    model.train()
```
- Sets model to training mode (enables dropout and other training-only behaviors, e.g. batch-norm statistic updates in models that use it)

```python
    losses = []
    for batch in loader:
```
- Initializes list to track losses
- Iterates through batches

```python
        batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
```
- Moves batch data to GPU (or CPU)
- `non_blocking=True`: Async transfer for faster processing

```python
        optimizer.zero_grad(set_to_none=True)
```
- Clears gradients from previous step
- `set_to_none=True`: More memory efficient than setting to zero

```python
        with autocast(enabled=True):
            out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
            loss = criterion(out.logits, batch["labels"])
```
- **autocast**: Uses mixed precision (float16) for faster computation
- Forward pass through model
- Calculates loss between predictions (logits) and true labels

```python
        scaler.scale(loss).backward()
```
- Scales loss to prevent gradient underflow in mixed precision
- Computes gradients via backpropagation

```python
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
```
- Unscales gradients before clipping
- Clips gradients to maximum norm of 1.0 to prevent exploding gradients

```python
        scaler.step(optimizer)
        scaler.update()
```
- Updates model parameters (with scaled gradients)
- Updates the scaler's internal state

```python
        scheduler.step()
```
- Updates learning rate according to schedule

```python
        losses.append(loss.item())
    return np.mean(losses)
```
- Stores loss value
- Returns average loss for the epoch

### Lines 215-230: `validate` Function

```python
def validate(model, loader, criterion):
    model.eval()
```
- Sets model to evaluation mode (disables dropout; normalization layers use their stored statistics)

```python
    losses = []
    preds = []
    targs = []
```
- Initializes lists for losses, predictions, and targets

```python
    with torch.no_grad():
```
- Disables gradient computation (saves memory and speeds up inference)

```python
        for batch in loader:
            batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
            with autocast(enabled=True):
                out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
                loss = criterion(out.logits, batch["labels"])
```
- Moves batch to device
- Forward pass with mixed precision
- Calculates validation loss

```python
            losses.append(loss.item())
            preds.append(torch.sigmoid(out.logits).float().cpu().numpy())
            targs.append(batch["labels"].cpu().numpy())
```
- Stores loss
- Applies sigmoid to convert logits to probabilities [0, 1]
- Moves predictions and targets to CPU as numpy arrays

```python
    return np.mean(losses), np.vstack(preds), np.vstack(targs)
```
- Returns average loss, stacked predictions, and stacked targets

---

## Section 7: Main K-Fold Training Loop

### Lines 246-324: `run_training` Function

```python
def run_training():
    if not os.path.exists(CONFIG.TRAIN_CSV):
        print("Train CSV not found. Please check the path.")
        return None, None
```
- Checks if training data exists
- Returns None if not found (graceful failure)

```python
    df = pd.read_csv(CONFIG.TRAIN_CSV)
    df = ensure_text_column(df)
```
- Loads training data
- Ensures text column exists

```python
    skf = StratifiedKFold(n_splits=CONFIG.N_FOLDS, shuffle=True, random_state=CONFIG.SEED)
    y_str = df[CONFIG.LABELS].astype(str).agg("".join, axis=1)
```
- Creates 5-fold stratified splitter
- Converts multi-label to string representation for stratification
- Example: [1,0,1,0,0] → "10100"

```python
    oof_preds = np.zeros((len(df), len(CONFIG.LABELS)))
```
- Initializes out-of-fold predictions array (for all training samples)

```python
    tokenizer = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)
```
- Loads DeBERTa tokenizer

```python
    for fold, (train_idx, val_idx) in enumerate(skf.split(df, y_str)):
        print(f"\n{'='*20} FOLD {fold+1}/{CONFIG.N_FOLDS} {'='*20}")
```
- Iterates through each fold
- `train_idx`: indices for training, `val_idx`: indices for validation

```python
        df_tr = df.iloc[train_idx].reset_index(drop=True)
        df_va = df.iloc[val_idx].reset_index(drop=True)
```
- Splits data into training and validation sets for current fold
- Resets index for clean indexing

```python
        ds_tr = EmotionDS(df_tr, tokenizer, CONFIG.MAX_LEN)
        ds_va = EmotionDS(df_va, tokenizer, CONFIG.MAX_LEN)
```
- Creates PyTorch datasets for training and validation

```python
        dl_tr = torch.utils.data.DataLoader(ds_tr, batch_size=CONFIG.BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
        dl_va = torch.utils.data.DataLoader(ds_va, batch_size=CONFIG.BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)
```
- Creates data loaders
- **shuffle=True** for training (randomizes batch order)
- **shuffle=False** for validation (keeps consistent order)
- **num_workers=2**: Uses 2 subprocesses for data loading
- **pin_memory=True**: Speeds up CPU→GPU transfer

```python
        model = AutoModelForSequenceClassification.from_pretrained(
            CONFIG.MODEL_NAME, 
            num_labels=len(CONFIG.LABELS),
            problem_type="multi_label_classification"
        )
        model.to(device)
```
- Loads pre-trained DeBERTa model
- Configures for 5-label multi-label classification
- Moves model to GPU/CPU

```python
        optimizer_params = get_optimizer_params(model, CONFIG.LR, CONFIG.WEIGHT_DECAY)
        optimizer = AdamW(optimizer_params, lr=CONFIG.LR)
```
- Gets parameter groups with differential weight decay
- Creates AdamW optimizer

```python
        total_steps = len(dl_tr) * CONFIG.EPOCHS
        scheduler = get_linear_schedule_with_warmup(
            optimizer, 
            num_warmup_steps=int(total_steps * CONFIG.WARMUP_RATIO), 
            num_training_steps=total_steps
        )
```
- Calculates total training steps
- Creates learning rate scheduler:
  - Warmup: LR increases linearly for 10% of steps
  - Decay: LR decreases linearly to 0 for remaining 90%
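
To make the schedule concrete, a back-of-the-envelope with a hypothetical fold size:

```python
# Suppose a fold has 4,000 training rows: 4,000 / 16 = 250 batches per epoch.
steps_per_epoch = 250                      # hypothetical
total_steps = steps_per_epoch * 4          # EPOCHS = 4 -> 1,000 steps
warmup_steps = int(total_steps * 0.1)      # WARMUP_RATIO = 0.1 -> 100 steps
# LR ramps from 0 to 1.5e-5 over the first 100 steps, then decays linearly to 0.
```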

```python
        criterion = nn.BCEWithLogitsLoss()
        scaler = GradScaler(enabled=True)
```
- **BCEWithLogitsLoss**: Binary cross-entropy with a built-in sigmoid, applied independently to each label (the standard loss for multi-label classification)
- Creates gradient scaler for mixed precision
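
A tiny sanity check of what this loss does, on made-up logits and targets:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.tensor([[2.0, -1.0, 0.5, -3.0, 0.0]])   # raw model outputs for 5 labels
labels = torch.tensor([[1.0, 0.0, 1.0, 0.0, 1.0]])     # multi-hot targets
# Sigmoid is applied inside the loss; each label contributes an independent
# binary cross-entropy term, and the result is their mean.
print(criterion(logits, labels).item())
```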

```python
        best_f1 = 0
        best_state = None
```
- Initializes tracking for best model

```python
        for ep in range(CONFIG.EPOCHS):
            train_loss = train_one_epoch(model, dl_tr, optimizer, scheduler, scaler, criterion)
            val_loss, val_preds, val_targs = validate(model, dl_va, criterion)
```
- Trains for one epoch
- Validates on validation set

```python
            val_f1 = f1_score(val_targs, (val_preds >= 0.5).astype(int), average="macro", zero_division=0)
```
- Calculates macro F1 score (unweighted average of the per-label F1 scores)
- Uses 0.5 threshold for predictions
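
A small illustration of `average="macro"` on toy arrays (numbers invented):

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 0, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 0, 1, 0, 1]])
# F1 is computed per label column, then averaged without weighting.
# Here: (1.0 + 0 + 0.667 + 0 + 1.0) / 5 ≈ 0.533
# (all-negative labels score 0 under zero_division=0)
print(f1_score(y_true, y_pred, average="macro", zero_division=0))
```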

```python
            print(f"Ep {ep+1}: TrLoss={train_loss:.4f} | VaLoss={val_loss:.4f} | VaF1={val_f1:.4f}")
```
- Prints epoch metrics

```python
            if val_f1 > best_f1:
                best_f1 = val_f1
                best_state = model.state_dict()
```
- Saves model state if validation F1 improves

```python
        torch.save(best_state, f"model_fold_{fold}.pth")
```
- Saves best model weights to disk

```python
        model.load_state_dict(best_state)
        _, val_preds, _ = validate(model, dl_va, criterion)
        oof_preds[val_idx] = val_preds
```
- Loads best weights
- Gets predictions on validation set
- Stores out-of-fold predictions

```python
        del model, optimizer, scaler, scheduler
        torch.cuda.empty_cache()
        gc.collect()
```
- Deletes objects to free memory
- Clears GPU cache
- Runs garbage collector

```python
    return oof_preds, df[CONFIG.LABELS].values
```
- Returns out-of-fold predictions and true labels

```python
if os.path.exists(CONFIG.TRAIN_CSV):
    oof_preds, y_true = run_training()
else:
    print("Skipping training as data is not found (likely in a dry-run environment).")
```
- Executes training if data exists
- Otherwise skips gracefully

---

## Section 8: Threshold Optimization

### Lines 340-347: Threshold Tuning

```python
if os.path.exists(CONFIG.TRAIN_CSV):
    best_thresholds = tune_thresholds(y_true, oof_preds)
```
- Finds optimal threshold for each emotion label using validation predictions

```python
    oof_tuned = (oof_preds >= best_thresholds).astype(int)
```
- Converts probabilities to binary predictions using optimized thresholds
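
The comparison relies on NumPy broadcasting: an `(n_samples, 5)` probability matrix is compared column by column against a length-5 threshold vector. A toy sketch with invented numbers:

```python
import numpy as np

probs = np.array([[0.62, 0.10, 0.48, 0.90, 0.20],
                  [0.30, 0.55, 0.71, 0.05, 0.44]])
th    = np.array([0.50, 0.60, 0.45, 0.70, 0.40])  # one threshold per label
print((probs >= th).astype(int))
# [[1 0 1 1 0]
#  [0 0 1 0 1]]
```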

```python
    final_f1 = f1_score(y_true, oof_tuned, average="macro", zero_division=0)
    print(f"\nFinal CV Macro F1: {final_f1:.4f}")
    print(f"Best Thresholds: {best_thresholds}")
```
- Calculates cross-validated F1 score with optimized thresholds
- Prints final performance and optimal thresholds

```python
else:
    best_thresholds = np.array([0.5] * len(CONFIG.LABELS))
```
- Falls back to 0.5 thresholds if training data not available

---

## Section 9: Inference & Submission

### Lines 363-420: `predict_test` Function

```python
def predict_test(thresholds):
    if not os.path.exists(CONFIG.TEST_CSV):
        print("Test CSV not found.")
        return
```
- Checks if test data exists

```python
    df_test = pd.read_csv(CONFIG.TEST_CSV)
    df_test = ensure_text_column(df_test)
```
- Loads test data and ensures text column

```python
    tokenizer = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)
    ds_test = EmotionDS(df_test, tokenizer, CONFIG.MAX_LEN, is_test=True)
    dl_test = torch.utils.data.DataLoader(ds_test, batch_size=CONFIG.BATCH_SIZE, shuffle=False, num_workers=2)
```
- Creates tokenizer, dataset, and data loader for test data
- `is_test=True`: No labels expected

```python
    fold_preds = []
```
- Initializes list to store predictions from each fold

```python
    for fold in range(CONFIG.N_FOLDS):
        model_path = f"model_fold_{fold}.pth"
        if not os.path.exists(model_path):
            print(f"Model for fold {fold} not found, skipping.")
            continue
```
- Iterates through all folds
- Checks if model exists

```python
        print(f"Predicting Fold {fold+1}...")
        model = AutoModelForSequenceClassification.from_pretrained(
            CONFIG.MODEL_NAME, 
            num_labels=len(CONFIG.LABELS),
            problem_type="multi_label_classification"
        )
        model.load_state_dict(torch.load(model_path))
        model.to(device)
        model.eval()
```
- Loads model architecture
- Loads trained weights
- Sets to evaluation mode

```python
        preds = []
        with torch.no_grad():
            for batch in dl_test:
                batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
                with autocast(enabled=True):
                    out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
                preds.append(torch.sigmoid(out.logits).float().cpu().numpy())
```
- Makes predictions without computing gradients
- Uses mixed precision for speed
- Applies sigmoid to get probabilities

```python
        fold_preds.append(np.vstack(preds))
        del model
        torch.cuda.empty_cache()
        gc.collect()
```
- Stores fold predictions
- Frees memory

```python
    if not fold_preds:
        print("No predictions made.")
        return
```
- Checks if any predictions were made

```python
    avg_preds = np.mean(fold_preds, axis=0)
```
- Averages predictions across all folds (ensemble)
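
A compact sketch of the ensembling step, with two made-up fold prediction matrices:

```python
import numpy as np

fold_preds = [np.array([[0.8, 0.2], [0.4, 0.9]]),
              np.array([[0.6, 0.4], [0.2, 0.7]])]
avg = np.mean(fold_preds, axis=0)   # element-wise mean across folds
print(avg)
# [[0.7 0.3]
#  [0.3 0.8]]
```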

```python
    final_preds = (avg_preds >= thresholds).astype(int)
```
- Applies optimized thresholds to get binary predictions

```python
    sub = pd.DataFrame(columns=["id"] + CONFIG.LABELS)
    sub["id"] = df_test["id"] if "id" in df_test.columns else np.arange(len(df_test))
    sub[CONFIG.LABELS] = final_preds
    sub.to_csv(CONFIG.SUBMISSION_PATH, index=False)
    print(f"Submission saved to {CONFIG.SUBMISSION_PATH}")
    print(sub.head())
```
- Creates submission DataFrame
- Adds ID column (from data or generated)
- Adds prediction columns
- Saves to CSV
- Displays first few rows

```python
predict_test(best_thresholds)
```
- Executes prediction function with optimized thresholds

---

## Summary

This notebook implements a **robust emotion classification pipeline** with:

1. **K-Fold Cross-Validation**: 5-fold stratified CV for reliable performance estimates
2. **State-of-the-Art Model**: DeBERTa-v3-base transformer
3. **Optimization Techniques**:
   - Mixed precision training (faster, less memory)
   - Gradient clipping (stability)
   - Learning rate warmup and decay
   - Differential weight decay
4. **Threshold Optimization**: Per-label thresholds for better F1 scores
5. **Ensemble Prediction**: Averages predictions from all folds
6. **Memory Management**: Explicit cleanup between folds

The model predicts 5 emotions (anger, fear, joy, sadness, surprise) in a **multi-label** setting, where text can have multiple emotions simultaneously.