senga-LUK-ACT-aligned-speecht5

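This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified. On the evaluation set it reaches a validation loss of 0.1216 at the final training step (40000); the lowest validation loss in the results table below is 0.0885, at step 3000.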

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
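
To make the hyperparameters above more concrete, here is a minimal sketch of how the effective batch size and the learning-rate schedule follow from them. It assumes a single training device (the card does not state the device count) and assumes the usual shape of a cosine schedule with warmup: a linear ramp from zero over the warmup steps, then a half-cosine decay to zero. The names `effective_batch_size` and `lr_at` are illustrative and are not part of any library.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000

# Assumption: one device. The card does not say how many were used.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step.

    Assumed shape: linear warmup to LEARNING_RATE over WARMUP_STEPS,
    then half-cosine decay to zero at TRAINING_STEPS.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print(f"total_train_batch_size = {total}")
for step in (0, 2000, 4000, 22000, 40000):
    print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at 0.0001 at step 4000, the end of warmup, and falls to half of that at step 22000, the midpoint of the decay phase.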

Training Loss	Epoch	Step	Validation Loss
0.1221	47.6190	1000	0.0995
0.1008	95.2381	2000	0.0894
0.0865	142.8571	3000	0.0885
0.0824	190.4762	4000	0.0897
0.0759	238.0952	5000	0.0912
0.0702	285.7143	6000	0.0907
0.0650	333.3333	7000	0.0926
0.0612	380.9524	8000	0.0967
0.0638	428.5714	9000	0.1048
0.0579	476.1905	10000	0.0991
0.0549	523.8095	11000	0.1037
0.0552	571.4286	12000	0.1039
0.0508	619.0476	13000	0.1047
0.0520	666.6667	14000	0.1055
0.0486	714.2857	15000	0.1090
0.0494	761.9048	16000	0.1088
0.0484	809.5238	17000	0.1104
0.0557	857.1429	18000	0.1107
0.0465	904.7619	19000	0.1122
0.0478	952.3810	20000	0.1116
0.0430	1000.0000	21000	0.1142
0.0447	1047.6190	22000	0.1153
0.0433	1095.2381	23000	0.1145
0.0434	1142.8571	24000	0.1171
0.0423	1190.4762	25000	0.1179
0.0418	1238.0952	26000	0.1162
0.0391	1285.7143	27000	0.1189
0.0385	1333.3333	28000	0.1198
0.0406	1380.9524	29000	0.1203
0.0411	1428.5714	30000	0.1207
0.0492	1476.1905	31000	0.1206
0.0384	1523.8095	32000	0.1212
0.0374	1571.4286	33000	0.1212
0.0397	1619.0476	34000	0.1218
0.0373	1666.6667	35000	0.1221
0.0369	1714.2857	36000	0.1227
0.0388	1761.9048	37000	0.1213
0.0379	1809.5238	38000	0.1218
0.0388	1857.1429	39000	0.1226
0.0387	1904.7619	40000	0.1216
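
The table above can be read programmatically. The following sketch, using values copied from the table, locates the evaluation step with the lowest validation loss and estimates the number of optimizer steps per epoch from the Epoch and Step columns. The helper `best_checkpoint` is hypothetical, and the examples-per-epoch figure is only a rough estimate that assumes a single device and a total train batch size of 32.

```python
# Losses logged every 1000 steps, copied from the table above.
EVAL_EVERY = 1000
train_loss = [
    0.1221, 0.1008, 0.0865, 0.0824, 0.0759, 0.0702, 0.065, 0.0612, 0.0638, 0.0579,
    0.0549, 0.0552, 0.0508, 0.052, 0.0486, 0.0494, 0.0484, 0.0557, 0.0465, 0.0478,
    0.043, 0.0447, 0.0433, 0.0434, 0.0423, 0.0418, 0.0391, 0.0385, 0.0406, 0.0411,
    0.0492, 0.0384, 0.0374, 0.0397, 0.0373, 0.0369, 0.0388, 0.0379, 0.0388, 0.0387,
]
val_loss = [
    0.0995, 0.0894, 0.0885, 0.0897, 0.0912, 0.0907, 0.0926, 0.0967, 0.1048, 0.0991,
    0.1037, 0.1039, 0.1047, 0.1055, 0.1090, 0.1088, 0.1104, 0.1107, 0.1122, 0.1116,
    0.1142, 0.1153, 0.1145, 0.1171, 0.1179, 0.1162, 0.1189, 0.1198, 0.1203, 0.1207,
    0.1206, 0.1212, 0.1212, 0.1218, 0.1221, 0.1227, 0.1213, 0.1218, 0.1226, 0.1216,
]


def best_checkpoint(losses, eval_every):
    """Return (step, loss) for the evaluation with the lowest loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return (idx + 1) * eval_every, losses[idx]


best_step, best_loss = best_checkpoint(val_loss, EVAL_EVERY)
final_step, final_loss = len(val_loss) * EVAL_EVERY, val_loss[-1]

# First table row: 1000 steps correspond to 47.6190 epochs.
steps_per_epoch = round(1000 / 47.6190)
# Rough estimate only; assumes 32 examples per optimizer step on one device.
approx_examples_per_epoch = steps_per_epoch * 32

print(f"lowest validation loss: {best_loss} at step {best_step}")
print(f"final validation loss:  {final_loss} at step {final_step}")
print(f"train/val gap at the final step: {final_loss - train_loss[-1]:.4f}")
print(f"steps per epoch: {steps_per_epoch}, approx. examples per epoch: {approx_examples_per_epoch}")
```

Read this way, validation loss bottoms out at step 3000 and drifts upward afterwards while training loss keeps falling, a pattern consistent with overfitting on a small training set. If intermediate checkpoints were saved, an early one may generalize better than the final weights; the card does not say which checkpoint was published.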

Model size: 0.1B params (Safetensors)
Tensor type: F32