AuriStream - Speech Language Model
AuriStream is a speech language model by Greta Tuckute and Klemen Kotar.
This repository contains the shared model code for AuriStream models.
Overview
AuriStream is a GPT-like transformer model for cochlear token prediction with optional multi-token prediction (MTP) heads.
This model predicts cochlear tokens from a tokenizer such as WavCochCausalV8192.
Usage
This repository is not meant to be used directly. Instead, use one of the checkpoint repositories that reference this base code.
To load a checkpoint:
```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k",
    trust_remote_code=True,
)
```
Model Architecture
The AuriStream model includes:
- RMSNorm for layer normalization
- Rotary Position Embeddings (RoPE)
- SiLU activation in MLP layers
- Multi-token prediction heads
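RMSNorm, the normalization used above, rescales each feature vector by its root-mean-square without subtracting the mean, unlike LayerNorm. A minimal NumPy sketch of the idea (the function name and `eps` default are illustrative, not taken from `modeling_auristream.py`):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # Divide by the root-mean-square over the feature dimension
    # (no mean subtraction), then apply a learned per-feature gain.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

x = np.array([1.0, 2.0, 3.0, 4.0])
y = rms_norm(x, gain=np.ones(4))  # output has unit RMS when gain is 1
```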
Configuration Options
| Parameter | Description | Default |
|---|---|---|
| `vocab_size` | Number of cochlear tokens | 8192 |
| `n_embd` | Hidden dimension | 768 |
| `n_layer` | Number of transformer layers | 12 |
| `n_head` | Number of attention heads | 12 |
| `n_pred_steps` | Number of prediction steps (MTP) | 1 |
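The defaults above can be captured in a small config object. The sketch below is a stand-in dataclass, not the real class from `configuration_auristream.py`; it only mirrors the table's names and defaults:

```python
from dataclasses import dataclass

@dataclass
class AuriStreamConfigSketch:
    # Field names and defaults mirror the table above;
    # the actual configuration class may define more options.
    vocab_size: int = 8192
    n_embd: int = 768
    n_layer: int = 12
    n_head: int = 12
    n_pred_steps: int = 1

cfg = AuriStreamConfigSketch()
head_dim = cfg.n_embd // cfg.n_head  # hidden dim split across heads: 768 / 12 = 64
```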
Files
- `configuration_auristream.py` - Configuration class
- `modeling_auristream.py` - Model implementation
Tokenizer
This model uses cochlear tokens from WavCochCausalV8192.
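With multi-token prediction, each position in a cochlear token sequence is trained to predict the next `n_pred_steps` tokens rather than just one. A minimal sketch of how such targets could be constructed (illustrative only; the actual training code may differ):

```python
def mtp_targets(tokens, n_pred_steps):
    # For position t, the targets are tokens t+1 .. t+n_pred_steps.
    # Positions too close to the end of the sequence are dropped.
    return [tokens[t + 1 : t + 1 + n_pred_steps]
            for t in range(len(tokens) - n_pred_steps)]

seq = [5, 9, 2, 7, 3]           # a toy cochlear token sequence
targets = mtp_targets(seq, 2)   # [[9, 2], [2, 7], [7, 3]]
```

With `n_pred_steps=1` this reduces to standard next-token prediction.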