model weights mismatch with code (for mlx-lm)
#2
by Goekdeniz-Guelmez - opened
Hey inclusion team,
Nice job on the model! I'm trying to implement the model architecture for MLX-LM, but in the modelling...py code the query_key_value tensor for the linear attention layers is initialized as (self.num_heads + 2 * self.num_key_value_heads) * self.head_dim, which is 3072, whereas the model weights here are twice that size, 6144. Is there a version mismatch?
The linear attention layers use MHA instead of GQA, so for those layers self.num_key_value_heads is equal to num_heads and the weight size is 2048 * 3 = 6144. Details can be found here.
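For reference, a minimal sketch of how the fused projection size works out. The head counts and head_dim below (16 query heads, 4 KV heads, head_dim 128) are illustrative assumptions chosen to match the 2048 hidden size, not values read from the released config:

```python
def qkv_out_dim(num_heads: int, num_key_value_heads: int, head_dim: int) -> int:
    # Fused query_key_value projection: Q uses num_heads,
    # K and V each use num_key_value_heads.
    return (num_heads + 2 * num_key_value_heads) * head_dim

head_dim = 128          # assumed for illustration
num_heads = 16          # 16 * 128 = 2048 (hidden size)

# Standard attention layers (GQA): fewer KV heads, e.g. 4
print(qkv_out_dim(num_heads, 4, head_dim))           # 3072

# Linear attention layers (MHA): num_key_value_heads == num_heads
print(qkv_out_dim(num_heads, num_heads, head_dim))   # 6144 = 2048 * 3
```

So the 6144 checkpoint shape is expected once the linear attention layers are treated as MHA rather than GQA.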
Goekdeniz-Guelmez changed discussion status to closed