model weights mismatch with code (for mlx-lm)
#2
by Goekdeniz-Guelmez - opened
Hey inclusion team,
Nice job on the model! I'm trying to implement the model architecture for MLX-LM, but in the modelling...py code the query_key_value tensor for the linear attention layers is initialized as (self.num_heads + 2 * self.num_key_value_heads) * self.head_dim, which is 3072, whereas the model weights here are twice that size, 6144. Is there a version mismatch?
The linear attention layers use MHA instead of GQA, so for those layers self.num_key_value_heads is equal to num_heads and the weight size is 2048 * 3 = 6144. Details can be found here.
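For reference, a minimal sketch of how the fused projection size works out. The head counts and head_dim below (16 query heads, 4 KV heads, head_dim 128) are illustrative assumptions chosen to match the 2048 hidden size, not values read from the released config:

```python
def qkv_out_dim(num_heads: int, num_key_value_heads: int, head_dim: int) -> int:
    # Fused query_key_value projection: Q uses num_heads,
    # K and V each use num_key_value_heads.
    return (num_heads + 2 * num_key_value_heads) * head_dim

head_dim = 128          # assumed for illustration
num_heads = 16          # 16 * 128 = 2048 (hidden size)

# Standard attention layers (GQA): fewer KV heads, e.g. 4
print(qkv_out_dim(num_heads, 4, head_dim))           # 3072

# Linear attention layers (MHA): num_key_value_heads == num_heads
print(qkv_out_dim(num_heads, num_heads, head_dim))   # 6144 = 2048 * 3
```

So the 6144 checkpoint shape is expected once the linear attention layers are treated as MHA rather than GQA.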
Goekdeniz-Guelmez changed discussion status to closed