hf-inference

Team
non-profit

Inference Provider

VERIFIED
34,738,587 monthly requests

Recent Activity

tomaarsen  updated a collection 2 days ago
deployed-models

erikkaum 
posted an update about 11 hours ago
Releasing my first kernel 🔥 MaxSim

Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full query-token by document-token similarity matrix. This kernel avoids that by scoring in tiles, using simdgroup_matrix on Metal and WMMA on CUDA.
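
For readers new to MaxSim, here is a minimal pure-Python sketch of the idea. It is not the kernel's code: the function names `maxsim_naive` and `maxsim_tiled`, the `tile` parameter, and the toy vectors are all assumptions made for illustration, and it assumes non-empty query and document token lists. The score is the sum, over query tokens, of each token's best dot product against any document token. The naive version builds the whole similarity matrix first, while the tiled version keeps only a running maximum per query token.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_naive(query: List[Vector], doc: List[Vector]) -> float:
    """Build the full (Lq x Ld) similarity matrix, then reduce it."""
    sim = [[dot(q, d) for d in doc] for q in query]  # whole matrix held in memory
    return sum(max(row) for row in sim)


def maxsim_tiled(query: List[Vector], doc: List[Vector], tile: int = 2) -> float:
    """Same score, but only one tile of document tokens is scored at a time."""
    running_max = [float("-inf")] * len(query)
    for start in range(0, len(doc), tile):
        block = doc[start:start + tile]  # hypothetical tile of document tokens
        for i, q in enumerate(query):
            block_max = max(dot(q, d) for d in block)
            running_max[i] = max(running_max[i], block_max)
    return sum(running_max)


# Toy data (illustrative only): 2 query tokens, 3 document tokens, dim 2.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

print(maxsim_naive(query, doc))
print(maxsim_tiled(query, doc))
```

Both calls print `3.0` on the toy data. A real kernel would presumably apply the same running-max idea to hardware matrix tiles rather than Python lists.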

The result is a 3–5× speedup over a naive PyTorch baseline 🔥

Benchmarks:
- SmallRerank (B=32, C=10): up to 3.2× (M3 Pro) / 2.8× (A100)
- HeavyRerank (B=32, C=100): up to 3.8× (M3 Pro) / 5.3× (A100)
- LongDocStress (Ld=1024): up to 6.2× (L4)

Try it out 👇
https://huggingface.co/kernels/erikkaum/maxsim