The WAVe paper is officially out in the Information Sciences Journal.
You saw the PT and NL model releases earlier this year. This is the peer-reviewed paper behind them, with the full method, ablations, and downstream ASR evaluation.
Quick recap: WAVe is a 1B multimodal embedding model that filters synthetic speech at the word level, not the sentence level. On Portuguese ASR it cuts training steps by 34%, improves cross-domain generalization by 50%, and matches WER with 30% less synthetic data.
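If you want the one-paragraph intuition for the method: align synthetic audio to its transcript, embed each word on both sides, and drop the words where the two disagree. Here's a minimal sketch, assuming normalized encoder outputs and a pre-computed word alignment; the encoders, threshold, and data format are placeholders, not WAVe's actual internals:

```python
import numpy as np

def filter_words(word_spans, embed_audio, embed_text, threshold=0.6):
    """Keep only word-level (audio, text) pairs whose embeddings agree.

    word_spans: pre-aligned (waveform_slice, word_string) pairs.
    embed_audio / embed_text: callables returning L2-normalized vectors
    (stand-ins for the model's two encoders).
    """
    kept = []
    for audio, word in word_spans:
        a = embed_audio(audio)       # audio-side embedding of one word
        t = embed_text(word)         # text-side embedding of the same word
        score = float(np.dot(a, t))  # cosine similarity on unit vectors
        if score >= threshold:       # low scores flag garbled synthetic words
            kept.append((audio, word, score))
    return kept
```

The granularity is the whole point: a sentence-level filter discards an entire utterance over one bad word, while this keeps the good ones.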
I wanted to share a cool feature from my open-source, AI-native web browser, Vessel: persistent highlights!
You can highlight anything on the page and the context is provided to the agent. It's a fun way to learn about new stuff, synthesize info, or just deepen your understanding.
Since highlights are persistent, you can close the page and come back later, and your highlights will be exactly where you left them. I've found this particularly useful when reviewing technical blogs, model cards, etc.
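For anyone curious how persistence like this can work under the hood, here's a minimal sketch (Python for brevity, and emphatically not Vessel's actual code): store a quote-plus-context anchor per URL, in the spirit of the W3C TextQuoteSelector, and re-locate it on the next visit.

```python
import sqlite3

# Hypothetical schema for illustration; Vessel's actual storage will differ.
con = sqlite3.connect("highlights.db")
con.execute("""CREATE TABLE IF NOT EXISTS highlights (
    url TEXT, exact TEXT, prefix TEXT, suffix TEXT
)""")

def save_highlight(url, exact, prefix, suffix):
    # Quote plus surrounding context (a TextQuoteSelector-style anchor)
    # lets the highlight be re-located even if the page shifts a little.
    con.execute("INSERT INTO highlights VALUES (?, ?, ?, ?)",
                (url, exact, prefix, suffix))
    con.commit()

def highlights_for(url):
    # On revisit: fetch anchors, repaint them, and hand the quoted text
    # to the agent as context.
    return con.execute(
        "SELECT exact, prefix, suffix FROM highlights WHERE url = ?",
        (url,)).fetchall()
```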
So I built a multimodal video annotation pipeline in my spare time, as you do.
corpus-mill turns any long-form video with people on camera into a time-aligned event corpus across audio, vision, OCR, faces, brand observations, music, and clip-worthy moments. Runs entirely on local GPU because, and I cannot stress this enough, your footage has no business being on someone else's servers.
The honest origin: I needed real multimodal supervision data, and the public corpora are weirdly thin once you need per-frame / per-speaker / per-second labels with provenance, so I built my own. Then it grew. Then I looked up and it was 30K LOC and ~30 stages, and I thought, ok, maybe other people would want this.
Stack is the usual suspects: Whisper-large-v3 (faster-whisper), pyannote-3.1 (which secretly drags in 433 NeMo modules, surprise!), Qwen2.5-VL-7B for vision/OCR/shoppable detection, dlib + YuNet for faces, qwen2.5:7b / qwen3:14b via local Ollama for the LLM passes, chromaprint + PDQ for fingerprinting. Outputs as Parquet + SQLite. Apache 2.0.
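For a flavor of what one stage emits, here's roughly the ASR pass; the column names and provenance tag are illustrative, not corpus-mill's exact schema:

```python
import pandas as pd  # plus pyarrow for the Parquet writer
from faster_whisper import WhisperModel

# Word-timestamped ASR: one row per word, ready to join with the other
# modality streams on (start, end).
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, _info = model.transcribe("talk.mp4", word_timestamps=True)

rows = [
    {"start": w.start, "end": w.end, "word": w.word,
     "prob": w.probability, "source": "asr"}   # provenance tag per event
    for seg in segments
    for w in seg.words
]
pd.DataFrame(rows).to_parquet("events_asr.parquet", index=False)
```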
There's a Docker compose that works, after I spent a day discovering that faster-whisper wants CUDA 12 cuBLAS while pyannote 4 wants CUDA 13, and the answer is "install both, point LD_LIBRARY_PATH at the cu12 wheels, ship it." That's now baked in. You're welcome.
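Concretely, the cuBLAS trick works because the nvidia-*-cu12 pip wheels ship the CUDA 12 libraries; you just have to tell the loader where they live before the process starts. A sketch of the idea the compose file bakes in:

```python
import os

# The nvidia-cublas-cu12 / nvidia-cudnn-cu12 pip wheels ship the CUDA 12
# libs that faster-whisper's CTranslate2 backend wants to dlopen.
import nvidia.cublas.lib
import nvidia.cudnn.lib

paths = [os.path.dirname(nvidia.cublas.lib.__file__),
         os.path.dirname(nvidia.cudnn.lib.__file__)]

# LD_LIBRARY_PATH is read at process start, so print it for a shell wrapper:
#   export LD_LIBRARY_PATH="$(python cu12_paths.py):$LD_LIBRARY_PATH"
print(":".join(paths))
```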
Spare-time project, bugs are real, fixing them for your specific footage is on you. If you're training multimodal models and want a corpus pipeline you fully control on-prem, this might save you months. If not, the README is at least mildly entertaining.
Multimodal-Edge Demo, a node-based inference canvas, is now live on Spaces. It uses node-based Transformers for fast inference across 10+ edge-device multimodal models from the Hub, all within a single Space. The lineup includes models from the Qwen3.5, Qwen3-VL, Gemma 4, and LFM 2.5 VL series, with support for reasoning and grounding tasks.
This Space is a fork of the brilliant Eliahu/Model-Atlas, the official demo of "Charting and Navigating Hugging Face's Model Atlas" (Horwitz et al., arXiv 2503.10633). Their pre-computed HF model graph is the foundation of every node and edge you see, and we are deeply grateful for its open release.
The original atlas is a static snapshot of early 2025. Model Galaxy turns it into a living, multimodal map. We injected the 2026 trending originals that did not exist when the atlas was frozen: DeepSeek-V4, Hy3-preview, GLM-5.1, Kimi-K2, gpt-oss, Nemotron-3 Super / Nano / Omni, Hermes-4.3, Qwen3-Coder-Next, Llama-3.3, Granite-4.1, plus the latest multimodal releases (FLUX.2, ERNIE-Image, HunyuanImage / Video, LTX-2.3, Wan2.2, Kokoro-82M, VoxCPM2, Voxtral-TTS, whisper-v3-turbo, Gemma-4, Qwen3-Omni, Phi-4-mm), each with proper base_model lineage edges.
We also added the complete VIDRAFT Darwin family ontology: 120 nodes covering Darwin Core, AETHER, every brand variant (Rogue, AWAXIS, TenOS, Warecube), NOESIS-Darwin multimodal extensions, and 40+ community quantizations, the most complete Darwin lineage view anywhere.
The name "Galaxy" is now literal: our three injected clusters are re-laid out as logarithmic spiral galaxies, with bigger models near the bright cores and quantizations scattering to the outer arms ā just like real star mass distribution. A top-right toggle switches between Galaxy mode (deep-space gradient with 220 animated stars) and Atlas mode (clean white panels for reports). A 15-second progress bar narrates the render, and per-modality / per-company colors make every cluster legible at a glance.
Final scale: 22,480 nodes in the default Modalities atlas, 137,324 in the Large NLP atlas, and a 277-node compact Darwin + Trending view for instant exploration. Feedback and PRs welcome.
1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves (see the sketch after this list)
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies
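For step 4️⃣, the core of the group-based update looks roughly like this. This is my paraphrase of a CISPO-style objective; the clip bound, group shape, and reward standardization are assumptions, not values from the actual run:

```python
import torch

def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_groups, group_size); one group = several games played
    # from the same start state. Each game's advantage is its reward
    # standardized within its own group.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

def cispo_loss(logp_new, logp_old, advantages):
    # Clipped importance-sampling weight with a stop-gradient: unlike
    # PPO's clip, clipped tokens still pass gradient through the
    # log-prob term, so no token's update is silently zeroed out.
    ratio = torch.exp(logp_new - logp_old)
    weight = torch.clamp(ratio, max=2.0).detach()  # upper bound is a knob
    return -(weight * advantages * logp_new).mean()
```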