Recent Activity

sweatSmile 
posted an update 2 days ago
Just published a hands-on guide on building a Kubernetes cluster from scratch on AWS EC2 using kubeadm, no managed services, no shortcuts.

If you want to truly understand how the control plane and workers communicate, how pod networking works with Flannel, and how to lock down access with security groups, then this is the kind of exercise that makes it click.

The guide covers a full 3-node setup (1 control plane + 2 workers) on Amazon Linux 2023, from instance provisioning all the way to deploying your first workload.



Read it here 👉 https://www.amitchoubey.dev/posts/kubernetes-cluster-aws-ec2-kubeadm/
perfecXion 
posted an update 4 days ago
# IntentGuard: Open-Source Vertical Intent Classifiers for LLM Guardrails

Three models published to the Hub:

- perfecXion/intentguard-finance
- perfecXion/intentguard-healthcare
- perfecXion/intentguard-legal

DeBERTa-v3-xsmall fine-tuned for three-way classification: **allow**, **deny**, or **abstain**. ONNX + INT8 quantized, under 80MB, p99 <30ms on CPU. Margin-based thresholds (not argmax) — uncertain queries route to clarification instead of forcing a guess.
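To illustrate the difference between argmax and a margin-based rule, here is a minimal sketch. The post does not describe IntentGuard's actual thresholding, so the `decide` function, the label order, and the `margin=0.3` value are all assumptions made for illustration.

```python
import math

# Hypothetical label order; the real model's output order is not given in the post.
LABELS = ("allow", "deny", "abstain")


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, margin=0.3):
    """Return the top label only if it beats the runner-up by `margin`.

    Otherwise return "abstain", so an uncertain query can be routed to
    clarification. The margin value here is an illustrative assumption.
    """
    probs = softmax(logits)
    ranked = sorted(zip(probs, LABELS), reverse=True)
    (top_prob, top_label), (second_prob, _) = ranked[0], ranked[1]
    if top_prob - second_prob < margin:
        return "abstain"
    return top_label


print(decide([4.0, 0.5, 0.1]))  # clear winner
print(decide([1.0, 0.9, 0.1]))  # top two are close, so the rule abstains
```

With plain argmax the second call would return "allow" even though "deny" is almost as likely; the margin rule declines to guess.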

**Eval results (adversarial test sets, ~470-480 examples per vertical):**

| Vertical | Accuracy | Legit-Block Rate | Off-Topic-Pass Rate |
|----------|----------|------------------|---------------------|
| Finance | 99.6% | 0.00% | 0.00% |
| Healthcare | 98.9% | 0.00% | 0.98% |
| Legal | 97.9% | 0.00% | 0.50% |

docker run -p 8080:8080 ghcr.io/perfecxion/intentguard:finance-latest

curl -X POST http://localhost:8080/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What are current mortgage rates?"}]}'
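The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper name `build_classify_request` is hypothetical, and the response schema is not documented in the post, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_classify_request(text, base_url="http://localhost:8080"):
    """Build a POST request matching the curl example (endpoint taken from the post)."""
    payload = {"messages": [{"role": "user", "content": text}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_classify_request("What are current mortgage rates?")
print(req.get_method(), req.full_url)

# To send it against a running container (response format is an unknown here):
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(json.load(resp))
```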


Apache 2.0. Full pipeline + Docker configs on [GitHub](https://github.com/perfecxion-ai/intentguard).

Feedback welcome on domain coverage, adversarial robustness, and multilingual demand.

aufklarer 
posted an update 11 days ago
Speaker Diarization and VAD on Apple Silicon — MLX-Native Models

Three MLX-optimized models for on-device speaker diarization and voice activity detection, running natively on Apple Silicon via https://github.com/ivan-digital/qwen3-asr-swift:

- aufklarer/Silero-VAD-v5-MLX: Streaming VAD, 309K params, ~1.2 MB. Processes 32 ms chunks at 23× real-time on M2 Max.
- aufklarer/Pyannote-Segmentation-MLX: Multi-speaker segmentation, ~1.49M params, ~5.7 MB. 7-class powerset output covering up to 3 speakers per window, with at most 2 speaking at once.
- aufklarer/WeSpeaker-ResNet34-LM-MLX: Speaker embedding, ~6.6M params, ~25 MB. 256-dim L2-normalized vectors with BatchNorm fused into Conv2d.
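A 7-class powerset is what three speaker slots produce when at most two of them may overlap: one non-speech class, three single-speaker classes, and three pairs. The sketch below just enumerates those sets; the function name and the `max_overlap=2` default are assumptions chosen to reproduce the class count, not code from the library.

```python
from itertools import combinations


def powerset_classes(num_speakers=3, max_overlap=2):
    """List every allowed speaker set, starting with the empty set (non-speech)."""
    speakers = range(1, num_speakers + 1)
    classes = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(speakers, size))
    return classes


classes = powerset_classes()
print(len(classes))
print(classes)
```

Allowing all three speakers to overlap (`max_overlap=3`) would add one more class, for eight in total.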

Together they form a diarization pipeline: pyannote segments → WeSpeaker embeds → agglomerative clustering links speakers across the recording. ~32 MB total.
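The clustering step can be sketched in a few lines. This is a generic average-linkage agglomerative clustering over cosine similarity, written in plain Python for illustration; it is not the library's Swift implementation, and the `threshold=0.7` value, the function names, and the `max_speakers` handling are assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors is their dot product."""
    return sum(x * y for x, y in zip(a, b))


def cluster(embeddings, threshold=0.7, max_speakers=None):
    """Merge the most similar clusters until similarity drops below `threshold`.

    Returns one speaker index per input embedding. If `max_speakers` is set,
    merging continues until no more than that many clusters remain.
    """
    vecs = [l2_normalize(v) for v in embeddings]
    clusters = [[i] for i in range(len(vecs))]

    def linkage(c1, c2):
        sims = [cosine(vecs[i], vecs[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)  # average linkage

    while len(clusters) > 1:
        best, pair = -2.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = linkage(clusters[a], clusters[b])
                if score > best:
                    best, pair = score, (a, b)
        too_many = max_speakers is not None and len(clusters) > max_speakers
        if best < threshold and not too_many:
            break
        a, b = pair
        clusters[a].extend(clusters.pop(b))

    labels = [0] * len(vecs)
    for speaker, members in enumerate(clusters):
        for i in members:
            labels[i] = speaker
    return labels


# Toy 2-dim "embeddings": two segments per speaker.
print(cluster([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]))
```

Real speaker embeddings would be the 256-dim vectors mentioned above; the toy vectors only keep the example readable.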

git clone https://github.com/ivan-digital/qwen3-asr-swift
cd qwen3-asr-swift && swift build -c release

.build/release/audio diarize meeting.wav --max-speakers 4 --json
.build/release/audio vad-stream recording.wav


The library also includes ASR, TTS, multilingual synthesis, forced alignment, and speech-to-speech (PersonaPlex 7B). Apache 2.0.

Full architecture details: https://blog.ivan.digital/speaker-diarization-and-voice-activity-detection-on-apple-silicon-native-swift-with-mlx

Library: https://github.com/ivan-digital/qwen3-asr-swift
MonsterMMORPG 
posted an update 14 days ago
SECourses Upscaler Pro Beating Topaz AI by Far With Specialized FlashVSR+ & SeedVR2.5 - Local Windows - Check the Screenshots and Videos Below

Full tutorial link > https://www.youtube.com/watch?v=_WT4C78j5-c

Download SECourses Upscaler Pro : https://www.patreon.com/posts/secourses-upscaler-pro-150202809

Tutorial Info
🚀 Welcome to the Ultimate SECourses Upscaler Pro & Trellis 3D Tutorial!

Greetings everyone! Today, I am incredibly excited to showcase the massive new improvements and brand-new features we have added to the SECourses Upscaler Pro application. I have been working non-stop to bring you a studio-level AI video and image enhancement tool that completely redefines what is possible running locally on your own PC.

In this video, we dive deep into side-by-side comparisons between our custom FlashVSR+ upscaler, original viral social media videos, and Topaz AI. As you will see in our live slider comparisons, the SECourses Upscaler Pro is adding 10x more detail than Topaz AI, generating breathtaking, high-definition results while running highly optimized on GPUs with as little as 8GB of VRAM!

We also explore the immensely powerful SeedVR2 model for flawless 4x image upscaling, and I give you an exclusive sneak peek at our upcoming Trellis Image-to-3D application featuring fully automated UniRig 3D character rigging!

🔗 Important Links & Resources:
📥 Download the Latest SECourses Upscaler Pro Installer: [ https://www.patreon.com/posts/secourses-upscaler-pro-150202809 ]

📥 Download Trellis Image-to-3D App: [ https://www.patreon.com/posts/trellis2-app-147686623 ]
aufklarer 
posted an update 14 days ago
PersonaPlex-7B on Apple Silicon (Swift + MLX Swift)

NVIDIA PersonaPlex is a full-duplex speech-to-speech model — it can listen while it speaks, which enables more natural conversational behaviors like interruptions, overlaps, and quick backchannels.

We put together a native Swift implementation using MLX Swift so it can run locally on Apple Silicon, along with a 4-bit MLX conversion and a small CLI/demo to make it easy to try out.

If you’re interested in on-device voice agents (or just want to see what full-duplex S2S looks like in a real Swift codebase), the details and setup notes are here:

Blog post: https://blog.ivan.digital/nvidia-personaplex-7b-on-apple-silicon-full-duplex-speech-to-speech-in-native-swift-with-mlx-0aa5276f2e23

Repo: https://github.com/ivan-digital/qwen3-asr-swift
MonsterMMORPG 
posted an update 19 days ago
SECourses Ultimate Video and Image Upscaler Pro is now V2.1 and massive improvements have arrived

Check the screenshots below to see all the features

20 February 2026 Update V2.1
This is a pretty big update

We have 100% changed the FlashVSR+ backend to a new repo and I have significantly upgraded this repo

The new FlashVSR+ works very well, and I think it is better than SeedVR2 for upscaling high-resolution video, such as taking 720p to a higher resolution

Top menu navigation bar updated into a better version and view

FlashVSR+ tab remade and all the features are now working

A low-VRAM button has been added, which you can use if you get out-of-memory (OOM) errors

Read the updated UI to understand how to use

FlashVSR+ can now upscale images very well too

Image Based GAN upscalers tab also improved and some bugs fixed

Video Output in the Output & Comparison tab was not working properly, and this issue is now fixed

In the Output & Comparison tab, new multi-video and multi-image comparison sliders have been added, which are very useful for quickly comparing multiple videos and images

Lots of various bug fixes made

The app is getting closer to perfect. Please test it heavily and let me know about any errors and the features you would like

This update was mostly about improving the FlashVSR+ since it is a very fast and amazing video upscaler model

Image Based GAN upscaling can now upscale videos without problems, and Batch Size (Frames per Iteration) now works to speed up video upscaling

To update, get the latest zip file, extract it, overwrite all files, and run the Windows_Run_SECourses_Upscaler_Pro.bat file

MonsterMMORPG 
posted an update 28 days ago
SeedVR2 and FlashVSR+ Studio Level Image and Video Upscaler Pro Released

Tutorial video : https://www.youtube.com/watch?v=bPWsg8DREiM

📂 Resources & Links:

💻 SECourses Ultimate Video and Image Upscaler Pro Download Link : [ https://www.patreon.com/posts/Upscaler-Studio-Pro-150202809 ]

🚆 Requirements Tutorial : https://youtu.be/DrhUHnYfwC0

🛠️ Requirements Written Post : [ https://www.patreon.com/posts/Windows-AI-Requirements-Setup-Guide-111553210 ]

👋 SECourses Discord Channel for 24/7 Support: [ https://bit.ly/SECoursesDiscord ]

A studio-level video and image upscaler app has been long awaited. Today we are publishing version 1.0 of SECourses Ultimate Video and Image Upscaler Pro. It supports SeedVR2, FlashVSR+, GAN-based upscalers, RIFE frame interpolation, a full queue system, full batch folder processing, scene-based and chunked processing, and more. It works on cloud and consumer GPUs alike, including the RTX 2000, 3000, 4000, and 5000 series and the H100, H200, B200, and RTX PRO 6000. The app currently installs fully automatically with the latest Torch and CUDA versions and pre-compiled libraries. Even Torch compile works fully automatically.

aufklarer 
posted an update about 1 month ago
Context Engineering for Code Agents: Why They Fail and How to Fix Them

Code agents don't fail because they can't code — they fail because their context turns into a junk drawer.

I wrote a practical survey covering the emerging discipline of context engineering for agentic hybrid applications: the techniques, papers, and architectural patterns that keep long-running code agents on track as their token windows fill up with tool logs, stale diffs, and repeated file dumps.
What's covered:

- Why long context windows alone don't save you (position bias, distractor sensitivity)
- Observation masking vs. LLM summarization, and when simple beats clever
- Tool-output compression with approaches like LLMLingua-2
- Trajectory reduction: pruning dead branches from agent history
- Memory hierarchies: session → working set → notes → cross-session
- How MCP and standardized tool interfaces reduce context debt
- Dynamic context policies trained with RL (DeepMiner, MEM1)
- Meta-agent CI loops for measuring regressions across agent configs
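As a concrete picture of observation masking, the simplest technique in the list, here is a minimal sketch. It assumes a chat history stored as a list of `{"role", "content"}` dicts with tool results under the role `"tool"`; that message shape, the function name, and the placeholder text are illustrative assumptions rather than anything specified in the survey.

```python
def mask_old_observations(messages, keep_last=2, placeholder="[output omitted]"):
    """Replace the content of all but the most recent `keep_last` tool messages.

    The agent's own reasoning and actions stay intact; only stale tool
    output is dropped, which is where most of the token bulk tends to sit.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    if keep_last > 0:
        to_mask = set(tool_indices[:-keep_last])
    else:
        to_mask = set(tool_indices)
    return [
        {**m, "content": placeholder} if i in to_mask else m
        for i, m in enumerate(messages)
    ]


history = [
    {"role": "user", "content": "Fix the failing test."},
    {"role": "assistant", "content": "Running the test suite."},
    {"role": "tool", "content": "...4,000 lines of pytest output..."},
    {"role": "assistant", "content": "Opening the module."},
    {"role": "tool", "content": "...full file dump..."},
    {"role": "tool", "content": "1 passed in 0.2s"},
]

for message in mask_old_observations(history, keep_last=1):
    print(message["role"], "|", message["content"])
```

No model call is needed to shrink the context this way, which is the appeal of masking over LLM summarization.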

The core argument: the engineering challenge isn't "make the model smarter" — it's make the agent's context and verification smarter. That's where the real leverage is in 2026.

👉 Read the full post: https://blog.ivan.digital/context-engineering-for-agentic-hybrid-applications-why-code-agents-fail-and-how-to-fix-them-076cab699262
melikegks 
in blog-explorers/README about 1 month ago
rajkumarrawal 
posted an update about 1 month ago
I submitted the paper "Continual GUI Agents" by Ziwei Liu, Borul Kang, Hangjie Yuan, Zixiang Zhao, Wei li, Yifan Zhu, and Tao Feng, from Tsinghua, ZhejiangUniversity, ethz, and BUPT2023213296, to Daily Papers on huggingface.

Continual GUI Agents framework addresses performance degradation in dynamic digital environments through reinforcement fine tuning with novel anchoring rewards that stabilize learning across shifting UI domains and resolutions.

Continual GUI Agents (2601.20732)
aufklarer 
posted an update about 1 month ago
Qwen3-ASR Swift: On-Device Speech Recognition for Apple Silicon

I'm excited to release https://github.com/ivan-digital/qwen3-asr-swift, an open-source Swift implementation of Alibaba's Qwen3-ASR, optimized for Apple Silicon using MLX.

Why Qwen3-ASR? Exceptional noise robustness — 3.5x better than Whisper in noisy conditions (17.9% vs 63% CER).
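For readers unfamiliar with the metric, character error rate (CER) is the character-level edit distance between the reference transcript and the model output, divided by the reference length. The sketch below is a generic implementation for illustration, not the evaluation code behind the figures above; it also checks the quoted ratio, since 63 / 17.9 is roughly 3.5.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate relative to a non-empty reference transcript."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


print(cer("hello", "hallo"))   # one substitution over five characters
print(round(63 / 17.9, 2))     # ratio of the two CER figures quoted above
```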

Features:
- 52 languages (30 major + 22 Chinese dialects)
- ~600MB model (4-bit quantized)
- ~100ms latency on M-series chips
- Fully local, no cloud API

More on inference and the model architecture in the blog post: https://blog.ivan.digital/qwen3-asr-swift-on-device-asr-tts-for-apple-silicon-architecture-and-benchmarks-27cbf1e4463f
MonsterMMORPG 
posted an update about 1 month ago
SECourses Musubi Trainer upgraded to V27 and FLUX 2, FLUX Klein, Z-Image training added with demo configs - amazing VRAM optimized - read the news

App is here : https://www.patreon.com/posts/137551634

Full tutorial how to use and train : https://youtu.be/DPX3eBTuO_Y
victor 
in blog-explorers/README about 1 month ago
rajkumarrawal 
posted an update about 1 month ago
I submitted the paper "FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning" by Tanyu Chen, Tairan Chen, Kai shen, Zhenghua Bao, Zhihui Zhang, Man Yuan, and Yi Shi, from FlashLabs, to Daily Papers on huggingface.

Chroma 1.0 enables real-time spoken dialogue with personalized voice cloning through discrete speech representations and interleaved text-audio token scheduling.

Chroma 1.0, the world’s first open-source, real-time speech-to-speech model with voice cloning.

FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning (2601.11141)
MonsterMMORPG 
posted an update about 1 month ago
LTX 2 & Z Image Base Full Tutorial + Audio to Video Lip Sync + ComfyUI + SwarmUI + Windows + Cloud

Full tutorial link > https://www.youtube.com/watch?v=SkXrYezeEDc

Info
LTX 2 is the newest state-of-the-art (SOTA) open-source video generation model, and this tutorial shows how to use it in the best and most performant way in ComfyUI and in SwarmUI. The Z Image Base model has also been published, and I show how to use it with a strong preset and workflow. The tutorial also covers how to install, update, and set up ComfyUI and SwarmUI, and how to download the models, presets, and workflows, on Windows and on RunPod, Massed Compute, and SimplePod. Linux users can use the Massed Compute scripts and installers directly. This is a complete, lecture-level tutorial to kickstart your AI journey, covering both local Windows and cloud.

45 Second Raw Demo Video

This video was made in one pass from text + image + audio, producing a lip-synced, animated video

See video below
BenTouss 
in blog-explorers/README about 1 month ago