Xiaolin Zhang
xiaolinz
AI & ML interests
None yet
Organizations
Context-Engineering
DiLoCo
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
Paper • 2501.18512 • Published • 29
DiLoCo: Distributed Low-Communication Training of Language Models
Paper • 2311.08105 • Published • 16
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Paper • 2503.09799 • Published • 15
Muon is Scalable for LLM Training
Paper • 2502.16982 • Published • 8
LLM
DeepSeek