deep-learning
an archive of posts with this tag
| Date | Post |
|---|---|
| Jun 20, 2026 | DASF: A Closed-Loop, Schedule-Free Batch Size Method |
| Jun 16, 2026 | Why Double the Batch Size Midway Through LLM Pretraining? |
| May 15, 2026 | μP Map |
| Apr 14, 2026 | In the LLM Setting, How Does Gradient Noise Affect Training Dynamics? |
| Mar 07, 2026 | On the Sphere: μP Scaling for Optimizers with the Hyperball Mechanism |
| Mar 05, 2026 | On the Sphere: From Spherical Dynamics to μP |
| Mar 02, 2026 | Tensor Programs (II): From Tensor Programs to μP |
| Feb 14, 2026 | Tensor Programs (I): From the Spectral Condition for Feature Learning to μP |
| Feb 08, 2026 | From Gated DeltaNet to Kaczmarz |
| Dec 30, 2025 | Can We Derive Scaling Law From First Principles? |