Can We Derive Scaling Law From First Principles?

This post redirects to the PDF file.
