This page collects my long-form notes on mechanistic interpretability, deep learning theory, optimization, and scaling laws. If you are new here, start with the latest posts below.
-
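How to Align Data Scaling Curves Across Different Initialization Sizes
A study of how the empirical slope of data scaling depends on the initialization std, along with a simple method for aligning data scaling curves obtained under different initialization sizes.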
-
Can We Derive Scaling Law From First Principles?
New research available. Click to read the full PDF.