The week ahead: 20251013
-
[Paper] Embedding initialization is shown to be related to grokking (delayed generalization). On the XOR task, embeddings trained on a small model accelerate grokking when used as the initialization for a larger model. Relevant prior work.
-
[Paper] GDPVal - An eval for GPT on tasks that matter for GDP.
-
[GitHub] Nanochat - codebase for budget-friendly LLM tuning. Shipping as the capstone project of an upcoming Eureka Labs course.
-
[Paper] Scaling law for RL compute.
-
[Paper] Shifting goalposts.
-
[Paper] BitNet distillation - Techniques to achieve strong task-specific performance with distillation into ternary weights (-1, 0, 1).
-
[Paper] New QA dataset with reasoning.
-
[Tweet] Claude cafe in NYC.
Have a great week!