The week ahead: 20251013

October 13, 2025

[Paper] Embedding initialization is shown to be related to grokking (delayed generalization). On the XOR task, embeddings trained in a small model accelerate grokking when used as the initialization for a larger model. Relevant prior work.
[Paper] GDPval - an eval for GPT on tasks that matter for GDP.
[GitHub] Nanochat - codebase for budget-friendly LLM training and fine-tuning. Shipping as the capstone project of an upcoming Eureka Labs course.
[Paper] Scaling law for RL compute.
[Paper] Shifting goalposts.
[Paper] BitNet distillation - Techniques to achieve strong task-specific performance with distillation into ternary weights (-1, 0, 1).
[Paper] New QA dataset with reasoning.
[Tweet] Claude cafe in NYC.

Have a great week!
