• [Paper] Embedding initialization is shown to be related to grokking (delayed generalization). On the XOR task, embeddings trained in a small model are shown to accelerate generalization (i.e., speed up grokking) when used as initialization for a larger model. Relevant prior work.
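  A minimal sketch of the transfer idea, assuming hypothetical sizes and a simple two-layer setup (not the paper's actual architecture): the embedding table learned by a small model is copied in as the larger model's initialization, while the rest of the parameters are initialized randomly.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  vocab, d_model = 16, 8  # hypothetical sizes

  # Embedding table learned by a small model on the XOR task.
  small_emb = rng.normal(scale=0.02, size=(vocab, d_model))

  def init_large_model(vocab, d_model, hidden_dim, pretrained_emb=None):
      """Initialize (embedding, hidden) params for a larger model,
      optionally reusing trained embeddings as the starting point."""
      if pretrained_emb is not None:
          emb = pretrained_emb.copy()  # warm-start from the small model
      else:
          emb = rng.normal(scale=0.02, size=(vocab, d_model))
      hidden = rng.normal(scale=0.02, size=(d_model, hidden_dim))
      return emb, hidden

  # Larger model: same embedding width, bigger hidden layer.
  emb, hidden = init_large_model(vocab, d_model, 64, small_emb)
  ```

  The claim under test in the paper is that the warm-started model generalizes (groks) far sooner than one trained from a fully random initialization.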

  • [Paper] GDPval - eval measuring model performance on economically valuable real-world tasks drawn from occupations that contribute to GDP.

  • [GitHub] Nanochat - codebase for training a ChatGPT-style LLM end to end on a budget. Shipping as the capstone project of an upcoming Eureka Labs course.

  • [Paper] Scaling law for RL compute.

  • [Paper] Shifting goalposts.

  • [Paper] BitNet distillation - techniques for distilling full-precision models into ternary weights ({-1, 0, 1}) while retaining strong task-specific performance.

  • [Paper] New QA dataset with reasoning.

  • [Tweet] Claude café in NYC.

Have a great week!