The week ahead: 20251201

December 1, 2025

[Blog] HOPE. A model as a set of nested sub-optimizers, each with its own context flow and hyperparams. Relevant paper.
[Blog] Publishing just 250 malicious posts online is enough to poison an LLM of arbitrary size. Relevant paper.
[Blog] A biologically motivated attempt to bring explicit recurrence back into the architecture.
[GitHub] AgentMark - An abstraction over prompt templates and API calls.
[Paper] ABC (Agentic Benchmark Checklist) - A benchmark for benchmarks, centered on outcome validity and task validity.
[Paper] Anomaly detection for LLM benchmarks. ~5% of questions in GSM8K are invalid, if we assume that all questions in a benchmark should test the same ability.
[GitHub] Google ADK - Yet another agent framework.
[Hackathon] AI Gateway hackathon. Game as eval.
[Paper] The choice between model-specific optimized prompts and a universal fixed prompt has a non-negligible impact on LLM ranking results.

Have a great week!
