The week ahead: 20251201
-
[Blog] HOPE. A model as a set of nested sub-optimizers, each with its own context flow and hyperparams. Relevant paper.
-
[Blog] Just 250 posts put online are enough to poison an LLM of arbitrary size. Relevant paper.
-
[Blog] Biologically motivated attempt to bring explicit recurrence back to the architecture.
-
[GitHub] AgentMark - An abstraction of prompt templates and API calling.
-
[Paper] ABC (Agentic Benchmark Checklist) - Benchmark for benchmarks. Outcome validity and task validity.
-
[Paper] Anomaly detection for LLM benchmarks. ~5% of questions in GSM8K are invalid, assuming that all questions in a benchmark need to test the same ability.
-
[GitHub] Google ADK - Yet another agent framework.
-
[Hackathon] AI Gateway hackathon. Game as eval.
-
[Paper] Choosing between model-specific optimized prompts and a universal fixed prompt has a non-negligible impact on LLM ranking results.
Have a great week!