• [Blog] HOPE. A model as a set of nested sub-optimizers, each with its own context flow and hyperparams. Relevant paper.

  • [Blog] Posting as few as 250 poisoned documents online can be enough to poison an LLM, largely regardless of model size. Relevant paper.

  • [Blog] Biologically motivated attempt to bring explicit recurrence back to the architecture.

  • [GitHub] AgentMark - An abstraction of prompt templates and API calling.

  • [Paper] ABC (Agentic Benchmark Checklist) - A benchmark for benchmarks: a checklist for judging agentic benchmarks on outcome validity and task validity.

  • [Paper] Anomaly detection for LLM benchmarks. Flags ~5% of GSM8K questions as invalid, assuming every question in a benchmark should test the same ability.

  • [GitHub] Google ADK - Yet another agent framework.

  • [Hackathon] AI Gateway hackathon. Game as eval.

  • [Paper] Whether models are evaluated with model-specific optimized prompts or a single universal fixed prompt has a non-negligible impact on LLM rankings.
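
To make the last item concrete, here is a toy sketch of how a leaderboard can reorder when the prompt setting changes. The model names and accuracy numbers are invented for illustration and are not taken from the paper; the helper names `rank` and `discordant_pairs` are hypothetical too.

```python
# Hypothetical accuracies per model under two prompt settings.
# These numbers are made up for illustration only.
scores = {
    "model_a": {"universal": 0.71, "optimized": 0.74},
    "model_b": {"universal": 0.69, "optimized": 0.78},
    "model_c": {"universal": 0.73, "optimized": 0.75},
}


def rank(scores, prompt_setting):
    """Return model names sorted from best to worst under one prompt setting."""
    return sorted(scores, key=lambda m: scores[m][prompt_setting], reverse=True)


def discordant_pairs(rank_x, rank_y):
    """Count model pairs whose relative order differs between two rankings."""
    pos_y = {model: i for i, model in enumerate(rank_y)}
    count = 0
    for i in range(len(rank_x)):
        for j in range(i + 1, len(rank_x)):
            # rank_x places rank_x[i] ahead of rank_x[j]; check if rank_y disagrees.
            if pos_y[rank_x[i]] > pos_y[rank_x[j]]:
                count += 1
    return count


universal_rank = rank(scores, "universal")
optimized_rank = rank(scores, "optimized")

print("universal:", universal_rank)
print("optimized:", optimized_rank)
print("pairs that swap order:", discordant_pairs(universal_rank, optimized_rank))
```

In this toy data, the model that comes last with the universal prompt comes first once each model gets its own tuned prompt, which is the kind of ranking sensitivity the paper points at.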

Have a great week!
