Industry

  • [Tweet] Optimizing OpenAI models specifically for Openclaw.
  • [Blog] Project Deal: agents that represent both the buy side and the sell side.
  • [Blog] Copilot surge.

Research

  • [ArXiv] GDPval: eval tasks selected by GDP impact.
  • [Blog] Vending-Bench 2: a benchmark in which a model operates a vending machine. Notably, a “good” baseline is set at +$206 per day.
  • [ArXiv] A benchmark for model creativity.

Other

  • [Gist] LLM Wiki v2.
  • [Page] Flipbook: streaming pixels from a model.
  • [Blog] How the heck does Shazam work?

Have a great week!
