• [Tweet] GenAI for the US military.

  • [Post] Halftime - targeted ad generation inside shows.

  • [Paper] Lots of security vulnerabilities from vibe coding.

  • [Course] Stanford CS146S: The Modern Software Developer. A collection of vibe-development and automation best practices.

  • [Blog] LLM Confessions. Basically, the post shows that LLMs can retroactively “confess” to mistakes they made while performing a task, and that another LLM can then label the quality of these confessions; those labels in turn help improve the confessions. Notably, it makes no difference whether the confessing LLM is the same model that performed the task or a different one. So this work essentially shows that LLM-as-a-judge can be used to improve the quality of LLM-as-a-judge. Relevant paper.
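
A minimal sketch of the confess-then-judge loop described above. `call_llm`, the prompts, and the model names are hypothetical placeholders for whatever chat API you use; per the post, the confessor can be the same model that performed the task or a different one.

```python
# Hypothetical sketch of the confess-then-judge pipeline; call_llm and the
# model names are placeholders, not any specific provider's API.

def call_llm(prompt: str, model: str) -> str:
    """Thin wrapper around your chat-completion API of choice."""
    raise NotImplementedError("plug in a real client here")

def solve(task: str) -> str:
    return call_llm(f"Solve this task:\n{task}", model="task-model")

def confess(task: str, answer: str) -> str:
    # The post notes this can be the same model that produced the answer.
    prompt = ("You produced the answer below. List any mistakes, shortcuts, "
              f"or unsupported claims you made.\nTask: {task}\nAnswer: {answer}")
    return call_llm(prompt, model="confessor-model")

def judge(task: str, answer: str, confession: str) -> int:
    # LLM-as-a-judge labels confession quality; these labels are what the
    # post says can be used to improve the confessions.
    prompt = ("Rate 1-5 how well this confession identifies real problems "
              f"in the answer. Reply with one digit.\nTask: {task}\n"
              f"Answer: {answer}\nConfession: {confession}")
    return int(call_llm(prompt, model="judge-model").strip()[0])
```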

  • [Paper] Synthetic customer personas. Using LLMs to generate customer profiles sounds appealing because it is cheap, but the paper shows it leads to unintended pitfalls in diversity and bias. This is not surprising, since an LLM is essentially a compressed form of its training websites: when we compress a dataset and then decompress from it, the result cannot be very diverse or well calibrated, because the information has already passed through a bottleneck. LLMs are probably best used for filling gaps in existing data, not for creating new data.
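
As a rough illustration of the diversity concern, one can compare the entropy of a categorical attribute across real records and LLM-generated personas. The field values below are made-up stand-ins, not data from the paper.

```python
# Illustrative check for the diversity bottleneck: compare the entropy of a
# categorical attribute in real records vs. LLM-generated personas.
# Both lists are invented examples, not data from the paper.
from collections import Counter
import math

def entropy_bits(values):
    """Shannon entropy (bits) of a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

real = ["nurse", "teacher", "farmer", "engineer", "retail",
        "driver", "chef", "accountant", "electrician", "plumber"]
synthetic = ["engineer", "engineer", "designer", "engineer", "marketer",
             "engineer", "designer", "engineer", "marketer", "engineer"]

print(f"real personas:      {entropy_bits(real):.2f} bits")
print(f"synthetic personas: {entropy_bits(synthetic):.2f} bits")
# Persistently lower entropy in the synthetic set is one symptom of the
# compression bottleneck described above.
```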

  • [Paper] Universal Weight Subspace Hypothesis. Independently and differently trained models converge to a shared parametric subspace: most of the variation in the learned weights lies along a small number of dominant directions. On one hand, an optimized parameter subspace shouldn’t be too surprising, as it just means the effective dimensionality of web data is finite from the model’s perspective. On the other hand, multiple models sharing the same subspace might be evidence of inductive biases in gradient descent itself.
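
A sketch of the kind of analysis this hypothesis suggests: stack flattened weight vectors from several independently trained models and check how much variance the top singular directions explain. Synthetic weights with a planted low-rank structure stand in for real checkpoints here, so the numbers only illustrate the procedure.

```python
# Stack per-model weight vectors, center them, and measure how much of the
# variance lies along a few dominant directions via SVD.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_params, rank = 20, 4096, 5

basis = rng.normal(size=(rank, n_params))     # shared low-dimensional subspace
coeffs = rng.normal(size=(n_models, rank))    # per-model coordinates
weights = coeffs @ basis + 0.01 * rng.normal(size=(n_models, n_params))

centered = weights - weights.mean(axis=0, keepdims=True)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / (s**2).sum()
print("variance explained by top 5 directions:", round(explained[:5].sum(), 4))
# With real checkpoints, a similarly high value would support the claim that
# most weight variation lives in a small shared subspace.
```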

  • [Paper] Self-Jailbreaking. Reasoning models appear more vulnerable to adversarial prompts than non-reasoning base models, particularly because they tend to assume a benign intent during CoT. This “self-jailbreaking” misalignment seems to emerge from ordinary reasoning training, e.g. on math.

  • [Survey] Measuring Agents in Production. Simple yet effective methods drive most successful agents in 2025: they generally have bounded autonomy, are built on vanilla proprietary models, call model APIs directly without third-party frameworks, and rely on human labeling for evaluation. The gap between research and production is widening.
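
A minimal sketch of the production pattern the survey describes: a direct call to a proprietary model API, no agent framework, a hard step cap, and output handed to a human. The OpenAI client, the model name, and the toy tool are illustrative choices, not the survey's setup.

```python
# Bounded, framework-free agent loop: direct API calls, a hard step cap,
# and a human as the final reviewer. Model name and tool are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MAX_STEPS = 5      # bounded autonomy: the loop cannot run indefinitely

def run_tool(command: str) -> str:
    """Placeholder tool; a real agent would dispatch to search, SQL, etc."""
    return f"(tool output for: {command})"

def bounded_agent(task: str) -> str:
    messages = [
        {"role": "system",
         "content": "Solve the task. To use the tool, reply exactly "
                    "'TOOL: <command>'. Otherwise reply with the final answer."},
        {"role": "user", "content": task},
    ]
    for _ in range(MAX_STEPS):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any chat model works
            messages=messages,
        ).choices[0].message.content
        if reply.startswith("TOOL:"):
            messages.append({"role": "assistant", "content": reply})
            messages.append(
                {"role": "user",
                 "content": run_tool(reply[len("TOOL:"):].strip())})
        else:
            return reply  # surfaced to a human reviewer, not auto-executed
    return "Step limit reached; escalating to a human."
```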

  • [Paper] Artificial Hivemind. Different models give homogeneous responses to open-ended questions. IMO this is not surprising: models are just compressed datasets, and these models are slightly different ways of compressing essentially the same data. It is incorrect to assume models have consciousness. Takeaway: new learnings must come from new modalities.
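
One crude way to quantify the homogeneity the paper reports: compare answers to the same open-ended question across models. The answers below are made-up stand-ins for real model outputs, and string overlap is only a rough proxy for semantic similarity.

```python
# Rough homogeneity check: pairwise overlap of answers to the same
# open-ended question. Answers are invented examples; SequenceMatcher is a
# crude proxy for semantic similarity.
from difflib import SequenceMatcher
from itertools import combinations

answers = {
    "model_a": "Success means finding meaning in your work and relationships.",
    "model_b": "Success is about finding meaning in work and in relationships.",
    "model_c": "Success means building meaningful work and strong relationships.",
}

for (name_1, ans_1), (name_2, ans_2) in combinations(answers.items(), 2):
    sim = SequenceMatcher(None, ans_1.lower(), ans_2.lower()).ratio()
    print(f"{name_1} vs {name_2}: {sim:.2f}")
# Consistently high overlap across supposedly different models is the
# "hivemind" pattern the paper describes.
```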

Have a great week!
