News — Agent Wars

product launch Jun 7th, 2026

Gemma 4 gets a sub-1GB build that runs on a phone

Google has released quantization-aware-trained checkpoints for Gemma 4, shrinking the E2B text model to under 1GB of memory. A custom mobile format and selective 2-bit compression keep quality close to the full-precision reference.

blog.google

gemma, google, on-device-ai

technical Jun 7th, 2026

Microsoft puts durable execution inside Postgres, no extra service

Microsoft has open-sourced pg_durable, a PostgreSQL extension that runs long-running, fault-tolerant workflows entirely inside the database. It checkpoints each step, so a crash resumes from the last good point instead of forcing you to rebuild state.

github.com

postgres, durable-execution, microsoft

product launch Jun 7th, 2026

Alibaba open-sources the code reviewer it ran internally for two years

Alibaba has released Open Code Review, the AI review tool it says served tens of thousands of its own engineers and flagged millions of defects. It pairs deterministic rule pipelines with an LLM agent that reads the whole codebase, not just the diff.

github.com

code-review, alibaba, open-source

technical Jun 7th, 2026

Anthropic open-sources its vulnerability-hunting harness for Claude

Anthropic has released the Defending Code Reference Harness, an open-source blueprint for pointing Claude at a codebase to find and patch security bugs. It ships an autonomous scanner and a customise skill, and is candid about where the approach falls short.

github.com

security, vulnerability-discovery, anthropic

opinion Jun 7th, 2026

Anthropic says 80% of its merged code is now Claude's

Anthropic's research institute published internal data showing AI is already accelerating AI development, and set out what a credible global pause would demand. The standout figure: more than 80% of the code merged into Anthropic's own codebase is now written by Claude.

anthropic.com

recursive-self-improvement, anthropic, ai-safety

opinion Jun 6th, 2026

"MCP is dead" keeps killing the wrong thing

The MCP obituaries have the receipts on context bloat. They also conflate a calling convention with a protocol, and the protocol's own author shipped the fix while the standard got donated to a foundation. What is actually dying is the habit of loading every tool you own into a window you pay for, not interoperability itself.

quandri.io

mcp, model-context-protocol, agents

opinion Jun 6th, 2026

Someone finally charted the rsync AI-bugs panic. The data says no

A distributional analysis of 37 rsync releases finds the two with Claude-assisted commits sit squarely in the middle of the project's historical bug rate, not the tail. The worst release on record had no AI involvement at all, and nobody complained.

alexispurslane.github.io

rsync, ai-coding, open-source

technical Jun 6th, 2026

Distilling multi-agent debate into one model cuts tokens by up to 93%

A new paper folds multi-agent debate into a single LLM through fine-tuning, matching or beating the full debate while using up to 93% fewer tokens. The internalised agents show up as separate, steerable directions in the model's activation space.

arxiv.org

multi-agent, llm-reasoning, fine-tuning

product launch Jun 6th, 2026

Alibaba open-sources the code reviewer it ran internally for two years

Alibaba has released Open Code Review, the AI review tool it used internally across tens of thousands of developers. It pairs deterministic pipelines with an LLM agent to fix the two failures of general-purpose review agents: skipped files and wrong line numbers.

github.com

code-review, alibaba, llm-agents

opinion Jun 6th, 2026

Claude now writes most of Anthropic's code, and Anthropic wants a pause button

The Anthropic Institute says more than 80% of code merged into its production codebase in May 2026 was authored by Claude, and engineers now ship 8x as much code per quarter as in 2024. The piece argues recursive self-improvement is not here yet but could arrive sooner than institutions are ready for.

anthropic.com

recursive-self-improvement, ai-safety, anthropic

technical Jun 6th, 2026

Microsoft puts durable workflow execution inside Postgres itself

Microsoft has open-sourced pg_durable, a Postgres extension that runs crash-resilient workflows entirely inside the database with no external orchestrator. A workflow is a graph of SQL steps that checkpoints as it goes and resumes from the last good point after a crash.

github.com

postgres, durable-execution, agent-infrastructure

opinion Jun 6th, 2026

The numbers say Claude did not break rsync

After a viral post blamed Claude-assisted commits for regressions in rsync, an independent analysis ran the bug data across every release. The verdict: the two Claude releases are statistically indistinguishable from history. The outrage rested on a single tail event.

alexispurslane.github.io

claude, code-quality, rsync

product launch Jun 6th, 2026

Microsoft puts durable execution inside Postgres itself

Microsoft has open-sourced pg_durable, an extension that runs Temporal-style durable workflows inside PostgreSQL with no extra service. You define the workflow as a graph of SQL steps and the database checkpoints each one, resuming after a crash. It ships inside Microsoft's new Azure HorizonDB.

github.com

postgresql, durable-execution, microsoft

product launch Jun 6th, 2026

Alibaba open-sources the code reviewer it ran internally for two years

Alibaba has released Open Code Review, the AI reviewer it says served tens of thousands of its own engineers and flagged millions of defects. It pairs deterministic rule pipelines with an LLM agent that can read the whole codebase, not just the diff.

github.com

code-review, developer-tools, open-source

technical Jun 6th, 2026

Anthropic open-sources the harness behind its vulnerability-hunting agent

Anthropic has published the Defending Code Reference Harness, a reference build of the autonomous agent it uses to find, verify and patch software vulnerabilities. It runs Claude through a full recon-to-patch loop and refuses to operate outside a gVisor sandbox.

github.com

security, autonomous-agents, vulnerability-discovery

technical Jun 6th, 2026

Anthropic open-sources the loop behind its Claude security scanner

Anthropic has released a reference implementation of the autonomous pipeline it uses to find and patch code vulnerabilities with Claude. It is the open version of the recon-to-patch loop behind Claude Security and the Mythos preview. The catch: the part that actually hunts memory bugs refuses to run outside a sandbox.

github.com

security, vulnerability-discovery, claude

opinion Jun 5th, 2026

Cognition and Cursor are pricing opposite bets on the same assumption

Cognition just raised over $1 billion at a $26 billion valuation for its autonomous agent Devin. Cursor is reportedly raising at $50 billion for the opposite theory of how coding agents win. Both numbers rest on the same thing being true, that the company between the developer and the model keeps the margin, and Anthropic's Claude Code is the reason it might not.

techcrunch.com

cognition, cursor, devin

opinion Jun 5th, 2026

AI Can Find the Bug. Verifying It Is Still the Whole Job

A controlled experiment turned a dozen frontier models loose on a deliberately vulnerable app; most scored zero and only GPT-5.5 cleared it reliably. Read alongside the AI slop that killed curl's bug bounty and AISLE's 12-of-12 CVE run on OpenSSL, the lesson isn't whether agents can hack. Discovery got cheap this year, verification didn't, and that gap is where the economics of agentic security actually break.

kasra.blog

ai-agents, security, offensive-security

product launch Jun 5th, 2026

YC's Hyper bets the missing piece for AI teams is shared context

Hyper, a Y Combinator startup, launched a "company brain" that ingests a team's activity across its tools and injects the resulting context into every AI chat turn. The pitch: today's models are capable but ignorant of your company, and that gap is the real bottleneck.

ycombinator.com

y-combinator, context-engineering, memory

product launch Jun 5th, 2026

Two coding agents, one git repo: a tiny protocol lets Claude Code and Codex talk

A new feature in h5i, an 'AI-aware' Git, lets Claude Code and Codex hand work back and forth by writing messages into the repository itself. No server, no socket. Each message is one JSON line on a dedicated git ref, so the whole conversation is versioned and merges without conflicts.

medium.com

multi-agent, git, agent-coordination

opinion Jun 5th, 2026

Mathematicians draw a line as AI clears 52% of FrontierMath

The Leiden Declaration, backed by the International Mathematical Union, warns that AI could flood mathematics with plausible-but-flawed proofs and hand research priorities to tech firms. It lands as GPT-5.5 Pro tops the FrontierMath benchmark at 52.4%.

science.org

frontiermath, benchmarks, mathematics

technical Jun 5th, 2026

A $1,500 test of which LLMs will actually hack an app, and which refuse

Security researcher Kasra Rahjerdi built a deliberately vulnerable app and turned a field of models loose on it. GPT-5.5 solved it 7 of 10 times; DeepSeek V4 Pro was about 15x cheaper per success; Gemini 3.1 Pro refused to try. A scrappy test, not a benchmark.

kasra.blog

llm-security, agentic-coding, model-evaluation

product launch Jun 5th, 2026

Ideogram open-weights a 9.3B image model that out-renders 32B rivals

Ideogram released 4.0, its first downloadable model: a 9.3B-parameter diffusion transformer with open weights. It claims better text rendering than models several times its size, and takes structured JSON prompts for precise layout control.

ideogram.ai

open-weights, text-to-image, diffusion-transformer

technical Jun 5th, 2026

Anthropic's agent sandboxes held; its own proxy code didn't

Anthropic published how it contains Claude across claude.ai, Claude Code and Cowork, using a different isolation layer for each. Its blunt takeaway: the off-the-shelf sandboxing primitives held, while the custom code wrapped around them was where things broke.

anthropic.com

agent-security, sandboxing, gvisor

acquisition Jun 5th, 2026

Cloudflare buys VoidZero, putting Vite's toolchain behind its edge

Cloudflare has acquired VoidZero, the company Evan You founded to unify JavaScript tooling around Vite, Vitest, Rolldown and Oxc. The team joins Cloudflare's Emerging Technology group and the tools stay open source. Cloudflare is also seeding a $1M fund for Vite maintainers independent of both companies.

blog.cloudflare.com

acquisition, cloudflare, voidzero

opinion Jun 5th, 2026

Uber caps engineers at $1,500 a month per AI coding tool

After running through its 2026 AI budget in four months, Uber is limiting each employee to $1,500 of monthly token spend per coding tool. The cap doubles as the clearest dollar signal yet for what agentic coding is worth to a big employer.

bloomberg.com

uber, ai-coding, claude-code

technical Jun 5th, 2026

Gemma 4 12B drops the multimodal encoder entirely

Google's new 12B open model runs agentic multimodal workloads on a 16GB laptop, and it gets there by removing the separate image and audio encoders most multimodal models depend on.

blog.google

gemma, google-deepmind, open-weights

technical May 2nd, 2026

Liquid AI's 24B MoE Runs on Your Laptop

Liquid AI releases LFM2-24B-A2B, a 24-billion-parameter Mixture of Experts model with only 2.3 billion active parameters per token. The model fits in 32GB of RAM, making it deployable on consumer hardware including laptops with integrated GPUs and NPUs. It shows consistent quality gains on benchmarks like GPQA Diamond and MMLU-Pro as the LFM2 family scales from 350M to 24B parameters. It ships with day-one support for llama.cpp, vLLM, and SGLang, and with competitive throughput against Qwen3-30B-A3B and gpt-oss-20b.

liquid.ai

Mixture of Experts, Edge AI, Model Release

product launch May 2nd, 2026

Open-source DAC lets AI agents build dashboards humans can review

DAC is an open-source Dashboard-as-Code tool that lets you write dashboards in YAML or TSX. The key idea: it's built so AI agents can create dashboards that humans can actually review and approve. It ships with a Codex-powered AI agent for live updates, supports major databases through Bruin, and includes a semantic layer for reusable metrics and dimensions.

github.com

dashboard-as-code, open-source, AI agents

technical May 2nd, 2026

Claude Code Won't Read AGENTS.md, and That's a Problem

A GitHub feature request asks Claude Code to support AGENTS.md, the emerging standard file format for AI coding agents. Tools like Codex, Cursor, and GitHub Copilot already read it. Claude Code uses its own CLAUDE.md, forcing teams with multiple AI tools to maintain duplicate files.

github.com

feature-request, standardization, coding-agents

technical May 2nd, 2026

UPenn's Codex skill renders web page videos from plain English

UPenn researchers released web-scroll-video, an open-source tool that records web pages as MP4s using headless Chrome and FFmpeg. Built as a skill for OpenAI's Codex, it lets you describe video actions in plain English and generates the video from those cues. The code is on GitHub under UPenn's CIS organization.

github.com

web-automation, video-generation, ffmpeg

product launch May 2nd, 2026

Governor cuts Claude Code token waste by 55%

Governor is a plugin for Claude Code that optimizes context usage and reduces token waste through compact professional output, context hygiene, tool-output filtering, and usage telemetry. It features memory compression, protected-span safety, quality guards, and planning guardrails for coding tasks.

github.com

token-optimization, context-management, plugin

product launch May 2nd, 2026

SimplePDF's local AI copilot fills forms without phoning home

SimplePDF Copilot lets you fill PDF forms through conversation. The tool uses client-side tool calling with local models, so document data stays on your machine. Designed for embedded, white-labeled deployments in customer products.

copilot.simplepdf.com

client-side AI, PDF processing, tool calling

technical May 2nd, 2026

SKILL.make: Agent Skills as Makefiles Cut Tokens 15%

Developers can now define AI agent skills using Makefile syntax. SKILL.make replaces prose with structured dependency graphs, cutting token usage roughly 15% in testing.

github.com

agent-skills, makefile, declarative-programming

technical May 2nd, 2026

DeepSeek V4: almost frontier, a fraction of the price

Simon Willison reviews DeepSeek's new V4 model series, featuring Pro (1.6T parameters, 49B active) and Flash (284B parameters, 13B active) models with 1M token context and MIT license. Both models offer dramatic cost advantages over frontier models from OpenAI, Anthropic, and Google. Flash is the cheapest small model at $0.14/M input, while Pro is the cheapest larger frontier model at $1.74/M input. Benchmark comparisons show competitive performance with much improved efficiency over DeepSeek V3.2.

simonwillison.net

AI Research, DeepSeek V4, LLM Pricing

opinion May 2nd, 2026

Software Jobs Up 11% Even as AI Spending Hits $650B

A Citadel Securities analysis challenges AI displacement narratives, showing software engineer job postings up 11% YoY despite $650 billion in AI capital expenditure. It argues that AI adoption follows an S-curve rather than exponential growth, and that real-time data has stayed stable, showing little evidence of imminent labor displacement. The wrinkle: companies want senior architects, not junior coders, as AI tools handle entry-level work.

citadelsecurities.com

AI Impact on Jobs, Job Market Analysis, Software Engineering

opinion May 2nd, 2026

The end of "Just ask Sarah"

Every team has a Sarah who holds the institutional knowledge. AI agents can't walk over and ask her. Simon Aronsson argues that as agents start writing code, documentation like ADRs and specs shifts from courtesy to necessity, because agents extend existing patterns without understanding the reasoning behind them.

simme.dev

AI agents, documentation, institutional knowledge

product launch May 2nd, 2026

Omar orchestrates 100 AI coding agents from your terminal

Omar is a terminal user interface (TUI) for creating and managing agentic organizations with deep hierarchies of parallel AI agents. Built on tmux, it lets you mix heterogeneous backends like Claude Code, Codex CLI, Cursor, and Opencode, with full control to navigate and interact with any subagent.

omar.tech

multi-agent, TUI, terminal-interface

technical May 2nd, 2026

Have Your Iceberg Cubed, Not Sorted: Meet Qbeast's OTree Index

A technical deep-dive into Qbeast, a spatial indexing startup from Barcelona that introduces the OTree multidimensional index for open table formats like Apache Iceberg and Delta Lake. The approach rethinks traditional indexing by using adaptive hypercubes that subdivide based on data distribution, addressing limitations of static partitioning and sorting strategies while maintaining compatibility with existing query engines.

jack-vanlightly.com

data lakehouse, spatial indexing, open table formats

technical May 2nd, 2026

First Responders Tell Feds: Waymos Are Getting Worse

First responders in San Francisco and Austin report that Waymo's autonomous vehicles are freezing in place, blocking fire stations, failing to respond to hand signals, and creating safety hazards during emergencies. Officials from both cities told federal regulators that the technology's performance is "backsliding" despite Waymo's expansion plans.

wired.com

autonomous vehicles, Waymo, safety

opinion May 2nd, 2026

How You Talk to AI Says More About You Than Tech

Sarah Murphy's essay uses a 16th-century scrying mirror as a metaphor for AI interaction. How you prompt LLMs reveals your psychology and work style, not universal truths about the technology. Different approaches work for different people because they're personal rituals, not transferable methods.

morrigan-tech.com

AI, LLMs, psychology

product launch May 2nd, 2026

SNEWPapers: AI Makes 6M Historical Newspaper Articles Searchable

SNEWPapers is a newspaper archive platform that has extracted 6 million stories from 3,000+ newspaper titles spanning 1730-1960. It offers semantic search, an AI research assistant called The Sleuth that provides cited answers, and historical timelines.

snewpapers.com

Digital Humanities, Archives, Semantic Search

technical May 2nd, 2026

AI hiring tools prefer resumes they wrote by up to 82%

Candidates using the same AI as the employer's screening tool have a 23-60% advantage in getting shortlisted. Research on 'self-preferencing bias' finds LLMs prefer resumes they generated 67-82% of the time over human-written ones. Business roles like sales and accounting show the biggest gaps. Interventions targeting how models recognize their own output can cut the bias by more than half.

arxiv.org

AI Bias, Algorithmic Hiring, LLM Self-Preferencing

opinion May 2nd, 2026

Brace for the patch tsunami: AI digs up decades of buried code debt

The UK's National Cyber Security Centre warns that AI security tools are digging up years of buried code vulnerabilities. Models like Claude Mythos and GPT-5.5-Cyber can now find bugs faster than teams can fix them, forcing organizations to confront technical debt they've long ignored.

theregister.com

AI security, patch management, technical debt

opinion May 2nd, 2026

Santa Cruz restaurant drops AI logo after review bombing

The Salty Otter restaurant in Santa Cruz faced backlash after owner Rachael Smith used Canva's AI features to create a colorful otter-on-surfboard logo. The restaurant received numerous one-star reviews criticizing the AI-generated artwork, with reviews calling it 'cheap' and lacking artistic taste. Smith replaced the logo with plain text, but the incident shows how AI-generated content is colliding with communities that value human artistry, particularly in artist-heavy towns like Santa Cruz.

sfgate.com

AI art controversy, restaurant branding, community backlash

partnership May 2nd, 2026

Uber Wants Drivers to Double as a Sensor Grid for Robotaxis

Uber plans to equip its human drivers' cars with sensors to collect real-world data for autonomous vehicle companies and AI model training. The initiative, called AV Labs, aims to create an 'AV cloud' library of labeled sensor data that partner companies can query and use to train their models. Currently operating a small dedicated fleet, Uber's long-term ambition is to use its millions of global drivers as a rolling data-collection platform to address what it identifies as the data bottleneck in AV development.

techcrunch.com

Autonomous Vehicles, Data Collection, Uber

opinion May 2nd, 2026

Talkie-1930 Is an AI That Thinks It's 1860

Talkie-1930 is a language model trained only on pre-1930 texts that acts like a collective Victorian consciousness. Historian Benjamin Breen tested it and found the model thinks it's around 1860, reflecting who published back then rather than who existed. He sees research potential in multi-agent historical debates and counterfactual probing, but warns against treating these models as primary sources or chatting with historical figures.

resobscura.substack.com

Vintage LLMs, Historical Language Models, Digital Humanities

technical May 2nd, 2026

GPT-5.5 catches Mythos in security benchmarks

The UK's AI Security Institute found that OpenAI's GPT-5.5 matches Anthropic's Mythos Preview in cybersecurity benchmarks, achieving 71.4% on Expert tasks versus 68.6% for Mythos. GPT-5.5 solved a difficult Rust binary disassembler task in 10 minutes and matched Mythos on 'The Last Ones' data extraction test. AISI concludes Mythos's capabilities are part of general AI improvements rather than a unique breakthrough.

arstechnica.com

cybersecurity, AI benchmarks, model comparison

opinion May 2nd, 2026

Rust via Claude: This Gopher Isn't Converting

A Go developer used Claude as a pair programmer to learn Rust by building a chat server, then compared the two languages on enums, error handling, async runtimes, and debugging tools.

miren.dev

Programming languages, Go, Rust

opinion May 2nd, 2026

Russia's Pravda Network Rewrites Wikipedia, Poisons AI

Russian state actors are running a coordinated campaign to rewrite Wikipedia through 193 fraudulent news sites, and the manipulated narratives are already poisoning AI training data. Research from VIGINUM, the Institute for Strategic Dialogue, and the Atlantic Council documents how the Pravda network launders pro-Kremlin propaganda into Wikipedia and LLMs.

bettedangerous.com

disinformation, Wikipedia manipulation, Russian propaganda