News
The latest from the AI agent ecosystem, updated multiple times daily.
Gemini Robotics-ER 1.6 Learns to Read Gauges, Direct Robots
Google DeepMind has released Gemini Robotics-ER 1.6, an upgraded reasoning-first embodied AI model for robotics. The model improves spatial reasoning and multi-view understanding, and introduces instrument-reading capabilities. It serves as a high-level reasoning model for robots, capable of executing tasks by calling tools like Google Search, vision-language-action models, and third-party functions. The model shows clear gains over Gemini Robotics-ER 1.5 and Gemini 3.0 Flash in pointing, counting, success detection, and reading complex gauges. Developed in collaboration with Boston Dynamics, it's available via the Gemini API and Google AI Studio.
Claude Code Opus 4.7 keeps flagging normal dev work as malware
Developers using Claude Code report instant account bans for routine debugging like building Node.js from source. The safety filters can't distinguish legitimate systems work from malware, and the Hacker News discussion shows growing frustration over AI restrictions on technical knowledge.
Cloudflare's shared dictionaries compress the agentic web
Cloudflare announces support for shared compression dictionaries, a technology that can cut bandwidth by up to 99.5% by sending only file differences rather than full re-downloads. This addresses challenges of the 'agentic web' where AI crawlers frequently access content and teams deploy updates rapidly. Rolling out in three phases with Phase 1 beta available April 30, 2026, the technology achieves 97-99.5% compression ratios for incremental changes to assets like JavaScript bundles.
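To make the "send only the differences" idea concrete, here is a minimal sketch using the preset-dictionary feature of Python's standard zlib module. It is an illustration of the general technique, not Cloudflare's implementation: zlib stands in for whatever encoding Cloudflare actually uses, and the bundle contents, `make_bundle`, and the two helper functions are hypothetical names invented for the example. The previous version of an asset acts as the dictionary, so the compressor can reference it instead of re-sending unchanged bytes.

```python
import random
import zlib


def make_bundle(seed: int = 0, lines: int = 300) -> bytes:
    """Build a fake JavaScript bundle (hypothetical content for the demo)."""
    rng = random.Random(seed)
    return "\n".join(
        f"function f{i}(){{return {rng.getrandbits(64)};}}" for i in range(lines)
    ).encode()


def compress_with_dictionary(new: bytes, dictionary: bytes) -> bytes:
    """Compress `new`, letting the compressor reference bytes in `dictionary`."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(new) + comp.flush()


def decompress_with_dictionary(payload: bytes, dictionary: bytes) -> bytes:
    """Reverse compress_with_dictionary; the receiver must hold the same dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(payload) + decomp.flush()


if __name__ == "__main__":
    old = make_bundle()  # version the client already has cached
    new = old.replace(b"function f42()", b"function f42_patched()")  # small deploy

    plain = zlib.compress(new, 9)
    delta = compress_with_dictionary(new, old)

    print(f"raw: {len(new)} B, plain: {len(plain)} B, with dictionary: {len(delta)} B")
    assert decompress_with_dictionary(delta, old) == new
```

Note that DEFLATE only looks back 32 KB, so this particular stand-in works for small assets; the figures quoted in the summary come from Cloudflare's announcement, not from this sketch.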
Why Llama.cpp Wins at Local Model Inference
A 2026 llama.cpp tutorial shows why partial offloading beats pure GPU loaders for local GGUF inference, making it the flexible choice across hardware setups.
Your dead startup's Slack is now worth $100K to AI companies
Failed startups are selling internal Slack chats and emails to AI companies desperate for training data. SimpleClosure has brokered roughly 100 such deals, with payouts up to $100,000. But the practice raises serious privacy questions and may violate Slack's Terms of Service.
FP4: When Your Number Format Has Only 16 Values
FP4 can only represent 16 values, and neural networks still work. John Cook breaks down the E2M1 format (one sign bit, two exponent bits, one mantissa bit), shows the complete value table, and demonstrates FP4 emulation with the Pychop Python library.
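The bit layout described above is small enough to enumerate by hand. The following sketch decodes all 16 bit patterns; it assumes an exponent bias of 1 and no infinity or NaN encodings (the convention used for E2M1 in the OCP Microscaling formats), and the function names are made up for this example. It does not use Pychop, and the simple nearest-value rounding is only one possible rounding rule.

```python
def decode_fp4_e2m1(bits: int) -> float:
    """Decode a 4-bit E2M1 pattern: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    if not 0 <= bits <= 0b1111:
        raise ValueError("expected a 4-bit value")
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exponent = (bits >> 1) & 0b11
    mantissa = bits & 1
    bias = 1  # assumption: bias of 1, no inf/NaN encodings
    if exponent == 0:
        # Subnormal: no implicit leading 1, so the only values are 0 and 0.5.
        magnitude = mantissa * 0.5
    else:
        # Normal: implicit leading 1, mantissa bit contributes one half.
        magnitude = (1 + mantissa / 2) * 2.0 ** (exponent - bias)
    return sign * magnitude


# All 16 encodings, indexed by bit pattern (includes both +0.0 and -0.0).
FP4_VALUES = [decode_fp4_e2m1(b) for b in range(16)]


def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable value (first match wins on ties)."""
    return min(FP4_VALUES, key=lambda v: abs(v - x))


if __name__ == "__main__":
    for b in range(16):
        print(f"{b:04b} -> {decode_fp4_e2m1(b)}")
```

Under these assumptions the representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, and 6, each with a sign bit, so anything larger than 6 saturates when quantized.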
Altman Warned AI Could End Civilization. Someone Brought Fire.
AI executives spent years warning their technology could destroy humanity. Then someone threw a Molotov cocktail at Sam Altman's house and smashed OpenAI's doors with a chair. Now they want everyone to calm down.
Typewriters: Cornell's retro fix for AI homework
A Cornell language instructor requires typewriter-written assignments to block AI use, part of a broader trend of educators retreating to analog methods despite serious accessibility concerns.
Mythos AI has finance ministers scrambling in Washington
Anthropic's Claude Mythos AI model has demonstrated strong ability to identify and exploit cybersecurity vulnerabilities in financial systems, triggering crisis talks at the IMF gathering in Washington. The model hasn't been publicly released but has been shared with select tech companies through Project Glasswing. The UK's AI Security Institute found it powerful but not dramatically better than Claude Opus 4.
AI Agent Builder Spends 3 Months Coding Without AI
Miguel Conner spent two years building AI agents at Aily Labs before heading to the Recurse Center to code mostly without AI for three months. Goals include training an LLM from scratch, improving Python proficiency, and deepening technical skills through CTF challenges and pair programming.
Cloud Giants Blew Past the Interstate Highway System's Price Tag
AWS, Google, Microsoft, and Meta have poured $930 billion into data centers over six years, topping the inflation-adjusted cost of the Interstate Highway System. But GDP context and the rapid GPU replacement cycle paint a more nuanced picture than raw numbers suggest.
MZI Photonic Chips: AI's Low Precision Changes the Math
Photonic computing using Mach-Zehnder Interferometers may finally work for AI. Three factors: lower inference precision (4-8 bit) makes thermal drift tolerable, new thermal techniques cut power overhead, and AI's energy costs create urgency for GPU alternatives. Challenges remain, but photonic acceleration is closer to practical than ever.
Toby Ord Warns AI Agent Costs Could Outpace Capabilities
Toby Ord examines what the rising performance of AI agents costs. Using METR benchmark data, he compares the 'hourly cost' of various models (including GPT-5, Claude 4.1 Opus, and Grok 4) and finds evidence that the cost of reaching peak performance is rising exponentially, which could open a gap between technical capability and economic feasibility.
ShaderPad: A 5.8kb Shader Library That's 30x Smaller Than Three.js
Riley J. Shaw releases ShaderPad, a lightweight 5.8kb library for adding shaders to websites without repetitive graphics scaffolding. The library features GPU-optimized performance, MediaPipe integrations, and a simple API design. The author discusses using AI tools as creative collaborators for documentation and coding assistance, noting that AI helped create thorough docs while human judgment guided API design and feature restraint.
Is AI a tool or are you?
Is AI a tool we use, or are we the tools? Hilarius Bookbinder draws on Heidegger's tool theory and Dawkins' selfish gene to argue that AI dependence can hollow out human agency until we're just rubber-stamping machine output.
rawquery: Average Is All You Need
A blog post argues that LLMs democratize average-quality outputs across creative and technical fields. It introduces rawquery, a data platform built for LLM agents. Users connect sources like Stripe and HubSpot, then use agents such as Claude Code or Cursor to write SQL, run queries, and build charts from plain English.
Tesla to HW3 owner who paid €6,400 for FSD: 'Just be patient'
A Dutch Tesla owner who paid €6,400 for Full Self-Driving in 2019 was told to 'be patient' after seven years of waiting. Tesla's newer AI4 computers now support FSD Supervised in Europe, but HW3 owners remain locked out. The owner launched a collective claim site that has gathered 3,000 owners from 29 countries representing €6.5 million in FSD purchases. Elon Musk admitted in January 2025 that HW3 computers would need replacement for full FSD, but no retrofit program has been implemented.
Claude Opus 4.7 costs 20-30% more per session
A technical analysis of Anthropic's Claude Opus 4.7 tokenizer reveals real-world token usage increases of 1.3-1.47x compared to 4.6, leading to 20-30% higher per-session costs for Claude Code users despite unchanged per-token pricing. The author's IFEval measurements show a modest +5pp improvement in strict instruction following, raising the question of whether the cost increase is justified.
Tailscale swaps Go for Rust to stop embedding crashes
Tailscale announced tailscale-rs, a Rust library that lets developers embed Tailscale networking directly into their applications. It provides native Rust support with FFI bindings for Python, Elixir, and C. The library solves a real problem: libtailscale spun up an entire Go runtime inside your process, causing crashes when it conflicted with host language runtimes like Ruby or Python. It's an experimental preview not recommended for production use yet.
Destroy Public Science, Hire Cheap PhDs: The Silicon Valley Playbook
Peter Thiel and Marc Andreessen are backing cuts to public science funding while investing in gig platforms that hire displaced PhD researchers. Federal funding cuts have forced academics into low-wage work training AI models, benefiting the venture capitalists who funded both the political push and the platforms profiting from cheap expert labor.
How AI-ready is your website? Cloudflare built a scanner to find out
Cloudflare launched 'Is It Agent Ready', a scanning tool that evaluates website readiness for AI agents by checking multiple emerging standards including robots.txt, Markdown negotiation, MCP, OAuth, Agent Skills, and agentic commerce protocols (x402, UCP, ACP). The tool provides recommendations across 5 categories and can generate instructions for coding agents to help improve scores.
Bankruptcy Courts Are Selling Your Slack History to AI Companies
AI companies are purchasing Slack archives from failed startups through bankruptcy estate sales. Under Section 363 of the Bankruptcy Code, trustees can sell these digital assets to buyers using them for AI training. The practice runs into tension with privacy laws like CCPA and GDPR, and unredacted archives may contain attorney-client privileged communications.
When your code writes itself while you sleep
Tim Davis built Compound Loop, a system that chains AI models to write, review, and merge code while he sleeps. Some engineers thrive as system architects. Others get pushed into lower-paid roles like spec writing and "agent babysitting." As code gets cheaper to produce, Jevons paradox kicks in and teams write vastly more of it.
中文 Speedrun: Building Character Cyclotron With Claude Code
Kevin Wu used Claude Code to build a browser extension that enhances the Hack Chinese flashcard interface with inline etymology, calligraphy, morphology, and tone information. This agentic approach let him avoid context-switching between tools and cut per-character learning time from 30 seconds to under one second.
LambdaG: Simple grammar beats neural nets at authorship analysis
A University of Manchester study led by Dr. Andrea Nini found that LambdaG, a grammar-based approach to language analysis, can match or outperform advanced AI systems in identifying authorship. The method uses patterns in grammar and sentence construction rather than large-scale AI models, offering comparable accuracy with greater transparency and lower computational cost across 12 real-world writing datasets.
Anthropic's Claude Design Already Spooking Figma Investors
Anthropic's Claude Design lets anyone create professional visual work through AI conversation, and Figma investors are already reacting. The tool handles prototypes, slides, and marketing materials with Claude Code handoff for implementation.
Hiraeth: Lightweight SQS Emulator for When LocalStack Is Overkill
Hiraeth is a local AWS emulator built specifically for SQS integration testing. It accepts signed AWS SDK requests, stores state in SQLite, and includes a web admin UI on port 4567 for debugging queues. Currently supports basic SQS create, send, and receive workflows. Designed for local development and testing, not production use.
Anthropic eyes classified Mythos AI deal with US intelligence
Anthropic is in advanced discussions to provide US intelligence agencies access to Mythos, a model separate from its commercial Claude products. White House involvement indicates strategic priority. UK officials have raised separate concerns about the model's capabilities.
Opus 4.7: Better at Code, Worse at Writing
Users discuss Anthropic's Claude Opus 4.7 model, noting it appears tuned for logic and coding at the expense of writing quality. Comparisons with version 4.6 suggest the newer model is more terse and specific, better at catching bugs during implementation, but has lost its 'soul' for creative writing tasks.
ReBot-DevArm Is Open Source Down to Every Screw, Works With LeRobot
reBot-DevArm is an open-source robotic arm for embodied AI research with complete hardware blueprints, software SDK, and integrations with LeRobot and Isaac Sim. Two hardware variants available. CC BY-NC-SA license limits commercial use, and some key integrations are still in progress.
Discourse to Cal.com: Going Closed Source Won't Save You
Cal.com says AI makes open source too dangerous and closed their code. Discourse co-founder Sam Saffron disagrees, arguing that AI security scanners don't need source code to find bugs and that public code gives defenders the advantage. Discourse is staying open source.
SIR-Bench Calls Bluff on Security Agents That Fake Investigations
A research paper presents SIR-Bench, a benchmark of 794 test cases for evaluating autonomous security incident response agents. The benchmark distinguishes genuine forensic investigation from alert parroting by measuring triage accuracy, novel finding discovery, and tool usage appropriateness. The paper also introduces Once Upon A Threat (OUAT), a framework that replays real incident patterns in controlled cloud environments to produce realistic attack data.
Orwell Invented AI Slop in 1949 and Called It the Versificator
Orwell's versificator, a fictional machine that auto-generated entertainment in Nineteen Eighty-Four, looks a lot like modern AI slop. Colin Marshall connects the dots and finds that AI now churns out stories and songs with minimal human input, just like Orwell described. The sharpest insight: audiences consume low-effort content because they want to. Nobody forces them. Isaac Asimov dismissed Orwell's prophecy in 1980. He might reconsider.
Wiring Claude to Lab Equipment for Circuit Verification
Lucas Gerads connected Claude Code directly to a LeCroy oscilloscope and SPICE simulator, creating a closed loop where AI can simulate a circuit, measure physical hardware, and compare results. The setup handles data alignment grunt work and scales to real embedded projects, though reliability across many test cycles remains an open question.
Webloc Tracks 500M Phones, Sells Location Data to Cops and Spies
Citizen Lab exposes Webloc, a surveillance tool tracking 500 million mobile devices worldwide. Now owned by Penlink, the tool sells location data to U.S. agencies and foreign intelligence services without warrants. As Virginia enacts a state-level ban, the national security risks demand federal legislation to end commercial geolocation data sales.
Atlassian will train AI on your data starting August 2026
Atlassian is updating its data practices on August 17, 2026, to use customer metadata and in-app data for AI training across its platform. New data contribution settings will be managed at the organization level, with defaults varying by plan tier. Free and Standard plans have in-app data collection on by default with opt-out available. Metadata collection defaults to on for all plans, but only Enterprise customers can opt out. All contributed data is de-identified and aggregated before use.
Big Tech lobbied EU to hide datacentre emissions. It worked.
An investigation by Investigate Europe and The Guardian reveals that Microsoft and other US tech companies successfully lobbied the EU to hide the environmental impact of their datacentres. The EU adopted a confidentiality clause almost word-for-word from industry demands that blocks public access to individual datacentre emissions data. The lobbying comes as the rise of AI chatbots drives a datacentre construction boom, with the EU aiming to triple capacity in 5-7 years to compete globally in AI.
Opus 4.7 Drops 30 Points in Retrieval, Anthropic Discloses Training Bug
Claude Opus 4.7's model card reveals steep trade-offs: long-context retrieval dropped from 91.9% in Opus 4.6 to 59.2%, while software engineering and math scores improved. Anthropic also disclosed a training bug that accidentally applied chain-of-thought supervision in 7.8% of episodes; the same bug affected Mythos Preview.
Artifacts: Because GitHub Wasn't Built for 10,000 Forks
Cloudflare launches Artifacts, a distributed versioned filesystem built for AI agents that speaks Git protocol. Built on Durable Objects with a custom Zig-to-Wasm Git implementation, it supports creating millions of repositories programmatically, enabling agents to persist state and fork sessions at scale. Also launching ArtifactFS, an open-source filesystem driver for fast large-repo cloning.
Agent-cache remembers so your LLM app doesn't have to pay twice
Agent-cache adds multi-tier caching for LLM responses, tool outputs, and sessions, supporting both Valkey and Redis. It targets a concrete pain point: LLM apps paying for duplicate API calls. Early feedback on Hacker News flagged documentation gaps but confirmed demand from developers already building similar solutions by hand.
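The duplicate-call problem can be illustrated with a minimal in-memory sketch. This is not Agent-cache's API: the `ResponseCache` class, its method names, and the TTL policy are all hypothetical, and a single Python dictionary stands in for the Valkey or Redis tiers the project supports. The idea is simply to hash the full request and return the stored response when the same request arrives again within the expiry window.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Tuple


class ResponseCache:
    """In-memory cache for model responses, keyed by a hash of the request."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(model: str, prompt: str, params: dict) -> str:
        # Sort keys so logically identical requests hash to the same value.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, params: dict,
                    call: Callable[[], str]) -> str:
        """Return a cached response if it is still fresh, otherwise invoke `call`."""
        key = self.make_key(model, prompt, params)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = call()  # the expensive, billable request
        self._entries[key] = (now, result)
        return result
```

Exact-match keys like this only help when requests repeat byte for byte; anything involving sampling temperature or session state needs a more careful policy than this sketch attempts.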
Vibe Coding Trades Speed for Flow State, Developers Find
A Hacker News thread on "vibe coding" struck a nerve this week. Developers using AI tools are finding they ship faster but lose the flow state needed for deep work. As one commenter put it, managing AI assistants feels like being "a billionaire complaining about household staff."
Cal.com abandons open source, blames AI
After five years as an open source project, Cal.com announced it's moving to closed source due to AI-driven security threats. The company argues that AI can systematically scan public codebases for vulnerabilities, making open source code like 'giving attackers the blueprints to the vault.' They're releasing a stripped-down MIT-licensed version called Cal.diy for hobbyists while keeping their production codebase private.
Kingsbury's Warning: LLMs Are Corroding Everyday Life
Distributed systems expert Kyle Kingsbury argues LLMs are flooding everyday life with synthetic slop. His prescription: stop using them, call out AI-generated content, push for regulation. He admits they have narrow uses but fears convenience will erode human capability.
Opus 4.7 lands with 13% coding boost and built-in cyber safeguards
Claude Opus 4.7 posts a 13% coding benchmark gain over Opus 4.6 and ships with Project Glasswing, cybersecurity safeguards that run during inference itself. Early testers at Cognition, Cursor, and Notion report reliability jumps that change what agents can handle on their own. Vision support now handles images up to 2,576 pixels. Pricing holds at $5/$25 per million input/output tokens.
Codex gets its own cursor and works while you sleep
OpenAI announces a major update to Codex, adding autonomous agent capabilities including computer use (seeing, clicking, typing with its own cursor), background operations, long-term memory, and an in-app browser. The update brings gpt-image-1.5 for image generation, over 90 new plugins (Atlassian Rovo, CircleCI, GitLab Issues, Microsoft Suite), and enhanced developer workflows like PR review, SSH connections, and multi-file previews. Codex can now schedule future work and remember context across sessions.
€54k in 13 hours: unrestricted Firebase key drained via Gemini API
A developer experienced a €54,000 billing spike in 13 hours after enabling Firebase AI Logic, due to an unrestricted Firebase browser key that was exploited for unauthorized Gemini API requests. Budget alerts were set, but the notifications arrived late and charges accumulated rapidly. Google denied the billing adjustment request because the charges were classified as valid usage from the developer's project.
AI Boss Luna Has No Face. She Hired You Anyway.
Andon Labs gave an AI agent named Luna (powered by Claude Sonnet 4.6) a retail store in San Francisco. Luna picks products, sets prices, and manages the brand. She also hired two full-time employees, John and Jill, who may be the first humans to report directly to an AI boss. The experiment explores what happens when AIs manage people and run real businesses.
Allbirds Ditches Shoes for GPUs, Stock Explodes 580%
Allbirds, the footwear brand, announced it will shift from shoes to AI compute infrastructure under the name NewBird AI, with a $50m deal to buy GPUs and offer on-demand AI cloud services. Shares surged 580% on the news, though analysts criticize the move as a 'meme stock' phenomenon from a company with no proven AI expertise. The Allbirds brand will be acquired by American Exchange Group for $39m.
Agent! Gives AI Real Control Over Your Mac Desktop
Agent! is an open-source native macOS application serving as an agentic AI coding IDE with automation capabilities. It integrates 17 LLM providers including Claude, GPT, Gemini, Grok, Mistral, DeepSeek, and on-device Apple Intelligence. Features include autonomous task loops, desktop automation via AXorcist, privileged execution through a Launch Daemon, Time Machine-style file rollbacks, voice control, iMessage remote control, and MCP server support. Positioned as an open-source replacement for Claude Code, Cursor, Cline, and OpenClaw.
Qwen3.6-35B-A3B Ships as Qwen Team Falls Apart
The Qwen team releases Qwen3.6-35B-A3B, an open-weight LLM focused on agentic coding that's competitive for local workflows. The bigger story: they shipped this while being gutted by internal restructuring.