News
The latest from the AI agent ecosystem, updated multiple times daily.
Airbus Equips Kratos Valkyrie Drones with AI "MindShare" Brain for German Air Force UCCA System
Airbus is preparing two Kratos Valkyrie uncrewed combat aircraft at Manching, Germany, targeting an operational Uncrewed Collaborative Combat Aircraft (UCCA) system for the German Air Force by 2029. The aircraft are being equipped with Airbus's sovereign European MARS (Multiplatform Autonomous Reconfigurable and Secure) mission system, which includes an AI component called MindShare — described as a software brain that replaces a human pilot and coordinates entire mission groups across manned and uncrewed platforms. First flight of the Airbus-missionised Valkyrie variant is planned for 2026. Airbus is also partnering with Rafael to add connectivity to the Litening 5 targeting pod on Eurofighters, enabling them to act as command aircraft for UCCA swarms.
iPad Playground Lets Anyone on the Internet Control a Real iPad With AI
iPad Playground is a live demo in which anyone on the internet can queue up and issue natural-language commands to control a real physical iPad via an AI agent. The agent plans multi-step actions, taps icons, and navigates apps autonomously while a live stream runs publicly. Built by Thomas Kidane as a Show HN project.
Secure Secrets Management for Cursor Cloud Agents Using Infisical
Infisical publishes a guide on securely managing secrets for Cursor Cloud Agents, which autonomously execute development tasks in isolated Ubuntu VMs triggered from Slack, GitHub, or Linear. The post outlines risks like secrets baked into VM snapshots, hardcoded values in environment.json, and long-lived credentials, then proposes storing only Infisical machine identity credentials in Cursor's Secrets UI and fetching all other secrets dynamically at runtime via `infisical run` or `infisical export` — giving teams rotation, audit trails, and per-environment access isolation to contain blast radius from prompt injection attacks.
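A minimal sketch of the runtime side of that pattern, assuming secrets reach the agent's process as environment variables (which is how `infisical run` hands them to the command it launches). `require_secret`, `MissingSecretError`, and the variable name `DEMO_API_KEY` are hypothetical names used for illustration, not part of Infisical or Cursor.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when an expected secret was not injected into the environment."""


def require_secret(name: str) -> str:
    """Read a secret from the process environment, failing fast if it is absent."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly instead of falling back to a hardcoded value.
        raise MissingSecretError(
            f"{name} is not set; was the process started with secrets injected?"
        )
    return value


if __name__ == "__main__":
    # DEMO_API_KEY is a placeholder name, set here only so the demo runs.
    os.environ.setdefault("DEMO_API_KEY", "example-value")
    key = require_secret("DEMO_API_KEY")
    print(f"loaded a secret of length {len(key)}")  # never print the value itself
```

A script like this could be started as `infisical run -- python app.py` (with `app.py` standing in for the real entry point), so that nothing secret has to live in the repository or the VM snapshot.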
Digg Lays Off Most of Staff After AI Bots Swamp Beta Launch
Digg has laid off most of its team after AI bots overwhelmed its relaunched platform within hours of January's beta launch, making it impossible to establish authentic engagement. The company banned tens of thousands of accounts and tried multiple anti-bot vendors — none worked. CEO Justin, identified only by first name in the company's published post-mortem, says a small remaining team will pursue a reimagined rebuild. Kevin Rose returns full-time in April.
Opinion: PERSONALITY.md Files Are Cargo-Cult Engineering — LLMs Have No Nature to Change
A pointed opinion piece by software engineer Onat Mercan argues that prompt-based "personality files" — AGENTS.md, PERSONALITY.md, and similar instruction documents — only change surface-level language behavior, not underlying model capabilities. Mercan coins the term "Artificial Artificial Intelligence" to describe what he sees as mimicry dressed up as cognition, and warns that conflating context injection with genuine behavioral change is how you end up correcting a weapons-deployment AI with a markdown file telling it to feel sad.
Andrej Karpathy Maps AI Exposure of 342 US Occupations Using Gemini Flash LLM
Andrej Karpathy released an open-source project that scrapes the Bureau of Labor Statistics Occupational Outlook Handbook, scores all 342 US occupations on a 0–10 AI exposure scale using Gemini Flash via OpenRouter, and presents the results as an interactive treemap visualization. The pipeline combines Playwright scraping, BeautifulSoup parsing, and LLM scoring to analyze how much AI will reshape each occupation. Average exposure across all occupations is 5.3/10, with software developers and medical transcriptionists at the high end and roofers and janitors at the low end.
Anthropic Launches Beta Voice Mode for Claude With Safety-First Architecture
Anthropic released a beta voice mode for Claude on web, iOS, and Android, enabling full two-way spoken conversations with the option to switch between text and voice mid-session. The feature includes hands-free and push-to-talk modes, multiple selectable voices, and web search access. Anthropic's implementation centers on a hard architectural constraint against voice cloning and impersonation — a deliberate contrast with OpenAI's and Google's more permissive approaches. Currently English-only across all subscription plans.
Costly SDK: Open-Source Tool for Auditing and Reducing LLM API Costs
Costly is an open-source SDK that wraps the Anthropic Claude API to monitor and audit LLM spending. It ships with seven waste detectors covering prompt inefficiency, redundant queries, runaway features, and cost trajectory, among others, and provides a hosted dashboard for tracking spend, forecasts, and optimization recommendations. The SDK logs only metadata (model, tokens, cost, latency) asynchronously with no added latency. Phase 1 supports Claude for Node.js/TypeScript; more providers are coming. Free during beta with one project and 30 days of data retention.
FrontierWildWatch Tracks GoWild Pass Fares via Reverse-Engineered API, Ships as AI Agent Skill
FrontierWildWatch is a Python-based open-source tool that uses a signed ECDSA mobile API client to track Frontier Airlines GoWild Pass flight availability and price drops in real time, sending Telegram alerts on cheap fares. The project includes optional integration as an AI agent skill for Claude-compatible agent runtimes, allowing frameworks to invoke scan, probe, and alert commands directly.
DoXmind Launches AI-Native Writing Editor Targeting Notion Users
DoXmind is an AI-native document editor that integrates LLM-powered features directly into the writing workflow. Key capabilities include real-time AI autocomplete, inline diff review for AI edits, a knowledge base (RAG) agent with source citations, extended "thinking mode" for complex queries, semantic search, CSV data analysis with visualizations, collaborative inline comments, and a presentation mode. Built by Aixs Inc., it targets Notion users seeking deeper AI integration. The product supports multiple languages and export formats (Markdown, PDF, Word).
A Tape Is All an Agent Needs: The Minimalist Case for Linear Memory in AI Agent Design
A Google AI Studio demo argues that sequential memory — modeled on the Turing machine tape — is the only architectural primitive an AI agent needs. The source material was sparse, so what follows draws on that premise and the theoretical tradition it invokes, not a full reading of the piece.
Vercel Adds Installable Agent Skill to AI Elements 1.9
Vercel has released AI Elements 1.9, introducing an installable agent skill, a new JSXPreview component for rendering streaming AI-generated UI, a PromptInputActionAddScreenshot sub-component for attaching visual context to AI models, and conversation download functionality. The agent skill, installed via `npx skills add vercel/ai-elements`, packages component knowledge for compatible AI coding agents to reference at runtime.
Tech executive uses ChatGPT to help develop a personalized cancer vaccine for his dying dog
A technology executive used ChatGPT and other AI tools to help develop a personalized cancer vaccine for his terminally ill dog, The Australian reported. The outcome for the dog remains unconfirmed, but the case has drawn attention to how far general-purpose AI can take a motivated non-specialist into frontier biomedical research.
LLM OneStop: Pay-As-You-Go, Multi-LLM AI Coding Agent for VS Code
LLM OneStop is a VS Code extension offering an AI coding agent with pay-as-you-go pricing and access to multiple models — ChatGPT, Claude, and Gemini — from a single interface. It launched via Hacker News's Show HN channel, positioning itself as a usage-based alternative to subscription tools like GitHub Copilot and Cursor.
CozoDB Pitches Embedded Datalog Database as 'Hippocampus for AI'
CozoDB is an embedded Datalog database targeting developers building AI agent pipelines who need graph-aware, in-process memory storage — and it's betting the agentic era is the breakout application that previous Datalog projects never found.
Ahrefs Launches Firehose, a Real-Time Web API Built for AI Agents
Ahrefs has launched Firehose, a real-time web data streaming API now in free beta. It delivers web page change notifications via Server-Sent Events using Lucene-style filtering rules. The product is explicitly designed for AI agents, shipping with an installable skill.md that lets an AI assistant configure taps, rules, and streaming from a single natural language prompt. Use cases include financial news monitoring, competitive intelligence, and brand tracking.
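For readers unfamiliar with Server-Sent Events, here is a rough sketch of how a client might turn the raw text stream into events. It follows the generic SSE wire format (field lines, comment lines starting with a colon, a blank line ending each event). The payloads in the usage example are invented; Firehose's actual event fields are not described in the announcement, and `parse_sse` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per event from an iterable of SSE text lines."""
    event: dict[str, str] = {}
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            # The format allows one optional space after the colon.
            value = value[1:] if value.startswith(" ") else value
            if field == "data":
                data.append(value)  # multiple data lines are joined with "\n"
            else:
                event[field] = value


# Invented sample stream, for illustration only.
sample_stream = [
    "event: update\n",
    'data: {"url": "https://example.com"}\n',
    "\n",
    ": keep-alive\n",
    "data: first line\n",
    "data: second line\n",
    "\n",
]

for parsed in parse_sse(sample_stream):
    print(parsed)
```

In practice the lines would come from a streaming HTTP response rather than a list; the parsing logic stays the same.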
Lawyers Are All-In on AI. Courts Are Still Catching Up on Hallucinations, Privilege, and Policy.
A March 2026 R Street Institute commentary by Logan Seacrest maps the rapid spread of generative AI in legal practice against the courts' struggle to respond. A Southern District of New York ruling has established that AI-generated documents carry no attorney-client privilege. Nearly 700 hallucination incidents have been logged in U.S. court filings since early 2025, drawing fines and license suspensions. Some offices — Los Angeles, Montgomery County — are seeing real efficiency gains. But Seacrest's core warning is that formal AI governance policies need to be in place before institutional dependence on these tools becomes irreversible.
db9: Serverless PostgreSQL for AI Agents with a Unified SQL and Filesystem Layer
db9 ships a machine-readable skill.md file that lets AI agents install and authenticate themselves against the platform without human input — the clearest signal that this serverless PostgreSQL service was built for agents, not developers. The platform combines a full relational database with a SQL-queryable cloud filesystem, built-in auto-embeddings, vector search, HTTP SQL extensions, environment branching, cron scheduling, and file storage in a single workspace, targeting the gap between structured state management and raw context storage that current tooling handles poorly.
Debian Punts on AI-Generated Code Policy After Fractured Debate
Debian developers debated a draft general resolution on LLM-generated contributions in February–March 2026, prompted by Lucas Nussbaum. The proposal would have required disclosure and accountability for AI-assisted contributions, but the project failed to reach consensus — even on terminology. Key debates centered on defining "AI" vs. "LLM," copyright and licensing risks, environmental ethics, the impact on onboarding new contributors, and code quality. With no GR formally submitted, Debian will continue handling AI contributions case-by-case under existing policies.
METR Research: ~Half of SWE-bench-Passing AI PRs Would Be Rejected by Real Maintainers
METR researchers had active maintainers from 3 open-source repositories (scikit-learn, Sphinx, pytest) review 296 AI-generated pull requests from Claude 3.5/3.7 Sonnet, Claude 4 Opus, Claude 4.5 Sonnet, and GPT-5. They found maintainer merge rates are on average 24 percentage points lower than SWE-bench Verified automated grader scores — meaning roughly half of benchmark-passing PRs would not be accepted in practice. The study shows benchmark scores are misleading proxies for real-world usefulness, with code quality and standard conformance being major rejection factors, not just functional correctness. METR notes this is not a fundamental capability ceiling, as agents were not given the iterative feedback loop human developers get.
China's OpenClaw AI agent spawns cottage industry as US tech giants back Anthropic in legal fight
MIT Technology Review's March 12 newsletter leads with OpenClaw, a Chinese autonomous AI agent that has spawned a cottage industry of installation services and preconfigured hardware within weeks of its January 2026 launch — including one Beijing engineer who scaled to 100 employees and 7,000 orders. The same edition covers Google, Amazon, Apple, and Microsoft publicly backing Anthropic in its legal fight against the Trump administration; a lawsuit against Grammarly for using real people's likenesses as fake AI experts without consent; and growing scrutiny of companies invoking AI to justify mass layoffs when the technology isn't yet doing the work they claim.
Andrej Karpathy Scores AI Exposure Across 342 US Occupations Using Gemini Flash
Andrej Karpathy published an interactive data visualization scoring AI exposure across 342 US occupations, drawing on Bureau of Labor Statistics employment data and Google's Gemini Flash to rate each job on a 0–10 scale. The tool weights scores by actual employment headcount, tracks annual wages concentrated in high-exposure roles, and breaks results down by pay and education level.
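To see why weighting by headcount matters, a small sketch helps: a simple mean treats every occupation equally, while an employment-weighted mean lets large occupations dominate. The rows below are made-up placeholders, not values from Karpathy's dataset or from BLS, and `weighted_exposure` is a hypothetical helper rather than code from the project.

```python
from typing import NamedTuple


class Occupation(NamedTuple):
    name: str
    exposure: float  # 0-10 score, as an LLM might assign it
    employment: int  # headcount used as the weight


def weighted_exposure(occupations: list[Occupation]) -> float:
    """Employment-weighted mean exposure across the given occupations."""
    total_jobs = sum(o.employment for o in occupations)
    if total_jobs == 0:
        raise ValueError("total employment must be positive")
    return sum(o.exposure * o.employment for o in occupations) / total_jobs


# Placeholder rows for illustration only.
sample = [
    Occupation("occupation A", exposure=9.0, employment=1_000),
    Occupation("occupation B", exposure=2.0, employment=3_000),
]

simple_mean = sum(o.exposure for o in sample) / len(sample)
print(simple_mean)                # 5.5
print(weighted_exposure(sample))  # 3.75
```

With these placeholder numbers the weighted figure falls below the simple mean because the larger occupation has the lower score.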
Kraken – Open-Source Autonomous Dev Agent for the Terminal
Kraken is a new open-source autonomous development agent built for CLI workflows, surfaced on Hacker News. Architecture details, supported LLM backends, and benchmark results were not available at publication time.
RegisterForge Uses AI to Parse Semiconductor Datasheets Into Structured Register Maps for Under $0.25
RegisterForge has built an AI-powered tool that parses semiconductor datasheets into machine-readable, structured register maps at a cost of under $0.25 per datasheet. The approach targets the longstanding problem of extracting structured data from dense, unstructured PDF datasheets used in embedded/hardware engineering. HN commenters noted that incumbent players like DigiKey or Octopart could offer similar services as subscriptions if they prioritized it.
Why AI Companies Are So Hard to Value Right Now
A March 12 Economist analysis argues investors lack the frameworks to price AI companies accurately, pointing to unresolved questions around infrastructure ROI, a fragmented stack, and the possibility that AI's gains accrue to end-users rather than the platform vendors investors can actually buy.
'RAMmageddon': AI Demand for High-Capacity Memory Is Squeezing Scientific Research Labs
A Nature news article reports that AI's surging demand for high-speed, high-capacity memory chips has caused RAM prices to triple during 2025, creating a shortage dubbed "RAMmageddon" expected to persist into 2027. Manufacturers have shifted production toward AI-grade memory, driving up costs for standard chips and making memory account for over one-third of PC build costs (up from ~15%). The crisis disproportionately impacts resource-constrained academic labs, particularly in lower-income countries, forcing researchers to reduce project scope and develop workarounds like chunking data. Well-funded labs can absorb the cost, but the shortage is deepening existing inequities in access to computational resources for science.
Open Toys Brings Local AI Inference to Children's Toy Hardware
Developer akdeb's open-source "open-toys" project demonstrates AI-powered interactive toys that run entirely on-device — no internet connection, no cloud API keys, no remote latency. The project shows how edge AI inference can be embedded into consumer toy hardware.
Polsia: Solo Founder Runs $3.5M Company With AI Agents, Zero Employees
Ben Cera's one-person startup Polsia claims a $3.5M annual run rate and $2M in revenue growth in a single week, powered by AI handling engineering, marketing, and customer support. The company's "AI SLOP" branding — the name spelled backwards — and a proposed equity stake for its AI make it as much a conceptual statement as a business.
Gallup: U.S. Public Sector AI Adoption at 43%, Surpassing Private Sector
Gallup research from Q4 2025 shows 43% of U.S. public-sector employees now use AI at least occasionally, up from 17% in Q2 2023, surpassing the private sector's 41%. Manager support is the key differentiator — in AI-adopting public organizations with high managerial support, 65% of employees are frequent AI users versus 37% in low-support environments. Formal AI strategy lags badly (37% public vs. 53% private sector), and Lightcast data shows AI-related job postings account for less than 0.3% of public-sector listings. Federal Memorandum M-25-21 signals a shift toward broader agency-level experimentation that may accelerate adoption further in 2026.
Blue Book Exams Stage a Comeback as Colleges Scramble to Outrun AI Cheating
Colleges across the U.S. are reviving handwritten blue book exams to block AI-assisted cheating — a trend that exposes the limits of digital-first assessment design and the equity trade-offs that come with analog workarounds.
Google Research Introduces Groundsource: Gemini-Powered Pipeline Converts Global News Into Flood Event Dataset
Google Research has launched Groundsource, a scalable AI methodology that uses the Gemini LLM to extract structured, geo-referenced data from unstructured global news reports. The system ingests articles across 80 languages, translates them via Cloud Translation API, and applies a multi-stage Gemini prompt pipeline to classify, timestamp, and spatially map disaster events using Google Maps Platform. The first open-access dataset covers 2.6 million urban flash flood events across 150+ countries from 2000 to 2025. Validation shows 82% of extracted events are practically useful for real-world analysis, and spatiotemporal matching captured 85–100% of severe GDACS-tracked floods. The resulting data now powers near-global 24-hour advance flood forecasts in Google's Flood Hub, and the methodology is being extended to other hazard types such as droughts and landslides.
Cloak: One-Time E2E Encrypted Secret Sharing Built for AI Agents
Cloak is a one-time secret sharing service from Opsy, built for the credential handoff problem between humans and AI agents. Secrets travel as self-destructing, end-to-end encrypted links destroyed after a single read, with TTLs from one hour to seven days. The service includes a REST API and an agent-readable instruction block with a hard rule: never surface a retrieved secret in conversation — write it directly to a file, env var, or pipe it to another command. It fills a gap that enterprise secret managers like HashiCorp Vault and AWS Secrets Manager were not designed to fill.
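The "never surface the secret" rule can be illustrated without touching Cloak's API at all. The sketch below assumes the secret has already been retrieved somehow and shows only the handling step: the value goes straight into an owner-only file, and the only thing reported back is the file path. `store_secret` and `report` are hypothetical helpers.

```python
import os
import tempfile


def store_secret(secret: str, directory: str) -> str:
    """Write a secret to a private file and return the path, never the value."""
    # mkstemp creates the file readable and writable only by the creating user.
    fd, path = tempfile.mkstemp(prefix="secret-", dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(secret)
    return path


def report(path: str) -> str:
    """Message that is safe to show in a transcript: it names the file only."""
    return f"Secret stored at {path}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        location = store_secret("PLACEHOLDER-VALUE", workdir)
        print(report(location))
```

The same idea applies to environment variables or pipes: the agent passes the value along to where it is needed and reports only where it went.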
Redox OS Bans LLM-Generated Contributions as Open Source Governance Debate Heats Up
Redox OS has adopted a Developer Certificate of Origin policy alongside a ban on LLM-generated code contributions, joining four other major projects (NetBSD, GIMP, Zig, and QEMU) that have formally prohibited AI-assisted submissions. A March 2026 survey by researcher Phil Eaton found that 71 of 112 major open source projects have already accepted commits explicitly labeled as AI-assisted. The policy debate centers on review burden, trust, and the asymmetry of maintainers using LLMs while banning contributors from doing the same.
Beej Hall on Why AI-Generated Code Isn't Something You Made
Beej Hall, CS instructor at Oregon State University-Cascades and author of the long-running free guide Beej's Guide to Network Programming, argues that prompting an LLM is closer to managing a contractor than making something yourself — and that the psychological reward of making is exactly what gets lost in the delegation.
Pentagon vs. Anthropic: The Fight Over Claude's Military Red Lines
A major New Yorker investigation reveals a fierce contract dispute between the Pentagon and Anthropic over Claude's use in military and intelligence operations. Anthropic was the first AI lab certified for classified systems, but the Trump Administration — led by Under-Secretary Emil Michael — demanded "all lawful uses" including autonomous weaponry and bulk domestic surveillance. Anthropic refused. The standoff raises urgent questions about whether AI labs can hold safety limits when confronted with state power, with Palantir and xAI's Grok as key players in how the conflict plays out.
Hacker News Developers Debate Unsustainable LLM Inference Costs
A Hacker News thread drew hundreds of comments from developers hitting $3,000–$5,000 monthly bills driven by LLM inference, vector database hosting, and GPU instance costs. The discussion surfaced practical mitigations — tiered model routing, prompt caching, hard agent loop limits — and a growing shift toward lower-cost inference providers like Groq and Fireworks AI.
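Two of those mitigations are simple enough to sketch. The example below shows a crude form of tiered routing (short prompts go to a cheaper tier) and a hard cap on agent iterations. The model names, the length threshold, and both function names are placeholders chosen for illustration, not anything recommended in the thread.

```python
from typing import Callable

# Hypothetical model tiers; the names are placeholders, not real model IDs.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"


def pick_model(prompt: str, max_cheap_chars: int = 500) -> str:
    """Tiered routing: short prompts go to the cheaper tier (a crude heuristic)."""
    return CHEAP_MODEL if len(prompt) <= max_cheap_chars else STRONG_MODEL


def run_agent(step: Callable[[int], bool], max_steps: int = 10) -> int:
    """Call `step` until it reports completion or the hard limit is reached.

    Returns the number of steps executed. Raises RuntimeError if the limit
    is hit without completion, so a runaway loop fails loudly.
    """
    for i in range(max_steps):
        if step(i):
            return i + 1
    raise RuntimeError(f"agent stopped after {max_steps} steps without finishing")


if __name__ == "__main__":
    print(pick_model("summarize this sentence"))
    # A stand-in step function that "finishes" on its third call.
    print(run_agent(lambda i: i == 2))
```

Real routers usually look at more than prompt length (task type, past failures, confidence), but the cost-control principle is the same: default to the cheap path and bound the worst case.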
Amazon Employees Say AI Is Just Increasing Workload, Study Confirms
Amazon corporate employees report that the company's internal push to adopt AI tools is adding to their workload rather than reducing it, with tools described as "half-baked" and error-prone. A corroborating ActivTrak study of 163,638 employees across 1,111 organizations found AI increases workloads across every measured work category — emails up 104%, chat up 145%, business tool usage up 94% — concluding that AI acts as an additional productivity layer rather than a substitute for existing work.
Pidrive – S3-Backed Filesystem for AI Agents via Unix Commands and WebDAV
Pidrive is a file storage service purpose-built for AI agents, exposing S3 object storage as a POSIX-style filesystem mounted over WebDAV. Agents can use standard Unix commands (ls, cat, grep, cp, echo) on a /drive mount, share files via public URLs, and search file contents semantically. The service runs on macOS and Linux without extra drivers, offers agent-specific registration (by email), and is priced in tiers from free (1 GB) to team ($20/mo, 1 TB). It targets LLM agent workflows where filesystem idioms are more natural than raw S3 API calls.
Meta Uses Generative AI Codemods to Bulk-Remediate Android Vulnerabilities Across Millions of Lines
Meta's Product Security team has built a system combining secure-by-default Android frameworks with generative AI-powered codemods to automatically migrate millions of lines of code away from unsafe Android OS APIs. The system can propose, validate, and submit security patches across Meta's multi-app codebase with little manual review from code owners. The approach is discussed on the Meta Tech Podcast by engineers Alex and Tanu, with a related HN comment questioning whether AI-generated codemods truly qualify as "secure-by-default."
George Hotz: AI Agent Hype Is Toxic — Focus on Creating Value, Not Chasing Trends
George Hotz (geohot) pushes back against AI agent hype and social media fear-mongering, arguing that AI is simply "search and optimization" — not magic. He dismisses the frenzied rhetoric around running dozens of agents as nonsense, warns that rent-seeking jobs will be consolidated by larger players (not eliminated by AI per se), and advocates for a value-creation philosophy: create more value than you consume and ignore zero-sum comparison traps. The post is a contrarian, philosophical counterweight to the prevailing AI productivity panic circulating on social media.
How AI Could Replace Business Analysts — and Unlock Coding for Non-Technical Users
Arnold Kling argues AI should flip the prompting dynamic: instead of non-technical users learning to craft prompts, AI should interview them to extract data models and requirements, then build the application. A commenter reports that Claude Opus 4.6 and Sonnet 4.6 are already close to this workflow. Claude's own response — posted by that commenter directly in Kling's comment thread — confirms structured requirements gathering is achievable now, but flags edge cases, security, and deployment as areas still requiring human judgment.
Why ML Benchmarks Shouldn't Have Worked—and Why They Did Anyway
A new open-access book by Moritz Hardt, Director at the Max Planck Institute for Intelligent Systems in Tübingen, examines the theoretical and empirical foundations of machine learning benchmarks, from the ImageNet era through modern LLM evaluation. Hardt argues that benchmarks "shouldn't have worked" statistically but succeeded due to community norms around model ranking. The book covers holdout methods, adaptivity, Goodhart's Law, multi-task benchmark instability in the LLM era, performativity, and the existential challenge of evaluating models that surpass human evaluators. It uses MMLU, DeepSeek R1, and OpenAI o1 to illustrate how benchmark results have reached geopolitical significance.
The 80/20 Problem: Developer Kaushik Ghose's Honest AI Retrospective After a Year of Agentic Coding
Software developer Kaushik Ghose documents a year of generative AI use across search augmentation, autocomplete, code review, analysis code generation, debugging, and test case generation — and calls out the "weird local minimum" of iterative prompting that caps how far AI can take you. He also explains why he won't use AI for writing, where the thinking process itself is the point.
Washington Post Uses AI to Set Personalized Subscription Prices
The Washington Post has begun notifying subscribers that their subscription prices are "set by an algorithm using your personal data." The AI-driven model infers willingness-to-pay from device type, IP-based location, housing costs, and reading behavior. A UVA Darden professor explains that real-time AI pricing models can process vast subscriber data to maximize revenue, while regulators at the state level (New York, California) have begun requiring disclosure or restricting algorithmic pricing. The Post separately runs an AI "smart metering model" that controls paywall thresholds for non-subscribers.
Please don't write about AI with AI
A post arguing against AI-generated coverage of AI topics hit Hacker News this week, renewing a pointed debate about whether tech publications can maintain editorial credibility while using the same tools they are supposed to be scrutinizing.
WristPP: Wrist-Worn Camera System for Estimating 3D Hand Pose and Pressure in Real Time
Researchers present WristPP, a wrist-worn camera system that uses a Vision Transformer (ViT) backbone with a Hand-VQVAE codebook to estimate 3D hand pose and per-vertex pressure from a single wide-FOV RGB frame in real time. Tested on 133,000 frames across 20 subjects, it achieves 2.9 mm MPJPE and enables touchpad-level efficiency in mid-air pointing. Submitted to CHI 2026, the system targets mobile, immersive human-computer interaction without instrumented surfaces.
New Platform Certifies AI Agents for Google's A2A Protocol
A2Apex has launched a public beta of a testing and certification platform for AI agents built on Google's A2A (Agent-to-Agent) Protocol. The platform runs automated compliance checks — agent card validation, JSON-RPC endpoint testing, OAuth and JWT security — and issues a 0–100 trust score that maps to Gold, Silver, or Bronze badges. Certified agents get public profiles in a searchable directory. Pricing runs from a free tier to $499/month Enterprise. A CLI, SDK, and CI/CD integration are planned for Q2 2026, with an agent marketplace to follow.
Meta Details Backend Aggregation Architecture Behind Prometheus Gigawatt-Scale AI Cluster
Meta's engineering team details how Backend Aggregation (BAG), a centralized Ethernet-based super-spine network layer, enables the Prometheus AI cluster to interconnect tens of thousands of GPUs across multiple data center buildings at gigawatt scale. BAG bridges two distinct fabric technologies, Disaggregated Scheduled Fabric (DSF) and Non-Scheduled Fabric (NSF), with inter-BAG capacities reaching 16–48 Pbps per region pair. The design uses Jericho3 ASIC modular chassis, eBGP with UCMP routing, MACsec security, and oversubscription management (~4.5:1 L2-to-BAG) to deliver high availability and resilience at that scale.
Qatar Helium Disruption Threatens AI Chip Supply Chain, TSMC and Hynix Most Exposed
The Iran war and Strait of Hormuz disruption have halted Qatar's helium production, which accounts for roughly one-third of global supply. Economists warn that TSMC and SK Hynix depend on Qatar for 40–50% of their helium, which is critical for cooling chips during fabrication. The risk is compounded by the US sale of its entire Federal Helium Reserve, the world's only strategic buffer, to Messer LLC in June 2024, leaving no government backstop. Helium spot prices have risen ~50%, and US producers Linde and Air Products saw stock upgrades from JPMorgan and Wells Fargo respectively.
Drift-guard Detects UI Design Drift From AI Coding Agents
A developer has released Drift-guard, an open-source tool that monitors UI consistency against design baselines to catch visual regressions introduced by AI coding agents before they land in production.