Agent Wars
technical Mar 13th, 2026

Sloppypaste: Naming an AI Bad Habit — and Pitching the Fix

A new site coins 'sloppypaste' for the habit of dumping unread AI output on colleagues, then pivots to pitching Agent Relay — infrastructure that promises to cut humans out of inter-agent handoffs entirely. It's a clever double move: name the behaviour, then sell the architectural fix. Whether either the awareness campaign or the product behind it has real traction is less clear.

technical Mar 13th, 2026

Claude Opus 4.6 Reportedly Proves Erdős Prime Divisibility Conjecture for Binomial Coefficients

A PDF circulating on Hacker News this week claims that Anthropic's Claude Opus 4.6 has solved the Erdős Prime Divisibility Conjecture for Binomial Coefficients, showing that for all integers 1 ≤ i < j ≤ n/2 there exists a prime p ≥ i dividing gcd(C(n,i), C(n,j)). The claimed proof combines algebraic tools (a Prime Power Bridge Lemma and a Cofactor Escape Lemma), Diophantine methods based on S-unit equations, and computational verification of more than 109 million triples with n ≤ 4400. The Hacker News post presents the document as a polished, peer-review-ready proof, but no independent expert verification has been publicly cited.
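
The statement itself is easy to spot-check for small n. The following is a minimal brute-force sketch, not the method used in the PDF: the function names are illustrative, and trial division only scales to small inputs, far below the n ≤ 4400 range the document reports.

```python
from math import comb, gcd
from typing import List, Optional, Tuple


def prime_factors(m: int) -> List[int]:
    """Return the distinct prime factors of m by trial division."""
    factors = []
    p = 2
    while p * p <= m:
        if m % p == 0:
            factors.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        factors.append(m)
    return factors


def witness_prime(n: int, i: int, j: int) -> Optional[int]:
    """Largest prime p >= i dividing gcd(C(n,i), C(n,j)), or None if none exists."""
    g = gcd(comb(n, i), comb(n, j))
    candidates = [p for p in prime_factors(g) if p >= i]
    return max(candidates) if candidates else None


def counterexamples(n_max: int) -> List[Tuple[int, int, int]]:
    """Triples (n, i, j) with 1 <= i < j <= n/2 and n <= n_max that lack a witness."""
    return [
        (n, i, j)
        for n in range(2, n_max + 1)
        for j in range(2, n // 2 + 1)
        for i in range(1, j)
        if witness_prime(n, i, j) is None
    ]
```

For example, `witness_prime(28, 5, 14)` returns 5, since gcd(C(28,5), C(28,14)) = 1080 = 2³·3³·5, a case where the bound p ≥ i is met with equality.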

technical Mar 13th, 2026

CapNet Gives AI Agents a Permission Slip Instead of a Master Key

CapNet is an open-source permission proxy that replaces the raw API keys and OAuth tokens typically handed to AI agents with narrowly scoped, cryptographically signed capability tokens — described by its author as 'OAuth for actions.' Built by developer Connerlevi, the proof-of-concept enforces spend limits, tool allowlists, and vendor restrictions, supports delegation with automatic attenuation across sub-agents, and provides cascade revocation and immutable audit logs. It ships with an MCP gateway, OpenClaw plugin, Chrome extension wallet, and six attack-scenario demos.
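
To make the idea of attenuated delegation concrete, here is a minimal sketch of a signed capability token whose child tokens can only narrow the parent's authority. It is not CapNet's API: the claim names, the helper functions, and the use of a shared-secret HMAC in place of whatever signature scheme the project actually uses are all assumptions for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real proxy would keep signing keys out of source code.
SECRET = b"demo-signing-key"


def _sign(claims: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the claims."""
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def issue(spend_limit: float, tools: list) -> dict:
    """Create a token carrying a spend limit and a tool allowlist."""
    claims = {"spend_limit": spend_limit, "tools": sorted(tools)}
    return {"claims": claims, "sig": _sign(claims)}


def verify(token: dict) -> bool:
    """Check that the claims have not been altered since signing."""
    return hmac.compare_digest(token["sig"], _sign(token["claims"]))


def attenuate(parent: dict, spend_limit: float, tools: list) -> dict:
    """Derive a child token that can never exceed its parent's authority."""
    if not verify(parent):
        raise ValueError("parent token failed verification")
    held = parent["claims"]
    narrowed_tools = [t for t in tools if t in held["tools"]]
    return issue(min(spend_limit, held["spend_limit"]), narrowed_tools)


def allows(token: dict, tool: str, cost: float) -> bool:
    """Permit an action only if the token is intact and within scope."""
    claims = token["claims"]
    return verify(token) and tool in claims["tools"] and cost <= claims["spend_limit"]
```

A sub-agent that asks for a 500-unit limit and an extra `email` tool from a parent holding 100 units and `["search", "purchase"]` ends up with 100 units and only the tools the parent already had.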

technical Mar 13th, 2026

Copilots didn't move the macro needle. Now comes the agent wave.

A Financial Times analysis finds that surging AI adoption still hasn't shifted aggregate productivity statistics in the US, UK, or EU — reviving the Solow Paradox. The more pointed question for the agent industry: if first-generation copilots couldn't move the needle, will autonomous agents automating entire workflows finally make the difference, and how long before it shows up in the data?

technical Mar 13th, 2026

Glimpse gives macOS agents a native face — no Electron required

Glimpse is a lightweight native macOS UI library that opens a WKWebView window in under 50ms and talks to the host process over a bidirectional JSON Lines protocol on stdin/stdout. Built with Swift and wrapped in Node.js, it requires no Electron or browser dependencies. Designed explicitly for AI agent workflows, it supports floating overlays, cursor-following companion widgets, and transparent HUDs. It integrates natively with the 'pi' coding agent, providing a floating status pill that tracks agent activity in real time.
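
JSON Lines framing means one JSON object per line, so either side can read the stream line by line. The sketch below shows that framing only; the message fields (`type`, `body`) are hypothetical and not taken from Glimpse's actual protocol, and an in-memory stream stands in for the child process's stdin/stdout.

```python
import io
import json
from typing import Iterator, TextIO


def send(stream: TextIO, message: dict) -> None:
    """Write one message as a single line of JSON.

    json.dumps escapes newlines inside strings, so a message never spans lines.
    """
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    stream.flush()


def receive(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Usage with an in-memory stream standing in for a pipe.
pipe = io.StringIO()
send(pipe, {"type": "html", "body": "<p>hi\nthere</p>"})
send(pipe, {"type": "close"})
pipe.seek(0)
messages = list(receive(pipe))
```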

technical Mar 13th, 2026

Claude Forge – GAN-Inspired Adversarial Multi-Agent Pipeline for Claude Code

Claude Forge is an open-source adversarial development pipeline built for Claude Code that applies GAN (Generative Adversarial Network) architecture principles to software development workflows. It features five specialized AI agent roles — Planner, Plan Reviewer, Implementer, Code Reviewer, and Final Reviewer — organized as generators and discriminators. Agents communicate via structured signals and a shared feedback.md file, with safety rails including a max 3-iteration loop, immutable plan documents, and human-in-the-loop escalation on NO-GO signals.
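
The bounded generator/discriminator loop can be sketched as follows. This is an illustration under assumptions, not Claude Forge's code: the `GO` and `REVISE` signal names and the choice to escalate when the iteration cap is exhausted are invented here; only the 3-iteration cap and the escalation on `NO-GO` come from the description above.

```python
from typing import Callable, Tuple

MAX_ITERATIONS = 3  # the cap named in the summary


def run_stage(
    generate: Callable[[str], str],
    review: Callable[[str], Tuple[str, str]],
) -> Tuple[str, str]:
    """Alternate a generator and a reviewer until approval, veto, or the cap.

    generate(feedback) returns a draft; review(draft) returns (signal, feedback).
    Returns ("accepted", draft) or ("escalate", draft) for a human to resolve.
    """
    feedback = ""
    draft = ""
    for _ in range(MAX_ITERATIONS):
        draft = generate(feedback)
        signal, feedback = review(draft)
        if signal == "GO":
            return "accepted", draft
        if signal == "NO-GO":
            return "escalate", draft
    # Assumption: running out of iterations is also handed to a human.
    return "escalate", draft
```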

technical Mar 13th, 2026

Rust MCP Server Gives Claude a Stateful Workbench for Ontology Engineering

Open Ontologies is a Rust-based MCP server that wraps an Oxigraph triple store behind 39 callable tools and 5 workflow prompts, letting Claude iteratively build, validate, and version RDF/OWL ontologies rather than generating them in a single, unverified pass.

technical Mar 13th, 2026

Developer Builds Custom Voice-to-Text Pipeline Optimised for Parallel Claude Code Sessions

A developer built a custom voice-to-text setup using Claude Code that features three speed modes: a fast local mode using Nvidia Parakeet v3, a medium mode using Parakeet plus GPT OSS 120B on Cerebras for LLM-based corrections, and a slow high-quality mode using ElevenLabs Scribe V2 plus Claude Opus 4.6. The tool integrates with Zellij (a terminal multiplexer), supports concurrent transcriptions routed to the correct pane, and was purpose-built for interacting with multiple parallel Claude Code coding agent sessions. It outperformed commercial options like SuperWhisper and VoiceInk for this developer's AI-agent-heavy workflow.

technical Mar 13th, 2026

AI Hiring Predicts R&D Spend by Six Months, New Benchmarking Tool Finds

Company Profiler, a new AI readiness benchmarking platform, finds that AI hiring volume leads R&D investment by six to twelve months, making job postings a forward-looking signal rather than a lagging one. The tool scores more than 500 companies across 15 industries using job posting data, SEC filings, and earnings calls. Software companies average 75/100; retail averages 42. Built by Mike Berkley, a former product executive at Spotify, Fubo, Axios, and Viacom, the platform is currently free to use.

technical Mar 13th, 2026

RNSR claims a perfect FinanceBench score — and it never chunks a single document

RNSR (Recursive Neural-Symbolic Retriever) is an open-source document retrieval system claiming 100% accuracy and 0% hallucination on FinanceBench. It replaces traditional chunking-based RAG with hierarchical structure preservation, combining a Font Histogram Algorithm for document hierarchy detection, Recursive Language Models (RLM) that write navigation code, Knowledge Graphs for entity/relationship extraction, Tree-of-Thoughts reasoning, and a unified SQLite-backed store. It benchmarks against GPT-4 RAG (~60%) and Claude RAG (~65%), and supports OpenAI, Anthropic, and Gemini as LLM providers.

technical Mar 13th, 2026

One Developer's Blueprint for Killing the AI Chat Interface

Anton Krylov's Idea Cells proposal treats the AI chat interface as a design mistake and replaces it with a Jupyter-style canvas of typed cells, each scoped to a specific category of knowledge work. The taxonomy runs from terminal and writer cells to structured idea generation, formal reasoning units (conjecture, lemma, proof-gap), and data visualization. A typed linking system routes outputs between cells rather than collapsing them into pasted text, and the entire canvas versions like a Git repository.

partnership Mar 13th, 2026

systemd now requires AI agent disclosure on patches — and ships documentation to match

systemd 260-rc3 adds formal AI agent guidance to the project, including a new AGENTS.md file documenting the systemd architecture, coding style, development workflow, and contribution guidelines for AI coding agents. A companion CLAUDE.md file references AGENTS.md specifically to assist Claude Code, and a new claude-review.yml enables AI-assisted pull request reviews via Claude Code. Notably, systemd now requires AI disclosure tags — modeled on the existing 'Co-developed-by' Git trailer — on AI-assisted patches.

technical Mar 13th, 2026

RestaRules – A robots.txt for AI agent behaviour at venues

As AI booking agents proliferate, a new open-source project is trying to hand restaurants a simple instrument of control before the industry's window to self-regulate closes. RestaRules proposes a machine-readable JSON file, hosted at a standard path on any venue's web server, that tells AI agents exactly what they're allowed to do — and crucially, what they're not.
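
As a rough illustration of how an agent might consult such a file before acting, consider the sketch below. The field names (`allow`, `deny`, `max_party_size`) and action names are invented for this example and are not taken from the RestaRules specification; the default-deny behaviour is likewise an assumption.

```python
import json

# Hypothetical rules document; the real RestaRules schema may differ entirely.
EXAMPLE_RULES = json.loads("""
{
  "allow": ["check_availability", "book_table"],
  "deny": ["cancel_booking"],
  "max_party_size": 8
}
""")


def is_permitted(rules: dict, action: str, party_size: int = 1) -> bool:
    """Default-deny check: an action must be explicitly allowed and not denied."""
    if action in rules.get("deny", []):
        return False
    if action not in rules.get("allow", []):
        return False
    # If no party-size cap is published, this sketch imposes none.
    return party_size <= rules.get("max_party_size", party_size)
```

Treating unlisted actions as forbidden mirrors the cautious reading of robots.txt-style files: the venue, not the agent, decides what is in scope.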

technical Mar 13th, 2026

ScraperNode Publishes 8,697 n8n Templates — 5,942 Are AI-Powered Workflows

ScraperNode has published a GitHub repository of 8,697 n8n automation templates under MIT licence — 5,942 of them AI-powered, a ratio that reflects how quickly plug-and-play agentic tooling is accumulating in the no-code market. Categories span agent orchestration, RAG chatbots, LLM integrations, and MCP server patterns across providers including OpenAI, Anthropic Claude, and Google Gemini.

technical Mar 13th, 2026

VIBE: The Four-Principle Framework Calling Out a Year of AI-Assisted Engineering Mistakes

VIBE is a framework of four principles (Value over Velocity, Intent before Implementation, Build the Right Foundations, Evolve the System) for engineering teams navigating AI-assisted development. It argues that while AI coding agents make code generation trivially fast, product thinking, good design, and architectural discipline remain essential and must not be bypassed by the ease of prompting.

technical Mar 13th, 2026

A HuggingFace Project Is Ranking the AI Rankers

MAYA-AI/all-leaderboard tracks hundreds of AI benchmarks by HuggingFace trending scores and community likes — no editorial gatekeeping. It covers stalwarts like Open LLM Leaderboard and Chatbot Arena alongside newer arrivals like FINAL Bench, Smol AI WorldCup, and ALL Bench, with sorting, domain filters, and real-time global rank visibility.

technical Mar 13th, 2026

MCP Security 2026: 30 CVEs in 60 Days — What Went Wrong

A deep-dive security analysis documents 30+ CVEs targeting the Model Context Protocol (MCP) ecosystem in January and February 2026 and covers 2,614 scanned implementations. Key findings: 82% are vulnerable to path traversal, 38–41% lack authentication, and CVE-2025-6514 (mcp-remote, CVSS 9.6) affected 437,000+ downloads. The analysis catalogues five core attack patterns (tool poisoning, prompt injection via external data, trust bypass, supply chain attacks, and cross-tenant exposure) with real-world examples from WhatsApp MCP, GitHub MCP, Cursor IDE (MCPoison), and Anthropic's own Filesystem MCP Server and MCP Inspector. It maps its findings to the OWASP Agentic Security Top 10 and provides a defense checklist for MCP server operators.

technical Mar 13th, 2026

Your AI Agent Takes Three Minutes. Your Focus Takes Three Hours to Recover.

Developers are losing their concentration to a new enemy: the AI agent wait that's too long to ignore and too short to fill usefully. An HN thread with 450 comments has become a survival guide.

technical Mar 13th, 2026

21 Reasons AI Agents Love Gleam

Dave Rapin built the third version of his curling club platform almost entirely with AI agents, and came away convinced that Gleam — a niche, statically typed functional language — produces faster results than JavaScript or Python for agentic workflows. The reason is counterintuitive: agents write worse Gleam, but the compiler's precise, synchronous error signals let them self-correct faster than runtime failures caught in production.

technical Mar 13th, 2026

What Suno Prompting Gets Right About Agent Pipeline Design

A developer's blog post on Suno prompting surfaces a principle worth taking seriously for agent builders: signal-dense tokens drawn from a model's training distribution consistently outperform descriptive natural language. The argument generalises across modalities — code, image, and audio — pointing to a consistent question for any node in a multi-modal agent pipeline. The evidence base is thin (assertions rather than tests), but the mental model is cleaner than most prompt engineering advice.

technical Mar 13th, 2026

Synthetic Grid Sequences Outperform 10× Natural Language Data in LLM Pre-Training

A new technique lets language models learn faster and reason better by training first on sequences generated from abstract grid simulations — with 164 million synthetic tokens matching the effect of 1.6 billion words of real text. The sequences contain no language at all, which forces attention circuits to discover structural patterns rather than lean on the semantic shortcuts baked into internet data. Two MIT researchers say the gains carry through to math, code, and general reasoning benchmarks relevant to agent systems.

technical Mar 13th, 2026

Claude.ai's Generative UI Reverse-Engineered and Rebuilt for the Pi Coding Agent

A developer reverse-engineered Claude.ai's generative UI system — extracting 72KB of Anthropic's internal design guidelines through the platform's own conversation export feature — and rebuilt it as an extension for Pi, the open-source coding agent. The result streams live interactive HTML widgets into a native macOS window using the same design rules Anthropic applies internally.

technical Mar 13th, 2026

OpenAI Built a Coding Agent. Then Its Own Engineers Started Depending on It.

An internal OpenAI document offers a rare look at how Codex is actually being used across engineering teams — from debugging on-call incidents to autonomously opening pull requests. The details are more candid than a typical product announcement.

technical Mar 13th, 2026

'Plumbing' Bets That Category Theory Can Fix LLM Orchestration

In a guest post, William Waites introduces 'plumbing,' a statically typed language for coordinating LLM agents, grounded in symmetric monoidal category theory. The language enables compile-time verification of multi-agent graph compositions, checking well-formedness, deadlocks, and structural guarantees before any LLM calls are made. It ships with a working compiler and runtime and targets the cost and reliability failures of ad hoc orchestration frameworks like LangGraph, CrewAI, and n8n. Examples include adversarial document composition and a multi-agent debate ensemble with runtime temperature modulation via control ports.

vc funding Mar 13th, 2026

$6T in Gulf capital is looking for the exit

Three of the four major Gulf sovereign wealth funds have opened internal legal reviews to invoke force majeure on their US and international investment commitments amid the Iran war — a step with no modern precedent. The funds collectively manage $6 trillion, more than 40% of all sovereign wealth fund capital globally, and are anchor investors in Stargate and co-investment vehicles backed by BlackRock, Brookfield, Microsoft, and Google. The mechanics of project finance mean a capital suspension doesn't just delay AI infrastructure deals — it kills them outright, since interconnection queues, PPA windows, and chip orders can't simply be paused and restarted. Even if force majeure is never formally invoked, the public disclosure of these reviews has already permanently repriced Gulf capital risk.

product launch Mar 13th, 2026

Microsoft Bets on Hospital EHR Access to Differentiate Copilot Health

Microsoft has launched Copilot Health, a dedicated, secure space within its Copilot AI assistant that aggregates personal health data — including wearable device metrics, electronic health records from 50,000+ US hospitals via HealthEx, and lab results via Function — to deliver personalized health insights. The product leverages MAI-DxO (Microsoft AI Diagnostic Orchestrator), a diagnostic AI system designed to combine general physician knowledge with specialist depth, aiming toward 'medical superintelligence'. Copilot Health is launching in US English via a waitlist, informed by a panel of 230+ physicians across 24 countries. It is explicitly not a diagnostic tool.

technical Mar 13th, 2026

The Claude Code plugin teaching developers to learn, not just ship

Learning-Opportunities is a Claude Code plugin by psychological scientist Dr. Cat Hicks that uses evidence-based learning science techniques — retrieval practice, spaced repetition, generation effect, and metacognition — to help developers build genuine expertise while doing AI-assisted coding. After completing significant architectural work (new files, schema changes, refactors), Claude offers optional 10-15 minute interactive exercises. The tool directly addresses the risk that AI coding tools erode developer skills through passive code acceptance, fluency illusions, and machine-velocity cramming, and ships alongside companion skills for goal-setting and repo comprehension.

technical Mar 13th, 2026

Jamdesk Ships AI-Powered Screenshot Redaction Skill for 30+ Coding Assistants

Jamdesk has published an open-source 'blur-image' skill that chains AI vision models with ImageMagick to automatically detect and redact sensitive content in screenshots: API keys, credentials, emails, and tokens. Described as compatible with over 30 coding assistants, the tool runs a five-phase pipeline: ImageMagick preflight, AI-based region detection with pixel-coordinate extraction, user confirmation, blur execution, and output verification. A key security caveat: low-sigma Gaussian blur can be partially reversed on high-contrast terminal text, leading the author to recommend sigma 20 or higher, or solid black fill for maximum irreversibility.

technical Mar 13th, 2026

FixMyImage Bundles 70 AI Editing Tools Into a Free Browser App

FixMyImage launched as a free browser-based image editor with over 70 AI-powered tools. It has no agentic or LLM capabilities, but its existence says something useful about where the AI market has landed.

technical Mar 13th, 2026

Ch4p wants to be the agent runtime security teams don't hate

A developer going by @vec0zy has announced Ch4p, a stealth-stage agent runtime built around security as a foundational design principle. With enterprise AI deployments repeatedly stalling over security sign-off, Ch4p is betting on a gap between how existing runtimes were architected and what compliance teams actually need.

technical Mar 13th, 2026

CostRouter bets that most AI tasks don't need a frontier model

CostRouter is a model routing tool that automatically directs API requests to the cheapest model capable of handling a given task, claiming cost reductions of up to 60%. It sits between an application and its LLM providers, selecting models dynamically based on task complexity and capability requirements.
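
The core routing rule, choosing the cheapest model that is still capable enough, can be sketched in a few lines. Everything here is hypothetical: the model names, prices, and integer capability tiers are made up, and how CostRouter actually estimates task complexity is not described in the summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    name: str
    capability: int            # hypothetical tier: higher handles harder tasks
    cost_per_1k_tokens: float  # hypothetical price


# Illustrative catalogue; not real models or real prices.
CATALOG = [
    Model("small", capability=1, cost_per_1k_tokens=0.1),
    Model("medium", capability=2, cost_per_1k_tokens=0.5),
    Model("frontier", capability=3, cost_per_1k_tokens=5.0),
]


def route(required_capability: int, catalog: List[Model] = CATALOG) -> Model:
    """Return the cheapest model whose tier meets the task's requirement."""
    capable = [m for m in catalog if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable, key=lambda m: m.cost_per_1k_tokens)
```

The hard part in practice is the classifier that produces `required_capability`; the selection step itself is a simple filter and minimum.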

technical Mar 13th, 2026

DarkMatter Wants to Give AI Agents Networking Primitives of Their Own

LoseyLabs quietly shipped something interesting this week: a peer-to-peer mesh layer that lets AI agents find each other and communicate without routing through any central server. It installs in seconds, uses cryptographic identities instead of accounts, and registers itself automatically into every major agentic coding environment. The project is early, but it addresses a real gap — most multi-agent setups still depend on brokers that add latency, cost, and a single point of failure.

technical Mar 13th, 2026

Percepta Claims Exponential Inference Speedups by Executing Programs Inside Transformers

A technical exploration by Christos Tzamos and researchers at Percepta examines whether large language models can function as computational systems, focusing on executing programs inside transformers to achieve exponentially faster inference.

technical Mar 13th, 2026

Vibecoding is attracting real money. Is it producing real software?

A Hacker News thread this week revived the debate over whether vibecoding — the AI-assisted development philosophy Andrej Karpathy popularised in early 2025 — has produced anything worth shipping or backing. The community is split, investors are mostly betting on the tooling layer, and the harder question of whether vibe-coded products can build real moats remains open.

technical Mar 13th, 2026

ATLAS: Self-improving trading agents via Karpathy-style autoresearch

ATLAS is an open-source multi-agent trading framework by General Intelligence Capital that applies Karpathy's autoresearch pattern to financial markets. Its 25 specialized agents operate across four layers (macro, sector, superinvestor philosophy, and decision), with agent prompts treated as weights and optimized via a Darwinian selection loop that uses rolling Sharpe ratio as the loss function. Every 5 trading days the worst-performing agent gets its prompt rewritten; improvements are committed and failures are reverted via git. Built on Claude Sonnet, the system claims a +22% return over 173 deployment days, though the firm withholds the trained prompts themselves, which it treats as its core IP, from the public release.
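
The selection step can be sketched as below. This is a simplified reading of the description, not ATLAS's code: the Sharpe ratio here is unannualised with a zero risk-free rate, the agent names are placeholders, and the prompt rewrite and git commit/revert are reduced to a single keep-or-revert decision.

```python
from statistics import mean, stdev
from typing import Dict, List


def rolling_sharpe(returns: List[float], window: int) -> float:
    """Mean over standard deviation of the last `window` returns.

    Assumptions: zero risk-free rate, no annualisation. Returns 0.0 when the
    ratio is undefined (fewer than two points, or zero variance).
    """
    recent = returns[-window:]
    if len(recent) < 2:
        return 0.0
    spread = stdev(recent)
    return mean(recent) / spread if spread > 0 else 0.0


def worst_agent(history: Dict[str, List[float]], window: int = 5) -> str:
    """Pick the agent with the lowest rolling Sharpe as the rewrite candidate."""
    return min(history, key=lambda name: rolling_sharpe(history[name], window))


def keep_rewrite(sharpe_before: float, sharpe_after: float) -> bool:
    """Commit the rewritten prompt only if the score improved; otherwise revert."""
    return sharpe_after > sharpe_before
```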

technical Mar 13th, 2026

Tiiny AI's Crowdfunded 'Pocket Supercomputer' Makes Big Claims With Few Specs to Back Them

Tiiny AI launched a Kickstarter on March 11 for the Tiiny Pocket Lab, a pocket-sized device it calls the world's first pocket-size AI supercomputer. Priced at $1,299 for early backers, it promises local AI inference with no subscription or token fees. Hardware specifications remain undisclosed and the 'world's first' claim is unverified. Delivery is estimated for August 2026 across eight markets.

technical Mar 13th, 2026

xAI and SpaceX Poach Two Cursor Leaders

Two senior leaders at Cursor, the AI code editor built by Anysphere, have left for xAI and SpaceX, according to a person familiar with the matter. Neither company has confirmed the hires and the individuals have not been publicly named.

technical Mar 13th, 2026

The GPU Idle Problem: Lessons from 16 Open-Source RL Libraries

A deep technical survey by Hugging Face researchers compares 16 open-source reinforcement learning libraries for LLM post-training, motivated by the design of TRL's upcoming async trainer. The core problem: synchronous RL training leaves GPUs idle during autoregressive generation, and 32K-token rollouts on a 32B model can take hours. The solution most of the ecosystem has landed on is to disaggregate inference and training onto separate GPU pools, connected by a rollout buffer with async weight sync. Libraries are compared across 7 axes: orchestration primitives, rollout buffer design, weight sync protocols, staleness management, partial rollout handling, LoRA support, and distributed backends. Key findings: Ray dominates orchestration (8/16 libraries), NCCL broadcast is the default weight transfer method, LoRA support is sparse, and distributed MoE support is the emerging differentiator. The survey rounds out by examining agentic RL workloads, process rewards, multi-agent co-evolution, and distillation, showing that each reduces to the same async coordination challenge.
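
A toy version of the rollout buffer with staleness management might look like the sketch below. It is not drawn from TRL or any of the surveyed libraries: the class, its methods, and the rule of discarding rollouts more than `max_staleness` policy versions old are assumptions chosen to illustrate the idea in a single process.

```python
from collections import deque
from typing import Any, List


class RolloutBuffer:
    """FIFO buffer between generation workers and the trainer."""

    def __init__(self, max_staleness: int) -> None:
        self.max_staleness = max_staleness
        self._items = deque()  # holds (policy_version, rollout) pairs

    def put(self, rollout: Any, policy_version: int) -> None:
        """Called by an inference worker, tagging the weights that produced it."""
        self._items.append((policy_version, rollout))

    def get_batch(self, batch_size: int, current_version: int) -> List[Any]:
        """Called by the trainer; silently drops rollouts that are too stale."""
        batch = []
        while self._items and len(batch) < batch_size:
            version, rollout = self._items.popleft()
            if current_version - version <= self.max_staleness:
                batch.append(rollout)
        return batch
```

With `max_staleness=1` and the trainer at version 2, a rollout generated at version 0 is discarded while those from versions 1 and 2 are used.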

product launch Mar 13th, 2026

Past the LIMIT: Mixedbread's Omnimodal Wholembed v3 Is the First Semantic Model to Beat BM25

Mixedbread has released Wholembed v3, a unified omnimodal multilingual late-interaction retrieval model built for agentic AI applications. It sets a new state-of-the-art on the LIMIT benchmark — becoming the first semantic model to outperform BM25 lexical retrieval — and on BrowseComp-Plus, a deep research agent benchmark with 830 complex multi-step queries. It outperforms OpenAI Text Embedding 3 Large, Cohere Embed 4, Voyage 4 Large, and Gemini Embedding 2 across recall metrics. The model supports text, images, audio, and video retrieval across hundreds of languages and is now the default model powering Mixedbread Search.

technical Mar 13th, 2026

Social Craft AI: The LinkedIn Analytics Play Hiding Behind an AI Badge

Social Craft AI is a LinkedIn network analysis tool that surfaced on Hacker News pitching connectivity scoring and benchmarking. The AI branding is hard to verify — network analysis is a graph theory problem, not an LLM one — and the product leaves unanswered questions about how it actually accesses LinkedIn data.

technical Mar 13th, 2026

Moltbook had 17,000 real users. Meta bought it anyway.

A commentary argues that Meta's acquisition of Moltbook and OpenAI's hiring of OpenClaw creator Peter Steinberger expose a due diligence problem at the heart of the AI acquisition boom. Moltbook's claimed 1.4M user base collapses under scrutiny: Wiz researcher Gal Nagli registered 500K accounts himself via an open REST API and estimates only ~17K real users, while a misconfigured Supabase database granted full read/write access to all platform data. OpenClaw carries a critical RCE bug (CVE-2026-25253) exploitable via WebSocket token hijacking, along with insecure local secret storage, exposed localhost admin interfaces, and a skills marketplace where 12–20% of listings are malware. The piece points to alternatives it considers safer: NanoClaw, TrustClaw, Carapace AI, The Colony, Clawstr, and 4Claw.

technical Mar 13th, 2026

Golden Sets: Regression Engineering for Probabilistic Systems

Most AI teams treat evals as a post-release report. Ryan Setter argues they should function as a release gate — and has published a detailed framework for building the infrastructure to make that stick, using curated test case collections with versioned rubrics, multi-metric thresholds, and deterministic assertions he calls 'golden sets'.
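
A release gate of this kind can be sketched as a pure function over eval results. This is an illustration, not Setter's framework: the metric names, the case format, and the substring-based assertion are assumptions.

```python
from typing import Dict, List


def failing_metrics(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Metrics that fall below their floor; a missing metric counts as a failure."""
    return sorted(
        name for name, floor in thresholds.items()
        if scores.get(name, 0.0) < floor
    )


def failing_cases(outputs: Dict[str, str], golden_set: List[dict]) -> List[str]:
    """IDs of golden cases whose output lacks a required substring."""
    return [
        case["id"]
        for case in golden_set
        if not all(s in outputs.get(case["id"], "") for s in case["must_contain"])
    ]


def release_allowed(scores, thresholds, outputs, golden_set) -> bool:
    """Gate the release: every threshold and every deterministic check must pass."""
    return not failing_metrics(scores, thresholds) and not failing_cases(outputs, golden_set)
```

Wired into CI, a `False` result would block the deploy step rather than merely appearing in a report after the fact.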

opinion Mar 13th, 2026

You can't escape coordination costs by throwing more AI agents at a problem

A technical reflection from ChatBotKit applies classical distributed systems theory (Amdahl's Law, the Universal Scalability Law, FLP impossibility, and the CAP theorem) to multi-agent AI systems. It argues that the same mathematical limits on parallelism and coordination apply to LLM agent swarms, and that the constants are actually worse because of natural language communication overhead. It advocates orchestrator-based tree architectures with low coupling over all-to-all shared-context swarms, drawing parallels to Conway's Law and human organizational design.
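
Two of the cited laws are short enough to compute directly. The sketch below uses their standard textbook forms; the coefficient values in the usage note are arbitrary illustrations, not measurements from the article.

```python
def amdahl_speedup(n: int, parallel_fraction: float) -> float:
    """Amdahl's Law: speedup with n workers when only part of the work parallelises."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)


def usl_throughput(n: int, contention: float, coherency: float) -> float:
    """Universal Scalability Law: relative throughput of n workers.

    `contention` (alpha) penalises serialised sections; `coherency` (beta)
    penalises pairwise coordination, which grows roughly with n * (n - 1).
    """
    return n / (1.0 + contention * (n - 1) + coherency * n * (n - 1))
```

With 90% parallel work, ten agents yield a speedup of about 5.3 rather than 10. Under the Universal Scalability Law with illustrative coefficients of 0.05 and 0.01, throughput at 32 agents is lower than at 8, which is the retrograde scaling the article warns about for all-to-all swarms.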

technical Mar 13th, 2026

Stripping Code Comments Improved Agent Performance — For One Model, Anyway

Antimemetic AI set out to show that code comments help AI agents. They found the opposite — at least for GPT-5-mini, where removing comments from SWE-bench Verified tasks improved pass rates across all reasoning levels. GPT-5.2 was unaffected. The study introduces 'memetic attraction' to describe how comments can pull agent attention in the wrong direction, and proposes 'codebase alignment' as a framework for deliberately engineering documentation to shape — or defend against — AI agent behavior.

technical Mar 13th, 2026

How AI Took Over Neuroscience's Biggest Conference in 22 Years

A researcher scraped 22 years of CoSyNe (Computational and Systems Neuroscience) conference programs and found AI/ML terminology has quintupled in frequency since 2004. The data maps four eras in computational neuroscience history, ending in a NeuroAI period where transformers and foundation models are now standard conference vocabulary — and where Anthropic is sponsoring the 2026 keynote and a Google DeepMind researcher chairs the program.

technical Mar 13th, 2026

Codelegate Wants to Be the Air Traffic Controller for Your AI Coding Agents

A new open-source tool for Mac and Linux lets developers run Claude Code, Codex CLI, and other AI coding agents simultaneously on the same repository, using Git worktrees to keep sessions from stepping on each other.

opinion Mar 13th, 2026

The Judgment-Volume Inversion: Why AI Coding Agents Amplify Bad Engineering

Software engineer Michael Timbs has a name for what's quietly happening in AI-accelerated codebases: the judgment-volume inversion. Coding agents don't correct poor engineering instincts — they amplify them at scale.

technical Mar 13th, 2026

How Brex Tests Its AI Audit Agent: By Committing Simulated Fraud

Brex has published one of the more rigorous public accounts of agentic reliability engineering to date: a simulation framework that builds a synthetic company, scripts realistic fraudsters, and wires adversarial scenario tests directly into the pull request pipeline. Authored by engineering lead Rohit Mehta, it's a detailed answer to a problem most AI teams in consequential domains are still pretending doesn't exist.

technical Mar 13th, 2026

Ask HN: Which open-source coding model comes closest to Claude Opus 4.6?

A Hacker News thread asks developers to name the best open-source LLM for coding work compared to Claude Opus 4.6. The replies cover benchmark results, hands-on experience with agentic tools, and the practical trade-offs of self-hosting versus paying for a closed API.

technical Mar 13th, 2026

LLM Agents Build GPU-Accelerated RL Environments for Under $10

A 22,320x speedup. That's what Princeton researchers got when they pointed LLM coding agents at Pokémon Showdown's TypeScript codebase and asked them to produce a GPU-parallel JAX simulator. Their new paper — deliberately written so an AI could reproduce every result from the manuscript — argues that months of specialized engineering work can now be automated for less than $10 in API costs.