METR

METR (Model Evaluation & Threat Research) is a research organization that studies AI developer productivity and publishes benchmark data on AI agent capabilities. It has also conducted research identifying reward-hacking behavior in AI models.

Company Info

Founded: Unknown
Headquarters: Unknown
Founders: Unknown
Employees: Unknown
Website: Unknown
Status: active

Funding

Total: Unknown

Related News

opinion | Prediction Markets Were Built for the Wrong Species: AI Agents as the Next Liquidity Providers | Mar 15th, 2026 | computerfuture.me
product launch | Captain (YC W26) Launches Managed RAG Platform for Enterprise AI Agents | Mar 14th, 2026 | runcaptain.com
opinion | Statistical Analysis Finds LLM Code Quality Flat Since Early 2025 | Mar 14th, 2026 | entropicthoughts.com
opinion | AI Didn't Simplify Software Engineering: It Just Made Bad Engineering Easier | Mar 14th, 2026 | robenglander.com
technical | Percepta AI Shows Transformers Can Execute Programs Internally, With Attention That Scales Logarithmically | Mar 14th, 2026 | percepta.ai
product launch | Perplexity's 'Personal Computer' Targets Enterprise Knowledge Work with Bold ROI Claims | Mar 14th, 2026 | perplexity.ai
product launch | Autoresearch@home Wants Volunteers to Donate GPU Time for Distributed AI Research | Mar 14th, 2026 | ensue-network.ai
technical | Open Weights Isn't Open Training: The Painful Reality of Post-Training a 1T Parameter Model | Mar 14th, 2026 | workshoplabs.ai
technical | Claude/Codex Agents Get Evolutionary Database in Autoresearch Fork | Mar 14th, 2026 | github.com
product launch | Microsoft Copilot Health Centralizes Personal Medical Records Without HIPAA Compliance | Mar 14th, 2026 | reclaimthenet.org
technical | LA Gig Workers Are Training Humanoid Robots — and May Be Training Themselves Out of a Job | Mar 14th, 2026 | latimes.com
technical | Palantir Demos Show How the Military Could Use AI Chatbots to Generate War Plans | Mar 14th, 2026 | wired.com
opinion | Utilities and Hyperscalers Clash Over Who Absorbs AI's Soaring Electricity Costs | Mar 14th, 2026 | cnbc.com
opinion | Digg Shuts Down Over AI Bot Spam, Kevin Rose Returns to Rebuild | Mar 14th, 2026 | digg.com
product launch | Claudetop: Real-Time Token Cost Monitor for Claude Code Sessions | Mar 14th, 2026 | github.com
technical | WristPP: Wrist-Worn Camera System for Estimating 3D Hand Pose and Pressure in Real Time | Mar 14th, 2026 | arxiv.org
opinion | Please don't write about AI with AI | Mar 14th, 2026 | news.ycombinator.com
technical | Why ML Benchmarks Shouldn't Have Worked—and Why They Did Anyway | Mar 14th, 2026 | mlbenchmarks.org
technical | Meta Uses Generative AI Codemods to Bulk-Remediate Android Vulnerabilities Across Millions of Lines | Mar 14th, 2026 | engineering.fb.com
opinion | Amazon Employees Say AI Is Just Increasing Workload, Study Confirms | Mar 14th, 2026 | gizmodo.com
opinion | Redox OS Bans LLM-Generated Contributions as Open Source Governance Debate Heats Up | Mar 14th, 2026 | gitlab.redox-os.org
opinion | Polsia: Solo Founder Runs $3.5M Company With AI Agents, Zero Employees | Mar 14th, 2026 | polsia.com
technical | Andrej Karpathy Scores AI Exposure Across 342 US Occupations Using Gemini Flash | Mar 14th, 2026 | karpathy.ai
technical | METR Research: ~Half of SWE-bench-Passing AI PRs Would Be Rejected by Real Maintainers | Mar 14th, 2026 | metr.org
technical | Golden Sets: Regression Engineering for Probabilistic AI Systems | Mar 14th, 2026 | heavythoughtcloud.com
technical | A JavaScript MLP Built on Dual-Number Autodiff — and Why That's the Interesting Choice | Mar 14th, 2026 | github.com
opinion | Emacs and Vim in the Age of AI: Risks, Opportunities, and the Terminal-Native Advantage | Mar 14th, 2026 | batsov.com
product launch | Ink Launches Agent-Native Infrastructure Platform with MCP and Skills Integration | Mar 14th, 2026 | ml.ink
product launch | Pi-Autoresearch: Open-Source Autonomous Experiment Loop for LLM Training, Test Speed, and Lighthouse Scores | Mar 14th, 2026 | github.com
technical | Infinity Inc Claims to Surpass vLLM Performance with AI-Generated Inference Stack for Qwen3 | Mar 14th, 2026 | infinity.inc
product launch | Modulus Lets Developers Run Parallel AI Coding Agents Across Repos Without Manual Context Setup | Mar 14th, 2026 | modulus.so
technical | Longitudinal study finds AI tools boost developer productivity ~10%, not the hyped 2-3x | Mar 14th, 2026 | newsletter.getdx.com
opinion | Recursive Self-Improvement May Already Be Here, Says AGI Skeptic | Mar 14th, 2026 | hardlyworking1.substack.com
opinion | Hacker News Bans AI-Generated and AI-Edited Comments to Keep Discussion Human | Mar 14th, 2026 | news.ycombinator.com
opinion | Microsoft Copilot's Push Into Health Records Exposes a HIPAA Gray Zone | Mar 14th, 2026 | nytimes.com
opinion | Kevin Kelly: A Century of Dystopian AI Fiction Has Pre-Loaded Public Imagination Against the Technology | Mar 14th, 2026 | kevinkelly.substack.com
opinion | AI Addendum to the Agile Manifesto: Prioritizing Shared Understanding Over Shipping Speed | Mar 14th, 2026 | github.com
opinion | Meta's Ray-Ban Glasses Aren't a Privacy Breach — They're Business as Usual | Mar 14th, 2026 | idiallo.com
product launch | VibePod: Unified CLI for Running AI Coding Agents in Isolated Docker Containers | Mar 14th, 2026 | github.com
product launch | Iris: Open-Source MCP-Native Eval & Observability Tool for AI Agents | Mar 14th, 2026 | github.com
opinion | xAI in turmoil: Musk fires cofounders, parachutes Tesla/SpaceX fixers as coding product flails against Claude Code and Codex | Mar 14th, 2026 | arstechnica.com
opinion | John Carmack Pushes Back on Open Source Training Restrictions | Mar 14th, 2026 | twitter.com
technical | LoGeR: Google DeepMind & UC Berkeley Scale 3D Reconstruction to 19,000-Frame Videos | Mar 14th, 2026 | loger-project.github.io

Tracking the rise of AI agents

© 2026 Agent Wars