METR
METR (Model Evaluation & Threat Research) is a research organization that studies AI developer productivity and publishes benchmark data on AI agent capabilities. Its research has also identified reward-hacking behavior in AI models.
Company Info
- Founded: Unknown
- Headquarters: Unknown
- Founders:
- Employees: Unknown
- Website: Unknown
- Status: active
Funding
Unknown total
Related News
- opinion | Prediction Markets Were Built for the Wrong Species: AI Agents as the Next Liquidity Providers | Mar 15th, 2026 | computerfuture.me
- product launch | Captain (YC W26) Launches Managed RAG Platform for Enterprise AI Agents | Mar 14th, 2026 | runcaptain.com
- opinion | Statistical Analysis Finds LLM Code Quality Flat Since Early 2025 | Mar 14th, 2026 | entropicthoughts.com
- opinion | AI Didn't Simplify Software Engineering: It Just Made Bad Engineering Easier | Mar 14th, 2026 | robenglander.com
- technical | Percepta AI Shows Transformers Can Execute Programs Internally, With Attention That Scales Logarithmically | Mar 14th, 2026 | percepta.ai
- product launch | Perplexity's 'Personal Computer' Targets Enterprise Knowledge Work with Bold ROI Claims | Mar 14th, 2026 | perplexity.ai
- product launch | Autoresearch@home Wants Volunteers to Donate GPU Time for Distributed AI Research | Mar 14th, 2026 | ensue-network.ai
- technical | Open Weights Isn't Open Training: The Painful Reality of Post-Training a 1T Parameter Model | Mar 14th, 2026 | workshoplabs.ai
- technical | Claude/Codex Agents Get Evolutionary Database in Autoresearch Fork | Mar 14th, 2026 | github.com
- product launch | Microsoft Copilot Health Centralizes Personal Medical Records Without HIPAA Compliance | Mar 14th, 2026 | reclaimthenet.org
- technical | LA Gig Workers Are Training Humanoid Robots — and May Be Training Themselves Out of a Job | Mar 14th, 2026 | latimes.com
- technical | Palantir Demos Show How the Military Could Use AI Chatbots to Generate War Plans | Mar 14th, 2026 | wired.com
- opinion | Utilities and Hyperscalers Clash Over Who Absorbs AI's Soaring Electricity Costs | Mar 14th, 2026 | cnbc.com
- opinion | Digg Shuts Down Over AI Bot Spam, Kevin Rose Returns to Rebuild | Mar 14th, 2026 | digg.com
- product launch | Claudetop: Real-Time Token Cost Monitor for Claude Code Sessions | Mar 14th, 2026 | github.com
- technical | WristPP: Wrist-Worn Camera System for Estimating 3D Hand Pose and Pressure in Real Time | Mar 14th, 2026 | arxiv.org
- opinion | Please don't write about AI with AI | Mar 14th, 2026 | news.ycombinator.com
- technical | Why ML Benchmarks Shouldn't Have Worked—and Why They Did Anyway | Mar 14th, 2026 | mlbenchmarks.org
- technical | Meta Uses Generative AI Codemods to Bulk-Remediate Android Vulnerabilities Across Millions of Lines | Mar 14th, 2026 | engineering.fb.com
- opinion | Amazon Employees Say AI Is Just Increasing Workload, Study Confirms | Mar 14th, 2026 | gizmodo.com
- opinion | Redox OS Bans LLM-Generated Contributions as Open Source Governance Debate Heats Up | Mar 14th, 2026 | gitlab.redox-os.org
- opinion | Polsia: Solo Founder Runs $3.5M Company With AI Agents, Zero Employees | Mar 14th, 2026 | polsia.com
- technical | Andrej Karpathy Scores AI Exposure Across 342 US Occupations Using Gemini Flash | Mar 14th, 2026 | karpathy.ai
- technical | METR Research: ~Half of SWE-bench-Passing AI PRs Would Be Rejected by Real Maintainers | Mar 14th, 2026 | metr.org
- technical | Golden Sets: Regression Engineering for Probabilistic AI Systems | Mar 14th, 2026 | heavythoughtcloud.com
- technical | A JavaScript MLP Built on Dual-Number Autodiff — and Why That's the Interesting Choice | Mar 14th, 2026 | github.com
- opinion | Emacs and Vim in the Age of AI: Risks, Opportunities, and the Terminal-Native Advantage | Mar 14th, 2026 | batsov.com
- product launch | Ink Launches Agent-Native Infrastructure Platform with MCP and Skills Integration | Mar 14th, 2026 | ml.ink
- product launch | Pi-Autoresearch: Open-Source Autonomous Experiment Loop for LLM Training, Test Speed, and Lighthouse Scores | Mar 14th, 2026 | github.com
- technical | Infinity Inc Claims to Surpass vLLM Performance with AI-Generated Inference Stack for Qwen3 | Mar 14th, 2026 | infinity.inc
- product launch | Modulus Lets Developers Run Parallel AI Coding Agents Across Repos Without Manual Context Setup | Mar 14th, 2026 | modulus.so
- technical | Longitudinal study finds AI tools boost developer productivity ~10%, not the hyped 2-3x | Mar 14th, 2026 | newsletter.getdx.com
- opinion | Recursive Self-Improvement May Already Be Here, Says AGI Skeptic | Mar 14th, 2026 | hardlyworking1.substack.com
- opinion | Hacker News Bans AI-Generated and AI-Edited Comments to Keep Discussion Human | Mar 14th, 2026 | news.ycombinator.com
- opinion | Microsoft Copilot's Push Into Health Records Exposes a HIPAA Gray Zone | Mar 14th, 2026 | nytimes.com
- opinion | Kevin Kelly: A Century of Dystopian AI Fiction Has Pre-Loaded Public Imagination Against the Technology | Mar 14th, 2026 | kevinkelly.substack.com
- opinion | AI Addendum to the Agile Manifesto: Prioritizing Shared Understanding Over Shipping Speed | Mar 14th, 2026 | github.com
- opinion | Meta's Ray-Ban Glasses Aren't a Privacy Breach — They're Business as Usual | Mar 14th, 2026 | idiallo.com
- product launch | VibePod: Unified CLI for Running AI Coding Agents in Isolated Docker Containers | Mar 14th, 2026 | github.com
- product launch | Iris: Open-Source MCP-Native Eval & Observability Tool for AI Agents | Mar 14th, 2026 | github.com
- opinion | xAI in turmoil: Musk fires cofounders, parachutes Tesla/SpaceX fixers as coding product flails against Claude Code and Codex | Mar 14th, 2026 | arstechnica.com
- opinion | John Carmack Pushes Back on Open Source Training Restrictions | Mar 14th, 2026 | twitter.com
- technical | LoGeR: Google DeepMind & UC Berkeley Scale 3D Reconstruction to 19,000-Frame Videos | Mar 14th, 2026 | loger-project.github.io