An essay from ergosphere.blog poses an uncomfortable thought experiment. Imagine two PhD students, Alice and Bob, both producing publishable papers in their first year. Alice learned the hard way: reading papers with pencil in hand, debugging her own code, slowly building mental models. Bob used AI agents to summarize papers, debug his code, and write up his results. Same output. Same metrics. Fundamentally different scientists.

Blame the incentive structures, not the technology. Academia counts what can be counted and ignores what can't. David Hogg, an astrophysicist who wrote a white paper on this topic, argues that in fields like astrophysics, people should be ends rather than means. In such fields, the research output itself has little practical consequence. What matters is producing people who can think about hard problems. Bob's approach skips that step entirely.

Matthew Schwartz ran an experiment supervising Claude through a physics calculation and produced a publishable paper in two weeks. He concluded that Claude operates at the level of a second-year graduate student. But the real insight is buried deeper. The supervision itself turned out to be the bottleneck. Without an expert guiding it, the model produces confident nonsense. With expert oversight, it accelerates work for someone who already understands the domain. The tool only works for people who don't need it.

The Hacker News discussion pushed back in useful ways. One commenter pointed out that Claude Code is basically a patient colleague who never gets tired of explaining things. If institutions want different outcomes, they need different teaching methods, not technology bans. Another noted the uncomfortable economic truth: if the market stops valuing traditional research skills, Bob's approach isn't wrong. It's pragmatic. And that's exactly what makes the drift so comfortable, even though AI agents still fail in production on codebases that weren't built for them.