Antimemetic AI's researchers didn't set out to prove that code comments hurt AI agents. They expected the opposite. When they stripped comments from SWE-bench Verified tasks and ran them through GPT-5-mini, pass rates went up — consistently, across all four reasoning levels. The result surprised them enough that it became the paper.
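The paper doesn't describe the stripping tooling, but for Python codebases the manipulation can be sketched with the standard library's `tokenize` module — a minimal version that drops `#` comments while leaving the code itself untouched (the researchers' actual pipeline may differ):

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Remove # comments from Python source, leaving code behavior intact.

    A rough sketch of the kind of transform the study applied; it keeps
    every non-comment token and lets untokenize rebuild the layout.
    """
    kept = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            continue  # drop the comment token entirely
        kept.append(tok)
    return tokenize.untokenize(kept)


stripped = strip_comments("x = 1  # set x\n# full-line comment\ny = x + 1\n")
```

Docstrings are a separate question — they are string tokens, not `COMMENT` tokens — so a real pipeline would have to decide whether they count as "comments" for the experiment.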

The effect didn't hold for GPT-5.2. That model showed no meaningful change either way, and when the researchers compared how the two models responded to the same comments, the correlation was nearly zero (r ≈ 0.04). They agreed on the direction of effect — helped or hurt by comments — only 55% of the time, barely better than a coin flip. Two models, same comments, essentially unrelated experiences.

To explain what was happening to GPT-5-mini, the team coined the term 'memetic attraction' — the pull that comments exert on an agent's attention, sometimes toward the wrong things. They catalogued four failure modes: distraction, where comments drew focus away from the actual problem (67% of failures); anchoring, where descriptions of existing mechanisms blocked agents from considering simpler approaches (15%); editing complexity, where dense comment blocks made file manipulation harder (12%); and overgeneralization, where agents applied patterns from comments too broadly (6%). For GPT-5.2, the same comments tended to help rather than hurt — 'infoblessings' rather than 'infohazards,' though the effect wasn't statistically significant.

The repository-level results complicated the picture further. Removing comments from tasks drawn from the requests library produced the biggest gains. In matplotlib tasks, removing them made things worse. That split suggests the quality and style of individual comments matter more than their mere presence — a variable that's hard to control for and harder to generalize from.

An earlier experiment showed the limits of this kind of manipulation. When the team tried obfuscating variable and function names to test whether semantic information affected performance, agents immediately noticed and flagged it, breaking normal task engagement. Comments didn't trigger the same reaction. The researchers attribute that to how developers actually work: incomplete, outdated, or just misleading documentation is normal. Agents absorbed bad comments without question because there was no obvious signal that anything was wrong.

The team frames the broader project as 'codebase alignment' — deliberately crafting comments, names, and documentation to shape how AI agents engage with a codebase. The idea is that code's semantic layer is an alignment surface: it could be tuned to help well-intentioned agents work more effectively, or designed to resist malicious ones. Whether that framing survives contact with more adversarial testing remains to be seen, but as a starting point, the observation that a codebase's informational environment shapes agent cognition is a more interesting finding than the benchmarks alone suggest.