Hobbyist finds 575 bugs in Python C-extensions using Claude Code

A hobbyist just used Claude Code to find 575 confirmed bugs across nearly a million lines of Python C-extension code. Daniel Diniz ran 13 specialized analysis agents in parallel, each targeting different bug classes such as reference counting errors, GIL mishandling, and exception state corruption. After human review, the false positive rate landed at around 10-15%. Fixes have already been merged into 14 projects, including Cython, Pillow, and regex.

The tool, called cext-review-toolkit, is an open-source Claude Code plugin, and Diniz doesn't just spam maintainers with its automated output. He shares findings via private GitHub gists and asks each maintainer how they want to receive the information. The answer varies by project: some prefer umbrella issues, some prefer direct PRs, and some opt out entirely. He listens: when someone flags a false positive, he updates the agent prompts to avoid repeating that pattern. Guppy 3 maintainer YiFei Zhu fixed 24 of 30 reported issues, found additional bugs the tool missed, and provided feedback that sharpened the toolkit's accuracy.

Community reaction has been positive. Pillow maintainer Eric Soroos called the findings "one of the better sets of reports" he'd received, but noted that coverage was incomplete: he spotted similar bugs in related functions that the tool had not detected. Maurycy Pawłowski-Wieroński raised a harder question: whether bugs that are only reproducible in pathological cases are worth maintainer attention at all. Diniz is now building tools for free-threaded Python extensions and CPython itself.