A new interactive quiz called Slop or Not (slop-or-not.space) is putting human pattern recognition to the test, challenging users to distinguish AI-generated text from authentic human writing pulled from Reddit, Hacker News, and Yelp. The game presents two side-by-side responses and asks players to identify the AI-written one; three wrong answers end the run. The tool's dataset pairs 16,000 real posts scraped from those platforms against AI-generated responses produced by six models from two providers (Anthropic and OpenAI) at three capability tiers, with length-matching applied to remove one of the most obvious distributional tells.
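The article does not describe how the length-matching works, but a minimal sketch of one plausible approach is a per-post length tolerance when pairing human and AI text; the function and parameter names here are hypothetical:

```python
import random

def length_match(human_posts, ai_posts, tolerance=0.2):
    """Pair each human post with an AI response of similar length.

    Hypothetical sketch only: the quiz's actual matching algorithm
    is not documented. `tolerance` is the allowed relative length
    difference between the two sides of a pair.
    """
    pairs = []
    remaining = list(ai_posts)
    for human in human_posts:
        target = len(human)
        # Candidates whose length is within +/- tolerance of the human post.
        candidates = [a for a in remaining
                      if abs(len(a) - target) <= tolerance * target]
        if candidates:
            match = random.choice(candidates)
            remaining.remove(match)
            pairs.append((human, match))
    return pairs
```

Matching on raw character count like this removes length as a giveaway, which forces players to rely on style rather than the well-known tendency of LLMs to over-write.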
Discussion on Hacker News surfaced a surprisingly consistent set of heuristics among players who scored well. Telltale AI signals include the phrase "curious about," the contrastive "X, not Y" construction, heavy use of intensifiers like "genuinely" and "absolutely," and what commenters described as a "2017 millennial Instagram food review" register: phrases like "absolutely slaps" deployed with uncanny smoothness. On the human side, typos, "edit:" annotations, and irregular punctuation serve as strong authenticity signals. One commenter with a 19-game streak noted that AI prose tends toward relentless positivity and rhythmically consistent punctuation, a flatness that trained eyes can learn to spot. Counterintuitively, "medium" difficulty questions proved harder than "hard" ones, suggesting the hardest category contains more overt stylistic tells.
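As a toy illustration only, the community heuristics above could be bundled into a crude scorer. The phrase lists are assumptions drawn from the thread, not the quiz's logic or any real detector:

```python
import re

# Illustrative tells taken from the Hacker News thread; not exhaustive.
AI_TELLS = ["curious about", "genuinely", "absolutely"]
HUMAN_TELLS = ["edit:", "teh"]  # edit tags and typos read as human

def slop_score(text: str) -> int:
    """Positive scores lean AI, negative lean human."""
    t = text.lower()
    score = sum(t.count(p) for p in AI_TELLS)
    score -= sum(t.count(p) for p in HUMAN_TELLS)
    # The contrastive "X, not Y" construction is another reported tell.
    score += len(re.findall(r"\bnot just\b|, not ", t))
    return score
```

A scorer like this captures how the heuristics work, and also why they are brittle: every signal is a surface feature that a prompt can be coached to suppress.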
The tool's methodology has real strengths but also notable vulnerabilities. Because the creator generated the AI-side text directly, provenance on that half of each pair is confirmed by design, an approach aligned with the academic paper "Measuring AI Slop in Text" (arXiv:2509.19163, Shaib et al.), which uses the same controlled-generation strategy. The fragile assumption is on the human side: Reddit, HN, and Yelp posts from prior years are treated as ground-truth human writing, but there is no systematic mechanism to filter out posts that were themselves AI-generated or lightly edited from LLM output. Given that AI-generated content has proliferated on all three platforms since at least 2023, the human corpus likely carries unknown contamination. There is also a circularity problem: LLMs trained on large scrapes of these same communities may already sound platform-authentic not because they are mimicking a generic human style, but because they memorized the specific idioms of those communities.
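One partial mitigation for the contamination problem is a hard date cutoff on the human corpus, keeping only posts created before LLM text became widespread. The cutoff date below is an assumption (ChatGPT's public release), and `created_utc` follows Reddit's API convention:

```python
from datetime import datetime, timezone

# Assumed cutoff: ChatGPT's public release, Nov 30, 2022. Posts after
# this date carry nonzero risk of being LLM-generated or LLM-edited.
AI_ERA_START = datetime(2022, 11, 30, tzinfo=timezone.utc)

def filter_pre_llm(posts):
    """Keep only posts whose creation time predates the cutoff.

    `created_utc` is a Unix timestamp, as in Reddit's API.
    """
    return [
        p for p in posts
        if datetime.fromtimestamp(p["created_utc"], tz=timezone.utc) < AI_ERA_START
    ]
```

A cutoff like this shrinks the usable corpus and cannot catch older bot-written text, so it narrows the contamination window rather than eliminating it.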
Formal detection tools are faring no better. Research presented at the ACL 2025 GenAIDetect workshop found that leading commercial detectors degrade rapidly against models newer than their training distribution and remain vulnerable to basic evasion tactics. The Shaib et al. paper found inter-annotator agreement on binary "slop" labels to be strikingly poor, with kappa scores ranging from -0.15 to 0.29, confirming that even when generation source is known, human judges disagree substantially on what constitutes AI-ness. Commenters on <a href="/news/2026-03-14-hacker-news-bans-ai-comments">Hacker News</a> were already flagging an arms-race dynamic: as detection heuristics like "add typos or an edit: tag" become publicly known, adversarially coached AI outputs will begin to incorporate exactly those features, further eroding the intuitions the quiz is trying to build. The moment those heuristics go mainstream, they stop working.
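For context on those numbers, Cohen's kappa measures agreement beyond chance: 1.0 is perfect agreement, 0 is chance level, and negative values are worse than chance, so a range of -0.15 to 0.29 spans actively contradictory to weak agreement. A minimal computation for two annotators labeling the same binary items:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same binary items."""
    n = len(labels_a)
    # Fraction of items where the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's marginal rates.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)
```

The chance-correction is what makes the reported scores so damning: annotators can agree on many items in raw terms and still land near zero kappa if that agreement is no better than coin-flipping.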