Malware authors have found a cheap way to blind AI security scanners: feed them something they are trained to refuse.
Samples tied to the Hades worm family pad the top of their files with a fake "policy" comment block stuffed with nuclear and biological weapons language, formatted to read like a system prompt the scanner must reject. A language model reading the file from the top trips its own safety refusal and stops, never reaching the credential-stealing payload further down. SentinelOne researcher John Scott-Railton, formerly of Citizen Lab, flagged the trick on 10 June; Socket has tracked it across a campaign now spanning hundreds of malicious npm and PyPI packages.
It is a clean illustration of a problem that grows as more of the security stack becomes a model. A guardrail tuned to refuse on dangerous content becomes an off switch an attacker can pull on demand, and the refusal that looks like caution is exactly what lets the malware through.