The public uproar over Meta's Ray-Ban smart glasses funnelling footage to Facebook servers has been loud. Ibrahim Diallo, a software engineer, is less than shocked. In a piece published this week, he argues that the data collection behind modern AI has been sitting in plain sight for years — buried in the terms of service that billions of users click past without reading. Microsoft, Google, Apple, and Meta have all built harvesting infrastructure into products their customers pay for. AI training is a growing part of what that data feeds.
His best evidence is a quote from Yann LeCun — Meta's former Chief AI Scientist — casually describing how the company trained large convolutional neural networks on "literally billions of images" scraped from Instagram to predict hashtags, then reused those models elsewhere. LeCun said this roughly seven years before the Ray-Ban story broke. Large-scale collection of user-generated content is not a recent edge case or a policy slip; it has been the working method of frontier AI development since the deep learning boom began. The glasses aren't an aberration. They're a consequence of something that has been running at industrial scale for a decade.
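For readers unfamiliar with the pattern LeCun was describing, it is essentially weakly supervised pretraining followed by transfer: treat the hashtags users attach to their own photos as free labels, train an image classifier on them at scale, then keep the learned features and swap in a new head for whatever task the company actually cares about. The sketch below is a toy illustration of that idea in PyTorch, not Meta's code; the model choice, hashtag vocabulary size, and downstream task are all placeholders.

```python
# Toy sketch of the pattern described above, not Meta's actual pipeline:
# pretrain a CNN to predict the hashtags attached to images (weak,
# user-generated labels), then reuse the trained backbone elsewhere.
# Model, vocabulary size, and task sizes are hypothetical.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_HASHTAGS = 17_000        # hypothetical hashtag vocabulary
NUM_TARGET_CLASSES = 10      # hypothetical downstream task

# Stage 1: weakly supervised pretraining, image -> hashtag scores (multi-label).
backbone = models.resnet50(weights=None)  # torchvision >= 0.13 API
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_HASHTAGS)
pretrain_loss = nn.BCEWithLogitsLoss()    # each hashtag is an independent yes/no label
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9)

def pretrain_step(images: torch.Tensor, hashtag_targets: torch.Tensor) -> float:
    """One gradient step on a batch of user images and their hashtag labels."""
    optimizer.zero_grad()
    logits = backbone(images)                      # (batch, NUM_HASHTAGS)
    loss = pretrain_loss(logits, hashtag_targets)  # targets: 0/1 per hashtag
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage 2: "reuse the model elsewhere" -- swap the hashtag head for a
# task-specific head and fine-tune; the pretrained features carry over.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TARGET_CLASSES)
finetune_loss = nn.CrossEntropyLoss()
```

The point of the pattern, and of Diallo's argument, is that the labels cost the company nothing: users supplied both the images and the hashtags as a side effect of using the product.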
Diallo's broader point is that the outrage misreads the situation. The collection was always there. What changes with wearables is that it moves off the screen and into physical space — conversations, faces, rooms. His conclusion is blunt: any internet-connected device you do not physically control should be assumed to be collecting data. The Ray-Ban story is not a warning about what might happen. It's a demonstration of what already has.