Bruce Kim and Indraneel Patil moved in together and decided to skip buying a robot vacuum. Instead, they built one for about $300. The robot uses a single camera for navigation, streaming frames to a laptop where a simple CNN decides what to do next. Five discrete actions: forward, reverse, turn left, turn right, and stop. The whole project took roughly four months of evenings and weekends.
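
The writeup describes the policy only at this level, so here's a minimal sketch of what a five-action, single-frame CNN could look like in PyTorch. The layer sizes, input resolution, and names are illustrative assumptions, not Kim and Patil's actual network.

```python
import torch
import torch.nn as nn

ACTIONS = ["forward", "reverse", "turn_left", "turn_right", "stop"]

class NavPolicy(nn.Module):
    """Tiny single-frame classifier: camera image in, one of five actions out."""
    def __init__(self, num_actions: int = len(ACTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims regardless of input size
        )
        self.head = nn.Linear(64, num_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, H, W) RGB frames; returns logits over the five actions
        return self.head(self.features(x).flatten(1))

policy = NavPolicy().eval()
frame = torch.rand(1, 3, 120, 160)  # stand-in for one downscaled camera frame
with torch.no_grad():
    idx = policy(frame).argmax(dim=1).item()
print(ACTIONS[idx])
```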

The results are refreshingly honest. The robot can't perceive depth. In kitchens it oscillates, stuck alternating between forward and reverse. Sometimes it stops when there's clearly open space ahead; sometimes it drives straight into obstacles. Kim and Patil collected about 600 image-action pairs through teleoperation, then inflated those to roughly 30,000 samples with data augmentation. Validation loss never converged, and ImageNet pre-training didn't help either. As Patil writes in his project writeup, the dataset simply doesn't contain enough signal for the network to learn reliable navigation.
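
For a sense of how 600 frames become 30,000, here's a hedged augmentation sketch using torchvision. The specific transforms and the 50-variants-per-frame figure are assumptions that merely match the arithmetic (600 × 50 = 30,000); the writeup doesn't say which augmentations they used.

```python
from PIL import Image
from torchvision import transforms

# Label-preserving augmentations: jitter lighting, nudge geometry, crop.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2),
    transforms.RandomAffine(degrees=5, translate=(0.05, 0.05)),
    transforms.RandomResizedCrop((120, 160), scale=(0.85, 1.0)),
])

frame = Image.open("teleop_frame_0001.jpg")  # hypothetical captured frame
variants = [augment(frame) for _ in range(50)]  # 600 frames x 50 = 30,000
```

One detail worth flagging: horizontal flips are deliberately absent here, because mirroring a frame without also swapping the turn-left and turn-right labels would corrupt the action mapping.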

Hacker News commenters quickly identified the core limitation. A CNN operating on single frames can detect nearby obstacles, but it can't build a map or plan a cleaning path, and systematic floor coverage requires both. Commenter mgschwan pointed to ORBSLammer_LocalizationService, an ORB-SLAM3 wrapper that provides real-time visual SLAM with pose streaming and map management through a simple HTTP API. That's the gap between a cool demo and something that actually cleans your floors.
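
For flavor, here's what consuming a pose-over-HTTP service could look like from the robot's side. The base URL, the /pose route, and the JSON fields are hypothetical placeholders, not ORBSLammer_LocalizationService's documented API; check the project's docs for the real endpoints.

```python
import time
import requests

BASE_URL = "http://localhost:8000"  # assumed address of the local SLAM service

while True:
    # Hypothetical route and fields; the real service's docs are authoritative.
    resp = requests.get(f"{BASE_URL}/pose", timeout=1.0)
    if resp.ok:
        pose = resp.json()  # assumed shape: {"x": ..., "y": ..., "theta": ...}
        print(f"pose: x={pose['x']:.2f} y={pose['y']:.2f} theta={pose['theta']:.2f}")
    time.sleep(0.1)  # poll at ~10 Hz
```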

The compute constraint is real. On-board AI processing at this price point is brutal. A Raspberry Pi 5 with camera accessories eats 30 to 50 percent of a $300 budget, roughly $90 to $150, before you've bought motors or a chassis. The ESP32-CAM is cheap but can't run standard navigation networks without aggressive quantization that degrades accuracy (a sketch of that step follows below). Commercial robots solve this with custom silicon produced at scale. Hobbyists get stuck choosing between off-board processing and dumb bump sensors. Kim and Patil chose off-board processing. It's educational and fun. Just don't expect it to clean your house without supervision.
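
For the curious, the quantization step an ESP32-class board forces looks roughly like this, using TensorFlow Lite's post-training full-integer quantization, one common route to microcontroller deployment. The model file and input shape are placeholders, and nothing here reflects Kim and Patil's setup.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("nav_policy.h5")  # placeholder trained model

def representative_data():
    # Calibration samples let the converter choose int8 scale factors.
    # Real captured frames would go here instead of random noise.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # full-integer model for MCU runtimes
converter.inference_output_type = tf.int8

with open("nav_policy_int8.tflite", "wb") as f:
    f.write(converter.convert())  # ~4x smaller, with some accuracy loss
```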