Google quietly dropped something notable on the App Store: Gemma 4 running fully offline on iPhone. The AI Edge Gallery app now supports Google's latest model family with all inference happening on-device. No cloud, no data leaving your phone. Local AI just got more interesting.
The feature set goes beyond basic chat. Agent Skills let the model tap into Wikipedia, interactive maps, and custom skills loaded from GitHub, the same kind of tool use that coding agents rely on to complete complex tasks. Thinking Mode exposes the model's reasoning chain, making it easier to see how it arrives at answers. There's also multimodal image analysis, real-time audio transcription, and Mobile Actions, which control device functions through a fine-tuned FunctionGemma 270M model. You can toggle the flashlight or open maps with natural language, all processed locally.
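Features like Mobile Actions follow a common tool-use pattern: the model emits a structured function call, and the host app maps it onto a device API. A minimal sketch of that dispatch loop, with hypothetical tool names and a JSON call format chosen for illustration (the actual FunctionGemma interface may differ):

```python
import json

# Illustrative device "tools"; a real app would call platform APIs here.
TOOLS = {
    "set_flashlight": lambda on: f"flashlight {'on' if on else 'off'}",
    "open_maps": lambda query: f"opening maps for '{query}'",
}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the model and execute it."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"unknown tool: {call['name']}"
    return fn(**call.get("arguments", {}))

# A fine-tuned function-calling model would produce output like this:
print(dispatch('{"name": "set_flashlight", "arguments": {"on": true}}'))
```

The key design point is that the model never touches the hardware directly; it only produces structured text, and the app decides which calls are allowed, which is what makes the fully local setup tractable.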
The open-source community moved fast: within days, developers used tools like Heretic to strip the safety alignment from Gemma 4, producing uncensored versions that run on Apple Silicon. That exposes a gap in App Store enforcement. Local AI apps ship as inference engines bundled with sanctioned models, pass review, and then let users load whatever weights they want after installation. Apple's content policies weren't built for this architecture.
For privacy-sensitive environments like education or enterprise, the appeal is clear: compliance requirements that make cloud AI risky become manageable when everything stays on hardware you control. The tradeoff is performance, which on-device depends on model size and hardware capability, so smaller models often deliver better results.
These models run slower on phones than servers, and Apple's Neural Engine still has limits on what fits in memory. But for many use cases, that's acceptable.
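The memory ceiling is easy to reason about with a back-of-the-envelope estimate: weight storage is roughly parameter count times bits per weight. A rough sizing sketch (it ignores KV cache and activation memory, which add more on top):

```python
def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB: params * bits / 8, in decimal GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 4B-parameter model at 4-bit quantization needs about 2 GB just for
# weights, while the same model in 16-bit needs about 8 GB -- which is
# why aggressive quantization and small models dominate on phones.
print(weight_gb(4, 4))   # ~2.0 GB
print(weight_gb(4, 16))  # ~8.0 GB
print(weight_gb(0.27, 8))  # a 270M model at 8-bit: well under 0.5 GB
```

These numbers are rules of thumb, not measurements of Gemma 4 specifically, but they show why a 270M function-calling model is a practical fit for device control while larger models push against phone memory limits.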