Dmitri Lerko spent his ten-hour flight from London to Las Vegas running local LLMs on a MacBook Pro M5 Max. No Wi-Fi, no cloud APIs. Using Gemma 4 31B and Qwen 4.6 36B through LM Studio, he built a billing analytics tool for loveholidays that covers two years of cloud spend data, and processed roughly 4 million tokens on side tasks. For tight-scope refactoring and scaffolding, Lerko says the output matched what he gets from frontier cloud models.
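
The workflow is less exotic than it sounds: LM Studio serves loaded models through an OpenAI-compatible API on localhost (port 1234 by default), so ordinary client code works fully offline. A minimal sketch, assuming the `openai` Python package and a model already loaded in LM Studio; the model identifier and prompt are placeholders, not Lerko's actual setup:

```python
# Minimal sketch: query a local LM Studio server over its
# OpenAI-compatible API. No network access beyond localhost.
# Assumes LM Studio is running with a model loaded on the
# default port (1234). The model name is a placeholder.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's local endpoint
    api_key="lm-studio",                  # any non-empty string works locally
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; use the identifier LM Studio shows
    messages=[
        {"role": "system", "content": "You are a code-refactoring assistant."},
        {"role": "user", "content": "Summarize this month's cloud spend by service."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the same protocol as the cloud APIs, existing tooling can be pointed at it by swapping one base URL.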

The physical constraints are real. Sustained load drained battery at 1% per minute; power draw hit 70 to 80 W and the chassis got hot enough to be uncomfortable. Context quality degraded past 100k tokens, and several prompts triggered infinite loops that required manual intervention. Lerko also accidentally packed an iPhone cable that delivered only 60 W instead of the MacBook cable's 94 W, so the battery drained even while plugged into seat power. After catching the discrepancy, he built a CLI called powermonitor to track power telemetry.
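
Lerko's powermonitor internals aren't shown here, but on macOS the relevant telemetry is readable from the AppleSmartBattery entry in the IORegistry. A hypothetical sketch of the same idea, polling `ioreg` and deriving watts at the battery; this is an illustration, not Lerko's actual tool:

```python
# Hypothetical powermonitor-style loop for macOS; not Lerko's tool.
# Reads battery current and voltage from the AppleSmartBattery
# IORegistry entry via `ioreg` and reports net watts at the battery.
import re
import subprocess
import time

def read_int(key: str, text: str) -> int:
    value = int(re.search(rf'"{key}" = (-?\d+)', text).group(1))
    # ioreg can print negative values (e.g. discharge current) as
    # unsigned 64-bit integers; undo the two's-complement wrap.
    return value - 2**64 if value >= 2**63 else value

def battery_watts() -> tuple[float, bool]:
    out = subprocess.run(
        ["ioreg", "-rn", "AppleSmartBattery"],
        capture_output=True, text=True, check=True,
    ).stdout
    milliamps = read_int("Amperage", out)   # negative while discharging
    millivolts = read_int("Voltage", out)
    plugged_in = '"ExternalConnected" = Yes' in out
    return milliamps * millivolts / 1e6, plugged_in

while True:
    watts, plugged_in = battery_watts()
    state = "external power" if plugged_in else "battery"
    # Negative watts while on external power is the cable problem:
    # a 60 W adapter under a 70-80 W load still drains the battery.
    print(f"{watts:+7.1f} W at the battery ({state})")
    time.sleep(5)
```

The telling signal is the sign of the reading while plugged in: the underpowered cable shows up as a persistent negative draw despite external power.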

A second developer reported similar findings on a seven-hour flight: 95% accuracy with local models, but speeds four to five times slower than Claude Code. On Hacker News, commenters argued that economy tray-table space matters more than compute constraints. Others reported the same infinite-loop problems with Qwen and Gemma on comparable hardware.

Lerko ran proprietary loveholidays billing data through locally downloaded model weights on a personal laptop, with no data loss prevention (DLP) coverage and no audit trail. When employees can run capable models offline on sensitive data, governance controls built around cloud APIs become irrelevant. Feeling the physical cost of each inference does build discipline, as Lerko points out. But security teams should recognize that local inference creates a blind spot their current tools weren't designed to see.
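
One crude way to see the gap: common local-inference servers listen on well-known localhost ports by default (LM Studio on 1234, Ollama on 11434, llama.cpp's server on 8080). A rough sketch of a first-pass endpoint check using `lsof`; the port list is an assumption about typical installs, and since defaults are easily changed, this is a tripwire rather than a control:

```python
# Rough sketch: flag processes listening on ports commonly used by
# local inference servers. Ports are defaults and easily changed,
# so treat a hit as a reason to look closer, not proof of use.
import subprocess

DEFAULT_PORTS = {
    1234: "LM Studio",
    11434: "Ollama",
    8080: "llama.cpp server",
}

listening = subprocess.run(
    ["lsof", "-nP", "-iTCP", "-sTCP:LISTEN"],
    capture_output=True, text=True,
).stdout

for line in listening.splitlines()[1:]:  # skip the lsof header row
    for port, name in DEFAULT_PORTS.items():
        if f":{port} (LISTEN)" in line:
            command = line.split()[0]
            print(f"possible {name} endpoint: {command} on port {port}")
```

Even that much only catches servers, not direct in-process inference, which is the point: the monitoring surface that cloud-API governance relies on simply isn't there.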