Google researchers have used the company's Gemini large language model to process approximately 5 million news articles from around the world, extracting 2.6 million individual flood event reports into a geo-tagged time series dataset called "Groundsource." The project, published in March 2026, marks the first time Google has used an LLM to transform qualitative journalistic text into a quantitative scientific dataset; Google Research product manager Gila Loike described it as a novel application of the technology. The Groundsource dataset has since been released as a free public download alongside the team's research findings.
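Google has not published the internals of its extraction pipeline in this report, but any such pipeline needs a validation step that turns the LLM's free-form output into typed, geo-tagged records while tolerating malformed extractions. The sketch below is hypothetical: the `FloodReport` schema, field names, and sample payload are all assumptions for illustration, not Google's actual format.

```python
import json
from dataclasses import dataclass

@dataclass
class FloodReport:
    event_date: str   # ISO date the article reports the flood occurring
    latitude: float
    longitude: float
    location: str
    source_url: str

def parse_llm_extraction(raw: str) -> list[FloodReport]:
    """Parse a model's JSON response into geo-tagged flood records,
    dropping entries with missing or malformed required fields."""
    reports = []
    for item in json.loads(raw):
        try:
            reports.append(FloodReport(
                event_date=item["event_date"],
                latitude=float(item["latitude"]),
                longitude=float(item["longitude"]),
                location=item["location"],
                source_url=item.get("source_url", ""),
            ))
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed extractions rather than fail the batch
    return reports

# Hypothetical model output for a single article.
sample = '''[{"event_date": "2024-07-15", "latitude": 29.76,
  "longitude": -95.37, "location": "Houston, TX",
  "source_url": "https://example.com/article"}]'''
print(parse_llm_extraction(sample)[0].location)  # Houston, TX
```

At millions of articles, silently skipping bad records (rather than raising) is the pragmatic choice; aggregate counts can flag prompts that produce unusually high rejection rates.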

Groundsource served as the real-world training baseline for a flash flood forecasting model built on a Long Short-Term Memory (LSTM) neural network architecture. The model ingests global weather forecast data and outputs probabilistic flash flood risk assessments at roughly 20-square-kilometer resolution, and is now live on Google's Flood Hub platform covering urban areas across 150 countries. Emergency response agencies worldwide are receiving its data directly; António José Beleza of the Southern African Development Community, which trialed the system, told TechCrunch it meaningfully accelerated his organization's ability to respond to flood events.
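The report does not detail the model's architecture beyond "LSTM ingesting weather forecast data and emitting a probability," but the recurrence itself is standard. The minimal sketch below shows that shape for a single grid cell: randomly initialized weights stand in for trained parameters, and the feature set and horizon length are assumptions, not Google's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4 forecast features (e.g. precipitation,
# soil moisture proxy, temperature, humidity) over a 48-step horizon.
N_FEATURES, HIDDEN, STEPS = 4, 8, 48

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Untrained stand-in weights; a real model would learn these against
# labels such as the Groundsource event records.
W = rng.normal(0, 0.1, (4 * HIDDEN, N_FEATURES + HIDDEN))  # all four gates
b = np.zeros(4 * HIDDEN)
w_out = rng.normal(0, 0.1, HIDDEN)  # readout to a single risk logit

def flood_risk(series: np.ndarray) -> float:
    """Run one grid cell's forecast series through an LSTM and
    return a flash flood probability in [0, 1]."""
    h = np.zeros(HIDDEN)  # hidden state
    c = np.zeros(HIDDEN)  # cell state
    for x_t in series:
        z = W @ np.concatenate([x_t, h]) + b
        i, f, o, g = np.split(z, 4)  # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return float(sigmoid(w_out @ h))  # sigmoid makes it probabilistic

series = rng.normal(size=(STEPS, N_FEATURES))
print(f"flash flood probability: {flood_risk(series):.3f}")
```

The sigmoid readout is what makes the output probabilistic rather than a hard yes/no, which is what lets Flood Hub present graded risk per roughly 20-square-kilometer cell.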

The project addresses a data infrastructure problem. Flash floods are too short-lived and geographically localized to be captured by conventional meteorological monitoring networks, leaving them poorly served by existing deep learning weather models. Google's approach is explicitly designed for regions lacking expensive radar infrastructure or extensive historical meteorological records. "Because we're aggregating millions of reports, the Groundsource dataset actually helps rebalance the map," said Juliet Rothenberg, program manager on Google's Resilience team. Rothenberg said the team is exploring whether the same methodology could be applied to other difficult-to-measure phenomena such as heat waves and mudslides.

Google enters a crowded field with significant incumbents. Commercial rivals including Tomorrow.io offer flood risk assessments at 2.5-kilometer resolution using proprietary satellite constellations, while ECMWF's GloFAS integrated AI forecasting into daily operations in September 2025. NOAA's FLASH system remains technically superior at 1-km resolution, but faces potential elimination under proposed FY2026 federal budget cuts. Marshall Moutenot, CEO of Upstream Tech and co-founder of data curation group dynamical.org, framed Google's contribution as part of a broader ecosystem effort: "Data scarcity is one of the most difficult challenges in geophysics. This was a really creative approach to get that data." By releasing Groundsource freely alongside its GRRR reanalysis dataset, Google is staking out open-data territory in geographies where commercial competitors see insufficient return to invest. The pattern, at least in effect, echoes how NVIDIA and Microsoft have used open infrastructure to build platform leverage in adjacent AI markets.