Taylor's law shows up everywhere. First spotted in ecology, it describes how a population's variance scales with its mean. Now a seven-year study of 22 languages has found it governing vocabulary growth, too.

Researchers from Fudan University, Harvard, and Stony Brook used word embeddings to map vocabularies across languages and historical periods. These numerical representations place semantically similar words near each other in high-dimensional space. The patterns held regardless of culture or geography.
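
As a rough illustration of what "near each other" means, here is a minimal sketch using cosine similarity on toy vectors. The words and numbers are invented for the example, not taken from the study:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-dimensional embeddings; real models use hundreds of dimensions.
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.7, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

# Related words point in similar directions (score near 1); unrelated ones don't.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # ~0.99
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # ~0.31
```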

Popular words cluster with other popular words in dense semantic neighborhoods. Vocabulary organizes itself in hierarchies that look the same across all 22 languages. New words don't trickle in gradually; they arrive in bursts, surrounded by other recent words.

"We also observed interesting temporal dynamics, showing that new words are generally created in bursts together with other recent words around them," Steven Skiena, senior author of the paper, told Phys.org.

And word distributions follow Taylor's law: the same power-law relationship seen in ecological populations also governs how vocabularies evolve.
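
Concretely, Taylor's law says variance ≈ a · mean^b for some exponent b. Here is a quick sketch of how one might test that on per-word counts, using synthetic stand-in data rather than the study's corpora:

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_periods = 500, 40

# Synthetic stand-in data: each row holds one "word's" usage counts
# across time periods. Overdispersed counts mimic real corpus behavior.
base_rates = rng.lognormal(mean=3.0, sigma=1.5, size=n_words)
counts = rng.poisson(base_rates[:, None] * rng.gamma(5.0, 0.2, size=(n_words, n_periods)))

means = counts.mean(axis=1)
variances = counts.var(axis=1)

# Taylor's law, variance = a * mean**b, is a straight line in log-log space.
ok = (means > 0) & (variances > 0)
b, log_a = np.polyfit(np.log(means[ok]), np.log(variances[ok]), deg=1)
print(f"fitted exponent b ~= {b:.2f}")  # ecological data typically gives 1 < b < 2
```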

The team built a stochastic mathematical model that replicates all these patterns. It combines a cumulative-advantage process with a von Mises-Fisher probability distribution. Co-first author Sergiy Verstyuk described it as "a surprisingly simple model" that works across a 300-dimensional semantic space and historical time, not just single-dimension word frequency counts. The work was published in Proceedings of the Royal Society B: Biological Sciences.
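
The paper's exact formulation isn't reproduced here, but the two ingredients compose naturally: choose a "parent" word with probability proportional to its popularity (cumulative advantage), then sample the new word's vector from a von Mises-Fisher distribution centered on the parent, so it lands nearby on the unit sphere. A hypothetical sketch of that mechanism, with illustrative parameter values (scipy.stats.vonmises_fisher requires SciPy 1.11 or later):

```python
import numpy as np
from scipy.stats import vonmises_fisher  # SciPy 1.11+

rng = np.random.default_rng(1)
DIM, KAPPA = 300, 50.0  # embedding dimension; vMF concentration (both illustrative)

def unit(v):
    return v / np.linalg.norm(v)

# Seed vocabulary: a handful of random unit vectors, each with popularity 1.
vectors = [unit(rng.standard_normal(DIM)) for _ in range(5)]
popularity = [1.0] * len(vectors)

for _ in range(100):
    # Cumulative advantage: popular words are more likely to seed new ones.
    probs = np.array(popularity) / sum(popularity)
    parent = rng.choice(len(vectors), p=probs)

    # The new word lands near its parent on the unit sphere (vMF sample),
    # so recent words cluster together, matching the observed bursts.
    sample = vonmises_fisher(mu=vectors[parent], kappa=KAPPA).rvs(random_state=rng)
    vectors.append(unit(np.asarray(sample).reshape(-1)))
    popularity.append(1.0)
    popularity[parent] += 1.0  # the rich get richer
```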

For AI researchers, the methodology is the story. The team used Word2vec not just as an engineering tool but as a research instrument. This approach treats language as a probability engine, where word choices are determined by statistical patterns rather than pure creativity. Skiena said they remain "excited about the possibilities of using AI-generated embeddings as a tool for fundamental research in understanding historical processes in cultural evolution."
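
For readers who want to try the instrument themselves, training such embeddings is routine with off-the-shelf tooling. A minimal gensim sketch, with a placeholder corpus standing in for the large historical corpora a study like this would actually use:

```python
from gensim.models import Word2Vec

# Placeholder corpus: real studies train on millions of sentences per period.
sentences = [
    ["new", "words", "arrive", "in", "bursts"],
    ["popular", "words", "cluster", "in", "dense", "neighborhoods"],
]

# vector_size=300 matches the semantic space described in the article.
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1)

# Nearest neighbors in the learned space approximate semantic similarity.
print(model.wv.most_similar("words", topn=3))
```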