AI Image Generators Default to 12 Similar Image Styles, Research Finds

AI image generation models have a large store of visual data they can draw on to create unique results. However, researchers found that when the models were pushed to generate images from a series of slowly evolving prompts, they defaulted to just a few visual cues, resulting in a generic style.
Research published in the journal Patterns took an AI image generator, Stable Diffusion XL, and a vision-language model, LLaVA, and tested them by playing a virtual game of telephone. The game went like this: the Stable Diffusion XL model would be given a prompt and had to produce an image—for example, “As I was sitting mostly alone, surrounded by nature, I found an old book with exactly eight pages that told a story in a forgotten language waiting to be read and understood.” That image was presented to the LLaVA model, which was asked to describe it. That description was then fed back to Stable Diffusion, which was asked to create a new image based on it. This went on for 100 rounds.
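The loop is simple to express in code. Here is a minimal sketch in Python: `generate_image` and `describe_image` are hypothetical placeholders standing in for whatever text-to-image and image-to-text models you plug in (the study used Stable Diffusion XL and LLaVA), and the function name is my own, not from the paper.

```python
def run_telephone_game(initial_prompt, generate_image, describe_image, rounds=100):
    """Alternate image generation and captioning, telephone-game style.

    generate_image: callable mapping a text prompt to an image (e.g. an SDXL wrapper).
    describe_image: callable mapping an image to a caption (e.g. a LLaVA wrapper).
    Both are placeholders; any text-to-image / image-to-text pair would work.
    Returns the full history of (image, caption) pairs so drift can be inspected.
    """
    prompt = initial_prompt
    history = []
    for _ in range(rounds):
        image = generate_image(prompt)    # text -> image
        prompt = describe_image(image)    # image -> text, fed back next round
        history.append((image, prompt))
    return history
```

The key design point is that no round ever sees the original prompt: each generation is conditioned only on the previous caption, which is what lets small biases in either model compound over 100 rounds.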
As in a human game of telephone, the original image was quickly lost. No surprise there, especially if you’ve seen one of those time-lapse videos where people ask an AI model to reproduce an image without making any changes, only for it to quickly transform into something unlike the original. What surprised the researchers, however, was that the models reliably converged on a handful of familiar-looking styles. Across 1,000 different runs of the telephone game, the researchers found that most image sequences ended up falling into one of the same top 12 styles.
In most cases, the change happened gradually. A few times, it happened suddenly. But it almost always happened. And the researchers weren’t impressed. In the study, they called the common image styles “visual elevator music”—basically the kind of images you’d see hanging in a hotel room. The most common scenes included things like seaside lighthouses, formal interiors, urban night settings, and rustic buildings.
Even when the researchers switched to different image-generation and captioning models, similar trends emerged. They report that when the game is extended to 1,000 turns, style convergence still occurs around the 100th turn, though the extra turns introduce some additional variety. Interestingly, however, that variety still falls within the same set of popular visual styles.

So what does all that mean? Mainly, that AI is not particularly creative. In a human game of telephone, you end up with a lot of variation because each message is delivered and heard differently, and each person has their own internal biases and preferences that influence how they pass the message on. AI has the opposite problem. No matter how distinctive the original prompt is, it will eventually default to a small selection of styles.
Of course, these models are trained on human-created material, so there’s something to be said about the data set and what humans are drawn to photograph. If there’s a lesson here, it’s probably that copying styles is a lot easier than teaching taste.



