Google’s Gecko benchmark identifies best AI image generator

Google DeepMind has released Gecko, a new benchmark for comprehensively evaluating AI text-to-image (T2I) models.

Over the last two years, we’ve seen AI image generators like DALL-E and Midjourney become progressively better with each version release.

However, judging which of the models underlying these platforms is best has been largely subjective and difficult to benchmark.

Making a broad claim that one model is “better” than another isn’t simple. Different models excel at different aspects of image generation: one may be good at text rendering, while another may be better at object interaction.

A key challenge for T2I models is to follow every detail in the prompt and reflect each one accurately in the generated image.

With Gecko, the DeepMind researchers have created a benchmark that evaluates the capabilities of T2I models similarly to how humans do.

Skill set

The researchers first defined a comprehensive dataset of skills relevant to T2I generation. These include spatial understanding, action recognition, text rendering, and others. They further broke these down into more specific sub-skills.

For example, under text rendering, sub-skills might include rendering different fonts, colors, or text sizes.

An LLM was then used to generate prompts to test the T2I model’s capability on a specific skill or sub-skill.

This enables the creators of a T2I model to pinpoint not only which skills are challenging, but at what level of complexity a skill becomes challenging for their model.
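To make the idea of pinpointing weak skills concrete, here is a minimal sketch of grouping per-prompt scores by skill and sub-skill. The sample rows, the 0-to-1 score scale, and the function names are assumptions for illustration only and are not taken from the Gecko paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-prompt results: (skill, sub_skill, alignment score in [0, 1]).
# These values are made up for illustration.
results = [
    ("text rendering", "fonts", 0.9),
    ("text rendering", "fonts", 0.7),
    ("text rendering", "text sizes", 0.4),
    ("spatial understanding", "relative position", 0.6),
]


def scores_by_subskill(rows):
    """Average the alignment score for each (skill, sub-skill) pair."""
    buckets = defaultdict(list)
    for skill, sub_skill, score in rows:
        buckets[(skill, sub_skill)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


def weakest_subskill(rows):
    """Return the (skill, sub-skill) pair with the lowest average score."""
    averages = scores_by_subskill(rows)
    return min(averages, key=averages.get)


print(weakest_subskill(results))
```

Breaking scores down this way is what lets a model developer see that, say, text rendering is fine overall but falls apart for a particular sub-skill.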

The Gecko benchmark framework combines a dataset of skills and sub-skills (a), human Likert scoring of image accuracy (b), and LLM-generated questions for VQA analysis to produce comprehensive metrics that correlate with human evaluations. Source: arXiv

Human vs Auto eval

Gecko also measures how accurately a T2I model follows all the details in a prompt. Again, an LLM was used to isolate key details in each input prompt and then generate a set of questions related to those details.

These questions range from simple, direct questions about visible elements in the image (e.g., “Is there a cat in the image?”) to more complex questions that test understanding of the scene or the relationships between objects (e.g., “Is the cat sitting above the book?”).

A Visual Question Answering (VQA) model then analyzes the generated image and answers the questions to see how accurately the T2I model aligns its output image with an input prompt.
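The question-and-answer check described above can be sketched roughly as follows. This is a simplified illustration: it scores an image as the fraction of questions answered as expected, which may differ from how the paper aggregates answers. The `toy_vqa` function is a hypothetical stand-in for a real VQA model, and the image is represented as a plain dictionary of facts.

```python
from typing import Callable, List, Tuple

# A question paired with the answer expected if the image matches the prompt.
QAPair = Tuple[str, str]


def alignment_score(image, qa_pairs: List[QAPair],
                    vqa: Callable[[object, str], str]) -> float:
    """Fraction of questions the VQA model answers as expected."""
    if not qa_pairs:
        raise ValueError("need at least one question")
    correct = sum(
        1
        for question, expected in qa_pairs
        if vqa(image, question).strip().lower() == expected.strip().lower()
    )
    return correct / len(qa_pairs)


def toy_vqa(image, question):
    """Hypothetical stand-in for a VQA model: looks the answer up in a dict."""
    return "yes" if image.get(question, False) else "no"


# The "image" here is just a dict recording what is true of the picture.
image = {
    "Is there a cat in the image?": True,
    "Is the cat sitting above the book?": False,
}
qa_pairs = [
    ("Is there a cat in the image?", "yes"),
    ("Is the cat sitting above the book?", "yes"),
]

print(alignment_score(image, qa_pairs, toy_vqa))
```

In this toy case the image contains a cat but gets the spatial relationship wrong, so only one of the two questions is answered as expected.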

The researchers collected over 100,000 human annotations where the participants scored a generated image based on how aligned the image was to specific criteria.

Annotators were asked to consider a specific aspect of the input prompt and score the image on a scale of 1 to 5 based on how well it aligned with the prompt.

Using the human-annotated evaluations as the gold standard, the researchers were able to confirm that their auto-eval metric “is better correlated with human ratings than existing metrics for our new dataset.”
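One common way to check how well an automatic metric tracks human ratings is a rank correlation such as Spearman's. The sketch below assumes that choice, which the article itself does not specify, and uses made-up scores. It is written in plain Python so it runs without extra libraries.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


# Made-up example: human Likert scores (1 to 5) and an automatic metric.
human_scores = [1, 2, 2, 4, 5]
auto_scores = [0.10, 0.35, 0.30, 0.80, 0.75]
print(spearman(human_scores, auto_scores))
```

A value close to 1 means the automatic metric orders images much as human annotators do, which is the sense in which one metric can be "better correlated" than another.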

The result is a benchmarking system capable of putting numbers to specific factors that make a generated image good or not.

Gecko essentially scores the output image in a way that closely aligns with how we intuitively decide whether or not we’re happy with the generated image.

So what is the best text-to-image model?

In their paper, the researchers concluded that Google’s Muse model beats Stable Diffusion 1.5 and SDXL on the Gecko benchmark. They may be biased, but the numbers don’t lie.

The post Google’s Gecko benchmark identifies best AI image generator appeared first on DailyAI.
