Google has unveiled Imagen, a groundbreaking text-to-image diffusion model that pushes the boundaries of photorealism while demonstrating an advanced level of language understanding. According to the post by Google Research, Imagen combines the power of large transformer language models with the capabilities of diffusion models.
The result is a new standard for generating high-fidelity images from text input. Imagen leverages generic large language models, such as T5, which are pretrained on text-only corpora. With this approach, it achieves better image-text alignment and sample fidelity than previous methods.
What makes Imagen unique is its ability to generate photorealistic images by effectively encoding text using large frozen T5-XXL encoders. These encoders transform the input text into embeddings, which are then mapped to a 64×64 image using a conditional diffusion model. Imagen also incorporates text-conditional super-resolution diffusion models to upscale the image from 64×64 to 256×256 and eventually to an impressive 1024×1024 resolution.
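To make the resolution cascade above concrete, here is a minimal Python sketch that only tracks the image sizes at each stage. It does not run any model and is not Google's code; the names `Stage`, `build_cascade`, `upscale_factors`, and `pixel_counts` are hypothetical, chosen for illustration, and only the 64, 256, and 1024 resolutions come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Resolutions named in the article: the base model output,
# followed by the two super-resolution targets.
BASE_RES = 64
SR_TARGETS = (256, 1024)


@dataclass(frozen=True)
class Stage:
    """One step of the cascade (hypothetical bookkeeping type, no model inside)."""
    name: str
    in_res: Optional[int]  # None for the base stage, which has no input image
    out_res: int


def build_cascade(base_res: int = BASE_RES, sr_targets=SR_TARGETS) -> List[Stage]:
    """Return the ordered stages: one base stage, then one stage per upscale."""
    stages = [Stage("base text-conditional diffusion", None, base_res)]
    prev = base_res
    for target in sr_targets:
        stages.append(Stage(f"super-resolution {prev}->{target}", prev, target))
        prev = target
    return stages


def upscale_factors(stages: List[Stage]) -> List[int]:
    """Per-side enlargement of each super-resolution stage."""
    return [s.out_res // s.in_res for s in stages if s.in_res is not None]


def pixel_counts(stages: List[Stage]) -> List[int]:
    """Number of pixels in the square image each stage produces."""
    return [s.out_res * s.out_res for s in stages]


if __name__ == "__main__":
    cascade = build_cascade()
    for stage, pixels in zip(cascade, pixel_counts(cascade)):
        print(f"{stage.name}: {stage.out_res}x{stage.out_res} ({pixels} pixels)")
    print("per-side upscale factors:", upscale_factors(cascade))
```

Each super-resolution stage enlarges the image by a factor of 4 per side, so the pixel count grows 16 times per stage, from 4,096 pixels at 64×64 to 1,048,576 pixels at 1024×1024.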
In the simplest terms, according to Google’s post, “Imagen is an AI system that creates photorealistic images from input text.” The research also emphasizes the importance of scaling up the size of pre-trained text encoders.
The team found that increasing the size of the language model in Imagen improves image-text alignment. The results are impressive, as the examples of Imagen’s work on the announcement page show.
Using DrawBench, a benchmark for text-to-image models, the Google Research team also compared Imagen with other methods and models such as DALL-E 2. According to their report, human raters preferred Imagen’s output.
In Google’s view, Imagen sets a new standard in text-to-image synthesis by seamlessly blending the power of large pre-trained language models and diffusion models. But as the company noted, there are limitations and societal impacts to consider.
Those concerns center on misuse of the AI, so neither a public demo nor the code has been released. Another issue is the data used to train Imagen. Because it includes content found on the web, the team wants to put safeguards in place before releasing Imagen to the wider public.
It’s clear that concerns about the responsible use of AI are on the minds of not only Google but also other major companies. AI’s rapid growth across industries is bringing with it pressure to be cautious about releasing new models.