Can you identify these AI pictures? Google's General Release of Imagen 2
Google has been sharing some great news lately.
First up, they launched their impressive Gemini AI less than a month ago. The demos they showed at the press event really wowed everyone. And that's not all: they've made the first version of Gemini Pro available through the Gemini API, introduced Imagen 2, and started MedLM, a set of tools designed for healthcare. It's been a busy and exciting time for Google in AI.
Among these updates, the text-to-image tool Imagen 2 is getting a lot of attention. Google had previously released the first version, Imagen, a text-to-image diffusion model whose images were quite fascinating. Now Imagen 2 has arrived, and as they said on their official X account: "Imagen 2 is our most advanced text-to-image diffusion technology, with high-quality, realistic output and greater consistency with user prompts."
Developers and cloud customers can use Imagen 2 through the Imagen API in Google Cloud Vertex AI.
Prompt: oil painting, an orange on a chopping board. The light passes through the orange section, casting an orange glow on the cutting board. There is a blue and white cloth in the background. Caustics, reflected light, expressive brushstrokes.
![Prompt: Oil painting, an orange on a chopping board... (pandaron.com)](https://static.pandaron.com/sized/prompting_oil_painting_pandaron_1000x1000.jpeg "Prompt: Oil painting, an orange on a chopping board... (pandaron.com)")
To create higher-quality, more accurate images that are more consistent with user prompts, Google DeepMind changed the Imagen 2 training dataset. They added more detailed descriptions to the image captions, so that Imagen 2 can learn different captioning styles and generalize to better understand a wide range of user prompts. These enhanced image-caption pairs help Imagen 2 better understand the relationship between images and text, improving its grasp of context and nuance.
Check this out. Prompt: The robin flies from the swaying ivy to the top of the wall, opens its beak, and sings a loud, lovely trill, just to show off. There's nothing cuter in the world than a robin when it's showing off. - They almost always do. ("The Secret Garden" by Frances Hodgson Burnett)
The team at Google have trained a specialized image aesthetics model based on human preferences for lighting, framing, exposure, clarity and other qualities. Each image is given an aesthetic score, which helps tune Imagen 2 to give more weight to images in the training dataset that match human preferences. This technology improves Imagen 2's ability to produce high-quality images. Imagen 2 can even render text within the images.
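To make the idea of "giving more weight" to preferred images concrete, here is a minimal Python sketch. It is not Google's training code, and the article does not say how the weighting is actually done. It assumes, purely for illustration, that each training image already has a numeric aesthetic score and that the weighting is a softmax over those scores. The function names `aesthetic_weights` and `sample_batch` and the `temperature` parameter are hypothetical.

```python
import math
import random


def aesthetic_weights(scores, temperature=1.0):
    """Turn per-image aesthetic scores into sampling probabilities.

    Assumption for illustration: a softmax over the scores. A lower
    temperature favors high-scoring images more strongly.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_batch(items, scores, batch_size, seed=None, temperature=1.0):
    """Draw a training batch (with replacement) biased toward high scores."""
    rng = random.Random(seed)
    weights = aesthetic_weights(scores, temperature)
    return rng.choices(items, weights=weights, k=batch_size)


if __name__ == "__main__":
    # Hypothetical image IDs and scores, not real data.
    images = ["blurry_snapshot", "decent_photo", "well_lit_portrait"]
    scores = [2.0, 5.0, 8.0]
    print(aesthetic_weights(scores))
    print(sample_batch(images, scores, batch_size=5, seed=0))
```

With a scheme like this, images that the aesthetics model rates highly are simply seen more often during training, which is one plausible way a score could translate into "more weight".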
It can also design logos for various businesses, brands or products.
There are many other features. Here are a few:
Imagen 2 supports image editing functions such as inpainting and outpainting. By providing a reference image and an image mask, users can use inpainting to generate new content directly inside the original image, or use outpainting to extend the original image beyond its boundaries. Google Cloud’s Vertex AI plans to adopt the technology in the new year.
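The role of the mask in these two editing modes can be sketched in a few lines of Python. This is only an illustration of the masking idea, not how Imagen 2 works internally: a real model generates the new pixels conditioned on the surrounding image, while here the "generated" pixels are just given as input. Images are plain nested lists of pixel values, and the function names `composite_inpaint` and `outpaint_canvas` are hypothetical.

```python
def composite_inpaint(original, generated, mask):
    """Keep original pixels where mask is 0, use generated pixels where mask is 1."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("original, generated and mask must have the same height")
    result = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        if not (len(orig_row) == len(gen_row) == len(mask_row)):
            raise ValueError("original, generated and mask must have the same width")
        result.append([g if m else o for o, g, m in zip(orig_row, gen_row, mask_row)])
    return result


def outpaint_canvas(original, pad, fill=0):
    """Place the original in the center of a larger canvas.

    Returns (canvas, mask). The mask is 1 on the new border region that a
    model would be asked to fill, and 0 over the untouched original.
    """
    height, width = len(original), len(original[0])
    new_width = width + 2 * pad
    new_height = height + 2 * pad
    canvas = [[fill] * new_width for _ in range(new_height)]
    mask = [[1] * new_width for _ in range(new_height)]
    for y in range(height):
        for x in range(width):
            canvas[y + pad][x + pad] = original[y][x]
            mask[y + pad][x + pad] = 0
    return canvas, mask


if __name__ == "__main__":
    original = [[1, 1], [1, 1]]
    generated = [[9, 9], [9, 9]]
    mask = [[0, 1], [0, 0]]  # only the top-right pixel is repainted
    print(composite_inpaint(original, generated, mask))
    print(outpaint_canvas(original, pad=1))
```

Seen this way, outpainting is inpainting on a larger canvas: the mask covers the new border instead of a region inside the picture.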
Imagen 2 integrates with SynthID, a cutting-edge toolkit for watermarking and identifying AI-generated content, allowing Google Cloud customers to add imperceptible digital watermarks directly into image pixels without compromising image quality. SynthID can still detect the watermark even after modifications such as filters, cropping, or lossy compression have been applied.
On the security and user protection front, what we have learned is that the research team behind the product runs extensive safety checks before releasing any new feature. There are also technical guardrails in place to prevent the creation of inappropriate content, such as violent, offensive, or explicit material. This includes careful monitoring of the training data, the prompts given to the system, and the output it produces. For instance, they use detailed safety filters to prevent the generation of sensitive content, such as images of specific individuals.