Approximately a month after OpenAI announced DALL-E 2, its latest AI system for creating images from text, Google has confirmed its own ambitions in the field with a text-to-image diffusion model of its own, a project known as Imagen. Google's results are high quality and impressive, perhaps even a little unnerving.
Google's model considerably outperforms the one created by OpenAI on the COCO benchmark. Despite not having been trained on COCO, Imagen still performed well there too. Generally speaking, Imagen appears to be far superior.
But how does it work?
Imagen works by taking natural language text input, with phrases as simple as "A Golden Retriever dog wearing a blue plaid beret and a turtleneck with red dots," and using an encoder to convert that input text into embeddings. A conditional diffusion model then maps the text embeddings onto a small image. Imagen uses text-conditional super-resolution diffusion models to upsample the image from 64x64 to 256x256 and finally to 1024x1024. The result is a high-resolution image matching the prompt.
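The cascaded pipeline described above can be sketched in code. This is only an illustrative stand-in, not Imagen's actual implementation: the function names are hypothetical, and each "model" here merely tracks image shapes rather than running a real network.

```python
# Hypothetical sketch of Imagen's cascaded text-to-image pipeline.
# The real system uses a frozen text encoder, a base diffusion model,
# and two super-resolution diffusion models; these stand-ins only
# illustrate the flow of data and the successive resolutions.

def encode_text(prompt: str) -> list:
    """Stand-in text encoder: prompt -> sequence of pseudo-embeddings."""
    return [[float(len(token))] for token in prompt.split()]

def base_diffusion(embeddings: list) -> tuple:
    """Stand-in text-conditional diffusion model: embeddings -> 64x64 image shape."""
    return (64, 64, 3)

def super_resolution(image_shape: tuple, target: int, embeddings: list) -> tuple:
    """Stand-in text-conditional super-resolution model: upsample to target size."""
    assert target > image_shape[0], "super-resolution must increase resolution"
    return (target, target, 3)

def generate(prompt: str) -> tuple:
    emb = encode_text(prompt)
    img = base_diffusion(emb)               # 64x64 base image
    img = super_resolution(img, 256, emb)   # 64x64 -> 256x256
    img = super_resolution(img, 1024, emb)  # 256x256 -> 1024x1024
    return img

print(generate("A Golden Retriever dog wearing a blue plaid beret"))
# -> (1024, 1024, 3)
```

Each super-resolution stage is itself conditioned on the text, which is one reason the final image stays faithful to the prompt as resolution increases.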
Imagen marks a significant improvement in flexibility and quality of results. AI is making rapid progress. So far, Google's examples have featured objects, animals, flowers, and the like; but how does it handle humans?
So far, we don't know how Imagen handles such prompts, because Google has chosen not to show any people. Text-to-image research raises ethical challenges: if a model can conceivably create almost any image from text, how well does it present unbiased results? AI models like Imagen are largely trained on datasets scraped from the web, and content on the Internet is skewed and biased in ways we are still trying to fully understand.
For now, you cannot try Imagen yourself. On its website, Google lets you click on specific words from a fixed set to see results, such as "a photo of a furry panda in a cowboy hat and black leather jacket playing a guitar on top of a mountain." Early research also indicates that Imagen reflects cultural biases in its depiction of certain elements and events.
Meanwhile, AI research teams grapple with the social and moral implications of their extremely impressive work. For now, Imagen is not publicly available, and neither is its code. However, you can learn a lot about the project from the new research paper.