A neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.
DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.
They decided to name their model using a portmanteau of the artist Salvador Dalí and Pixar’s WALL·E.
