OpenAI has unveiled GLIDE (Guided Language-to-Image Diffusion for Generation and Editing), a diffusion model that achieves performance comparable to DALL-E with less than one-third of the parameters.
While most visuals can be described in words, producing an image from a text description requires specialized skills and many hours of work. An AI agent that automatically generates photorealistic images from natural language prompts makes it far easier to create rich and diverse visual material, and it allows simpler iterative refinement and fine-grained control over the generated images.
In addition to producing images from text, GLIDE can edit existing images: natural language prompts can be used to insert new objects, add shadows and reflections, perform image inpainting, and so on. It can also turn simple line drawings into photorealistic images, and it has strong zero-shot generation and inpainting capabilities for complex scenes.
Human evaluators preferred GLIDE’s output images to DALL-E’s, even though GLIDE is a considerably smaller model, with 3.5 billion parameters versus DALL-E’s 12 billion. GLIDE also has lower sampling latency and does not require CLIP reranking.
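For context, CLIP reranking in a DALL-E-style pipeline means sampling many candidate images and keeping those that a CLIP model scores as closest to the prompt. The sketch below illustrates only the ranking step. The toy two-dimensional vectors stand in for embeddings a CLIP-like model would produce, and the `cosine` and `rerank` helpers are hypothetical names for this illustration, not part of any released API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rerank(text_embedding, image_embeddings, top_k=1):
    """Return indices of the top_k candidates most similar to the text."""
    order = sorted(
        range(len(image_embeddings)),
        key=lambda i: cosine(text_embedding, image_embeddings[i]),
        reverse=True,
    )
    return order[:top_k]

# Toy embeddings (assumed values, not real CLIP outputs).
text = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
best = rerank(text, candidates, top_k=2)
print(best)
```

Skipping this step saves the cost of generating and scoring extra candidates, which is consistent with the lower sampling latency noted above.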