It combines multiple modalities such as text, semantic segmentation, sketch and style within a single GAN framework. This allows turning an artist’s vision into a high-quality AI-generated image.
With GauGAN2, users can enter a brief phrase to quickly generate an image’s key features and theme. NVIDIA gives the example of a snow-capped mountain range, which can then be customised with sketches to make a specific mountain taller, add a couple of trees or other details in the foreground, or put clouds in the sky.
GauGAN2 is early in development at this point, and has likely been trained on only a rather limited data set. Regardless, when it works, it offers a breathtaking snapshot of how AI technology could transform asset creation in movies and games in the years to come, with unique photorealistic landscapes and objects generated from just a few words of user input.

Disclaimer: GauGAN2 doesn’t use GPT-3