Image GPT

In 2020, OpenAI released Image GPT (iGPT), a Transformer-based model that operates on sequences of pixels instead of sequences of text. OpenAI found that, just as GPT models for text could generate realistic samples of natural language, iGPT could “generate coherent image completions and samples,” given an input of initial pixels.
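The idea of treating an image as a sequence and completing it from an initial run of pixels can be illustrated with a minimal sketch. It is not OpenAI's implementation: the function names are hypothetical, row-by-row (raster) ordering is assumed, and `repeat_last_pixel` is a trivial stand-in for a trained Transformer that would predict the next pixel from the preceding ones.

```python
def image_to_sequence(image):
    """Flatten a 2-D grid of pixel values into a 1-D list, row by row."""
    return [pixel for row in image for pixel in row]


def sequence_to_image(seq, width):
    """Reshape a flat pixel list back into rows of the given width."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def complete_image(prefix, total_len, next_pixel):
    """Extend a pixel prefix one pixel at a time until it has total_len entries.

    next_pixel is any callable that maps the pixels generated so far to the
    next pixel value; in a real system this would be a trained model.
    """
    seq = list(prefix)
    while len(seq) < total_len:
        seq.append(next_pixel(seq))
    return seq


def repeat_last_pixel(seq):
    # Hypothetical placeholder "model": simply repeat the most recent pixel.
    return seq[-1]


# A toy 4x4 grayscale image with pixel values 0..15.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

seq = image_to_sequence(image)
prefix = seq[:8]  # keep only the top half as the "initial pixels"
completed = sequence_to_image(
    complete_image(prefix, len(seq), repeat_last_pixel), width=4
)
```

Here the top two rows of `completed` are the original pixels and the bottom two rows are filled in by the placeholder. Swapping `repeat_last_pixel` for a learned next-pixel predictor is what would make the completion depend on image content.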