Latent Space
A compressed, lower-dimensional representation space where generative models encode the essential features of high-dimensional data such as images.
A latent space is an abstract, lower-dimensional representation space into which high-dimensional data (images, text, audio) is compressed by a learned encoder. A 512x512 RGB image contains 786,432 values (512 × 512 × 3), but an autoencoder's bottleneck may compress it into a few hundred dimensions that capture its essential structure.
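The bottleneck idea can be sketched with an untrained linear encoder/decoder pair; the weights, dimensions, and toy image below are illustrative assumptions, not a real trained model. A toy 32x32 RGB image (3,072 values) is compressed to a 64-d code, and the same pattern scales to the 512x512 example above.

```python
import numpy as np

# Hedged sketch: untrained linear encoder/decoder showing an autoencoder's
# bottleneck shapes. All dimensions here are illustrative assumptions.
rng = np.random.default_rng(0)

input_dim = 32 * 32 * 3   # 3,072 values per toy RGB image
latent_dim = 64           # bottleneck dimension

W_enc = rng.standard_normal((latent_dim, input_dim)) * 0.01  # encoder weights
W_dec = rng.standard_normal((input_dim, latent_dim)) * 0.01  # decoder weights

image = rng.random(input_dim)   # stand-in for a flattened image
z = W_enc @ image               # encode: 3,072 -> 64
recon = W_dec @ z               # decode: 64 -> 3,072

print(z.shape, recon.shape)     # (64,) (3072,)
```

In a real autoencoder the weights are learned by minimizing reconstruction error, which is what forces the bottleneck to retain only essential structure.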
A key property is that semantically similar data points map to nearby locations. In a face image latent space, attributes like smile intensity or age correspond to specific directions, enabling manipulation through vector arithmetic.
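A common way to find such a direction is to average latent codes for two groups of examples and take the difference; the synthetic codes and the 128-d latent size below are stand-in assumptions for illustration.

```python
import numpy as np

# Hedged sketch of attribute editing via latent vector arithmetic.
# The "smiling"/"neutral" codes are synthetic stand-ins, not real encodings.
rng = np.random.default_rng(1)
latent_dim = 128

smiling = rng.standard_normal((50, latent_dim)) + 0.5  # toy "smiling" codes
neutral = rng.standard_normal((50, latent_dim))        # toy "neutral" codes

# Attribute direction: mean difference between the two groups
smile_direction = smiling.mean(axis=0) - neutral.mean(axis=0)

z = rng.standard_normal(latent_dim)       # latent code of some face
z_more_smile = z + 1.5 * smile_direction  # decoding this should increase the smile
```

The scalar 1.5 controls edit strength; in practice it is tuned by eye, since moving too far along a direction pushes the code off the data manifold.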
- VAE latent space: Variational Autoencoders regularize the latent distribution to approximate a standard normal, producing a smooth space where any sampled point decodes into a plausible output
- GAN latent space: A GAN generator maps noise vectors to realistic images. StyleGAN's W and W+ spaces offer superior disentanglement and editability
- Latent diffusion: Stable Diffusion performs diffusion in a VAE's latent space rather than pixel space, reducing cost by orders of magnitude while maintaining quality
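The latent-diffusion saving can be made concrete with back-of-the-envelope arithmetic: Stable Diffusion's VAE downsamples images 8x spatially and uses a 4-channel latent, so a 512x512 RGB image becomes a 64x64x4 latent tensor.

```python
# Cost comparison for latent vs. pixel-space diffusion (Stable Diffusion's
# 8x-downsampling, 4-channel VAE latent for a 512x512 RGB image).
pixel_elements = 512 * 512 * 3    # 786,432 values in pixel space
latent_elements = 64 * 64 * 4     # 16,384 values in latent space

reduction = pixel_elements / latent_elements
print(reduction)  # 48.0 -> each diffusion step processes ~48x fewer values
```

Since the denoising network's compute scales with the number of elements it processes per step, this factor compounds over the dozens of steps a sampler runs.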
Understanding latent spaces is central to image generation, style transfer, semantic editing, and anomaly detection. t-SNE and UMAP are commonly used to visualize latent spaces in 2D for model analysis.