In the functioning of contemporary generative models, and in particular Generative Adversarial Networks (GANs), the dataset is not simply a collection of images but a statistical structure that defines the model’s space of possibilities. A GAN operates through the tension between two networks that continuously interact: a generator, which produces synthetic images, and a discriminator, which evaluates how plausible they are compared to the original data. This iterative process does not lead to a semantic understanding of the world. Instead, it progressively refines the generator’s approximation of the data distribution, ultimately producing outputs that appear realistic because they are statistically consistent with the training set. In other words, the model does not “create” images; it explores a latent space built from the data, a mathematical representation of their patterns and variations.
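The next sketch makes the tension between generator and discriminator concrete. It shows the two opposing losses of the standard GAN formulation in plain Python, with no deep-learning library. Several details are assumptions:

- The function names are illustrative.
- The generator loss is the common "non-saturating" variant.
- The inputs are discriminator outputs that have already been converted to probabilities.

Real training code computes these losses on batches of network outputs and backpropagates through both networks.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """The discriminator is rewarded for D(real) -> 1 and D(fake) -> 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: the generator is rewarded for D(fake) -> 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 for everything.
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 3))  # 1.386 (= 2 * log 2)
# A discriminator that spots the fakes lowers its own loss...
print(round(discriminator_loss([0.9], [0.1]), 3))  # 0.211
# ...while the generator's loss on those same fakes rises.
print(round(generator_loss([0.1]), 3))  # 2.303
```

Each network's gain is the other's loss. Training alternates between the two updates, and this back-and-forth is the "continuous interaction" described above.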
The latent space is a key element for understanding the behavior of these systems: it is a multidimensional space where each point corresponds to a possible image. During training, the model learns to map regions of this space to visually plausible configurations.
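The toy sketch below shows what "each point corresponds to a possible image" means in practice. Everything in it is a hypothetical stand-in:

- The latent space has 16 dimensions (StyleGAN's latent is 512-dimensional).
- A fixed random map plays the role of a trained generator.
- Its four output values stand in for pixels.

Latent points are drawn from a standard normal prior. Spherical interpolation (slerp), a common choice for Gaussian latents, then traces a path between two of them. Feeding each intermediate point to the generator produces the gradually shifting outputs of a latent walk.

```python
import math
import random

LATENT_DIM = 16  # hypothetical toy size


def sample_latent(rng, dim=LATENT_DIM):
    """Draw a latent point z from a standard normal prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def slerp(z0, z1, t):
    """Spherical interpolation between two latent points, with t in [0, 1]."""
    dot = sum(a * b for a, b in zip(z0, z1))
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    omega = math.acos(max(-1.0, min(1.0, dot / (n0 * n1))))
    if omega < 1e-6:  # nearly parallel vectors: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]
    s = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def toy_generator(z, weights):
    """Hypothetical stand-in for G: a fixed linear map plus tanh, giving 4 'pixels'."""
    return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in weights]


rng = random.Random(42)
weights = [[rng.gauss(0.0, 0.3) for _ in range(LATENT_DIM)] for _ in range(4)]
z_a, z_b = sample_latent(rng), sample_latent(rng)

# A latent walk: neighbouring points in z map to gradually changing outputs.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, [round(v, 2) for v in toy_generator(slerp(z_a, z_b, t), weights)])
```

Nearby latent points map to nearby outputs. Walking along such a path therefore yields continuous transformation rather than a jump between unrelated images.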
However, the geometry of this space is not neutral: it is directly shaped by the dataset’s distribution. If some features are overrepresented, they occupy larger and denser portions of the latent space; if others are rare or absent, they are difficult to generate or come out distorted. Datasets like Flickr-Faces-HQ (FFHQ), though considered a standard for face generation, contain unbalanced distributions that reflect specific cultural hierarchies, with a prevalence of white subjects conforming to dominant aesthetic standards. This leads the model to converge toward an implicit notion of the “average face,” reducing variability and normalizing difference.
It is at this level that Jake Elwes’ Zizi – Queering the Dataset intervenes, a project that treats the dataset itself as a design lever. Starting from a StyleGAN model trained on FFHQ, Elwes introduces into its training a set of images featuring drag performers and non-conforming identities. The addition is small relative to FFHQ’s 70,000 images, yet it has a significant effect on the model’s structure. By altering the data distribution, it changes the geometry of the latent space and therefore the trajectories along which the model generates images.
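A deliberately crude toy can illustrate the effect of imbalance and of a small added subset. It uses a single hypothetical 1-D "feature" in place of real image statistics:

- Two groups are centred at different values, with a 95/5 split.
- The dataset mean sits near the majority. This is the sense in which an "average" is pulled toward what is overrepresented.

The group sizes, the centres, and the 1,000-sample subset are illustrative numbers. They are not the composition of FFHQ or of Elwes’ dataset, and a GAN learns a whole distribution rather than a mean.

```python
import random
import statistics


def make_group(rng, n, center):
    """Hypothetical 1-D 'feature' values for one group, centred on `center`."""
    return [rng.gauss(center, 1.0) for _ in range(n)]


def summarize(data, threshold=3.0):
    """Return the dataset mean and the share of samples in the minority region (> threshold)."""
    share = sum(x > threshold for x in data) / len(data)
    return statistics.fmean(data), share


rng = random.Random(0)
# Imbalanced base dataset: 9,500 majority samples, 500 minority samples.
base = make_group(rng, 9500, center=0.0) + make_group(rng, 500, center=6.0)
# A small, targeted subset added to the training data.
added = make_group(rng, 1000, center=6.0)

mean_before, share_before = summarize(base)
mean_after, share_after = summarize(base + added)
print(f"before: mean={mean_before:.2f}, minority share={share_before:.3f}")
print(f"after:  mean={mean_after:.2f}, minority share={share_after:.3f}")
```

With these numbers, the minority region's share rises from roughly 5% to roughly 14%, and the mean moves with it. A small addition changes where the distribution's mass sits, and that mass is what a generator is trained to reproduce.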
The result is a visual output that resists convergence toward a stable identity. Generated faces appear in constant transformation, traversed by mutations, overlaps, and formal shifts. Features emerge and disappear, aesthetic elements recombine in nonlinear ways, and configurations elude classification. This behavior can be read as a direct effect of the perturbation introduced into the dataset: the model can no longer stabilize around a coherent average because the very notion of an average has been destabilized.
From a technical standpoint, this operation could be described as a modification of the probability distribution the model is trained to approximate. What emerges, however, is something more interesting: a different generative logic. Instead of reducing complexity to a set of recognizable categories, the system maintains an internal tension between forms, allowing hybrid and ambiguous configurations to emerge. This reveals an often-hidden aspect: models are not designed to represent reality, but to optimize an objective defined by their training data. For a GAN, that objective is an adversarial loss rather than an explicit likelihood. When the data that define this objective are perturbed, unexpected possibilities arise.
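The original GAN formulation states this precisely. It frames training as a minimax game rather than as a likelihood to maximize.

- For a fixed generator, the optimal discriminator is D*.
- Substituting D* into the objective shows that the generator is pushed to minimize the Jensen-Shannon divergence between p_data and its own distribution p_g.
- The third line is an illustrative assumption rather than a description of Elwes’ procedure. Suppose n_q images drawn from a distribution q are mixed into n_data original images. The target then becomes a mixture weighted by λ, and both the divergence the generator minimizes and the regions D* rewards shift with it.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},
\qquad
V(D^*, G) = 2\,\mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big) - \log 4

p'_{\text{data}} = (1 - \lambda)\, p_{\text{data}} + \lambda\, q,
\qquad
\lambda = \frac{n_q}{n_{\text{data}} + n_q}
```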
Another central element is the role of error. In applied contexts, error is something to minimize; here it becomes productive. Discontinuities, glitches, and unstable transitions are signals of the model’s internal structure. By making these moments visible, the project allows the model to be read not only through its successes but also through its limits. Failure becomes a tool for analysis, a way to understand what the model has learned and what remains beyond its representational capacity.
This approach suggests a broader perspective on designing generative systems. If the model’s behavior depends on the data, then dataset design becomes a critical as well as a technical practice. It means interrogating what the data imply: which distributions are we building, which forms are made more likely, and which are marginalized? The dataset is a device that structures the imaginary produced by the model.
These implications are particularly relevant because generative systems do not only produce images; they also help define regimes of visibility. Synthetic images circulate, are integrated into interfaces, platforms, and creative tools, and influence how reality is perceived and represented. Intervening in these systems therefore means intervening in the very conditions of representation. Elwes’ work shows that this can be done by introducing variations that challenge logics of normalization.
Another important aspect is process transparency. By showing the direct relationship between dataset and output, the project helps dismantle the idea of artificial intelligence as a black box. Models emerge as what they are: constructed, modifiable architectures sensitive to starting conditions. This awareness opens up more reflective practices, where working on models includes the possibility of questioning and redefining their internal logics.
AI can be understood as an open design field, where data, models, and outputs are part of a single dynamic system. Intervening at one level influences the others, generating effects that are not always predictable but can be guided. Just as the language we use shapes the reality we perceive and consider “real,” the data defines the reality of the machine and what it can generate. Projects like Jake Elwes’ demonstrate that there is a concrete space for this kind of intervention: a space where technique is not separate from culture, and where working on models also becomes a practice of reshaping the imaginary.