In July 2019, a team from MIT and IBM published GANpaint Studio, a system that can automatically generate and edit photographic images by working directly with the internal units of a generative adversarial network. The MIT CSAIL post describes the research and its intended applications.
What GANpaint Studio does
GANpaint Studio, available as an interactive online demo at the time of publication, lets a user upload an image and modify multiple aspects of its appearance: changing the size of objects, adding new items such as trees and buildings, or adjusting other visual properties. David Bau, a PhD student at MIT CSAIL and lead author of the related paper, described the research as one of the first instances where computer scientists were able to “paint with the neurons” of a neural network.
The work was conducted as part of the MIT-IBM Watson AI Lab, directed by MIT professor Antonio Torralba, who is listed as overseeing the team.
The system identifies units inside a GAN that correlate with particular types of objects — trees, windows, doors — and tests whether activating or suppressing those units causes objects to appear or disappear. The team also identified units associated with visual errors, described in the post as artifacts, and worked to remove them to improve overall image quality.
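The post gives no code, but the two operations it describes can be sketched on plain NumPy arrays. The first is measuring how well a unit lines up with an object class. The second is switching units off (ablation) or forcing them on (insertion). This is a minimal sketch under stated assumptions, not the team's implementation. The helper names are hypothetical. A unit's activation map is upsampled by nearest neighbour and thresholded at a fixed top quantile, and the published method may choose thresholds and upsampling differently. Alignment is scored as intersection-over-union (IoU) against a segmentation mask for the concept.

```python
import numpy as np

# Hypothetical helpers illustrating dissection-style analysis on plain arrays.
# Assumptions: nearest-neighbour upsampling and a fixed top-quantile threshold.

def upsample_nearest(act, out_h, out_w):
    """Nearest-neighbour upsample of a (h, w) activation map to (out_h, out_w)."""
    h, w = act.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return act[rows[:, None], cols[None, :]]


def unit_concept_iou(acts, masks, quantile=0.99):
    """IoU between one unit's 'on' region and a concept's mask, pooled over images.

    acts:  (n, h, w) activations of a single unit for n generated images
    masks: (n, H, W) boolean segmentation masks of one concept (e.g. "tree")
    """
    threshold = np.quantile(acts, quantile)  # one threshold for the whole unit
    _, out_h, out_w = masks.shape
    inter = union = 0
    for act, mask in zip(acts, masks):
        on = upsample_nearest(act, out_h, out_w) > threshold
        inter += np.logical_and(on, mask).sum()
        union += np.logical_or(on, mask).sum()
    return float(inter / union) if union else 0.0


def intervene(features, units, value, region=None):
    """Copy of a (C, h, w) feature map with the chosen units set to `value`.

    value=0 ablates (suppresses) the units; a large value forces them on.
    region: optional boolean (h, w) mask restricting where the edit applies.
    """
    edited = features.copy()
    if region is None:
        region = np.ones(features.shape[1:], dtype=bool)
    for u in units:
        edited[u][region] = value  # edited[u] is a view, so this writes in place
    return edited
```

Ablating a high-IoU "tree" unit over a region corresponds to removing trees there. Forcing it on corresponds to asking for them. The same ablation can be applied to units found to produce artifacts. In a real system the edited feature map would then be passed through the generator's remaining layers to render the image.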
The refusal property
One finding highlighted in the post is that the system appears to have learned implicit rules about object placement. The post quotes Torralba describing the behavior: “All drawing apps will follow user instructions, but ours might decide not to draw anything if the user commands to put an object in an impossible location. It’s a drawing tool with a strong personality, and it opens a window that allows us to understand how GANs learn to represent the visual world.”
In the example given, the system refuses to place a window in the sky. The post also notes that when asked to add doors to two different buildings in the same image, the system does not replicate an identical door; the outputs differ based on context.
How GANs work in this context
The post explains that GANs consist of two competing neural networks: a generator that produces images intended to look realistic, and a discriminator trained to tell generated images apart from real ones. In the post's framing, each time the discriminator catches a generated image, it exposes its internal reasoning, which the generator uses to improve; in practice, the generator is updated with gradients passed back through the discriminator. This competition pushes both networks to improve over the course of training.
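To make the generator/discriminator loop concrete, here is a toy one-dimensional GAN in NumPy. It is a sketch under stated assumptions and bears no relation to the networks behind GANpaint Studio. The generator is the hypothetical linear map G(z) = a*z + b. The discriminator is a single logistic unit D(x) = sigmoid(w*x + c). Gradients are derived by hand, and the generator uses the common non-saturating loss. The generator step is where the discriminator "exposes its reasoning": generated samples are moved along the gradient of the discriminator's output.

```python
import numpy as np

def sigmoid(s):
    # Numerically stable logistic function: sigmoid(s) = (1 + tanh(s/2)) / 2
    return 0.5 * (1.0 + np.tanh(0.5 * s))


def train_toy_gan(steps=4000, batch=128, lr=0.05, real_mean=0.0, real_std=0.5, seed=0):
    """Alternating updates for a hypothetical 1-D GAN; returns (a, b, w, c)."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 3.0  # generator G(z) = a*z + b, started away from the real data
    w, c = 0.0, 0.0  # discriminator D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        x_real = rng.normal(real_mean, real_std, batch)
        x_fake = a * rng.normal(0.0, 1.0, batch) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step (non-saturating loss): minimise -log D(G(z)),
        # using the gradient of the just-updated discriminator with respect to x.
        z = rng.normal(0.0, 1.0, batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        dloss_dx = (d_fake - 1.0) * w  # d/dx of -log sigmoid(w*x + c)
        a -= lr * np.mean(dloss_dx * z)
        b -= lr * np.mean(dloss_dx)
    return a, b, w, c
```

Calling a, b, w, c = train_toy_gan() moves the generator's offset b from 3 toward the real mean of 0. Because this discriminator is linear, it can only detect a difference in means. The toy therefore matches the mean but not the spread: in expectation the generator's a shrinks toward zero rather than toward the real standard deviation. Practical GANs use deep networks and automatic differentiation so the discriminator can pick up far richer differences.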
Jaakko Lehtinen, an associate professor at Aalto University in Finland who was not involved in the project, is quoted in the post: “It’s truly mind-blowing to see how this work enables us to directly see that GANs actually learn something that’s beginning to look a bit like common sense. I see this ability as a crucial steppingstone to having autonomous systems that can actually function in the human world, which is infinite, complex and ever-changing.”
Applications and concerns
The post describes several intended applications. For designers and artists, the system could enable quick adjustments to visual content. The post also suggests the approach could be adapted for video, letting editors quickly add objects to filmed scenes. A third application is improving and debugging other GANs by identifying the units that cause artifacts.
The team also acknowledged the potential for misuse. Co-author Jun-Yan Zhu, described as a postdoc at CSAIL, is quoted: “You need to know your opponent before you can defend against it. This understanding may potentially help us detect fake images more easily.”
Bau, as lead author, is quoted on the broader interpretability angle: “Right now, machine learning systems are these black boxes that we don’t always know how to improve, kind of like those old TV sets that you have to fix by hitting them on the side. This research suggests that, while it might be scary to open up the TV and take a look at all the wires, there’s going to be a lot of meaningful information in there.”