Google has shipped a new image-editing capability in Google Photos that treats a photo as a 3D scene and repositions the virtual camera within that scene, generating previously hidden content to fill the gaps. The feature, called Auto frame, is live now and was built jointly by Google DeepMind and Google’s Platforms and Devices team. According to the research post, the key distinction from classical editing is that cropping and zooming leave parallax relationships unchanged, whereas this method actually moves the viewpoint.
The research post frames the motivation as compositions that were almost right: a face cut off at the edge, a portrait distorted by a wide-angle selfie lens, or a subject awkwardly positioned in frame. The post says traditional tools cannot fix these problems because they never change the original camera position.
Two-stage pipeline: 3D reconstruction then generative fill
The method runs in two stages. First, the system estimates a 3D point map of the scene using an internal model specifically configured to reconstruct human bodies and faces accurately. For every pixel in the original image, it estimates a 3D point representing the visible surface, along with an approximation of the original camera’s focal length. With that geometric understanding, classical 3D rendering is used to generate what the scene would look like from the new camera position, including changes to both camera pose — position and orientation — and focal length.
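The geometry behind this stage can be illustrated with a standard pinhole camera model. The following is a minimal sketch, not the post's implementation: the function names, the yaw-only rotation, and the sample intrinsics are all assumptions chosen for illustration. It lifts a pixel to a 3D point from a depth and focal length, then projects that point into a virtual camera with a different position, orientation, or focal length.

```python
import math

def unproject(u, v, depth, f, cx, cy):
    """Lift pixel (u, v) with depth along the optical axis to a 3D point
    in the original camera's coordinates (x right, y down, z forward)."""
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x, y, depth)

def project(point, f, cx, cy, cam_pos=(0.0, 0.0, 0.0), yaw=0.0):
    """Project a 3D point into a virtual pinhole camera located at cam_pos
    and rotated by yaw radians about the vertical axis.
    Returns None when the point lies behind the virtual camera."""
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    # Apply the inverse camera rotation to express the point in the
    # virtual camera's own frame.
    c, s = math.cos(yaw), math.sin(yaw)
    xc = c * x - s * z
    zc = s * x + c * z
    if zc <= 0:
        return None
    return (f * xc / zc + cx, f * y / zc + cy)

# Hypothetical intrinsics for a 640x480 image.
f, cx, cy = 500.0, 320.0, 240.0
near = unproject(320, 240, 1.0, f, cx, cy)   # point 1 unit away
far = unproject(320, 240, 5.0, f, cx, cy)    # point 5 units away

# Move the virtual camera 0.1 units to the right: the near point shifts
# farther across the image than the far point, which is parallax.
moved = (0.1, 0.0, 0.0)
print(project(near, f, cx, cy, cam_pos=moved))
print(project(far, f, cx, cy, cam_pos=moved))
```

Both sample points start at the same pixel, yet they land at different pixels once the camera moves. A crop or zoom cannot produce that effect, because it applies the same 2D transform to every pixel regardless of depth.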
The rendering step alone is not sufficient. Moving the virtual camera exposes parts of the background that the original lens never captured, leaving holes in the re-rendered image. A generative latent diffusion model fills those gaps. The post describes it as trained on an internal dataset of image pairs with known camera parameters: during training, the model learns to reconstruct one image from a re-rendered version of the other. At inference, classifier guidance with regional scaling is used to preserve original content faithfully while allowing the model to generate plausible fill for the newly visible areas.
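To see where those holes come from, consider a toy forward warp. This is a sketch under simplifying assumptions, not the renderer the post describes: the virtual camera only translates sideways, each source pixel lands on the nearest target pixel, and the function name `reprojection_holes` is hypothetical. For a sideways translation `tx`, a pixel at depth `d` shifts by `f * tx / d` pixels, so nearer surfaces move farther and uncover background that no source pixel can supply.

```python
def reprojection_holes(depth, f, tx):
    """Forward-warp a depth grid into a camera translated by tx along x.

    depth: list of rows, each a list of per-pixel depths.
    Returns (source_col, holes). source_col[r][c] is the source column
    that lands on target pixel (r, c), or None; holes[r][c] is True
    where no source pixel lands."""
    height, width = len(depth), len(depth[0])
    source_col = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            d = depth[r][c]
            # Pinhole model: a sideways camera shift moves a pixel by
            # f * tx / d columns (the principal point cancels out).
            target = round(c - f * tx / d)
            # Z-buffer: when two surfaces land together, keep the nearer.
            if 0 <= target < width and d < zbuf[r][target]:
                zbuf[r][target] = d
                source_col[r][target] = c
    holes = [[s is None for s in row] for row in source_col]
    return source_col, holes

# One scanline: a subject at depth 2 (columns 4-6) in front of a wall
# at depth 4.
scanline = [[4, 4, 4, 4, 2, 2, 2, 4, 4, 4]]
source_col, holes = reprojection_holes(scanline, f=4.0, tx=1.0)
print(holes[0])
```

In this example two target pixels receive nothing: one beside the subject, where the background was previously occluded, and one at the image border, where the new view extends past the original frame. A mask like this marks the regions a generative model would have to fill.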
The post describes this combination of geometric re-rendering and constrained generative fill as distinct from both crop-and-zoom tools and unconstrained generative editors.
Automatic portrait correction
Auto frame does not require manual control of camera position. Instead, ML models detect the position and 3D orientation of faces belonging to the main subjects in the photo. Combined with the 3D point map, that semantic information drives the system to compute camera parameters for what the post describes as “ideal framing” automatically.
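The post does not spell out how “ideal framing” is computed. As one simple illustration of turning a detected face position into camera parameters, the sketch below finds the yaw and pitch that point the virtual camera's optical axis at a 3D face center. The centering criterion, the function names, and the axis convention (x right, y down, z forward) are assumptions, not details from the post.

```python
import math

def centering_rotation(face_center):
    """Yaw and pitch, in radians, that aim the optical axis at a 3D face
    center given in the original camera's coordinates."""
    x, y, z = face_center
    yaw = math.atan2(x, z)                      # turn left or right
    pitch = math.atan2(-y, math.hypot(x, z))    # tilt up or down
    return yaw, pitch

def to_rotated_camera(point, yaw, pitch):
    """Express a point in the frame of a camera that has been rotated by
    yaw about the vertical axis and then pitch about its own x axis."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    x1, z1 = c * x - s * z, s * x + c * z       # undo the yaw
    c, s = math.cos(pitch), math.sin(pitch)
    y2, z2 = c * y + s * z1, -s * y + c * z1    # undo the pitch
    return (x1, y2, z2)

# Hypothetical face position: right of center, above center, 1.5 away.
face = (0.3, -0.2, 1.5)
yaw, pitch = centering_rotation(face)
print(to_rotated_camera(face, yaw, pitch))
```

After the rotation the face sits on the optical axis, so x and y are zero up to rounding and a pinhole projection places it at the principal point. A production system would presumably also weigh face orientation, multiple subjects, and focal length, none of which this sketch attempts.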
A separate problem it addresses is wide-angle distortion. Front cameras on phones often use wide-angle lenses, and the resulting perspective distortion makes features closest to the lens appear unnaturally large. The system detects these distortions and adjusts the virtual camera’s intrinsics — its internal optical parameters — to restore more natural proportions. The post describes this as effectively “stepping back” from the subject after the fact, without requiring the user to physically move or reshoot.
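The effect of “stepping back” follows from the pinhole model, in which rendered size is proportional to focal length divided by depth. The sketch below is a worked illustration with hypothetical names and distances, not the correction the post implements: it shows how much larger a near feature renders than an equally sized far feature, and which focal length keeps the subject the same size after the camera moves back.

```python
def near_far_magnification(near_depth, far_depth, step_back=0.0):
    """How many times larger a near feature renders than an equally
    sized far feature, after moving the camera back by step_back."""
    return (far_depth + step_back) / (near_depth + step_back)

def compensating_focal_length(f, subject_depth, step_back):
    """Focal length that keeps a subject at subject_depth the same size
    on screen after the camera moves back by step_back."""
    return f * (subject_depth + step_back) / subject_depth

# Hypothetical selfie: nose 0.30 m from the lens, ears 0.40 m.
before = near_far_magnification(0.30, 0.40)
after = near_far_magnification(0.30, 0.40, step_back=0.60)
print(round(before, 3), round(after, 3))

# Doubling the subject distance calls for twice the focal length.
print(compensating_focal_length(500.0, 0.5, 0.5))
```

With these numbers the nose renders about a third larger than the ears at arm's length, but only about a ninth larger once the virtual camera sits 0.6 m farther away. The longer focal length restores the overall size of the face, which is why the result reads as the same portrait with more natural proportions.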
The post describes this combination of composition reframing and wide-angle correction as particularly useful for portraits and front-camera photos.
Integration in Google Photos
The feature ships as part of the existing Auto frame UI in Google Photos. It processes eligible photos that contain people and offers the re-composed version as the second rendition option within the Auto frame candidates. The post says it is a “single-action improvement to the photo” — the user does not need to set camera parameters or interact with any 3D controls. The system selects the new viewpoint automatically.
The post describes identity preservation as a design requirement, stating the 3D point map model was “specifically configured to faithfully reconstruct human bodies and faces to limit reconstruction artifacts that would potentially harm identity preservation.” The regionally scaled classifier guidance applied at inference serves the same goal: it preserves the original content, including the subject, while allowing the generative model to fill newly visible areas.
Key contributors listed in the post include Thiemo Alldieck, Marcos Seefelder, Hannah Woods, Pedro Velez, and Michael Milne, among others.