Google has shipped a new image-editing capability in Google Photos that treats a photo as a 3D scene and repositions the virtual camera within that scene, generating previously hidden content to fill the gaps. The feature, called Auto frame, is live now and was built in a collaboration between Google DeepMind and Google's Platforms and Devices team. According to the research post, the key distinction from classical editing is that cropping and zooming leave parallax relationships unchanged, whereas this method actually moves the viewpoint.
The motivation is familiar to anyone who browses their camera roll: shots that were almost right, where a slightly different angle would have fixed the composition. A face cut off at the edge, a portrait distorted by a wide-angle selfie lens, a subject positioned awkwardly in the frame. Traditional tools cannot address these problems because they do not change where the camera was when the photo was taken. Auto frame does.
Two-stage pipeline: 3D reconstruction then generative fill
The method runs in two stages. First, the system estimates a 3D point map of the scene using an internal model specifically configured to reconstruct human bodies and faces accurately. For every pixel in the original image, it estimates a 3D point representing the visible surface, along with an approximation of the original camera’s focal length. With that geometric understanding, classical 3D rendering is used to generate what the scene would look like from the new camera position, including changes to both camera pose — position and orientation — and focal length.
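The re-rendering step is, at its core, classical point-based projection: each pixel's estimated 3D point is transformed into the new camera's coordinate frame and projected through a pinhole model with the new focal length. The sketch below illustrates that idea with numpy; the function name, the painter's-algorithm occlusion handling, and all parameters are illustrative choices, not Google's implementation.

```python
import numpy as np

def reproject(points_3d, colors, R, t, f_new, H, W):
    """Render a per-pixel point map from a new camera pose.

    points_3d : (N, 3) 3D points in the original camera's frame
    colors    : (N, 3) per-point RGB taken from the source pixels
    R, t      : rotation (3x3) and translation (3,) of the new camera
    f_new     : new focal length in pixels
    Returns an (H, W, 3) image with gaps (zeros) where no point lands.
    """
    # Transform points into the new camera's coordinate frame.
    cam = points_3d @ R.T + t
    # Discard points behind the new camera.
    valid = cam[:, 2] > 1e-6
    cam, colors = cam[valid], colors[valid]
    # Pinhole projection with the principal point at the image center.
    u = (f_new * cam[:, 0] / cam[:, 2] + W / 2).astype(int)
    v = (f_new * cam[:, 1] / cam[:, 2] + H / 2).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, cam, colors = u[inside], v[inside], cam[inside], colors[inside]
    # Painter's algorithm: splat farthest points first so nearer ones win.
    order = np.argsort(-cam[:, 2])
    img = np.zeros((H, W, 3))
    img[v[order], u[order]] = colors[order]
    return img
```

The zero-filled regions this leaves behind are exactly the "holes" the next stage must fill.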
The rendering step alone is not sufficient. Moving the virtual camera exposes parts of the background that the original lens never captured, leaving holes in the re-rendered image. A generative latent diffusion model fills those gaps. The post describes it as trained on an internal dataset of image pairs with known camera parameters: during training, the model learns to reconstruct one image from a re-rendered version of the other. At inference, classifier guidance with regional scaling is used to preserve original content faithfully while allowing the model to generate plausible fill for the newly visible areas.
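The post does not spell out how "regional scaling" of the guidance works, but the general pattern is to vary the guidance strength per pixel: strong guidance where re-rendered original content exists, weak guidance over the exposed holes. A minimal sketch of one such denoising update, with made-up scale values and a stand-in update rule:

```python
import numpy as np

def guided_step(x, eps_uncond, eps_cond, known_mask,
                s_known=7.0, s_hole=1.5):
    """One denoising update with a region-dependent guidance scale.

    A stronger scale over pixels re-rendered from the original photo
    (known_mask == True) keeps them faithful; a weaker scale over the
    newly exposed holes gives the model latitude to generate plausible
    fill. Scales and the update rule are illustrative, not Google's.
    """
    scale = np.where(known_mask, s_known, s_hole)
    eps = eps_uncond + scale * (eps_cond - eps_uncond)
    return x - eps  # stand-in for the sampler's actual update rule
```

In a real latent diffusion sampler this masking would operate in latent space across many denoising steps; the sketch only shows the per-region scaling idea.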
That combination — deterministic 3D rendering to move what was captured, generative inpainting to synthesize what was not — is what separates this from both pure crop-and-zoom tools and unconstrained generative editors.
Automatic portrait correction
Auto frame does not require manual control of camera position. Instead, ML models detect the position and 3D orientation of the faces of the photo's main subjects. The system combines that semantic information with the 3D point map to automatically compute camera parameters for what the post describes as "ideal framing."
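The post does not disclose the framing objective, but one can imagine what such a computation looks like. As a purely hypothetical stand-in, the sketch below solves for the lateral camera shift that would project a detected face onto a rule-of-thirds point under a pinhole model; the heuristic, function name, and parameters are all assumptions for illustration.

```python
import numpy as np

def frame_subject(face_xyz, f, W, H, target=(1/3, 1/3)):
    """Hypothetical framing heuristic: place a detected face at a
    rule-of-thirds power point.

    face_xyz : (X, Y, Z) face center in the camera's frame
    f        : focal length in pixels; W, H: image size
    Returns the offset to add to scene points in the camera frame
    (equivalently, the negative of the camera translation).
    """
    X, Y, Z = face_xyz
    u_t, v_t = W * target[0], H * target[1]
    # Invert the pinhole projection u = f*X/Z + W/2 for the target u_t.
    X_new = (u_t - W / 2) * Z / f
    Y_new = (v_t - H / 2) * Z / f
    return np.array([X_new - X, Y_new - Y, 0.0])
```

The real system presumably optimizes a richer objective over pose, orientation, and focal length, informed by the detected 3D face orientation as well as position.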
A separate problem it addresses is wide-angle distortion. Front cameras on phones often use wide-angle lenses, and the resulting perspective distortion makes features closest to the lens appear unnaturally large. The system detects these distortions and adjusts the virtual camera’s intrinsics — its internal optical parameters — to restore more natural proportions. The post describes this as effectively “stepping back” from the subject after the fact, without requiring the user to physically move or reshoot.
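"Stepping back" after the fact has a textbook geometric analogue: the dolly zoom, where the camera moves away from the subject while the focal length increases so the subject's on-screen size is unchanged, flattening perspective. The sketch below shows that relation; the post does not disclose the actual correction, so this is only an illustration of the principle.

```python
def step_back(f, subject_depth, k=2.0):
    """Simulate 'stepping back' from a subject (dolly-zoom relation).

    Moving the camera back so the subject sits at k times its original
    depth, while scaling the focal length by the same factor, keeps the
    subject's projected size constant (size is proportional to f / depth)
    but reduces the perspective exaggeration of near features.
    """
    new_depth = k * subject_depth
    new_f = f * new_depth / subject_depth  # = k * f, preserving f / depth
    return new_f, new_depth
```

For a selfie shot at close range with a short focal length, this is exactly the adjustment that makes the nose-to-ear depth ratio, and hence the apparent distortion, shrink.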
For portraits specifically, this combination — reframing for better composition and correcting wide-angle distortion — makes the feature most directly useful for selfies and front-camera photos, where both problems are common.
Integration in Google Photos
The feature ships as part of the existing Auto frame UI in Google Photos. It processes eligible photos that contain people and offers the re-composed version as the second rendition option within the Auto frame candidates. The post says it is a “single-action improvement to the photo” — the user does not need to set camera parameters or interact with any 3D controls. The system selects the new viewpoint automatically.
The post emphasizes identity preservation as a design requirement. The 3D point map estimation model was specifically configured to reconstruct human bodies and faces with fidelity to limit reconstruction artifacts that could alter how a person looks. The classifier guidance step at inference reinforces this by constraining the generative model’s creative latitude in regions that contain original content.
The collaboration spanned Google DeepMind and Google’s Platforms and Devices teams. Key contributors listed in the post include Thiemo Alldieck, Marcos Seefelder, Hannah Woods, Pedro Velez, and Michael Milne, among others.
Auto frame represents a meaningful step beyond AI-assisted cropping. By grounding editing in 3D scene understanding, Google has made reframing a photo something the model can do in a principled way rather than through texture synthesis alone. Whether users notice the distinction matters less than whether the results look right — and the post’s framing of identity preservation as a central design constraint suggests that was the actual bar the team was working to meet.