Ethan Mollick reviews early access to GPT-5.5, noting capability gains and remaining gaps

This article summarises analysis from One Useful Thing by Ethan Mollick. The observations and assessments described are Mollick’s own.

Ethan Mollick, writing in One Useful Thing, reports that he was given early access to GPT-5.5 and describes it as a notable step in a continuing improvement curve, while cautioning that capability gaps remain.

Model, app, and harness framing

Mollick organises his assessment around three concepts he says readers should use to understand AI: models (the underlying AI systems, such as GPT-5.5, Claude Opus 4.6, or Gemini 3.1), apps (the products used to interact with models, such as chatgpt.com or Claude Code), and harnesses (the tool frameworks that allow models to take multi-step actions). He writes that OpenAI has made advances in all three areas with the GPT-5.5 release.

Coding performance

To illustrate GPT-5.5’s capabilities, Mollick gave the same coding prompt to several models, ranging from OpenAI’s o3 reasoning model to the open-weights Kimi K2.6 and the new GPT-5.5 Pro. He asked each to “build a procedurally generated 3D simulation showing the evolution of a harbor town from 3000 BCE to 3000 AD.” According to Mollick, only GPT-5.5 Pro actually modelled an evolving town rather than generating replacement buildings over time. He also reports a speed improvement: GPT-5.4 Pro took 33 minutes to complete the task, while GPT-5.5 Pro finished in 20 minutes.

Image generation

Mollick describes a new image model he refers to as “GPT-imagegen-2,” which he says can render high-quality text in images and handle a wide range of visual prompts. He demonstrates this with his recurring “Otter Test,” which he has used across multiple newsletter editions to track AI image quality.

Agentic use and remaining limits

Mollick describes using OpenAI’s Codex, powered by GPT-5.5, to process several hundred anonymised data files from his own prior crowdfunding research. The files were a mix of STATA, CSV, XLS, and Word formats. From just four prompts, Codex produced a literature review, a hypothesis, and a formatted academic paper. He writes that he “would have been very happy if this paper was the outcome of a 2nd year PhD project.” He also notes that the hypothesis was not particularly interesting and that the usual concerns about establishing causality still applied.

On long-form writing, Mollick identifies persistent weaknesses: “a love of the uncanny; overly complex ideas that do not fully pay off; weird metaphors; too many ornate sentences; dialogue where every character speaks in the same clipped tone.” He ends the list with a recurring quirk: “and the name ‘Mara.’”

Mollick closes by stating that GPT-5.5 is “clearly not the end of this process, but it is a noteworthy step along the way.” He adds that capability gains appear to be accelerating compared with a year earlier.

This piece is based solely on Mollick’s account in One Useful Thing. The GPT-5.5 model had not been publicly released at the time of his writing.