OpenAI released a system card for GPT-5.5 describing the predeployment safety process, the model’s intended use profile, and the methodology used to evaluate its more capable variant, GPT-5.5 Pro. The document is an official disclosure, not a technical report, and its scope reflects that: it covers how the model was tested and what safeguards were applied, rather than architecture or training details.

What the system card says the model is for

OpenAI’s system card describes GPT-5.5 as “a new model designed for complex, real-world work, including writing code, researching online, analyzing information, creating documents and spreadsheets, and moving across tools to get things done.” The framing is explicitly agentic — the model is not positioned as a chat interface but as something that executes tasks across multiple tools.

The card articulates four behavioral properties that distinguish GPT-5.5 from prior models: it understands the task earlier, asks for less guidance, uses tools more effectively, and continues checking its work until the task is complete. These are claimed relative to “earlier models” without specifying which. Each of these properties is operationally significant for agentic deployments — the ability to work with less upfront instruction and to self-verify reduces the burden on the orchestration layer and on users managing long-running tasks.
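
To make the operational point concrete, here is a minimal sketch of the orchestration loop an agentic deployment typically wraps around a model. It is an illustration under assumptions, not anything from the card: run_agent, the tool registry, and the action-dict protocol are all hypothetical names, not an OpenAI API.

```python
# Illustrative agent loop with self-verification. Every name here
# (run_agent, the action-dict protocol) is hypothetical, not an OpenAI API.
from typing import Callable

def run_agent(task: str, tools: dict[str, Callable[[str], str]],
              model: Callable[[str], dict], max_steps: int = 10) -> str:
    """The model picks a tool or finishes; a self-check gates completion."""
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = model("\n".join(transcript))
        if action["type"] == "tool_call":
            # "Uses tools more effectively": the model chooses the tool
            # and its input; the harness only dispatches the call.
            result = tools[action["tool"]](action["input"])
            transcript.append(f"TOOL {action['tool']} -> {result}")
        elif action["type"] == "finish":
            # "Continues checking its work": the model verifies its own
            # answer before the harness accepts it.
            check = model("VERIFY the final answer against the task.\n"
                          + "\n".join(transcript)
                          + f"\nANSWER: {action['answer']}")
            if check.get("verified"):
                return action["answer"]
            transcript.append(f"SELF-CHECK FAILED: {check.get('reason', '')}")
    raise RuntimeError("agent did not converge within the step budget")
```

The more of this loop's judgment the model handles itself (tool choice, stopping, verification), the thinner the harness can be, which is the operational significance behind the card's four claimed properties.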

Safety evaluation process

OpenAI says GPT-5.5 was subjected to its “full suite of predeployment safety evaluations” and evaluated under the Preparedness Framework. The safety evaluation included targeted red-teaming for advanced cybersecurity and biology capabilities — two domains that the Preparedness Framework treats as high-risk because they could plausibly accelerate harm at scale.

According to the card, OpenAI collected feedback from nearly 200 early-access partners across real use cases before release. This pre-release partner program serves multiple functions: it surfaces edge cases that internal red-teaming might miss, it provides a set of practical use cases that can be analyzed for misuse patterns, and it builds in a feedback loop before public exposure. The card describes this as part of releasing the model “with our strongest set of safeguards to date, designed to reduce misuse while preserving legitimate, beneficial uses of advanced capabilities.”

That phrase — “while preserving legitimate, beneficial uses” — is worth noting. OpenAI has calibrated its language to acknowledge that stronger safeguards carry a cost in terms of capability reduction, and that the goal is to minimize that cost rather than to maximize restriction. The system card doesn’t specify where that calibration landed or how it was measured.

GPT-5.5 Pro: same model, different setting

The system card introduces an important methodological point about GPT-5.5 Pro, which is described not as a separate model but as “the same underlying model using a setting that makes use of parallel test time compute.” The default evaluations in the card were run on GPT-5.5, and the document states that OpenAI “generally treat[s] GPT-5.5’s safety results as strong proxies for GPT-5.5 Pro.”

The qualifier “generally” is significant. The card says OpenAI separately evaluates GPT-5.5 Pro “in certain cases” where the parallel compute setting “could materially impact the relevant risks or appropriate safeguards posture.” This distinction matters for interpreting any safety claims made about the Pro variant: most of its safety characterization derives from the base model’s evaluation, with targeted additions where the enhanced compute setting changes the risk profile.
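
For readers unfamiliar with the term, “parallel test-time compute” generically means sampling several candidate answers from the same model concurrently and aggregating them. The sketch below illustrates only that generic pattern: the majority-vote aggregation and every name in it are assumptions, and the card does not say how GPT-5.5 Pro actually combines its parallel samples.

```python
# Generic sketch of parallel test-time compute: same model, n samples.
# The aggregation rule (majority vote) and all names are assumptions;
# this is not how OpenAI describes GPT-5.5 Pro's internals.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def pro_setting(prompt: str, sample: Callable[[str], str], n: int = 8) -> str:
    """Same underlying model, different setting: n samples instead of one.

    `sample` stands in for one stochastic call to the base model; it is a
    hypothetical name, not an OpenAI client method.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(sample, [prompt] * n))
    # Self-consistency-style aggregation: return the most common answer.
    # Real systems may instead rank candidates with a grader model.
    return Counter(candidates).most_common(1)[0][0]
```

Framed this way, the proxy methodology is coherent: safety-relevant behavior lives mostly in the base model’s individual samples, so separate evaluation is warranted only where aggregation itself changes the risk, for instance where the best of n attempts is meaningfully more capable than a typical one.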

The evaluation setting is also explicitly noted: the results in the system card come from “evaluations we ran in an offline setting,” meaning they are derived not from real-world deployment telemetry but from controlled pre-release testing.
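
In practice, “offline” means something shaped like the harness below rather than statistics gathered from live traffic; the function and names are, again, hypothetical stand-ins rather than anything the card describes.

```python
# Hypothetical shape of an offline safety eval: a frozen prompt set graded
# before release, as opposed to metrics computed over deployment telemetry.
from typing import Callable

def offline_eval(model: Callable[[str], str], prompts: list[str],
                 grade: Callable[[str, str], bool]) -> float:
    """Fraction of responses a grader marks safe on a fixed prompt set."""
    verdicts = [grade(p, model(p)) for p in prompts]
    return sum(verdicts) / len(verdicts)
```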

What the card does not cover

System cards are, by design, disclosure documents rather than technical papers. This one does not describe model architecture, training data, parameter counts, or benchmark numbers. It does not address pricing, API availability timelines, or comparisons to other models. Those details appear in other OpenAI materials but are outside the scope of what this card addresses.

The overall posture of the release — safety evaluation first, partner feedback before broad availability, an explicit proxy methodology for the Pro variant — reflects a process OpenAI has been building toward for several years. Whether the Preparedness Framework and red-teaming translate to meaningful risk reduction in deployment, and how the model performs when it encounters novel misuse patterns not covered by pre-release testing, are questions the card raises but cannot answer.