GPT-5.5 is OpenAI’s most capable agent AI model to date

OpenAI released GPT-5.5 on April 23rd as what it calls “a new class of intelligence for real-world work and agent enhancement.” That framework is intentional. OpenAI says this is the most capable agent AI model to date, built from the ground up to plan, use tools, see its own output, and perform tasks independently.

GPT-5.5 is the first retrained base model since GPT-4.5 and is co-designed with NVIDIA’s GB200 and GB300 NVL72 rack-scale systems. The real difference, the company says, is that with GPT 5.5, tasks that previously required multiple prompts and human “course correction” can now be taken over more completely. This model is being rolled out to Plus, Pro, Business, and Enterprise users of ChatGPT and Codex. API access continued on April 24th.

benchmark

OpenAI’s strongest performance is in Terminal-Bench 2.0, a benchmark that tests command-line workflows that require planning and tool coordination in a sandbox environment. GPT-5.5 scores 82.7%, compared to 75.1% for GPT-5.4 and 69.4% for Claude Opus 4.7.

In SWE-Bench Pro, which evaluates GitHub issue resolution, GPT-5.5 reached 58.6%, solving more issues in a single pass than previous versions. OpenAI also introduced Expert-SWE, an internal benchmark with a median human-estimated completion time for tasks of 20 hours. GPT-5.5 scored 73.1%, up from 68.5% for GPT-5.4.

On Long Context Inference, 1 million token MRCR v2 (a search benchmark that tests whether a model can find a specific answer buried in a large document), GPT-5.5 scores 74.0% compared to 36.6% for GPT-5.4.

However, in MCP Atlas, Scale AI’s model context protocol tool usage benchmark, Claude Opus 4.7 leads with 79.1% and no score was recorded in GPT-5.5. OpenAI included that lack in its own benchmark table. This at least shows the company’s confidence in the big picture.

Token efficiency, pricing reality

API access is priced at USD 5 per million input tokens and USD 30 per million output tokens, which is exactly double the price of GPT-5.4. OpenAI’s defense is that GPT-5.5 completes the same Codex task with fewer tokens than GPT-5.4, so the effective cost is about 20% higher when efficiency is taken into account, a claim verified by independent testing organization Artificial Analysis.

GPT-5.5 Pro is available to Pro, Business, and Enterprise users and priced at USD 30 per million input tokens and USD 180 per million output tokens. It applies additional parallel test-time computations for more difficult problems and leads 90.1% of the list of published models in BrowseComp, OpenAI’s agent web browsing benchmark.

For token efficiency, it’s worth stress testing against real-world workloads before committing to switching models. For 10 million output tokens per month, GPT-5.5 standard costs $300 vs. $250 for Claude Opus 4.7, 20% less task repetition and retries due to the model’s superior agent performance, and calculations vary by use case.

actually

According to Open AI, more than 85% of employees now use Codex on a weekly basis in departments such as engineering and marketing. In one example, the communications team used GPT-5.5 to process six months’ worth of speaking request data. This model was able to build a scoring and risk framework to automate low-risk approvals.

Greg Brockman described the release as “a real step towards the kind of computing we can expect in the future,” while lead scientist Jakub Paciocki noted that the model’s progress over the past two years has felt “surprisingly slow.”

OpenAI says GPT-5.5 matches the per-token latency of GPT-5.4 in production while providing performance with a higher level of intelligence. Larger, more capable models often take longer to serve, but that tradeoff was avoided here.

Whether benchmarking leads to increased productivity for the teams running the actual agent pipeline is a question that will take several weeks to properly answer. Terminal-Bench scores are promising for unattended terminal agents and DevOps automation. The gap in MCP Atlas is notable for those building heavily on orchestration with tools.

See also: OpenAI brings GPT-5.5 to Codex for coding tasks

(Image source: “The Agent” Fossil Watch by MarkGregory007 is licensed under CC BY-NC-SA 2.0.)

Want to learn more about AI and big data from industry leaders? Check out the AI & Big Data Expos in Amsterdam, California, and London. This comprehensive event is part of TechEx and co-located with other major technology events such as Cyber Security & Cloud Expo. Click here for more information.

AI News is brought to you by TechForge Media. Learn about other upcoming enterprise technology events and webinars.

versatileai

See Full Bio

What's Hot

GPT-5.5 is OpenAI’s most capable agent AI model to date

What is optical interconnect and why Lightelligence’s $10 billion debut claims it’s important for AI

Adaptive ultrasound imaging with physics-based NV-Raw2Insights-US AI

What is optical interconnect and why Lightelligence’s $10 billion debut claims it’s important for AI

Adaptive ultrasound imaging with physics-based NV-Raw2Insights-US AI

A billion-dollar startup with different ideas for AI

Disney invests $1 billion in OpenAI, licenses over 200 characters for Sora AI tool

The most comprehensive evaluation suite for GUI agents!

Diffusers welcome stable spread 3

Most Popular

Disney invests $1 billion in OpenAI, licenses over 200 characters for Sora AI tool

The most comprehensive evaluation suite for GUI agents!

Diffusers welcome stable spread 3

Don't Miss

GPT-5.5 is OpenAI’s most capable agent AI model to date

What is optical interconnect and why Lightelligence’s $10 billion debut claims it’s important for AI

Adaptive ultrasound imaging with physics-based NV-Raw2Insights-US AI

Subscribe to Updates

What's Hot

GPT-5.5 is OpenAI’s most capable agent AI model to date

benchmark

Token efficiency, pricing reality

actually

Related Posts