OpenAI introduces sandbox execution that enables enterprise governance teams to deploy automated workflows while controlling risk.
Teams moving systems from prototype to production face difficult architectural trade-offs over where the orchestration logic lives. Model-agnostic frameworks offer initial flexibility but rarely exploit the full capabilities of a frontier model, while model providers' own SDKs stay close to the underlying model yet often give developers too little visibility into the control harness.
Further complicating matters, while managed agent APIs have simplified the deployment process, they place severe restrictions on where the system can run and how sensitive corporate data can be accessed. To solve this, OpenAI is introducing new features in the Agent SDK to provide developers with a standardized infrastructure featuring model-native harnessing and native sandbox execution.
The updated infrastructure adapts execution to the natural behavior patterns of the underlying model, improving reliability when tasks must be coordinated across different systems. Oscar Health provides an example of this efficiency with unstructured data.
The health insurer tested the new infrastructure to automate clinical record workflows that could not be handled reliably with older approaches. Its engineering team needed an automated system to extract the correct metadata while correctly understanding patient contact boundaries within complex medical files. By automating this process, providers can more quickly analyze patient history, speed care coordination, and improve the overall member experience.
Rachael Burns, staff engineer and AI technology lead at Oscar Health, said: “With the updated Agent SDK, we are now production-ready to automate critical clinical record workflows that our previous approach could not handle reliably enough.
“The difference for us was being able to not only extract the right metadata, but also correctly understand the boundaries of each patient in a long, complex record. As a result, we were able to more quickly understand what was happening with each patient on a given visit, helping our members with their care needs and improving their experience with us.”
OpenAI uses model-native harnesses to optimize AI workflows
To deploy these systems, engineers must manage vector database synchronization, control hallucination risks, and optimize expensive computational cycles. Without a standard framework, internal teams often resort to building brittle custom connectors to manage these workflows.
The new model-native harness reduces this friction by introducing configurable memory, sandbox-aware orchestration, and file-system tools of the kind used by Codex. Developers can integrate standardized primitives such as tool use via MCP, custom instructions via AGENTS.md, and file editing via patching tools.
With skills invoked through shell tools and progressive disclosure through code execution, the system can also perform complex tasks step by step. This standardization lets engineering teams spend less time maintaining core infrastructure and more time building the domain-specific logic that directly benefits the business.
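To make the pattern concrete, the sketch below shows the general shape of such a harness loop in plain Python: the model proposes an action, the harness dispatches it to a registered tool, and the result is appended to the transcript. All names here (`Tool`, `run_harness`, `make_scripted_model`) are hypothetical illustrations, not the Agent SDK's actual API.

```python
# Illustrative harness loop -- these names are hypothetical,
# not part of the Agent SDK's real API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]

def run_harness(model_step, tools, prompt):
    """Ask the model for its next action and dispatch tools until it answers."""
    transcript = prompt
    while True:
        action = model_step(transcript)  # {"tool": ..., "arg": ...} or {"answer": ...}
        if "answer" in action:
            return action["answer"]
        result = tools[action["tool"]].run(action["arg"])
        transcript += f"\n[{action['tool']}] {result}"

def make_scripted_model(script):
    """Stand-in for a real model: replays a fixed list of actions."""
    it = iter(script)
    return lambda transcript: next(it)

tools = {"shell": Tool("shell", lambda cmd: f"ran: {cmd}")}
model = make_scripted_model([
    {"tool": "shell", "arg": "grep -c patient record.txt"},
    {"answer": "done"},
])
print(run_harness(model, tools, "count patient mentions"))  # prints "done"
```

The point of the loop is that the tool registry, not the model, decides what code actually runs on each dispatched action.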
Precise routing is required to integrate autonomous programs into traditional technology stacks. When autonomous processes access unstructured data, they rely heavily on search systems to retrieve relevant context.
To manage the integration of diverse architectures and limit operational scope, the SDK introduces a manifest abstraction that standardizes how developers describe workspaces, letting them mount local files and define output directories.
Teams can connect these environments directly to leading enterprise storage providers such as AWS S3, Azure Blob Storage, Google Cloud Storage, and Cloudflare R2. Establishing a predictable workspace gives the model clear expectations about where to find inputs, write outputs, and stay organized during long production runs.
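As a rough illustration of what such a manifest captures, the sketch below pairs mounted inputs with a declared output directory. The schema (`Mount`, `WorkspaceManifest`) is hypothetical and not the SDK's actual format:

```python
# Hypothetical workspace manifest -- illustrative only, not the SDK's schema.
from dataclasses import dataclass, field

@dataclass
class Mount:
    source: str          # e.g. a local path or an object-store URI
    target: str          # path inside the sandbox workspace
    read_only: bool = True

@dataclass
class WorkspaceManifest:
    mounts: list = field(default_factory=list)
    output_dir: str = "/workspace/out"

    def validate(self):
        """Reject manifests that mount two sources onto the same target."""
        targets = [m.target for m in self.mounts]
        if len(targets) != len(set(targets)):
            raise ValueError("duplicate mount targets")

manifest = WorkspaceManifest(
    mounts=[Mount("s3://records/raw", "/workspace/in")],
    output_dir="/workspace/out",
)
manifest.validate()
```

Declaring mounts and outputs up front is what lets governance teams reason about exactly which data a run could touch.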
This predictability prevents the system from querying the unfiltered data lake and limits queries to a specific validated context window. Data governance teams can then better track the origin of any automated decisions from local prototype stage to production deployment.
Enhanced security with native sandbox execution
The SDK natively supports sandboxed execution, providing an out-of-the-box layer that allows you to run your programs within a controlled computer environment that includes the necessary files and dependencies. Engineering teams no longer need to manually piece together this execution layer. Deploy your own custom sandbox or take advantage of built-in support from providers like Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, Vercel, and more.
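The shape of that execution layer can be sketched locally: run generated code in a scratch directory, in a separate process, with a timeout and a stripped environment. A real sandbox provider adds container- or VM-level isolation on top of this; `run_in_sandbox` is a hypothetical name used purely for illustration.

```python
# Minimal local "sandbox": a separate process in a scratch directory,
# with a timeout and no inherited secrets. Real providers add
# container/VM isolation; this only illustrates the shape.
import os
import subprocess
import sys
import tempfile

def run_in_sandbox(code: str, timeout: float = 5.0) -> str:
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "task.py")
        with open(script, "w") as f:
            f.write(code)
        proc = subprocess.run(
            [sys.executable, script],
            cwd=workdir,
            env={"PATH": os.environ.get("PATH", "")},  # no API keys leak in
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.stdout

print(run_in_sandbox("print(2 + 2)"))  # prints 4
```

Even in this toy version, the two governance-relevant properties are visible: the process sees only a scratch workspace, and its environment carries no credentials.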
Risk mitigation remains a top concern for enterprises deploying autonomous code execution. Security teams should expect that any system that reads external data or executes generated code will face prompt injection attacks and exfiltration attempts.
OpenAI addresses this security requirement by separating the control harness from the compute layer, keeping credentials isolated from the environment in which model-generated code runs. Separating the execution layer prevents injected malicious commands from reaching the central control plane or stealing primary API keys, protecting the broader corporate network from lateral-movement attacks.
This separation also addresses the issue of computational cost regarding system failures. Long-running tasks often fail prematurely due to network timeouts, container crashes, or API limitations. If a complex agent executes 20 steps to create a financial report and fails at step 19, rerunning the entire sequence consumes expensive computing resources.
Under the new architecture, losing a sandbox container no longer means losing the entire production run. Because system state is externalized, the SDK can use built-in snapshots and rehydration: if the original environment expires or fails, the infrastructure restores state in a new container and resumes exactly from the last checkpoint. Eliminating restarts of expensive, long-running processes translates directly into reduced cloud computing spend.
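The checkpointing pattern itself is straightforward to sketch. In the hypothetical example below, state is persisted to an external store after every step, so a replacement "container" (here, simply a second call to `run`) resumes without re-executing completed work. `CheckpointStore` and `run` are illustrative names, not SDK APIs.

```python
# Externalized checkpointing sketch: state lives outside the sandbox,
# so a replacement container resumes from the last completed step.
import json
import os
import tempfile

class CheckpointStore:
    def __init__(self, path):
        self.path = path

    def save(self, step, state):
        with open(self.path, "w") as f:
            json.dump({"step": step, "state": state}, f)

    def load(self):
        if not os.path.exists(self.path):
            return 0, {}
        with open(self.path) as f:
            d = json.load(f)
        return d["step"], d["state"]

def run(steps, store):
    start, state = store.load()               # rehydrate from last checkpoint
    for i in range(start, len(steps)):
        state = steps[i](state)
        store.save(i + 1, state)              # checkpoint after each step
    return state

store = CheckpointStore(os.path.join(tempfile.mkdtemp(), "ckpt.json"))
calls = []
def step_a(state): calls.append("a"); return {**state, "a": 1}
def step_b(state): calls.append("b"); return {**state, "b": 2}

run([step_a, step_b], store)   # first "container" completes both steps
run([step_a, step_b], store)   # second "container" resumes: nothing re-runs
```

After the second call, `calls` still contains each step exactly once, which is the whole economic argument: a crash at step 19 of 20 costs one step, not nineteen.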
Scaling these operations requires dynamic resource allocation. The isolated architecture allows you to invoke single or multiple sandboxes at runtime based on the current load, route specific subagents to isolated environments, and parallelize tasks across many containers to reduce execution time.
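A minimal sketch of that fan-out pattern follows, with worker threads standing in for real sandbox containers; `run_subagent` and `fan_out` are hypothetical names, not SDK calls.

```python
# Fan subtasks out across multiple "sandboxes" in parallel.
# Worker threads stand in for the real containers a provider would run.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Placeholder for dispatching one subtask to an isolated environment.
    return f"done:{task}"

def fan_out(tasks, max_sandboxes=4):
    with ThreadPoolExecutor(max_workers=max_sandboxes) as pool:
        return list(pool.map(run_subagent, tasks))

print(fan_out(["extract", "summarize", "validate"]))
```

Capping `max_sandboxes` is the dynamic-allocation knob: raise it under load to shorten wall-clock time, lower it to bound spend.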
These new features are generally available to all customers via API, with standard pricing based on token and tool usage without requiring custom procurement agreements. The new harness and sandbox features are launching first for Python developers, with TypeScript support planned for a future release.
OpenAI plans to introduce additional features such as code modes and subagents to both the Python and TypeScript libraries, and intends to expand the broader ecosystem over time by supporting additional sandbox providers and giving developers more ways to connect the SDK directly to their existing internal systems.
AI News is brought to you by TechForge Media.

