OpenAI’s financial trajectory depends heavily on infrastructure costs, which are the driving force behind its new custom chip, Jalapeño. The application-specific integrated circuit (ASIC), developed in conjunction with Broadcom, is a direct attempt to reduce the large capital expenditures associated with third-party hardware.
Nvidia currently earns an estimated 75% profit margin on its high-end processors, while OpenAI operates on tighter margins, keeping about 33 cents on the dollar after accounting for its large operating expenses. Running large language models at scale is expensive.
Last year, OpenAI spent US$8.4 billion to keep ChatGPT’s servers responsive. The platform now attracts 900 million weekly users, and its operating costs are expected to reach approximately US$14 billion this year. Over the next eight years, OpenAI has committed approximately US$1.4 trillion to computing power. That is a huge gamble for a company that currently generates US$25 billion in annual revenue.
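To put those figures side by side, here is a rough back-of-envelope sketch. It uses only the numbers quoted above and makes two assumptions the article does not state: that the US$1.4 trillion commitment is spread evenly over the eight years, and that last year's serving cost and this year's expected operating cost are directly comparable. The function and variable names are illustrative only.

```python
# Back-of-envelope arithmetic using only figures quoted in the article (US dollars).
COMPUTE_COMMITMENT = 1.4e12   # committed to computing power
COMMITMENT_YEARS = 8          # "over the next eight years"
ANNUAL_REVENUE = 25e9         # current annual revenue
COST_LAST_YEAR = 8.4e9        # cost of keeping ChatGPT servers responsive last year
COST_THIS_YEAR = 14e9         # expected operating costs this year
WEEKLY_USERS = 900e6          # weekly users


def summarize() -> dict[str, float]:
    """Derive a few ratios from the reported figures."""
    # Assumption: the commitment is spread evenly across the eight years.
    avg_annual_commitment = COMPUTE_COMMITMENT / COMMITMENT_YEARS
    return {
        "avg_annual_commitment": avg_annual_commitment,
        "commitment_vs_revenue": avg_annual_commitment / ANNUAL_REVENUE,
        # Assumption: the two cost figures measure the same thing.
        "cost_growth": COST_THIS_YEAR / COST_LAST_YEAR - 1.0,
        "cost_per_weekly_user": COST_THIS_YEAR / WEEKLY_USERS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

Under the even-spread assumption, the commitment averages US$175 billion a year, or seven times current annual revenue, and this year's expected operating cost is roughly two-thirds higher than last year's.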
Designing hardware for LLM inference
Dubbed the company’s first “intelligence processor,” the OpenAI Jalapeño chip is purpose-built for large language model (LLM) inference rather than general-purpose AI workloads. OpenAI provided the core architecture design, based on its model roadmap and serving system, while Broadcom handled the silicon engineering and high-performance networking integration.
TSMC handles physical manufacturing in Taiwan, while Celestica builds the boards and rack systems. According to OpenAI, early lab samples are already running state-of-the-art workloads, including an unreleased GPT-5.3-Codex-Spark model, at the targeted production frequency and power.
Richard Ho, head of hardware programs at OpenAI, said the architecture minimizes data movement and brings actual utilization closer to theoretical peak performance. Unlike general-purpose accelerators built for traditional AI workloads, it balances compute, memory, and networking resources specifically to solve the data movement bottlenecks inherent in interactive LLM services.
To achieve this at scale, the platform integrates Broadcom’s Tomahawk networking silicon directly into the design, allowing custom processors to communicate across large clustered data center environments.
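The gap between actual utilization and theoretical peak that Ho describes can be illustrated with the standard roofline model, in which attainable throughput is the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved). This is a generic sketch, not OpenAI's or Broadcom's method. The accelerator numbers below are hypothetical and are not Jalapeño specifications, and the intensity estimate assumes weight reads dominate memory traffic during token generation, ignoring KV-cache and activation traffic.

```python
# Roofline sketch: why data movement caps utilization in LLM token generation.
# All hardware numbers are hypothetical, chosen only to make the arithmetic easy.
PEAK_FLOPS = 1e15        # hypothetical peak compute: 1,000 TFLOP/s
MEM_BANDWIDTH = 4e12     # hypothetical memory bandwidth: 4 TB/s


def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * FLOPs-per-byte)."""
    return min(peak_flops, mem_bandwidth * intensity)


def decode_intensity(batch_size: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs per byte for one decoding step.

    Each parameter is read once (bytes_per_param bytes) and used for one
    multiply-add (2 FLOPs) per sequence in the batch. Assumes weight traffic
    dominates; KV-cache and activation traffic are ignored.
    """
    return 2.0 * batch_size / bytes_per_param


def utilization(batch_size: int) -> float:
    """Fraction of peak compute reachable at a given batch size."""
    achieved = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, decode_intensity(batch_size))
    return achieved / PEAK_FLOPS


if __name__ == "__main__":
    for batch in (1, 16, 64, 250, 1000):
        print(f"batch={batch:>4}  utilization={utilization(batch):.1%}")
```

With these made-up numbers, a single interactive request reaches well under 1% of peak compute because the chip spends its time waiting on memory, and only at a batch size of 250 does the workload become compute-bound. Interactive services cannot always batch that aggressively, which is why balancing memory and networking against compute matters for this kind of workload.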
Vertically integrated flywheel
By moving to custom silicon, OpenAI shifts from operating only at the software layer to being a vertically integrated infrastructure company. This full-stack strategy spans the entire pipeline: chip architecture, software kernels, memory system, network scheduling, and the final application layer. Similar to the tight coupling between Apple’s proprietary hardware and iOS, OpenAI can now optimize its infrastructure against its own internal model roadmap.
This integration powers a continuously turning flywheel. Improved infrastructure efficiency reduces the cost of both training and serving models. Cheaper serving enables better, more responsive products, which grow the user base and the revenue that can be reinvested in next-generation custom infrastructure.
Overcoming the latecomer disadvantage
By introducing its own silicon, OpenAI enters a landscape where its main competitors have spent nearly a decade developing their own hardware. Google began rolling out Tensor Processing Units (TPUs) in 2015 and now manages about a quarter of the world’s AI computing power outside of Nvidia’s supply chain.
Amazon has shipped more than 1 million custom chips, while Meta and Microsoft continue to expand their own infrastructure.
“Jalapeño is part of a long-term full-stack infrastructure strategy to make computing richer,” said Greg Brockman, president and co-founder of OpenAI. “By designing more of the stack in-house, we can deliver more intelligence more efficiently.”
To close this gap, OpenAI accelerated the development phase. The OpenAI Jalapeño chip went from a clean-sheet design to tape-out, the final design step before physical production, in just nine months. The engineering team met this schedule by using OpenAI’s own language models to automate and optimize parts of the hardware design process.
This creates a feedback loop in which the models served to users are also used to build the physical infrastructure that will run future iterations. Initial deployment of the hardware in data centers is expected to begin by the end of 2026.
Broadcom CEO Hock Tan confirmed that the company is working with infrastructure partners, including Microsoft, to scale its deployment in preparation for gigawatt-scale data center consolidation.
(Photo provided by OpenAI)
AI News is brought to you by TechForge Media.

