As inference costs rise, enterprises are rethinking their AI infrastructure

By versatileai | November 24, 2025

AI spending in Asia Pacific continues to rise, but many companies still struggle to derive value from their AI projects. Most systems are not built to perform inference at the speed or scale that real-world applications require, and the underlying infrastructure is often the limiting factor. Industry research shows that despite significant investments in GenAI tools, many projects fail to meet ROI goals because of these infrastructure constraints.

This gap shows how much AI infrastructure impacts performance, cost, and the ability to scale real-world deployments within a region.

Akamai seeks to address this challenge with Inference Cloud, built with NVIDIA and powered by the latest Blackwell GPUs. The idea is simple. If most AI applications need to make decisions in real time, those decisions should be made close to the user rather than in a faraway data center. Akamai claims that this change will allow enterprises to control costs, reduce latency, and support AI services that rely on instantaneous responses.

Jay Jenkins, CTO of Cloud Computing at Akamai, explained to AI News why this moment is forcing enterprises to rethink how they deploy AI, and why inference, not training, is the real bottleneck.

Why AI projects are difficult without the right infrastructure

Jenkins says the gap between experimentation and full-scale implementation is much larger than many organizations expect. “Many AI initiatives fail to deliver the expected business value because companies often underestimate the gap between experiment and production,” he says. Despite strong interest in GenAI, progress is often hampered by high infrastructure costs, high latency, and the difficulty of running models at scale.


Most enterprises still rely on centralized clouds and large GPU clusters. However, as usage increases, these setups become too expensive, especially in regions far from major cloud zones. Latency also becomes a big issue when a model needs to perform multiple inference steps over long distances. “AI is only as powerful as the infrastructure and architecture it runs on,” Jenkins said, adding that delays often undermine the user experience and the value the business expected AI to deliver. He also cited multi-cloud setups, complex data rules, and increased compliance needs as common obstacles that slow the transition from pilot projects to production.

Why inference requires more attention than training

Across the Asia-Pacific region, AI adoption is moving from small-scale pilots to live implementation in apps and services. Jenkins points out that when this happens, it is the daily inference that consumes the most computing power, not the occasional training cycle. As organizations deploy language, vision, and multimodal models across multiple markets, the demand for fast and reliable inference is growing faster than expected. This is why inference, not training, is the main constraint in the region. Models must now operate across a variety of languages, regulations, and data environments, often in real time, which puts tremendous pressure on centralized systems that were never designed for this level of responsiveness.
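To make the training-versus-inference point concrete, here is a rough back-of-envelope sketch. Every figure in it is an assumption chosen for illustration, not a number from the article: a model is trained (or fine-tuned) once, while inference compute accrues on every request.

```python
# Illustrative back-of-envelope: one-off training compute vs. ongoing
# inference compute. All figures are assumptions, not measurements.

TRAIN_GPU_HOURS = 50_000          # assumed one-off training/fine-tuning cost
INFER_GPU_SECONDS_PER_REQ = 0.5   # assumed GPU time per inference request
REQUESTS_PER_DAY = 2_000_000      # assumed production traffic

daily_infer_gpu_hours = REQUESTS_PER_DAY * INFER_GPU_SECONDS_PER_REQ / 3600
days_to_match_training = TRAIN_GPU_HOURS / daily_infer_gpu_hours

print(f"Inference uses {daily_infer_gpu_hours:,.0f} GPU-hours per day")
print(f"It matches the one-off training cost in {days_to_match_training:.0f} days")
```

At these assumed volumes, serving traffic outspends the entire training run in roughly six months, and the gap only widens as usage grows.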

How edge infrastructure improves AI performance and cost

Moving inference closer to the user, device, or agent could reshape the cost equation, Jenkins said. Doing so reduces the distance data has to travel and makes models respond faster. It also avoids the cost of routing large amounts of data between major cloud hubs.

Physical AI systems (robots, autonomous machines, smart city tools, etc.) rely on millisecond decisions. These systems do not perform as expected when inference is performed remotely.

Savings from more localized deployments can also be significant. Jenkins said Akamai’s analysis found that companies in India and Vietnam significantly reduced the cost of running image generation models by placing workloads at the edge rather than in a centralized cloud. Improved GPU utilization and lower egress charges played a large role in these savings.
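As a sketch of how such a comparison might be framed, the toy model below weighs GPU utilization and egress charges for a distant centralized region against a nearby edge site. All rates, volumes, and utilization figures are hypothetical placeholders, not the numbers from Akamai’s analysis:

```python
# Hypothetical cost comparison: serving an image-generation model from a
# distant central region vs. a nearby edge site. All numbers are invented
# for illustration.

def monthly_cost(requests, gpu_hour_rate, gpu_secs_per_req, utilization,
                 egress_gb, egress_rate_per_gb):
    gpu_hours = requests * gpu_secs_per_req / 3600 / utilization
    return gpu_hours * gpu_hour_rate + egress_gb * egress_rate_per_gb

REQUESTS = 10_000_000  # assumed monthly request volume

central = monthly_cost(REQUESTS, gpu_hour_rate=4.0, gpu_secs_per_req=2.0,
                       utilization=0.35,           # poor packing, assumed
                       egress_gb=50_000, egress_rate_per_gb=0.09)
edge = monthly_cost(REQUESTS, gpu_hour_rate=4.5, gpu_secs_per_req=2.0,
                    utilization=0.65,              # better packing, assumed
                    egress_gb=5_000, egress_rate_per_gb=0.05)

print(f"central: ${central:,.0f}/mo  edge: ${edge:,.0f}/mo "
      f"({1 - edge / central:.0%} saved)")
```

Under these assumptions, better GPU packing and lower egress do most of the work, which mirrors the two factors Jenkins highlights.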

Where edge-based AI gains traction

Early demand for edge inference will be strongest in industries where even small delays can impact revenue, safety, and user engagement. Retail and e-commerce are among the first adopters, as shoppers often abandon slow experiences. Personalized recommendations, search, and multimodal shopping tools all perform better when inference is local and fast.

Finance is another area where latency directly impacts value. Jenkins said workloads such as fraud checking, payment authorization, and transaction scoring rely on a series of AI decisions made in milliseconds. Performing inference closer to where data is created enables financial companies to respond faster and keep data within regulatory boundaries.

Why cloud and GPU partnerships are even more important now

As AI workloads grow, enterprises need infrastructure that can keep pace. Jenkins says this has led to closer collaboration between cloud providers and GPU manufacturers. One example is Akamai’s collaboration with NVIDIA, which deploys GPUs, DPUs, and AI software across thousands of edge locations.

The idea is to build an “AI distribution network” that distributes inference across many sites, rather than concentrating everything in a few regions. This helps improve performance, but also supports compliance. Jenkins points out that nearly half of large organizations in APAC struggle with different data rules across markets, making local processing more important. Now, new partnerships are shaping the next phase of AI infrastructure in the region, especially for workloads that rely on low-latency responses.
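A toy illustration of what routing in such a distribution network could look like: pick the lowest-latency edge site that also satisfies a data-residency rule. The site list, fields, and function below are hypothetical, invented for this sketch rather than drawn from Akamai’s API:

```python
from dataclasses import dataclass

@dataclass
class EdgeSite:
    name: str
    country: str
    rtt_ms: float       # measured round-trip time from the client
    has_capacity: bool

def pick_site(sites, required_country=None):
    """Choose the lowest-latency site with capacity; if a data-residency
    rule applies, only sites in the required country are eligible."""
    eligible = [s for s in sites if s.has_capacity
                and (required_country is None or s.country == required_country)]
    if not eligible:
        raise RuntimeError("no compliant edge site available")
    return min(eligible, key=lambda s: s.rtt_ms)

# Hypothetical sites and a request that must stay in-country (e.g. India).
sites = [EdgeSite("mumbai-1", "IN", 12.0, True),
         EdgeSite("singapore-1", "SG", 38.0, True),
         EdgeSite("tokyo-1", "JP", 65.0, True)]
print(pick_site(sites, required_country="IN").name)  # -> mumbai-1
```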

These systems have security built in from the beginning, Jenkins said. Zero trust controls, data-aware routing, and fraud and bot protection are becoming standard parts of the technology stack offered.

Infrastructure required to support agent AI and automation

Running agent systems that make many decisions in sequence requires infrastructure that can operate at millisecond speeds. Jenkins believes the region’s diversity makes this difficult, but not impossible. Connectivity, regulation, and technical readiness vary widely from country to country, so AI workloads must be flexible enough to run wherever it makes the most sense. He points to research showing that most companies in the region already use public clouds in production, and expects many to rely on edge services by 2027. That transition requires infrastructure that can keep data in-country, route tasks to the nearest appropriate location, and keep functioning even when networks are unstable.
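The millisecond requirement compounds because each step in an agent’s chain pays the network round trip again. A quick calculation with assumed figures shows why distance dominates:

```python
# Illustrative: total latency of an agent that makes sequential model
# calls. RTT and inference-time figures are assumptions.

def agent_latency_ms(steps, rtt_ms, inference_ms):
    return steps * (rtt_ms + inference_ms)

STEPS = 12           # assumed tool-use / decision steps per task
INFERENCE_MS = 40    # assumed per-call model latency

central = agent_latency_ms(STEPS, rtt_ms=180, inference_ms=INFERENCE_MS)
edge = agent_latency_ms(STEPS, rtt_ms=8, inference_ms=INFERENCE_MS)

print(f"central region: {central:,} ms per task")   # 2,640 ms
print(f"nearby edge:    {edge:,} ms per task")      # 576 ms
```

With these assumptions, a twelve-step agent task takes over 2.5 seconds from a distant region but stays near half a second at the edge, even though the per-step model latency is identical in both cases.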

What companies need to prepare for next

As inference moves to the edge, enterprises need new ways to manage operations. Jenkins says organizations should expect a more distributed AI lifecycle where models are updated across many sites. This requires better orchestration and strong visibility into the performance, cost, and errors of core and edge systems.

Data governance becomes more complex in a distributed model, but individual requirements are easier to meet when processing stays local. With half of the region’s large enterprises already struggling with regulatory disparities, locating inference closer to where data is generated helps.

Security also requires additional attention. Extending inference to the edge improves resiliency, but it also means every site must be secured. Businesses need to protect their APIs and data pipelines from fraud and bot attacks. Jenkins points out that many financial institutions already rely on Akamai in these areas.

