The field of artificial intelligence (AI) has experienced explosive growth in recent years, driven by advances in large language models (LLMs) and breakthroughs in deep learning. However, many experts argue that the relentless scaling of text-based models to billions or even trillions of parameters has reached a point of diminishing returns. So where is the next frontier in AI? Dhwanit Agarwal, who holds a PhD in computational science from the University of Texas at Austin and a gold medal from IIT Kanpur, is a recognized leader in machine learning and generative AI. According to Agarwal, the future may lie in vision AI at scale, and in particular in the controllable generation of images and videos.
Scaling LLMs: limits to growth
Over the past few years, text-based models such as GPT have grown to staggering sizes, from 400 billion to over 2 trillion parameters, with context windows expanded to handle 2 million or more tokens. This brute-force scaling relies on vast amounts of data and compute, and it has undoubtedly revolutionized natural language processing (NLP).
But researchers like Dhwanit Agarwal believe we are approaching a plateau. From his vantage point as an AI engineer with more than 10 patents and numerous papers in prestigious conferences such as CVPR and journals such as the Journal of Computational Physics, Dhwanit explains:
“Data resources for text-based models are becoming saturated, and simply making these models bigger already faces diminishing returns.”
In essence, the once exponential benefits of scaling LLMs may soon diminish, prompting the AI community to explore other avenues for innovation.
Vision AI’s untapped potential
Although LLMs have reached unprecedented scale, generative vision models for images and video remain significantly smaller, typically limited to around 30 billion parameters. This is only a fraction of the scale LLMs have reached, leaving plenty of room for growth in the vision AI space.
More data, less saturation
Unlike textual data, which is approaching saturation, the world of visual data such as images and videos remains vast and underutilized. The magnitude of training data and model parameters in this area has not yet matched the levels seen in LLMs, indicating great potential for further development.
Controllable Generation: The Next Leap
The future of visual content creation goes beyond scaling to exercising more control over the output produced. Current state-of-the-art models often behave like “broad brushes”, producing loosely guided output from prompts. Truly breakthrough applications, Dhwanit emphasizes, require greater precision:
“To truly disrupt the media industry, we need finer brushes—advanced models that allow artists and designers to manipulate the style, composition, and detail of their work with surgical precision.”
The shift to controllable, AI-driven generation has the potential to transform industries from entertainment to advertising and to create significant economic value.
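As an illustration only, this “finer brush” idea can be pictured as exposing explicit control knobs instead of a single free-text prompt. The sketch below is hypothetical Python; no real model is called, and the names `GenerationRequest` and `build_conditioning` are invented for this example.

```python
from dataclasses import dataclass

@dataclass
class GenerationRequest:
    """Hypothetical request for a controllable image model."""
    prompt: str
    style: str = "photorealistic"        # e.g. "watercolor", "line art"
    composition: str = "rule-of-thirds"  # layout constraint
    detail_strength: float = 0.5         # 0.0 = loose guidance, 1.0 = surgical precision

def build_conditioning(req: GenerationRequest) -> dict:
    """Turn explicit controls into a conditioning payload a model could consume.
    This is a sketch: a real system would map these fields to control signals
    (edge maps, depth maps, style embeddings), not plain strings."""
    if not 0.0 <= req.detail_strength <= 1.0:
        raise ValueError("detail_strength must be in [0, 1]")
    return {
        "prompt": req.prompt,
        "controls": {
            "style": req.style,
            "composition": req.composition,
            "detail_strength": req.detail_strength,
        },
    }

req = GenerationRequest(prompt="a harbor at dawn", style="watercolor", detail_strength=0.8)
payload = build_conditioning(req)
```

The design choice the example hints at is separating *what* to generate (the prompt) from *how* to render it (style, composition, detail), so artists can adjust one without rewriting the other.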
AI agents: bridging models and tasks
While generative vision models are growing, another exciting development is the rise of AI agents: systems that link multiple generative AI models and external tools to complete complex, multi-step tasks.
Connecting models for practical applications
Imagine an AI-driven workflow that combines:
- Text generation for analyzing research reports
- Vision AI for creating attractive advertising graphics
- Domain-specific tools such as project management software and equity research platforms
AI agents can orchestrate these diverse functions, saving countless hours and achieving unprecedented productivity. Whether it’s equity research or media production, these agent systems can perform complex workflows that previously required significant human oversight.
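A minimal sketch of how such an agent might chain tools, assuming plain Python callables stand in for the text, vision, and domain-specific models. All names here (the `Agent` class, the tool names) are invented for illustration, not a real framework.

```python
from typing import Callable

class Agent:
    """Toy orchestrator: runs registered tools in sequence,
    feeding each step's output into the next step."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def run(self, plan: list[str], task: str) -> str:
        result = task
        for step in plan:
            result = self.tools[step](result)  # output of one step feeds the next
        return result

# Stand-ins for real models/tools (hypothetical):
agent = Agent()
agent.register("summarize_report", lambda text: f"summary({text})")
agent.register("make_ad_graphic", lambda brief: f"image_for({brief})")
agent.register("file_in_pm_tool", lambda asset: f"ticket({asset})")

output = agent.run(
    ["summarize_report", "make_ad_graphic", "file_in_pm_tool"],
    "Q3 equity research report",
)
# output == "ticket(image_for(summary(Q3 equity research report)))"
```

A production agent would add planning (choosing the tool sequence itself), error handling, and human review points, but the core pattern, a registry of tools plus a loop that threads outputs into inputs, is the same.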
Why agents matter
AI agents bridge the gap between large-scale generative capabilities and specialized tasks by “thinking” and “acting” across multiple domains. This synergy could be the next big milestone in AI after the LLM boom.
Academics and R&D: Reviving Innovation
Due to the costly nature of training large models, the modern AI era is largely dominated by industry-driven efforts. But Dhwanit Agarwal, whose academic achievements include a PhD from the University of Texas at Austin and a gold medal from IIT Kanpur, believes the focus is returning to academia.
“With LLMs hitting a wall with scaling alone, the spotlight is on new architectures, smarter data usage, and hybrid systems – areas where academia has historically excelled.”
Rather than simply scaling up, researchers are rethinking innovative approaches, including:
- New architectures: dynamic networks, hypernetworks, and next-generation transformer variants.
- Efficient training: learning from small, carefully curated datasets without incurring prohibitive computational costs.
- New modalities: going beyond text and 2D images to 3D, VR, AR, and real-time sensor fusion.
These academic advances could usher in the next wave of AI advancements.
Final thoughts
As the AI landscape evolves, it is clear that generative AI is at a crossroads. Although LLMs have demonstrated the power of large-scale models, they now face practical and theoretical limitations. Vision AI represents an exciting new frontier, with untapped potential for massive innovation and fine-grained control.
At the same time, AI agents offer a glimpse of a future where different models and domains work in harmony to automate complex tasks and drive efficiency and creativity to new heights. Meanwhile, academia is reclaiming its role as a melting pot of innovation, developing new architectures and modalities that will shape the next decade of AI.
In this changing landscape, experts like Dhwanit Agarwal believe that vision AI and controllable generation are redefining the boundaries of digital creativity, and that the most transformative breakthroughs will come from those brave enough to think beyond today’s limits.
About the author
Dhwanit Agarwal is an experienced AI engineer and researcher with a PhD in Computational Science from the University of Texas at Austin. A gold medalist from IIT Kanpur, Dhwanit has published widely in top conferences such as CVPR and leading journals such as the Journal of Computational Physics.
With over 10 patents in AI, he continues to push the boundaries of machine learning, generative AI, and next-generation visual content creation. Connect with him on LinkedIn.