Versa AI hub

Three “fundamental” prerequisites for drug data projects

December 7, 2024

(Adobe Firefly)

Many organizations rushing to implement AI in pharmaceutical and genomics research have their priorities reversed. Before moving on to deep learning and large language models, they need not only a solid data infrastructure but also a true understanding of their data, argues Stavros Papadopoulos, CEO and founder of TileDB, a developer of data platforms for scientific discovery, including a range of life sciences applications.

While the breakthroughs brought about by AI are understandably exciting, the underlying data challenges facing life sciences organizations run deep. According to Deloitte’s 2024 Global Life Sciences Sector Outlook, nearly 40% of the potential productivity gains in pharmaceuticals from AI will come from research and development. The report estimates that large pharmaceutical companies could save between $5 billion and $7 billion over five years if they close the gap in AI adoption. But these benefits depend on rigorous infrastructure, better data governance, and new approaches to collaboration. This is precisely the foundation Papadopoulos insists on before fully embracing AI.

Currently, many research teams are still stuck in inefficiency and fragmentation. Despite triple-digit growth forecasts, only about 16% of drug discovery efforts are using AI. Data scientists often spend up to 80% of their time preparing data rather than analyzing it. There is also a pay gap. According to Glassdoor, pharmaceutical data scientists earn an average annual salary of about $124,000, while top technology companies pay well over $200,000, and some of the highest-paying jobs in the field can reach around $1 million once substantial compensation packages are included. This environment makes it increasingly difficult to recruit and retain top data talent in life sciences, especially since only a handful of computational biologists at large pharmaceutical companies can bridge the gap between biology and data engineering.

Data reality check

Despite triple-digit growth forecasts, only about 16% of drug discovery activities currently use AI. — Deloitte 1

In addition to talent shortages, the data itself often runs against the traditional structure of many life sciences organizations. “99% of data is not tabular, but 99% of the research and solutions out there are focused on tables,” Papadopoulos says. The standard data science toolkit is table-centric, from Pandas in Python (for flexible data manipulation and analysis) and SQL (for relational data) to R’s Tidyverse (for data “tidying” and analysis) and Tableau (for interactive visuals and dashboards). Although these tools serve a variety of purposes, they reflect a deep-rooted preference for data that can be flattened into rows and columns. The core problem is not the data but poor architecture and mindset. To address this, Papadopoulos proposes three “radical” premises, each setting the stage for a more mature, infrastructure-first approach and hinting at a fourth consideration that inevitably follows.

Premise 1: “Don’t touch AI without data infrastructure”

Case in point: In some cases, a calculator is better than an LLM.

A recent preprint, From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting, highlights how the math problem-solving abilities of LLMs can be enhanced by incorporating external tools such as calculators. Similarly, the “MathChat” framework, described in another preprint, uses conversational interactions between LLM agents and user proxy agents to tackle complex mathematical problems with the help of code execution.

“If you ask the LLM to divide a decimal, it can give you the wrong answer because it’s probabilistic and not a processor,” Papadopoulos says. “But if you connect it to a calculator tool, you’ll see that the LLM says, ‘Oh, that’s a math question. You should ask the calculator.’” Needless to say, a calculator is a far more efficient tool for that job than an LLM.
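The routing idea in that quote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method described by Papadopoulos or the cited preprints: `calculator`, `call_llm`, and `answer` are hypothetical names, the LLM is replaced by a placeholder function, and the decision to use the calculator is made by a simple parse check rather than by the model itself.

```python
import ast
import operator
from decimal import Decimal

# Arithmetic operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> Decimal:
    """Evaluate plain arithmetic exactly, using Decimal instead of floats."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return Decimal(str(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("not plain arithmetic")

    return ev(ast.parse(expression, mode="eval"))


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real model call."""
    return f"[LLM placeholder] {prompt}"


def answer(question: str) -> str:
    """Send arithmetic to the calculator and everything else to the LLM."""
    try:
        return str(calculator(question))
    except (ValueError, SyntaxError):
        return call_llm(question)


print(answer("12.5 / 0.4"))
print(answer("What does this gene do?"))
```

In a real tool-calling setup the model would choose the tool; the point of the sketch is only that the arithmetic itself is done by deterministic code.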

Regarding the bright, shiny object syndrome associated with many AI projects, Papadopoulos says bluntly: “You shouldn’t touch AI unless you have a data management infrastructure in place.” Without robust governance, security, cataloging, and unified access controls, adopting AI will only amplify the existing data chaos. In contrast, starting with a disciplined, database-centric foundation ensures that when AI is eventually deployed, it acts as a force multiplier rather than as a disruptive, costly, over-engineered approach that often yields only slightly better results than a simpler, well-structured machine learning model.

To make the AI dream a reality, Papadopoulos advocates a data-first mindset. “Focus your efforts on building the best data management system possible. Once you’ve done the work safely manually, bring in AI to automate the tasks.” With a strong infrastructure, AI is the final piece of the puzzle, allowing you to seamlessly leverage a well-organized and well-managed data store. AI systems can then act as natural language interfaces to complex data ecosystems. Scientists can simply talk to the system instead of wrangling query languages. “This is where AI comes in. AI is not going to give you something crazy,” Papadopoulos says. “You can interface with the system using natural language. This is the greatest value of AI for me: understanding what your users want and executing their queries.”

Premise 2: “Unstructured data does not exist”

The idea that certain data types are inherently “unstructured” is a fundamental misconception. Papadopoulos argues that every dataset, no matter how complex, contains unique patterns.
“Unstructured data does not exist…White noise may be the closest thing to having no structure, but it still follows a uniform distribution.”

Papadopoulos continued: “We leave [data] ‘unstructured’ because we don’t have a proper system to structure it, and that’s what causes the problem.”

Finding order in chaos

“All data – tables, images, RNA, DNA, point clouds, satellite imagery – is essentially an array of values,” explains Papadopoulos. Even encrypted text and seemingly random signals can have patterns under the proper lens. Challenge: Current modeling approaches and SQL-centric tools often flatten multidimensional data into rows and columns, removing important context.

Implementation challenges

Relying on table-centric models forces rich genomic, imaging, and clinical data into a rigid two-dimensional schema. This mismatch leads to a loss of insight. For example, genome sequences require a structure that maintains hierarchical and multidimensional relationships, while imaging and clinical data require a format that captures complexity without artificially simplifying it.

Architectural approach

Dr. Stavros Papadopoulos

Employing architectures that recognize that all data is inherently structured, such as multidimensional arrays and schema-on-demand design, allows for more nuanced modeling. A domain-specific query language can reflect scientific workflows rather than forcing everything into a SQL bottleneck.

Impact on life sciences

Recognizing that no data is truly “unstructured” frees researchers from traditional constraints. Life sciences organizations can uncover natural patterns in their data rather than skewing it to fit outdated models. This change paves the way for more accurate insights and the foundation on which advanced analytics and AI can thrive.

Premise 3: “Currently, there are no best data practices, only bad ones”

In Papadopoulos’ view, current norms do not even rise to the level of best practice.
“Right now there are no best practices. We’re only seeing bad practices.”

As noted earlier, he warns that organizations that rush into AI without doing the groundwork are setting themselves up for failure. Instead, they must start by building a database-centric, secure, and discoverable data ecosystem before turning to AI. Once these foundations are in place, AI can serve as a query interface to integrated systems, allowing researchers to work with data through natural language rather than wrestling with fragmented, ad hoc setups.

Chart a path to better data practices

Beyond RAG: Agents, Tools, and True Data Integration

Retrieval-augmented generation (RAG) helps large language models retrieve relevant text snippets (often from PDFs), but it is insufficient for complex scientific data. “RAG is almost exclusively PDF-only,” Papadopoulos says. In other words, it cannot handle the complexity of multidimensional data, genomic data, and the like.

A broader approach: Papadopoulos advocates employing the LLM as an orchestrator that knows which specialized tools and databases to consult. Rather than just retrieving text, the LLM can interact with integrated data infrastructures, query domain-specific APIs, perform calculations through calculators, navigate multidimensional arrays, and more. This turns the LLM into a powerful agent that can go far beyond simple text searches and ask the “right questions” of the data ecosystem.
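A toy version of this orchestrator pattern might look like the following. Everything here is a labeled assumption: the tool names, the tiny in-memory datasets (placeholder values, not real measurements), and above all `plan`, which stands in for the LLM with simple keyword rules so that the example runs without a model.

```python
# Placeholder data standing in for real stores; values are illustrative only.
EXPRESSION = {"geneA": [2.1, 3.4, 0.8], "geneB": [5.0, 4.2, 4.9]}
NOTES = ["geneA notes: sample prep protocol.", "geneB notes: DNA repair assay."]


def expression_tool(gene):
    """Hypothetical array-backed tool: return the expression vector for a gene."""
    return EXPRESSION[gene]


def text_search_tool(term):
    """Hypothetical text tool: return notes containing the term."""
    return [note for note in NOTES if term.lower() in note.lower()]


# Registry of tools the orchestrator is allowed to call.
TOOLS = {"expression": expression_tool, "text_search": text_search_tool}


def plan(question):
    """Stand-in for the LLM: choose a tool name and an argument."""
    lowered = question.lower()
    if "expression" in lowered:
        for gene in EXPRESSION:
            if gene.lower() in lowered:
                return "expression", gene
    # Fall back to text search on the last word of the question.
    return "text_search", question.rstrip("?").split()[-1]


def orchestrate(question):
    """Dispatch the planned call to the matching tool and return its result."""
    tool_name, argument = plan(question)
    return tool_name, TOOLS[tool_name](argument)


print(orchestrate("What is the expression of geneA?"))
print(orchestrate("Which notes mention repair?"))
```

The design choice worth noting is the separation of concerns: the planner only decides which tool to call, and each tool answers from its own data store, so text retrieval is just one tool among several.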

Once disciplined data management is established, the LLM can act as an orchestrator, leveraging specialized tools, databases, or computational engines behind the scenes. Scientists don’t need to be API experts. Their focus remains on the research question at hand.

Papadopoulos’ roadmap challenges the status quo. He calls for a complete rethinking of data infrastructure and recommends explicitly adhering to fundamental principles similar to FAIR (findable, accessible, interoperable, reusable) from the start, rather than settling for half-measures. “If you have the discipline from day one to think about discoverability, accessibility, and everything else, you have information security covered,” he explains. By incorporating principles like FAIR into their architecture, organizations can ensure that their data is not only well managed and protected, but also well positioned for future interoperability and reusability.

  • Start with a robust infrastructure: Treat all scientific information, including genome sequences, images, and clinical data, as inherently structured. “Focus your efforts on building the best data management system possible,” he says.
  • Prioritize governance and security: Authentication, authorization, and auditing should be designed in from day one. “If you start with a database approach and discipline, you have information security covered,” Papadopoulos said.
  • Establish discipline and consistency: Consolidate all data sources into a unified, discoverable repository that aligns with your research workflow, so that you can find “whatever you want.”
  • Master manual workflows before automating: Understand patterns and bottlenecks. Only then should you use AI to automate repetitive tasks. Without a structured system for a large language model (LLM) to query, no amount of intelligence will yield meaningful insights.
  • Use AI as the final layer: With a solid infrastructure in place, AI can be a powerful ally and a natural language interface to a rich data ecosystem, working in harmony with the data rather than being forced onto messy data.
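As a rough illustration of what "designed in from day one" could mean for governance, the sketch below puts authorization, auditing, and discoverability behind a single access path. The `Catalog` class and its fields are hypothetical, and the sketch assumes the user's identity has already been authenticated elsewhere.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Catalog:
    """Hypothetical unified catalog: one access path for every dataset."""
    datasets: dict                  # dataset name -> data
    grants: dict                    # user -> set of dataset names they may read
    audit_log: list = field(default_factory=list)

    def read(self, user, name):
        """Check authorization, record the attempt, then return the data."""
        allowed = name in self.grants.get(user, set())
        self.audit_log.append((
            datetime.now(timezone.utc).isoformat(),
            user,
            name,
            "allow" if allowed else "deny",
        ))
        if not allowed:
            raise PermissionError(f"{user} may not read {name}")
        return self.datasets[name]

    def discover(self, user):
        """List the datasets this user is permitted to find."""
        return sorted(self.grants.get(user, set()))


catalog = Catalog(
    datasets={"genomes": ["ACGT"], "clinical": ["record-1"]},
    grants={"alice": {"genomes"}},
)
print(catalog.discover("alice"))
print(catalog.read("alice", "genomes"))
```

Because every read goes through the same method, denied attempts are logged as well as successful ones, and any layer added later, including an AI interface, inherits the same controls.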

Essentially, these “radical” premises are not about adding complexity. They aim to remove barriers. When organizations realize that all data is structured and can be well managed, AI will move from hype to practical accelerator, and scientists and researchers will be able to shift their focus from endless data wrangling to true innovation.

“For the next five years as AI evolves, focus your efforts on understanding the data management story,” Papadopoulos said.

References:

This data point comes from Deloitte’s 2024 Global Life Sciences Sector Outlook report, published on May 31, 2024, which states: It will take the next three to five years. ”
