Data rules it all in AI-based drug discovery

Automation of data-related tasks is crucial for the success of big pharma and SME biotech in drug discovery

Receptor.AI
Aug 1, 2022 · 4 min read

The success of AI-based drug discovery depends on the quality of the data used for model training. Collecting, curating and evaluating data are key steps in any AI-based project, whatever the subject area. Data preparation may take up to 50% of a data scientist's time in drug discovery, which makes it tempting to shorten this step or cut corners. Pharma companies that did so failed, because they violated one of the most fundamental principles of data science: “garbage in, garbage out”. High-quality data and automated tools for processing it are therefore essential for the success of any AI-based drug discovery project, which is why the Receptor.AI drug discovery platform is developed from a data-centric perspective.

AI-based drug discovery is a young field, but the low-hanging fruit has already been picked. All major companies have incorporated the structured public databases into their training sets, so the competition mainly revolves around model architectures and training protocols. In this situation, the key to success is the easy, fast and seamless incorporation of data produced or owned by the companies themselves into the AI workflow.

Nobody knows the data better than the experts who obtained them, but chemists and biologists usually lack deep knowledge of data science. What is essential for a biomedical expert is not necessarily important for an AI data engineer, and vice versa. Data that are perfectly fine for a wet lab biologist can be utterly unusable for machine learning.
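To make the gap between "fine for the wet lab" and "usable for machine learning" concrete, here is a minimal sketch of an automated check on raw assay records. Everything in it is an illustrative assumption rather than part of any real platform: the record fields (`compound`, `target`, `value`, `unit`), the unit table, the function name `check_records` and the 10-fold threshold for conflicting replicates are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical conversion factors to nanomolar; a real table would be longer.
TO_NM = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


def check_records(records, max_fold_spread=10.0):
    """Split raw assay records into ML-ready values and a list of issues.

    Returns (clean, issues): clean maps (compound, target) to the median
    value in nM; issues is a list of (record_index, reason) pairs.
    """
    issues = []
    by_key = defaultdict(list)

    for i, rec in enumerate(records):
        value, unit = rec.get("value"), rec.get("unit")
        if value is None:
            issues.append((i, "missing value"))
        elif unit not in TO_NM:
            issues.append((i, f"unknown unit: {unit!r}"))
        elif value <= 0:
            issues.append((i, "non-positive concentration"))
        else:
            key = (rec["compound"], rec["target"])
            by_key[key].append((i, value * TO_NM[unit]))

    clean = {}
    for key, measurements in by_key.items():
        values = [v for _, v in measurements]
        # Replicates that disagree by more than the threshold are flagged
        # for expert review instead of being silently averaged.
        if max(values) / min(values) > max_fold_spread:
            issues.extend((i, "conflicting replicates") for i, _ in measurements)
        else:
            clean[key] = median(values)
    return clean, issues


# Made-up records: readable to a bench scientist, but not yet a training set.
records = [
    {"compound": "CPD-1", "target": "T1", "value": 50.0, "unit": "nM"},
    {"compound": "CPD-1", "target": "T1", "value": 0.06, "unit": "uM"},
    {"compound": "CPD-2", "target": "T1", "value": None, "unit": "nM"},
    {"compound": "CPD-3", "target": "T1", "value": 5.0, "unit": "ug/mL"},
    {"compound": "CPD-4", "target": "T1", "value": 10.0, "unit": "nM"},
    {"compound": "CPD-4", "target": "T1", "value": 2.0, "unit": "uM"},
]

clean, issues = check_records(records)
print("usable:", clean)
print("needs review:", issues)
```

A person reading the table would resolve mixed units and blank cells without thinking; a model cannot, so each such case has to be either normalised or routed back to the expert who produced the data.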

Moreover, the sheer complexity of drug discovery data makes pharma and biotech companies struggle on several levels at once. Recent research indicated that an overwhelming 90% of pharma companies struggle with complex data infrastructure or high workloads; 88% struggle with integration and compatibility of different AI/ML technologies; 86% struggle with frequent toolchain updates. These numbers indicate that even before the actual ML magic happens, most players have significant difficulties with data management.

In addition to these significant problems, there are also several other pitfalls when working with data for drug discovery:

  • IT security requirements and issues.
  • Lengthy and laborious model deployment cycle.
  • Continuous monitoring of model performance.
  • Duplication of effort and miscommunication between different R&D teams and departments.
  • Allocating and managing ML-related infrastructure.

A successful AI platform for drug discovery should address these problems and guide the data science team through the rough path of scientific data preparation and maintenance. Automation is the key here because manual data management has proven ineffective and expensive in drug discovery.

Pharma and biotech companies often think in terms of infrastructure and computing costs for data science departments, but in reality there is much more to it. The total cost per model application includes not only infrastructure and computations but also the labour required to build, deploy, maintain and manage the AI models and the underlying data. This additional cost can be surprisingly high, especially if a lot of manual work is involved. In addition to money, pharma companies also pay in time for manual procedures in their workflow, which has a pronounced negative effect on drug discovery projects overall.
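The cost breakdown above can be written as simple arithmetic. The sketch below is only an illustration: the function name `cost_per_call` and every figure in it are made up for the example and do not come from any real project.

```python
def cost_per_call(infrastructure, compute, labour_hours, hourly_rate, n_calls):
    """Total cost of running a model divided by the number of model calls.

    Labour (building, deploying, maintaining and managing models and data)
    is counted alongside infrastructure and compute.
    """
    if n_calls <= 0:
        raise ValueError("n_calls must be positive")
    labour = labour_hours * hourly_rate
    return (infrastructure + compute + labour) / n_calls


# Hypothetical figures: same infrastructure and compute in both scenarios,
# only the amount of manual work differs.
manual = cost_per_call(10_000, 5_000, labour_hours=400, hourly_rate=100,
                       n_calls=1_000_000)
automated = cost_per_call(10_000, 5_000, labour_hours=100, hourly_rate=100,
                          n_calls=1_000_000)

print(f"manual workflow:    {manual:.3f} per call")
print(f"automated workflow: {automated:.3f} per call")
```

With these invented numbers labour dominates the total, so reducing manual hours lowers the cost per call far more than trimming the infrastructure bill would.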

There is a general consensus that pharma and biotech companies using AI pipelines need to optimise and automate all processes in the drug discovery ML workflow. Such automation not only prepares the models for production but also allows automated management and quality assessment of the data, seamless re-training of the models using an AutoML pipeline, and easy incorporation of custom datasets into preconfigured endpoint placeholders to match the quality of a successful drug. This significantly reduces the “cost per call”, so the data science team can focus on maximising drug safety and efficacy instead of performing repetitive management tasks.

The Receptor.AI drug discovery platform is built from the ground up to address these issues. It is intended to support the whole R&D team and to minimise human intervention in all routine processes. The platform provides the following major advantages:

AI-automated data management and control:

  • AI-assisted data preparation
  • Data monitoring and quality control
  • Proprietary feature generation and selection

Automated AI model creation and quality control:

  • Project-specific AI model architecture selection
  • AI model training, tuning and deployment
  • Pretrained AI drug discovery models and preconfigured AI-powered placeholders to match the quality of a successful drug
  • AI model performance monitoring
  • Collaboration between R&D departments
  • Tight integration within the single platform

The system is ambitious in its goals but very pragmatic in providing a success-oriented drug discovery environment. It streamlines the full range of the R&D team's ML-related processes and provides a rich set of comprehensive drug discovery endpoints in the form of pre-trained models and preconfigured AI placeholders for drug quality compliance. The platform imposes strict limits on what data your drug discovery project relies on and how it is processed to discover safe and effective drugs.

Official account of RECEPTOR.AI company. We make the cell membranes druggable to provide new treatments for cancer and cardiovascular diseases.