Beyond the traditional omics: AI insights into RNA modifications

The patterns of post-transcriptional RNA modifications are complex, but AI can make sense of them

Receptor.AI Company
Jul 20, 2021

Image from: https://www.genengnews.com/news/most-comprehensive-atlas-of-the-human-transcriptome-built/

The diversity of RNA molecules in cells goes far beyond the textbook description of messenger, transfer and ribosomal RNAs. In addition to a rich repertoire of non-coding regulatory RNAs, there are a number of post-transcriptional RNA modifications, which play a crucial role in many biological processes. One of the most important modifications is the methylation of adenosine, which forms N⁶-methyladenosine (m⁶A).

N⁶-methyladenosine

This modification regulates processes such as RNA splicing, translation, RNA stability, the immune response, DNA damage repair and development.

To fully understand the regulatory role of N⁶-methyladenosine, the whole transcriptome of different cells has to be analysed and mapped precisely. However, experimental identification of RNA modifications is tricky and requires advanced, expensive sequencing techniques based on nanopores or complex biochemical assays. It is therefore tempting to predict m⁶A sites computationally. Indeed, AI-based models can analyse existing sequenced transcriptome data and determine the preferred patterns of RNA methylation.

Different techniques have been proposed to solve this problem. The most successful of them rely not only on the sequence around a particular site but also on physico-chemical properties, contextual information, the spatial organization of RNA and the changes of these features over time.
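As an illustration of the sequence-based part of such features only, here is a minimal sketch of how a candidate site could be turned into model input. It assumes a plain one-hot encoding of a short window centred on an adenosine. The function names, the window size and the toy sequence are hypothetical and are not taken from any of the published models.

```python
ALPHABET = "ACGU"


def candidate_sites(rna):
    """Indices of every adenosine, i.e. every position that could carry m6A."""
    return [i for i, base in enumerate(rna) if base == "A"]


def one_hot_window(rna, center, flank=2):
    """One-hot encode the bases from center-flank to center+flank (inclusive).

    Positions that fall outside the transcript are encoded as all zeros.
    """
    if rna[center] != "A":
        raise ValueError("m6A can only occur on an adenosine")
    window = []
    for pos in range(center - flank, center + flank + 1):
        base = rna[pos] if 0 <= pos < len(rna) else None
        window.append([1 if base == letter else 0 for letter in ALPHABET])
    return window


# Toy transcript fragment (made up for illustration).
rna = "GGACU"
sites = candidate_sites(rna)
encoded = one_hot_window(rna, sites[0], flank=1)
```

Real predictors would combine such a matrix with the other feature types mentioned above. The sketch covers the raw sequence context alone.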

So far, most such models have been built for a single species only (mostly humans or mice), which limits our ability to study evolutionarily conserved sites of RNA methylation and to discover fundamental regulatory mechanisms shared among different species.

A recent work published in the journal Nucleic Acids Research proposes a new AI-based approach for studying RNA methylation. The authors used a machine learning technique called multi-task curriculum learning to predict m⁶A sites across different species. Multi-task learning is a strategy that trains a group of related tasks simultaneously while sharing information among them.

Training the model on the transcriptomes of different species in this way seems like an obvious use of multi-task learning, but it turns out not to be so simple. The genomes of different species differ in size, which makes it difficult to train multiple models in parallel. In this situation another ML technique, called curriculum learning, comes to the rescue. In this technique, learning starts from relatively easy tasks and gradually proceeds to more difficult ones. In terms of genomes, the algorithm can start from a “basic” genome and continue to more “advanced” ones while keeping the advantages of multi-task learning. The order of species was determined by the evolutionary distances between them on the phylogenetic tree. The multi-task learning model itself consisted of a shared feature extractor and several separate classifiers, each corresponding to the prediction task for one species.
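The combination of a shared feature extractor, per-species classifiers and a distance-based curriculum can be sketched in a few lines of plain Python. This is a toy under loud assumptions, not the authors' implementation: the distances are invented numbers, the species are ordered by distance from a single reference species, the extractor is one small tanh layer, and the schedule simply adds one species per stage. All names (`DISTANCE_FROM_REFERENCE`, `MultiTaskModel`, `train_with_curriculum`) are hypothetical.

```python
import math
import random

# Hypothetical evolutionary distances from a reference species (illustrative numbers only).
DISTANCE_FROM_REFERENCE = {"human": 0.0, "mouse": 0.3, "zebrafish": 1.2, "yeast": 2.5}


def curriculum_order(distances):
    """Sort species from the closest to the farthest from the reference."""
    return sorted(distances, key=distances.get)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiTaskModel:
    """Shared tanh feature extractor followed by one logistic classifier per species."""

    def __init__(self, n_inputs, n_hidden, species, seed=0):
        rng = random.Random(seed)
        self.shared = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                       for _ in range(n_hidden)]
        self.heads = {name: [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for name in species}

    def features(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.shared]

    def predict(self, species, x):
        hidden = self.features(x)
        return sigmoid(sum(w * h for w, h in zip(self.heads[species], hidden)))

    def train_step(self, species, x, y, lr=0.1):
        """One SGD step on the log loss; updates the shared layer and one head."""
        hidden = self.features(x)
        head = self.heads[species]
        error = sigmoid(sum(w * h for w, h in zip(head, hidden))) - y
        grad_hidden = [error * w for w in head]  # computed before the head changes
        for j in range(len(head)):
            head[j] -= lr * error * hidden[j]
        for j, row in enumerate(self.shared):
            grad_pre = grad_hidden[j] * (1.0 - hidden[j] ** 2)
            for i in range(len(row)):
                row[i] -= lr * grad_pre * x[i]


def train_with_curriculum(model, data, order, epochs_per_stage=200):
    """Stage k trains jointly on the first k species of the curriculum."""
    for stage in range(1, len(order) + 1):
        for _ in range(epochs_per_stage):
            for species in order[:stage]:
                for x, y in data[species]:
                    model.train_step(species, x, y)


# Toy data: the same two labelled feature vectors for every species.
order = curriculum_order(DISTANCE_FROM_REFERENCE)
data = {name: [([1.0, 0.0], 1), ([0.0, 1.0], 0)] for name in order}
model = MultiTaskModel(n_inputs=2, n_hidden=4, species=order)
train_with_curriculum(model, data, order)
```

Calling `model.predict("yeast", [1.0, 0.0])` after training returns the yeast-specific probability computed on top of features that were first shaped by the species earlier in the curriculum. That is the intuition behind sharing the extractor while keeping the classifiers separate.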

Schematic illustration of the ML pipeline. Figure from https://academic.oup.com/nar/article/49/7/3719/6179353#235656307

The authors used three available transcriptomics datasets: sramp17 from the Ensembl database, nano20 from direct nanopore RNA sequencing data, and RMBase v2.0, which contains the m⁶A annotations of seven species obtained on different cell lines.

Applying this AI model made it possible to find important correlations between RNA methylation and the binding sites of various RNA-binding proteins. It also enabled a comprehensive gene ontology analysis of the genes with the highest m⁶A capacities across the species. The highest levels of m⁶A content were found in the groups of genes associated with structural molecular activity, protein binding, and sequence-specific DNA binding.

The levels of m⁶A content in the groups of genes with different functions across the species. Figure from https://academic.oup.com/view-large/figure/235656293/gkab124fig7.jpg

This work demonstrates how modern machine learning techniques can help with problems as complex as the regulation of gene expression by post-transcriptional RNA modifications. Despite the huge amounts of genomic and transcriptomic data available, their analysis and interpretation remain extremely complex. AI-based predictions of RNA modification sites are now valuable tools for comparing different species and revealing previously unknown correlations between RNA methylation and the binding of regulatory proteins.

Transcriptomics information is an important part of the diverse omics data used by the core technology of Receptor.AI, the multiparametric Connectivity-to-Cognitivity knowledge graph. This graph integrates data about diseases, drugs, targets, ligands, omics and clinical observations from a large variety of sources and provides a solid basis for training our AI models. In particular, transcriptomics data is used in our polypharmacology module, which allows us to predict interactions of the leads not only with a single target, but with the whole set of targets relevant for a given disease.


Official account of RECEPTOR.AI company. We make the cell membranes druggable to provide new treatments for cancer and cardiovascular diseases.