Artificial Intelligence applied to the diagnosis and treatment of rare diseases

That Artificial Intelligence is entering our daily lives is nothing new. We have all become accustomed to talking to chatbots on websites, for example. These are based on Artificial Intelligence which, thanks to advanced algorithms, large datasets processed with Big Data techniques, and learning processes, can replace basic human actions, in many cases with greater accuracy.

Chatbots and applications that make the Mona Lisa speak, bring Cervantes back to life, or let you talk to the most fashionable YouTuber are perhaps the most anecdotal and fun part of this technology. But there is much more: Artificial Intelligence is being applied in many fields of science, adding a technological edge to the great advances already made and to those still under way.

One of these fields is research and development aimed at improving the quality of life of patients affected by so-called rare diseases. Here, new advances and possibilities are emerging in the reanalysis and reinterpretation of whole exome sequencing datasets for unsolved rare diseases using machine learning approaches.

We have only just begun, but the results already reported in scientific publications point to a very promising future.

Rare diseases and AI

Rare diseases affect the lives of 300 million people worldwide. Thanks to rapid advances in bioinformatics and genomics, it is already possible to discover the genetic causes of about 30% of known rare diseases.

New tools and the availability of high-throughput sequencing data have enabled reanalysis of previously undiagnosed patients. In this review, we have systematically compiled the latest advances in the discovery of genetic causes of rare diseases using machine learning methods.

Currently, with the tools available to our scientists, the standard clinical diagnosis of a rare disease can take up to 30 years, even though about 80% of rare diseases have a genetic cause. However, with genomic technologies and bioinformatics analysis, it is possible to discover the genetic cause of 20-30% of rare diseases using high-throughput sequencing (HTS) of the whole exome, with a diagnostic rate of 40%.

Figure 1. Overview of machine learning strategies for whole gene sequencing reanalysis from single variant analysis to more complex genomic events (gene-gene interactions). 1. Predicting the impact of sequence alterations/mutations. This strategy consists of predicting the effect of a sequence change on the protein. 2. Variant re-annotation strategies attempt to re-annotate variants following the availability of new information/discoveries. 3. Variants that alter splice isoform frequencies are predicted using methods of this strategy. 4. Differences in protein folding/structure are evaluated in this category. 5. Oligogenic analysis is a strategy for the analysis of digenic (gene pairs) and oligogenic diseases. For each strategy, examples of tools for reanalysis of rare diseases by machine learning are presented. Source:

Although HTS has improved the rate of diagnosis, large amounts of genomic data remain unexplored, leading to costly non-diagnosis and a lack of actionable information for patients.

As the amount and complexity of genomic data increases, researchers are turning to artificial intelligence (AI) and machine learning (ML) to reanalyze existing data to answer healthcare and research questions. ML is a process by which machines can be given the ability to learn from a set of data. In terms of application to genomics, several areas have been explored to predict from validated data the effect of a mutation/alteration of the genome.
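To make the idea of "learning from validated data" more concrete, here is a minimal sketch in Python. It is not any of the tools covered in the paper. It assumes a tiny, made-up training set in which each variant is described by two hypothetical features (a conservation score and a population allele frequency) and labeled as damaging or benign, and it fits a simple logistic regression by gradient descent.

```python
import math

# Toy, invented training set (NOT real variant data).
# Each entry: ((conservation_score, allele_frequency), label)
# label 1 = validated as damaging, label 0 = validated as benign.
TRAINING = [
    ((0.95, 0.0001), 1),
    ((0.90, 0.0005), 1),
    ((0.85, 0.0010), 1),
    ((0.20, 0.1500), 0),
    ((0.10, 0.3000), 0),
    ((0.30, 0.0800), 0),
]


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, learning_rate=0.5):
    """Fit logistic regression weights with stochastic gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(score) - label
            # Nudge each parameter against the gradient of the log loss.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, features):
    """Return the estimated probability that a variant is damaging."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


model = train(TRAINING)

# Two unseen, hypothetical variants.
print(predict(model, (0.92, 0.0002)))  # highly conserved, very rare
print(predict(model, (0.15, 0.2000)))  # poorly conserved, common
```

Real tools use far larger validated datasets, many more features and more sophisticated models, but the principle is the same: the model is fitted on variants whose effect is already known and is then applied to variants that have not yet been interpreted.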

It is important to note that data sharing is an essential element of reanalysis. To step up the reanalysis of existing next-generation sequencing (NGS) datasets and improve the resolution of the causes of rare diseases, researchers and consortia should therefore adhere to the FAIR (findable, accessible, interoperable, and reusable) principles of data sharing.

You can read the entire paper at