The genetic information for the development and functioning of all living organisms is contained in their DNA molecules. These molecules store and transmit information in a structured form: the instructions and rules that govern the cellular machinery, which we are now able to read and understand.

What is it?

As with any data storage medium, this information is recorded in DNA in the form of a code.

DNA consists of two long chains wound together in a double helix. Each chain has a sugar-phosphate backbone that carries a sequence of four repeating building blocks, called bases: adenine, guanine, cytosine and thymine. We can treat each base in the DNA sequence as a letter of the genetic code, abbreviated A, G, C and T, and take these four letters as the alphabet that makes up the words of the cellular language.

DNA is a very large and stable molecule. This is an advantage for preserving the information it contains, but it makes DNA poorly suited to communicating that information to the rest of the cell, since DNA is confined to the cell nucleus. To transmit the information, cells copy it, in fragments, into an analogous molecule with slight chemical modifications: messenger RNA (mRNA). In mRNA, thymine (T) is replaced by a very similar molecule, uracil (U), so its letters become A, G, C and U. Smaller and single-stranded, messenger RNA can travel through the cell to the specific structures where the instructions it carries are executed: the ribosomes. These instructions are none other than those needed to build proteins, and the ribosomes are able to interpret them and manufacture these proteins.
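The change of alphabet from DNA to mRNA can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `transcribe` and the example sequence are made up for illustration, and the input is assumed to be the DNA strand that already reads like the message (the coding strand), so the only visible change is T becoming U. The enzymatic copying process in a real cell is not modeled.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA letters for a DNA coding-strand sequence.

    Assumption: the input is the coding strand, so the mRNA has the
    same letters with every T replaced by U.
    """
    dna = coding_strand.upper()
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected bases: {sorted(invalid)}")
    return dna.replace("T", "U")


# Hypothetical 12-letter DNA fragment used only as an example.
mrna = transcribe("ATGTGCCGATAA")
print(mrna)  # AUGUGCCGAUAA
```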

Proteins, like DNA, are long chains composed of repeated chemical building blocks, in this case the 20 amino acids. Within DNA and RNA, bases are read in groups of three, called codons; each group tells the ribosome which specific amino acid to add to the growing protein chain, and is the equivalent of a word.

For example, if the sequence being read is UGC, the ribosome will add one unit of the amino acid cysteine to the protein, but if the group CGA is read, an arginine will be added. Other base groups indicate where the protein sequence begins and ends.
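Reading three letters at a time can be sketched as a simple dictionary lookup. The table below is deliberately partial: it holds only the two codons mentioned above plus the standard start codon (AUG, methionine) and the three standard stop codons, whereas the full genetic code has 64 entries. The names `CODON_TABLE` and `translate` are hypothetical, chosen for this illustration.

```python
# Partial codon table: a small excerpt of the 64-entry genetic code.
CODON_TABLE = {
    "AUG": "Met",  # also marks where the protein begins
    "UGC": "Cys",
    "CGA": "Arg",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def translate(mrna: str) -> list[str]:
    """Read an mRNA string codon by codon, from the first AUG to a stop."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start signal, nothing is built

    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE.get(codon)
        if amino_acid is None:
            raise KeyError(f"codon {codon} is not in this partial table")
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGUGCCGAUAA"))  # ['Met', 'Cys', 'Arg']
```

A lookup table is used here because the mapping from codon to amino acid is fixed; extending the sketch to the complete code would only mean filling in the remaining entries.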

Because we understand this code for transmitting information in living beings, and have advanced knowledge of the physical chemistry of these molecules, the process can be analyzed with computational tools in search of applications. One of them is the creation of small molecules with structures similar to DNA and RNA, the so-called oligonucleotides, which use this same cellular language to interfere with cells or give them instructions in a controlled way, so that they behave as intended. Although this is a complex task, information technologies, advanced algorithms and the emergence of artificial intelligence now offer highly effective tools for designing these molecules, aiming from the outset at safer and more effective drugs.
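As a toy illustration of the kind of computation involved, the sketch below derives a sequence that could pair with a chosen stretch of mRNA, which is the basic idea behind one family of oligonucleotides (antisense). It relies only on the standard pairing rules (A with U, G with C) and on the two strands running in opposite directions. The function name `antisense_candidate` and the target sequence are hypothetical, and real design work weighs many further factors (stability, specificity, chemical modifications) that are not modeled here.

```python
# Standard RNA base-pairing rules.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_candidate(target_mrna: str) -> str:
    """Return the reverse complement of an mRNA fragment.

    Assumption: a candidate is simply the sequence that pairs
    letter by letter with the target, read in the opposite direction.
    """
    target = target_mrna.upper()
    return "".join(RNA_COMPLEMENT[base] for base in reversed(target))


# Hypothetical 9-letter target used only as an example.
print(antisense_candidate("AUGUGCCGA"))  # UCGGCACAU
```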