If you look at the components of cells or of an entire organism (e.g. the human body), you will find a relatively manageable spectrum of substance classes. As is well known, water is the most common molecule in the human body, so all other compounds act in this aqueous environment. Besides various ions, the main larger compounds are:
- Lipids, i.e. fats, which are especially important because they form our cell membranes, the outer boundary layers of cells
- Carbohydrates, i.e. sugars, which serve primarily as energy suppliers and stores
- Proteins, the most diverse class of substances: they form scaffolding and other structures, and above all, proteins and protein complexes catalyze biological reactions, in which case they are called enzymes
- Nucleic acids, which exist in two forms: deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)
Lipids and carbohydrates are not discussed further here, but their function is explained in the background article on cell biology. So we turn to the latter two substance classes: nucleic acids and proteins.
Structure and Replication of DNA
The name deoxyribonucleic acid derives partly from the context of its discovery (the Tübingen chemist Friedrich Miescher first isolated the substance from cell nuclei in 1869 and accordingly called it nuclein) and partly from its chemical structure. The backbone of DNA, the handrail of the spiral staircase, so to speak, is formed by a special sugar, deoxyribose, whose ring-shaped units are linked to one another by phosphate groups. This backbone is always the same: in all parts of the genome, in all humans, even in all animals and plants known to date.
The “sequence” of the DNA, and thus its individual uniqueness, results from the order of the rungs of the spiral staircase. These rungs are formed by the so-called bases (a base together with its sugar and phosphate group makes up a nucleotide), and each rung consists of two bases facing each other. As can be seen in Figure 1, adenine always pairs with thymine (forming two hydrogen bonds) and cytosine always pairs with guanine (forming three hydrogen bonds). These four bases are abbreviated A, T, C and G, so a DNA sequence can be written as a string of these four letters, e.g. ACGTGTGCATGTCTGA. Since each base always has the same partner, the opposite strand of the double helix is unambiguously determined and is therefore usually not written out. The nucleotides thus follow one another in two antiparallel strands (running in opposite directions), each base forming hydrogen bonds with its partner on the opposite strand. These strands are wound into the famous double helix. But that’s not all: to get from this double helix to the chromosomes visible under a light microscope, the DNA thread is coiled and packed several times over; more on this in the article on the genome.
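Because each base has exactly one partner, the opposite strand can be computed mechanically from the one that is written down. The short Python sketch below illustrates this for the example sequence above. It also counts the hydrogen bonds using the two-versus-three rule. The function names are purely illustrative. The reverse complement follows the usual convention that both strands are written in the same 5'→3' reading direction, which is necessary because the strands run in opposite directions.

```python
# Base pairing as described above: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hydrogen bonds per base pair: A-T forms two, C-G forms three.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complement(seq: str) -> str:
    """Return the partner base for each position, aligned with the input."""
    return "".join(PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the opposite strand written in the same 5'->3' reading direction.

    The two strands run in opposite directions, so by convention the partner
    strand is reversed before being written down.
    """
    return complement(seq)[::-1]


def hydrogen_bonds(seq: str) -> int:
    """Count the hydrogen bonds between this strand and its partner strand."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    strand = "ACGTGTGCATGTCTGA"
    print(complement(strand))          # TGCACACGTACAGACT
    print(reverse_complement(strand))  # TCAGACATGCACACGT
    print(hydrogen_bonds(strand))      # 40
```

Applying `reverse_complement` twice returns the original strand. That is exactly why writing down one strand is enough.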
Instead, we now turn to DNA replication, i.e. the process in which the DNA strands are copied so that after cell division both daughter cells again contain the entire genetic code. To this end, I would like to take a short look back in history and present a piece of work that many have called the “most beautiful experiment in biology”.
In the last paragraph of the article in which James Watson and Francis Crick proposed the double helix as the structure of DNA in 1953, they write:
It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material. [2]
Nevertheless, it took another five years before Matthew Meselson and Franklin Stahl demonstrated the principle of DNA replication in an ingenious experiment. They grew bacteria in a special nutrient solution containing the heavier nitrogen isotope ¹⁵N. DNA contains so much nitrogen that DNA into which the heavy isotope has been built can be separated by density-gradient centrifugation from DNA containing the normal form of nitrogen, ¹⁴N. When they transferred the bacteria to a normal (¹⁴N-containing) nutrient medium, the first cell division yielded a generation of bacteria with medium-heavy DNA. After the second cell division, the DNA split into two equally large fractions: one half was medium-heavy (50% ¹⁵N, 50% ¹⁴N), whereas the other half contained only the normal ¹⁴N. This showed that DNA is replicated semi-conservatively, i.e. the two strands are separated and a new complementary strand is built for each parental strand. Each daughter cell thus receives one strand that was already present in the mother cell and one newly synthesized strand.
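The logic of the Meselson-Stahl experiment can be replayed with a small toy model. In the Python sketch below, each DNA double strand is represented as a pair of strand labels: "H" for heavy (¹⁵N) and "L" for light (¹⁴N). Each duplex is then sorted into a heavy, hybrid or light band according to how many heavy strands it contains. For contrast, the sketch also implements a hypothetical "conservative" model, in which the parental double strand stays intact and the copy is built entirely from new material. The model and all names are illustrative assumptions, not a description of the original data analysis.

```python
from collections import Counter

# Toy model: each DNA duplex is a pair of strand labels,
# "H" = heavy (15N), "L" = light (14N).
Duplex = tuple[str, str]


def semi_conservative(duplexes: list[Duplex]) -> list[Duplex]:
    """Each parental strand is kept and paired with a newly built light strand."""
    return [(strand, "L") for duplex in duplexes for strand in duplex]


def conservative(duplexes: list[Duplex]) -> list[Duplex]:
    """Hypothetical alternative: the parental duplex stays intact, the copy is all new."""
    return [copy for duplex in duplexes for copy in (duplex, ("L", "L"))]


def band_fractions(duplexes: list[Duplex]) -> dict[str, float]:
    """Share of duplexes in each centrifugation band, judged by heavy-strand count."""
    band_of = {2: "heavy", 1: "hybrid", 0: "light"}
    counts = Counter(band_of[duplex.count("H")] for duplex in duplexes)
    return {band: counts[band] / len(duplexes) for band in ("heavy", "hybrid", "light")}


def simulate(model, generations: int) -> list[dict[str, float]]:
    """Start with fully heavy DNA, then replicate once per division in 14N medium."""
    population: list[Duplex] = [("H", "H")]
    history = [band_fractions(population)]
    for _ in range(generations):
        population = model(population)
        history.append(band_fractions(population))
    return history


if __name__ == "__main__":
    for name, model in (("semi-conservative", semi_conservative),
                        ("conservative", conservative)):
        for generation, bands in enumerate(simulate(model, 2)):
            print(name, generation, bands)
```

The semi-conservative model reproduces the observed pattern: after one division all DNA is hybrid, and after two divisions half is hybrid and half is light. The conservative model instead gives half heavy and half light DNA after the first division and never produces a hybrid band.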
Several protein complexes are involved in this semi-conservative replication, of which only two will be mentioned here: so-called helicases separate the two strands of the double helix, after which DNA polymerases synthesize the new complementary strands along the two parental strands. Biologists call all enzymes that build (polymerize) nucleic acids polymerases. Corresponding to the two types of nucleic acids, a distinction is made between DNA polymerases and RNA polymerases; the latter are discussed in the following section.
Transcription: from DNA to mRNA
Elsewhere we have briefly discussed regulatory sequences that control the activity of genes, i.e. when they are “read”. These include regions located immediately before the gene (promoters) and regions that lie a considerable distance before the gene, within the gene, or even far behind it (enhancers). Promoters and enhancers are binding sites for certain proteins (transcription factors), some of which are only present in the cell nucleus under certain conditions. This regulates where, when and to what extent the corresponding gene is read (expressed). RNA polymerases then make an RNA copy, or transcript, of the genes that are currently “needed”. But what exactly is an RNA transcript?
From a chemical point of view, RNA differs from DNA in only two ways. First, the sugar component of RNA (ribose) has an additional oxygen atom; DNA is called deoxyribonucleic acid precisely because its sugar lacks this oxygen. Second, the base thymine is replaced by the similar base uracil (U). Accordingly, an RNA sequence could be, for example, AGCUGCUAAGCUG. This RNA molecule, generated as the complementary strand to a DNA sequence, is then transported out of the cell nucleus into the cytoplasm, where it serves as a template for building a protein. Because of this messenger function, this form of RNA is called messenger RNA (mRNA for short).
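The base pairing during transcription can be sketched in the same way as for DNA. The only difference is that uracil is placed opposite adenine. The Python sketch below assumes that the DNA template strand is written base by base opposite the transcript, i.e. in the 3'→5' direction, so no reversal is needed. The template sequence is a hypothetical one, chosen so that it yields the example RNA above. The second helper shows an equivalent shortcut: the mRNA reads like the other, non-template (coding) DNA strand with U in place of T.

```python
# Base pairing during transcription: opposite each DNA base, RNA polymerase
# places the partner RNA base, with uracil (U) instead of thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template: str) -> str:
    """Return the mRNA complementary to a DNA template strand.

    Assumption: the template is written base by base opposite the transcript
    (i.e. in the 3'->5' direction), so no reversal is needed.
    """
    return "".join(DNA_TO_RNA[base] for base in template.upper())


def rna_from_coding_strand(coding: str) -> str:
    """The mRNA has the same sequence as the coding DNA strand, with U for T."""
    return coding.upper().replace("T", "U")


if __name__ == "__main__":
    template = "TCGACGATTCGAC"  # hypothetical template strand
    print(transcribe(template))                      # AGCUGCUAAGCUG
    print(rna_from_coding_strand("AGCTGCTAAGCTG"))  # AGCUGCUAAGCUG
```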
RNA – a long underestimated molecule
So the genes, as DNA segments, are safely stored in the genome in the cell nucleus. The end product of many genes is a protein, and the cell needs different amounts and combinations of proteins at different times, depending on its condition. The mRNAs thus act as mediators: they are copies of all those genes whose protein blueprint is to be sent out into the cytoplasm at a given time. For a long time, it was believed that the function of RNA was essentially limited to this messenger role. That could hardly have been more wrong.
Elsewhere I have mentioned that probably over 50% of the genome is transcribed, although only 1–2% actually codes for proteins. This means that an enormous variety of transcripts is floating around in our cells, and these presumably perform a broad spectrum of functions. Many of these non-protein-coding RNA molecules (non-coding RNAs, or ncRNAs for short) are involved in regulating gene activity. Like many proteins, they can bind to DNA and increase or decrease the gene's accessibility for RNA polymerase, so that the gene is read more or less often. Two other forms of functionally active RNA molecules come up in the following section, which explains how a protein is built from mRNA.
Translation: from RNA to Protein – The Genetic Code
So far I have only described how a section of one type of nucleic acid (DNA) is copied into a piece of another nucleic acid (RNA), a process called transcription. In the next step, however, something truly remarkable happens: following the mRNA template, a very specific protein is built in the cytoplasm, i.e. a molecule of a completely different class of substances.
Proteins are made up of amino acids, whose exact chemical description is not important here. What is relevant for our purposes is that all 20 amino acids that make up our proteins share a common section that is identical in all of them. It is through this common section that the amino acids are chained together as the protein is built. Even more important for the protein's function, however, is the specific part of each amino acid, the side chain, which protrudes outward from the chain. This is where the amino acids differ: some are electrically charged, others neutral; some preferentially orient themselves towards water, others away from it (these are called hydrophilic and hydrophobic residues). Hydrophobic parts of a protein therefore tend to cluster together (so as to be exposed as little as possible to the surrounding water), which gives the protein its individual three-dimensional structure. Some proteins form long helical structures; these are typical of proteins whose special task is to provide tensile strength or stability, such as the keratin of hair or fingernails. Other proteins fold in such a way that some of their parts can quickly accept and release chemical groups; such proteins play a central role as enzymes in our metabolism (e.g. in the ubiquitous breakdown of sugar for energy production).
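The distinction between hydrophobic and hydrophilic residues can be made concrete with a small classification sketch. The Python code below sorts the 20 amino acids, written in their standard one-letter codes, into three groups: hydrophobic, charged and polar but uncharged. The last two groups together make up the hydrophilic residues. This grouping is an assumption: it is one common simplification, and textbooks classify borderline cases such as G, C, H, Y, W and P differently. The example peptide is hypothetical.

```python
# Simplified grouping of the 20 standard amino acids (one-letter codes) by how
# their side chains behave in water. Assumption: borderline cases such as
# G, C, H, Y, W and P are classified differently in different textbooks.
HYDROPHOBIC = set("AVLIMFWP")
CHARGED = set("DEKR")           # hydrophilic, electrically charged
POLAR_UNCHARGED = set("STNQYCGH")  # hydrophilic, electrically neutral


def classify(residue: str) -> str:
    """Return 'hydrophobic', 'charged' or 'polar' for a one-letter amino acid code."""
    residue = residue.upper()
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    if residue in CHARGED:
        return "charged"
    if residue in POLAR_UNCHARGED:
        return "polar"
    raise ValueError(f"unknown amino acid code: {residue!r}")


def hydrophobic_fraction(protein: str) -> float:
    """Share of residues that would tend to turn away from the surrounding water."""
    return sum(classify(r) == "hydrophobic" for r in protein) / len(protein)


if __name__ == "__main__":
    peptide = "MAFKLDE"  # hypothetical short peptide
    print([classify(r) for r in peptide])
    print(hydrophobic_fraction(peptide))
```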
But back to the production of a protein: how is the mRNA translated into a sequence of amino acids? The main actor in this process is the ribosome, a complex of proteins and ribosomal RNA (rRNA for short). Another class of RNA molecules, transfer RNA (tRNA), also plays a decisive role here. And the last component that needs to be introduced is the central magic formula, the translation table: the genetic code. We have not yet discussed how the four nucleotides of DNA or RNA encode the 20 different amino acids. If two nucleotides coded for one amino acid, there would be 4² = 16 possible combinations, so only 16 different amino acids could be encoded. To map 20 different amino acids unambiguously onto DNA or RNA, coding in triplets is required: three consecutive nucleotides always code for a specific amino acid. With four different nucleotides, this gives 4³ = 64 possible combinations, so some amino acids are encoded by several different triplets (and three triplets serve as stop signals). Interestingly, the genetic code is (essentially) universal: it is the same in all animals, in plants and even in bacteria. This universal translation table is often represented as a so-called code sun, as shown in Figure 3.
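The counting argument and the redundancy of the code can be checked directly. The Python sketch below first compares the number of doublets with the number of triplets. It then builds the standard genetic code as a dictionary, using the compact one-letter-per-codon layout that NCBI uses for its translation table 1. In that layout the codons are listed in the order UUU, UUC, UUA, UUG, UCU, …, and "*" marks a stop codon. The helper `codons_for` is a hypothetical name chosen for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the compact layout of NCBI translation table 1:
# one amino acid letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
# ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product(...) varies the last position fastest, matching the order above.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list[str]:
    """All codons that encode a given amino acid (one-letter code, '*' for stop)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


if __name__ == "__main__":
    print(len(BASES) ** 2)   # 16 doublets: too few for 20 amino acids
    print(len(BASES) ** 3)   # 64 triplets: enough, with room to spare
    print(len(set(CODON_TABLE.values()) - {"*"}))  # 20 amino acids
    print(codons_for("M"))   # ['AUG']
    print(codons_for("L"))   # six different codons for leucine
```

The output shows both sides of the redundancy. Methionine has only the start codon AUG, whereas leucine can be encoded by six different triplets.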
The mRNA travels from the nucleus into the surrounding cytoplasm. The cytoplasm is full of ribosomes, which bind to the mRNA and scan along its sequence. As soon as they come across the sequence AUG, the universal start codon (a codon is a triplet on the mRNA), they begin their work. To do so, they bind a tRNA whose anticodon matches the codon. The tRNA is a cloverleaf-shaped piece of RNA: one part carries the counterpart to a specific codon (the anticodon), while the matching amino acid is attached at its other end. The tRNA with the anticodon UAC (written 3'→5', and thus complementary to the start codon AUG) always carries the amino acid methionine. And since AUG is the start codon for every protein, the first amino acid of every newly built protein is methionine (even if this methionine is often removed later).
From here, the ribosome moves one triplet further and again “catches” the matching tRNA, which delivers the corresponding amino acid for the next codon. As the ribosome moves on, the amino acids are linked by so-called peptide bonds, creating a growing chain of amino acids that forms the protein. Each tRNA, once stripped of its amino acid, is released and can then be loaded again with its specific amino acid by specialized enzymes. This continues until the ribosome reaches one of the three stop codons (UAA, UAG or UGA), for which there is no matching tRNA; the finished amino acid chain is then released.
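The steps described above can be condensed into a small Python sketch of a heavily simplified ribosome. It scans the mRNA for the first AUG and then reads one triplet at a time. Each codon is looked up in the standard genetic code, and reading stops at the first stop codon. A helper also derives the tRNA anticodon by base pairing, written 3'→5' as in the text. The sketch ignores many details of real ribosomes. The function names and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code (compact layout of NCBI translation table 1).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}
START_CODON = "AUG"


def anticodon(codon: str) -> str:
    """Anticodon of the matching tRNA, written base by base opposite the codon (3'->5')."""
    return "".join(RNA_PAIR[base] for base in codon.upper())


def translate(mrna: str) -> str:
    """Simplified ribosome: find the first AUG, then read one triplet at a time.

    Returns the amino acid chain in one-letter code and stops at the first
    stop codon (or when fewer than three bases are left).
    """
    mrna = mrna.upper()
    start = mrna.find(START_CODON)  # scanning for the start codon
    if start == -1:
        return ""  # no start codon: nothing is translated
    chain = []
    for i in range(start, len(mrna) - 2, 3):  # slide along triplet by triplet
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the finished chain
            break
        chain.append(amino_acid)  # linked to the growing chain by a peptide bond
    return "".join(chain)


if __name__ == "__main__":
    print(anticodon("AUG"))                   # UAC
    print(translate("GCCAUGGCUUUCAAAUAGGC"))  # MAFK
```

In the example, the two bases before AUG are skipped during scanning. The chain starts with methionine (M), and translation ends at the stop codon UAG, so the trailing bases are never read.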
While the amino acid chain is still growing, the electrical and chemical properties of the incorporated amino acids (for example, their hydrophobic or hydrophilic side chains) take effect and cause the protein to fold in a specific way. The result might, for example, be a transcription factor, i.e. a protein involved in reading genes on the DNA. And so the cycle begins all over again …
[1] Figure modified from Zephyris, licensed under CC BY-SA 3.0; original: https://en.wikipedia.org/wiki/DNA (accessed 8 February 2016)
[2] Watson, J. D.; Crick, F. H. C. (1953): Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. In Nature 171 (4356), pp. 737–738. DOI: 10.1038/171737a0.