We have reached a stage where the number of studies that describe the function of a particular gene in a particular model organism under particular conditions has become unmanageable. At the same time, articles that describe a completely new cellular mechanism that may occur in all animal and plant cells, but has remained hidden so far, have become extremely rare. That's why the two articles that appeared in the journal Nature last week are so special. And to understand what makes them special, we need to take a closer look at a paradoxical observation in the field of genetics that has become more prevalent over the last few years.

Genetic engineering methods such as CRISPR nowadays allow the targeted mutation of a gene, so that the function of the encoded protein is completely lost. This is called a genetic knock-out. Other methods, such as RNA interference, downregulate gene function by blocking the mRNA transcripts of the corresponding gene or by initiating their degradation. As a result, the translation of these mRNA molecules is disturbed, so that much less of the corresponding protein is formed. However, a few functional molecules remain, which is why we call those methods knock-down procedures. Absurdly, however, it was repeatedly observed that these knock-down approaches lead to a stronger effect than the previously mentioned knock-out approaches, even though the latter lead to a complete loss of the protein under investigation. How is that possible? This remained a mystery for quite some time…

 

What are non-sense mutations?

To study the function of a particular gene, researchers often knock it out to see what effect this has on the system under investigation. In the 21st century, this is usually achieved with CRISPR or other nucleases that can be targeted against a particular sequence of the gene. When choosing the genomic location to target, most researchers have picked an area that lies relatively close to the beginning of the so-called coding sequence. The coding sequence is the part of the gene that directly represents the plan to build a protein in the form of the famous triplet code. The triplet code, and therefore the so-called reading frame, is always determined by the start codon ATG. That is to say, starting from this ATG, the following part of the gene is always read in three-letter words. Letting a programmable nuclease loose on such a sequence often leads to the loss of a few letters. That can have a huge impact on the reading frame. Here's an example. Let's look at the following (admittedly only moderately meaningful) text:

the fat cat sat top mat and big dog ran bit


The "the" stands in for the ATG here and therefore sets the reading frame, i.e. from there on, three letters are always taken as a word. Now if we have a mutation (e.g. generated by CRISPR) in which only the C of "cat" has been lost, then that changes our nice little text as follows:

the fat ats att opm ata ndb igd ogr anb it


The little bit of sense that we could perhaps extract from the text before is now completely gone. In such a case we are talking about a frameshift mutation, because the triplets that follow the mutation now code for completely different amino acids and the protein product of the mutated gene has nothing to do with the product of the non-mutated gene.
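The shift of the reading frame can be reproduced in a few lines of Python. This is only a minimal sketch of the cat-text example above; the helper names `triplets` and `delete_letter` are made up for this illustration and do not come from any library.

```python
def triplets(letters: str) -> str:
    """Read a string of letters in three-letter words, starting at the first letter."""
    letters = letters.replace(" ", "")
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


def delete_letter(letters: str, index: int) -> str:
    """Return the text with the single letter at `index` removed (a one-letter deletion)."""
    return letters[:index] + letters[index + 1:]


original = "the fat cat sat top mat and big dog ran bit"
letters = original.replace(" ", "")

# Position of the "c" of "cat" in the unspaced text.
c_index = letters.index("cat")
mutant = delete_letter(letters, c_index)

print(triplets(letters))  # the fat cat sat top mat and big dog ran bit
print(triplets(mutant))   # the fat ats att opm ata ndb igd ogr anb it
```

Everything before the deletion is read as before; every word after it changes, because the frame is still counted from "the".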


Now there is not only the defined start codon ATG in the genetic code, but also defined STOP codons, actually three of them: UAG, UAA and UGA in the mRNA (TAG, TAA and TGA at the DNA level). All three of these triplets signal the ribosomes, the protein factories, to stop their work and terminate the protein chain here. Let's assume that in our cat-text genetics BIT and ATA are such STOP codons. Our mutation, the loss of the single C, has then led to our mutant protein being severely shortened, because ATA appears relatively quickly in the mutated reading frame. In this case, we would call the ATA a premature STOP codon (PTC), and mutations that cause such PTCs are called non-sense mutations.
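To see how strongly the toy protein is shortened, we can read both versions of the cat text until the first STOP word appears. This is a sketch under the assumptions of the example: "bit" and "ata" act as STOP codons, and the function name `read_until_stop` is invented for this illustration.

```python
# Toy STOP codons from the cat-text example (not real codons).
STOP_WORDS = {"bit", "ata"}


def read_until_stop(letters: str) -> list:
    """Read three-letter words from the start and stop at the first STOP word."""
    words = []
    for i in range(0, len(letters) - 2, 3):
        word = letters[i:i + 3]
        if word in STOP_WORDS:
            break
        words.append(word)
    return words


normal = "thefatcatsattopmatandbigdogranbit"
mutant = "thefatatsattopmatandbigdogranbit"  # the "c" of "cat" is deleted

print(read_until_stop(normal))  # 10 words, ends at the regular STOP "bit"
print(read_until_stop(mutant))  # ['the', 'fat', 'ats', 'att', 'opm']
```

The normal text yields ten words before its regular STOP, while the mutant text is cut off after five because "ata" shows up early in the shifted frame.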

 

Generating Mutants with CRISPR

Such non-sense mutations can either be of natural origin, or someone has "helped" for research purposes and has mutated a gene in cultured cells or a model animal. The outcome is just a matter of probability. A CRISPR system without any additional bells and whistles, for example, creates a double-strand break in the DNA. This can be repaired by the cell, but small errors often happen. Quite by chance, either a few base pairs are lost or a few are added. Deletions or insertions that consist of exactly three base pairs, or a multiple of three, do change individual amino acids, but they do not shift the reading frame, which is why the protein is usually still relatively functional. In two-thirds of the deletions or insertions (collectively also called "indels"), however, the reading frame shifts. The probability of any of the altered triplets being a STOP codon is quite high. Cells in which this resulted in a non-sense mutation are then selected to study in detail the loss of function of the corresponding gene ("knock-out").
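Both probabilities can be estimated with a back-of-the-envelope calculation. The sketch below rests on two simplifying assumptions that are not from the papers: every indel length is equally likely, and after a frameshift each of the 64 possible triplets is equally likely, so that 3 of 64 are STOP codons. Real repair outcomes and real sequences are not that uniform, so treat the numbers as rough orientation only.

```python
def frameshift_fraction(max_indel: int) -> float:
    """Fraction of indel lengths 1..max_indel that are not a multiple of three."""
    lengths = range(1, max_indel + 1)
    shifting = [n for n in lengths if n % 3 != 0]
    return len(shifting) / len(lengths)


def prob_stop_within(codons: int) -> float:
    """Chance of at least one STOP codon among `codons` random triplets.

    Assumes all 64 triplets are equally likely and independent (3 of them are STOPs).
    """
    return 1 - (61 / 64) ** codons


print(round(frameshift_fraction(30), 3))
for n in (10, 20, 50, 100):
    print(n, round(prob_stop_within(n), 2))
```

Under these assumptions, a premature STOP codon becomes more likely than not after a few dozen shifted triplets, which is why frameshifts so often end in a truncated reading frame.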

In fact, this way of generating mutations has only become possible in the last one to two decades. Before that, other techniques were used, but they are often only able to downregulate a gene and not to switch it off completely. So the research field was thrilled with the possibility of seeing much clearer and stronger effects by completely paralyzing the gene. However, as I have already mentioned, often quite the opposite was true. The knock-down led to more dramatic effects than the knock-out. How can that be?

 

Non-sense mediated mRNA decay

The first step in understanding this paradox is non-sense mediated mRNA decay (or NMD for short). In mammals, most so-called primary transcripts, that is, the crude RNA copies of a gene, are first spliced in the nucleus before they are transported to the cytoplasm, where they serve as a template for the construction of a protein. Splicing removes certain parts of the primary transcript, the introns, and joins the remaining exons that contain the coding sequence. At all those sites where introns were removed, small protein complexes remain at the exon junctions. These protein complexes, called exon junction complexes (EJCs), are recognized and bound by so-called up-frameshift (Upf) proteins immediately after the RNA leaves the nucleus. In a first round of translation, i.e. when the ribosome reads the RNA and synthesizes the protein, all EJCs are removed as the ribosome passes them. The ribosome does not fall off the RNA until it detects a STOP codon and stops protein synthesis. Normally, such a STOP codon appears only in the last exon and thus there are no more EJCs behind it. However, if the ribosome falls off the RNA at a premature STOP codon, it will not remove the EJCs that follow. Those EJCs that remain on the RNA then rapidly recruit the RNA degradation machinery. The non-sense mutation (unless it lies in the last exon) thus leads to the RNA being very unstable and rapidly degraded.
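The rule described above can be written down as a small check: a transcript is flagged for NMD if at least one exon junction, and thus one EJC, lies behind the STOP codon. This is a deliberately simplified sketch of that model. The function names, the exon lengths and the STOP positions are hypothetical, and real cells apply additional criteria that are ignored here.

```python
def exon_junctions(exon_lengths):
    """Positions of the exon-exon junctions in the spliced mRNA.

    A junction at position p sits between nucleotide p and p + 1 (1-based).
    """
    positions = []
    total = 0
    for length in exon_lengths[:-1]:  # no junction after the last exon
        total += length
        positions.append(total)
    return positions


def triggers_nmd(exon_lengths, stop_end):
    """True if at least one junction (EJC) lies downstream of the STOP codon.

    `stop_end` is the 1-based position of the last nucleotide of the STOP codon.
    """
    return any(junction >= stop_end for junction in exon_junctions(exon_lengths))


exons = [120, 90, 150]  # hypothetical transcript with three exons

print(exon_junctions(exons))     # [120, 210]
print(triggers_nmd(exons, 330))  # regular STOP in the last exon: False
print(triggers_nmd(exons, 60))   # premature STOP in the first exon: True
```

A STOP codon in the last exon has no junction behind it and the RNA is left alone, whereas a premature STOP in an early exon leaves EJCs on the RNA and marks it for degradation.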

So far so good. The NMD mechanism has been known for some time, but on its own it cannot explain why the knock-out mutation leads to a weaker phenotype than the knock-down. And now we come to the two papers that have recently appeared in Nature.

 

NMD Induces Genetic Compensation 

The two new articles (again: here and here) have shown that recruiting the RNA degradation machinery is not the only thing the Upf proteins do. In fact, when the RNA is chopped into pieces, one of these proteins, Upf3a, grabs a bit of it and travels back to the nucleus. There it works together with a large protein complex called COMPASS. COMPASS plays an important role in the epigenetic marking of chromatin; more specifically, this complex attaches a certain chemical label (a methyl group on histone H3) to the promoter regions of genes that are to be read. COMPASS can work together with the Upf proteins as they bring a piece of the RNA that has just been degraded back to the nucleus. Together, they can now scan across the genome and find sequences that are similar to the sequence they are carrying. Such similar sequences indicate that a gene is related to the very gene whose RNA was degraded due to the mutation and whose function is therefore lacking. By activating precisely such genes, functionally related genes are upregulated, i.e. transcribed more strongly. And that is what compensates for the function that was affected by our mutation.
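The idea of "finding sequences that are similar to the sequence they are carrying" can be illustrated with a toy search. To be clear, this is not how the cell or the two papers do it: it is only a sketch that counts short shared stretches (k-mers) between an RNA fragment and a few candidate genes. All sequences, gene names, the function names and the thresholds `k` and `min_shared` are invented for this example.

```python
def kmers(seq: str, k: int) -> set:
    """All overlapping stretches of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similar_genes(fragment: str, genes: dict, k: int = 4, min_shared: int = 3) -> dict:
    """Genes that share at least `min_shared` k-mers with the fragment."""
    fragment_kmers = kmers(fragment, k)
    hits = {}
    for name, seq in genes.items():
        shared = len(fragment_kmers & kmers(seq, k))
        if shared >= min_shared:
            hits[name] = shared
    return hits


# Hypothetical fragment of the degraded mRNA and two hypothetical genes.
fragment = "ATGGCCAAGTTC"
genes = {
    "paralog": "ATGGCCAAGTAC",    # differs from the fragment in one letter
    "unrelated": "TTTTCCCCGGGG",  # shares no stretch with the fragment
}

print(similar_genes(fragment, genes))  # {'paralog': 7}
```

In this toy picture, only the closely related "paralog" would be picked up and upregulated, while the unrelated gene is ignored.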

 

Ergo: Design your CRISPR carefully!

Now what does this mean for researchers? Well, you should think carefully when you design your genetic tool, e.g. your CRISPR/Cas9 system, and select the sequence you want to target. For example, if you want to recreate in an animal model a single known point mutation that causes disease in humans, then you should try to introduce exactly that mutation. If this causes genetic compensation in the model animal, then the same probably also takes place in humans and you might have produced a very good disease model. However, if you want to study the complete loss of function of a gene and do not want to trigger any compensation mechanisms, then you should not just create a premature STOP codon in one of the first exons. An alternative would be, for example, to program the CRISPR system so that a complete exon that encodes key portions of the protein is lost. Thanks to these two papers, quite a few research laboratories around the world are currently revising their CRISPR knock-out strategy.