What is a Gene?

So what are genes? Well, they have something to do with DNA. If you don’t know more about genes than that, there’s no shame in it. The term “gene” has always been vague, and it remains so to this day. To illustrate this, the article is divided into two parts: the first, historical part explains the origin and development of the term, while the second part deals with today’s conception of a gene. The aim is to make clear that despite, or perhaps because of, the amount of research on this question, not even geneticists today are completely sure what exactly genes ultimately are.

Brief historical recapitulation of the concept of the gene

The term “gene” was first used in 1909 by the Danish botanist Wilhelm Johannsen, who wrote:

The word gene is fully free from any hypothesis: It only expresses the securely ascertained fact that at least many properties of the organism are conditional on individual, separable and thus independent ‘states’, ‘basis’, ‘dispositions’ found in the gametes – briefly, just what we want to call genes [1]

So, according to Johannsen, a gene was meant to be a pure unit of thought or calculation, without any need, or even justification, to speculate about its (let’s call them) “material correlates”. Nevertheless, the past century was shaped by the search for exactly this correlate. Even before Johannsen, it had been shown that the cell nucleus, and later, more precisely, the chromatin, is the carrier of genetic information (even though its units were only later called “genes”). Chromatin owes its name solely to the fact that it can be stained with suitable dyes. Around the turn of the twentieth century it became clear that the stainable substance in the cell nuclei, the chromatin, is the same material that makes up the chromosomes, thread-like structures that had long been observed under the microscope in dividing cells. It was thus clear that these structures do not simply appear during cell division and then disappear again, but remain permanently in the cell nucleus in the form of chromatin and carry the recently rediscovered Mendelian factors. Now the hunt for the genetic information and what we have called its “correlates” really began.

An outstanding research group in this context was the one led by Thomas Hunt Morgan, which began to study the inheritance of traits in the fruit fly (which can be bred easily and inexpensively) using enormous numbers of individuals. In their “fly room” at Columbia University, they investigated the linkage of genes, i.e. the observation that not every factor is inherited independently of the others (as Mendel had described it), but that certain traits are often inherited together. They began to measure how frequently this linkage was broken and had the brilliant idea that this frequency could be related to how far apart the correlates of these traits sit on the chromosomes. With this idea, the assumption of a linear arrangement of genes, and the experimental data on how often traits were inherited together or separately, they were able to generate gene maps, i.e. to localize on the chromosomes individual traits that can be recognized and described in fruit flies.

Top row: Thomas Hunt Morgan (reprinted with kind permission from archives.caltech.edu), Calvin Bridges (reprinted with kind permission from archives.caltech.edu) and Edward Lewis (Caltech yearbook, public domain). Bottom: A Drosophila melanogaster genetic linkage map generated by Thomas Hunt Morgan’s team. This was the first successful gene mapping and provided critical evidence for the Boveri-Sutton chromosome theory of inheritance. The map shows the relative positions of allelic characteristics on the second Drosophila chromosome. The distance between genes (in map units) is equal to the percentage of crossing-over events that occur between the different alleles. Image and legend from Twaanders17 for wikimedia.org, reprinted under CC BY-SA 4.0 License. Many thanks.
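The rule in the caption (map distance equals the percentage of crossing-over) can be turned into a small calculation. The following Python sketch is only an illustration: the gene names and offspring counts are invented, and the simple percentage rule is only a good approximation for genes that lie close together, because double crossovers between distant genes go unnoticed. It computes pairwise map distances and then places the genes on a line, taking the most distant pair as the two ends.

```python
def map_distance(parental: int, recombinant: int) -> float:
    """Map distance in map units: percentage of offspring showing a
    recombined (crossed-over) combination of the two traits."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("at least one offspring is needed")
    return 100.0 * recombinant / total


def linear_order(distances: dict[tuple[str, str], float]) -> list[tuple[str, float]]:
    """Place genes on a line, measured from one end of the most distant pair."""
    def dist(x: str, y: str) -> float:
        if x == y:
            return 0.0
        return distances[(x, y)] if (x, y) in distances else distances[(y, x)]

    left, _right = max(distances, key=distances.get)  # outermost pair of genes
    genes = {gene for pair in distances for gene in pair}
    return sorted(((g, dist(left, g)) for g in genes), key=lambda item: item[1])


# Hypothetical test-cross counts (parental vs. recombinant offspring).
distances = {
    ("a", "b"): map_distance(parental=830, recombinant=170),  # 17.0 map units
    ("b", "c"): map_distance(parental=940, recombinant=60),   # 6.0 map units
    ("a", "c"): map_distance(parental=770, recombinant=230),  # 23.0 map units
}
print(linear_order(distances))  # [('a', 0.0), ('b', 17.0), ('c', 23.0)]
```

Since 17 + 6 = 23, the three invented distances are consistent with the order a, b, c; real data rarely add up this neatly.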

This insight was followed by a series of further questions, above all the question of what substance, if any, the genes consist of. Biochemical analyses had shown that chromosomes consist of proteins and of a comparatively simple chemical compound that was called nucleic acid because it occurs in the cell nucleus. The consensus in the biological sciences back then was that only proteins could actually be the carriers of genetic information. Proteins can form complex structures and (as was already known at the time) catalyze biochemical reactions, whereas the somewhat cryptic nucleic acids DNA and RNA seemed to have far too simple a structure to encode information in any way.

Another milestone in the history of genetics is an experiment by the British physician Frederick Griffith from 1928. Griffith was studying pneumococci, bacteria that cause pneumonia (a frequent and often fatal complication of influenza such as the Spanish flu), and was working on a vaccine. He knew that if he injected one strain, the so-called smooth strain of pneumococci, into mice, they died quickly. Another strain, known as the rough strain, is harmless and can be injected into mice without causing damage. If the disease-causing smooth strain is killed by heat, it also becomes harmless, and mice injected with it survive. When, however, he mixed the heat-killed bacteria of the smooth strain with living bacteria of the harmless rough strain and injected them, the mice died. From this he concluded, quite correctly, that something in the killed cells could change, or, as he called it, transform, the living but harmless cells in such a way that they too became deadly. But what it was, chemically speaking, that caused this transformation, i.e. this transfer of information or ability (in this case pathogenicity), remained unclear.

Frederick Griffith’s experiment demonstrates the “transforming principle”, that is, information can be transferred from one bacterial strain to another. Picture from Madeleine Price Ball, reprinted under CC BY-SA 3.0 License. Many thanks.

It was not until 1944 that an ingenious experiment by three researchers, Oswald Avery, Colin MacLeod and Maclyn McCarty, showed, to the surprise of many experts, that DNA and not protein is the information-carrying molecule. They isolated the cellular components that could serve as information carriers (DNA, RNA, lipids, carbohydrates and the proteins that were still considered the favorites) and found that only DNA was able to transfer biological properties from one bacterial strain to another, i.e. to transform them. Only now have we arrived at the point in the history of genetics that corresponds to the basic knowledge I initially attributed to my exemplary reader: genes, the units of our heredity, have something to do with DNA.

Suddenly DNA was the focus of interest, and the race to determine its structure began. For reasons of brevity, and because this story has been told so often, we will only mention the central outcome of this chapter in the history of genetics: in the end it was James Watson and Francis Crick who, in 1953, making rather problematic use of an X-ray diffraction pattern produced by Rosalind Franklin, proposed a double helix as the structure of the DNA molecule and hit the mark.

It was now clear that this chemically simple acid, DNA, must be the carrier of genetic information, and its chemical and spatial structure was now known as well. However, it was still unclear how DNA is able to store and transmit information: in bacteria also “horizontally”, i.e. within one and the same generation, and in animals like us essentially only across generations, from parents to their offspring.

I do not want to go into more detail here on the work that finally elucidated this mechanism, although it may represent the most important step in the history of genetics. It was largely thanks to the meticulous work of Marshall Nirenberg’s research group that the way information is stored, i.e. the genetic code, was clarified, completing our standard model. Always exactly three bases, i.e. units on the DNA (adenine, cytosine, guanine or thymine), code for a specific amino acid, the building blocks from which all our proteins are made. Remarkably, this code is nearly universal: with only a few exceptions (for example in mitochondria), the same base triplets are translated into the same amino acids in every organism. With this, the most important questions about the nature of information storage and transmission were largely answered. Francis Crick had already formulated the so-called “Central Dogma of Molecular Biology” in 1958: information flows from DNA (via RNA) to protein, never the other way around. This idea guided research in this area for decades, although it soon became clear that it falls short.
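To make the triplet principle concrete, here is a minimal Python sketch that reads a DNA sequence three bases at a time and looks each triplet up in a codon table. Only a handful of the 64 codons of the standard genetic code are included, and the example sequence is invented. In the cell it is the RNA copy (with uracil instead of thymine) that is read; the DNA letters are used here for simplicity.

```python
# Small excerpt of the standard genetic code, written with DNA letters.
# Only the codons needed for the example are listed; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M",  # methionine (also the usual start codon)
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "GCT": "A",  # alanine
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(coding_sequence: str) -> str:
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    # Step through the sequence in non-overlapping triplets; leftover bases are ignored.
    for i in range(0, len(coding_sequence) - 2, 3):
        codon = coding_sequence[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGTTTGGCAAATGGTAA"))  # MFGKW
```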

At this point we can conclude that over time the idea that a gene corresponds to a delimitable piece of DNA became established. This piece of DNA is transcribed into RNA in the cell nucleus, and this RNA is translated into a protein whose structure is defined by the DNA sequence. The reader can find out more about these processes, which are so central to all of biology, in the article DNA-RNA-Protein. But now, briefly, to the prevailing ideas about genes and their structure today.

The “gene” today: a concept in crisis?

Modern biology often shows a certain ambivalence, and the term gene is a prime example. Partly out of the conviction that Johannsen’s agnosticism has been overcome, partly out of sheer disregard for it, the question of what the material correlates of genes actually are has been, and still is, discussed quite vigorously. This balancing act between rapidly growing knowledge about the nature of our genetic makeup and the original gene concept means that today, perhaps more than ever, we are unsure what to do with this term. For example, Helen Pearson wrote in her 2006 article with the simple and descriptive title “What is a gene?”:

The more expert scientists become in molecular genetics, the less easy it is to be sure about what, if anything, a gene actually is. [2]

As described in the first section, the milestones in the history of genetics up to the mid-20th century resulted in a “one gene, one protein” hypothesis: a defined section of the DNA strand provides the blueprint for the production of a certain protein, which then has structural or catalytic functions in the cell. Although I want to discuss below why this view has been shaken so massively that the usefulness of the term gene may have to be questioned altogether, I think it is helpful first to give a synopsis of current doctrine that roughly reflects the notion of a gene that a practicing researcher (or doctor) has in everyday life:

A gene has a defined beginning and a defined end, and thus a defined length. The start of a gene is marked by certain characteristic sequences, which form binding sites for the proteins involved in making an RNA copy of the segment. Similarly, at the end of a gene there are certain sequences that cause these proteins to fall off the DNA. Although these regulatory elements lie largely outside the region that is transcribed into RNA, they belong to the gene by definition. This also includes DNA elements that lie further away, if they influence the regulation of the gene’s activity, i.e. the extent to which the gene is read. Such long-range effects of so-called enhancer regions are in fact not uncommon in the genome, and we are far from having identified them all.

So only part of the gene is actually transcribed into RNA. And not everything that is transcribed ends up in the finished RNA: the parts of the gene that are retained are called exons. Between them lie regions, the introns, that are cut out of the RNA copy (the transcript) in a process called splicing. The mature transcript thus consists of all the exons of a gene strung together (on average roughly eight). Of this RNA sequence, however, only a part is read and translated as a blueprint for the corresponding protein; this part is accordingly referred to as the coding sequence.
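The path from gene to coding sequence can be sketched in a few lines of Python. This is a toy model under clear assumptions: the gene sequence and the exon coordinates are invented, real splicing is carried out by the cell’s splicing machinery on the basis of sequence signals rather than given coordinates, and the coding sequence is simply approximated as the stretch from the first ATG to the next in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exons (0-based, end-exclusive coordinates); everything
    in between, i.e. the introns, is dropped."""
    return "".join(gene[start:end] for start, end in exons)


def coding_sequence(transcript: str) -> str:
    """Simplified: from the first ATG up to and including the next in-frame stop codon."""
    start = transcript.find("ATG")
    if start == -1:
        return ""
    for i in range(start, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            return transcript[start:i + 3]
    return ""  # no in-frame stop codon found


# Hypothetical gene: exon 1, an intron, exon 2.
exon1, intron, exon2 = "CCATGTTT", "GTAAGTCCAG", "GGCAAATGGTAAGC"
gene = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(gene))]

transcript = splice(gene, exons)
print(transcript)                   # CCATGTTTGGCAAATGGTAAGC
print(coding_sequence(transcript))  # ATGTTTGGCAAATGGTAA
```

The parts of the transcript outside the coding sequence (here CC at the start and GC at the end) stay in the RNA but are not translated.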

In this way, a very specific protein is built from a given gene. Alongside the race to completely sequence the human genome, there were also many high-profile bets on the number of genes (in the sense just described) that humans have. Initial estimates ranged between 100,000 and 500,000. Interestingly, with increasing insight into the genome, this number has been revised downwards, and today we are at around 20,000 genes (even a few fewer than, for example, the mouse).

The coding parts of the exons, i.e. the sections that are transcribed into RNA and then serve as building instructions for proteins, make up only about 1-2% of the genome. This led to the rest of the genome often being referred to as “junk DNA”: remnants of the past and now useless garbage. This view, too, had to be substantially revised, and today we estimate that probably over 50% of our genome is at least transcribed into RNA. However, only a small part is then used as a template for building proteins; the rest of the transcripts remain RNA. This insight has made RNA a booming research subject in the past few decades. Initially regarded as a boring intermediary between DNA and protein, it is now experiencing a heyday, with one research project after another revealing and describing a wide variety of biological functions for RNA.

This also largely explains the problem we have today with the concept of the “gene”. Is something a gene only if it ultimately serves as the basis for a protein? Or is everything a gene that is read and encodes a product with a function in the organism, whether that gene product is a protein or an RNA? A consortium of leading scientists in this field debated this question for two days (with a lot of shouting and arguing, according to interviews) and came up with a relatively vague definition:

“A locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions” [2]

If one were really to agree on this definition, the number of genes in humans might have to be revised upwards by orders of magnitude, and/or the boundaries between genes would become hopelessly blurred. It looks, at least in my opinion, more as if the concept of the gene as a rigid unit is steadily losing its meaning; it simply seems outdated as a way to capture our genetic material. Only the future can show whether, in light of more recent research, it will really sink into insignificance and disappear without replacement, or whether it will perhaps be superseded by other, similarly intended concepts.


[1] Johannsen, W. (1909): Elemente der exakten Erblichkeitslehre. Deutsche wesentlich erweiterte Ausgabe in fünfundzwanzig Vorlesungen. G. Fischer. Available online at https://books.google.de/books?id=yoBUAAAAMAAJ. Translation by Nils Roll-Hansen in: Roll-Hansen, Nils (2014): The Holist Tradition in Twentieth Century Genetics. Wilhelm Johannsen’s Genotype Concept. In The Journal of Physiology 592 (11), pp. 2431–2438.

[2] Pearson, Helen (2006): Genetics: What is a gene? In Nature 441 (7092), pp. 398–401. DOI: 10.1038/441398a.
