A LESS THAN FORMAL TREATMENT OF BIOLOGY CONCEPTS FOR THE BUDDING BIOINFORMATICS WANNABE
As a student assistant, I have been learning about bioinformatics for the last three months. During this time, I have found it to be an organization-obsessive rationalist in the house of sciences. It is not enough to get things right; you also have to arrange them in the proper order, and then try to make some sense out of them.
Not so strictly speaking, bioinformatics brings molecular biology and computer science together, often with the help of the unifying power of the mighty Internet. It stores the onslaught of biological data coming out of research labs and provides tools for interpreting and analyzing this data.
Now that I have become the only person knowledgeable in bioinformatics among my artsy friends, I am expected to know everything about biology AND bioinformatics (Boolean expression intended). So, in the interest of those inquisitive minds who wish to increase their knowledge of biology, I have put together an informal list of general biological questions AND their relationship with bioinformatics.
1. What are DNA and RNA?
Deoxyribonucleic Acid (DNA) is found mainly in the nucleus of cells. Bacteria have no nucleus and keep theirs in the cytoplasm. DNA carries the genetic information required for the growth, development, and reproduction of an organism. It is a double-stranded helical molecule made of subunits called nucleotides. Each nucleotide is made of one of four bases (Adenine, Cytosine, Guanine or Thymine), deoxyribose (a sugar) and a phosphate group. A DNA sequence is a shorthand notation for the order of the bases in a DNA molecule, written as a string of the letters A, C, G and T.
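Because a DNA sequence is just a string over A, C, G and T, a lot of day-to-day bioinformatics is plain string handling. The minimal Python sketch below computes two everyday quantities. The first is GC content, the fraction of G and C bases. The second is the reverse complement, which is the sequence of the opposite strand of the double helix read in its own direction. It relies on the base-pairing rules of double-stranded DNA, where A pairs with T and C pairs with G. The function names are my own and do not come from any particular library.

```python
# Base-pairing rules in double-stranded DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    dna = "ATGCGTAC"
    print(gc_content(dna))          # 0.5
    print(reverse_complement(dna))  # GTACGCAT
```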
Ribonucleic Acid (RNA) is a molecule similar to DNA in structure, but it is usually single-stranded, has ribose in place of deoxyribose, and has uracil (U) in place of thymine. There are different types of RNA, including messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA). RNA is synthesized from a DNA template by an enzyme called RNA polymerase, a process called transcription. RNA plays a number of important roles, most notably in protein synthesis.
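RNA polymerase reads one DNA strand (the template strand) and builds an RNA that is complementary to it. The result therefore matches the other strand (the coding strand), except that U replaces T. The sketch below captures only this sequence bookkeeping and ignores everything the enzyme actually does. Both function names are hypothetical, and the template strand is assumed to be given in its own 5'->3' direction.

```python
def transcribe_coding_strand(coding: str) -> str:
    """mRNA has the same sequence as the coding (sense) strand, with U for T."""
    return coding.upper().replace("T", "U")


def transcribe_template_strand(template: str) -> str:
    """mRNA is complementary to the template strand (template given 5'->3')."""
    pair = {"A": "U", "T": "A", "C": "G", "G": "C"}
    # Read the template backwards so the mRNA comes out 5'->3'.
    return "".join(pair[base] for base in reversed(template.upper()))


if __name__ == "__main__":
    print(transcribe_coding_strand("ATGGCC"))    # AUGGCC
    print(transcribe_template_strand("GGCCAT"))  # AUGGCC (same mRNA)
```

Both routes produce the same mRNA. That is simply the base-pairing rules at work: the template strand GGCCAT is the reverse complement of the coding strand ATGGCC.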
DNA and RNA sequences are the fuel of bioinformatics, keeping it busy. Bioinformatics provides tools to store, interpret and analyze sequence data. It can find genes and the changes within them, relate them to one another, find similarities among them, predict their function and much, much more.
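The crudest form of "finding genes" is scanning for open reading frames (ORFs). An ORF is a stretch that starts with the codon ATG and runs, three bases at a time, until it hits a stop codon (TAA, TAG or TGA) in the same frame. The Python sketch below is a toy that makes big assumptions. It scans only the forward strand, it ignores introns, and it applies a made-up minimum length. Real gene finders are far more sophisticated. The name find_orfs and its parameters are my own.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 0-based and end-exclusive, of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon and must be
    at least min_codons codons long, counting the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                       # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):  # step one codon at a time
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATAGGG"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])  # 2 11 ATGAAATAG
```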
2. What is the difference between a gene and a genome?
Genes are the fundamental units of heredity. Put another way, a gene is a stretch of DNA nucleotides that holds information about a characteristic or biological function. Examples include genes that participate in eye or hair color, widow’s peak (a v-shaped hairline on the forehead, I kid you not), height, and even diseases like hemophilia and Huntington’s disease. The complete set of genetic information (DNA) of an organism is called its genome. Haemophilus influenzae was the first lucky free-living organism to have its complete genome sequenced (in 1995) and all of its genes (~1,700 of them) identified. The Human Genome Project (HGP) was completed in 2003. It put the number of protein-coding genes in humans at a whopping 20,000 to 25,000, down from early estimates of around 30,000. (More information about the HGP can be found at the US Department of Energy’s website, http://www.doegenomes.org/)
Genome data lets the fine folks in bioinformatics do what most tabloid reporters do, i.e., find out more about the sequence in question, find out what type of relationships it might have, and quite likely produce unflattering pictures of it. Of course, bioinformaticians are expert reporters who gather all current information, dig up ‘old dirt’, and do not talk without some line of concrete evidence. See phylogeny if you really want to know more.
3. What is a model organism?
O.K., so I don’t look like Heidi Klum, but hey, it turns out most of my genes are like hers. In fact, Heidi and I also share genes with some of our less sanitary neighbors living in garbage dumps, like fruit flies and mice. In essence, many fundamental biological features have been conserved during evolution. However, some organisms are obviously easier to study than others. A model organism is one chosen for research because it can provide insight into the biology of other organisms while also offering some semblance of convenience. In order to become a model you usually have to be small, readily available, easy to manipulate, and a quick breeder: all things that every CV should have. Once hired, models are (what else) manipulated and observed. Some famous names in the modeling world are Escherichia coli (bacterium), Saccharomyces cerevisiae (“baker’s yeast”), Drosophila melanogaster (fruit fly), Mus musculus (house mouse), and Caenorhabditis elegans (roundworm). And of course, there is the perk of being a model organism: having all your genes identified! Mind you, they pay the price by being constantly hounded about relationships and the like (genetic ones only).
4. What is phylogeny?
The evolutionary relationship between different organisms is called phylogeny. Phylogenetic relationships can be studied at many levels, such as genes, proteins and species. Bioinformatics tools are used to find similarity among genes and identify evidence for the existence of common ancestors. These relationships are usually presented as phylogenetic trees, in which branches represent the divergence patterns of different organisms or parts of organisms (e.g. genes). The Tree of Life Web Project, an ongoing effort to map this tree, can be accessed at http://tolweb.org/tree/phylogeny.html.
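Distance-based tree building starts by measuring how different each pair of sequences is. The minimal sketch below computes the raw proportion of differing sites, called the p-distance, for every pair of sequences. It assumes the sequences are already aligned to the same length and treats gaps as ordinary characters. Methods such as UPGMA or neighbor joining can then turn a matrix like this into a tree, and real tools usually also correct for multiple substitutions at the same site. The sequence names and sequences are made up for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {(n1, n2): p_distance(seqs[n1], seqs[n2])
            for n1, n2 in combinations(seqs, 2)}


if __name__ == "__main__":
    aligned = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACGA"}
    for pair, d in distance_matrix(aligned).items():
        print(pair, d)
    # ('A', 'B') 0.125  -> closest pair, would be joined first
    # ('A', 'C') 0.375
    # ('B', 'C') 0.25
```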
5. What are homologous genes/proteins?
Homologous genes/proteins share a common ancestor, but they may or may not share a common function. In bioinformatics, homologs are used to infer phylogenetic relationships. Homologs can be searched for using bioinformatics tools like BLAST and HomoloGene.
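Homology searches rest on scoring how well two sequences line up. BLAST itself uses a fast heuristic for local alignments, and the sketch below is not that. It substitutes the simpler textbook Needleman-Wunsch algorithm, which computes the best global alignment score by dynamic programming. The scoring scheme is an arbitrary assumption: +1 for a match, -1 for a mismatch and -1 for a gap.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch: best score of a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap    # gap inserted in b
            left = score[i][j - 1] + gap  # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))  # 7
    print(global_alignment_score("GATTACA", "GATACA"))   # 5 (six matches, one gap)
```

A higher score suggests more similarity. Deciding whether that similarity actually reflects a common ancestor takes statistics that tools like BLAST add on top.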
6. Where did proteins come from? Isn’t bioinformatics about the genes?
Proteins are just as important as genes for bioinformaticians. Protein data is used to understand protein function, to model three-dimensional structures, to understand protein interactions, and even to understand interactions between the cells of an organism. And similar to how kings conveyed messages to each other with the help of transcribers and translators in the old days, proteins are produced from genes via transcription and translation. Specifically, this is a two-step process in which both DNA and RNA are involved in assembling amino acids (the building blocks of proteins) into a chain.
First, a gene from DNA is transcribed into mRNA. Next, a molecular machine called the ribosome reads the mRNA three bases (one codon) at a time and translates each codon into an amino acid. The amino acids join together to form a protein. Translation usually begins at an AUG start codon and ends at a stop codon. To understand this process visually, it is perhaps best to see a simple animation developed at the University of Nebraska. For the enthusiasts wanting more details, there is a detailed animation featured on the ‘Biology’ (textbook) website (click on protein synthesis).
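The codon-by-codon reading described above can be sketched as a table lookup. The table below is the standard genetic code, with each amino acid written as a one-letter code and "*" marking a stop codon. It is packed into a 64-character string with codons ordered UUU, UUC, UUA, UUG, UCU, and so on through GGG. The function itself is a simplification of my own. It starts at the first AUG it finds and stops at the first stop codon, whereas real ribosomes use more context to choose their start site.

```python
BASES = "UCAG"
# Standard genetic code: amino acid for codons UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate mRNA from the first AUG up to (not including) a stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # one codon at a time
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGUUUGGCUAAGC"))  # MFG (Met-Phe-Gly, then stop at UAA)
```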
7. What is alternative splicing?
A gene is first transcribed from DNA as pre-mRNA. Briefly, in eukaryotic cells, pre-mRNA has intronic regions (which are considered non-coding) and exonic regions (which are considered to be coding). The introns are cut out of the pre-mRNA (this is the splicing step), and the exons are joined together to form the final mRNA product. In alternative splicing, different combinations of exons are retained, so the same pre-mRNA can yield different mature mRNAs. These are in turn translated into different proteins with potentially different functions, all originating from the same gene. This adds complexity to genomes, because one gene has the potential to make many proteins. Just think of the possibilities!
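Here is a toy model of splicing. If you know where the exons sit in the pre-mRNA, splicing is just cutting out those pieces and gluing them together in order. The sketch below also enumerates isoforms for one simple kind of alternative splicing, exon skipping. It assumes that the first and last exons are always kept and that any subset of the middle exons may be kept. All names and the example sequence are hypothetical, with introns written in lowercase for readability.

```python
from itertools import combinations


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the given exons (0-based, end-exclusive) in order; introns drop out."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def exon_skipping_isoforms(pre_mrna: str, exons: list[tuple[int, int]]) -> set[str]:
    """Mature mRNAs that keep the first and last exon plus any subset of the rest."""
    if len(exons) < 2:
        return {splice(pre_mrna, exons)}
    first, *middle, last = sorted(exons)
    isoforms = set()
    for r in range(len(middle) + 1):
        for kept in combinations(middle, r):
            isoforms.add(splice(pre_mrna, [first, *kept, last]))
    return isoforms


if __name__ == "__main__":
    pre = "AUGguaagCCCguuagUAA"          # exon, intron, exon, intron, exon
    exons = [(0, 3), (8, 11), (16, 19)]
    print(splice(pre, exons))            # AUGCCCUAA
    print(sorted(exon_skipping_isoforms(pre, exons)))  # ['AUGCCCUAA', 'AUGUAA']
```

With n middle exons this model allows 2^n isoforms, which gives a rough feel for how "one gene, many proteins" adds up quickly.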
8. How can I find a protein sequence or a gene sequence?
There are many public databases in which you can search for a sequence. A comprehensive list of protein and nucleotide databases can be found on the website of Expasy (Expert Protein Analysis System) at www.expasy.org/links.html. Do not let the length of the list scare you. Databases are our friends.
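Sequences downloaded from these databases very often come as plain-text FASTA. In this format, each record is a header line starting with ">" followed by one or more lines of sequence. The small parser below reads FASTA text that you already have in hand. It is a sketch with assumptions of my own: it fetches nothing from any database, it skips blank lines and any text before the first header, and a repeated header overwrites the earlier record. The example headers are made up.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its full, joined sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:        # finish the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = ">seq1 example gene\nATGCGT\nACGTAA\n>seq2 example protein\nMKV\n"
    for name, seq in parse_fasta(example).items():
        print(name, seq)
    # seq1 example gene ATGCGTACGTAA
    # seq2 example protein MKV
```

For serious work, an established library such as Biopython already handles FASTA and many other formats.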
9. What is a mutation?
Any change in the DNA sequence is called a mutation. It may be caused by external factors such as ultraviolet rays, or it can be the result of an error during DNA replication. A change in DNA may lead to the production of incorrect or non-functioning proteins. (Mutations may or may not grant ‘X-Men’ status.) Depending on how important the protein’s biological function is to the organism, mutations can cause genetic disorders and disease and are, therefore, studied extensively. To read more about different types of mutations and the diseases they cause, visit 2can, the European Bioinformatics Institute’s educational website.
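One routine way bioinformatics sizes up a point mutation in a gene is to check what it does to the encoded protein. The mutation is silent if the amino acid stays the same, missense if the amino acid changes, and nonsense if it creates a premature stop codon. The sketch below assumes a DNA coding sequence that starts exactly at its start codon, with positions counted from 0. It reuses the standard genetic code, written with T instead of U. The function name and the extra "stop-loss" label are my own choices.

```python
BASES = "TCAG"
# Standard genetic code for DNA codons TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(coding: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution in an in-frame coding DNA sequence."""
    coding = coding.upper()
    codon_start = position - position % 3  # first base of the affected codon
    old_codon = coding[codon_start:codon_start + 3]
    offset = position - codon_start
    new_codon = old_codon[:offset] + new_base.upper() + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "silent"     # same amino acid
    if new_aa == "*":
        return "nonsense"   # premature stop codon
    if old_aa == "*":
        return "stop-loss"  # a stop codon became an amino acid
    return "missense"       # a different amino acid


if __name__ == "__main__":
    gene = "ATGGAATGG"  # codons ATG (Met), GAA (Glu), TGG (Trp)
    print(classify_substitution(gene, 5, "G"))  # silent   (GAA -> GAG, still Glu)
    print(classify_substitution(gene, 3, "A"))  # missense (GAA -> AAA, Glu -> Lys)
    print(classify_substitution(gene, 8, "A"))  # nonsense (TGG -> TGA, stop)
```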
So there you have it: a smattering of the concepts that appear frequently when exploring bioinformatics. The list is not complete by any means. Heck, it will just keep growing once you dive into the field, but that’s the whole point of learning: to learn new things (and brush up on stuff you thought you were never going to see again in life). Enjoy the thrill ride!