Genes and Their Properties
From WikiGenetics
What exactly is a gene?
The genetic material, called deoxyribonucleic acid or DNA, is present in most of the cells that make up an organism. DNA is a linear molecule, and long pieces of DNA form structures called chromosomes. A molecule of DNA has a specific shape: it is a double helix in which two separate DNA strands are held together by chemical bonds called hydrogen bonds. Each DNA strand is composed of units called nucleotides, and each nucleotide contains one molecule called a “base.” There are four bases: A, T, C, and G. To form the DNA double helix, the bases of one strand bond to the bases of the other strand. The general rule is that base A specifically binds to base T, and base C specifically binds to base G. Nucleotides (and thus bases) arranged in a particular order form genes. That is, genes are specific segments of DNA located along chromosomes. Because different genes are made up of different sequences of bases, different genes contain different information about the body’s traits and functions. Thus, genes are the basic units of hereditary information. Because chromosomes occur in pairs, most genes also occur in pairs: an individual carries two copies of each gene, one on each chromosome of the pair.
Like DNA itself, a gene has a specific structure. In general, genes have both coding and non-coding sequences. Coding sequences specify the amino acid sequence of a protein; their RNA copy serves as the template for protein synthesis during the process of translation. (See "Products of Genes" below.) The coding sequences of a gene are called exons. Non-coding sequences do not provide information about the protein for which the gene codes, but these sequences often play important roles in regulating gene expression and protein synthesis. There are different types of non-coding sequences associated with each gene, including introns, promoters, 5’ untranslated regions (5’ UTRs), and 3’ untranslated regions (3’ UTRs).
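The base-pairing rule described above can be sketched in a few lines of Python. This is only an illustration: the names `PAIRING`, `pairing_partner_strand`, and `strands_pair` are made up for this example, and the strands are treated as plain strings aligned position by position.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def pairing_partner_strand(strand: str) -> str:
    """Return the bases that pair, position by position, with `strand`."""
    return "".join(PAIRING[base] for base in strand.upper())

def strands_pair(strand_a: str, strand_b: str) -> bool:
    """True if every base of strand_a pairs with the aligned base of strand_b."""
    return (len(strand_a) == len(strand_b)
            and pairing_partner_strand(strand_a) == strand_b.upper())

print(pairing_partner_strand("ATGCCG"))   # TACGGC
print(strands_pair("ATGCCG", "TACGGC"))   # True
```

Because each base has exactly one partner, the sequence of one strand fully determines the sequence of the other.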
Anatomy of a Gene
When describing the structure of a gene, it is common to refer to sequences that are “upstream” or “downstream” of a particular point. This designation relates to the structure of DNA. Because of base pairing, two DNA strands can come together only in a specific way: the bases of one strand must align with their partner bases on the other strand. Each end of a DNA strand is designated either the 5’ end or the 3’ end, and in the DNA double helix the 5’ end of one strand lies opposite the 3’ end of the other strand. Thus, one strand runs in the 5’ to 3’ direction and the other runs in the 3’ to 5’ direction. In terms of gene structure, sequences described as “upstream” lie towards the 5’ end of the DNA strand (by convention, the strand whose sequence matches the RNA transcript), and sequences described as “downstream” lie towards the 3’ end. For a particular gene, the promoter is upstream of the exons and introns.
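As a rough illustration of strand orientation, the sketch below writes the partner of a strand in its own 5’ to 3’ direction (the "reverse complement") and splits a strand into its upstream and downstream parts. The function names and the example sequence are hypothetical, and sequences are assumed to be written 5’ to 3’ from left to right.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    The strands are antiparallel, so the partner is read in the
    opposite direction: complement each base, then reverse the order.
    """
    return "".join(COMPLEMENT[b] for b in reversed(strand_5_to_3.upper()))

def split_at(strand_5_to_3: str, position: int) -> Tuple[str, str]:
    """Split a 5'->3' strand into (upstream, downstream) at a 0-based position.

    Upstream is everything towards the 5' end; the base at `position`
    itself is counted as the start of the downstream part.
    """
    return strand_5_to_3[:position], strand_5_to_3[position:]

strand = "AAGCTTGGC"                      # hypothetical sequence, 5' -> 3'
print(reverse_complement(strand))         # GCCAAGCTT
upstream, downstream = split_at(strand, 3)
print(upstream, downstream)               # AAG CTTGGC
```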
For most genes, only a small portion of the gene is interpreted to give the final product, such as a protein. (See "Products of Genes" below.) In general, the coding sequences, called exons, are separated by intervening, non-coding sequences called introns. Whereas exons contain genetic information that will be used to make the protein, introns do not. Both exonic and intronic sequences are transcribed into an RNA molecule (see "Products of Genes" below), but only exons contribute to the protein. Before being translated into a protein, the RNA undergoes splicing, a process whereby the intronic RNA sequences are removed and discarded and the exonic RNA sequences are joined together to give a shorter, “mature” messenger RNA (mRNA) molecule. Splicing occurs specifically at the exon/intron boundaries; these boundaries are called the splice junctions, and they are characterized by particular sequences. Generally, intronic DNA sequences start with GT and end with AG. A third intronic sequence important for proper splicing is called the branch site. This site is usually located near the downstream (3’) end of an intron and contains an A. Although only exons contribute to the final protein product, mutations are not limited to exons. DNA sequence changes can occur anywhere in a gene: in exons, in introns and splice sites, and in other non-coding sequences. Mutations in different parts of a gene can have various effects. Some mutations are pathogenic and affect the protein for which the gene codes; other mutations are non-pathogenic.
Figure: The DNA sequence of a gene and its corresponding RNA sequence.
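The splicing step can be sketched as simple string slicing. The example below is a simplification under stated assumptions: the gene is a made-up sequence written with DNA letters (so that the GT/AG rule from the text applies directly), exon positions are supplied by hand as 0-based, half-open (start, end) pairs, and the function names are hypothetical. Real splicing acts on the RNA transcript and involves more than these two boundary sequences.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # 0-based, half-open (start, end)

def introns_between(exons: List[Span]) -> List[Span]:
    """Return the gaps between consecutive exons, i.e. the introns."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]

def follows_gt_ag_rule(gene: str, intron: Span) -> bool:
    """Check that an intron starts with GT and ends with AG."""
    start, end = intron
    sequence = gene[start:end]
    return sequence.startswith("GT") and sequence.endswith("AG")

def splice(gene: str, exons: List[Span]) -> str:
    """Discard the introns and join the exons in order."""
    return "".join(gene[start:end] for start, end in exons)

# Hypothetical gene: exon 1 + intron + exon 2.
gene = "ATGGCC" + "GTAAGTCTAACAG" + "TGGTAA"
exons = [(0, 6), (19, 25)]

for intron in introns_between(exons):
    print(intron, follows_gt_ag_rule(gene, intron))   # (6, 19) True
print(splice(gene, exons))                            # ATGGCCTGGTAA
```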
Like introns, the promoter of a gene does not code for protein. However, the promoter is very important for regulating whether a gene is expressed. When a gene is expressed (turned “on”), the gene is transcribed into an mRNA molecule. When a gene is not expressed (turned “off”), the gene is not transcribed and no mRNA molecule is made. In most cases, the promoter region is located upstream of the exons and introns, within about 200 base pairs of the transcription start site. Promoter regions contain different sequences; some of these sequences contribute to the core promoter, and some contribute to the proximal promoter. When proteins called transcription factors bind to specific sequences within the promoter, transcription is initiated and the gene is expressed. Generally, transcription factors that bind to core promoter sequences are sufficient for initiating transcription, whereas transcription factors that bind to proximal promoter sequences modulate the level of transcription. Because different types of cells have different transcription factors, each type of cell expresses a different set of genes.
Two other components of genes are the 5’ UTR and the 3’ UTR. The 5’ UTR is located at the upstream end of the first exon, and the 3’ UTR is located at the downstream end of the last exon. Unlike introns, these sequences remain in the mature mRNA, but like introns they do not contribute to the final protein product. However, they are thought to play a key role in regulating translation of mRNA into protein.
In addition to promoters, 5’ UTRs, and 3’ UTRs, there are other DNA sequences that affect gene expression even though they are not contained within the gene itself. These sequences are called enhancers and silencers. Like promoters, enhancers and silencers regulate transcription; however, these sequences are located at variable distances from the gene(s) they regulate. It is thought that once proteins bind to the enhancer or silencer, the DNA between the promoter and the enhancer or silencer loops out, which enables the proteins bound to the enhancer or silencer to interact with the proteins bound to the promoter. Whereas enhancers increase transcription, silencers decrease transcription.
Genotype to Phenotype
DNA is the encyclopedia that defines each individual human, animal, plant, or other organism. DNA contains genes that direct how an organism develops and functions. DNA (and therefore each gene) is made up of nucleotides, each carrying one base; these are the letters of the encyclopedia. Just as letters strung together form words with different meanings, groups of three nucleotides are read as amino acids, the building blocks of the proteins that carry out specific tasks in the cells of an organism. An organism’s specific sequence of nucleotides is its genotype. Genotypes can differ through nucleotide changes that have no effect on the organism or through changes that have a great effect. The way in which an organism develops, functions, looks, acts, and grows is its phenotype.
For many genetic diseases, a specific change in a gene or a section of DNA produces a certain genotype, which often has an expected phenotype. For example, an extra (third) copy of chromosome 21 is the genotype “trisomy 21”. This change in the amount of DNA has a corresponding phenotype that is specific and recognizable to many people: Down syndrome.
There are many genetic conditions, however, that do not have a clear correlation between the change in the genotype and the change in the phenotype. The expression of a change or mutation in a genotype can be modified by the effects of age, by other genes, and by the environment. Some major types of variation in the expression (phenotype) of a change or mutation in a genotype are reduced penetrance, variable expressivity, and pleiotropy. Penetrance is the probability that a gene will be expressed at all. Many genes are said to be ‘fully penetrant,’ which means that a person who inherits that gene is expected to express it and have the associated phenotype. When some persons inherit a genotype that is known to have an associated phenotype but completely fail to express it, the gene is said to have reduced penetrance. In a family where a gene like this is inherited, there can be people in each generation with the gene, but only some of them will express the associated phenotype. Pleiotropy describes a gene that causes many different phenotypic effects in the people who inherit it. Persons who inherit such a gene will all express it, just in a variety of ways; this can sometimes create a false impression of multiple conditions. An example of a genetic condition with pleiotropy is Marfan syndrome. Persons affected by Marfan syndrome can have a combination of some or many of the expressions of the changed gene, including effects on the eyes, height, the heart, the vasculature, and other connective tissue in the body. Variable expressivity means that persons with the same gene or changed gene in their genotypes can have phenotypes that vary. An example of variable expressivity is cystic fibrosis, a genetic condition that requires two inherited mutations in the CFTR gene to produce the associated disease phenotype.
Cystic fibrosis is a common genetic disease in Caucasian/Northern European populations. More than 1000 mutations in the CFTR gene are known, but the different mutations (genotypes) do not clearly correspond with how severely a person is affected by cystic fibrosis (phenotype). Persons with cystic fibrosis, whether or not they are related, can inherit exactly the same mutations yet present clinically with very different symptoms and degrees of severity.
The correlation of genotype to phenotype can be straightforward or very complicated; it depends on the gene or genes inherited, any changes in those genes, and the effects of other genes and the environment. For many genetic conditions the genotype is expected to be expressed in a particular way, but a given change or mutation in the gene may or may not predict the phenotype that results.
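Penetrance, described above as a probability, can be estimated as the fraction of people carrying a genotype who actually show the associated phenotype. The sketch below assumes a made-up family in which each carrier is recorded simply as affected or unaffected; the function name and the data are hypothetical and do not describe any real condition.

```python
from typing import List

def penetrance(carriers_affected: List[bool]) -> float:
    """Fraction of carriers who show the phenotype (one bool per carrier)."""
    if not carriers_affected:
        raise ValueError("at least one carrier is needed")
    return sum(carriers_affected) / len(carriers_affected)

# Hypothetical family: ten carriers, seven of whom show the phenotype.
family = [True, True, False, True, False, True, True, False, True, True]
print(penetrance(family))          # 0.7 -> reduced penetrance
print(penetrance([True] * 4))      # 1.0 -> fully penetrant in this sample
```

Note that this number says only whether the phenotype appears at all; it does not capture variable expressivity, which concerns how the phenotype differs among those who do express it.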
Introns, Exons, and Differential Expression
Regulation of Genes and Templates
Products of Genes
Much of the genetic information an organism inherits is stored in DNA. Genetic information generally flows in one direction: from DNA to RNA to protein. Proteins are necessary throughout the body for many reasons, including the structure and function of all cells. DNA holds the directions for making all of the different proteins the body can make. When a protein is needed, the DNA is used as a template in two processes: the first is called transcription and the second is called translation.
DNA is a double-stranded molecule composed of four different types of bases: adenine, thymine, guanine, and cytosine. The bases, each carried on a nucleotide, are commonly abbreviated by their first letter: adenine is “A,” cytosine is “C,” and so on. The bases pair in a specific pattern between the two strands of DNA: A pairs with T, and C pairs with G.
To read the code of the bases in DNA and generate the needed proteins, the cell uses a two-step process. The first step is transcription. Transcription occurs in the nucleus of the cell, as the nucleus is where DNA is located. Enzymes called RNA polymerases unwind the double helix and make a single-stranded RNA whose bases match the code in the DNA. During transcription, the RNA polymerase scans along the double-stranded DNA looking for the bases that make up a promoter region. This is the signal for the RNA polymerase to begin generating a single-stranded RNA copy of the DNA. There is also a terminating region that stops the polymerase from transcribing any further along the DNA strand. Near the promoter region there may also be regulatory elements; these are regions that can increase, decrease, or stop transcription by acting on the promoter. The product of transcription, called an RNA transcript, can play different functional roles depending on the code from which it was generated. Some RNA transcripts go on to direct the making of proteins and must next go through translation. Other RNA transcripts help facilitate the translation process. The RNA transcript that will be translated into a protein is called ‘messenger RNA’ and is often abbreviated mRNA. The mRNA transcript can undergo changes in length and have sections removed before going through translation. These modifications allow a more precise product to be created. Translation involves reading the code of bases of the mRNA strand and takes place outside the nucleus of the cell.
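Transcription can be sketched as base-by-base copying. The example assumes one detail not spelled out above: RNA contains uracil (U) in place of the thymine (T) found in DNA. It also assumes the template strand is written 3’ to 5’, so that the RNA comes out 5’ to 3’; the function names and sequences are hypothetical, and promoters, terminators, and splicing are ignored.

```python
# Each DNA base on the template pairs with one RNA base; A pairs with U in RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5: str) -> str:
    """Build the RNA that pairs with a template strand written 3' -> 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

def coding_strand_to_rna(coding_5_to_3: str) -> str:
    """Shortcut: the RNA matches the non-template strand, with U replacing T."""
    return coding_5_to_3.upper().replace("T", "U")

template = "TACCGGACCATT"                       # hypothetical, 3' -> 5'
print(transcribe(template))                     # AUGGCCUGGUAA
print(coding_strand_to_rna("ATGGCCTGGTAA"))     # AUGGCCUGGUAA
```

Both routes give the same RNA because the two DNA strands are complementary to each other.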
The bases of the mRNA strand, A, U, C, and G (RNA contains uracil, U, in place of the thymine, T, found in DNA), are read in groups of three in the 5’ to 3’ direction; these groups of three bases are called ‘codons.’ Translation uses the sequence of codons to build a chain of amino acids, called a polypeptide, with the help of a ribosome and translation-facilitating RNA units. Several different codons can code for the same amino acid. As each codon is read, the corresponding amino acid is added, creating a polypeptide chain that matches the mRNA codons. As the complex reads along the mRNA strand, it comes to a codon that acts as the start site for translation; this is the point at which the polypeptide chain is initiated. Later along the mRNA strand, a group of three bases acts as a stop codon, ending translation and ending the polypeptide chain. These codons are important because they determine which section of the mRNA transcript is translated into protein. A polypeptide chain is the basic structure of a protein. The polypeptide chain translated from the mRNA strand can undergo further modifications inside and outside the cell that specialize it for a specific function in the body. In short, translation is how the cell uses the language of the DNA, transcribed into mRNA, to synthesize a corresponding polypeptide chain, which can be further modified into a more complex protein.
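The reading of codons can be sketched as a lookup table plus a loop. The example below assumes the standard genetic code, written with one-letter amino acid abbreviations ("M" for methionine, "A" for alanine, "W" for tryptophan, "*" for stop), and AUG as the start codon; the function name and the mRNA sequence are hypothetical. It ignores the ribosome and the helper RNAs entirely.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first stop codon (5' -> 3')."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is made
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon ends the chain
            break
        chain.append(amino_acid)
    return "".join(chain)

print(translate("GCAUGGCCUGGUAAUU"))   # MAW

# Several codons can code for the same amino acid, e.g. alanine ("A"):
print([codon for codon, aa in CODON_TABLE.items() if aa == "A"])
```

In the example, the bases before AUG and after the stop codon UAA are not translated, which is how start and stop codons delimit the protein-coding part of the transcript.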
The processes involved in using the DNA code to generate proteins required in cells throughout the body are complex and require many regulatory and accessory units to facilitate them. Problems at any point in these processes or with any parts of the DNA, the bases, the RNA polymerase, the mRNA, the ribosome complex, or creating the polypeptide chain could create a variety of complications within and outside of the cells.
See Also
External Links
Wikipedia entry on the gene.
