Dr. PI’s Mtbook, a microbial genome in a virtual book
Short description
Dr. PI’s Mtbook is a web project that displays complex genomic information in a novel way. It addresses the problem of how to present the wealth of information that is deduced from the genomic DNA sequence of an organism. In Dr. PI’s Mtbook, the genomic sequence of Mycobacterium tuberculosis (Mtb, the tubercle bacillus) and the information deduced from it are packed into the format of a virtual book. Four levels of information are displayed and tightly interconnected: DNA, map, protein and function.
- The genomic DNA sequence is displayed with its more than 4 million nucleotides on over 1000 virtual pages; segments that contain the information for genes are identified by specific coloring.
- Map pictures are dynamically created to show the context of multiple genes, their position, size and function.
- Protein pages give a description of each encoded gene and links to external databases for more detailed information.
- Function lists show in which functional categories genes are grouped.
The new, user-friendly look makes the concept of genetic information understandable even for lay people, such as high-school (Gymnasium) students. Researchers in developing countries will have easy offline access to the genome; researchers and students in developed countries can profit from the easy online access. The “nonscientific” display as a book and the color coding of DNA segments encoding genes, together with the intuitive clickable maps, give full control when navigating the genome. No more complex databases with cryptic searches.
Dr. PI’s Mtbook is a pilot project that could be adapted to display information from any other organism in a similar way, making genomic information more easily accessible to a wider audience. Appealing HTML design combined with creative JavaScript yields both a simple display and flexible linking of genomic information.
Dr. PI’s Mtbook, a microbial genome in a virtual book
Detailed description
What is Dr. PI’s Mtbook about?
Dr. PI’s Mtbook is a pilot project that puts the complete genetic information of Mycobacterium tuberculosis, the tubercle bacillus, into an easily accessible and intuitively understandable form. The potential of Dr. PI’s Mtbook lies in its easy adaptability to any genomic information: it replaces the search through the jungle of genomic databases with a convenient walk through a library of virtual genome books. The novel format of a virtual book combines the easy access to specific pages of a real book with the immediate lookup potential of the internet. Dr. PI’s Mtbook is made up of four parts: DNA, map, genes and functions. The following sections discuss how each of these parts differs from the conventional way of displaying genomic information and what the benefits of the specific display are.
The site: Dr. PI’s Mtbook can be accessed at http://www.drpi.ch/mtbook. Alternatively, the entire site with all its functions can be put onto a CD-ROM, allowing for easy offline access. Upon entering the site, a book cover is displayed, suggesting that this book contains the complete genomic information. Enter the site to see the DNA level. Clicking on “Dr. PI’s Mtbook” in the top line opens a navigation bookmark with links to the main sections of the site in a stand-alone window. This bookmark is always available while surfing the microbial genome.
DNA: The genetic information of the bacterium is contained on a circular chromosome of 4,411,531 base pairs of double-stranded DNA. In Dr. PI’s Mtbook this information is displayed on 1362 book pages, each with 54 lines of 60 characters, that is, 3240 nucleotides per page. Dividing the genome into clearly identified and addressable segments makes navigation much easier than with conventional coordinates such as nucleotide numbers or gene names. Each page is clearly numbered at the top right corner and can be printed in its entirety. Each file contains 10 pages, 9 of which are hidden at any time. The hidden pages can be reached quickly, without reloading the file, by clicking the left or right navigation arrow in the top line. Upon reaching the last page of a file, the next file is loaded automatically. Jumping page by page through the genome is therefore intuitive and simple.
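The page arithmetic described here can be checked with a short sketch. It assumes that pages are filled consecutively, starting with nucleotide 1 at line 1, column 1 of page 1, and that files hold pages 1 to 10, 11 to 20 and so on. The function names are illustrative and are not taken from the site's own scripts, whose exact page boundaries may differ.

```python
import math

GENOME_LENGTH = 4_411_531   # base pairs, as stated in the text
LINES_PER_PAGE = 54
CHARS_PER_LINE = 60
PAGES_PER_FILE = 10         # one HTML file holds 10 book pages
NT_PER_PAGE = LINES_PER_PAGE * CHARS_PER_LINE  # 3240 nucleotides per page


def total_pages(genome_length=GENOME_LENGTH):
    """Number of book pages needed for the whole genome."""
    return math.ceil(genome_length / NT_PER_PAGE)


def locate(position):
    """Map a 1-based nucleotide position to (file, page, line, column).

    All four returned values are 1-based. Assumes consecutive filling
    from nucleotide 1 (an assumption, not confirmed by the source).
    """
    if not 1 <= position <= GENOME_LENGTH:
        raise ValueError(f"position {position} is outside the genome")
    index = position - 1
    page = index // NT_PER_PAGE + 1
    within_page = index % NT_PER_PAGE
    line = within_page // CHARS_PER_LINE + 1
    column = within_page % CHARS_PER_LINE + 1
    file_number = (page - 1) // PAGES_PER_FILE + 1
    return file_number, page, line, column
```

Under these assumptions `total_pages()` gives 1362, matching the page count stated in the text, and `locate(3241)` lands on the first character of page 2.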
- The DNA segments that code for a protein are labeled with a specific color. These “open reading frames” (ORFs) always start with a start codon (e.g. TTG or ATG) and always end with a stop codon (TAG, TGA or TAA). The ORFs in Dr. PI’s Mtbook are colored blue. Start codons are underlined; stop codons are marked by strike-through. Noncoding segments are distinguished by italic type and a light grey color.
- The names of the encoded genes, following the Sanger nomenclature (e.g. Rv0001), are shown at the right page border. They link directly to the description of the specific gene (see below).
- Sometimes the end of a gene overlaps with the beginning of the next gene. Such segments are difficult to identify in the conventional display, where each gene is shown as a separate database entry. In Dr. PI’s Mtbook overlapping genes are easily spotted by the different coloring of the overlapping segment (e.g. Rv0003/Rv0004 on page 2).
- DNA segments that encode an RNA gene are labeled in green italics (e.g. ileT and alaT on page 3).
- Some genes within the genome are oriented in the opposite direction. In these cases, Dr. PI’s Mtbook displays the reverse-complementary sequence in bright blue, with ORFs running from lower right to upper left (e.g. Rv0008c on page 3).
The combination of these display features makes it possible to view coding and noncoding sections in the context of the genomic sequence. When a page is printed from the browser in greyscale, all the information conveyed by the color coding remains distinguishable, making the printout a handy and valuable worksheet for a genome segment. In addition, the book layout provides a pleasant, user-friendly interface that people are used to handling.
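The ORF rules and the handling of genes on the opposite strand can be illustrated with a minimal sketch. It is not the site's code. The start-codon set includes GTG, a start codon commonly used in bacteria, in addition to the two examples named in the text; that addition and the function names are assumptions made for illustration.

```python
# Start codons: ATG and TTG are named in the text; GTG is added here as an
# assumption because it is another start codon commonly used in bacteria.
START_CODONS = {"ATG", "TTG", "GTG"}
STOP_CODONS = {"TAG", "TGA", "TAA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse-complementary strand of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_orf(seq):
    """True if seq starts with a start codon, ends with a stop codon,
    has a length divisible by 3 and contains no stop codon in between."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

A gene on the opposite strand, such as those carrying a "c" suffix in their Rv name, only passes `is_orf` after `reverse_complement` has been applied to the displayed forward-strand segment.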
Map: The DNA pages contain “time” markers on every other page (e.g. [00:01] at the bottom of page 2). These markers provide a “time mapping” of the circular chromosome by dividing it into 12 segments (corresponding to hours) of 60 minutes each. [04:01] thus denotes the genes located at “four o’clock and one minute” on the circular chromosome, namely the ribosomal RNA genes. Clicking on such a marker opens a new window with a map of the corresponding region, with the following features:
- The time tag at the top of the map shows the position of the map in hours and minutes, allowing intuitive location of the DNA segment.
- Three pages of Dr. PI’s Mtbook are shown, with a link in the top line that brings you back to the specified DNA page.
- The bottom line shows the nucleotide number in kb (thousand nucleotides), relating the map position to conventional numbering schemes.
- The genes encoded in the map area are displayed as colored boxes whose length and position indicate the length and position of the coding sequence. The color of each gene is chosen according to its function (see below).
- For each gene, the coding direction is indicated by an arrow. Genes are labeled either with their Rv name (if the function is not known) or with their gene name. These names link to the protein page that describes the details of that gene.
- Arrows at the top left and top right allow easy browsing through the genome map, page by page.
- A multifunctional search box at the bottom left moves the map to any desired segment. You can enter an Mtbook page number between 1 and 1362, a map position in the format hh:mm, an Rv gene number (e.g. Rv1234), an RNA gene number (e.g. Rna042), or a gene name (e.g. rpoB).
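How a single search box might tell the five kinds of input apart can be sketched as follows. This is an illustration only. The function name, the returned tuples, the accepted hour range of 00 to 11, the optional "c" suffix on Rv numbers and the three-digit Rna format are assumptions derived from the examples in the text, not from the site's script.

```python
import re

TOTAL_PAGES = 1362  # page count stated in the text


def classify_query(text):
    """Classify a search-box entry and return a (kind, value) tuple."""
    query = text.strip()

    # A plain number is read as an Mtbook page number.
    if re.fullmatch(r"\d+", query):
        page = int(query)
        if 1 <= page <= TOTAL_PAGES:
            return ("page", page)
        raise ValueError(f"page {page} is not between 1 and {TOTAL_PAGES}")

    # hh:mm is read as a time-map position (assumed range 00:00 to 11:59).
    match = re.fullmatch(r"(\d{1,2}):(\d{2})", query)
    if match:
        hours, minutes = int(match.group(1)), int(match.group(2))
        if hours < 12 and minutes < 60:
            return ("time", (hours, minutes))
        raise ValueError(f"{query} is not a valid map position")

    # Rv gene numbers, optionally with a 'c' suffix as in Rv0008c.
    match = re.fullmatch(r"[Rr][Vv](\d{4})([Cc]?)", query)
    if match:
        return ("rv", "Rv" + match.group(1) + match.group(2).lower())

    # RNA gene numbers such as Rna042.
    match = re.fullmatch(r"[Rr][Nn][Aa](\d{3})", query)
    if match:
        return ("rna", "Rna" + match.group(1))

    # Anything else is treated as a gene name, e.g. rpoB.
    return ("gene", query)
```

Resolving an Rv number or a gene name to a map position would additionally need the gene list that the map script contains.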
The map is generated on the fly by a script that contains a list of all genes with their positions and functions. Navigating with the map is intuitive and easy and leads you immediately to your gene of interest, displayed in the context of the genome. From there, you can access the DNA pages or the description of the gene products.
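The time mapping used by the map can be sketched as a simple proportional conversion. The sketch assumes that the 720 clock minutes are spread evenly over the 4,411,531 nucleotides and that 00:00 starts at nucleotide 1; the source does not state the exact rule, and the function names are illustrative.

```python
GENOME_LENGTH = 4_411_531        # base pairs, as stated in the text
MINUTES_PER_CIRCLE = 12 * 60     # 12 "hours" of 60 "minutes" each


def position_to_time(position):
    """Convert a 1-based nucleotide position to an hh:mm map label."""
    if not 1 <= position <= GENOME_LENGTH:
        raise ValueError(f"position {position} is outside the genome")
    minute = (position - 1) * MINUTES_PER_CIRCLE // GENOME_LENGTH
    hours, minutes = divmod(minute, 60)
    return f"{hours:02d}:{minutes:02d}"


def time_to_position(label):
    """Return the first nucleotide position that carries the given hh:mm label."""
    hours, minutes = (int(part) for part in label.split(":"))
    minute = hours * 60 + minutes
    if not (0 <= hours < 12 and 0 <= minutes < 60):
        raise ValueError(f"{label} is not a valid map position")
    # Ceiling division with integers, then shift back to 1-based positions.
    return -(-minute * GENOME_LENGTH // MINUTES_PER_CIRCLE) + 1
```

With this rule each minute spans about 6127 nucleotides, a little less than two pages, and 00:01 begins at nucleotide 6129, which falls near the bottom of page 2 when pages hold 3240 nucleotides each.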
Genes: The Rv numbers on the DNA pages are hyperlinks to the corresponding gene descriptions. Clicking on Rv0001 on page 1 displays the information for one of the four thousand encoded genes, namely:
- the size of the open reading frame and its time-map position, with a link to the map
- a backlink to the DNA page of Dr. PI’s Mtbook
- the name of the gene and its abbreviation
- the functional classification of the gene, with a link to the function window (see below)
- for some genes, a specific comment about properties and functions
- the protein information: number of amino acids and the complete amino acid sequence in a text window
- links to external websites (mainly the PEDANT site) with detailed information on this specific M. tuberculosis gene, such as the PEDANT report, PROSITE patterns, intragenome comparison and BLASTP homologues
- lookup links to PubMed that search the MEDLINE database for the gene abbreviation (with or without M. tuberculosis) and for the gene name
- a lookup link to the Sanger genome database entry (Rv number) of this gene
- a lookup link to the TIGR genome database entry (MT number) of this gene
Only the items that are meaningful for the displayed gene are shown. Arrows in the top line allow walking from one gene to the next and back.
Function: Clicking on the function link opens a window that displays the category, the class and the group of the function (e.g. for the gene dnaA (Rv0001) the category is II.A.5). In addition, all genes in the same group are listed with their gene name (linking to the gene page) and their Rv name (linking to the DNA page). Functions are color-coded along a rainbow. Clicking on the rainbow displays the categories with a list of their genes.
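The grouping of genes by functional code can be sketched as below. The sketch assumes that a code such as II.A.5 consists of category, class and group separated by dots, and it tolerates shorter codes by padding with None; both points are assumptions. Apart from dnaA (Rv0001), the entries in the usage example are made-up placeholders, not real genes.

```python
from collections import defaultdict


def parse_function_code(code):
    """Split a code like 'II.A.5' into (category, class, group).

    Missing levels are returned as None (an assumption for shorter codes).
    """
    parts = tuple(part.strip() for part in code.split(".") if part.strip())
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected function code: {code!r}")
    return parts + (None,) * (3 - len(parts))


def group_genes(genes):
    """Group (rv_name, gene_name, function_code) records by function.

    Each group lists (display_name, rv_name) pairs; genes without a
    gene name are displayed under their Rv name.
    """
    groups = defaultdict(list)
    for rv_name, gene_name, code in genes:
        groups[parse_function_code(code)].append((gene_name or rv_name, rv_name))
    return dict(groups)


# Usage with one entry from the text and two hypothetical placeholders.
example = [
    ("Rv0001", "dnaA", "II.A.5"),
    ("Rv9001", "", "II.A.5"),      # placeholder, not a real gene
    ("Rv9002", "xyzA", "VI"),      # placeholder, not a real gene
]
grouped = group_genes(example)
```

Keying the groups by the full (category, class, group) tuple keeps the function list and the per-gene function window consistent with each other.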
Navigation: The main window shows either the DNA page or the gene page; the help window shows either the map or the function list. Thus the different levels of the genome are connected in Dr. PI’s Mtbook in multiple ways.
What problems are solved by Dr. PI’s Mtbook?
Dr. PI’s Mtbook makes it possible to view every segment of a genomic sequence in full detail. Start codons, stop codons, coding sections, intergenic regions, forward and reverse open reading frames and overlapping ORFs are easily visible and understandable without resorting to coordinates. The displayed DNA information can simply be printed. Any segment of the genome can be accessed by page number, and the map provides an overview that shows the relative position of genes in their genomic context on a broader scale. Jumping between information levels (DNA, gene, map, function) is possible for every gene.
Which elements are new in Dr. PI’s Mtbook?
- Dr. PI’s Mtbook is the first attempt to show a DNA sequence in the format of a book. This “old-fashioned” format has the advantage of being simple and familiar. It cuts the huge package of information into slices that can easily be viewed and handled. In addition, the page numbering gives a sense of “where you are” and of which segments are upstream or downstream.
- Displaying the open reading frames within the genomic sequence makes the context easier to understand and the identification of intergenic regions simple.
- Different coloring of forward, reverse and overlapping ORFs helps with orientation on the double-stranded genome.
- Direct linking of DNA, gene, function and map gives complete access to multiple aspects of each individual gene.
- The complete site can be put onto a CD and operates as a stand-alone unit.
What is the usefulness of Dr. PI’s Mtbook?
Dr. PI’s Mtbook makes complex information accessible in the very simple format of a book, which is familiar to everybody. It lets you navigate forward and backward in the genome at the nucleotide level of the genomic DNA and at the map level, look at the genes one by one, or browse through the list of functions.
What is innovative about Dr. PI’s Mtbook?
Genetic information is usually stored in complex databases with lots of numbers, duplications and redundancies. When searching for a specific gene in those databases, you easily get lost or are guided to links that lead nowhere. Dr. PI’s Mtbook brings together different levels of information from a complete genome and combines them in a novel display. Because the navigation, the maps and the function lists are generated by scripts, the site can be adapted to other genomes.
Whom does Dr. PI’s Mtbook address?
Dr. PI’s Mtbook addresses the following audiences:
- The general public can see what kind of information a genome contains and get an idea of the size (pages of the book) and complexity of a genome.
- Researchers working with this organism have the genome available in a simple format.
- For the life-science research community, Dr. PI’s Mtbook could serve as an example of how the genetic information of any organism (including humans) can be made accessible on the internet.
How does Dr. PI’s Mtbook make use of internet technologies?
Dr. PI’s Mtbook has a size of less than 20 MB. It contains 328 HTML files (514 files in total) and 64,764 links, of which 38,347 are external or JavaScript links. It combines a static website layout with multiple JavaScript-based functions that automatically generate links from simple gene lists. Functional categories, map scales and gene boxes are created dynamically. This allows for easy adaptation and modification of the displayed information.
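How gene boxes could be derived dynamically from a simple gene list is sketched below. The site itself uses JavaScript; this Python sketch only illustrates the arithmetic. It assumes a map window of three book pages of 3240 nucleotides each, as described for the map, and a hypothetical drawing width of 900 pixels. The `Gene` record, the function name and the example genes are made up for illustration.

```python
from dataclasses import dataclass

NT_PER_PAGE = 3240      # nucleotides per book page, as stated in the text
PAGES_PER_MAP = 3       # the map shows three book pages


@dataclass
class Gene:
    label: str    # gene name or Rv name
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" for forward, "-" for reverse


def gene_boxes(genes, first_page, width_px=900):
    """Return box geometry for all genes that overlap the map window."""
    window_nt = PAGES_PER_MAP * NT_PER_PAGE
    window_start = (first_page - 1) * NT_PER_PAGE + 1
    window_end = window_start + window_nt - 1
    boxes = []
    for gene in genes:
        if gene.end < window_start or gene.start > window_end:
            continue  # gene lies completely outside the window
        # Clip genes that extend beyond the window edges.
        left_nt = max(gene.start, window_start)
        right_nt = min(gene.end, window_end)
        boxes.append({
            "label": gene.label,
            "left": (left_nt - window_start) * width_px // window_nt,
            "width": max(1, (right_nt - left_nt + 1) * width_px // window_nt),
            "arrow": ">" if gene.strand == "+" else "<",
        })
    return boxes


# Hypothetical genes, for illustration only.
demo = gene_boxes(
    [Gene("geneA", 1, 972, "+"), Gene("geneB", 9000, 12000, "-"),
     Gene("geneC", 20000, 21000, "+")],
    first_page=1,
)
```

Keeping the gene list as plain records and computing the geometry at display time is what makes such a map easy to adapt to another genome: only the list changes.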
Acknowledgement: The support with JavaScript programming by Dr. Lutz Slomianka, Unizh, during the development of this project is greatly appreciated.
Short Portrait of Dr. PI.
Dr. Paul Imboden, Ph.D.
Aiming for Logic, Clarity, Precision.
- Born 1957
- 1983 Biochemistry studies at the ETH Zürich
- 1987 Ph.D. in Chemistry at the University of Berne
- 1994 Postdoctoral studies in Molecular Biology and Microbiology at the University of Berne
- 1996 Stanford University, Schoolnik Lab
Publications on PCR Methods and Tuberculosis.
Bioconsulting: Support and Management of Science Projects http://www.bioconsulting.ch
E-learning Expert, Department of Biochemistry, University of Zürich
Webdesign for Science and Education. http://www.drpi.ch
For a more detailed CV in German: http://www.drpi.ch/admin/CV2004PI.pdf
© PI 2004