Genome Analysis

Take the full, known DNA sequence that makes up the human genetic blueprint, and put into it, at their exact positions, all the specific changes that are unique to you. The resulting DNA sequence is your genome. The letter (or pair of letters, since you carry two copies of most chromosomes) that you have at a given position is your GENOTYPE at that position, so your genome is, in effect, the complete set of your genotypes. Apart from identical twins, no two people have exactly the same set of genotypes, just as no two people look exactly alike. What others see when they look at you is the combined result of your genotypes (your genome) and how they work with the environment you live in. As you may have guessed, that sum total is called your Phenome, and each observable characteristic within it is a PHENOTYPE. In principle, each genotype has a corresponding phenotype. For example, the mutation that causes Sickle Cell anemia is a change of A to T at a specific position in the HBB gene. At that position, a person's genotype is A/A (homozygous wild type), A/T (heterozygous) or T/T (homozygous mutant). The phenotype is sickle-shaped red blood cells for the T/T genotype and normally shaped (round, disc-like) red blood cells for the other two genotypes. So the phenome is, roughly, the person you see, and the genome (with all its genotypes) is what you cannot see. But as the Sickle Cell example shows, the phenotype linked to a specific genotype is not obvious until you know what to look for. That is where genome analysis helps. Once you know your genotypes for the many conditions whose genetic causes are well established, you know which signs to look for. Depending on the environment, those signs may or may not be obvious.
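To make the genotype notation concrete, here is a minimal Python sketch for a single position, assuming the simplified Sickle Cell example above (normal allele A, mutant allele T). The function names and the two-way mapping from genotype to cell shape are illustrative only; real phenotypes depend on much more than one position.

```python
# Simplified Sickle Cell example: wild-type allele "A", mutant allele "T".
WILD_TYPE = "A"
MUTANT = "T"

def classify_genotype(allele1: str, allele2: str) -> str:
    """Return the zygosity label for the pair of alleles at one position."""
    alleles = {allele1, allele2}
    if not alleles <= {WILD_TYPE, MUTANT}:
        raise ValueError(f"unexpected allele(s): {allele1}/{allele2}")
    if alleles == {WILD_TYPE}:
        return "homozygous wild type"
    if alleles == {MUTANT}:
        return "homozygous mutant"
    return "heterozygous"

def expected_cell_shape(allele1: str, allele2: str) -> str:
    """Simplified phenotype from the text: only T/T gives sickled red cells."""
    if classify_genotype(allele1, allele2) == "homozygous mutant":
        return "sickled"
    return "normal"

for pair in [("A", "A"), ("A", "T"), ("T", "T")]:
    print("/".join(pair), classify_genotype(*pair), expected_cell_shape(*pair))
# A/A homozygous wild type normal
# A/T heterozygous normal
# T/T homozygous mutant sickled
```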

Since any two people share about 99.9% of their genomic sequence, the real goal is to find the differences at the positions that really matter. This effort has a technical name: genotyping. In other words, genotyping means determining your genotypes, especially at the positions where a change might cause a health problem for you. Genotyping has become so efficient that millions of specific locations spread throughout the genome can be genotyped in a single test. There are two main ways to do this: genotyping by DNA Sequencing and genotyping by Genome Arrays/Chips.
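A quick calculation shows why genotyping focuses on the differences. The sketch below assumes a genome of about 3.2 billion letters and the commonly cited figure that two people differ at roughly 1 position in 1,000 (about 99.9% identical); both numbers are approximations, and the helper name is hypothetical.

```python
GENOME_SIZE = 3_200_000_000      # approximate letters in one set of 23 chromosomes
DIFFERENCES_PER_1000_BASES = 1   # rough figure: ~99.9% of positions are identical

def expected_differences(genome_size: int, per_1000: int) -> int:
    """Approximate number of positions at which two people's sequences differ."""
    return genome_size * per_1000 // 1000

print(f"{expected_differences(GENOME_SIZE, DIFFERENCES_PER_1000_BASES):,}")  # 3,200,000
```

A few million differing positions is still a lot, but it is a far smaller target than 3.2 billion letters, and only some of those positions matter for health.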

Genotyping by Genome Arrays

This method works by attaching short, known DNA sequences, matching both the normal and the mutated version at specific locations, to what is known as a DNA chip. A lab technician then applies your DNA to the chip and checks, location by location, whether it matches the known normal sequence or the known mutated sequence. For example, suppose the relevant locations (spread across the genome) are known for 1000 inherited diseases. Then, in a single run, a technician can query all 1000 locations with your DNA and determine your genotype at each one. If 999 queries come back normal and one comes back mutated, the location (or sequence) of that one result tells you which disease it is linked to. That is how genotyping by arrays works. It is a simple way to look for all the known genetic differences that are associated (or linked) with a specific disease or trait. In genetics, a trait is a specific characteristic of an individual, such as height (tall vs short) or eye color (blue vs brown). Arrays can genotype millions of locations in one run, which makes them a simple, quick and accurate way to determine genotypes. Companies such as 23andMe and Ancestry.com use this technique, among others, for genome analysis. GenoTypica also uses an established array provided by Illumina to perform genome analysis for its customers (see Illumina arrays).
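The logic of reading out an array can be sketched in a few lines of Python. This is a toy model, assuming each queried location simply comes back as "normal" or "mutated" (real arrays report genotypes such as A/A, A/G or G/G); the location IDs and disease names are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical panel: 1000 locations, each linked to one inherited disease.
PANEL: Dict[str, str] = {f"loc_{i:04d}": f"disease_{i:04d}" for i in range(1, 1001)}

def flag_mutations(calls: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (location, linked disease) for every location called 'mutated'."""
    return [(loc, PANEL[loc]) for loc, call in sorted(calls.items()) if call == "mutated"]

# One chip run: 999 locations come back normal and one comes back mutated.
calls = {loc: "normal" for loc in PANEL}
calls["loc_0427"] = "mutated"

print(len(calls))             # 1000
print(flag_mutations(calls))  # [('loc_0427', 'disease_0427')]
```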

Genotyping by DNA/Genome Sequencing

Let us first see what happens with this method when you send your sample in. Each set of your 23 chromosomes contains about 3.2 billion letters (A, T, G and C), and you carry two sets, so reading them all is neither quick nor easy. Someone in a lab isolates your genomic DNA (the total DNA from your sample), breaks it into tiny pieces and then works out the order of the letters in each piece. Each piece yields a read of about 150 letters on average. Because the pieces are broken at random, every letter has to be read many times over, typically 20 to 50 times, to make sure that most of the sequence (at least 90%) is captured.

To see why, take a small example. To read 30,000 bases using 150-letter reads, you need 30,000 / 150 = 200 reads, but only if every piece lines up exactly end to end with no overlap. Random breaking never works out that neatly: some regions are read several times and others are missed. If instead you randomly break the DNA into 4000 pieces (20 times as many), you are likely to read 100% of the 30,000 bases. Some bases will be read many times and some only once, but all of them will be read. This is called 20 times coverage, written as 20X coverage, a term you will see on websites that offer sequencing. Even so, 20X might give only 97% or 98% accuracy. At 50X or 100X coverage, accuracy might approach 99%, which is very good.

Now scale this up to a human genome of 3.2 billion bases. Reaching 20X coverage takes hundreds of millions of reads, and higher coverage, such as 50X or even 100X, gives a more accurate sequence.
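The coverage arithmetic above can be checked with a short Python sketch. It computes how many reads are needed for a given coverage, gives the idealized Lander-Waterman estimate of the fraction of bases never read (e raised to the power of minus the coverage), and simulates dropping 4000 random 150-letter reads onto 30,000 bases. The function names are hypothetical, and the model assumes reads land uniformly at random, which real sequencing does not quite achieve.

```python
import math
import random

def reads_needed(genome_size: int, read_length: int, coverage: int) -> int:
    """Reads required so that each base is read `coverage` times on average."""
    return -(-(genome_size * coverage) // read_length)  # ceiling division

def expected_uncovered_fraction(coverage: float) -> float:
    """Idealized Lander-Waterman estimate of the fraction of bases never read."""
    return math.exp(-coverage)

def simulate_coverage(genome_size: int, read_length: int, n_reads: int, seed: int = 1):
    """Place reads at random start positions; return (fraction covered, mean depth)."""
    rng = random.Random(seed)
    depth = [0] * genome_size
    for _ in range(n_reads):
        start = rng.randint(0, genome_size - read_length)  # read fits inside the sequence
        for i in range(start, start + read_length):
            depth[i] += 1
    covered = sum(1 for d in depth if d > 0) / genome_size
    mean_depth = sum(depth) / genome_size
    return covered, mean_depth

print(reads_needed(30_000, 150, 1))                 # 200
print(reads_needed(30_000, 150, 20))                # 4000
print(f"{reads_needed(3_200_000_000, 150, 20):,}")  # 426,666,667
print(f"{expected_uncovered_fraction(20):.1e}")     # 2.1e-09

covered, mean_depth = simulate_coverage(30_000, 150, 4_000)
print(f"covered: {covered:.2%}, mean depth: {mean_depth:.0f}X")
```

In the simulation, any bases that are missed sit near the two ends of the sequence, where fewer reads can overlap them. In a real genome, repetitive and hard-to-read regions play a similar role, which is why real results fall short of the idealized estimate.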
Again, as with arrays, a great deal is already known about the genes that affect your health, so sequencing also comes down to comparing your sequence with a known reference sequence. The important positions are then checked to see whether your genotypes agree with known mutations or with the known wild-type (normal) sequence. This is all done on a computer. The process is still time-consuming and expensive, and less accurate if you only get 20X or 30X coverage.
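Comparing reads against the known wild-type and mutant letters at one position can be sketched as a naive genotype caller. This is an illustration only, assuming we already know which reads overlap the position; real variant callers use statistical models and base-quality scores, and the 0.2/0.8 thresholds here are arbitrary choices.

```python
from collections import Counter
from typing import Sequence

def call_genotype(bases: Sequence[str], ref: str, alt: str,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Naive genotype call at one position from the letters seen in overlapping reads.

    The fraction of reads showing the alternative letter decides the call;
    the het_low/het_high thresholds are illustrative, not a standard.
    """
    counts = Counter(bases)
    depth = counts[ref] + counts[alt]  # letters other than ref/alt are ignored
    if depth == 0:
        return "no call"
    alt_fraction = counts[alt] / depth
    if alt_fraction < het_low:
        return f"{ref}/{ref}"
    if alt_fraction > het_high:
        return f"{alt}/{alt}"
    return f"{ref}/{alt}"

# 20 reads (20X) cover the Sickle Cell position: 11 show A and 9 show T.
print(call_genotype(["A"] * 11 + ["T"] * 9, ref="A", alt="T"))  # A/T
# Only 2 reads, both from the copy carrying A: the call looks homozygous.
print(call_genotype(["A", "A"], ref="A", alt="T"))              # A/A
```

The second call shows why low coverage hurts accuracy: with only two reads, both can come from the same copy of the chromosome, so a heterozygous A/T person can look like A/A.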

To give this some context, the first “complete” human genome was sequenced in the early 2000s by the Human Genome Project, an international effort funded largely by the US Government. That sequence was only about 90% complete, and the project as a whole cost about $2.7 billion (in 1991 dollars).

https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genome-project

In 2022, about two decades later, the first 100% complete, gapless sequence of a single human genome was published.

https://sites.google.com/ucsc.edu/t2tworkinggroup

Going from 90 to 100 percent was not easy. The link above, and the papers listed at the bottom of the page linked below, show how hard it was.

https://www.genome.gov/news/news-release/researchers-generate-the-first-complete-gapless-sequence-of-a-human-genome

Today, from all this work and millions of sequenced samples, we know that only about 2% of our entire genome codes for something whose function we know. These functional sequences code for the proteins and RNA molecules that make up our cells, our organs and our entire body, and that control how the body is organized and how it functions from birth to death. Changes in these coding sequences are estimated to account for most known disease-causing mutations (an estimated 85 to 90%). Estimates of the total number of genes vary (about 20,000 protein-coding genes, and considerably more if genes for RNA molecules are counted), and the set of genes that is switched on or off changes over your lifetime.

Sequencing the entire genome is called whole genome sequencing (WGS). Sequencing just the roughly 2% that codes for proteins is called Exome sequencing. Whether the exome or the entire genome is sequenced, today’s technology generates a lot of data. To genotype by sequencing, you compare the sequence generated from your sample with a reference sequence to find the differences. You can also compare your sequence with all the known genetic variations that cause diseases. It takes a lot of computational work to get a complete picture and produce a report that tells you what your final genotypes are. The problem is that even exome sequencing does not reliably capture 100% of the exome.

So, as good as it sounds, the real strength of sequencing is discovery. It is the scientific community (academic as well as corporate) that benefits most from whole genome or whole exome sequencing. Sequencing helps discover new genes and previously unknown variations, and it allows different ethnic groups to be studied (population studies). DNA sequencing is what led to the discovery of the known mutations behind genetic diseases, genetic predispositions and so on, which in turn lets us create diagnostic tests and new therapies. It also helps develop accurate tests that can be run as single-gene tests or, if needed, as millions of tests together on arrays and chips! So now you know how sequencing data is used, including to develop arrays and chips.
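The compare-and-annotate step can be sketched as follows, assuming the sample has already been lined up (aligned) against the reference with no insertions or deletions, and treating each sequence as a single copy rather than two. The short sequences and the catalog of known variants are made up for illustration.

```python
from typing import Dict, List, Tuple

Variant = Tuple[int, str, str]  # (1-based position, reference letter, sample letter)

def find_differences(reference: str, sample: str) -> List[Variant]:
    """List every position where an aligned sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must already be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]

# Hypothetical catalog of known disease-causing variants.
KNOWN_VARIANTS: Dict[Variant, str] = {
    (5, "A", "T"): "known disease-causing change (illustrative)",
}

def annotate(differences: List[Variant]) -> List[Tuple[Variant, str]]:
    """Pair each difference with its catalog description, if any."""
    return [(d, KNOWN_VARIANTS.get(d, "not in catalog")) for d in differences]

reference = "CCTGAGGAG"
sample = "CCTGTGGAA"
for variant, note in annotate(find_differences(reference, sample)):
    print(variant, note)
# (5, 'A', 'T') known disease-causing change (illustrative)
# (9, 'G', 'A') not in catalog
```

The first difference matches the catalog entry; the second is not in the catalog, which is the kind of previously unknown variation that sequencing can uncover.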

As a company, we believe in relevant and useful data only. This keeps costs reasonable and makes genomic technologies accessible to everyone. Whole genome sequencing produces a large data set for each person (about 150 GB), which takes considerable computing power to analyze and store, when you know that you only need about 2% of that information. So it is not really worth doing unless you are a researcher (scientist) trying to discover something new. Most of us just need to know whether our genetic blueprint carries mutations that might cause potential health (wellness) problems. See Genetic Health for more information.
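As a rough check on the size figure above, the sketch below estimates raw read data for a 30X whole genome, assuming one byte per letter plus one byte for its quality score and ignoring file overhead and compression. The result is of the same order as the roughly 150 GB quoted above; the helper name is hypothetical.

```python
GENOME_SIZE = 3_200_000_000  # approximate letters in one set of 23 chromosomes

def raw_read_bytes(genome_size: int, coverage: int, bytes_per_base: int = 2) -> int:
    """Order-of-magnitude size of raw reads.

    Assumes one byte per letter plus one byte for its quality score, and
    ignores read names, file formatting and compression.
    """
    return genome_size * coverage * bytes_per_base

wgs_gb = raw_read_bytes(GENOME_SIZE, 30) / 1e9
coding_letters = GENOME_SIZE * 2 // 100  # the ~2% of the genome discussed above

print(f"30X whole genome: about {wgs_gb:.0f} GB of raw reads")  # about 192 GB
print(f"2% of the genome: about {coding_letters:,} letters")    # about 64,000,000 letters
```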

In summary, genomics is the study of the genome of a particular organism: how it is organized and how it functions in relation to the environment that the organism is in.

More details on this topic, written in simple language, can be found here:

https://www.genome.gov/about-genomics

https://www.genome.gov/About-Genomics/Introduction-to-Genomics
