In recent years, public health emergencies caused by epidemics have led to the use of genome sequencing to identify and characterize viral pathogens. Rapid acquisition of high-quality viral genomic sequences is critical for understanding the origin, transmission and epidemiological spread of a viral pathogen. Ultimately, deciphering the molecular characteristics of viruses accelerates the development of diagnostic assays, vaccines and drugs, and is important for understanding the role of evolutionary variants in viral spread through the affected population.
SARS-CoV-2, the virus that causes COVID-19, contains a single molecule of RNA that carries the genetic information needed to produce new progeny in infected cells. To determine the viral genome “code,” viral RNA is isolated and enzymatically converted to DNA. In a process called targeted sequencing, specific overlapping regions of the viral sequence are amplified by PCR using sequence-specific primers, and the resulting DNA fragments are analyzed using next-generation DNA sequencing. The complete viral genome can then be “assembled” from the overlapping DNA fragments.
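To make the idea of assembling a genome from overlapping fragments concrete, here is a minimal Python sketch. It is a toy illustration, not the method used in the work described here: it assumes the fragments are error-free, arrive in genome order, and share an exact overlap with their neighbor. The function name `merge_overlapping`, the `min_overlap` parameter and the short example sequences are all made up for illustration. Production pipelines typically align millions of reads to a reference genome and call a consensus instead.

```python
def merge_overlapping(fragments, min_overlap=3):
    """Join ordered fragments using the longest exact suffix/prefix overlap.

    Assumption: fragments are listed in genome order and contain no errors.
    """
    genome = fragments[0]
    for frag in fragments[1:]:
        longest = min(len(genome), len(frag))
        # Try the longest possible overlap first, then shrink.
        for k in range(longest, min_overlap - 1, -1):
            if genome.endswith(frag[:k]):
                genome += frag[k:]  # append only the non-overlapping tail
                break
        else:
            raise ValueError("adjacent fragments do not overlap enough")
    return genome


# Toy fragments (illustrative only), each sharing 6 bases with the next.
fragments = ["ATTAAAGGTTTATACC", "TATACCTTCCCAGGTA", "CAGGTAACAAACC"]
assembled = merge_overlapping(fragments)
print(assembled)
```

The overlap between neighboring fragments is what lets them be stitched together without gaps, which is why the amplified regions are designed to overlap.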
The first genome sequence of SARS-CoV-2 was reported in the journal Nature in early February 2020, from a bronchoalveolar lavage fluid sample taken from a 41-year-old man hospitalized in late December 2019 at the Central Hospital of Wuhan, China. Within the sequences obtained from the sample, investigators found a novel viral genome with similarities to two betacoronaviruses: a coronavirus associated with humans (SARS-CoV Tor2) and a coronavirus associated with bats (bat SL-CoVZC45). The SARS-CoV-2 genome encodes several proteins necessary for the viral life cycle. These include viral proteases that process the viral polyproteins, an RNA-dependent RNA polymerase (RdRP), and several structural proteins: spike protein (S), membrane protein (M), envelope small membrane protein (E) and nucleocapsid protein (N).
As the virus has spread into a pandemic, mutated viral variants have emerged in certain geographical locations. By sequencing genomes from isolated viral RNA, it is possible to detect and quantify the viral lineages circulating in the human population, including novel lineages that may be more transmissible and/or pathogenic. This has important implications for the surveillance of potentially more infectious and dangerous viral variants, and for the ability to proactively implement specific public health measures to help control variant spread.
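As a rough illustration of what "quantifying" circulating lineages can mean, the sketch below computes the share of each lineage among a set of sequenced samples. It assumes each sample has already been assigned a lineage label; the function name `lineage_frequencies` and the labels `lineage_A` and `lineage_B` are placeholders, not real lineage names or real data.

```python
from collections import Counter


def lineage_frequencies(assignments):
    """Return the fraction of samples assigned to each lineage."""
    counts = Counter(assignments)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {lineage: n / total for lineage, n in counts.items()}


# Hypothetical lineage labels for four sequenced samples.
samples = ["lineage_A", "lineage_A", "lineage_B", "lineage_A"]
print(lineage_frequencies(samples))
```

Tracking how these fractions change from week to week in a given region is one simple way to see whether a lineage is gaining ground.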
In collaboration with virologist Dr. Marco Salemi, Professor in the UF Department of Pathology, the ICBR has begun to use high-throughput DNA sequencing technology to help track COVID-19 variant spread within the state of Florida. RNA isolated by Melanie Cash in the Salemi Lab is sent to the ICBR Gene Expression Core, where Dr. Yanping Zhang and her team have developed an automated procedure that uses a robotic liquid handler called the Mosquito® to convert viral RNA into DNA. In response to the very large sample volume, Dr. Zhang has adapted a standard protocol that uses a single test tube for each sample to a 384-well plate format. Each well of the plate contains a unique molecular bar code, a short synthetic DNA sequence that is incorporated into the DNA during the synthesis reaction. All 384 samples are then combined into a “library” that is sent to the ICBR NGS Sequencing Core, led by Dr. David Moraga. Using the Illumina NovaSeq 6000 instrument, Dr. Moraga can sequence 768 patient samples overnight, generating millions of short DNA “reads.” The “raw” DNA reads are then analyzed on HiPerGator by Dr. Alberto Riva, Scientific Director of the ICBR Bioinformatics Core. Complete viral genomes are assembled from the millions of short DNA reads, using the unique molecular bar code to identify which sequences belong to each sample. Assembled genomes are then compared to a database of known COVID-19 variants, and each sample is assigned a variant type. Because the date and location of each sample collection are known, Dr. Salemi can use a custom-designed algorithm to build a phylogenetic tree showing variant spread over time. This information will be shared through an international COVID-19 genome sequence database and will help to explain the origins of variant spread in the state of Florida.
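The role of the molecular bar code can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ICBR pipeline: it assumes the bar code sits at the start of each read and must match exactly, whereas sequencing instruments commonly read the bar code separately and software usually tolerates small mismatches. The function name `demultiplex`, the 4-base bar codes, the sample names and the reads are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Sort pooled reads back to their samples by leading bar code.

    Assumption: the first `barcode_len` bases of each read are the bar code.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # unknown bar code: keep the read aside
        else:
            by_sample[sample].append(insert)  # store the read without its bar code
    return dict(by_sample), unassigned


# Hypothetical bar codes and pooled reads.
barcode_to_sample = {"ACGT": "sample_01", "TTAG": "sample_02"}
reads = ["ACGTTTGACC", "TTAGGGCATA", "ACGTAAACCC", "GGGGTTTTAA"]
by_sample, unassigned = demultiplex(reads, barcode_to_sample, barcode_len=4)
print(by_sample)
print(unassigned)
```

Because every read carries the bar code of the well it came from, hundreds of samples can be pooled into one library and still be separated again after sequencing.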
“The ICBR has been phenomenal in developing a real-time sequencing and bioinformatic pipeline for SARS-CoV-2 genomic tracking. In less than one month we were able to successfully genotype more than 2,000 samples, an amazing accomplishment thanks to the extraordinary professionalism of the ICBR genomic and bioinformatic groups.”
–Dr. Marco Salemi