Viral Predictor of mRNA Evolution (VPRE)

Background

1) COVID-19 has been devastating in many respects: the loss of human lives, and the financial and social impact at both the global and the individual level.

2) At the core of the pandemic is a single protein, the spike protein that the coronavirus uses to enter human cells.

3) Modeling evolution computationally is one of the great long-standing challenges in biology. Traditional bioinformatics methods are computationally intensive and require a great deal of data to be informative.

Objectives

We aimed to model the evolution of the SARS-CoV-2 spike protein using deep learning.

Methods

We explored the utility of a variational autoencoder (VAE) and a Gaussian process (GP) for modeling the evolution of the SARS-CoV-2 spike protein.

1) First, we train our variational autoencoder to accurately encode SARS-CoV-2 spike protein sequences into three decimal numbers.

2) Once our VAE is trained, we feed it spike protein sequences, which it turns into coordinates.

3) Those coordinates are then entered into our Gaussian process, which fits functions to the sequences from the past and then projects those functions into the future.

4) The predicted coordinates are decoded by the VAE back into probable spike sequences that might arise as SARS-CoV-2 evolves.
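Step 3 can be illustrated with a minimal sketch. It is not the project's implementation: the squared-exponential kernel, the `length_scale` and `noise` values, the helper names (`rbf`, `solve`, `gp_mean`), and the latent coordinates are all assumptions made for illustration. Each of the three latent dimensions gets its own GP fitted over time, and only the posterior mean is extrapolated. Encoding and decoding with the VAE are not shown.

```python
import math

def rbf(a, b, length_scale=2.0):
    """Squared-exponential kernel between two time points (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_mean(train_t, train_y, test_t, noise=1e-3):
    """Posterior mean of a GP with a constant mean, evaluated at test_t."""
    mean_y = sum(train_y) / len(train_y)
    centred = [y - mean_y for y in train_y]
    # Kernel matrix with a small noise term on the diagonal.
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_t)]
         for i, a in enumerate(train_t)]
    alpha = solve(K, centred)
    return [mean_y + sum(rbf(t, s) * w for s, w in zip(train_t, alpha))
            for t in test_t]

# Hypothetical latent coordinates, one 3-number point per month; not real data.
months = [0, 1, 2, 3, 4, 5]
latent = [
    (0.10, -0.30, 0.52),
    (0.14, -0.27, 0.50),
    (0.19, -0.25, 0.47),
    (0.25, -0.20, 0.45),
    (0.30, -0.18, 0.41),
    (0.36, -0.13, 0.40),
]

# Fit one GP per latent dimension and project one and two months ahead.
future_months = [6, 7]
per_dim = [gp_mean(months, [z[d] for z in latent], future_months)
           for d in range(3)]
predicted = list(zip(*per_dim))  # one 3-number coordinate per future month
```

In the pipeline described above, each predicted coordinate would then be passed to the VAE decoder. Drawing samples from the GP posterior, rather than taking only its mean, would give a set of candidate coordinates for each future time point.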

Results

Overall, we obtained 17 predictions. Each is associated with a frequency score that represents the percentage of sampled Gaussian process predictions that map to the corresponding sequence.
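The frequency score can be sketched as a simple tally, assuming each sampled GP prediction has already been decoded into a sequence string. The function name `frequency_scores` and the short toy strings are hypothetical and are not the project's actual predictions.

```python
from collections import Counter

def frequency_scores(decoded_sequences):
    """Map each distinct sequence to the percentage of samples decoding to it."""
    counts = Counter(decoded_sequences)
    total = len(decoded_sequences)
    return {seq: 100.0 * n / total for seq, n in counts.most_common()}

# Toy decoded samples (placeholder strings, not real spike predictions).
samples = ["MFVFLVLLPL", "MFVFLVLLPL", "MFVFFVLLPL", "MFVFLVLLPL"]
scores = frequency_scores(samples)
for seq, pct in scores.items():
    print(f"{seq}: {pct:.1f}%")
```

Here three of the four samples decode to the same string, so it receives a score of 75% and the other receives 25%.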

In particular, the top six predictions show up in all three time points with varying frequencies.

In short, predicting one and two months into the future is our control experiment that benchmarks our model. The takeaways are that our VAE is capable of accurately reproducing spike protein sequences and that our GP functioned as expected when trained on a small dataset.

Deliverables

Project website: https://virosight.ca/

Presentation: https://youtu.be/VzHnVvEXLqg

Manuscript: https://www.biorxiv.org/content/10.1101/2021.12.04.471198v1.full.pdf

Presentations

Canadian International Genetically Engineered Machine Competition. cGEM foundation. Canada. October 2020. [oral]

Life Science Research Night. The University of British Columbia. Canada. November 2020. [oral]

Harvard National Collegiate Research Conference (NCRC). Harvard University. Online. January 2021. [plenary speaker & poster]

Multidisciplinary Undergraduate Research Conference (MURC). The University of British Columbia. Canada. March 2021. [oral]

Stanford Research Conference (SRC). Stanford University. Online. April 2021. [plenary speaker & poster]


