SARS-CoV-2 Protein Structure Prediction

Konstantin Weißenow, Daniel Berenberg, Christian Dallago, Michael Heinzinger, Valentin Plugaru, Ian Fisk, Reinhard Schneider, Burkhard Rost

Bioinformaticians from the Rostlab at TU Munich, the LCSB, and the Flatiron Institute joined forces to take part in a worldwide effort to predict 3D structures of SARS-CoV-2 proteins. The project was organized by the team behind CASP (Critical Assessment of protein Structure Prediction), who selected ten viral proteins for which no experimental structure is available and for which homology-based modeling cannot be used to infer structure. The goal is to obtain consensus structures that give insight into the molecular mechanisms of the virus and aid vaccine development and the evaluation of possible drug targets. To this end, the team from TUM and LCSB, helped by collaborators at the Flatiron Institute, trained multiple deep learning systems that leverage evolutionary information from multiple sequence alignments to predict pairwise distances between amino acids in proteins. The hardest targets of previous CASP competitions were used to assess the reliability of the predictions. The computer-generated distance maps were then used as constraints to simulate protein folding and obtain 3D structures.
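To make the idea of a distance map and the constraints derived from it more concrete, here is a minimal sketch in Python. It is not the project's pipeline: in the project the distance maps were predicted by deep learning models from multiple sequence alignments, whereas this toy example simply computes distances from made-up coordinates. The function names, the 8 Å cutoff, and the minimum sequence separation are all assumptions chosen for illustration.

```python
import math

def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords is a list of (x, y, z) tuples, one per residue.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]

def distance_constraints(dmap, max_distance=8.0, min_separation=3):
    """Collect (i, j, distance) for residue pairs within max_distance.

    Pairs closer than min_separation along the sequence are skipped,
    because neighbouring residues are always close in space.
    Both thresholds are illustrative assumptions.
    """
    constraints = []
    n = len(dmap)
    for i in range(n):
        for j in range(i + min_separation, n):
            if dmap[i][j] <= max_distance:
                constraints.append((i, j, dmap[i][j]))
    return constraints

# Hypothetical coordinates (in angstroms) for five residues forming a hairpin.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]

toy_map = distance_map(toy_coords)
toy_constraints = distance_constraints(toy_map)

for i, j, d in toy_constraints:
    print(f"residues {i} and {j}: {d:.2f} A")
```

A folding simulation would then search for a 3D conformation whose residue pairs satisfy such constraints as closely as possible; that optimization step is not shown here.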

Predicted distance map of the virus protein nsp6.

Three nodes of the Iris HPC cluster with four Tesla GPUs each (in addition to three similar nodes at the Flatiron Institute and one node from the UCC at TUM provided by IBM) were used to prepare datasets and to train the deep learning models, enabling high-quality submissions within the initiative's short deadline of April 6th. Visualizations of predicted structures from this project are available at [1], while evaluations and consensus results of the worldwide effort can be found at [2].

Predicted structure of virus protein ORF8.


Posted by OpenPower Team