PirePred is an interpretation tool designed for clinicians interested in the genotype/phenotype relationships of clinical variants found in 58 genes related to conditions investigated in neonatal screening programs.
Missense, nonsense and frameshift single nucleotide variants (SNVs) annotated in the ClinVar database are retrieved in real time and presented in the structural context of the original protein. For each variant, binary classifications (tolerated/deleterious) obtained from 16 popular predictors are shown, together with a consensus ternary classification (tolerated/uncertain/deleterious).
Alternatively, you can also see predictions for all possible single amino acid variants (SAVs) arising from SNVs in these genes.
Disclaimer. This resource is intended for research purposes only. The authors are not responsible for its use or misuse and assume no liability or responsibility for any error, weakness, incompleteness or temporariness of the resource and of the data provided.
ERDF (FEDER) FUNDING
Project 65% co-financed by the European Regional Development Fund (ERDF; FEDER in Spanish and French) through the Interreg V-A Spain-France-Andorra Programme.
PirePred © 2020 Javier Sancho's Lab. All Rights Reserved.
Main selection panel
The main screen of the PirePred server lets the user select, through the Main Selection Panel (Animation 1), one gene, protein or disease among the 58 frequently investigated in neonatal screening programs.
The entries in each of the three selection modes are listed alphabetically (Animation 1).
When a gene, protein or disease is chosen in the Main Selection Panel ([1] in Animation 2), all the associated Single Nucleotide Variants (SNVs) reported to date in the ClinVar database are retrieved and depicted in the Variants Panel ([2] in Animation 2). A link to predictions for all possible single amino acid variants arising from SNVs in the gene is also given in this panel (and is available as well in the Buttons Side Band at the right-hand side of the server screen). At the same time, the structure of the protein concerned is shown in the JSmol Structure Visualization Panel ([3] in Animation 2) in its reported or modeled biological assembly.
At the top of this panel, a self-explanatory text indicates whether the structure has been experimentally determined or is a model, provides a link to the coordinates file stored in the Protein Data Bank (for experimentally solved structures), and indicates its sequence coverage (the residues of the protein that are shown in the structure). Additional information on the protein structure shown in Panel 3 is available through the ‘Structure/model information’ button in the Buttons Side Band ([4] in Animation 2): the elucidation method (X-ray, NMR or cryo-EM), the resolution, the missing residues (if any), the sequence coverage, the server used to model (by homology or threading) structures or fragments that have not been solved experimentally, and some quality parameters of the models. A ‘Help/Credits’ button on the same band gives access to this text. Clicking one of the SNV buttons in Panel 2 makes the cartoon representation of the protein in Panel 3 zoom out and center on the affected residue, which is labelled and highlighted in green.
Besides, a new self-explanatory text appears indicating either the amino acid replaced (for missense variants) or the putative structural consequences triggered at the protein level (for nonsense or frameshift variants (Figure 1)).
Whenever an SNV button in Panel 2 is accompanied by an exclamation mark on its right, the affected residue lies in a part of the protein that is missing from the structure and therefore cannot be shown. When a variant is selected in Panel 2 and the protein contains more than one chain, only one chain is emphasized in the Visualization Panel; the remaining chains are depicted with high transparency for clearer visualization.
Missense
Experimental structure (pdb: 6jbj): residues 2 to 606
The wild type residue highlighted in green is replaced in the variant.
Nonsense
Experimental structure (pdb: 6jbj): residues 2 to 606
The variant will lack the red coloured region. As a consequence, the structure of the synthesized part may be quite different.
Frameshift
Experimental structure (pdb: 6jbj): residues 2 to 606
The structure of the gold coloured region is uncertain due to a complete change in sequence. As a consequence, the rest of the structure may be quite different.
In addition, selecting an SNV by clicking on it in Panel 2 generates four new buttons in the Buttons Side Band: ‘Predictions for this variant’, ‘Structural context of this variant’, ‘Predictions for all variants’ and ‘ClinVar’.
‘Predictions for this variant’ button
Once the ‘Predictions for this variant’ button is clicked, a pop-up window appears showing the available predictions for the selected protein variant (Figure 2). On the right side of the window, a square summarizes the prediction information. It includes the protein variant name, in one-letter amino acid code, and the consensus classification, obtained by the method explained below. The color of the square also shows the consensus classification: red for deleterious variants, green for tolerated variants and grey for uncertain predictions. On the left side, a scrolling column lists the binary predictors used (all included in dbNSFP 4.1a; ref. 1), with their rankscores (values) and binary predictions (underline color). Rankscores range from 0 to 1, with 1 being the most probably deleterious score. Binary predictions are usually obtained directly from the scores, but some predictors such as LRT incorporate extra information for the prediction.
As a guide for the user, we offer a summary of the predictions obtained following a simple 'majority vote' algorithm, in which a variant is assigned to a class when that class is predicted by more than a given percentage of the methods, the percentage depending on the class. This 'consensus classification' is offered for informative purposes and is not intended as a proper prediction. Because it is based on predictions provided by different preexisting predictors, it would be difficult to guarantee that results obtained following a conventional training/testing workflow would be free from the overfitting issues that might originally affect some of those predictors. Therefore, such a workflow has not been used in this case. However, reliable clinical data from ClinVar have been used to tune the statistical performance of this consensus, taking into account parameters such as accuracy, recall, Predictive Value for Deleterious variants (DelPV), Predictive Value for Tolerated variants (TolPV) and Matthews Correlation Coefficient (MCC).
For the four predictors without a pre-established threshold for binary prediction (REVEL, MutPred, MVP, CADD), the threshold has been selected, using plots such as the one in Figure 3, to give an optimal MCC value for the binary predictor while keeping its accuracy, recall and DelPV above 90% of the maximum value obtained for each parameter. TolPV is not taken into account in the definition of thresholds because of the low number of tolerated variants.
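The class-dependent majority vote described above can be sketched in a few lines of Python. This is only an illustration: the function name and the two threshold parameters (`del_fraction`, `tol_fraction`) are hypothetical, and the percentages actually used by PirePred are not given in this text, so the caller has to supply them.

```python
from typing import Sequence


def consensus(predictions: Sequence[str],
              del_fraction: float,
              tol_fraction: float) -> str:
    """Ternary consensus from binary predictions ('D' or 'T').

    del_fraction and tol_fraction are hypothetical, class-dependent
    thresholds (fractions between 0 and 1), not PirePred's real values.
    """
    n = len(predictions)
    if n == 0:
        return "uncertain"  # nothing to vote on
    n_del = sum(p == "D" for p in predictions)
    n_tol = sum(p == "T" for p in predictions)
    # A class wins only if MORE than its threshold fraction votes for it.
    if n_del / n > del_fraction:
        return "deleterious"
    if n_tol / n > tol_fraction:
        return "tolerated"
    return "uncertain"
```

For example, with both thresholds arbitrarily set to 0.6, 14 deleterious votes out of 16 yield "deleterious", while an 8/8 split yields "uncertain". With thresholds below 0.5 both conditions could hold at once; this sketch then gives precedence to "deleterious", which is a design choice of the example and not a documented behavior of the server.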
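One way to read the threshold-selection criterion above is as a constrained scan over candidate thresholds. The sketch below assumes that a variant is called deleterious when its score is at or above the threshold, that labels are 1 for deleterious and 0 for tolerated, and that DelPV is the fraction of deleterious calls that are correct; the function names and the `keep` parameter are hypothetical and this is not the authors' code.

```python
import math
from typing import Optional, Sequence, Tuple


def metrics(scores: Sequence[float], labels: Sequence[int],
            threshold: float) -> Tuple[float, float, float, float]:
    """Return (accuracy, recall, DelPV, MCC) for one threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        called_del = s >= threshold  # assumption: high score = deleterious
        if called_del and y:
            tp += 1
        elif called_del:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    delpv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, recall, delpv, mcc


def select_threshold(scores: Sequence[float], labels: Sequence[int],
                     candidates: Sequence[float],
                     keep: float = 0.9) -> Optional[float]:
    """Best-MCC threshold among those keeping accuracy, recall and DelPV
    at or above `keep` times their maximum over all candidates."""
    rows = [(t,) + metrics(scores, labels, t) for t in candidates]
    if not rows:
        return None
    limits = [keep * max(r[i] for r in rows) for i in (1, 2, 3)]
    eligible = [r for r in rows
                if all(r[i] >= lim for i, lim in zip((1, 2, 3), limits))]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r[4])[0]
```

TolPV is deliberately left out of the constraints, mirroring the text. If no candidate satisfies all three constraints at once, the sketch returns `None` rather than relaxing them.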
In this server, the terminology Tolerated/Deleterious is used to qualify the variants, as it is the most common among the included binary predictors. Deleterious variants are those predicted to cause or significantly increase the risk of developing a genetic disease. Other terms in use for the same concept include pathogenic, disease-causing, damaging or non-functional. Tolerated variants are those predicted not to affect (or even to reduce) the risk of developing a genetic disease. Similar terms in use elsewhere include neutral, benign, functional or polymorphism. Predictions are unavailable for frameshift variants and for those that replace a stop codon (X) with an amino acid-coding codon, as their effect will depend more on the new sequence generated than on the affected codon itself.
The consensus classification returned to the user is calculated by two rules:
Applying these simple classification rules to the subset of variants that are reliably annotated in ClinVar (those reviewed by ClinVar with at least one star and not classified as ‘Uncertain’ or ‘Conflicting interpretations of pathogenicity’, see Table 1) provides a consensus classification of either ‘Deleterious’ or ‘Tolerated’ for 94.8% of the variants. For these reliably annotated variants, the PirePred consensus classification is informative: it shows a non-cross-validated accuracy of 92.4%, a DelPV of 97.6%, a TolPV of 93.8% and an MCC of 0.58.
Type | Number of entries | Relative to the total |
---|---|---|
All ClinVar | 8880 | 100.00 % |
Reviewed with 1+ star | 5205 | 58.61 % |
1+ star and not Uncertain or Conflicting | 2095 | 23.59 % |
1+ star and Pathogenic or Likely Pathogenic | 1902 | 21.42 % |
1+ star and Benign or Likely Benign | 193 | 2.17 % |
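For reference, the performance parameters quoted above are commonly defined from the confusion matrix as follows. The mapping is an assumption made here for illustration: ‘Deleterious’ is taken as the positive class, so TP and FP are correct and incorrect deleterious calls, and TN and FN are correct and incorrect tolerated calls.

```latex
\begin{aligned}
\text{accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{recall} &= \frac{TP}{TP + FN}, \\
\text{DelPV} &= \frac{TP}{TP + FP}, &
\text{TolPV} &= \frac{TN}{TN + FN}, \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{aligned}
```

Because the reliably annotated set in Table 1 is strongly imbalanced (1902 pathogenic or likely pathogenic entries versus 193 benign or likely benign ones), MCC is a useful complement to accuracy.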
The Tolerated/Deleterious heading shows the number of binary predictions of each type given by the individual predictors. For advanced users, the rankscore and binary prediction for each predictor are offered, as obtained from dbNSFP 4.1a (ref. 1). To derive the rankscores, the scores for a given predictor are ranked from least to most probably deleterious. The rankscore for a variant is then the fraction of scores below its own score, thus ranging from 0 (most probably tolerated) to 1 (most probably deleterious). All predictors in PirePred base their binary predictions on their scores, except LRT and MutationTaster, which use additional information (homology, conservation…) for their predictions.
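The rankscore definition given above (fraction of scores below a variant's own score) can be illustrated with a short Python sketch. It assumes that scores are already oriented so that higher means more probably deleterious; the function name is hypothetical, and the exact procedure used by dbNSFP (for example, its handling of ties) may differ.

```python
from bisect import bisect_left
from typing import List, Sequence


def rankscores(scores: Sequence[float]) -> List[float]:
    """Fraction of scores strictly below each score.

    Assumes higher raw score = more probably deleterious.
    """
    ordered = sorted(scores)
    n = len(ordered)
    # bisect_left counts how many sorted values are strictly smaller.
    return [bisect_left(ordered, s) / n for s in scores]
```

With this literal definition the highest score in a set of n values receives (n - 1)/n, which only approaches 1 for the very large score sets found in a genome-wide database.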
References:
1- Liu, X., Jian, X. and Boerwinkle, E., dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Hum. Mutat., 32: 894-899. (2011).
Structural context of this variant
After clicking the ‘Structural context of this variant’ button, a pop-up window like that shown in Figure 4 appears, containing information on the structural context of the affected residue. Under the variant name, five properties of its structural context are given. Hovering over each of these properties opens an explanatory text box. The fields are explained below:
Monomeric/multimeric relative exposure indicates the percentage of the residue that is exposed in the native protein (in the monomer or the biological assembly, respectively) compared to the average exposure of that type of residue in the unfolded state. Values over 100% show that the residue is overexposed, and values under 15-20% show that it is buried according to Ayuso-Tejedor et al. (ref. 2); the lower the value, the more significant the burial. A significant reduction in the relative exposure from the monomeric form to the biological assembly means that the residue becomes buried in the biological assembly and, therefore, may alter the interaction of the constituent subunits if changed. The ‘Monomer contact’ property shows ‘Yes’ when this reduction is significant (the multimeric exposure is less than 90% of the monomeric exposure); the variant is then potentially deleterious, as the protein might be unable to form its biological assembly correctly.
SITE and CSA indicate whether the residue is annotated as relevant for the function of the protein (checked experimentally or by homology) in the original PDB file or in the Catalytic Site Atlas database, respectively. Thus, if the value of any of them is ‘Yes’, the residue is likely relevant for the function and any variant may cause malfunction and, therefore, disease. If their value is ‘No’, it doesn’t mean that the residue is not relevant, but only that it hasn’t been found as relevant with significant proof or that it hasn’t been annotated as such.
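The exposure-based fields described above reduce to two simple comparisons, sketched below in Python. The function names, the "intermediate" label and the single 20% burial cutoff (the text gives a 15-20% range) are assumptions made for this example only.

```python
def exposure_class(relative_exposure_pct: float,
                   buried_cutoff: float = 20.0) -> str:
    """Classify a relative exposure value (in percent).

    buried_cutoff is a hypothetical choice within the 15-20% range
    quoted in the text.
    """
    if relative_exposure_pct > 100.0:
        return "overexposed"
    if relative_exposure_pct < buried_cutoff:
        return "buried"
    return "intermediate"


def monomer_contact(monomeric_pct: float, multimeric_pct: float) -> bool:
    """True when the multimeric exposure is below 90% of the monomeric one."""
    return multimeric_pct < 0.9 * monomeric_pct
```

A residue whose relative exposure falls from 69% in the monomer to 15% in the biological assembly, for instance, gives `monomer_contact(69, 15) == True`.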
References:
2- Ayuso-Tejedor, S., Abián, O., & Sancho, J. Underexposed polar residues and protein stabilization. Protein Engineering, Design and Selection, 24(1–2), 171–177. (2011).
‘Predictions for all single amino acids variants’ button
Clicking either the 'see all possible variants' link in the Variants Panel or the ‘Predictions for all single amino acids variants’ button in the Buttons Side Band (at the right-hand side of the server screen) opens a pop-up window (Figure 5) showing all possible protein variants that can be produced by SNVs in the coding gene, together with their predictions as retrieved from dbNSFP 4.1a and the consensus classification given by the PirePred server. The rows contain the same information as the columns in ‘Predictions for this variant’; the only additional information is the coloring of the first two columns. The first column is colored according to the consensus classification, obtained as indicated before. It is filled with color up to the percentage of predictors that agree with the consensus out of the 16 total predictors or, for uncertain classifications, up to the average between the numbers of tolerated and deleterious predictions. The second column is completely colored according to the consensus classification.
The interpretation of the results is the same as in ‘Predictions for this variant’; the only changes are the display position and the fact that the results are provided for all possible variants instead of just one of those present in ClinVar.
On the structures shown in the Visualization Panel
Of the 58 proteins encoded by the genes included in the PirePred server, 41 have experimentally determined structures available in the Protein Data Bank (PDB) with a reasonable structural coverage. However, in two of them the coverage is lower, and the experimentally solved fragment has been combined with a modeled fragment of the missing part of the structure, obtained in one case from the GPCR-SSFE 2.0 server (ref. 4; multi-template homology modeling) and in the other from the I-TASSER server (ref. 3; threading). The remaining 16 structures have been obtained from the SWISS-MODEL server (ref. 5; single-template homology modeling).
Ten of these models were retrieved directly from the SWISS-MODEL repository, while models for the remaining six proteins (for which either no model had been reported or the reported ones had low quality parameters) were built de novo. The models retrieved from the repository were chosen from the list available for each gene, based on the criterion of having the highest possible coverage and QMEAN (integrated quality parameter), but also looking for a template matching the oligomerization state predicted or stated for the protein. The same criterion was applied to select the best template for the six models built.
References:
3- Yang, J., Zhang, Y. I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Research, 43: W174-W181 (2015).
4- Worth, C. L., Kreuchwig, F., Tiemann, J. K. S., Kreuchwig, A., Ritschel, M., Kleinau, G., Hildebrand, P. W. and Krause, G. GPCR-SSFE 2.0—a fragment-based molecular modeling web tool for Class A G-protein coupled receptors. Nucleic Acids Res. 45(W1), W408-W415 (2017).
5- Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F.T., de Beer, T.A.P., Rempfer, C., Bordoli, L., Lepore, R., Schwede, T. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46(W1), W296-W303 (2018).
Example of use
Choose gene PAH in the Main Selection Panel (Animation 3) and then the R297C variant from the Variants panel. The Visualization Panel focuses on the residue (arginine 297) affected by the variant selected. Only one representative subunit (chain) of the tetrameric protein (the enzyme phenylalanine-4-hydroxylase) is ordinarily shown, the representation of the three others being set to a higher transparency for greater clarity.
Many different operations (e.g. selecting neighboring residues or performing distance calculations) can be done in the Visualization Panel; for advanced users, a right click opens an interactive JSmol menu with additional operations. As R297C is a missense variant, only the original residue (R297) replaced as a consequence of the genetic variant is highlighted, with its alpha-carbon atom depicted in green as a small sphere and its side chain in stick representation. Had the variant been nonsense or frameshift, the C-terminal part of the protein starting from the variant point would have been emphasized by showing it in a slightly different backbone representation (ribbon) and in the same color coding used in the corresponding variant button in Panel 2 (red and gold, respectively, for nonsense and frameshift variants).
In nonsense variants, the red-colored region defines the segment that will not be synthesized because of the presence of a STOP codon in the mRNA, whereas in frameshift variants the gold ribbon defines the segment that will contain a massively changed amino acid sequence as a consequence of a frameshift at the mRNA level, which can also change the length of the segment (see Figure 1 in the 'Main selection panel' tab).
Clicking on the ‘Structural context of this variant’ button provides additional information that, in some cases, may help to understand why the variant is or is not deleterious (Figure 6). For the R297C variant in the PAH gene, the information retrieved indicates a significant reduction in the relative exposure of the residue from the monomer (69%) to the biological assembly, where it is associated with the other monomers (15%), which means that the residue is probably located in the interaction region between the monomers. This is confirmed by the ‘Monomer contact’ property, which indicates ‘Yes’. Thus, replacement of R297 by a different residue may hinder the correct assembly of the protein and hence its function. Moreover, the fact that the residue appears in a region annotated as part of a SITE in the PDB (‘SITE: Yes’) suggests that an amino acid change such as the replacement of an arginine by a cysteine may drastically modify the activity of the protein.
Clicking on the ‘Predictions for this variant’ button opens a pop-up indicating that 2 out of the 16 predictors classify this variant as tolerated, while 14 predict that it will be deleterious. The PirePred consensus classification obtained for this variant by applying the classification rules explained above is ‘Deleterious’ (Figure 7).
These structural aspects can be checked in situ in the Structure Visualization Panel by rotating/zooming the protein and assessing how such a modification may alter, at least locally, the native arrangement of amino acid residues in its surroundings.
If the user is interested in the predictions offered for an SNV in the PAH gene that is not reported in ClinVar (any variant that does not appear in the Variants Panel, [2] in Animation 2 in the 'Main selection panel' tab), clicking on the ‘Predictions for all single amino acids variants’ button displays a larger table where the variant of interest can be found together with the corresponding individual predictions by the 16 predictors and the PirePred consensus classification.
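The highlighting rules described for the three variant types can be summarized as a small lookup, sketched here in Python. The function name, the tuple layout and the range checks are hypothetical; only the colors and the "from the variant point to the C-terminus" behavior come from the text.

```python
from typing import Tuple

# Colors as described in the text: green residue for missense,
# red C-terminal region for nonsense, gold for frameshift.
COLORS = {"missense": "green", "nonsense": "red", "frameshift": "gold"}


def highlighted_region(kind: str, position: int,
                       last_residue: int) -> Tuple[str, int, int]:
    """Return (color, first_residue, last_residue) of the emphasized region."""
    if kind not in COLORS:
        raise ValueError(f"unknown variant kind: {kind}")
    if not 1 <= position <= last_residue:
        raise ValueError("position outside the structure")
    # Missense: a single residue. Nonsense/frameshift: through the C-terminus.
    end = position if kind == "missense" else last_residue
    return COLORS[kind], position, end
```

For a structure covering residues up to 606 and an illustrative variant position of 300, a missense variant would highlight residue 300 only, whereas a nonsense variant would color residues 300 to 606 in red.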
Server Implementation
The website uses Bootstrap 4 for the presentation on the client side (front-end). The user selects a gene, protein or disease and the request is sent to the server through AJAX. In the back-end, PHP then connects with the ClinVar API and obtains the query results in XML, which are formatted and displayed as a list. The protein structure related to the selected entity (gene, protein or disease) is shown through the open-source JavaScript viewer JSmol. In addition, pre-generated tables with predictions for all the real and potential variants of each gene caused by an SNV are obtained from the dbNSFP 4.1a repository. All this information is returned to the user through the interface.
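As an illustration of the ClinVar retrieval step, the sketch below builds a query URL and parses the identifiers from an XML reply. It is written in Python rather than the server's PHP, and it assumes that ClinVar is queried through the NCBI E-utilities `esearch` endpoint with a `GENE[gene]` search term; the text does not say which endpoint or query the server really uses, and the function names are hypothetical. No network request is made in the example.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Assumption: ClinVar is reached through NCBI E-utilities (esearch).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(gene: str, retmax: int = 500) -> str:
    """URL of an esearch query for ClinVar records of one gene."""
    params = {"db": "clinvar", "term": f"{gene}[gene]", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_ids(xml_text: str) -> List[str]:
    """Extract record identifiers from an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.findall("./IdList/Id")]
```

The identifiers returned by such a search would then be used in follow-up requests to fetch the variant records themselves before formatting them for display.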
Website's browser compatibility
An updated browser is recommended and JavaScript must not be disabled.
Table 2 below shows PirePred’s compatibility with the most widely used browsers on each operating system (OS):
OS | Version | Chrome | Firefox | Microsoft Edge | Safari | Internet Explorer |
---|---|---|---|---|---|---|
Linux | Ubuntu 16.04 | 45 | 38 | n/a | n/a | n/a |
MacOS | 10.11 | 45 | 38 | n/a | 9 | n/a |
Windows | 7 | 45 | 38 | 12 | n/a | 10 |
Identified sequence issues
About PirePred
The PirePred server was designed and implemented by J. J. Galano-Frutos, H. García-Cebollada, A. López and J. Sancho, with contributions from J. Fernández-Recio and X. de la Cruz. Address correspondence to jsancho@unizar.es