BetaWrap Help


Using BetaWrap

What the program does: BetaWrap is a program that scores sequences for compatibility with the right-handed beta-helix fold. It incorporates structural features of the known beta-helix structures, as well as residue pair preferences learned from beta-sheets in non-beta-helices, to generate and score potential wraps of a query sequence into a beta-helical structure. It computes a score and P-value which reflect how well a query sequence fits into the structural template, and returns the top-scoring parses of the sequence into multiple rungs. For details of the algorithm refer to our RECOMB 2001 extended abstract, which can be downloaded here.

What it doesn't do: BetaWrap uses very little information about the sequences of the known beta-helices. It doesn't perform sequence comparisons to the known beta-helices or any other sequences in the PDB (you do have the option of running an additional search against a profile HMM built from the known sequences, see below). You should definitely do these comparisons for any sequences you are interested in, using for example the NCBI's BLAST service. BetaWrap is not a threading program, per se, in that it doesn't compare the sequence to any other possible template structures. As a result it is much faster than threading programs, but it won't notice if your sequence, which might make a mediocre beta-helix, would in fact make a fantastic transmembrane beta-barrel. You should consider using some version of threading or profile program for sequences picked out by BetaWrap (for example 3D-PSSM). But don't be concerned if threading doesn't support BetaWrap's prediction (as long as it doesn't find a highly significant alternative hit) -- the threading programs we tried did not do so well in recognizing similarity between many of the known beta-helices.

How well it works: BetaWrap has been shown to distinguish between beta-helices and non-beta-helices when run on a non-redundant version of the PDB. In addition, a seven-fold cross-validation indicated that BetaWrap is able to recognize beta-helices from one family when trained on structures from the other families. This gives us hope that the algorithm can recognize novel beta-helices from their sequence alone. Prediction of protein structure in the absence of detectable sequence similarity is still a risky business, however, and we have found sequences in larger databases which score well under the algorithm but which are not likely to have a beta-helical structure. Our experience indicates that the majority of these likely false positives have a detectable sequence repeat. Because BetaWrap rewards like-on-like stacking of certain residue types in the core of the beta-helix, it is occasionally fooled by sequences whose repeat lengths and residues match up with the rung template. Thus a very significant score for a protein with a sequence repeat of less than 40 residues should be considered with some caution. The known right-handed beta-helices do not have detectable repeats at the sequence level. As described below, we offer the option of searching for two families of repeats which have occasionally fooled the algorithm. Reassuringly, both of these families have coiled folds like the right-handed beta-helix: one forms a left-handed beta-helix, the other an alpha-beta coiled fold.


Additional search options

Rung profile search: BetaWrap uses minimal sequence information about the known beta-helices. As a result, if the query has detectable similarity to the known sequences, higher confidence predictions and/or better alignments may be achieved using a search which incorporates sequence information. Using the HMMER package, we have constructed a profile HMM using structural alignments of multiple rungs from the known beta-helices. We have found this search tool to be fairly sensitive; it can be included in your search by checking the appropriate box at the bottom of the query form.

Pfam searches: As mentioned above, BetaWrap is occasionally fooled by sequences with sequence repeats. This has been primarily observed for sequences in two families: the hexapeptide repeat family and the leucine rich repeat family. The protein family database Pfam has generated profile HMMs for both of these families, and we've included the option of searching for these repeats at the bottom of the query form.


Interpreting your results

Statistical significance

Each sequence is assigned a raw score by the BetaWrap algorithm. This score measures the compatibility of the sequence with a beta-helical structure. A P-value is attached to this score which gives a rough estimate of the likelihood that a randomly chosen non-beta-helix sequence from the PDB would attain a similar score. Note that this P-value depends only on the raw score -- it doesn't take into account either the length of the query sequence or the total number of query sequences. The P-value is estimated by fitting a normal distribution to the scores of the non-beta-helix sequences in a non-redundant version of the PDB. You can think of it as a more meaningful re-scaling of the raw score. P-values less than about 0.01 are worth a second look. In our experience, false positives with very good scores generally have a detectable sequence repeat; examples include a few of the leucine rich repeat and hexapeptide repeat proteins. You have the option of searching for these classes of proteins -- see the bottom of the query form.
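
To make the re-scaling idea concrete, here is a minimal sketch of how a raw score could be turned into a P-value under a fitted normal distribution. It is an illustration only, not BetaWrap's actual code. The function names, the background mean and standard deviation, and the assumption that higher raw scores mean better compatibility (so the upper tail is used) are all hypothetical; the 0.01 cutoff is the rough guideline given above.

```python
import math


def normal_upper_tail_p(raw_score, background_mean, background_std):
    """Return P(X >= raw_score) for X ~ Normal(background_mean, background_std).

    Assumption: higher raw scores indicate better compatibility, so the
    upper tail is the relevant one. Flip the sign of z if the opposite holds.
    """
    if background_std <= 0:
        raise ValueError("background_std must be positive")
    z = (raw_score - background_mean) / background_std
    # Survival function of the standard normal, via the complementary
    # error function from the standard library.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def worth_second_look(p_value, threshold=0.01):
    """Apply the rough 'P-value less than about 0.01' guideline."""
    return p_value < threshold


if __name__ == "__main__":
    # Placeholder background parameters; the real fitted values are not
    # given on this page.
    mean, std = 0.0, 1.0
    p = normal_upper_tail_p(2.5, mean, std)
    print(p, worth_second_look(p))
```

Because the P-value is a function of the raw score alone, two sequences of very different lengths with the same raw score get the same P-value in this sketch, which matches the caveat above.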

The Pfam and HMMER results attach an E-value to sequence hits. This E-value is an estimate of the expected number of hits with equal or better scores, given the number of query sequences. These E-values are estimated empirically by a calibration process involving random sequences.
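
The relationship between a per-sequence P-value and an expected number of hits can be sketched as follows. This illustrates the definition of an E-value only; it is not the calibration procedure HMMER uses, and the function name is hypothetical. The same arithmetic gives a feel for why a BetaWrap P-value, which ignores the number of query sequences, should be read more cautiously when many sequences are submitted at once.

```python
def expected_hits(p_value, n_sequences):
    """Expected number of sequences scoring at least this well by chance.

    p_value: probability that one random sequence reaches the score.
    n_sequences: number of sequences searched.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    return p_value * n_sequences


if __name__ == "__main__":
    # With a per-sequence P-value of 0.01, a batch of 1000 queries is
    # expected to yield about 10 chance hits at that score or better.
    print(expected_hits(0.01, 1000))
```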

Wrap descriptions

A single rung of the prototypical beta-helix has three strands separated by three turns. Two of these turns are quite variable, but the T2 turn, which separates beta strands B2 and B3, is very well-conserved across the fold. BetaWrap exploits the fact that this turn is almost always two residues long, which means that the position of a rung can be described by two sequence positions -- the start of beta strand B1 and the start of beta strand B2. These are the two positions given for each of the rungs of wraps described in the search results. The residues are indicated by i and j in the figure to the right, which is a Rasmol image of residues 242-263 of Pectate Lyase C from Erwinia chrysanthemi. The structure of Pectate Lyase C was solved by Yoder, Keen, and Jurnak in 1993.
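
If you want to post-process the reported wraps, one way to hold them is a small record per rung with the two positions described above. This is a sketch under stated assumptions: the class and function names are hypothetical, the positions in the example are made up, and the check that B2 starts after B1 within a rung assumes the strands are named in sequence order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rung:
    """One rung of a wrap, described by two sequence positions."""
    b1_start: int  # position i: start of beta strand B1
    b2_start: int  # position j: start of beta strand B2

    def __post_init__(self):
        # Assumption: B1 precedes B2 in sequence within a rung.
        if self.b2_start <= self.b1_start:
            raise ValueError("b2_start must come after b1_start")


def rung_spacings(wrap: List[Rung]) -> List[int]:
    """Number of residues between the B2 starts of consecutive rungs."""
    return [nxt.b2_start - cur.b2_start for cur, nxt in zip(wrap, wrap[1:])]


if __name__ == "__main__":
    # Made-up positions, for illustration only.
    wrap = [Rung(10, 18), Rung(32, 40), Rung(55, 62)]
    print(rung_spacings(wrap))
```

Anchoring the spacing on the B2 start reflects the point made above that the conserved T2 turn follows B2, so that position is the most stable landmark in each rung.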


Pfam HMM's, Copyright (C) 1996-1999 Pfam Consortium.
HMMER software, Copyright (C) 1992-1998 Washington University School of Medicine.