Supplementary Information for:
Singh R, Xu J, and Berger B. Struct2Net: Integrating Structure Into Protein-Protein Interaction Prediction. Submitted to the Pacific Symposium on Biocomputation, 2006.
A list of the 1000 less-characterized proteins is here. These proteins were chosen because relatively little data is available for them. As mentioned in the paper, our method uses 6 existing genomic/proteomic features: co-expression, co-essentiality, co-location, similarity in GO terms, similarity in MIPS terms, and interacting domains. For the set of less-characterized proteins, we counted the number of features with available data for each possible protein-pair. The histogram of the number of available features (for each protein-pair) is shown below. About 94% of the pairs have 2 or fewer features available, i.e., at least 66% of their feature set is missing. Every protein-pair had at least 2 missing features.
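The per-pair counting described above can be sketched as follows. This is only an illustration, not the code used for the paper: the feature keys, the use of `None` to mark missing data, and the three toy protein-pairs are all assumptions made for the example.

```python
from collections import Counter

# The six feature types named in the text (key names are hypothetical).
FEATURES = (
    "co_expression", "co_essentiality", "co_location",
    "go_similarity", "mips_similarity", "interacting_domains",
)

def count_available(pair_features):
    """Count the features that have data (a non-None value) for one pair."""
    return sum(1 for name in FEATURES if pair_features.get(name) is not None)

def availability_histogram(pairs):
    """Map 'number of available features' -> 'number of pairs'."""
    return Counter(count_available(feats) for feats in pairs.values())

def fraction_with_at_most(hist, k):
    """Fraction of pairs that have k or fewer features available."""
    total = sum(hist.values())
    return sum(n for count, n in hist.items() if count <= k) / total

# Toy input with made-up pair names and values (not data from the paper).
toy_pairs = {
    ("P1", "P2"): {"co_expression": 0.4, "go_similarity": 0.7},
    ("P1", "P3"): {"co_location": 1.0},
    ("P2", "P3"): {"co_expression": 0.1, "co_essentiality": 1.0,
                   "mips_similarity": 0.3},
}

hist = availability_histogram(toy_pairs)
share_sparse = fraction_with_at_most(hist, 2)
```

With real data, `fraction_with_at_most(hist, 2)` would correspond to the "2 or fewer features available" figure quoted above.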
With so much feature data missing, structure-based methods are clearly valuable here. Using only the structure-based method, we made predictions for all possible protein-pairs in this set and chose the 2000 top-scoring pairs (ranked by logistic regression score). This set of pairs is here.
Why top 2000 (and not, say, top 200)? Based on current estimates, we assume that the yeast interactome has about 30,000 interactions and that yeast has about 6000 proteins. If the size of the yeast interaction graph scales linearly with the number of proteins, then a 1000-protein sub-network will have 30K * (1000/6000) = 5000 interactions. If the size of the interaction graph scales quadratically with the number of proteins, then a 1000-protein sub-network will have 30K * (1000^2/6000^2) ≈ 833 interactions. The true yeast interaction network is a scale-free network whose size scales somewhere between linearly and quadratically with the number of nodes. Thus, the number of interactions in a 1000-protein sub-network should be somewhere between 833 and 5000. We chose a cutoff between the two: 2000.
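The two bounds in this estimate can be checked with a few lines of arithmetic. The function name and signature below are illustrative only; the inputs are the approximate figures quoted above.

```python
def subnetwork_size_bounds(total_interactions, total_proteins, subset_proteins):
    """Return (quadratic, linear) estimates of interactions in a sub-network.

    Linear scaling:    I * (n / N)
    Quadratic scaling: I * (n^2 / N^2)
    """
    linear = total_interactions * subset_proteins / total_proteins
    quadratic = total_interactions * subset_proteins**2 / total_proteins**2
    return quadratic, linear

# Approximate yeast figures from the text: 30,000 interactions, 6000 proteins.
low, high = subnetwork_size_bounds(30_000, 6_000, 1_000)
print(round(low), round(high))  # 833 5000
```

The chosen cutoff of 2000 lies between these two estimates.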
| Yeast Gene | Disease related to the Human homolog | Brief Disease Description | Predicted Interactions | Comments |
| PAT1 | Adrenoleukodystrophy (ALD) | ABC transporter; neurodegenerative disease | 26 | Set of predicted interactors enriched for lipid and fatty acid transport |
| RAD28 | Cockayne syndrome (CSA) | Transcription-coupled repair; progressive neurological dysfunction; photosensitivity | 19 | Many DNA repair proteins in the set of predicted interactors |
| PEX7 | Rhizomelic chondrodysplasia punctata | Peroxisomal biogenesis disorder | 25 | |
| YAT1 | Carnitine palmitoyltransferase | Lipid metabolism defect; cardiomyopathy | 19 | Set of predicted interactors enriched for protein-misfolding related proteins and chaperones |
| TPI1 | Triosephosphate isomerase | Chronic hemolytic anemia and neuromuscular disorders | 16 | Set of predicted interactors enriched for hexose and monosaccharide metabolism |
| ADE13 | Adenylosuccinate lyase | Purine nucleotide biosynthesis defect; autism features | 4 | |