Structural Classification of Proteins database
Content | |
---|---|
Description | Protein Structure Classification |
Contact | |
Research center | Laboratory of Molecular Biology |
Authors | Alexey G. Murzin, Steven E. Brenner, Tim J. P. Hubbard, and Cyrus Chothia |
Release date | 1994 |
Access | |
Website | http://scop.mrc-lmb.cam.ac.uk/scop/ |
The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationship between proteins. Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor. Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.
The SCOP database is freely accessible on the internet. SCOP was created in 1994 in the Centre for Protein Engineering and the Laboratory of Molecular Biology.[1] It was maintained by Alexey G. Murzin and his colleagues in the Centre for Protein Engineering until its closure in 2010 and subsequently at the Laboratory of Molecular Biology in Cambridge, England.[2][3][4] As of January 2014, the work on SCOP has been discontinued and the last official version of SCOP is 1.75 (released June 2009). The prototype of a new Structural Classification of Proteins 2 (SCOP2) database has been made publicly available. SCOP2 defines a new approach to the classification of proteins that is essentially different from SCOP, but retains its best features.
Hierarchical structure
The source of protein structures is the Protein Data Bank. The unit of classification of structure in SCOP is the protein domain. What the SCOP authors mean by "domain" is suggested by their statement that small proteins and most medium-sized ones have just one domain,[5] and by the observation that human hemoglobin,[6] which has an α2β2 structure, is assigned two SCOP domains, one for the α and one for the β subunit.
The shapes of domains are called "folds" in SCOP. Domains belonging to the same fold have the same major secondary structures in the same arrangement with the same topological connections. SCOP version 1.75 lists 1195 folds, each with a short description. For example, the "globin-like" fold is described as core: 6 helices; folded leaf, partly opened. The fold to which a domain belongs is determined by inspection, rather than by software.
The levels of SCOP are as follows.
- Class: Types of folds, e.g., beta sheets.
- Fold: The different shapes of domains within a class.
- Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor.
- Family: The domains in a superfamily are grouped into families, which have a more recent common ancestor.
- Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein.
- Species: The domains in "protein domains" are grouped according to species.
- Domain: part of a protein. For simple proteins, it can be the entire protein.
The folds are grouped into "classes". The classes are the top level, or "root" of the SCOP hierarchical classification. The classes are displayed something like this:
- Classes:
- a. All alpha proteins [46456] (284)
- Domains consisting of α-helices
- b. All beta proteins [48724] (174)
- Domains consisting of β-sheets
- c. Alpha and beta proteins (a/b) [51349] (147)
- Mainly parallel beta sheets (beta-alpha-beta units)
- d. Alpha and beta proteins (a+b) [53931] (376)
- Mainly antiparallel beta sheets (segregated alpha and beta regions)
- e. Multi-domain proteins (alpha and beta) [56572] (66)
- Folds consisting of two or more domains belonging to different classes
- f. Membrane and cell surface proteins and peptides [56835] (58)
- Does not include proteins in the immune system
- g. Small proteins [56992] (90)
- Usually dominated by metal ligand, heme, and/or disulfide bridges
- h. Coiled-coil proteins [57942] (7)
- Not a true class
- i. Low resolution protein structures [58117] (26)
- Peptides and fragments. Not a true class
- j. Peptides [58231] (121)
- Peptides and fragments. Not a true class
- k. Designed proteins [58788] (44)
- Experimental structures of proteins with essentially non-natural sequences. Not a true class
The number in brackets, called a "sunid", is a SCOP unique integer identifier for each node in the SCOP hierarchy. The number in parentheses indicates how many elements are in each category. For example, there are 284 folds in the "All alpha proteins" class. Each member of the hierarchy is a link to the next level of the hierarchy.
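The labelling convention described above (name, sunid in brackets, element count in parentheses) can be illustrated with a short parser. This is only a sketch: the `ScopNode` type and `parse_node_label` function are hypothetical names introduced here, and the pattern assumes labels are written exactly as in the listings in this article, not as in any official SCOP file format.

```python
import re
from typing import NamedTuple, Optional


class ScopNode(NamedTuple):
    """One displayed node of the hierarchy (hypothetical helper type)."""
    name: str
    sunid: int
    count: Optional[int]  # None when the label shows no element count


# name, then "[sunid]", then an optional "(count)" at the end of the label
_NODE_RE = re.compile(
    r"^(?P<name>.+?)\s*\[(?P<sunid>\d+)\](?:\s*\((?P<count>\d+)\))?\s*$"
)


def parse_node_label(label: str) -> ScopNode:
    """Split a label such as 'All alpha proteins [46456] (284)' into its parts."""
    match = _NODE_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognisable SCOP node label: {label!r}")
    count = match.group("count")
    return ScopNode(
        name=match.group("name"),
        sunid=int(match.group("sunid")),
        count=int(count) if count is not None else None,
    )


node = parse_node_label("All alpha proteins [46456] (284)")
print(node.name, node.sunid, node.count)
```

Labels that carry no count, such as "Leghemoglobin [46481]", yield `count=None`.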
The first few of the 284 folds in the "All alpha proteins" class are displayed something like the following.
- Folds:
- 1. Globin-like [46457] (2)
- core: 6 helices; folded leaf, partly opened
- 2. Long alpha-hairpin [46556] (20)
- 2 helices; antiparallel hairpin, left-handed twist
- 3. Type I dockerin domain [63445] (1)
- tandem repeat of two calcium-binding loop-helix motifs, distinct from the EF-hand
Each fold is followed by a description of that fold.
The domains within a fold are further classified into superfamilies, which, in turn, are classified into families. Within a fold, domains belonging to the same superfamily are assumed to have a common ancestor. However, this ancestor is presumed to be distant, because the different members of a superfamily have low sequence identities. The two superfamilies of the "Globin-like" fold are displayed something like the following:
- Superfamilies:
- Globin-like [46458] (4)
- alpha-helical ferredoxin [46548] (2) contains two Fe4-S4 clusters
No description is given for the "Globin-like" superfamily, presumably because its description is very like that of its fold, which has the same name.
Families are more closely related than superfamilies. Domains within a superfamily are placed in the same family if
- their sequences are at least 30% identical, or, failing that,
- their sequences show some lower similarity, e.g., 15%, and they perform the same function.
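The two criteria above can be written as a simple predicate. This is a rough sketch, not SCOP's actual procedure, which relies on manual curation: the function name `same_family` and its parameters are hypothetical, and the 15% lower bound is only the example figure given in the text, used here as an assumed default.

```python
def same_family(
    sequence_similarity: float,
    same_function: bool,
    high_threshold: float = 0.30,  # first criterion from the text
    low_threshold: float = 0.15,   # example value from the text, an assumption
) -> bool:
    """Return True if two domains would share a family under the stated rules.

    sequence_similarity is a fraction between 0.0 and 1.0.
    """
    if not 0.0 <= sequence_similarity <= 1.0:
        raise ValueError("sequence_similarity must be between 0.0 and 1.0")
    # Criterion 1: sequence similarity alone is sufficient.
    if sequence_similarity >= high_threshold:
        return True
    # Criterion 2: weaker similarity, but only together with a shared function.
    return sequence_similarity >= low_threshold and same_function


print(same_family(0.35, same_function=False))
print(same_family(0.15, same_function=True))
print(same_family(0.15, same_function=False))
```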
The similarity in sequence and structure is evidence that these proteins have a closer evolutionary relationship than do proteins in the same superfamily. Sequence tools, such as BLAST, are used to assist in placing domains into superfamilies and families. The four families in the "Globin-like" superfamily of the "Globin-like" fold are displayed something like the following.
- Families:
- Truncated hemoglobin [46459] (6) lack the first helix (A)
- Nerve tissue mini-hemoglobin (neural globin) [74660] (1) lack the first helix but otherwise is more similar to conventional globins than the truncated ones
- Globins [46463] (81) Heme-binding protein
- Phycocyanin-like phycobilisome proteins [46532] (26) oligomers of two different types of globin-like subunits containing two extra helices at the N-terminus binds a bilin chromophore
The families in SCOP may also be referred to using a SCOP concise classification string (sccs), for example a.1.1.2 for the "Globins" family. The letter identifies the class to which the domain belongs; the following integers identify the fold, superfamily, and family, respectively.[7]
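As an illustration of the sccs layout, the following sketch splits a family-level string into its four components. The `Sccs` type and the `parse_sccs` and `share_superfamily` functions are hypothetical names introduced here; the code assumes a full four-part string and does not handle shorter forms.

```python
from typing import NamedTuple


class Sccs(NamedTuple):
    """Components of a family-level sccs string (hypothetical helper type)."""
    scop_class: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> Sccs:
    """Parse a string such as 'a.1.1.2' into class, fold, superfamily, family."""
    parts = sccs.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated parts, got {sccs!r}")
    scop_class, fold, superfamily, family = parts
    if len(scop_class) != 1 or not scop_class.isalpha():
        raise ValueError(f"class must be a single letter, got {scop_class!r}")
    # int() raises ValueError if any numeric part is malformed.
    return Sccs(scop_class, int(fold), int(superfamily), int(family))


def share_superfamily(a: Sccs, b: Sccs) -> bool:
    """Two families share a superfamily when class, fold and superfamily agree."""
    return a[:3] == b[:3]


globins = parse_sccs("a.1.1.2")
print(globins)
print(share_superfamily(globins, parse_sccs("a.1.1.1")))
```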
Within a family are protein domains. Proteins are placed in the same protein domain if they are isoforms of each other, or if they are essentially the same protein, but from different species. This is apparently done manually. The "protein domains" are further subdivided into species. ("Protein domains" are not on separate pages in the current release of SCOP; in pre-SCOP, they are on separate pages.) Here is how some of the 81 protein domains of the "Globins" family are displayed.
- Protein Domains:
- 7. Leghemoglobin [46481]
- 1. Yellow lupin (Lupinus luteus) [TaxId: 3873] [46482] (17)
- 2. Soybean (Glycine max), isoform A [TaxId: 3847] [46483] (2)
- 8. Non-symbiotic plant hemoglobin [46484]
- 1. Rice (Oryza sativa) [TaxId: 4530] [46485] (1)
- 9. Hemoglobin, alpha-chain [46486]
- 1. Human (Homo sapiens) [TaxId: 9606] [46487] (192)
- 2. Human (Homo sapiens), zeta isoform [TaxId: 9606] [68937] (1)
- 3. Horse (Equus caballus) [TaxId: 9796] [46488] (19)
- 4. Deer (Odocoileus virginianus) [TaxId: 9874] [46489] (1)
The "TaxId" is the taxonomy ID number; it is also a link to the NCBI taxonomy browser, which provides more information about the species to which the protein belongs.
Clicking on a species or isoform brings up a list of domains. Here is how some of the 192 domains of the "Hemoglobin, alpha-chain from Human (Homo sapiens)" protein are displayed.
- PDB Entry Domains:
- 1. 2dn3
- automatically matched to d1abwa1
- complexed with cmo, hem
- 1. region a:2-141 [131583]
- 2. 1ird
- complexed with cmo, hem
- 1. chain a [66286]
- 3. 2dn1
- automatically matched to d1abwa1
- complexed with hem, mbn, oxy
- 1. region a:2-141 [131577]
Clicking on the PDB numbers is supposed to display the structure of the molecule, but the links are currently broken. (The links do work in pre-SCOP.)
Example
Most pages in SCOP contain a search box. Entering "trypsin +human" retrieves several proteins, including the protein trypsinogen from humans. Selecting that entry displays a page that includes the "lineage", which is at the top of most SCOP pages. The page includes the following information.
- Lineage:
- 1. Root: scop
- 2. Class: All beta proteins [48724]
- 3. Fold: Trypsin-like serine proteases [50493]
- barrel, closed; n=6, S=8; greek-key
- duplication: consists of two domains of the same fold
- 4. Superfamily: Trypsin-like serine proteases [50494]
- 5. Family: Eukaryotic proteases [50514]
- 6. Protein: Trypsin(ogen) [50515]
- 7. Species: Human (Homo sapiens) [TaxId: 9606] [50519]
Searching for "Subtilisin" brings up the protein, "Subtilisin from Bacillus subtilis, carlsberg", with the following lineage.
- Lineage:
- 1. Root: scop
- 2. Class: Alpha and beta proteins (a/b) [51349]
- Mainly parallel beta sheets (beta-alpha-beta units)
- 3. Fold: Subtilisin-like [52742]
- 3 layers: a/b/a, parallel beta-sheet of 7 strands, order 2314567; left-handed crossover connection between strands 2 & 3
- 4. Superfamily: Subtilisin-like [52743]
- 5. Family: Subtilases [52744]
- 6. Protein: Subtilisin [52745]
- 7. Species: Bacillus subtilis, carlsberg [TaxId: 1423] [52746]
Although both of these proteins are proteases, they do not even belong to the same fold, which is consistent with their being an example of convergent evolution.
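The comparison of the two lineages can be made explicit by walking down the levels until the identifiers differ. The sketch below uses the sunids quoted in the lineages above; the dictionary layout, the level names, and the `deepest_shared_level` function are assumptions made for illustration and do not correspond to any SCOP tool.

```python
from typing import Dict, Optional, Union

# Levels in the order they appear in a SCOP lineage, from the root downwards.
LEVELS = ("root", "class", "fold", "superfamily", "family", "protein", "species")

Lineage = Dict[str, Union[str, int]]

# Identifiers copied from the two lineages shown in this section.
TRYPSIN_HUMAN: Lineage = {
    "root": "scop", "class": 48724, "fold": 50493, "superfamily": 50494,
    "family": 50514, "protein": 50515, "species": 50519,
}
SUBTILISIN_CARLSBERG: Lineage = {
    "root": "scop", "class": 51349, "fold": 52742, "superfamily": 52743,
    "family": 52744, "protein": 52745, "species": 52746,
}


def deepest_shared_level(a: Lineage, b: Lineage) -> Optional[str]:
    """Return the lowest level at which both lineages still carry the same node."""
    shared = None
    for level in LEVELS:
        if level not in a or level not in b or a[level] != b[level]:
            break
        shared = level
    return shared


print(deepest_shared_level(TRYPSIN_HUMAN, SUBTILISIN_CARLSBERG))
```

For these two proteases the only shared node is the root, which matches the observation that they belong to different classes and folds.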
Comparison to other classification systems
SCOP classification is more dependent on manual decisions than the semi-automatic classification by CATH, its chief rival. Human expertise is used to decide whether certain proteins are evolutionarily related and therefore should be assigned to the same superfamily, or whether their similarity is a result of structural constraints and they therefore belong only to the same fold. Another database, FSSP, is purely automatically generated (including regular automatic updates) but offers no classification, allowing users to draw their own conclusions as to the significance of structural relationships based on pairwise comparisons of individual protein structures.
SCOP successors
By 2009, the original SCOP database had manually classified 38,000 PDB entries into a strictly hierarchical structure. With the accelerating pace of protein structure publications, the limited automation of classification could not keep up, leading to a non-comprehensive dataset. The Structural Classification of Proteins extended (SCOPe) database was released in 2012 with far greater automation of the same hierarchical system and is fully backwards compatible with SCOP. In 2014, manual curation was reintroduced into SCOPe to maintain accurate structure assignment. As of February 2015, SCOPe 2.05 classified 71,000 of the 110,000 total PDB entries.[8]
SCOP2 is a prototype classification system that aims to better capture the evolutionary complexity inherent in protein structure evolution. It is therefore not a simple hierarchy, but a network connecting protein superfamilies that represents structural and evolutionary relationships such as circular permutations, domain fusion and domain decay. Consequently, domains are not separated by strict fixed boundaries, but rather are defined by their relationships to the most similar other structures. As of February 2015, the SCOP2 prototype classifies 995 PDB entries.[8]
References
- 1. Andreeva, A.; Howorth, D.; Chandonia, J.-M.; Brenner, S. E.; Hubbard, T. J. P.; Chothia, C.; Murzin, A. G. (2007). "Data growth and its impact on the SCOP database: New developments". Nucleic Acids Research. 36 (Database issue): D419–D425. doi:10.1093/nar/gkm993. PMC 2238974. PMID 18000004.
- 2. Hubbard, T. J.; Ailey, B.; Brenner, S. E.; Murzin, A. G.; Chothia, C. (1999). "SCOP: A Structural Classification of Proteins database". Nucleic Acids Research. 27 (1): 254–256. doi:10.1093/nar/27.1.254. PMC 148149. PMID 9847194.
- 3. Lo Conte, L.; Ailey, B.; Hubbard, T. J.; Brenner, S. E.; Murzin, A. G.; Chothia, C. (2000). "SCOP: A Structural Classification of Proteins database". Nucleic Acids Research. 28 (1): 257–259. doi:10.1093/nar/28.1.257. PMC 102479. PMID 10592240.
- 4. Andreeva, A.; Howorth, D.; Brenner, S. E.; Hubbard, T. J.; Chothia, C.; Murzin, A. G. (2004). "SCOP database in 2004: Refinements integrate structure and sequence family data". Nucleic Acids Research. 32 (90001): D226–D229. doi:10.1093/nar/gkh039. PMC 308773. PMID 14681400.
- 5. Murzin, A. G.; Brenner, S.; Hubbard, T.; Chothia, C. (1995). "SCOP: A structural classification of proteins database for the investigation of sequences and structures" (PDF). Journal of Molecular Biology. 247 (4): 536–540. doi:10.1016/S0022-2836(05)80134-2. PMID 7723011.
- 6. PDB: 2DN1; Park SY, Yokoyama T, Shibayama N, Shiro Y, Tame JR (July 2006). "1.25 Å resolution crystal structures of human haemoglobin in the oxy, deoxy and carbonmonoxy forms". J. Mol. Biol. 360 (3): 690–701. doi:10.1016/j.jmb.2006.05.036. PMID 16765986.
- 7. Lo Conte, L.; Brenner, S. E.; Hubbard, T. J.; Chothia, C.; Murzin, A. G. (2002). "SCOP database in 2002: Refinements accommodate structural genomics". Nucleic Acids Research. 30 (1): 264–267. doi:10.1093/nar/30.1.264. PMC 99154. PMID 11752311.
- 8. "What is the relationship between SCOP, SCOPe, and SCOP2". scop.berkeley.edu. Retrieved 2015-08-22.
External links
- Structural Classification of Proteins
- Structural Classification of Proteins extended - The more automated successor of SCOP
- Structural Classification of Proteins 2 - The prototype of a new non-hierarchical classification system with more detailed representation of complex evolutionary relationships
- pre-SCOP - The developmental, or "preview" version of SCOP that will become the next released version.
- SUPERFAMILY - Library of HMMs representing SCOP superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
- Protein Structure Classification - a book chapter that discusses different protein classifications in detail.