FASTA Format
When multiple sequences are submitted, each sequence must be in FASTA format: a single-line description followed by one or more lines of sequence data. The description line begins with a ">" symbol in the first column. For example:
>Q6PEC4,RAT
MPTIKLQSSDGEIFEVDVEIAKQSVTIKTMLEDLGMDDEGDDDPVPLPNVNAAILKKVIQ
WCTHHKDDPPPPEDDENKEKRTDDIPVWDQEFLKVDQGTLFELILAANYLDIKGLLDVTC
KTVANMIKGKTPEEIRKTFNIKNDFTEEEEAQVRKENQWCEEK
>B2GUZ0,RAT
MHRKHLQEIPDQSSNVTTSFTWGWDSSKTSELLSGMGVSALEKEEVDSENIPHGLLSNLG
HPQSPPRKRLKSKGSDKDFVIIRRPKLNRENFPGVSWDSLPDELLLGIFSCLCLPELLRV
SGVCKRWYRLSLDESLWQSLDLAGKNLHPDVTVRLLSRGVVAFRCPRSFMEQPLGESFSS
FRVQHMDLSNSVINVSNLHGILSECSKLQNLSLEGLQLSDPIVTTLAQNENLVRLNLCGC
SGFSESAVATLLSSCSRLDELNLSWCFDFTEKHVQAAVAHLPDTLTQLNLSGYRKNLQKT
DLCTLIKRCPNLVRLDLSDSIMLKNDCFPEFFQLNYLQHLSLSRCYDIIPETLLELGEIP
TLKTLQVFGIVPDGTLQLLREALPRLQINCAYFTSIARPTMDNKKNPEIWGIKCRLTLQK
PSL
>B2RZ99,RAT
MSHKQIYYSDKYDDEEFEYRHVMLPKDIAKLVPKTHLMSESEWRNLGVQQSQGWVHYMIH
EPEPHILLFRRPLPKKPKK
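The FASTA layout described above is simple to read programmatically. The Python function below is a minimal sketch (the name `parse_fasta` is hypothetical and not part of this server): every line starting with ">" is treated as a description, and the lines that follow it, up to the next ">", are joined into a single sequence.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (description, sequence) pairs.

    A description line starts with '>' in the first column; every following
    non-empty line up to the next '>' is sequence data and is concatenated.
    """
    records: List[Tuple[str, str]] = []
    description: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new record begins: store the previous one, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif line.strip():
            if description is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line.strip())
    # Store the last record.
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = "\n".join([
        ">B2RZ99,RAT",
        "MSHKQIYYSDKYDDEEFEYRHVMLPKDIAKLVPKTHLMSESEWRNLGVQQSQGWVHYMIH",
        "EPEPHILLFRRPLPKKPKK",
    ])
    for desc, seq in parse_fasta(example):
        print(desc, len(seq))  # prints: B2RZ99,RAT 79
```

The same function handles the one-, two-, and multiple-sequence inputs, since they share one format; only the number of returned records differs.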
When two sequences are submitted, both must be in FASTA format. For example:
>P07321,Mus
MGVPERPTLLLLLSLLLIPLGLPVLCAPPRLICDSRVLERYILEAKEAENVTMGCAEGPR
LSENITVPDTKVNFYAWKRMEVEEQAIEVWQGLSLLSEAILQAQALLANSSQPPETLQLH
IDKAISGLRSLTSLLRVLGAQKELMSPPDTTPPAPLRTLTVDTFCKLFRVYANFLRGKLK
LYTGEVCRRGDR
>P14753,Mus
MDKLRVPLWPRVGPLCLLLAGAAWAPSPSLPDPKFESKAALLASRGSEELLCFTQRLEDL
VCFWEEAASSGMDFNYSFSYQLEGESRKSCSLHQAPTVRGSVRFWCSLPTADTSSFVPLE
LQVTEASGSPRYHRIIHINEVVLLDAPAGLLARRAEEGSHVVLRWLPPPGAPMTTHIRYE
VDVSAGNRAGGTQRVEVLEGRTECVLSNLRGGTRYTFAVRARMAEPSFSGFWSAWSEPAS
LLTASDLDPLILTLSLILVLISLLLTVLALLSHRRTLQQKIWPGIPSPESEFEGLFTTHK
GNFQLWLLQRDGCLWWSPGSSFPEDPPAHLEVLSEPRWAVTQAGDPGADDEGPLLEPVGS
EHAQDTYLVLDKWLLPRTPCSENLSGPGGSVDPVTMDEASETSSCPSDLASKPRPEGTSP
SSFEYTILDPSSQLLCPRALPPELPPTPPHLKYLYLVVSDSGISTDYSSGGSQGVHGDSS
DGPYSHPYENSLVPDSEPLHPGYVACS
When a single sequence is submitted, it must be in FASTA format. For example:
>B2GUZ0,RAT
MHRKHLQEIPDQSSNVTTSFTWGWDSSKTSELLSGMGVSALEKEEVDSENIPHGLLSNLG
HPQSPPRKRLKSKGSDKDFVIIRRPKLNRENFPGVSWDSLPDELLLGIFSCLCLPELLRV
SGVCKRWYRLSLDESLWQSLDLAGKNLHPDVTVRLLSRGVVAFRCPRSFMEQPLGESFSS
FRVQHMDLSNSVINVSNLHGILSECSKLQNLSLEGLQLSDPIVTTLAQNENLVRLNLCGC
SGFSESAVATLLSSCSRLDELNLSWCFDFTEKHVQAAVAHLPDTLTQLNLSGYRKNLQKT
DLCTLIKRCPNLVRLDLSDSIMLKNDCFPEFFQLNYLQHLSLSRCYDIIPETLLELGEIP
TLKTLQVFGIVPDGTLQLLREALPRLQINCAYFTSIARPTMDNKKNPEIWGIKCRLTLQK
PSL
PDB identifier
A PDB (Protein Data Bank) identifier is the 4-character code that the PDB assigns to each entry.
The PDB code entered must belong to an entry available in the PDB before Dec 25, 2009;
otherwise, users need to upload the protein 3D structure as a file in PDB format.
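A client-side format check can catch mistyped codes before submission. The sketch below assumes the classic PDB identifier layout (a digit from 1 to 9 followed by three letters or digits, case-insensitive); the function name `is_valid_pdb_id` is hypothetical. It only checks the shape of the code and cannot tell whether the entry exists or was available before the Dec 25, 2009 cutoff.

```python
import re

# Assumed classic PDB ID layout: one digit 1-9, then three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_valid_pdb_id(code: str) -> bool:
    """Return True if `code` has the shape of a 4-character PDB identifier.

    Format check only: it does not query the PDB, so it cannot confirm that
    the entry exists or predates the server's cutoff date.
    """
    return PDB_ID_PATTERN.fullmatch(code.strip()) is not None


if __name__ == "__main__":
    for code in ["4HHB", "4hhb", "0ABC", "ABCD", "1AB"]:
        print(code, is_valid_pdb_id(code))
```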
E-value
The E-value measures the statistical significance of an alignment and thus indicates how reliable a search hit is.
This setting is the threshold for reporting protein sequences that match against the sequence database.
Following previous work (Matthews et al., 2001; Yu et al., 2004), we use 10^-10 as the default value.
Operationally, homologous proteins can be defined as those with a BLAST E-value of at most 10^-10.
Matches with an E-value greater than the threshold (10^-10 by default) are not reported.
A lower E-value threshold is more stringent, so fewer matches are reported.
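The threshold rule can be illustrated with a short filter. This is a minimal sketch, not the server's implementation: the `Hit` record and `filter_hits` function are hypothetical, and the sketch assumes that a hit whose E-value exactly equals the threshold is kept.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_E_VALUE = 1e-10  # default threshold stated in this help page


@dataclass
class Hit:
    """Hypothetical search hit: a matched database protein and its E-value."""
    protein_id: str
    e_value: float


def filter_hits(hits: List[Hit], threshold: float = DEFAULT_E_VALUE) -> List[Hit]:
    """Keep hits whose E-value does not exceed the threshold, ordered from
    most to least significant (smallest E-value first)."""
    kept = [h for h in hits if h.e_value <= threshold]
    return sorted(kept, key=lambda h: h.e_value)


if __name__ == "__main__":
    hits = [Hit("A", 1e-30), Hit("B", 1e-5), Hit("C", 1e-12)]
    print([h.protein_id for h in filter_hits(hits)])          # ['A', 'C']
    print([h.protein_id for h in filter_hits(hits, 1e-20)])   # ['A']
```

Lowering the threshold from 1e-10 to 1e-20 drops hit "C", which shows why a smaller E-value cutoff reports fewer matches.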
Joint Z-score
The residues of the homologous proteins in each protein complex are modeled with a binomial distribution.
For a sufficiently large random set of residues in a homologous protein, this binomial distribution approaches a normal distribution.
The interacting residues can therefore be evaluated with a Z-score for each protein and its homolog.
The joint Z-score is a quantitative measure of the similarity between two protein complexes.
We define the joint complex similarity as
where Z1 denotes the Z-score of protein 1 and its homolog 1', Zn denotes the Z-score of protein n and its homolog n', and so on.
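The normal approximation to the binomial distribution mentioned in this section gives a Z-score of (x - n*p) / sqrt(n*p*(1 - p)) for x successes in n trials with success probability p. The sketch below computes that standard quantity; what counts as a "success", how n is chosen, and the value of p are assumptions here, since this page does not specify them, and the function name `binomial_z_score` is hypothetical. It does not compute the joint Z-score itself.

```python
import math


def binomial_z_score(successes: int, trials: int, p: float) -> float:
    """Z-score of `successes` out of `trials` under a Binomial(trials, p) null,
    using the normal approximation: mean n*p, std sqrt(n*p*(1-p))."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    mean = trials * p
    std = math.sqrt(trials * p * (1.0 - p))
    return (successes - mean) / std


if __name__ == "__main__":
    # 60 "successes" out of 100 residues when 50 are expected by chance.
    print(binomial_z_score(60, 100, 0.5))  # 2.0
```

The approximation is usually considered adequate only when both n*p and n*(1 - p) are reasonably large (a common rule of thumb is at least 5), which is why the text requires a large enough set of residues.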
Homologous Complexes in Each Species (Ranked by Joint Z-score)
The server returns all homologous complexes found in each species, ranked by the joint Z-score of each homologous complex.
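The per-species ranking can be pictured as a group-then-sort step. This is a sketch under stated assumptions: the `HomologousComplex` record and `rank_by_species` function are hypothetical, and the sketch assumes a higher joint Z-score means greater similarity, so complexes are listed from highest to lowest score within each species.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HomologousComplex:
    """Hypothetical record for one homologous complex in the results."""
    complex_id: str
    species: str
    joint_z_score: float


def rank_by_species(
    complexes: List[HomologousComplex],
) -> Dict[str, List[HomologousComplex]]:
    """Group complexes by species and sort each group by joint Z-score,
    highest first (assumed: higher score = more similar). Ties keep input order."""
    groups: Dict[str, List[HomologousComplex]] = defaultdict(list)
    for c in complexes:
        groups[c.species].append(c)
    return {
        species: sorted(members, key=lambda c: c.joint_z_score, reverse=True)
        for species, members in groups.items()
    }
```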