Help

FASTA Format

When inputting multiple sequences, each sequence must be in FASTA format, which consists of a single-line description followed by lines of sequence data. The description line begins with a ">" symbol in the first column. For example:

>Q6PEC4,RAT
MPTIKLQSSDGEIFEVDVEIAKQSVTIKTMLEDLGMDDEGDDDPVPLPNVNAAILKKVIQ
WCTHHKDDPPPPEDDENKEKRTDDIPVWDQEFLKVDQGTLFELILAANYLDIKGLLDVTC
KTVANMIKGKTPEEIRKTFNIKNDFTEEEEAQVRKENQWCEEK
>B2GUZ0,RAT
MHRKHLQEIPDQSSNVTTSFTWGWDSSKTSELLSGMGVSALEKEEVDSENIPHGLLSNLG
HPQSPPRKRLKSKGSDKDFVIIRRPKLNRENFPGVSWDSLPDELLLGIFSCLCLPELLRV
SGVCKRWYRLSLDESLWQSLDLAGKNLHPDVTVRLLSRGVVAFRCPRSFMEQPLGESFSS
FRVQHMDLSNSVINVSNLHGILSECSKLQNLSLEGLQLSDPIVTTLAQNENLVRLNLCGC
SGFSESAVATLLSSCSRLDELNLSWCFDFTEKHVQAAVAHLPDTLTQLNLSGYRKNLQKT
DLCTLIKRCPNLVRLDLSDSIMLKNDCFPEFFQLNYLQHLSLSRCYDIIPETLLELGEIP
TLKTLQVFGIVPDGTLQLLREALPRLQINCAYFTSIARPTMDNKKNPEIWGIKCRLTLQK
PSL
>B2RZ99,RAT
MSHKQIYYSDKYDDEEFEYRHVMLPKDIAKLVPKTHLMSESEWRNLGVQQSQGWVHYMIH
EPEPHILLFRRPLPKKPKK
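
The format described above can be read with a few lines of code. The following is a minimal sketch, not the server's own parser; the function name parse_fasta is hypothetical, and it assumes each description line starts with ">" in the first column.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) pairs."""
    records = []
    header = None   # description of the record being read
    chunks = []     # sequence lines collected for that record
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before a '>' description line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>B2RZ99,RAT
MSHKQIYYSDKYDDEEFEYRHVMLPKDIAKLVPKTHLMSESEWRNLGVQQSQGWVHYMIH
EPEPHILLFRRPLPKKPKK
"""
records = parse_fasta(example)
for description, sequence in records:
    print(description, len(sequence))
```

Sequence lines are joined without separators, so a record that spans several lines yields one continuous sequence string.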

When inputting two sequences, both sequences must be in FASTA format. For example:

>P07321,Mus
MGVPERPTLLLLLSLLLIPLGLPVLCAPPRLICDSRVLERYILEAKEAENVTMGCAEGPR
LSENITVPDTKVNFYAWKRMEVEEQAIEVWQGLSLLSEAILQAQALLANSSQPPETLQLH
IDKAISGLRSLTSLLRVLGAQKELMSPPDTTPPAPLRTLTVDTFCKLFRVYANFLRGKLK
LYTGEVCRRGDR
>P14753,Mus
MDKLRVPLWPRVGPLCLLLAGAAWAPSPSLPDPKFESKAALLASRGSEELLCFTQRLEDL
VCFWEEAASSGMDFNYSFSYQLEGESRKSCSLHQAPTVRGSVRFWCSLPTADTSSFVPLE
LQVTEASGSPRYHRIIHINEVVLLDAPAGLLARRAEEGSHVVLRWLPPPGAPMTTHIRYE
VDVSAGNRAGGTQRVEVLEGRTECVLSNLRGGTRYTFAVRARMAEPSFSGFWSAWSEPAS
LLTASDLDPLILTLSLILVLISLLLTVLALLSHRRTLQQKIWPGIPSPESEFEGLFTTHK
GNFQLWLLQRDGCLWWSPGSSFPEDPPAHLEVLSEPRWAVTQAGDPGADDEGPLLEPVGS
EHAQDTYLVLDKWLLPRTPCSENLSGPGGSVDPVTMDEASETSSCPSDLASKPRPEGTSP
SSFEYTILDPSSQLLCPRALPPELPPTPPHLKYLYLVVSDSGISTDYSSGGSQGVHGDSS
DGPYSHPYENSLVPDSEPLHPGYVACS

When inputting a single sequence, the sequence must be in FASTA format. For example:

>B2GUZ0,RAT
MHRKHLQEIPDQSSNVTTSFTWGWDSSKTSELLSGMGVSALEKEEVDSENIPHGLLSNLG
HPQSPPRKRLKSKGSDKDFVIIRRPKLNRENFPGVSWDSLPDELLLGIFSCLCLPELLRV
SGVCKRWYRLSLDESLWQSLDLAGKNLHPDVTVRLLSRGVVAFRCPRSFMEQPLGESFSS
FRVQHMDLSNSVINVSNLHGILSECSKLQNLSLEGLQLSDPIVTTLAQNENLVRLNLCGC
SGFSESAVATLLSSCSRLDELNLSWCFDFTEKHVQAAVAHLPDTLTQLNLSGYRKNLQKT
DLCTLIKRCPNLVRLDLSDSIMLKNDCFPEFFQLNYLQHLSLSRCYDIIPETLLELGEIP
TLKTLQVFGIVPDGTLQLLREALPRLQINCAYFTSIARPTMDNKKNPEIWGIKCRLTLQK
PSL

PDB identifier

A PDB (Protein Data Bank) identifier is the 4-character code assigned by the PDB.
Users must enter a PDB code for an entry that was available in the PDB before Dec 25, 2009; otherwise, the user needs to upload a protein 3D structure file in PDB format.
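
A simple format check on the 4-character code might look like the sketch below. It assumes the classic PDB pattern of one digit followed by three letters or digits; the name is_pdb_id is hypothetical. The check covers only the shape of the code and cannot tell whether the entry was available before Dec 25, 2009.

```python
import re

# Assumption: classic PDB codes are one digit followed by three
# alphanumeric characters, for example "1abc".
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_pdb_id(code):
    """Return True if code looks like a 4-character PDB identifier."""
    return PDB_ID_PATTERN.fullmatch(code.strip()) is not None


print(is_pdb_id("1abc"))
print(is_pdb_id("abcd"))
```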


E-value

The E-value specifies the statistical significance of an alignment and indicates the reliability of the search. This setting is the threshold for reporting protein sequences matched against the sequence database. Following previous works (Matthews et al., 2001; Yu et al., 2004), we use 10^-10 as the default value. Operationally, homologous proteins are defined as those with a BLAST E-value of at most 10^-10. If the E-value of a match is greater than the threshold, the match is not reported. A lower E-value threshold is more stringent and results in fewer reported matches.
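
The effect of the threshold can be illustrated with a short sketch. The hit names and E-values below are made up for illustration, and filter_hits is a hypothetical helper, not part of the server or of BLAST.

```python
DEFAULT_E_VALUE = 1e-10  # default threshold described above


def filter_hits(hits, threshold=DEFAULT_E_VALUE):
    """Keep only hits whose E-value does not exceed the threshold."""
    return [(name, e_value) for name, e_value in hits if e_value <= threshold]


# Hypothetical (name, E-value) pairs, for illustration only.
hits = [("hit_A", 3e-42), ("hit_B", 1e-10), ("hit_C", 2e-5)]

reported = filter_hits(hits)                  # default threshold
stricter = filter_hits(hits, threshold=1e-20)  # more stringent threshold
print([name for name, _ in reported])
print([name for name, _ in stricter])
```

Lowering the threshold from the default to 1e-20 drops hit_B as well, which shows why a more stringent setting reports fewer matches.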

Joint Z-score

The residues of the homologous proteins in each protein complex are modeled with a binomial distribution. For a random set of residues in a homologous protein, the binomial distribution approximates a normal distribution. The interacting residues can then be assessed by the Z-score of the homologous proteins.
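
As a generic illustration of the normal approximation mentioned above, a Z-score for a binomial count can be computed as in the sketch below. This is the textbook formula only; how the server defines the number of trials n, the success probability p, and the observed count k is not stated here, so all three are assumptions of the example.

```python
import math


def binomial_z_score(k, n, p):
    """Z-score of k successes in n trials with success probability p,
    using the normal approximation to the binomial distribution."""
    if n <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need n > 0 and 0 < p < 1")
    mean = n * p                              # expected number of successes
    std = math.sqrt(n * p * (1.0 - p))        # binomial standard deviation
    return (k - mean) / std


# Hypothetical numbers: 60 successes in 100 trials with p = 0.5.
z = binomial_z_score(60, 100, 0.5)
print(z)
```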

The joint Z-score is a quantitative measure of the similarity between two protein complexes. We define the joint complex similarity as

where Z1 denotes the Z-score of protein 1 and its homolog 1', Zn denotes the Z-score of protein n and its homolog n', and so on.


Homologous Complexes in Each Species (Ranked by Joint Z-score)

The server returns all homologous complexes for each species, ranked by the joint Z-score of each homologous complex.
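
The grouping and ranking described above can be sketched as follows. The record layout, species names, complex labels, and scores are hypothetical, and the sketch assumes that a higher joint Z-score ranks first; it is not the server's implementation.

```python
from collections import defaultdict


def rank_by_species(complexes):
    """Group (species, complex_id, joint_z) records by species and sort
    each group by joint Z-score, highest first (an assumption)."""
    grouped = defaultdict(list)
    for species, complex_id, joint_z in complexes:
        grouped[species].append((complex_id, joint_z))
    return {
        species: sorted(items, key=lambda item: item[1], reverse=True)
        for species, items in grouped.items()
    }


# Hypothetical records, for illustration only.
complexes = [
    ("species_1", "complex_a", 4.2),
    ("species_2", "complex_b", 6.1),
    ("species_1", "complex_c", 7.5),
]
ranking = rank_by_species(complexes)
for species, items in ranking.items():
    print(species, items)
```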