Applied Bioinformatics Group



Information

About TargetLoc

TargetLoc is a computational method for predicting protein subcellular location. It was designed to improve the prediction of subcellular locations that can be grouped according to the presence or absence of an N-terminal targeting sequence. The non-plant version distinguishes between mitochondrial, secretory pathway, and other proteins; the plant version additionally considers chloroplast proteins. The TargetLoc system was trained using support vector machines and is based on the recognition of N-terminal targeting sequences, the overall amino acid composition, and sequence motifs extracted from the NLSdb (Nair et al., Nucleic Acids Res., 2003, 31, 397-399) and PROSITE (Falquet et al., Nucleic Acids Res., 2002, 30, 235-238) databases. There are two calculation layers. In the first layer, the three subprediction methods SVMTarget, SVMaac, and MotifSearch deliver a set of numerical or binary attributes that describe properties of a protein relevant for assigning it to its target location. The attributes are combined into a feature vector, which is used as input for the second layer. Finally, a score (probability estimate) is calculated for each location, and the protein is assigned to the location with the highest output score.
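The hand-over between the two layers and the final assignment can be pictured with a short Python sketch. It is only an illustration, not the TargetLoc implementation: the function names, the attribute ordering, and all numeric values are assumptions made for this example, and the second-layer classifier itself is left out.

```python
def build_feature_vector(svmtarget_scores, svmaac_scores, motif_flags):
    """Concatenate first-layer attributes into one flat feature vector.

    The ordering used here is an assumption made for this sketch.
    """
    return list(svmtarget_scores) + list(svmaac_scores) + list(motif_flags)


def assign_location(location_scores):
    """Return the (location, score) pair with the highest output score."""
    if not location_scores:
        raise ValueError("no location scores given")
    best = max(location_scores, key=location_scores.get)
    return best, location_scores[best]


# Hypothetical first-layer outputs for one protein (illustrative values only).
features = build_feature_vector([0.9, 0.1, 0.0], [0.4, -0.2, -0.3], [0, 1])

# Hypothetical second-layer probability estimates for the non-plant version.
scores = {"mitochondrial": 0.71, "secretory pathway": 0.08, "other": 0.21}
print(len(features), assign_location(scores))
```

In this sketch the second layer is represented only by its output dictionary; any classifier that returns one score per location could sit between the two functions.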

About MultiLoc

MultiLoc is an extension of TargetLoc intended to predict all of the main subcellular locations. The predicted locations are the cytoplasm, chloroplasts (plants only), ER, extracellular space, lysosomes (animals only), mitochondria, Golgi apparatus, peroxisomes, plasma membrane, and vacuoles (plants and fungi only). Several additional features have been incorporated into the first layer to allow MultiLoc to discriminate between the larger number of locations. Furthermore, a subprediction method (SVMSA) for detecting signal anchors (SAs) is used.

Input Instructions

prediction method: The selected prediction method (MultiLoc or TargetLoc) with the corresponding version (animal, fungi, plant or non-plant) determines the set of predictable locations (see section above).

sequence field: Paste your amino acid sequence in one-letter code into this field. All characters that are not part of the one-letter code are ignored.
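If you want to see beforehand what such filtering leaves of a pasted sequence, the following Python sketch mimics it. It assumes that the accepted alphabet is the 20 standard amino acid letters and that lowercase input is converted to uppercase; the server's exact rules may differ, and the function name is made up for this example.

```python
# Assumption: only the 20 standard amino acid letters count as one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Drop every character that is not a standard amino acid letter."""
    return "".join(ch for ch in raw.upper() if ch in AMINO_ACIDS)


print(clean_sequence("mk t-1\nAX*"))
```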

sequence id: This field takes an optional id for the sequence pasted into the sequence field.

fasta file field: You can also submit a file containing up to 20 sequences in FASTA format. When a FASTA file is submitted, the contents of the sequence and sequence id fields are ignored.
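A file can be checked against the 20-sequence limit before upload with a small Python sketch like the one below. The parser is a minimal one written for this example (it takes the first word of each header line as the id), and raising an error for oversized input is an assumption; how the server itself reacts to more than 20 sequences is not described here.

```python
MAX_SEQUENCES = 20  # limit stated for the fasta file field


def parse_fasta(text, max_sequences=MAX_SEQUENCES):
    """Parse FASTA text into a list of (id, sequence) tuples."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    if len(records) > max_sequences:
        raise ValueError(
            f"{len(records)} sequences found, at most {max_sequences} allowed"
        )
    return records


print(parse_fasta(">seq1 example protein\nMKTL\nAAGG\n>seq2\nMSKL\n"))
```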

output format: You can choose between the simple output mode or the more detailed advanced output representation.

Output Format

simple output: For each submitted sequence, the id, the top predicted location, and the final output score are presented. The output score is calculated according to the probability estimate procedure contained in the libsvm package.

advanced output: For each submitted sequence, the id and the top three or four predicted locations (depending on the prediction method) with their final output scores are presented. In addition, the results of all subprediction methods used are shown.

SVMTarget delivers an output score for each kind of N-terminal sequence (cTP, mTP, other, and SP). High positive values indicate the presence of the corresponding N-terminal sequence, whereas values close to zero indicate its absence. Note that the cTP score is only calculated in the plant version of SVMTarget.

SVMSA delivers an output score that indicates the presence (positive score) or absence (negative score) of a signal anchor.

SVMaac calculates a score for each predictable location based on the overall amino acid composition. For example, nuclear proteins tend to obtain a positive nuclear output score and negative scores for the other locations. The output of the nuclear/cytoplasmic (nuc/cyt) discrimination classifier helps to improve the discrimination between these two locations. Here, nuclear proteins tend to have positive values and cytoplasmic proteins negative values. The mitochondrial/chloroplast (mit/chl) discrimination classifier in the plant versions of MultiLoc and TargetLoc works similarly: mitochondrial proteins tend to have positive values and chloroplast proteins negative values. Unlike the nuc/cyt classifier, the mit/chl classifier uses the partial amino acid composition of the first 15 N-terminal residues instead of the overall amino acid composition.
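The two kinds of composition mentioned above (overall, and restricted to the first 15 N-terminal residues) can be computed as in the following Python sketch. It assumes that composition means the relative frequency of each of the 20 standard amino acids; the exact encoding used by SVMaac is not specified here, and the function names are made up for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Relative frequency of each standard amino acid, in AMINO_ACIDS order."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in sequence.upper():
        if ch in counts:  # non-standard characters are skipped
            counts[ch] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def n_terminal_composition(sequence, length=15):
    """Composition of the first `length` residues only."""
    return amino_acid_composition(sequence[:length])


overall = amino_acid_composition("AAKK")
print(overall[AMINO_ACIDS.index("A")], overall[AMINO_ACIDS.index("K")])
```

Each function returns a 20-element vector, which is the kind of numerical attribute block a classifier such as an SVM could take as input.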

MotifSearch searches for sequence motifs that are relevant for the protein sorting process or suitable for assigning proteins to the different locations. All attributes obtained from MotifSearch are binary coded. The NLS-monopartite attribute is set to 1 if the consensus pattern K(K|R)X(K|R) is found in the query sequence. Note that this pattern is not very specific, since it is also found in many non-nuclear proteins. The NLS attribute is set to 1 if a pattern from the NLSdb database is found, and to 0 otherwise. The following attributes are obtained from PROSITE:

  • KDEL-retention motif (endoplasmic reticulum targeting sequence)
  • SKL motif (microbodies C-terminal targeting signal)
  • bipartite-NLS-pattern (bipartite nuclear targeting sequence)
  • DNA associated domains: This attribute is set to 1 if one of 25 selected PROSITE entries occurs in the query sequence. These PROSITE entries represent typical DNA-binding domains like zinc finger, basic-leucine zipper (bZIP), or helix-loop-helix (HLH).
  • plasma membrane receptor domain: The attribute is set to 1, if one of 16 selected PROSITE entries occurs in the query sequence. The PROSITE entries represent different kinds of receptors, which are typical for the plasma membrane.
Note that it is useful to have the two nuclear attributes monopartite-NLS and bipartite-NLS in addition to the NLSdb attribute, since NLSdb recognizes only 43% of the nuclear proteins.
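The binary coding of motif attributes can be illustrated with a short Python sketch. The monopartite pattern K(K|R)X(K|R) is translated directly into a regular expression, with X standing for any residue. The KDEL and SKL checks below are deliberate simplifications (the literal tetrapeptide or tripeptide at the C-terminus) and are not the PROSITE patterns used by MotifSearch; the attribute names are likewise chosen for this example only.

```python
import re

# K(K|R)X(K|R): lysine, then K or R, then any residue, then K or R.
MONOPARTITE_NLS = re.compile(r"K[KR].[KR]")


def motif_attributes(sequence):
    """Return a dict of binary (0/1) motif attributes for one sequence."""
    seq = sequence.upper()
    return {
        "NLS-monopartite": int(MONOPARTITE_NLS.search(seq) is not None),
        # Simplified stand-ins for the PROSITE C-terminal signals:
        "KDEL (simplified)": int(seq.endswith("KDEL")),
        "SKL (simplified)": int(seq.endswith("SKL")),
    }


print(motif_attributes("MAKKAKSKL"))
```

Because the monopartite pattern is only four residues long, it matches many sequences by chance, which is consistent with the low specificity noted above.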