HOMOLOGY API

This homology engine is geared towards proteomics, as opposed to genomics-oriented tools (e.g. blastp) that use genetics-based substitution matrices and scoring statistics, which typically hide highly redundant sequences. In addition, this engine is significantly faster than current methods, with millisecond response times. The API for JVLN Homology allows for the retrieval of Single Amino Acid Polymorphisms (SAPs) for any given amino-acid sequence, with special consideration for proteotypic sequences.

Why have a peptide homology engine?

There is a current need to effectively determine the homology of both short and long sequences, with specific regard to minimal substitution.

Besides being faster (millisecond response times) and deeper (covering all of the UniProt, UniRef and TrEMBL knowledge bases) than current methods, this engine is designed to be instrumental in the current search debate [1-3] over whether to search the all-organism database and filter to your organism, or to search only your organism and deal with possibly inaccurate FDRs. A case in point is a statistically strong sequence match that does not belong to your target organism and gets tossed at filtering. A peptide-specific homology engine would fill that gap by finding potentially unknown yet biologically relevant Single Amino Acid Polymorphisms (SAPs) and folding them back into your results.

Additionally, this tool is meant to complement the needs of de novo sequencing methods that ultimately require biological context for any aggregate downstream analysis.

References

[1] Noble, W. S. "Mass spectrometrists should search only for peptides they care about." Nat. Methods 12, 605–608 (2015).
[2] Sticker, A., Martens, L. & Clement, L. "Mass spectrometrists should search for all peptides, but assess only the ones they care about." Nat. Methods 14, 643 (2017).
[3] Noble, W. S. & Keich, U. Response to "Mass spectrometrists should search for all peptides, but assess only the ones they care about". Nat. Methods 14, 644 (2017).

USAGE

URL

./Homology/?key=<str_access_key>&seq=<str_sequence>&sap=<int_substitutions>

Methods

GET or POST
Both methods are compatible with either http or https. Using https is recommended, as it encrypts your data and access key.
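
As a rough illustration of the URL template above, the following Python sketch assembles a GET URL from the three documented parameters. The host pub.jvln.io is taken from the example calls in this document; the helper name build_homology_url and the choice of https as the default scheme are assumptions made for illustration and are not part of the API.

```python
from urllib.parse import urlencode

# Assumed base URL: host taken from the documented examples, https as recommended.
BASE_URL = "https://pub.jvln.io/Homology/"


def build_homology_url(key, seq, sap=0, base_url=BASE_URL):
    """Return a GET URL for the Homology endpoint (hypothetical helper)."""
    query = urlencode({"key": key, "seq": seq, "sap": sap})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Placeholder key from the examples; substitute your own access key.
    print(build_homology_url("a1b-2c3-e4f-5", "SAMPLER", sap=1))
```

The resulting string can be handed to any HTTP client, for instance urllib.request.urlopen from the standard library.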

Request Parameters :: Summary
parameter  type     usage     default  accepted            note
seq        STRING   required  *        *                   no spaces
sap        STRING             0        0,1,2,3             max 3 substitutions
flat       BOOLEAN            FALSE    presence / absence
Request Parameters :: Detail
Responses:
code  type               content
200   success            see Example 01
400   bad request        { error : "precursor mass (pm) parameter undefined" }
401   unauthorized       { error : "key not found" }
429   too many requests  { error : "too many requests" }
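
One way a client might act on the response codes listed above is sketched below in Python. It assumes that error bodies arrive as valid JSON objects with an "error" field (the table shows them in shorthand), and the names HomologyError and interpret_response are hypothetical, introduced only for this illustration.

```python
import json

# Status codes listed in the Responses table.
STATUS_MEANING = {
    200: "success",
    400: "bad request",
    401: "unauthorized",
    429: "too many requests",
}


class HomologyError(Exception):
    """Hypothetical exception raised for any non-200 response."""


def interpret_response(status, body):
    """Return the parsed JSON for a 200 response, otherwise raise HomologyError."""
    if status == 200:
        return json.loads(body)
    meaning = STATUS_MEANING.get(status, "unexpected status")
    try:
        # Assumption: the error body is a JSON object with an "error" field.
        detail = json.loads(body).get("error", "")
    except (ValueError, TypeError, AttributeError):
        detail = ""  # body was missing or not a JSON object
    message = f"{status} {meaning}"
    if detail:
        message += f": {detail}"
    raise HomologyError(message)
```

A caller that receives a 429 would typically wait before retrying; how long to wait is not specified here.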

Note: values of sap outside the accepted inputs (0, 1, 2, 3) will fail silently.
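
Because out-of-range sap values fail silently, it may be worth checking parameters on the client before sending a request. The Python sketch below does this under the constraints stated in the parameter summary (seq required with no spaces, sap one of 0, 1, 2, 3); the function name validate_params is hypothetical and the checks are an illustration, not the server's own validation logic.

```python
# Accepted sap values from the parameter summary, compared as strings
# so that both 1 and "1" are accepted.
ACCEPTED_SAP = {"0", "1", "2", "3"}


def validate_params(seq, sap=0):
    """Return (seq, sap) with sap as an int, or raise ValueError (hypothetical helper)."""
    if not seq:
        raise ValueError("seq is required")
    if any(ch.isspace() for ch in seq):
        raise ValueError("seq must not contain spaces")
    if str(sap) not in ACCEPTED_SAP:
        raise ValueError("sap must be one of 0, 1, 2, 3")
    return seq, int(sap)
```

Raising early on the client turns a silent failure into an explicit one, which tends to be easier to debug in batch submissions.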


EXAMPLES

Example 01

A simple example with a single result.

API Call
  http://pub.jvln.io/Homology/?key=a1b-2c3-e4f-5&seq=SAMPLER&sap=1
JSON Return Object
  {
    "sequence": "SAMPLER",
    "sequence_diff": "1",
    "result_limit": 10,
    "result_n": 1,
    "result_time": "054ms",
    "result_data": [
        {
            "peptide": "SAMPLR",
            "length": 6,
            "score": 0.018,
            "edits": 1,
            "protein": [
                {
                    "protein_name": "S39A3_HUMAN",
                    "protein_desc": "Zinc transporter ZIP3 OS",
                    "organism": "Homo sapiens"
                }
            ]
        }
    ]
  }

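
To show how the nested return object might be consumed, the Python sketch below parses the Example 01 response and walks each peptide and its associated protein hits. The function name iter_hits is hypothetical; the field names are those shown in the return object above.

```python
import json

# The Example 01 return object, reproduced as a string for illustration.
EXAMPLE_01 = """
{ "sequence": "SAMPLER", "sequence_diff": "1", "result_limit": 10,
  "result_n": 1, "result_time": "054ms",
  "result_data": [ {
      "peptide": "SAMPLR", "length": 6, "score": 0.018, "edits": 1,
      "protein": [ {
          "protein_name": "S39A3_HUMAN",
          "protein_desc": "Zinc transporter ZIP3 OS",
          "organism": "Homo sapiens" } ]
  } ]
}
"""


def iter_hits(response):
    """Yield (peptide, protein_name, organism) for every nested protein hit."""
    for hit in response.get("result_data", []):
        for protein in hit.get("protein", []):
            yield hit["peptide"], protein["protein_name"], protein["organism"]


if __name__ == "__main__":
    for peptide, name, organism in iter_hits(json.loads(EXAMPLE_01)):
        print(peptide, name, organism)
```
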
Example 02

Using the flag flat to munge the JSON into a table-friendly format.

API Call
  http://pub.jvln.io/Homology/?key=a1b-2c3-e4f-5&seq=SAMPLER&sap=1&flat
JSON Return Object
  {
    "sequence": "SAMPLER",
    "sequence_diff": "1",
    "result_limit": 10,
    "result_n": 1,
    "result_time": "041ms",
    "result_data": [
        {
            "peptide": "SAMPLR",
            "length": 6,
            "score": 0.018,
            "edits": 1,
            "protein_name": "S39A3_HUMAN",
            "protein_desc": "Zinc transporter ZIP3 OS",
            "organism": "Homo sapiens"
        }
    ]
  }

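
Since the flat format is meant to be table-friendly, a response like the one above can be written out as CSV with little effort. The following Python sketch uses only the standard library; the helper name flat_to_csv and the column order are choices made for this illustration.

```python
import csv
import io

# Column order follows the fields shown in the flat return object.
FLAT_COLUMNS = [
    "peptide", "length", "score", "edits",
    "protein_name", "protein_desc", "organism",
]


def flat_to_csv(response):
    """Render the result_data rows of a flat response as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FLAT_COLUMNS,
        extrasaction="ignore",  # skip any fields not listed above
        lineterminator="\n",
    )
    writer.writeheader()
    for row in response.get("result_data", []):
        writer.writerow(row)
    return buffer.getvalue()
```
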
Example 03

Using the limit flag lim to restrict the total number of unique sequences returned, ranked by score.

API Call
  http://pub.jvln.io/Homology/?key=a1b-2c3-e4f-5&seq=HMALAK&sap=1&lim=2

Note that the limit flag restricts the number of unique sequences returned, while still nesting all the associated protein hits under each sequence. Also note that the return field result_n indicates how many unique peptides are available given the input parameters.

JSON Return Object
  {
    "sequence": "HMALAK",
    "sequence_diff": "1",
    "result_limit": "2",
    "result_n": 19,
    "result_time": "069ms",
    "result_data": [
        {
            "peptide": "HMALAK",
            "length": "6",
            "score": "0.032",
            "edits": "0",
            "protein": [
                {
                    "protein_name": "GYS1_BOVIN",
                    "protein_desc": "Glycogen starch synthase, muscle OS",
                    "organism": "Bos taurus"
                },
                {
                    "protein_name": "GYS1_MOUSE",
                    "protein_desc": "Glycogen starch synthase, muscle OS",
                    "organism": "Mus musculus"
                },
                {
                    "protein_name": "GYS1_RABIT",
                    "protein_desc": "Glycogen starch synthase, muscle OS",
                    "organism": "Oryctolagus cuniculus"
                },
                {
                    "protein_name": "GYS1_RAT",
                    "protein_desc": "Glycogen starch synthase, muscle OS",
                    "organism": "Rattus norvegicus"
                }
            ]
        },
        {
            "peptide": "HDALAK",
            "length": "6",
            "score": "0.008",
            "edits": "1",
            "protein": [
                {
                    "protein_name": "AROE_PARL1",
                    "protein_desc": "Shikimate dehydrogenase NADP+ OS",
                    "organism": "Parvibaculum lavamentivorans strain DS-1 - DSM 13023 - NCIMB 13966"
                }
            ]
        }
    ]
  }
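
In the return object above, the numeric fields (length, score, edits) appear as strings, whereas in Example 01 they appear as numbers. A client may therefore want to normalise types before comparing scores, and can compare result_n against the number of returned peptides to see whether the limit cut the list short. The Python sketch below illustrates both; the helper names normalise_hit and is_truncated are hypothetical.

```python
def normalise_hit(hit):
    """Return a copy of a result_data entry with numeric fields converted."""
    clean = dict(hit)
    clean["length"] = int(hit["length"])
    clean["edits"] = int(hit["edits"])
    clean["score"] = float(hit["score"])
    return clean


def is_truncated(response):
    """True when more unique peptides are available than were returned."""
    return int(response["result_n"]) > len(response["result_data"])


if __name__ == "__main__":
    # Abbreviated version of the Example 03 return object (protein hits omitted).
    response = {
        "result_limit": "2",
        "result_n": 19,
        "result_data": [
            {"peptide": "HMALAK", "length": "6", "score": "0.032", "edits": "0"},
            {"peptide": "HDALAK", "length": "6", "score": "0.008", "edits": "1"},
        ],
    }
    hits = [normalise_hit(h) for h in response["result_data"]]
    print(max(hits, key=lambda h: h["score"])["peptide"], is_truncated(response))
```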