Retrieve listed entries from public databases

Description

This tool retrieves sequence files or other database entries from public databases. The available databases are:

  • UniProt
  • EMBL
  • Ensembl Gene
  • Ensembl Transcript
  • Ensembl Genomes Gene
  • Ensembl Genomes Transcript
  • InterPro
  • MEDLINE
  • PDB
  • RefSeq nucleotide
  • RefSeq protein
  • Taxonomy
  • Trace Archive

Parameters

  • Database: the database to be used.
  • Input file

    The input file should be a list of database entry names or accession numbers to be retrieved from the selected database. Each entry name should be on its own row in the file. For example, a file containing the following accession numbers would retrieve five protein sequences from the UniProt database:

    A2A1A1
    A5ABL2
    G7J032
    O04298
    O24248
    

    Note that this tool retrieves data based only on entry names or accession numbers. However, you can use the wildcard character (*) to retrieve several entries with similar names. For example, for the UniProt database, the name ABC* would retrieve all UniProt sequences whose names start with ABC.
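
    As a rough local illustration of how a pattern such as ABC* selects names, the sketch below uses Python's standard fnmatch module. It is not the tool's own implementation: the matching in the tool is done by the database service, and details such as case sensitivity are assumptions here. The function matching_names and the example names are hypothetical placeholders, not real UniProt entries.

```python
from fnmatch import fnmatchcase


def matching_names(pattern, names):
    """Return the names matching a pattern where * stands for any run of characters.

    Assumption: matching is case-sensitive. fnmatchcase also gives special
    meaning to ? and [...], which the tool may or may not support.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


# Made-up placeholder names for illustration; not looked up in UniProt.
example_names = ["ABC1_EXMPL", "ABCD_EXMPL", "XABC_EXMPL", "AB_EXMPL"]

selected = matching_names("ABC*", example_names)
print(selected)
```

    Only the names that begin with ABC are kept; a name that merely contains ABC elsewhere, or is shorter than the prefix, is not selected.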

    Only the first word of each row is used as the sequence identifier; the rest of the line is ignored. The rows may therefore contain comments and other information after the sequence ID. A table whose first column contains the sequence name or ID can also be used as a name list.

    For example, the following table could be used to retrieve the full database entries of the five listed proteins from the UniProt database:

    NCS2_COPJA  Coptis japonica (Japanese goldthread)
    INUA_ASPNC  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
    PHBP_MEDTR  Medicago truncatula (Barrel medic) (Medicago tribuloides)
    DAU1_DAUCA  Daucus carota (Wild carrot)
    PRU1_PRUAV  Prunus avium (Cherry) (Cerasus avium)
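
    The first-word rule can be sketched in a few lines of Python, for example to check a name list before submitting it. This is a minimal sketch, not the tool's code: the function read_entry_names is hypothetical, and it assumes that words are separated by spaces or tabs and that empty rows are skipped.

```python
def read_entry_names(lines):
    """Collect the first whitespace-separated word of each non-empty row."""
    names = []
    for line in lines:
        words = line.split()  # splits on any run of spaces or tabs
        if words:  # assumption: blank rows are ignored
            names.append(words[0])
    return names


rows = [
    "A2A1A1",
    "O04298 a comment after the identifier",
    "",
    "NCS2_COPJA\tCoptis japonica (Japanese goldthread)",
]

names = read_entry_names(rows)
print(names)
```

    Each row contributes only its identifier, so plain lists, commented lists and tables all reduce to the same kind of name list.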

Output

The output is a text file whose format depends on the selected database. For sequence databases, the full database entry, including all annotation, is retrieved. If needed, this data can be converted to FASTA format with the Sequence format conversion tool.