This tool retrieves sequence files or other database entries from public databases. The available databases are:
The input file should be a list of database entries or names to be retrieved from the selected database. Each entry name should be on a separate row in the entry name file. For example, a file containing the following accession numbers would retrieve five protein sequences from the UniProt database:
A2A1A1
A5ABL2
G7J032
O04298
O24248
Note that this tool retrieves data based only on entry names or accession numbers. However, you can use the wildcard character (*) to retrieve several entries with similar names. For example, for the UniProt database the name ABC* would retrieve all UniProt sequences whose names start with ABC.
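The wildcard behaviour described above can be illustrated with a small Python sketch. It only shows which names a pattern such as ABC* selects. It is not how the tool itself performs the search, and the entry names in it are made up for illustration. The standard-library function `fnmatchcase` also treats `?` and `[...]` as special characters, which this tool is not documented to support.

```python
from fnmatch import fnmatchcase


def matches_pattern(entry_name: str, pattern: str) -> bool:
    """Return True if entry_name matches a shell-style wildcard pattern.

    The comparison is case-sensitive; '*' matches any run of characters.
    """
    return fnmatchcase(entry_name, pattern)


# Hypothetical entry names, used only to illustrate the matching rule.
names = ["ABC1_EXAMPLE", "ABCD_EXAMPLE", "XABC_EXAMPLE", "AB_EXAMPLE"]

# Keep only the names that start with "ABC".
selected = [name for name in names if matches_pattern(name, "ABC*")]
print(selected)
```

Only the first two names are selected, because the pattern requires the name to begin with ABC.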
Only the first word of each row is used as the sequence identifier; the rest of the line is ignored. The name lines may therefore contain comments and other information after the sequence ID. A table whose first column contains the sequence name or ID can also be used as a name list.
For example, the following table could be used to retrieve the full database entries of the five listed proteins from the UniProt database.
NCS2_COPJA | Coptis japonica (Japanese goldthread) |
INUA_ASPNC | Aspergillus niger (strain CBS 513.88 / FGSC A1513) |
PHBP_MEDTR | Medicago truncatula (Barrel medic) (Medicago tribuloides) |
DAU1_DAUCA | Daucus carota (Wild carrot) |
PRU1_PRUAV | Prunus avium (Cherry) (Cerasus avium) |
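The first-word rule can be checked on a name list before submitting it. The following Python sketch shows which identifiers would be picked up from the table above. It assumes that a "word" means a whitespace-delimited token and that empty rows are skipped; neither detail is confirmed by this manual, and the function name `extract_ids` is hypothetical.

```python
def extract_ids(text: str) -> list[str]:
    """Return the first whitespace-delimited word of every non-empty row."""
    ids = []
    for line in text.splitlines():
        words = line.split()
        if words:  # skip empty rows (an assumption, see the text above)
            ids.append(words[0])
    return ids


# The example table from this manual page.
table = """\
NCS2_COPJA | Coptis japonica (Japanese goldthread) |
INUA_ASPNC | Aspergillus niger (strain CBS 513.88 / FGSC A1513) |
PHBP_MEDTR | Medicago truncatula (Barrel medic) (Medicago tribuloides) |
DAU1_DAUCA | Daucus carota (Wild carrot) |
PRU1_PRUAV | Prunus avium (Cherry) (Cerasus avium) |
"""

for entry_id in extract_ids(table):
    print(entry_id)
```

Under these assumptions the sketch prints the five entry names from the first column, one per row, and ignores the species descriptions.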
Output is a text file whose format depends on the selected database. For sequence databases, the full database entry, including all annotation, is retrieved. If needed, this data can be converted to FASTA format with the Sequence format conversion tool.