Retrieve sequences from sequence file

Description

This tool retrieves a subset of sequences from a given sequence set based on a list of sequence IDs or names.

Input files

This tool takes two files as input: one file containing the sequence data, and another file containing a list of the sequence names or ID numbers to be retrieved from the input sequence set.

For example, if you have the following sequence set in Chipster:

>CAA54483.1 CAA54483.1 Bet v 1 e [Betula pendula]
MGVFNYETEATSVIPAARLFKAFILDGDNLFPKVAPQAISSVENIEGNGGPGTIKKISFP
EGIPFKYVKGRVDEVDHTNFKYSYSVIEGGPVGDTLEKISNEIKIVATPNGGSILKINNK
YHTKGDHEVKAEQIKASKEMGETLLRAVESYLLAHSDAYN
>CAA54482.1 CAA54482.1 Bet v 1 d [Betula pendula]
MGVFNYEIETTSVIPAARLFKAFILDGDNLVPKVAPQAISSVENIEGNGGPGTIKKINFP
EGFPFKYVKDRVDEVDHTNFKYNYSVIEGGPVGDTLEKISNEIKIVATPDGGCVLKISNK
YHTKGNHEVKAEQVKASKEMGETLLRAVESYLLAHSDAYN
>CAA54481.1 CAA54481.1 Bet v 1 c [Betula pendula]
MGVFNYESETTSVIPAARLFKAFILEGDTLIPKVAPQAISSVENIEGNGGPGTIKKITFP
EGSPFKYVKERVDEVDHANFKYSYSMIEGGALGDTLEKICNEIKIVATPDGGSILKISNK
YHTKGDQEMKAEHMKAIKEKGEALLRAVESYLLAHSDAYN
>CAA54485.1 CAA54485.1 Bet v 1 g [Betula pendula]
MGVFNYESETTSVIPAARLFKAFILEGDNLIPKVAPQAISSVENIEGNGGPGTIKKINFP
EGFPFKYVKDRVDEVDHTNFKYNYSVIEGGPVGDTLEKISNEIKIVATPDGGCVLKISNK
YHTKGNHEVKAEQVKASKEMGETLLRAVESYLLAHSDAYN
>CAA54421.1 CAA54421.1 Bet v 1b [Betula pendula]
MGVFNYETETTSVIPAARLFKAFILEGDTLIPKVAPQAISSVENIEGNGGPGTIKKITFP
EGSPFKYVKERVDEVDHANFKYSYSMIEGGALGDTLEKICNEIKIVATPDGGSILKISNK
YHTKGDHEMKAEHMKAIKEKGEALLRAVESYLLAHSDAYN

Then you could pick the last three sequences from this set with a file containing the following name list:

CAA54481.1
CAA54485.1 v1g
CAA54421.1 v1b

Note that only the first word of each row in the name list is used as the name of a sequence to retrieve. The rest of the line can therefore contain comments or other data.

A table whose first column contains the sequence name or ID can also be used as a name list.
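
The way a name list is interpreted can be sketched in a few lines of Python. This is only an illustration of the "first word of each row" rule, not the tool's actual implementation. It assumes FASTA-formatted input and whitespace-separated rows. The function names read_name_list and pick_sequences are hypothetical, and the sequences are shortened for brevity.

```python
def read_name_list(lines):
    """Collect the first whitespace-separated word of each non-empty row.

    Assumption: anything after the first word (comments, extra table
    columns) is ignored.
    """
    names = set()
    for line in lines:
        words = line.split()
        if words:
            names.add(words[0])
    return names


def pick_sequences(fasta_lines, names):
    """Return the lines of those FASTA records whose ID is in names.

    Assumption: the ID is the first word after '>' on the header line.
    """
    picked = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header_words = line[1:].split()
            keep = bool(header_words) and header_words[0] in names
        if keep:
            picked.append(line)
    return picked


# Name list from the example above; "v1g" and "v1b" are ignored.
name_list = ["CAA54481.1", "CAA54485.1 v1g", "CAA54421.1 v1b"]

# Two records from the example set, truncated to keep the sketch short.
fasta = [
    ">CAA54483.1 CAA54483.1 Bet v 1 e [Betula pendula]",
    "MGVFNYETEA",
    ">CAA54481.1 CAA54481.1 Bet v 1 c [Betula pendula]",
    "MGVFNYESET",
]

names = read_name_list(name_list)
result = pick_sequences(fasta, names)
```

Because rows are split on any whitespace, a tab-separated table with the ID in its first column is handled the same way as a plain list of names.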

Output

The format of the resulting output sequence file is the same as the format of the input sequence file.