Metagenomics / Precluster aligned sequences with Mothur

Description

Given a FASTA-formatted alignment and a names file, preclusters the sequences to remove those that are likely to contain sequencing errors.

Parameters

None

Details

The basic idea is that abundant sequences are more likely than rare ones to give rise to erroneous sequences. With that in mind, the algorithm first ranks the sequences by abundance, from most to least abundant. It then walks through the ranked list and, for each sequence, looks for rarer sequences that are within one mismatch of it. Those within this threshold are merged into the more abundant sequence. Pre-clustering thereby removes a large number of unique sequences, which makes the subsequent distance calculation much faster.
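The ranking-and-merging idea described above can be illustrated with a minimal Python sketch. This is not Mothur's implementation: the function names (count_mismatches, precluster), the max_diffs argument and the input dictionary are hypothetical and chosen only for illustration. The sketch assumes all sequences are aligned to the same length and counts every differing column, including gap characters, as a mismatch, which may differ from how Mothur treats gaps.

```python
def count_mismatches(a, b):
    """Count aligned positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def precluster(abundance, max_diffs=1):
    """Merge rare sequences into more abundant ones within max_diffs mismatches.

    abundance: dict mapping an aligned sequence string to its read count
               (hypothetical input format, used here for simplicity).
    Returns a dict mapping each surviving sequence to its merged count.
    """
    # Rank from most to least abundant; ties are broken alphabetically
    # so that the result is deterministic.
    ranked = sorted(abundance, key=lambda s: (-abundance[s], s))

    merged = {}
    absorbed = set()  # sequences already merged into a more abundant one

    for i, seq in enumerate(ranked):
        if seq in absorbed:
            continue
        total = abundance[seq]
        # Only sequences further down the ranking (rarer or tied) are candidates.
        for other in ranked[i + 1:]:
            if other in absorbed or len(other) != len(seq):
                continue
            if count_mismatches(seq, other) <= max_diffs:
                total += abundance[other]
                absorbed.add(other)
        merged[seq] = total
    return merged


if __name__ == "__main__":
    counts = {
        "ACGTACGT": 10,
        "ACGTACGA": 2,  # one mismatch from ACGTACGT, so it is merged
        "ACGTTCGA": 1,  # two mismatches from ACGTACGT, so it is kept
        "TTGTACGT": 3,  # two mismatches from ACGTACGT, so it is kept
    }
    print(precluster(counts))
```

In this toy input, the sequence with two reads is absorbed by the most abundant sequence, so four unique sequences are reduced to three before any pairwise distances would be computed.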

This tool is based on the pre.cluster command of the Mothur package.

Output

The analysis output consists of the following:

References

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.