VirusDetect with own genome
Description
This tool runs the VirusDetect pipeline, which performs virus identification using sRNA sequencing data.
Given a FASTQ file, it performs de novo assembly and reference-guided assembly by
aligning sRNA reads to the reference database of known viruses. The assembled contigs are
compared to the reference virus sequences for virus identification.
A more detailed description of the VirusDetect pipeline can be found on the home page of VirusDetect.
If possible, the input data should be cleaned of sequences originating from the genome of the host organism.
This can be done by mapping the query sequences against the genome of the host organism and selecting
only those reads that do not match the host genome.
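The selection step described above can be sketched in a few lines of Python. This is only an illustration, not the code that Chipster or VirusDetect runs: it assumes that an aligner such as BWA has already been run and that the names of the reads that mapped to the host genome have been collected into a set. The function names parse_fastq, read_id and remove_host_reads are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield FASTQ records as 4-tuples: header, sequence, separator, quality."""
    cleaned = [line.rstrip("\n") for line in lines]
    # Ignore empty lines at the end of the input.
    while cleaned and cleaned[-1] == "":
        cleaned.pop()
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(cleaned), 4):
        yield tuple(cleaned[i:i + 4])


def read_id(header: str) -> str:
    """Return the read name: the header without '@' and without any description."""
    return header[1:].split()[0]


def remove_host_reads(records: Iterable[Tuple[str, ...]],
                      host_mapped_ids: Set[str]) -> List[Tuple[str, ...]]:
    """Keep only the reads whose name is NOT among the host-mapped reads."""
    return [rec for rec in records if read_id(rec[0]) not in host_mapped_ids]


# Toy input: two reads, of which read2 is assumed to have mapped to the host.
example = """@read1
ACGT
+
IIII
@read2 extra description
TTGA
+
IIII
""".splitlines()

host_ids = {"read2"}  # assumption: collected from the aligner output
kept = remove_host_reads(parse_fastq(example), host_ids)
print([read_id(rec[0]) for rec in kept])  # prints ['read1']
```

In a real analysis the set of host-mapped read names would come from the alignment results, and the remaining reads would be written back to a FASTQ file.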
If the host genome is not available at all, or if it is already available in Chipster, you should use the
VirusDetect tool instead of this tool.
If the genome of the host organism is not available in Chipster, but you have the host genome
as a fasta formatted sequence file, you can use this tool to automatically calculate the required
BWA indexes and the host genome filtering for your input data. You should note that it may take several
hours to calculate BWA indexes for a large genome.
When the VirusDetect analysis is finished, the BWA indexes of the host genome are returned as
one tar-formatted archive file. This archive file can be used as an input file, instead of the FASTA-formatted genome, for subsequent
VirusDetect and BWA jobs so that you don't have to repeat the time-consuming indexing process.
Input Files
This tool requires two input files:
- Input data (the reads) should be given as a FASTQ-formatted sequence file.
- Host genome should be given as a FASTA-formatted sequence file or as a tar-formatted BWA index file from a previous VirusDetect job.
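Because the host genome can be given in two different forms, a script that prepares inputs may want to check which form a file has. The following is a minimal sketch of such a check, assuming that a tar archive can be recognized with Python's tarfile module and that a FASTA file starts with a '>' header line. The function name host_genome_input_type and the returned labels are hypothetical and are not part of Chipster.

```python
import tarfile


def host_genome_input_type(path: str) -> str:
    """Classify a host genome input as 'bwa_index_tar', 'fasta' or 'unknown'."""
    # tarfile.is_tarfile returns True only if the file can be read as a tar archive.
    if tarfile.is_tarfile(path):
        return "bwa_index_tar"
    # Otherwise look at the first non-empty line: FASTA headers start with '>'.
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if line.strip():
                return "fasta" if line.startswith(">") else "unknown"
    return "unknown"


# Usage (hypothetical file name):
# print(host_genome_input_type("host_genome.fa"))
```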
Parameters
- Reference virus database: VirusDetect is mainly used for detecting plant viruses,
but you can use it for other viruses too. Use this parameter to select a virus reference database
matching your virus type.
- Reference virus coverage cutoff: Coverage cutoff of a reported virus contig by reference virus sequences.
- Assembled virus contig cutoff: Coverage cutoff of a reported virus reference sequence by assembled virus contigs.
- Depth cutoff: Depth cutoff of a reported virus reference.
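To make the effect of the cutoffs more concrete, the sketch below shows one possible reading of the last two parameters. It assumes that the coverage of a reference is the fraction of its positions covered by at least one aligned contig, and that the depth is a single number already computed for the reference. How VirusDetect computes these values internally is not described here, so the functions covered_fraction and passes_cutoffs and the numbers used are illustrative assumptions only.

```python
from typing import List, Tuple


def covered_fraction(ref_length: int, intervals: List[Tuple[int, int]]) -> float:
    """Fraction of positions 1..ref_length covered by at least one closed interval."""
    covered = 0
    current_end = 0  # last position already counted as covered
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip positions counted earlier
        end = min(end, ref_length)           # clip to the reference length
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / ref_length


def passes_cutoffs(ref_length: int,
                   contig_intervals: List[Tuple[int, int]],
                   depth: float,
                   coverage_cutoff: float,
                   depth_cutoff: float) -> bool:
    """Report a reference only if both coverage and depth reach their cutoffs."""
    coverage = covered_fraction(ref_length, contig_intervals)
    return coverage >= coverage_cutoff and depth >= depth_cutoff


# Illustrative values only: two overlapping contigs on a 100 bp reference.
print(covered_fraction(100, [(1, 30), (21, 50)]))                # prints 0.5
print(passes_cutoffs(100, [(1, 30), (21, 50)], 12.0, 0.1, 5.0))  # prints True
```

With stricter cutoffs fewer references are reported, and with looser cutoffs more candidate viruses appear in the reports.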
Output
VirusDetect produces a large number of different report files. The output-related options are used to
select which data is returned. By default VirusDetect returns the following files:
- virusdetect_contigs.fa: Sequences of non-redundant contigs derived through reference-guided and de novo assemblies.
- contig_sequences.undetermined.fa: Sequences of contigs that do not match virus references.
- blastn_matching_refrences.html: File listing reference viruses that have corresponding virus contigs identified by BLASTN. A PDF-formatted report file is returned for each match.
- blastn_matches.tsv: A table of BLASTN matches to the reference virus database.
- blastx_matching_refrences.html: File listing reference viruses that have corresponding virus contigs identified by BLASTX. A PDF-formatted report file is returned for each match.
- blastx_matches.tsv: A table of BLASTX matches to the reference virus database.
- hostgenome_bwa_index.tar: BWA indexes for the given host genome.
If the parameter Return matching reference sequences is turned on, the following files are also returned:
- blastn_matching_references.fa and .fai: Virus reference sequences that produced hits in the BLASTN search with the potential virus contigs.
- blastx_matching_references.fa and .fai: Virus reference sequences that produced hits in the BLASTX search with the potential virus contigs.
If BAM-formatted alignments are selected for output, the following files are also returned:
- blastn_matches.bam and .bai: BAM file containing the BLASTN alignment of each contig to its corresponding virus reference sequences.
- blastx_matches.bam and .bai: BAM file containing the BLASTX alignment of each contig to its corresponding virus reference sequences.
Note: If you select both the blastn_matching_references.fa + .fai and blastn_matches.bam + .bai
(or the corresponding blastx files), you can use the Genome Browser to visualize the BLAST results.
In the Genome Browser the blastn_matching_references.fa should be assigned to be used as the genome.
Each reference virus sequence is then listed in the Chromosome pull down menu.
If the parameter Return results in one archive file is selected, all the output files are stored in
a single tar-formatted output file. This feature is useful if you run VirusDetect on several input files at
the same time. The tar-formatted output file can be expanded with the tool Extract .tar.gz file.
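If the archive file is downloaded from Chipster, it can also be unpacked outside Chipster. The following is a minimal sketch using Python's standard tarfile module. The function name extract_results and the file names in the usage comment are hypothetical, and the contents of the archive depend on the output options selected.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_results(archive_path: str, target_dir: str) -> List[str]:
    """Extract a tar archive into target_dir and return the extracted file names."""
    target = Path(target_dir).resolve()
    # "r:*" lets tarfile detect plain or compressed tar archives automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        members = archive.getmembers()
        # Refuse members that would be written outside the target directory.
        for member in members:
            destination = (target / member.name).resolve()
            if destination != target and target not in destination.parents:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        archive.extractall(target)
        return sorted(m.name for m in members if m.isfile())


# Usage (hypothetical file and directory names):
# for name in extract_results("virusdetect_results.tar", "virusdetect_results"):
#     print(name)
```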