Hello Mothur Team,
I am analyzing Ion Torrent data. I had trouble getting output files from dist.seqs and cluster.split, most likely because of the limited computing power I was using, so I wanted to see whether a combination of split.abund and classify.seqs would work instead. The commands in my batch file are below. I was using the RDP database and have now switched to SILVA. The results look similar, but of course with more microbes this time.
My Questions
- Are there any suggested improvements to my commands to get even better data when using split.abund?
- After I brought my findings to my group, it was suggested that I decrease the criteria parameter to see whether fewer unclassified microbes would appear. Would decreasing (50, 80) or increasing (100) the criteria value for the screen.seqs command make a difference in the number of unclassified microbes that pop up?
- I finally got approval for a virtual machine via a cloud computing service. Once I get this set up, should I revisit dist.seqs and cluster.split? I am interested in data at the genus level.
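To make the last two questions concrete, here is a rough sketch of what I have in mind. These lines are not in my batch file and I have not run them; the criteria value of 80, the taxlevel of 4, and the cutoff of 0.03 are placeholder assumptions rather than recommendations.

```
# Question 2 (hypothetical): same screen.seqs call as in my batch file, with only criteria changed
screen.seqs(fasta=current, count=current, summary=current, optimize=start-end-minlength-maxlength, criteria=80, processors=8)
# Question 3 (hypothetical): OTU clustering on the virtual machine after the final curation step
cluster.split(fasta=current, count=current, taxonomy=current, taxlevel=4, cutoff=0.03)
```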
Commands
set.logfile(name=Testing123LogFile)
fastq.info(fastq=Testing123.fastq)
get.current()
trim.seqs(fasta=current, oligos=MothurOligos.txt, maxambig=0, maxhomop=6, bdiffs=0, pdiffs=0, minlength=265, keepfirst=285, flip=F, processors=8)
summary.seqs(fasta=current, processors=8)
unique.seqs(fasta=current)
get.current()
summary.seqs(fasta=current, processors=8)
count.seqs(name=current, group=current)
get.current()
align.seqs(fasta=current, reference=silva.v4.fasta, processors=6)
summary.seqs(fasta=current, count=current, processors=8)
get.current()
screen.seqs(fasta=current, count=current, summary=current, optimize=start-end-minlength-maxlength, criteria=95, processors=8)
summary.seqs(fasta=current, count=current, processors=8)
get.current()
filter.seqs(fasta=current, vertical=T, trump=.)
get.current()
summary.seqs(fasta=current, count=current, processors=8)
unique.seqs(fasta=current, count=current)
summary.seqs(fasta=current, count=current, processors=8)
get.current()
pre.cluster(fasta=current, count=current, diffs=2)
summary.seqs(fasta=current, count=current, processors=8)
get.current()
chimera.vsearch(fasta=current, count=current, dereplicate=t)
remove.seqs(accnos=current, fasta=current)
get.current()
summary.seqs(fasta=current, count=current, processors=8)
classify.seqs(fasta=current, count=current, template=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80, processors=8)
get.current()
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota)
summary.seqs(fasta=current, count=current, processors=8)
get.current()
summary.tax(taxonomy=current, count=current)
get.current()
rename.file(fasta=current, count=current, taxonomy=current, prefix=FinalCurationComplete)
split.abund(fasta=current, count=current, cutoff=1)
set.current(fasta=FinalCurationComplete.abund.fasta, count=FinalCurationComplete.abund.count_table)
get.current()
set.current(fasta=FinalCurationComplete.abund.fasta, count=FinalCurationComplete.abund.count_table)
get.current()
classify.seqs(fasta=current, count=current, reference=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80)