Split.Abund and Advice on Batch File Commands

Hello Mothur Team,

I am analyzing Ion Torrent data. I had trouble getting output files from dist.seqs and cluster.split, most likely because of the limited computing power I was using, so I was curious to see what would happen if I used a combination of split.abund and classify.seqs instead. Below are the commands in my batch file. I was using the RDP database and have now switched to SILVA. I seem to be getting similar data but, of course, with more microbes this time.

My Questions

  1. Are there any suggested improvements to my commands to get even better data when using split.abund?
  2. After I brought my findings to my group, it was suggested that I decrease the criteria parameter to see if fewer unclassified microbes would appear. Would decreasing (50, 80) or increasing (100) the criteria value for the screen.seqs command make a difference in the number of unclassified microbes that pop up?
  3. I finally got approval for a virtual machine via a cloud computing service. Once I get this set up, should I revisit dist.seqs and cluster.split? I am interested in data at the genus level.

Commands
set.logfile(name=Testing123LogFile)

fastq.info(fastq=Testing123.fastq)

get.current()

trim.seqs(fasta=current, oligos=MothurOligos.txt, maxambig=0, maxhomop=6, bdiffs=0, pdiffs=0, minlength=265, keepfirst=285, flip=F, processors=8)

summary.seqs(fasta=current, processors=8)

unique.seqs(fasta=current)

get.current()

summary.seqs(fasta=current, processors=8)

count.seqs(name=current, group=current)

get.current()

align.seqs(fasta=current, reference=silva.v4.fasta, processors=6)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

screen.seqs(fasta=current, count=current, summary=current, optimize=start-end-minlength-maxlength, criteria=95, processors=8)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

filter.seqs(fasta=current, vertical=T, trump=.)

get.current()

summary.seqs(fasta=current, count=current, processors=8)

unique.seqs(fasta=current, count=current)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

pre.cluster(fasta=current, count=current, diffs=2)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

chimera.vsearch(fasta=current, count=current, dereplicate=t)

remove.seqs(accnos=current, fasta=current)

get.current()

summary.seqs(fasta=current, count=current, processors=8)

classify.seqs(fasta=current, count=current, template=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80, processors=8)

get.current()

remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

summary.tax(taxonomy=current, count=current)

get.current()

rename.file(fasta=current, count=current, taxonomy=current, prefix=FinalCurationComplete)

split.abund(fasta=current, count=current, cutoff=1)

set.current(fasta=FinalCurationComplete.abund.fasta, count=FinalCurationComplete.abund.count_table)

get.current()

set.current(fasta=FinalCurationComplete.abund.fasta, count=FinalCurationComplete.abund.count_table)

get.current()

classify.seqs(fasta=current, count=current, reference=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80)

See the other thread where you posted. IonTorrent data is pretty horrible. Sorry to be the bearer of bad news. I’d strongly encourage you to either use the phylotype approach or to get 2x250 MiSeq data on the V4 region.

Pat
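
For readers following along, the phylotype approach mentioned above might look roughly like the sketch below. It is modeled on the phylotype section of the mothur MiSeq SOP and is not a tested pipeline for this data set. It assumes that the current taxonomy and count files are the curated ones produced after remove.lineage, and that label=1 corresponds to the genus level, which holds only when the reference taxonomy ends at genus.

```
# Sketch only: bin sequences by their taxonomic classification instead of by pairwise distance
phylotype(taxonomy=current)

# Build a shared file from the phylotype list; label=1 is assumed to be the genus level
make.shared(list=current, count=current, label=1)

# Report the consensus taxonomy for each phylotype at the same level
classify.otu(list=current, count=current, taxonomy=current, label=1)
```

Because phylotype bins sequences by classification, this route skips dist.seqs and cluster.split, the two steps that were failing on limited hardware.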

Hi Dr. Pschloss,

I completely understand, but I have no control over switching to MiSeq; the only option I have is Ion Torrent. I did my best to follow the MiSeq SOP but tweaked it a bit for Ion Torrent FASTQ data. Am I at least on the right track with my commands, given the circumstances?
