Split.Abund and Advice on Batch File Commands

jgcx January 7, 2022, 2:28pm 1

Hello Mothur Team,

I am analyzing Ion Torrent data. I had trouble running dist.seqs and cluster.split and getting output files, most likely because of the limited computing power I was using. I was curious to see what would happen if I used a combination of split.abund and classify.seqs instead. Below are the commands in my batch file. I was using the RDP database and have now switched to SILVA. I seem to be getting similar data but, of course, with more microbes this time.

My Questions

Are there any suggested improvements to my commands to get even better data when using split.abund?
After bringing my findings to my group, it was suggested that I decrease the criteria parameter to see if fewer unclassified microbes would appear. Would decreasing (50, 80) or increasing (100) the criteria value for the screen.seqs command make a difference in the number of unclassified microbes that pop up?
I finally got approval for a virtual machine via a cloud computing service. Once I get this set up, should I revisit dist.seqs and cluster.split? I am interested in data at the genus level.
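
One way to approach the criteria question empirically is to rerun screen.seqs on the same aligned input at each value and compare the summary.seqs reports. As far as the mothur documentation describes it, criteria is the percentage of sequences that the optimized thresholds are chosen to fit, so a lower value should give stricter start, end, and length cutoffs and remove more reads. The sketch below is only an illustration: aligned.fasta, aligned.count_table, and aligned.summary are hypothetical placeholder names for the align.seqs output, its count table, and its summary.seqs report, and the criteria80/criteria95 prefixes are likewise made up.

```
# Hypothetical placeholder file names; substitute the real align.seqs / summary.seqs outputs
screen.seqs(fasta=aligned.fasta, count=aligned.count_table, summary=aligned.summary, optimize=start-end-minlength-maxlength, criteria=80, processors=8)
summary.seqs(fasta=current, count=current, processors=8)
# Rename so the next run does not overwrite these outputs
rename.file(fasta=current, count=current, prefix=criteria80)

screen.seqs(fasta=aligned.fasta, count=aligned.count_table, summary=aligned.summary, optimize=start-end-minlength-maxlength, criteria=95, processors=8)
summary.seqs(fasta=current, count=current, processors=8)
rename.file(fasta=current, count=current, prefix=criteria95)
```

Comparing the two summary.seqs reports shows how many reads each setting keeps. Whether that changes the number of unclassified reads downstream would still have to be checked by running classify.seqs on each result.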

Commands
set.logfile(name=Testing123LogFile)

fastq.info(fastq=Testing123.fastq)

get.current()

trim.seqs(fasta=current, oligos=MothurOligos.txt, maxambig=0, maxhomop=6, bdiffs=0, pdiffs=0, minlength=265, keepfirst=285, flip=F, processors=8)

summary.seqs(fasta=current, processors=8)

unique.seqs(fasta=current)

get.current()

summary.seqs(fasta=current, processors=8)

count.seqs(name=current, group=current)

get.current()

align.seqs(fasta=current, reference=silva.v4.fasta, processors=6)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

screen.seqs(fasta=current, count=current, summary=current, optimize=start-end-minlength-maxlength, criteria=95, processors=8)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

filter.seqs(fasta=current, vertical=T, trump=.)

get.current()

summary.seqs(fasta=current, count=current, processors=8)

unique.seqs(fasta=current, count=current)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

pre.cluster(fasta=current, count=current, diffs=2)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

chimera.vsearch(fasta=current, count=current, dereplicate=t)

remove.seqs(accnos=current, fasta=current)

get.current()

summary.seqs(fasta=current, count=current, processors=8)

classify.seqs(fasta=current, count=current, template=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80, processors=8)

get.current()

remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota)

summary.seqs(fasta=current, count=current, processors=8)

get.current()

summary.tax(taxonomy=current, count=current)

get.current()

rename.file(fasta=current, count=current, taxonomy=current, prefix=FinalCurationComplete)

split.abund(fasta=current, count=current, cutoff=1)

set.current(fasta=FinalCurationComplete.abund.fasta, count=FinalCurationComplete.abund.count_table)

get.current()

classify.seqs(fasta=current, count=current, reference=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80)
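
With cutoff=1, split.abund sets aside sequences whose total abundance is 1 (singletons) as "rare" and keeps the rest as "abund". A quick check of how many reads end up in each bin is to summarize both outputs. This is a sketch that assumes the output names follow the FinalCurationComplete prefix in the same pattern as the .abund files referenced above; the .rare names are an assumption and should be confirmed against the split.abund log.

```
# Assumed output names from split.abund; check the log for the actual file names
summary.seqs(fasta=FinalCurationComplete.rare.fasta, count=FinalCurationComplete.rare.count_table, processors=8)
summary.seqs(fasta=FinalCurationComplete.abund.fasta, count=FinalCurationComplete.abund.count_table, processors=8)
```

The total sequence counts in the two reports indicate what fraction of the data is being dropped before the final classify.seqs step.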

pschloss January 11, 2022, 5:54pm 2

See the other thread where you posted. IonTorrent data is pretty horrible. Sorry to be the bearer of bad news. I’d strongly encourage you to either use the phylotype approach or to get 2x250 MiSeq data on the V4 region.

Pat
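
For reference, the phylotype approach mentioned here can be sketched along the lines of the MiSeq SOP, which bins sequences by their taxonomy instead of computing distances with dist.seqs. The file names below assume the FinalCurationComplete.* files written by the rename.file step in the batch file above; in the MiSeq SOP, label=1 corresponds to the genus level, though that may depend on the depth of the reference taxonomy.

```
# Assumes FinalCurationComplete.taxonomy and FinalCurationComplete.count_table from the rename.file step
phylotype(taxonomy=FinalCurationComplete.taxonomy)
# label=1 is the genus level in the MiSeq SOP
make.shared(list=current, count=FinalCurationComplete.count_table, label=1)
classify.otu(list=current, count=FinalCurationComplete.count_table, taxonomy=FinalCurationComplete.taxonomy, label=1)
```

This avoids the distance matrix entirely, so it is far less demanding on memory than dist.seqs followed by cluster.split.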

jgcx January 11, 2022, 6:07pm 3

Hi Dr. Pschloss,

I completely understand, but I have no control over switching to MiSeq, and the only option I have is Ion Torrent. I did my best to follow the MiSeq SOP but tweak it a bit for use with Ion Torrent FASTQ data. Am I at least on the right track with my commands given the circumstances?

system Closed January 21, 2022, 6:08pm 4

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
