mothur

Process killed when run in batch mode

Dear Mothur friends:

We are trying to analyze 16S data (Illumina V3+V4, 97 samples, 20 GB of data) in batch mode, using a batch file modified from the stability.batch shown in the MiSeq SOP, on an Ubuntu server with dual CPUs and 380 GB of RAM. The number of processors has been reduced to 28, but the run still gets killed partway through (I am not sure exactly where, but probably in cluster.split). I will try reducing the number of processors or the taxlevel further to see how it goes. To save time, I would like to reuse the data generated by the commands that finished before the kill, rather than rerun the batch file from the start. How should I modify the batch file to do that? Any suggestions for overcoming this obstacle would be highly appreciated. The batch file and the terminal output from the killed step are shown below.

Sincerely,

Jrhau

REFERENCE_LOCATION=/media/mpiu/a93b0b36-e288-45ef-b21f-acc26e4b0af9/Bacteria-16S-Ref
ALIGNREF=silva.full_v138.fasta
TAXONREF_FASTA=trainset9_032012.pds.fasta
TAXONREF_TAX=trainset9_032012.pds.tax
CONTAMINENTS=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota
LOGNAME=20201026-trial2
DATA=/media/mpiu/a93b0b36-e288-45ef-b21f-acc26e4b0af9/tooth-16S/20201026-trial2
TYPE=fastq
PROC=28
#batch commands
set.logfile(name=$LOGNAME)
make.file(inputdir=$DATA, type=$TYPE, prefix=stability)
make.contigs(file=current, processors=$PROC)
screen.seqs(fasta=current, group=current, maxambig=0, maxlength=500)
unique.seqs()
count.seqs(name=current, group=current)
align.seqs(fasta=current, reference=$REFERENCE_LOCATION/$ALIGNREF)
# screen.seqs(fasta=current, count=current, start=6000, end=26000, maxhomop=8)
screen.seqs(fasta=current, count=current, start=6388, end=25316, maxhomop=8)
filter.seqs(fasta=current, vertical=T, trump=.)
unique.seqs(fasta=current, count=current)
pre.cluster(fasta=current, count=current, diffs=2)
chimera.vsearch(fasta=current, count=current, dereplicate=t)
remove.seqs(fasta=current, accnos=current)
classify.seqs(fasta=current, count=current, reference=$REFERENCE_LOCATION/$TAXONREF_FASTA, taxonomy=$REFERENCE_LOCATION/$TAXONREF_TAX, cutoff=80)
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=$CONTAMINENTS)
remove.groups(count=current, fasta=current, taxonomy=current, groups=Mock)
cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.15)
make.shared(list=current, count=current, label=0.03)
classify.otu(list=current, count=current, taxonomy=current, label=0.03)
phylotype(taxonomy=current)
make.shared(list=current, count=current, label=1)
classify.otu(list=current, count=current, taxonomy=current, label=1)

Clustering /media/mpiu/a93b0b36-e288-45ef-b21f-acc26e4b0af9/tooth-16S/20201026-trial2/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.8.dist

tp	tn	fp	fn	sensitivity	specificity	ppv	npv	fdr	accuracy	mcc	f1score
1.14933e+08	2.70999e+06	1.24432e+06	2.34844e+06	0.979976	0.685326	0.989289	0.535738	0.989289	0.970366	0.591017	0.984611	


tp	tn	fp	fn	sensitivity	specificity	ppv	npv	fdr	accuracy	mcc	f1score
1.52984e+08	5.97733e+08	1.02162e+07	3.72478e+07	0.804198	0.983196	0.937401	0.94134	0.937401	0.940535	0.831814	0.865705	


Clustering /media/mpiu/a93b0b36-e288-45ef-b21f-acc26e4b0af9/tooth-16S/20201026-trial2/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.9.dist

Clustering /media/mpiu/a93b0b36-e288-45ef-b21f-acc26e4b0af9/tooth-16S/20201026-trial2/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.15.dist

Clustering /media/mpiu/a93b0b36-e288-45ef-b21f-acc26e4b0af9/tooth-16S/20201026-trial2/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta.17.dist

tp	tn	fp	fn	sensitivity	specificity	ppv	npv	fdr	accuracy	mcc	f1score
2.65246e+08	1.57723e+08	8.99204e+06	1.41069e+08	0.652808	0.946063	0.967211	0.527869	0.967211	0.738127	0.544508	0.779501	

Killed

Hi there,

Your distance matrix is likely gigantic and is crashing your computer because it’s trying to use too much RAM. A couple of things to consider…

Your distance matrix is so large because your reads do not fully overlap across the V3-V4 region. Without full overlap, denoising is suboptimal and nearly every sequence effectively contains an error, which inflates the number of unique sequences.
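
To get a feel for how quickly the matrix grows with the number of unique sequences, here is a rough Python sketch. It is not part of mothur, and the function names are made up for illustration. It assumes that the tp, tn, fp and fn columns in the log above add up to the total number of sequence pairs in a bin, n*(n-1)/2, and that each stored distance costs a fixed number of bytes (8 is only a placeholder). Under those assumptions, the second result line in the log corresponds to a single bin of about 40,000 unique sequences.

```python
import math


def pair_count(n_seqs: int) -> int:
    """Number of distinct sequence pairs among n_seqs sequences."""
    return n_seqs * (n_seqs - 1) // 2


def seqs_from_pairs(n_pairs: float) -> int:
    """Invert pair_count: estimate n from n*(n-1)/2 = n_pairs."""
    return round((1 + math.sqrt(1 + 8 * n_pairs)) / 2)


def full_matrix_gib(n_seqs: int, bytes_per_distance: int = 8) -> float:
    """Upper-bound size in GiB if every pairwise distance were stored.

    bytes_per_distance is an assumed placeholder, not a mothur constant.
    A cutoff (such as cutoff=0.15 in cluster.split) means only the
    distances below it are kept, so real usage should be lower.
    """
    return pair_count(n_seqs) * bytes_per_distance / 2**30


# tp, tn, fp, fn copied from the second result line in the log above.
tp, tn, fp, fn = 1.52984e08, 5.97733e08, 1.02162e07, 3.72478e07
bin_size = seqs_from_pairs(tp + tn + fp + fn)
print("estimated unique sequences in this bin:", bin_size)

# Doubling the number of unique sequences roughly quadruples the pairs.
for n in (bin_size, 2 * bin_size, 4 * bin_size):
    print(n, "sequences ->", pair_count(n), "pairs,",
          round(full_matrix_gib(n), 1), "GiB upper bound")
```

The point of the sketch is the quadratic growth: anything that cuts the number of unique sequences before cluster.split pays off far more than changing the number of processors.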

Pat

Dear Dr. Schloss,
Thank you so much for the detailed explanation.

Sincerely,

Jrhau
