Hi,
I’m re-running an analysis with mothur v.1.44.3 on about 300 samples (Illumina MiSeq 2x250 PE, EMP V4 primers) and am getting far more unique sequences than in my first run, which used mothur v.1.40.5 on the same fastq files and the same computer cluster. Following the SOP, I now have ~400k uniques going into cluster.split(), which produces ~100k OTUs, whereas my first run produced a more reasonable ~9k OTUs.

Going back to the older version doesn’t help: when I re-run the analysis with v.1.40.5, I still get huge shared files, with 360k unique sequences (6.8 million total sequences) at cluster.split(). According to my records, my commands are exactly the same as before, so I’m not sure why this is happening. The difference shows up early in the analysis: the first unique.seqs() step now produces ~1 million uniques, compared with ~450k in the previous run.

Here are my commands:
make.file(inputdir=., type=fastq, prefix=pp)
make.contigs(file=pp.files, processors=32)
summary.seqs(fasta=current)
screen.seqs(fasta=current, group=current, summary=current, maxambig=0, maxlength=275)
unique.seqs(fasta=current)
count.seqs(name=current, group=current)
summary.seqs(count=current)
pcr.seqs(fasta=silva.nr_v138.align, start=13862, end=23444, keepdots=F, processors=8)
rename.file(input=silva.nr_v138.pcr.align, new=silva.v4.fasta)
summary.seqs(fasta=silva.v4.fasta)
align.seqs(fasta=pp.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)
summary.seqs(fasta=current, count=current)
screen.seqs(fasta=current, count=current, summary=current, start=1967, end=11549, maxhomop=8)
summary.seqs(fasta=current, count=current)
filter.seqs(fasta=current, vertical=T, trump=.)
unique.seqs(fasta=current, count=current)
pre.cluster(count=current, fasta=current, diffs=2)
chimera.vsearch(fasta=current, count=current, dereplicate=t, processors=32)
remove.seqs(fasta=current, accnos=current)
summary.seqs(fasta=current, count=current)
classify.seqs(fasta=current, count=current, reference=silva.v4.fasta, taxonomy=silva.nr_v138.tax, cutoff=80)
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Bacteria;Cyanobacteria;Cyanobacteriia;Chloroplast-Bacteria_unclassified;-Bacteria;Cyanobacteria/Chloroplast;-Mitochondria;-Unknown;-Archaea;-Eukaryota;)
summary.tax(taxonomy=current, count=current)
dist.seqs(fasta=current, cutoff=0.03)
cluster(column=current, count=current)
classify.otu(list=current, count=current, taxonomy=current, label=0.03)
make.shared(list=current, count=current, label=0.03)
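Since the numbers already diverge at the first unique.seqs() step, one way to check the dereplication independently of mothur would be to count exact-duplicate sequences in the FASTA file that goes into that step for each run. Below is a minimal Python sketch of that idea. It is not part of mothur, the function name count_uniques is made up for illustration, and it assumes "unique" simply means an identical sequence string.

```python
from collections import Counter


def count_uniques(fasta_lines):
    """Return (total, unique) sequence counts from an iterable of FASTA lines.

    Hypothetical helper, not a mothur function. Sequences are compared as
    exact strings, and multi-line records are joined before comparison.
    """
    counts = Counter()
    seq_parts = []
    seen_header = False
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                counts["".join(seq_parts)] += 1
            seq_parts = []
            seen_header = True
        elif seen_header:
            seq_parts.append(line)
    # Close the final record.
    if seen_header:
        counts["".join(seq_parts)] += 1
    return sum(counts.values()), len(counts)


# Example usage (the file name is the one I would expect from the commands
# above; adjust it to whatever screen.seqs actually wrote):
# with open("pp.trim.contigs.good.fasta") as handle:
#     total, unique = count_uniques(handle)
#     print(total, unique)
```

Running this on the screen.seqs output from both runs should show whether the inputs to unique.seqs() already differ, or whether the difference only appears in the dereplication itself.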
Any clue what’s going on here? Thanks in advance!