Inflated unique sequence count

vasadowski · January 25, 2021, 5:24am

Hi,
I’m re-running an analysis with mothur v.1.44.3 on about 300 samples (Illumina MiSeq 2x250 PE with EMP V4 primers) and getting far more unique sequences than in my first run, which used mothur v.1.40.5 on the same fastq files and computer cluster. Following the SOP, I still have ~400k uniques going into cluster.split(), which produces ~100k OTUs, whereas my first run produced a more reasonable ~9k OTUs. When I now re-run the analysis with the older version, I still get huge shared files, with 360k unique sequences at cluster.split() and 6.8 million total sequences. My commands are exactly the same according to my records, so I’m not sure why this is happening! The problem starts early in the analysis: the first unique.seqs() step produces ~1 million uniques, compared with ~450k on the previous run. Here are my commands:

make.file(inputdir=., type=fastq, prefix=pp)
make.contigs(file=pp.files, processors=32)
summary.seqs(fasta=current)
screen.seqs(fasta=current, group=current, summary=current, maxambig=0, maxlength=275)
unique.seqs(fasta=current)
count.seqs(name=current, group=current)
summary.seqs(count=current)
pcr.seqs(fasta=silva.nr_v138.align, start=13862, end=23444, keepdots=F, processors=8)
rename.file(input=silva.nr_v138.pcr.align, new=silva.v4.fasta)
summary.seqs(fasta=silva.v4.fasta)
align.seqs(fasta=pp.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)
summary.seqs(fasta=current, count=current)
screen.seqs(fasta=current, count=current, summary=current, start=1967, end=11549, maxhomop=8)
summary.seqs(fasta=current, count=current)
filter.seqs(fasta=current, vertical=T, trump=.)
unique.seqs(fasta=current, count=current)
pre.cluster(count=current, fasta=current, diffs=2)
chimera.vsearch(fasta=current, count=current, dereplicate=t, processors=32)
remove.seqs(fasta=current, accnos=current)
summary.seqs(fasta=current, count=current)
classify.seqs(fasta=current, count=current, reference=silva.v4.fasta, taxonomy=silva.nr_v138.tax, cutoff=80)
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Bacteria;Cyanobacteria;Cyanobacteriia;Chloroplast-Bacteria_unclassified;-Bacteria;Cyanobacteria/Chloroplast;-Mitochondria;-Unknown;-Archaea;-Eukaryota;)
summary.tax(taxonomy=current, count=current)
dist.seqs(fasta=current, cutoff=0.03)
cluster(column=current, count=current)
classify.otu(list=current, count=current, taxonomy=current, label=0.03)
make.shared(list=current, count=current, label=0.03)

Any clue what’s going on here? Thanks in advance!

pschloss · January 27, 2021, 7:10pm

Hi there,

It’s hard to say - the pipeline looks right. Can you maybe look at your pp.files file and make sure it only has each sample one time? That you’re getting twice the number of uniques is suspicious to me. Alternatively, can you post the output of running summary.seqs after make.contigs and from after unique.seqs?
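For the pp.files check, a small script can do the looking for you. The following is only a rough sketch, not part of mothur: it assumes the usual three-column layout written by make.file (sample name, forward fastq, reverse fastq, separated by whitespace) and paths without spaces. The function name find_duplicates is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def find_duplicates(files_text):
    """Return (duplicate sample names, duplicate fastq paths).

    Assumes each non-blank line is: sample forward.fastq reverse.fastq
    (whitespace-separated, no spaces inside paths).
    """
    samples = Counter()
    fastqs = Counter()
    for line in files_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        samples[fields[0]] += 1      # first column: sample (group) name
        fastqs.update(fields[1:3])   # next two columns: forward and reverse fastq
    dup_samples = sorted(name for name, n in samples.items() if n > 1)
    dup_fastqs = sorted(path for path, n in fastqs.items() if n > 1)
    return dup_samples, dup_fastqs


if __name__ == "__main__":
    path = Path("pp.files")  # file name taken from the make.file prefix above
    if path.exists():
        dup_samples, dup_fastqs = find_duplicates(path.read_text())
        print("Samples listed more than once:", dup_samples or "none")
        print("Fastq files listed more than once:", dup_fastqs or "none")
```

If either list is non-empty, the same reads may be entering make.contigs more than once, which would be one possible explanation for a roughly doubled unique count.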

Pat
