mothur

Sequence duplication in screen.seqs, unique.seqs

I am using version 1.45.3 and I am getting duplicate sequences after running screen.seqs in the standard MiSeq SOP. I am positive the .files file does not contain duplicate group names, and everything is paired with the correct fastq.gz file.

make.contigs(file=16slamb.files, processors=16)

summary.seqs(fasta=16slamb.trim.contigs.fasta)

screen.seqs(fasta=16slamb.trim.contigs.fasta, group=16slamb.contigs.groups, maxambig=0, maxlength=575)

unique.seqs(fasta=16slamb.trim.contigs.good.fasta)

[ERROR]: You already have a sequence named M01398_94_000000000-JP475_1_2104_15478_11022 in your fasta file, sequence names must be unique, please correct.

count.seqs(name=16slamb.trim.contigs.good.names, group=16slamb.contigs.good.groups)

[ERROR]: Your count table contains more than 1 sequence named M01398_94_000000000-JP475_1_2104_21625_19031, sequence names must be unique. Please correct.

I found a potential solution in another thread: run list.seqs and get.seqs before unique.seqs. I thought I would let you all know anyway, since it could be a bug or something to do with the fastq.gz files. I am processing sequences from a facility that uses a different protocol than I am used to:

"primer set 341F 806R was used which covers most of the V3-V4 region. This region is about 460 bp. The sequencing length is 2 x 250 bp. For some samples that need additional reads, we may pool the samples with other samples that need 2 x 300 bp sequencing length. Illumina’s proprietary software will combine the result of the paired-end reads, no matter if it is 2 x 250 bp or 2 x 300 bp, into one sequence"

I wonder if this could result in duplicate sequence names?

Edit: Running list.seqs and get.seqs before unique.seqs seems to be working, but I am still wondering how the extra sequence names are getting in there.
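One way to narrow down where the extra names come from is to count how often each name appears in a fasta file at each step (for example, before and after screen.seqs). Below is a minimal Python sketch, not part of mothur. It assumes a plain-text fasta file in which the sequence name is the first whitespace-delimited token after `>`, and the function name `duplicate_names` is made up for illustration.

```python
from collections import Counter


def duplicate_names(fasta_lines):
    """Return {name: count} for sequence names that occur more than once.

    Assumption: the name is the first whitespace-delimited token
    after '>' on each header line.
    """
    names = Counter(
        line[1:].split()[0]
        for line in fasta_lines
        # Skip sequence lines and empty headers such as a bare '>'.
        if line.startswith(">") and len(line.strip()) > 1
    )
    return {name: n for name, n in names.items() if n > 1}


# Small in-memory example: the name "a" appears on two header lines.
example = [">a extra_info", "ACGT", ">b", "ACGT", ">a", "GGGG"]
dups = duplicate_names(example)
```

To run it on a real file, pass an open file handle, e.g. `duplicate_names(open("16slamb.trim.contigs.fasta"))`. If the names are already duplicated in the make.contigs output, the problem lies upstream of screen.seqs.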

Hi,

I would include a name file. That is, duplicated sequences need a name file so that only unique sequences are left in the fasta file. Check your summary.seqs output: are there duplicated sequences? (Duplicate sequences and duplicate sequence names are not the same thing.) Help for screen.seqs:

Perhaps try this first (I do not know where the extra sequence names come from).
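To illustrate the difference between duplicate sequences and duplicate sequence names, here is a small Python sketch with invented data. Grouping names by identical sequence is roughly what a name file keeps track of, while a name that appears twice is the condition the errors above complain about. The function name `group_by_sequence` and the read names are hypothetical.

```python
from collections import defaultdict


def group_by_sequence(records):
    """Map each distinct sequence to the list of names that carry it."""
    groups = defaultdict(list)
    for name, seq in records:
        groups[seq].append(name)
    return dict(groups)


# Invented records as (name, sequence) pairs.
records = [
    ("read1", "ACGT"),
    ("read2", "ACGT"),  # same sequence as read1: a duplicate sequence, which is expected
    ("read3", "GGCC"),
    ("read1", "TTAA"),  # name read1 used a second time: a duplicate name, which is an error
]
groups = group_by_sequence(records)
```

Here `groups["ACGT"]` holds two different names for one sequence, which is normal input for unique.seqs, whereas `read1` being listed under two sequences is the kind of duplicate name that triggers the error.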

Hope this helps.

Sigmund