Sequence duplication in screen.seqs, unique.seqs

ADL · July 9, 2021, 1:45pm

I am using version 1.45.3, and I am getting duplicate sequences after running screen.seqs in the standard MiSeq SOP. I am positive the .files file does not contain duplicate group names and that every group is paired with the correct fastq.gz files.
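One way to double-check the .files claim is a short Python sketch (not part of mothur) that flags repeated group names and any fastq.gz path listed more than once. It assumes the three-column layout used in the MiSeq SOP (group name, forward fastq, reverse fastq), separated by whitespace; the function name `check_files_table` is hypothetical.

```python
from collections import Counter


def check_files_table(lines):
    """Return (repeated group names, repeated fastq paths) from a mothur .files table.

    Assumes one sample per line with three whitespace-separated columns:
    group name, forward fastq(.gz), reverse fastq(.gz).
    """
    groups = Counter()
    fastqs = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        group, forward, reverse = fields
        groups[group] += 1
        fastqs[forward] += 1
        fastqs[reverse] += 1
    repeated_groups = sorted(g for g, n in groups.items() if n > 1)
    repeated_fastqs = sorted(f for f, n in fastqs.items() if n > 1)
    return repeated_groups, repeated_fastqs
```

For example, `with open("16slamb.files") as fh: print(check_files_table(fh))`; two empty lists mean no group name and no fastq path is repeated.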

make.contigs(file=16slamb.files, processors=16)

summary.seqs(fasta=16slamb.trim.contigs.fasta)

screen.seqs(fasta=16slamb.trim.contigs.fasta, group=16slamb.contigs.groups, maxambig=0, maxlength=575)

unique.seqs(fasta=16slamb.trim.contigs.good.fasta)

[ERROR]: You already have a sequence named M01398_94_000000000-JP475_1_2104_15478_11022 in your fasta file, sequence names must be unique, please correct.
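This error means a header name occurs more than once in 16slamb.trim.contigs.good.fasta. A minimal Python sketch for listing every such name and how often it appears, assuming the sequence name is the first whitespace-delimited token after `>` (the helper name `duplicate_fasta_names` is hypothetical):

```python
from collections import Counter


def duplicate_fasta_names(lines):
    """Return {name: count} for every FASTA header name seen more than once.

    The name is taken as the first whitespace-delimited token after '>'.
    Sequence lines are ignored; only header lines are counted.
    """
    counts = Counter(
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].split()
    )
    return {name: n for name, n in counts.items() if n > 1}
```

Running it on both 16slamb.trim.contigs.fasta and the .good.fasta output would show whether the repeated names already exist before screen.seqs.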

count.seqs(name=16slamb.trim.contigs.good.names, group=16slamb.contigs.good.groups)

[ERROR]: Your count table contains more than 1 sequence named M01398_94_000000000-JP475_1_2104_21625_19031, sequence names must be unique. Please correct.
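The same kind of check can be pointed at the tab-delimited groups file passed to count.seqs, or at the resulting count table. This sketch assumes the sequence name is in the first column, treats lines starting with `#` as comments, and can skip one header row (as in a count table); `duplicate_first_column` is a hypothetical helper.

```python
from collections import Counter


def duplicate_first_column(lines, skip_header=False):
    """Return {name: count} for names repeated in column 1 of a tab-delimited file.

    Blank lines and lines starting with '#' are skipped. With skip_header=True,
    the first remaining line is treated as a header row and not counted.
    """
    header_pending = skip_header
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank or comment line
        if header_pending:
            header_pending = False  # e.g. a Representative_Sequence header row
            continue
        counts[line.split("\t")[0].strip()] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

For a groups file (name, then group) the default `skip_header=False` applies; for a count table, pass `skip_header=True`.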

I found a potential solution in another thread: run list.seqs and get.seqs before unique.seqs. Still, I thought I would let you all know, because it could be a bug or something to do with the fastq.gz files. I am processing sequences from a facility that uses a different protocol from the one I am used to:

"Primer set 341F/806R was used, which covers most of the V3-V4 region. This region is about 460 bp. The sequencing length is 2 x 250 bp. For some samples that need additional reads, we may pool the samples with other samples that need a 2 x 300 bp sequencing length. Illumina’s proprietary software will combine the result of the paired-end reads, no matter if it is 2x250 bp or 2x300 bp, into one sequence."

I wonder if this could result in duplicate sequence names?

Edit: Running list.seqs and get.seqs before unique.seqs seems to work, but I am still wondering how the extra sequence names are getting in there.
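One hypothesis worth testing is that the same read ID occurs in more than one input fastq.gz file, or twice within one file. A minimal Python sketch, assuming standard 4-line FASTQ records; the helper names are hypothetical. mothur's sequence names appear to be the Illumina read IDs with `:` replaced by `_` (compare M01398_94_000000000-JP475_1_2104_15478_11022 in the errors), so `rid.replace(":", "_")` is assumed to map a read ID onto the mothur name.

```python
import gzip
from collections import defaultdict


def read_ids(fastq_lines):
    """Yield the read ID (first token of each header, without '@').

    Assumes plain 4-line FASTQ records, so every 4th line is a header.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0 and line.strip():
            yield line[1:].split()[0]


def shared_read_ids(labelled_inputs):
    """Map each read ID seen more than once to the labels of the inputs it came from.

    labelled_inputs is an iterable of (label, lines) pairs; a label listed twice
    means the ID repeats inside that single input.
    """
    seen = defaultdict(list)
    for label, lines in labelled_inputs:
        for rid in read_ids(lines):
            seen[rid].append(label)
    return {rid: labels for rid, labels in seen.items() if len(labels) > 1}


def scan_fastq_gz(paths):
    """Run shared_read_ids over gzipped FASTQ files, labelling each by its path."""
    def inputs():
        for path in paths:
            with gzip.open(path, "rt") as handle:
                yield path, handle
    return shared_read_ids(inputs())
```

For example, passing the forward-read paths listed in 16slamb.files to `scan_fastq_gz` would list any read ID that occurs more than once across those files. Every ID is held in memory, which may be heavy for large runs.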

sje062 · July 16, 2021, 10:23am

Hi,

I would include a name file, i.e. duplicated sequences need a name file so that only unique sequences are left in the fasta file. Check your summary.seqs output: are there duplicated sequences? (Duplicate sequences and duplicate sequence names are not the same thing.) See the help for screen.seqs.

Perhaps try this first (I do not know where the extra sequence names come from).
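To make the distinction between duplicate sequences and duplicate names concrete, here is a small Python sketch over in-memory (name, sequence) pairs; `duplicate_summary` is a hypothetical helper. Identical sequences under different names are expected in amplicon data and are what unique.seqs collapses; a name that occurs twice is what the unique.seqs and count.seqs errors report.

```python
from collections import Counter


def duplicate_summary(records):
    """Contrast repeated names with repeated sequences.

    records is a list of (name, sequence) pairs. Returns a sorted list of names
    seen more than once, and a list of name groups that share one sequence.
    """
    name_counts = Counter(name for name, _ in records)
    repeated_names = sorted(n for n, c in name_counts.items() if c > 1)
    by_sequence = {}
    for name, seq in records:
        by_sequence.setdefault(seq, []).append(name)
    identical_sequences = [names for names in by_sequence.values() if len(names) > 1]
    return repeated_names, identical_sequences
```

With `[("r1", "ACGT"), ("r2", "ACGT"), ("r3", "TTTT"), ("r3", "GGGG")]`, r1 and r2 are identical sequences (fine to collapse), while r3 is a repeated name (the problem case).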

Hope this helps.

Sigmund
