I am using version 1.45.3 and I am getting duplicate sequence names after running screen.seqs in the standard MiSeq SOP. I am positive the .files file does not contain duplicate group names, and every sample is paired with the correct fastq.gz files.
make.contigs(file=16slamb.files, processors=16)
summary.seqs(fasta=16slamb.trim.contigs.fasta)
screen.seqs(fasta=16slamb.trim.contigs.fasta, group=16slamb.contigs.groups, maxambig=0, maxlength=575)
unique.seqs(fasta=16slamb.trim.contigs.good.fasta)
[ERROR]: You already have a sequence named M01398_94_000000000-JP475_1_2104_15478_11022 in your fasta file, sequence names must be unique, please correct.
count.seqs(name=16slamb.trim.contigs.good.names, group=16slamb.contigs.good.groups)
[ERROR]: Your count table contains more than 1 sequence named M01398_94_000000000-JP475_1_2104_21625_19031, sequence names must be unique. Please correct.
I found a potential solution in another thread: run list.seqs and get.seqs before unique.seqs. I thought I would let you all know anyway, since it could be a bug or something to do with the fastq.gz files. I am processing sequences from a facility that uses a different protocol than the one I am used to:
"Primer set 341F 806R was used which covers most of the V3-V4 region. This region is about 460 bp. The sequencing length is 2 x 250 bp. For some samples that need additional reads, we may pool the samples with other samples that need 2 x 300 bp sequencing length. Illumina’s proprietary software will combine the result of the paired-end reads, no matter if it is 2x250 bp or 2x300 bp, into one sequence"
I wonder if this could result in duplicate sequence names?
Edit: Running list.seqs and get.seqs before unique.seqs seems to be working, but I am wondering how the extra sequence names are getting in there.
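In case it helps anyone narrow this down, here is a minimal Python sketch for checking which names are repeated in a fasta file. It is not part of mothur; the function name duplicate_fasta_names is my own, and it assumes the sequence name is the first whitespace-delimited token on each ">" header line.

```python
from collections import Counter


def duplicate_fasta_names(path):
    """Return {name: count} for every sequence name that appears more than once.

    Assumption: the name is the first whitespace-delimited token of each
    header line (the text right after '>').
    """
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip a bare '>' line with no name
                    counts[fields[0]] += 1
    # Keep only the names seen two or more times
    return {name: n for name, n in counts.items() if n > 1}
```

Calling duplicate_fasta_names("16slamb.trim.contigs.fasta") and then the same on 16slamb.trim.contigs.good.fasta should show whether the duplicates are already there after make.contigs or only show up after screen.seqs.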