sequence getting duplicated in count table

Hi Sarah

I’m running my normal batch (pretty much the MiSeq SOP). The last sequence is getting duplicated when making the *trim.contigs.good.good.count_table using 1.36.1. I thought this might be related to calling for a group file when there isn’t one (this time I’m processing a single sample), so I removed all the “group=” from the batch, but screen.seqs still duplicates the last line of the count_table:

grep "M00618_6_000000000-AKVA8_1_2114_15906_28552"  CowSaliva.paired.trim.contigs.good.count_table
M00618_6_000000000-AKVA8_1_2114_15906_28552 1
grep "M00618_6_000000000-AKVA8_1_2114_15906_28552"  CowSaliva.paired.trim.contigs.good.good.count_table
M00618_6_000000000-AKVA8_1_2114_15906_28552 1 
M00618_6_000000000-AKVA8_1_2114_15906_28552 1

Here’s the logfile generating those 2 files:

mothur > count.seqs(name=current, group=current)
[WARNING]: no file was saved for group parameter.
Using CowSaliva.paired.trim.contigs.good.names as input file for the name parameter.

Using 16 processors.
It took 0 secs to create a table for 121530 sequences.


Total number of sequences: 121530

Output File Names:
CowSaliva.paired.trim.contigs.good.count_table


mothur > align.seqs(fasta=current, reference=silva.nr_v119.v4.align)
Using CowSaliva.paired.trim.contigs.good.unique.fasta as input file for the fasta parameter.

Using 16 processors.

Reading in the silva.nr_v119.v4.align template sequences… DONE.
It took 64 to read 153307 sequences.
Aligning sequences from CowSaliva.paired.trim.contigs.good.unique.fasta …
Some of you sequences generated alignments that eliminated too many bases, a list is provided in CowSaliva.paired.trim.contigs.good.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.
It took 17 secs to align 5055 sequences.


Output File Names:
CowSaliva.paired.trim.contigs.good.unique.align
CowSaliva.paired.trim.contigs.good.unique.align.report
CowSaliva.paired.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=current, count=current)
Using CowSaliva.paired.trim.contigs.good.count_table as input file for the count parameter.
Using CowSaliva.paired.trim.contigs.good.unique.align as input file for the fasta parameter.

Using 16 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1231 3 0 1 1
2.5%-tile: 1968 11550 253 0 4 3039
25%-tile: 1968 11550 253 0 4 30383
Median: 1968 11550 253 0 6 60766
75%-tile: 1968 11550 253 0 6 91148
97.5%-tile: 1968 11550 253 0 6 118492
Maximum: 3803 11553 263 0 8 121530
Mean: 1969.85 11549.5 252.999 0 5.12357

# of unique seqs: 5055
total # of seqs: 121530

Output File Names:
CowSaliva.paired.trim.contigs.good.unique.summary

It took 1 secs to summarize 121530 sequences.

mothur > screen.seqs(fasta=current, count=current, summary=current, start=1968, end=11550, maxhomop=8)
Using CowSaliva.paired.trim.contigs.good.count_table as input file for the count parameter.
Using CowSaliva.paired.trim.contigs.good.unique.align as input file for the fasta parameter.
Using CowSaliva.paired.trim.contigs.good.unique.summary as input file for the summary parameter.

Using 16 processors.

Output File Names:
CowSaliva.paired.trim.contigs.good.unique.good.summary
CowSaliva.paired.trim.contigs.good.unique.good.align
CowSaliva.paired.trim.contigs.good.unique.bad.accnos
CowSaliva.paired.trim.contigs.good.good.count_table


It took 2 secs to screen 5055 sequences.

mothur > filter.seqs(fasta=current, vertical=T)
Using CowSaliva.paired.trim.contigs.good.unique.good.align as input file for the fasta parameter.

Using 16 processors.
Creating Filter…


Running Filter...


Length of filtered alignment: 374
Number of columns removed: 13051
Length of the original alignment: 13425
Number of sequences used to construct filter: 5002

Output File Names:
CowSaliva.filter
CowSaliva.paired.trim.contigs.good.unique.good.filter.fasta


mothur > summary.seqs(fasta=current, count=current)
Using CowSaliva.paired.trim.contigs.good.good.count_table as input file for the count parameter.
Using CowSaliva.paired.trim.contigs.good.unique.good.filter.fasta as input file for the fasta parameter.

Using 16 processors.
[ERROR]: Your count table contains more than 1 sequence named M00618_6_000000000-AKVA8_1_2114_15906_28552, sequence names must be unique. Please correct.

Can you try running count.seqs with processors=1?

same error

mothur > make.contigs(processors=16, file=CowSaliva.paired.file)

Using 16 processors.

Processing file pair /common/scratch/km/Fidopiastis/CowSaliva.170918189_S40_L001_R1_001.fastq - /common/scratch/km/Fidopiastis/CowSaliva.170918189_S40_L001_R2_001.fastq (files 1 of 1) <<<<<
Making contigs…
Done.

It took 33 secs to assemble 138302 reads.

It took 33 secs to process 138302 sequences.

Output File Names:
CowSaliva.paired.trim.contigs.fasta
CowSaliva.paired.trim.contigs.qual
CowSaliva.paired.contigs.report
CowSaliva.paired.scrap.contigs.fasta
CowSaliva.paired.scrap.contigs.qual

[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.

mothur > screen.seqs(fasta=current, summary=current, maxambig=0, maxlength=270)
Using CowSaliva.paired.trim.contigs.fasta as input file for the fasta parameter.
[WARNING]: no file was saved for summary parameter.

Using 16 processors.

Output File Names:
CowSaliva.paired.trim.contigs.good.fasta
CowSaliva.paired.trim.contigs.bad.accnos

It took 2 secs to screen 138302 sequences.

mothur > summary.seqs(fasta=current)
Using CowSaliva.paired.trim.contigs.good.fasta as input file for the fasta parameter.

Using 16 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 251 251 0 3 1
2.5%-tile: 1 253 253 0 4 3039
25%-tile: 1 253 253 0 4 30383
Median: 1 253 253 0 6 60766
75%-tile: 1 253 253 0 6 91148
97.5%-tile: 1 253 253 0 6 118492
Maximum: 1 267 267 0 8 121530
Mean: 1 253.001 253.001 0 5.1236

# of Seqs: 121530

Output File Names:
CowSaliva.paired.trim.contigs.good.summary

It took 1 secs to summarize 121530 sequences.

mothur > unique.seqs(fasta=current)
Using CowSaliva.paired.trim.contigs.good.fasta as input file for the fasta parameter.
121530 5055

Output File Names:
CowSaliva.paired.trim.contigs.good.names
CowSaliva.paired.trim.contigs.good.unique.fasta


mothur > summary.seqs(fasta=current, name=current)
Using CowSaliva.paired.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using CowSaliva.paired.trim.contigs.good.names as input file for the name parameter.

Using 16 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 251 251 0 3 1
2.5%-tile: 1 253 253 0 4 3039
25%-tile: 1 253 253 0 4 30383
Median: 1 253 253 0 6 60766
75%-tile: 1 253 253 0 6 91148
97.5%-tile: 1 253 253 0 6 118492
Maximum: 1 267 267 0 8 121530
Mean: 1 253.001 253.001 0 5.1236

# of unique seqs: 5055
total # of seqs: 121530

Output File Names:
CowSaliva.paired.trim.contigs.good.unique.summary

It took 0 secs to summarize 121530 sequences.

mothur > count.seqs(name=current, processors=1)
Using CowSaliva.paired.trim.contigs.good.names as input file for the name parameter.

Using 1 processors.
It took 0 secs to create a table for 121530 sequences.


Total number of sequences: 121530

Output File Names:
CowSaliva.paired.trim.contigs.good.count_table


mothur > align.seqs(fasta=current, reference=silva.nr_v119.v4.align)
Using CowSaliva.paired.trim.contigs.good.unique.fasta as input file for the fasta parameter.

Using 1 processors.

Reading in the silva.nr_v119.v4.align template sequences… DONE.
It took 66 to read 153307 sequences.
Aligning sequences from CowSaliva.paired.trim.contigs.good.unique.fasta …
Some of you sequences generated alignments that eliminated too many bases, a list is provided in CowSaliva.paired.trim.contigs.good.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.
It took 72 secs to align 5055 sequences.


Output File Names:
CowSaliva.paired.trim.contigs.good.unique.align
CowSaliva.paired.trim.contigs.good.unique.align.report
CowSaliva.paired.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=current, count=current)
Using CowSaliva.paired.trim.contigs.good.count_table as input file for the count parameter.
Using CowSaliva.paired.trim.contigs.good.unique.align as input file for the fasta parameter.

Using 1 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1231 3 0 1 1
2.5%-tile: 1968 11550 253 0 4 3039
25%-tile: 1968 11550 253 0 4 30383
Median: 1968 11550 253 0 6 60766
75%-tile: 1968 11550 253 0 6 91148
97.5%-tile: 1968 11550 253 0 6 118492
Maximum: 3803 11553 263 0 8 121530
Mean: 1969.85 11549.5 252.999 0 5.12357

# of unique seqs: 5055
total # of seqs: 121530

Output File Names:
CowSaliva.paired.trim.contigs.good.unique.summary

It took 3 secs to summarize 121530 sequences.

mothur > screen.seqs(fasta=current, count=current, summary=current, start=1968, end=11550, maxhomop=8)
Using CowSaliva.paired.trim.contigs.good.count_table as input file for the count parameter.
Using CowSaliva.paired.trim.contigs.good.unique.align as input file for the fasta parameter.
Using CowSaliva.paired.trim.contigs.good.unique.summary as input file for the summary parameter.

Using 1 processors.

Output File Names:
CowSaliva.paired.trim.contigs.good.unique.good.summary
CowSaliva.paired.trim.contigs.good.unique.good.align
CowSaliva.paired.trim.contigs.good.unique.bad.accnos
CowSaliva.paired.trim.contigs.good.good.count_table


It took 3 secs to screen 5055 sequences.

mothur > filter.seqs(fasta=current, vertical=T)
Using CowSaliva.paired.trim.contigs.good.unique.good.align as input file for the fasta parameter.

Using 1 processors.
Creating Filter…


Running Filter...

Length of filtered alignment: 374
Number of columns removed: 13051
Length of the original alignment: 13425
Number of sequences used to construct filter: 5002

Output File Names:
CowSaliva.filter
CowSaliva.paired.trim.contigs.good.unique.good.filter.fasta


mothur > summary.seqs(fasta=current, count=current)
Using CowSaliva.paired.trim.contigs.good.good.count_table as input file for the count parameter.
Using CowSaliva.paired.trim.contigs.good.unique.good.filter.fasta as input file for the fasta parameter.

Using 1 processors.
[ERROR]: Your count table contains more than 1 sequence named M00618_6_000000000-AKVA8_1_2114_15906_28552, sequence names must be unique. Please correct.

mothur > quit()

I can confirm this; I get the same error :?

Thanks for reporting this bug. It affects how the screen.seqs command filters the count file when the count file does not include groups: the last sequence in the file is duplicated. This will be fixed in version 1.37, which is being released later this week. In the meantime, you can work around the issue by removing the duplicate last line or by using the list.seqs and get.seqs commands.
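
If editing the file by hand is awkward, the "remove the duplicate last line" workaround could be scripted along these lines. This is only a rough sketch and not part of mothur: the function names drop_duplicate_names and write_deduplicated are made up for illustration, and it assumes the count_table is plain text with the sequence name in the first whitespace-separated column. It keys on the name rather than the whole line because the two duplicated rows in the grep output at the top of the thread appear to differ in trailing whitespace.

```python
def drop_duplicate_names(lines):
    """Keep only the first row seen for each sequence name (first column).

    Hypothetical helper, not a mothur function. Rows are returned
    unchanged, so the header line and original delimiters are preserved.
    """
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            kept.append(line)  # pass blank lines through untouched
            continue
        name = fields[0]
        if name in seen:
            continue  # a later row with the same name is dropped
        seen.add(name)
        kept.append(line)
    return kept


def write_deduplicated(src, dst):
    """Read the count table at src and write a de-duplicated copy to dst."""
    with open(src) as handle:
        rows = drop_duplicate_names(handle)
    with open(dst, "w") as handle:
        handle.writelines(rows)
```

For example, write_deduplicated("CowSaliva.paired.trim.contigs.good.good.count_table", "fixed.count_table") would write a cleaned copy (the output name here is arbitrary), which could then be passed to the next command with count=fixed.count_table instead of count=current.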