Recurring "Your count table contains more than 1 sequence named..." error, yet it doesn't

Hello

I’ve looked through the forum and done multiple Google searches, but I can’t find anyone else with this issue.

Throughout the pipeline, I keep getting an error saying that my count table contains more than one sequence with the same name. However, this is not the case; I’ve checked, and there isn’t. At each command in the pipeline, the command runs and its output files are generated, but then it repeats this error about the duplicate sequence.

[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.

**** Exceeded maximum allowed command errors, quitting ****
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.

Curiously, the name it reports is the first sequence in the table. As a test, I removed that sequence, and sure enough, it then flagged the next sequence in the table as occurring more than once. It’s almost as though it iterates through the file a second time and then encounters the same sequence again?
That’s just a guess, though; I could be well off with that one!

Any advice and thoughts on this would be greatly appreciated! :confused::grimacing::disappointed:

Thank you!

If you are on a Mac or Linux computer, could you run something like…

grep "M03896:39:000000000-KVVR7:1:2109:12885:5371" *.fastq

and see which files it turns up in? You should see it in both the R1 and R2 files.
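The fastq search works on a different form of the name than the mothur outputs: Illumina fastq headers separate their fields with colons, while the mothur names in the error use underscores. A minimal Python sketch of mapping one back to the other and counting matching fastq records, assuming mothur simply replaced every ':' with '_' and the original read name contained no underscores of its own (the function names and the two-record excerpt are hypothetical):

```python
# Assumption: mothur replaced each ':' in the Illumina read name with '_',
# and the original read name contained no '_' characters of its own.

def mothur_to_illumina(name: str) -> str:
    """Undo the assumed ':' -> '_' substitution."""
    return name.replace("_", ":")


def count_header_hits(fastq_lines, read_id: str) -> int:
    """Count fastq records whose header (every 4th line, starting with '@')
    has read_id as its first whitespace-delimited token."""
    hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0 and line.startswith("@"):
            tokens = line[1:].split()
            if tokens and tokens[0] == read_id:
                hits += 1
    return hits


read_id = mothur_to_illumina("M03896_39_000000000-KVVR7_1_2109_12885_5371")
print(read_id)  # M03896:39:000000000-KVVR7:1:2109:12885:5371

# Hypothetical two-record excerpt of an R1 file.
r1 = [
    "@M03896:39:000000000-KVVR7:1:2109:12885:5371 1:N:0:1",
    "ACGT",
    "+",
    "IIII",
    "@M03896:39:000000000-KVVR7:1:2109:9999:1111 1:N:0:1",
    "ACGT",
    "+",
    "IIII",
]
print(count_header_hits(r1, read_id))  # 1
```

To run it on a real (uncompressed) file, pass the stripped lines of an open file, e.g. count_header_hits((l.rstrip("\n") for l in fh), read_id). Comparing the first header token exactly, rather than a substring as grep does, avoids counting reads whose coordinates merely contain the same digits.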

You might also try…

grep "M03896_39_000000000-KVVR7_1_2109_12885_5371" *count_table
grep "M03896_39_000000000-KVVR7_1_2109_12885_5371" *fasta

and see how many times it appears. I wonder whether the same fastq files might be listed more than once, under different group names, in the files file.
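The same check can be narrowed to what the error actually complains about: a name repeated in the first column of a single count table. grep matches the name anywhere on a line and across every file given to it. A minimal Python sketch, assuming the count table layout shown later in this thread (optional '#' comment lines, a header row starting with 'Representative_Sequence', then one row per sequence whose first field is its name); the function name and example rows are hypothetical:

```python
from collections import Counter


def duplicate_names(count_table_lines):
    """Return {name: occurrences} for sequence names seen more than once.

    Assumed layout: '#' comment lines (compressed format), a header row
    starting with 'Representative_Sequence', then one row per sequence
    whose first whitespace-delimited field is the sequence name.
    """
    names = Counter()
    for line in count_table_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or compressed-format comment
        if line.startswith("Representative_Sequence"):
            continue  # column header row
        names[line.split()[0]] += 1
    return {name: n for name, n in names.items() if n > 1}


# Hypothetical table with one deliberately duplicated row.
table = [
    "#Compressed Format: groupIndex,abundance.",
    "#1,As001 2,As002",
    "Representative_Sequence\ttotal\tAs001\tAs002",
    "seqA\t1\t2,1",
    "seqB\t3\t1,3",
    "seqA\t1\t2,1",
]
print(duplicate_names(table))  # {'seqA': 2}
```

On a real file, something like: with open("stability.contigs.count_table") as fh: print(duplicate_names(fh)). An empty dict means the first column really is unique in that file.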

Pat

Hello

Thanks for the advice.

Running grep on the count table and fasta files finds just one occurrence in each of the different files.

On count tables

$ grep "M03896_39_000000000-KVVR7_1_2109_12885_5371" *count_table

stability.contigs.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371       1       22,1
stability.contigs.good.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371  1       22,1
stability.trim.contigs.good.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371     1       22,1
stability.trim.contigs.good.unique.precluster.Bs002.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371     1       2,1
stability.trim.contigs.good.unique.precluster.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371   1       22,1
stability.trim.contigs.good.unique.precluster.denovo.vsearch.Bs002.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371 2,1
stability.trim.contigs.good.unique.precluster.denovo.vsearch.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371    1  22,1
stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371 1       22,1
stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.pick.count_table:M03896_39_000000000-KVVR7_1_2109_12885_5371    1       22,1

On fasta files

$ grep "M03896_39_000000000-KVVR7_1_2109_12885_5371" *fasta

stability.trim.contigs.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371       ee=0.196599
stability.trim.contigs.good.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371  ee=0.196599
stability.trim.contigs.good.unique.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371   ee=0.196599
stability.trim.contigs.good.unique.precluster.Bs002.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371  ee=0.196599
stability.trim.contigs.good.unique.precluster.denovo.vsearch.Bs002.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371   ee=0.196599
stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371      ee=0.196599
stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.pick.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371 ee=0.196599
stability.trim.contigs.good.unique.precluster.denovo.vsearch.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371 ee=0.196599
stability.trim.contigs.good.unique.precluster.fasta:>M03896_39_000000000-KVVR7_1_2109_12885_5371        ee=0.196599

Unless there is something awry there?

Thanks again

Dominic

Hmmm. Can you post the commands you are running from the beginning to where you’re getting the error message? Are you only getting the error at the last step or all along?

Pat

The error started appearing at classify.seqs.
(Just for a bit of background, these are ITS sequences.)

The commands run just before this were chimera.vsearch and remove.seqs:

Removing chimeras from your input files:
/******************************************/
Running command: remove.seqs(fasta=stability.trim.contigs.good.unique.precluster.denovo.vsearch.fasta, accnos=stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.accnos)
Removed 19 sequences from stability.trim.contigs.good.unique.precluster.denovo.vsearch.fasta.

Output File Names:
stability.trim.contigs.good.unique.precluster.denovo.vsearch.pick.fasta

/******************************************/

Output File Names:
stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.count_table
stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.chimeras
stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.accnos
stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.fasta

Then, when running classify.seqs, it goes through the command (including writing the output files), and then this happened:

It took 3490 secs to classify 243871 sequences.

[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.

The commands that follow behave similarly: each one runs, creates its output files, and then repeats this error. I thought I would keep working through the commands, but once I got to get.shared, it came back with the error straight away.

Dominic

Can you post the actual command syntax you are using at each step?

Here are the commands I ran, up to and including classify.seqs, which is when the error appeared.

mothur > make.file(inputdir = ., type = fastq)

mothur > make.contigs(file = stability.files)

mothur > summary.seqs(fasta = current)

mothur > screen.seqs(fasta = stability.trim.contigs.fasta, count = stability.contigs.count_table, summary = stability.trim.contigs.summary, maxambig = 0, maxlength = 350, maxhomop = 8)

mothur > summary.seqs(fasta = current)

mothur > unique.seqs(fasta = current, count = current)

mothur > summary.seqs(fasta = current)

mothur > pre.cluster(fasta = current, count = current, diffs = 2)

mothur > summary.seqs(fasta = current, count = current)

mothur > chimera.vsearch(fasta = stability.trim.contigs.good.unique.precluster.fasta, count = stability.trim.contigs.good.unique.precluster.count_table)

mothur > chimera.vsearch(fasta = current, count = current, dereplicate = t)

mothur > classify.seqs(fasta=stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.fasta, count=stability.trim.contigs.good.unique.precluster.denovo.vsearch.denovo.vsearch.count_table, reference=UNITEv6_sh_dynamic.fasta, taxonomy=UNITEv6_sh_dynamic.tax, cutoff=60, processors=32)

I can add in the output for the commands if needed.

Thanks
Dominic

Can you try it with only one of the two chimera.vsearch calls? I think that running it twice, with the current argument, is screwing things up.

Pat

Hmm. So I backtracked and ran chimera.vsearch on the appropriate input files from pre.cluster, and now I am getting the error again, at a step where previously there wasn’t one. It is basically saying that whatever is at the beginning of the file is a duplicate.

For instance, I ran it as follows:

chimera.vsearch(fasta = stability.trim.contigs.good.unique.precluster.fasta, count = stability.trim.contigs.good.unique.precluster.count_table, dereplicate = t)

The count table file looked like this:
#Compressed Format: groupIndex,abundance. For example 1,6 would mean the read has an abundance of 6 for group As001.
#1,As001 2,As002 3,As003 4,As004 5,As005 6,As006 7,As007 8,As008 9,As009 10,As010 11,As011 12,As012 13,As013 14,As014 15,As015 16,As016 17,As017 18,As018 19,As019 20,As020 21,Bs001 22,Bs002 23,Bs003 24,Bs004 25,Bs005 26,Bs006 27,Bs007 28,Bs008 29,Bs009 30,Bs010 31,Bs011 32,Bs012 33,Bs013 34,Bs014 35,Bs015B1 36,Bs015B2 37,Bs016 38,Bs017 39,Bs018 40,Bs019 41,Bs020 42,FBL001 43,FBL002 44,FBL003 45,FBL004 46,FBL005 47,FBL006 48,FBL007 49,FBL008 50,FBL009 51,FBL010 52,FBL011 53,FBL012 54,FBL013 55,FBL014 56,FBL015 57,FBL016 58,FBL017 59,FBL018 60,FBL019 61,FBL020 62,SBL001 63,SBL002 64,SBL003 65,SBL004 66,SBL005 67,SBL006 68,SBL007 69,SBL008 70,SBL009 71,SBL010 72,SBL011 73,SBL012 74,SBL013 75,SBL014 76,SBL015 77,SBL016 78,SBL017 79,SBL018 80,SBL019
Representative_Sequence total As001 As002 As003 As004 As005 As006 As007 As008 As009 As010 As011 As012 As013 As014 As015 As016 As017 As018 As019 As020 Bs001 Bs002 Bs003 Bs004 Bs005 Bs006 Bs007 Bs008 Bs009 Bs010 Bs011 Bs012 Bs013 Bs014 Bs015B1 Bs015B2 Bs016 Bs017 Bs018 Bs019 Bs020 FBL001 FBL002 FBL003 FBL004 FBL005 FBL006 FBL007 FBL008 FBL009 FBL010 FBL011 FBL012 FBL013 FBL014 FBL015 FBL016 FBL017 FBL018 FBL019 FBL020 SBL001 SBL002 SBL003 SBL004 SBL005 SBL006 SBL007 SBL008 SBL009 SBL010 SBL011 SBL012 SBL013 SBL014 SBL015 SBL016 SBL017 SBL018 SBL019

and I got this error:

[ERROR]: Your count table contains more than 1 sequence named #1,As001, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named #1,As001, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named #1,As001, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named #1,As001, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named #1,As001, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named #1,As001, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named #1,As001, sequence names must be unique. Please correct.

I then removed that top line, so the top of the file looked like this:
Representative_Sequence total As001 As002 As003 As004 As005 As006 As007 As008 As009 As010 As011 As012 As013 As014 As015 As016 As017 As018 As019 As020 Bs001 Bs002 Bs003 Bs004 Bs005 Bs006 Bs007 Bs008 Bs009 Bs010 Bs011 Bs012 Bs013 Bs014 Bs015B1 Bs015B2 Bs016 Bs017 Bs018 Bs019 Bs020 FBL001 FBL002 FBL003 FBL004 FBL005 FBL006 FBL007 FBL008 FBL009 FBL010 FBL011 FBL012 FBL013 FBL014 FBL015 FBL016 FBL017 FBL018 FBL019 FBL020 SBL001 SBL002 SBL003 SBL004 SBL005 SBL006 SBL007 SBL008 SBL009 SBL010 SBL011 SBL012 SBL013 SBL014 SBL015 SBL016 SBL017 SBL018 SBL019
M03896_39_000000000-KVVR7_1_2109_12885_5371 1 22,1

and then it went back to flagging that first sequence:

[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.
[ERROR]: Your count table contains more than 1 sequence named M03896_39_000000000-KVVR7_1_2109_12885_5371, sequence names must be unique. Please correct.

Not sure if that means anything, but it seems like it doesn’t like the top of the file?
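The error that named "#1,As001" suggests that the group-index comment line was being read as though it were a sequence row. As an illustration only (this is not mothur’s own parser), here is a minimal Python sketch of decoding the compressed layout described in the file’s first comment line, assuming the group-map comment line comes before the data rows, as in the excerpts above. The function names are hypothetical, and the example group map is truncated to a few of the groups from the real header:

```python
def parse_group_map(comment_line):
    """Parse a '#1,As001 2,As002 ...' line into {index: group_name}."""
    mapping = {}
    for token in comment_line.lstrip("#").split():
        index, group = token.split(",", 1)
        mapping[int(index)] = group
    return mapping


def decode_row(row, group_map):
    """Turn 'name total idx,abund ...' into (name, total, {group: abundance})."""
    fields = row.split()
    name, total = fields[0], int(fields[1])
    abundances = {}
    for token in fields[2:]:
        index, abundance = token.split(",")
        abundances[group_map[int(index)]] = int(abundance)
    return name, total, abundances


def read_compressed_table(lines):
    """Skip the format note and header row; decode every data row."""
    group_map, rows = {}, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#Compressed Format"):
            continue  # explanatory comment, not data
        if line.startswith("#"):
            group_map = parse_group_map(line)  # group index -> name
        elif line.startswith("Representative_Sequence"):
            continue  # column header row
        else:
            rows.append(decode_row(line, group_map))
    return group_map, rows


lines = [
    "#Compressed Format: groupIndex,abundance. For example 1,6 would mean ...",
    "#1,As001 2,As002 21,Bs001 22,Bs002",
    "Representative_Sequence total As001 As002 Bs001 Bs002",
    "M03896_39_000000000-KVVR7_1_2109_12885_5371 1 22,1",
]
group_map, rows = read_compressed_table(lines)
print(rows)  # [('M03896_39_000000000-KVVR7_1_2109_12885_5371', 1, {'Bs002': 1})]
```

Read this way, the row "M03896_... 1 22,1" means a total of 1 read, all of it in group 22 (Bs002), and neither comment line ever becomes a sequence name.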

Thanks
Dominic

Can you possibly send me a link to where I can access your fastq files and files file? pschloss / umich.edu

Pat

I was just about to reply here: I’ve managed to sort this out.

I just ran it again from the beginning, and it went through with no errors. I think the problem was probably related to the two chimera.vsearch calls, as you mentioned. I imagine I was still getting the errors when backtracking because there were other files in the directory that it was picking up. So I ran everything again from the beginning (a clean slate, as it were), and all is fine.
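A clean slate does not have to mean deleting the earlier outputs. The Python sketch below is a hypothetical helper, assuming every output of the earlier run starts with the "stability." prefix used throughout this run and that only stability.files needs to stay in place; it lists those files (a dry run by default) and can move them into a backup folder before rerunning:

```python
import shutil
import tempfile
from pathlib import Path


def stash_previous_outputs(workdir, prefix="stability.", keep=("stability.files",),
                           backup_name="previous_run", dry_run=True):
    """Move files from an earlier run out of the way before starting over.

    Hypothetical helper: assumes every pipeline output starts with `prefix`
    and that only the files named in `keep` must stay in place.
    Returns the files that were (or, with dry_run=True, would be) moved.
    """
    workdir = Path(workdir)
    candidates = sorted(
        p for p in workdir.iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.name not in keep
    )
    if not dry_run:
        backup = workdir / backup_name
        backup.mkdir(exist_ok=True)
        for p in candidates:
            shutil.move(str(p), str(backup / p.name))
    return candidates


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("stability.files", "stability.contigs.count_table", "sample_R1.fastq"):
        (Path(tmp) / name).write_text("")
    would_move = [p.name for p in stash_previous_outputs(tmp)]
    stash_previous_outputs(tmp, dry_run=False)
    left_in_place = sorted(p.name for p in Path(tmp).iterdir() if p.is_file())
    backed_up = sorted(p.name for p in (Path(tmp) / "previous_run").iterdir())

print(would_move)     # ['stability.contigs.count_table']
print(left_in_place)  # ['sample_R1.fastq', 'stability.files']
print(backed_up)      # ['stability.contigs.count_table']
```

Moving rather than deleting keeps the old outputs available for comparison, while ensuring that commands using current or explicit file names can only pick up files produced by the fresh run.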

Thank you for the help on this!

Wonderful - I’m glad you were able to get it sorted out.

Take care
Pat
