unique.seqs error

When running the unique.seqs command, I get the error below:

mothur > unique.seqs(fasta=uic.trim.contigs.good.fasta )


[ERROR]: You already have a sequence named M02127_258_000000000-B4RL2_1_1101_16099_2069 in your fasta file, sequence names must be unique, please correct.

The same error is repeated for every sequence, so I have not included the rest in this post. It seems odd that this command would require unique sequences.

The details for the mothur version used are:
Linux version
Using ReadLine
Running 64Bit Version
mothur v.1.39.5

Thanks for your assistance,
Stephanie


What are the commands that you are running upstream of this command?

The previous commands were:

make.contigs(file=uic.files, processors=8)

summary.seqs(fasta=current)

screen.seqs(fasta=current, group=current, summary=current, maxambig=0, maxlength=275)

summary.seqs(fasta=current)

get.current()

unique.seqs(fasta=current)
Using uic.trim.contigs.good.fasta as input file for the fasta parameter.

Best,
Stephanie

I have also run the same set of commands on the newest version for Mac using 1 processor, and I still get the same error, although only for a large subset of the sequences rather than for every sequence.

Could you have duplicate files in your uic.files file? If not, can you send your input files and log file to mothur.bugs@gmail.com?
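One way to check for this is a short script like the one below. It is only a sketch: it assumes the .files file uses the three-column layout (sample name, forward fastq, reverse fastq, separated by whitespace), and the function name `duplicated_fastq_entries` is made up for this example, not part of mothur.

```python
from collections import Counter


def duplicated_fastq_entries(lines):
    """Return fastq file names that are listed more than once.

    Assumption: each non-empty line holds a sample name in the first
    column and the fastq file names in the remaining columns.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts.update(fields[1:])  # count every fastq file name on the line
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical listing in which sample C reuses the fastq pair of sample A.
example = [
    "A\tA_R1.fastq\tA_R2.fastq",
    "B\tB_R1.fastq\tB_R2.fastq",
    "C\tA_R1.fastq\tA_R2.fastq",
]
print(duplicated_fastq_entries(example))
```

To run it on a real file, pass the open file handle instead of the example list, for instance `duplicated_fastq_entries(open("uic.files"))`. An empty list means no fastq file is listed twice.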

I have been getting this same error recently. When I ran my data using v.1.39.5, the error occurred after make.contigs and identified 68 repeated sequence names. However, when I updated to v.1.40.5, the error didn’t occur until the unique.seqs command, which identified thousands of repeated sequence names.


Thank you for the help,

Kelsy

Could you send your input files and logfile to mothur.bugs@gmail.com?

Hi there,

Has this issue been fixed? I’m getting the same error with v. 1.41.1 after running unique.seqs. Here are the commands and errors:

make.contigs(file=dio.files, checkorient=T)

summary.seqs(fasta=dio.trim.contigs.fasta)

screen.seqs(fasta=dio.trim.contigs.fasta, group=dio.contigs.groups, summary=dio.trim.contigs.summary, maxambig=0, maxlength=301, maxhomop=8)

summary.seqs(fasta=dio.trim.contigs.good.fasta)

unique.seqs(fasta=dio.trim.contigs.good.fasta)
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_2109_20418_2415 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_1101_5910_25285 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_2103_11121_6885 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_2110_3694_9213 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_1110_12330_22908 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_1111_24820_9625 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_2111_22903_5708 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_2114_7703_9852 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_1108_15091_17664 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_2108_5225_20067 in your fasta file, sequence names must be unique, please correct.
[ERROR]: You already have a sequence named M03613_79_000000000-BN6L2_1_2102_12063_5957 in your fasta file, sequence names must be unique, please correct.

I checked the output of make.contigs (dio.trim.contigs.fasta) and there are indeed some duplicate sequence names. However, there are no duplicate samples in my dio.files file.
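For anyone who wants to repeat this check, a minimal sketch is below. It assumes the sequence name is the first whitespace-delimited token after the ">" on each header line; the function name `duplicate_fasta_names` is invented for this example.

```python
from collections import Counter


def duplicate_fasta_names(lines):
    """Map each repeated sequence name to the number of times it occurs."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                counts[fields[0]] += 1  # name is the first token of the header
    return {name: n for name, n in counts.items() if n > 1}


# Tiny made-up fasta in which the name "seq1" occurs twice.
example = [">seq1", "ACGT", ">seq2", "GGCC", ">seq1", "ACGT"]
print(duplicate_fasta_names(example))
```

On a real file the same function can be fed the open handle, for instance `duplicate_fasta_names(open("dio.trim.contigs.fasta"))`. The counter holds one entry per distinct name, so memory use grows with the number of sequences.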

Thanks in advance for your help.

Following this issue further, it seems that there are output differences between Mothur versions.

I ran the following command using raw fastq files (not fastq.gz files):

make.contigs(file=dio.files, checkorient=T)

When I run this command in v. 1.39.0, the number of sequences in the dio.trim.contigs.fasta and dio.contigs.groups files is the same (in my case: 8,131,113). Now, if I run the same command in v. 1.41.1, the number of sequences in the dio.trim.contigs.fasta file is 8,131,113, whereas it outputs 8,129,290 sequences in the dio.contigs.groups file. Clearly, there is something wrong with this command in v. 1.41.1.

I went into the SOP and next ran screen.seqs with the two Mothur versions, using the outputs of v. 1.39.0 above (because those of v. 1.41.1 were wrong), as follows:

screen.seqs(fasta=dio.trim.contigs.fasta, group=dio.contigs.groups, summary=dio.trim.contigs.summary, maxambig=0, maxlength=301, maxhomop=8)

However, both v. 1.39.0 and v. 1.41.1 gave me wrong dio.contigs.good.groups files. Let’s check the number of sequences removed and kept:

dio.trim.contigs.bad.accnos: 4,666,759 (I know, too many crappy sequences, but let’s forget that for the sake of this discussion).

dio.trim.contigs.good.fasta: 3,464,354

The sum of these two is 8,131,113, the correct number of total sequences after make.contigs. I expected then 3,464,354 sequences in the dio.contigs.good.groups file, but instead I got 3,465,917 with v. 1.39.0 or 3,464,094 in v. 1.41.1. In any case, I can’t go further with the SOP since in the unique.seqs command it always tells me that the fasta and groups files differ.
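The bookkeeping above can be reproduced with a small script. This is a sketch under two assumptions: fasta names are the first token after ">", and the groups file has one "name, group" pair per line separated by whitespace. The function name `compare_fasta_and_groups` is made up for illustration.

```python
def compare_fasta_and_groups(fasta_lines, group_lines):
    """Count records and unique names in a fasta file and a groups file."""
    fasta_names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                fasta_names.append(fields[0])
    # First column of the groups file is assumed to be the sequence name.
    group_names = [line.split()[0] for line in group_lines if line.split()]
    fasta_set, group_set = set(fasta_names), set(group_names)
    return {
        "fasta_records": len(fasta_names),
        "fasta_unique": len(fasta_set),
        "group_records": len(group_names),
        "group_unique": len(group_set),
        "only_in_fasta": sorted(fasta_set - group_set),
        "only_in_groups": sorted(group_set - fasta_set),
    }


# The totals quoted in the post do add up.
assert 4_666_759 + 3_464_354 == 8_131_113

# Made-up data: "s1" is duplicated in the fasta, "s3" exists only in the groups file.
fasta = [">s1", "ACGT", ">s2", "GGCC", ">s1", "ACGT"]
groups = ["s1\tA", "s2\tA", "s3\tB"]
print(compare_fasta_and_groups(fasta, groups))
```

If "fasta_records" is larger than "fasta_unique", the fasta file itself contains repeated names, which would explain a fasta count that disagrees with a groups file holding only unique names.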

Any idea of what’s going on here?

The group file contains only unique names, but it looks like your fasta file contains duplicates. Can you try this?

mothur > list.seqs(fasta=dio.trim.contigs.good.fasta) - list all the unique names in the fasta file
mothur > get.seqs(fasta=current, accnos=current) - remove any duplicate names, you will see a bunch of warnings
mothur > unique.seqs(fasta=current) - should process without duplicate sequences error

If the resulting fasta file does not match the group file, could you send the dio.trim.contigs.fasta and dio.contigs.groups files to mothur.bugs@gmail.com?
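The same idea can be sketched outside mothur for inspection purposes. The script below keeps the first record seen for each name and drops later ones. That choice is an assumption of this sketch, since the thread does not say which occurrence mothur keeps, and `drop_duplicate_records` is a hypothetical helper name.

```python
def drop_duplicate_records(lines):
    """Yield fasta lines, skipping any record whose name was already seen."""
    seen = set()
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name not in seen  # keep only the first record per name
            seen.add(name)
        if keep:
            yield line


# Made-up input: the second "s1" record is removed.
records = [">s1", "ACGT", ">s2", "GGCC", ">s1", "ACGT"]
print(list(drop_duplicate_records(records)))
```

Because the function is a generator, it can stream a large file line by line and write the kept lines straight to a new file.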


I seem to be having this problem with version 1.45.3. I am positive the .files file does not contain duplicate group names, and everything is paired with the correct fastq.gz file.

make.contigs(file=16slamb.files, processors=16)

summary.seqs(fasta=16slamb.trim.contigs.fasta)

screen.seqs(fasta=16slamb.trim.contigs.fasta, group=16slamb.contigs.groups, maxambig=0, maxlength=575)

unique.seqs(fasta=16slamb.trim.contigs.good.fasta)

[ERROR]: You already have a sequence named M01398_94_000000000-JP475_1_2104_15478_11022 in your fasta file, sequence names must be unique, please correct.

count.seqs(name=16slamb.trim.contigs.good.names, group=16slamb.contigs.good.groups)

[ERROR]: Your count table contains more than 1 sequence named M01398_94_000000000-JP475_1_2104_21625_19031, sequence names must be unique. Please correct.

I will try the suggestion above, but I thought I would let you all know, since it could be a bug or something to do with the fastq.gz files. I am processing sequences from a facility that is using a different protocol than what I am used to:

"primer set 341F 806R was used which covers most of the V3-V4 region. This region is about 460 bp. The sequencing length is 2 x 250 bp. For some samples that need additional reads, we may pool the samples with other samples that need 2 x 300 bp sequencing length. Illumina’s proprietary software will combine the result of the paired-end reads, no matter if it is 2 x 250 bp or 2 x 300 bp, into one sequence."

I wonder if this could result in duplicate sequence names?

Would you mind posting this as a separate thread?