Hi
I am getting the following error while trying to classify nifH query sequences using classify.seqs with custom nifH reference and taxonomy files. I have attached a Dropbox link to the query, reference and taxonomy files, along with a screenshot of the error. I used mothur v.1.40.5.
For example,
AHJU02000025 is already in your taxonomy file, names must be unique
‘AHJU02000025’ is in your template file and is not in your taxonomy file. Please correct.
In short, these errors were reported for all 100 entries present in the reference and taxonomy files.
Dear Westcott,
Thanks. To check whether the problem is caused by the version, I ran the same mothur v1.40.5 on the mcrA taxonomy and reference files given in 10.1016/j.mimet.2014.05.006. It runs properly without any problem. I will send the files to your email address. I'll also try the newest version. Thanks.
Regards,
Dinesh S L
Thanks for sending your files. The references you sent include duplicate lines for several sequences. For example, sequence CP020898 appears on lines 19, 43 and 55. mothur expects the reference sequences to be unique. Removing the duplicates from both files can be done with the list.seqs and get.seqs commands. Here's how:
mothur > list.seqs(fasta=nifH100.fasta) - list unique names in fasta file
mothur > get.seqs(fasta=current, taxonomy=nifH100.taxonomy) - select only the unique names from the references. The command will generate a number of warnings about the duplicate names
mothur > classify.seqs(fasta=nifHqueryseqs.fas, reference=nifH100.pick.fasta, taxonomy=nifH100.pick.taxonomy) - classify your sequences using references without duplicates
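If you want to check custom reference files before running classify.seqs, a small script can report the two problems from the error messages: names that occur more than once, and names present in one file but not the other. This is only a rough sketch, not part of mothur. The function names are made up for illustration, and it assumes the sequence name is the first whitespace-delimited field of each fasta header and of each taxonomy line.

```python
from collections import Counter


def fasta_names(lines):
    """Collect sequence names from fasta header lines (text after '>' up to the first whitespace)."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def taxonomy_names(lines):
    """Collect names from a taxonomy file, assuming the name is the first column."""
    return [line.split()[0] for line in lines if line.strip()]


def duplicates(names):
    """Map each name that occurs more than once to its number of occurrences."""
    return {name: count for name, count in Counter(names).items() if count > 1}


def check_references(fasta_lines, taxonomy_lines):
    """Report duplicate names and names missing from either file."""
    fnames = fasta_names(fasta_lines)
    tnames = taxonomy_names(taxonomy_lines)
    return {
        "duplicate_fasta": duplicates(fnames),
        "duplicate_taxonomy": duplicates(tnames),
        "missing_from_taxonomy": sorted(set(fnames) - set(tnames)),
        "missing_from_fasta": sorted(set(tnames) - set(fnames)),
    }


def check_files(fasta_path, taxonomy_path):
    """Convenience wrapper (hypothetical helper) that reads both files from disk."""
    with open(fasta_path) as fasta, open(taxonomy_path) as taxonomy:
        return check_references(fasta, taxonomy)
```

For the files in this thread, the call would be check_files("nifH100.fasta", "nifH100.taxonomy"). All four entries of the returned report should be empty before the files are used as references.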