mothur crashes using classify.seqs and UNITE database

gbont · January 23, 2016, 12:55am

Hello, I am running into a problem with a dataset of ITS sequences. I ran them through the MiSeq SOP (make.contigs, screen.seqs, unique.seqs, count.seqs, pre.cluster, chimera.uchime, remove.seqs) without aligning them, using needleman during the pre.cluster step. I have now reached the point where I would like to run classify.seqs with the UNITE database.

During the first attempts mothur crashed. I noticed that mothur began to use a lot of RAM. Has anyone experienced this problem with UNITE?
I ran a larger 16S dataset, following the MiSeq SOP (aligning to SILVA) and the sequences were classified without such extreme memory consumption. Could the cause of mothur crashing with UNITE be that the ITS sequences are not aligned? I do not understand this because the reference files for 16S are also not aligned.

Best wishes, guido

dwaite · January 24, 2016, 2:15am

Hm, I’ve run classify.seqs with the UNITE species hypothesis database from here and it worked fine, although I had to format the taxonomy file a bit. Can you paste a few lines from your database fasta and tax files? It could just be a formatting error.

You’re correct that the files don’t need to be aligned, so that won’t be the problem.

gbont · January 25, 2016, 2:12am

Thanks for your reply. You were right, it was a formatting error. I had modified the dataset by adding a few extra sequences. I have now tried the original dataset, and that one works.

hema8689 · July 19, 2016, 2:19pm

Dear All,
I am also encountering a similar problem: mothur crashes while running classify.seqs with the UNITE database. I saw that the problem was solved by formatting the taxonomy file. But how do I format a taxonomy file? It would be great if somebody could help me with this.

Regards,

Hema

dwaite · July 20, 2016, 2:04am

The format for the taxonomy file is just a simple text file where each line has the format:

SequenceID[\t]Domain;Phylum;Class;Order;Family;Genus;Species

Here [\t] stands for a single tab character separating the SequenceID from the taxonomy. You can have as few or as many taxonomic ranks as you like in the taxonomy string, but every line must have the same number of ranks, so if you add subclasses or superfamilies to one sequence, you need to account for this in all the other ones. The DNA sequences should just be in a standard fasta file, with SequenceIDs that match those in the taxonomy file.
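If it helps, the rules above can be checked before running classify.seqs. The following is only a rough Python sketch, not part of mothur: the function names `read_fasta_ids` and `check_taxonomy` are made up for illustration. It assumes the SequenceID in the fasta file is the first whitespace-delimited word after `>`, and it tolerates an optional trailing semicolon on the taxonomy string.

```python
from collections import Counter


def read_fasta_ids(fasta_lines):
    """Collect sequence IDs: the first word after '>' on each header line (assumption)."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def check_taxonomy(tax_lines, fasta_lines):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    depths = {}  # SequenceID -> number of ranks in its taxonomy string

    for n, raw in enumerate(tax_lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            problems.append(f"line {n}: expected exactly one tab, found {len(parts) - 1}")
            continue
        seq_id, taxonomy = parts
        # A trailing ';' is tolerated here and not counted as an extra rank.
        ranks = taxonomy.strip().rstrip(";").split(";")
        if any(not rank.strip() for rank in ranks):
            problems.append(f"line {n}: empty rank for {seq_id}")
        if seq_id in depths:
            problems.append(f"line {n}: duplicate SequenceID {seq_id}")
        depths[seq_id] = len(ranks)

    # Every taxonomy string should have the same number of ranks.
    if depths:
        expected = Counter(depths.values()).most_common(1)[0][0]
        for seq_id, depth in depths.items():
            if depth != expected:
                problems.append(f"{seq_id}: {depth} ranks, but most lines have {expected}")

    # SequenceIDs must match between the fasta and taxonomy files.
    fasta_ids = set(read_fasta_ids(fasta_lines))
    tax_ids = set(depths)
    for seq_id in sorted(fasta_ids - tax_ids):
        problems.append(f"{seq_id}: in fasta but missing from taxonomy")
    for seq_id in sorted(tax_ids - fasta_ids):
        problems.append(f"{seq_id}: in taxonomy but missing from fasta")

    return problems
```

To use it on real files, pass the open file handles, for example `check_taxonomy(open("my.tax"), open("my.fasta"))` with your own file names, and print each returned problem. Any line it reports (a space instead of a tab, a sequence with an extra rank, an ID present in only one file) is worth fixing before classifying.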
