Cannot update count file due to file mismatch, even after using screen.seqs(fasta=*.fasta, count=*.count_table)

Hi everyone,
I would like to ask about a file mismatch error that appears when I run summary.seqs(fasta=*.fasta, count=*.count_table).
I used pre.cluster(fasta=*, name=*, group=*) to generate the current fasta file. Before running pre.cluster(), I already had a count file.
Following the MiSeq SOP on the mothur wiki, I planned to use screen.seqs() to update the count file so that it matches *.precluster.fasta.

From the MiSeq SOP on the mothur wiki:

“Note that we need the count table so that we can update the table for the sequences we’re removing and we’re also using the summary file so we don’t have to figure out again all the start and stop positions:
mothur > screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=1968, end=11550, maxhomop=8)
mothur > summary.seqs(fasta=current, count=current)”
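The bookkeeping the SOP describes can be pictured with a toy model. This is a minimal Python sketch, not mothur's code: it assumes the count table is held as a dict mapping each unique name to its per-group counts, and that screen.seqs only deletes the rows of sequences that fail the screen (the names written to *.bad.accnos). Under that assumption, a name that is already missing from the fasta for another reason (for example, because pre.cluster merged it away) survives the screen, so the mismatch remains.

```python
from typing import Dict, Iterable

# Toy representation (an assumption, not mothur's file format):
# unique sequence name -> {group name: read count}
CountTable = Dict[str, Dict[str, int]]


def drop_screened(count_table: CountTable, bad_names: Iterable[str]) -> CountTable:
    """Return a copy of count_table without the rows named in bad_names."""
    bad = set(bad_names)
    return {name: dict(groups) for name, groups in count_table.items() if name not in bad}


counts = {
    "seqA": {"g1": 10, "g2": 4},
    "seqB": {"g1": 1},  # fails the screen, so it is listed in *.bad.accnos
    "seqC": {"g2": 2},  # hypothetically merged into seqA earlier, so absent from the fasta
}
fasta_after_screen = {"seqA"}

screened = drop_screened(counts, ["seqB"])
print(sorted(screened))                    # ['seqA', 'seqC']
print(set(screened) - fasta_after_screen)  # {'seqC'}: the mismatch survives
```

Removing only the failed rows is enough when the fasta and count table were in sync beforehand; it cannot repair a count table that already disagreed with the fasta.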

But after running screen.seqs(fasta=*.fasta, count=*.count_table), summary.seqs() still reports the error “File mismatch detected, quitting command.”:
“[ERROR]: Your count file contains 517431 unique sequences, but your fasta file contains 467156. File mismatch detected, quitting command.”
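One way to see which names cause the mismatch is to diff the names in the two files directly. The helper below is hypothetical (not part of mothur) and assumes an uncompressed, tab-delimited count_table whose header row starts with Representative_Sequence, and fasta headers of the form >name optionally followed by other text.

```python
def fasta_names(path):
    """Collect sequence names from '>' header lines (text after the first whitespace is ignored)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def count_table_names(path):
    """Collect names from the first column of a tab-delimited count_table."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and any '#' metadata lines
            first = line.split("\t", 1)[0].strip()
            if first != "Representative_Sequence":  # skip the header row
                names.add(first)
    return names


def compare(fasta_path, count_path):
    """Summarise which unique names appear in only one of the two files."""
    in_fasta = fasta_names(fasta_path)
    in_count = count_table_names(count_path)
    return {
        "fasta_unique": len(in_fasta),
        "count_unique": len(in_count),
        "only_in_count": sorted(in_count - in_fasta),
        "only_in_fasta": sorted(in_fasta - in_count),
    }
```

Usage would look like compare("final.fasta", "final.count_table"), with hypothetical file names. In both logs in this thread the count table has 50,275 more unique names than the fasta (517,431 - 467,156 and 1,465,894 - 1,415,619), so if the fasta names are a subset of the count table names, only_in_count would list those 50,275 names.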

I also tried building a new count file from the fasta and name files created by pre.cluster() together with my earlier group file.
After running screen.seqs(fasta=*.precluster.fasta, name=*.precluster.names, group=mergegroup), count.seqs(name=current, group=current) still reports a file mismatch error.

My question is:
How can I update the count file so that it matches the fasta file?
Thanks again for your time! If you have any suggestions or ideas, I would love to hear them :)

Best wishes,
Coke

Below is some log information:

ERROR 1:

mothur > screen.seqs(processors=16,optimize=start-end, criteria=20, fasta=/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodprecluster,count=/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totalnamescount_tablegoodgoodgood)

Using 16 processors.
Optimizing start to 13862.
Optimizing end to 21780.

Output File Names:
/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodpreclustergood
/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodpreclusterbad.accnos
/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totalnamescount_tablegoodgoodgoodgood


It took 3336 secs to screen 1415619 sequences.

mothur > summary.seqs(fasta=current, count=current)
Using /home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totalnamescount_tablegoodgoodgoodgood as input file for the count parameter.
Using /home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodpreclustergood as input file for the fasta parameter.

Using 16 processors.
[ERROR]: Your count file contains 517431 unique sequences, but your fasta file contains 467156. File mismatch detected, quitting command.

ERROR 2:

mothur > count.seqs(name=current, group=current)
Using /home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/mergegroupsgoodgood as input file for the group parameter.
Using /home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodprecluster.good.names as input file for the name parameter.

Using 16 processors.
[ERROR]: processes reported processing 1443389 sequences, but group file indicates you have 50195626 sequences. Either you have a file mismatch or a process failed to complete the task assigned to it.
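The second error appears to compare the number of reads covered by the name file with the number of reads listed in the group file (1,443,389 vs 50,195,626 here). A hypothetical helper for checking this outside mothur, assuming the usual two-column formats (name file: representative name, a tab, then a comma-separated list of names; group file: sequence name, whitespace, group):

```python
def name_group_totals(names_path, group_path):
    """Count reads covered by a name file and reads listed in a group file."""
    covered = set()
    with open(names_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                # second column: every read name represented by this unique sequence
                covered.update(n for n in fields[1].split(",") if n)

    group_total = 0
    missing_from_names = 0
    with open(group_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            group_total += 1
            if fields[0] not in covered:
                missing_from_names += 1

    return {
        "names_total": len(covered),
        "group_total": group_total,
        "missing_from_names": missing_from_names,
    }
```

A large missing_from_names value would suggest that the name file and the group file come from different stages of the pipeline.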

I might be missing something, but you aren’t using the count_table in pre.cluster. This needs to be done. I would strongly encourage you to follow the exact commands used in the MiSeq SOP.

pat
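Passing the count_table to pre.cluster matters because pre.cluster merges rare sequences into more abundant ones, and the reads of a merged sequence should be credited to the sequence it joined rather than dropped. A toy Python model of that bookkeeping, using a hypothetical merge map (removed name to kept name) and the same dict representation of a count table as above; it is not mothur's implementation:

```python
from typing import Dict

# Toy representation (an assumption): unique name -> {group: read count}
CountTable = Dict[str, Dict[str, int]]


def fold_merged(count_table: CountTable, merged_into: Dict[str, str]) -> CountTable:
    """Add each merged sequence's per-group counts to the sequence it was merged into.

    merged_into maps a removed name to the kept name; chains (a -> b -> c)
    are followed so the counts always end up on a sequence that was kept.
    """
    result: CountTable = {}
    for name, groups in count_table.items():
        target = name
        seen = set()
        while target in merged_into:
            if target in seen:
                raise ValueError(f"cycle in merge map at {target!r}")
            seen.add(target)
            target = merged_into[target]
        bucket = result.setdefault(target, {})
        for group, n in groups.items():
            bucket[group] = bucket.get(group, 0) + n
    return result


counts = {"seqA": {"g1": 10, "g2": 4}, "seqB": {"g1": 1}, "seqC": {"g2": 2}}
print(fold_merged(counts, {"seqC": "seqA"}))
# {'seqA': {'g1': 10, 'g2': 6}, 'seqB': {'g1': 1}}
```

Total reads are preserved. Filtering an old count table down to the names left in *.precluster.fasta would instead discard the merged reads, which is why re-running pre.cluster with count= is the safer route.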

Dear pschloss
Thanks for the reply.

At first, I followed the MiSeq SOP and used pre.cluster(fasta, count).
Whenever I run pre.cluster(fasta, count), I hit one of two problems
(the fasta file is ~68 GB and the count_table is ~6 GB):

  1. RAM usage keeps climbing until it reaches 99% (256 GB of RAM in total, with only 1 processor used by pre.cluster()). Then mothur starts pre.cluster, but it eventually reports an error and hangs. (Running summary.seqs(fasta, count) on the same files shows no problem.)

"Processing group ERR1867272:
1595 1136 459
Total number of sequences before pre.cluster was 1595.
pre.cluster removed 459 sequences.

It took 1 secs to cluster 1595 sequences.
[ERROR]: Could not open 6782.outputNames.temp"

  2. mothur stops with “^\Quit (core dumped)”, but no core dump file is written.
I would like to know how to make mothur write a core dump file, and where to find it, when it prints “^\Quit (core dumped)” and stops running.

I followed https://stackoverflow.com/questions/6152232/how-to-generate-core-dump-file-in-ubuntu
and was able to generate core.26646 that way, but mothur does not seem to generate any core.* file.
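Whether a crashing process leaves a core.* file on Linux depends on two things: the core-size resource limit the process inherits (what ulimit -c reports) and the kernel setting in /proc/sys/kernel/core_pattern. On many Ubuntu systems core_pattern starts with '|', which pipes cores to a handler such as apport instead of writing them into the working directory. A minimal sketch that inspects both settings and starts a command with the soft limit raised to the hard limit; the function names are hypothetical:

```python
import resource
import subprocess
from pathlib import Path


def core_dump_settings():
    """Return (soft_limit, hard_limit, core_pattern) for this process on Linux."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError:
        pattern = None  # not Linux, or the file is not readable
    return soft, hard, pattern


def _raise_core_limit():
    # Runs in the child just before exec: raise the soft core-size limit
    # to the hard limit (an unprivileged process is allowed to do this).
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def launch_with_core_dumps(cmd):
    """Run cmd (a list of arguments) with core dumps enabled up to the hard limit."""
    return subprocess.run(cmd, preexec_fn=_raise_core_limit)


if __name__ == "__main__":
    soft, hard, pattern = core_dump_settings()
    print("core size limit (soft, hard):", soft, hard)  # RLIM_INFINITY means unlimited
    print("core_pattern:", pattern)  # a leading '|' means cores go to a handler
```

For example, launch_with_core_dumps(["mothur", "batch_file.txt"]), where the batch file name is hypothetical. The raised limit only applies to processes started afterwards, so a mothur session that is already running is unaffected.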

I also tried screen.seqs(fasta, count, processors=1) followed by summary.seqs(fasta, count), and that didn't work either.
Maybe I should retry pre.cluster(fasta, count, processors=?) first?

Linux version

Using ReadLine

Running 64Bit Version

mothur v.1.39.5
Last updated: 3/20/2017

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

For questions and analysis support, please visit our forum at https://www.mothur.org/forum

Type ‘quit()’ to exit program
Interactive Mode


mothur > screen.seqs(processors=1, fasta=/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodprecluster,count=/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totalnamescount_tablegoodgood)

Using 1 processors.

Output File Names:
/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodpreclustergood
/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodpreclusterbad.accnos
/home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totalnamescount_tablegoodgoodgood


It took 3117 secs to screen 1415619 sequences.

mothur > summary.seqs(fasta=current, count=current)
Using /home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totalnamescount_tablegoodgoodgood as input file for the count parameter.
Using /home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodpreclustergood as input file for the fasta parameter.

Using 1 processors.

quitting command…

mothur > summary.seqs(fasta=current, count=current,processors=14)
Using /home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totalnamescount_tablegoodgoodgood as input file for the count parameter.
Using /home/ubuntu/NEW/data/Moving_pictures_of_the_human_microbiome_ID_550/Mothur/totaluniquegoodgoodgoodaligngooduniquegoodpreclustergood as input file for the fasta parameter.

Using 14 processors.
[ERROR]: Your count file contains 1465894 unique sequences, but your fasta file contains 1415619. File mismatch detected, quitting command.

mothur > quit


************************************************************ ************************************************************ ************************************************************ Detected 1 [ERROR] messages, please review. ************************************************************ ************************************************************ ************************************************************

It looks like pre.cluster is running into memory issues with your dataset and the number of processors. What’s the output of summary.seqs on the files you want to run with pre.cluster? Have you tried pre.cluster with fewer processors?