Pairwise.seqs taking long with ITS

Nicolas · December 13, 2020, 1:39pm

Good day
I have a frustrating little issue with the pairwise.seqs command: it is taking extremely long to run. Normally it doesn't run for longer than two days on my local machine, but this run took several days there, so I submitted it as a job on my university's cluster. I ran out of my week's allocated time last night. On Friday I submitted another job and set the processors to 46. That job is still running.
This is for an ITS dataset and the command that I’m using is:
mothur > pairwise.seqs(fasta=final.fasta, cutoff=0.05)
and for my job submission:
pairwise.seqs(fasta=final.fasta, cutoff=0.05, processors=48)

If I go and check in the folder that the job is running in, I see that there is already a final.dist file, but I’m assuming this file is not ready to work with until the job has finished running?

This problem is seriously affecting the timeline of my workflow, so I would greatly appreciate any insight regarding this matter.

Best
Nicolas

pschloss · December 14, 2020, 6:14pm

Hi,

I'm sorry it's taking so long. How many sequences do you have to analyze? I'm not sure that we can speed it up for you, but I suspect you have a lot more sequences than you typically do.

Thanks,
Pat

Nicolas · December 14, 2020, 9:04pm

Thanks for getting back to me on this, Pat. My final fasta file has 10,370,208 sequences. Yes, I think this is a bit more than I normally have.

Best
Nicolas

pschloss · December 17, 2020, 12:13pm

Sorry, with over 10 million unique sequences, I'm afraid it's going to be slow going.
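To put "slow going" in numbers: comparing every sequence against every other one means the work grows with the number of unordered pairs, n(n - 1)/2, so it scales roughly quadratically. The short sketch below is only arithmetic on the sequence count reported above; the "10x smaller dataset" is a hypothetical comparison point, not a figure from this thread.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered sequence pairs: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def relative_cost(n_new: int, n_old: int) -> float:
    """How many times more pairwise comparisons n_new sequences need than n_old."""
    return pairwise_comparisons(n_new) / pairwise_comparisons(n_old)


n_seqs = 10_370_208  # unique sequences reported in this thread
print(f"{pairwise_comparisons(n_seqs):,} comparisons")  # prints 53,770,601,796,528 comparisons

# Hypothetical comparison point: a dataset one tenth the size
print(f"{relative_cost(n_seqs, n_seqs // 10):.1f}x the work of a 10x smaller dataset")
```

In other words, ten times more unique sequences means roughly a hundred times more comparisons, which is why adding processors alone does little to close the gap.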

Pat

leocadio · December 17, 2020, 6:41pm

Also, please note that your distance file is going to be huge, likely several TB.
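A rough back-of-the-envelope check of that size, assuming the distance file stores about one text line per retained pair (two sequence names and a distance). Both the bytes per line and the fraction of pairs falling within the cutoff are assumptions chosen for illustration, not values measured on this dataset.

```python
def estimated_dist_file_bytes(n_seqs: int, fraction_below_cutoff: float,
                              bytes_per_line: int) -> float:
    """Estimate distance-file size: retained pairs times bytes per written line."""
    pairs = n_seqs * (n_seqs - 1) // 2
    return pairs * fraction_below_cutoff * bytes_per_line


n = 10_370_208        # unique sequences reported in this thread
bytes_per_line = 50   # ASSUMPTION: two read names, a distance, separators, newline

# ASSUMPTION: share of all pairs whose distance falls within cutoff=0.05
for frac in (0.001, 0.01, 0.1):
    tb = estimated_dist_file_bytes(n, frac, bytes_per_line) / 1e12
    print(f"{frac:>6.1%} of pairs kept -> ~{tb:,.1f} TB")
```

Under these assumptions, even if only about 0.1% of pairs pass the cutoff, the file is already in the terabyte range.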

Nicolas · December 18, 2020, 11:03am

Thanks for the responses, Pat and Leocadio.
That is good to know. Would you say those numbers look a bit unusual for 300 soil samples? This is just for ITS.

leocadio · December 21, 2020, 12:10am

I think that, without any denoising, that wouldn't be that strange.

Nicolas · December 24, 2020, 6:18am

Thanks for the feedback, everyone
