mothur

How to reduce the run time of pairwise.seqs

I am running the pairwise.seqs command on fungal sequences and would appreciate your advice. The problem is that the dataset is very large and the command takes too long, even with 40 processors. The job timed out after 10 days. The summary of the dataset is:

             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      230  230     0       3        1
2.5%-tile:   1      230  230     0       3        650365
25%-tile:    1      230  230     0       4        6503644
Median:      1      230  230     0       5        13007288
75%-tile:    1      230  230     0       5        19510931
97.5%-tile:  1      230  230     0       7        25364210
Maximum:     1      230  230     0       8        26014574
Mean:        1      230  230     0       4

# of unique seqs: 489119

total # of seqs: 26014574

My question is: is there any way to solve this problem?

Many thanks,

Hui

Hi Hzsun,
I have used the dist.seqs command myself. I guess your goal is similar: to get a distance matrix. I may be wrong, but my suggestion is to reduce the workload. Run the command on unique sequences only, set the cutoff to 0.10, and think about what information you need the distance matrix to store, then limit the work to just that.

Sigmund
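
For readers wondering why working on unique sequences matters so much, here is a minimal Python sketch. It is not how mothur implements anything; the function names and the toy reads are made up for illustration. It collapses identical reads into unique representatives with abundances and shows how the number of pairs to compare shrinks.

```python
from collections import Counter

def collapse_duplicates(seqs):
    """Return a mapping {sequence: abundance} for an iterable of sequence strings."""
    return Counter(seqs)

def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Toy reads for illustration only; they are not taken from the dataset in this thread.
reads = ["ACGT", "ACGT", "ACGA", "TTGA", "ACGT", "TTGA"]
unique = collapse_duplicates(reads)

print(len(reads), "reads collapse to", len(unique), "unique sequences")
print(pair_count(len(reads)), "pairs shrink to", pair_count(len(unique)), "pairs")
```

Because the number of pairs grows with the square of the number of sequences, every duplicate removed before the distance step saves far more than one comparison.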

Hi Sigmund,
Thanks for the suggestion. I guess dist.seqs should work as well. I also set the cutoff to 0.10 in the pairwise.seqs command, and it is run on the unique sequences. I just want to confirm whether dist.seqs would be faster than pairwise.seqs on the same dataset.
Thanks

Hui

If the sequences are ITS and they are not aligned, then dist.seqs will not work. You’ll have to use pairwise.seqs. Sorry it is slow - we know and are working on some solutions, but they won’t be ready for a while yet. I’m afraid that with 490k sequences, any distance matrix you generated would be gigantic. How did you get all of your ITS sequences to be 230 nt?

Pat
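
To put a rough number on "gigantic", here is a back-of-the-envelope sketch in Python. It counts the unordered pairs among the 489,119 unique sequences reported in the summary (about 1.2 x 10^11) and converts a number of stored distances into an approximate file size. The `bytes_per_entry` default and the `kept_fraction` value are assumptions chosen for illustration, not measurements of mothur's output. How many pairs actually fall under a given cutoff depends on the data.

```python
def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def estimated_size_gib(n_distances, bytes_per_entry=30):
    """Rough size in GiB of a file holding one line per stored distance.

    bytes_per_entry is an assumed average line length, not a mothur figure.
    """
    return n_distances * bytes_per_entry / 2**30

n_unique = 489_119  # "# of unique seqs" from the summary in this thread
pairs = pair_count(n_unique)
print(f"pairwise comparisons: {pairs:,}")
print(f"all distances stored: ~{estimated_size_gib(pairs):,.0f} GiB")

# Hypothetical scenario: only 0.1% of pairs fall under the cutoff and are written.
kept_fraction = 0.001  # assumption for illustration only
kept = int(pairs * kept_fraction)
print(f"0.1% of distances stored: ~{estimated_size_gib(kept):,.1f} GiB")
```

This only addresses the size of the output. It says nothing about how long the comparisons themselves take.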

Hi Pat,

Thanks for the information. We used the ITS1 region, and because of the large number of samples and the huge dataset, we trimmed all the sequences to 230 nt. This for sure affects the downstream analysis.

Is it possible to set cutoff=0.03 in the pairwise.seqs command? That is the only level we are interested in.
Thanks,

Hui

Hi Hui - you should be able to set cutoff=0.03

Pat