I am running the pairwise.seqs command on fungal sequences and would appreciate your advice. The problem is that the dataset is too large and the command takes too long, even though I used 40 processors. The job timed out after 10 days… The info on the dataset is:
Hi Hzsun,
I have used the dist.seqs command myself. I guess your goal is similar: to get a distance matrix. I may be wrong, but my suggestion is to reduce the workload. Run the command on unique sequences only, set the cutoff to 0.10, and think about what information you need your distance matrix to store, then limit your workload to only that.
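To illustrate why running on unique sequences only reduces the workload, here is a small Python sketch (not mothur code). Identical reads are collapsed so that each distinct sequence is compared only once. The function names `collapse_duplicates` and `n_pairs` and the toy reads are made up for this example; in mothur itself the deduplication step is done by the unique.seqs command.

```python
from collections import Counter


def collapse_duplicates(seqs):
    """Map each distinct sequence to the number of times it occurs."""
    return Counter(seqs)


def n_pairs(n):
    """Number of unordered pairs among n sequences."""
    return n * (n - 1) // 2


# Toy reads, invented for illustration only.
reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TTGA", "ACGA"]
unique = collapse_duplicates(reads)

print(len(reads), "reads ->", len(unique), "unique sequences")
print(n_pairs(len(reads)), "pairs ->", n_pairs(len(unique)), "pairs")
```

Because the number of pairwise comparisons grows roughly with the square of the number of sequences, even a modest reduction in sequence count can cut the work considerably.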
Hei, Sigmund,
Thanks for the suggestion. I guess dist.seqs should work as well. I also set the cutoff to 0.10 in the pairwise.seqs command, in which the unique sequences are used. I just want to confirm whether dist.seqs would be faster than pairwise.seqs on the same dataset.
Thanks
If the sequences are ITS and they are not aligned, then dist.seqs will not work. You’ll have to use pairwise.seqs. Sorry it is slow - we know and are working on some solutions, but they won’t be ready for a while yet. I’m afraid that with 490k sequences, any distance matrix you generate would be gigantic. How did you get all of your ITS sequences to be 230 nt?
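As a rough back-of-the-envelope check on "gigantic", the Python sketch below counts the unordered pairs among 490k sequences. This is an upper bound on the number of comparisons, not the size of the file mothur would write: a cutoff discards distances above the threshold, so the stored matrix may be much smaller. The function name `pair_count` is made up for this example.

```python
def pair_count(n):
    """Number of unordered sequence pairs, n * (n - 1) / 2."""
    return n * (n - 1) // 2


n_seqs = 490_000  # sequence count mentioned in this thread
pairs = pair_count(n_seqs)

# Prints the pair count with thousands separators (about 1.2e11 pairs).
print(f"{n_seqs:,} sequences -> {pairs:,} pairwise comparisons")
```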
Thanks for the information. We used the ITS1 region, and because of the large number of samples and the huge dataset, we trimmed all the sequences to 230 nt. This for sure affects the downstream analysis.
Is it possible to set cutoff=0.03 in the pairwise.seqs command, since that is the only range we are interested in?
Thanks,