Dist.seqs (set RAM?)

joelio1616 · March 20, 2019, 2:45pm

Hi there,

TL;DR version.

I'm not looking at the microbiome, but at songbird MHC (highly duplicated genes in this taxon).
I can only use 1 processor for dist.seqs: with more than one, mothur uses all 24 GB of RAM on my PC and shuts down.
I have to use a cutoff of 0.0 because even one SNP may be important.
The filtered, aligned FASTA file has ~2.3 million sequences.
dist.seqs has been running for days and has stopped at sequence 390,800.
Can you set how much RAM dist.seqs may use so mothur won’t close, and so I can use more processors?

I’m using mothur to identify the MHC genes of each individual, in this case a songbird. In a sense, you can consider each individual as a bacterial community and each MHC allele as an OTU. Since songbirds have incredibly diverse MHC class II allele repertoires, I need to use a program such as mothur.

So far I have tailored the MiSeq SOP to my MHC analysis, but I am running into trouble at dist.seqs. My computer has 8 processors and 24 GB of RAM, and if I run with more than 1 processor, it maxes out the RAM and shuts down mothur. It’s also important to note that I cannot use a nonzero cutoff, since even a single SNP may be important in determining allelic variation at MHC, so I had to set it to 0.

Is there any way to tell this command to use a set amount of RAM so it won’t shut down, but it will also speed up? The filtered aligned FASTA file has roughly 2.3 million sequences, and it has stalled at 390,800.

Any help is appreciated.
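
For a sense of scale, the short Python sketch below counts how many sequence pairs an all-against-all distance calculation has to visit for roughly 2.3 million sequences. It is plain arithmetic and assumes every unordered pair is compared once; it says nothing about how mothur schedules or stores the work.

```python
def pairwise_comparisons(n_sequences: int) -> int:
    """Number of unordered sequence pairs: n * (n - 1) / 2."""
    return n_sequences * (n_sequences - 1) // 2


n = 2_300_000  # approximate sequence count quoted in the post
pairs = pairwise_comparisons(n)
print(f"{pairs:,} pairs")  # about 2.6 trillion
```

At that size the run time grows with the square of the sequence count, so halving the number of sequences cuts the work to roughly a quarter.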

joelio1616 · March 20, 2019, 10:24pm

Consider this resolved! I’m going to have to use a server for sure.

pschloss · March 21, 2019, 4:56pm

If you’re using a threshold of 0, you should be able to use make.shared with a count file:

mothur > make.shared(count=amazon.count_table, label=0.03)

That should be considerably lighter and will not require dist.seqs or cluster.
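
To illustrate why a threshold of 0 does not need pairwise distances: if an OTU is just a set of identical sequences, the groups can be found in one pass with a dictionary. The sketch below is a conceptual illustration in Python, not mothur's implementation. It assumes aligned sequences are compared as exact strings (mothur may treat gaps or ambiguous bases differently), and the function and read names are hypothetical.

```python
from collections import defaultdict


def group_identical(seqs: dict[str, str]) -> dict[str, list[str]]:
    """Group read names by exact sequence string in a single pass."""
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[seq].append(name)
    return dict(groups)


# Hypothetical aligned reads; read1 and read2 are identical.
reads = {"read1": "ACGT-A", "read2": "ACGT-A", "read3": "ACGTTA"}
groups = group_identical(reads)
for seq, names in groups.items():
    print(seq, len(names), ",".join(names))
```

The work here grows linearly with the number of reads, rather than with the square of it as in an all-against-all comparison.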

joelio1616 · March 21, 2019, 5:21pm

Hi Pat,

Apologies if this is an overly simple question – still new to using mothur.

With my filtered and aligned file, I am having difficulty finding the commands to remake a count table that contains the new (and fewer) sequences. The only count table I see is the one formed earlier, at the start of the MiSeq SOP.

Thanks in advance!

joelio1616 · March 28, 2019, 1:49pm

So I was able to create a shared file. Thank you. However, how would I go about cross-referencing the OTU names with the sequences, which are named in this format: “M03127_554_000000000-C89WG_1_2111_6953_9784”?

Thanks!

pschloss · March 28, 2019, 2:49pm

Those would be in the list file generated in the cluster commands.
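
A minimal Python sketch of that lookup, assuming the list file is tab-delimited with a header row (label, numOtus, then one column per OTU name) and one data row per label in which each OTU column holds comma-separated sequence names. That layout is an assumption to check against your own file, and the function name and example names are hypothetical.

```python
def otu_members(header_line: str, data_line: str) -> dict[str, list[str]]:
    """Map each OTU name to the sequence names listed in its column."""
    # Skip the first two columns (label, numOtus) in both rows.
    otu_names = header_line.rstrip("\n").split("\t")[2:]
    members = data_line.rstrip("\n").split("\t")[2:]
    return {otu: names.split(",") for otu, names in zip(otu_names, members)}


# Hypothetical two-OTU list file content.
header = "label\tnumOtus\tOtu1\tOtu2"
row = "0.03\t2\tseqA,seqB\tseqC"
lookup = otu_members(header, row)
print(lookup["Otu1"])
```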

joelio1616 · April 3, 2019, 9:52pm

This helped greatly! Thank you. The one issue I am having is the vast number of OTUs (1.4 million), most of which are probably singletons among my samples. I found some information regarding singletons in the clustering wiki, but I’m not sure that’s what I need to do. Is there a way to eliminate all of the singleton OTUs in a shared file? Because the file has so many columns, I cannot open it in any program without it stalling or shutting down.
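
One way to do this outside mothur is a short script that drops every OTU column whose total count across samples is below 2. The sketch below assumes the shared file is tab-delimited with a header row whose first three columns are label, Group and numOtus, followed by one count column per OTU. The function name and the example data are hypothetical, and it is a sketch under those assumptions rather than a tested replacement for a mothur command.

```python
def drop_singleton_otus(lines: list[str], min_total: int = 2) -> list[str]:
    """Keep only OTU columns whose summed count is at least min_total."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    header, samples = rows[0], rows[1:]
    # Columns 0-2 are label, Group, numOtus; OTU counts start at column 3.
    keep = [
        col for col in range(3, len(header))
        if sum(int(row[col]) for row in samples) >= min_total
    ]
    out = ["\t".join(header[:3] + [header[c] for c in keep])]
    for row in samples:
        # Rewrite numOtus so it matches the number of columns kept.
        out.append("\t".join(row[:2] + [str(len(keep))] + [row[c] for c in keep]))
    return out


# Hypothetical shared file: Otu2 is a singleton and Otu3 is empty.
shared = [
    "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3",
    "0.03\tbirdA\t3\t5\t1\t0",
    "0.03\tbirdB\t3\t2\t0\t0",
]
filtered = drop_singleton_otus(shared)
print("\n".join(filtered))
```

For a real file, the lines could be read with `open(path).readlines()` and the result written back out. The script holds the whole table in memory, which should be manageable when there are few samples even with many OTU columns.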

system · April 13, 2019, 9:55pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.