How big should the *.unique.filter.phylip.fn.list file be?

mhmtgenc · May 2, 2016, 11:56am

I have a 26 GB *unique.filter.phylip.dist file and ran the cluster command to get *.unique.filter.phylip.fn.list … It has been running for 5 hours now and shows this:

********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

and I am still waiting. How big will the .list file be, and will it take much longer?

pschloss · May 4, 2016, 12:50pm

It looks like you are using a phylip-formatted distance matrix with furthest neighbor (why not average?). Are you using cluster.classic, cluster, or cluster.split?

If you have to run a phylip matrix like this, it could take a while depending on how similar the sequences are. I’d estimate that it could take up to 2 weeks. Alternatively, I’d suggest using the cluster.split approach as outlined in the MiSeq SOP on the wiki.
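To get a feel for why a matrix of this size is slow to cluster, here is a rough back-of-the-envelope sketch in Python that turns a file size into an approximate number of sequences. It is only an estimate under stated assumptions: the figure of about 7 bytes per stored distance (something like `0.1234` plus a separator) is a guess, the space taken by sequence names is ignored, and the function name `estimate_sequence_count` is made up for illustration and is not part of mothur.

```python
import math


def estimate_sequence_count(file_bytes, bytes_per_distance=7.0, square=True):
    """Roughly estimate how many sequences a phylip distance matrix holds.

    bytes_per_distance is an ASSUMPTION (about 7 bytes per value such as
    "0.1234" plus a separator); sequence-name bytes are ignored.
    """
    entries = file_bytes / bytes_per_distance
    if square:
        # A square matrix stores N * N distances.
        return int(math.sqrt(entries))
    # A lower-triangle matrix stores N * (N - 1) / 2 distances,
    # so solve N^2 - N - 2 * entries = 0 for the positive root.
    return int((1 + math.sqrt(1 + 8 * entries)) / 2)


matrix_bytes = 26 * 10**9  # the 26 GB file from the question

n_square = estimate_sequence_count(matrix_bytes, square=True)
n_lower = estimate_sequence_count(matrix_bytes, square=False)

print(f"square matrix:  roughly {n_square:,} unique sequences")
print(f"lower triangle: roughly {n_lower:,} unique sequences")
```

Either way the estimate lands in the tens of thousands of unique sequences, which means billions of pairwise distances to read. Splitting the data before clustering, as cluster.split does, keeps each individual matrix far smaller.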

Pat

mhmtgenc · May 5, 2016, 8:34am

First of all, thank you for your quick reply.

I used the cluster command and picked furthest neighbor at random because I couldn't really tell which method to use. In the end I got this table, which only goes to genus level, but I would like to reach species level. How could I do that, Dr. Schloss? Here is part of the table; does everything look OK?

OTU Size Taxonomy
Otu00001 25137 Bacteria(100) Proteobacteria(100) Betaproteobacteria(100) Burkholderiales(100) Alcaligenaceae(100) Bordetella(99)
Otu00002 15182 Bacteria(100) Proteobacteria(100) Alphaproteobacteria(100) Rhodobacterales(100) Rhodobacteraceae(100) Paracoccus(100)
Otu00003 2758 Bacteria(100) Proteobacteria(100) Betaproteobacteria(100) Burkholderiales(100) Alcaligenaceae(100) Bordetella(99)
Otu00004 2493 Bacteria(100) Proteobacteria(100) Alphaproteobacteria(100) Rhodobacterales(100) Rhodobacteraceae(100) Paracoccus(100)
Otu00005 1516 Bacteria(100) Proteobacteria(100) Betaproteobacteria(100) Burkholderiales(100) Alcaligenaceae(100) Bordetella(77)
Otu00006 1436 Bacteria(100) Proteobacteria(100) Alphaproteobacteria(61) Rhodobacterales(52) Rhodobacteraceae(52) unclassified(100)
.
…
…

pschloss · May 10, 2016, 12:17pm

I would encourage you to follow the SOPs that are available on the wiki. The default for cluster and cluster.split is average neighbor, which is far preferred to any other method. For your other question, see your other post: How can I classify OTUs to "SPECIES" level with mothur?
