Creating a tree for UniFrac


We are trying to create a tree in order to perform a UniFrac analysis. We used our .dist file in the first of the three commands (cluster, make.shared, and tree.shared) and got the following error message:
“expected a number and got xxx. I suspect you entered a column formatted file as a phylip file, aborting.
No valid current files. You must provide a phylip or column file before you can use the cluster command.”

Thank you for your help

By default, the dist.seqs command creates a column-formatted distance matrix. It sounds like you passed the column-formatted file to the phylip parameter.

You wanted:

cluster(column=yourDistanceFile, name=yourNamefile)

Or, if you don’t have a name file, create a phylip-formatted matrix instead and give it to cluster with the phylip parameter:

dist.seqs(fasta=yourFastaFile, output=lt)

Thank you for the clarification.

Now I have a new problem:
I created the dist file as you suggested, but when I ran the cluster command, mothur could not complete the run and the process was “killed”. I have only 20000 sequences after all the clean-up and pre-clustering steps, so I am not sure why it cannot finish.
I am working with the most recent version of mothur for Linux and am not sure whether the problem is in the installation.

Thank you

You are most likely running out of memory. How big is the distance matrix file? How much RAM does your computer have? Are you running the 32-bit or 64-bit version of mothur? Did you use a cutoff? Here is a link to mothur’s memory estimates for the cluster command. And from our frequently asked questions:

Mothur crashes when I read my distance file

There are two common causes for this, file size and format.

File Size:

The cluster command loads your distance matrix into RAM, and your distance file is most likely too large to fit in RAM. There are two options to help with this. The first is to use a cutoff: with a cutoff, mothur only loads distances that are below the cutoff. If that is still not enough, there is a command called cluster.split, which divides the distance matrix and clusters the smaller pieces separately. You may also be able to reduce the size of the original distance matrix by using the commands outlined in the Schloss SOP.
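To get a feel for why a cutoff helps, here is a rough Python sketch. It counts how many pairwise distances a full comparison of n sequences involves, and how many rows of a column-formatted file fall below a cutoff. The function names are made up for illustration, the rows are assumed to be whitespace-separated `name1 name2 distance`, and this is not how mothur itself does the filtering (whether mothur includes distances exactly equal to the cutoff is not checked here).

```python
def pair_count(n_seqs):
    """Number of distinct sequence pairs in a full pairwise comparison."""
    return n_seqs * (n_seqs - 1) // 2


def count_below_cutoff(lines, cutoff):
    """Return (kept, total) for rows of the form 'name1 name2 distance'.

    A row is 'kept' when its distance is strictly below the cutoff.
    """
    kept = 0
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        total += 1
        if float(fields[2]) < cutoff:
            kept += 1
    return kept, total


# Toy data standing in for a column-formatted distance file (hypothetical names).
example = ["seqA seqB 0.02", "seqA seqC 0.15", "seqB seqC 0.30"]

print(pair_count(20000))                  # pairs among 20000 sequences
print(count_below_cutoff(example, 0.20))  # rows kept vs. rows read
```

For a real file you could pass an open file handle as `lines`, since the function reads one row at a time and never holds the whole matrix in memory.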

Wrong Format:

This error can be caused by trying to read a column formatted distance matrix using the phylip parameter. By default, the dist.seqs command generates a column formatted distance matrix. To make a phylip formatted matrix set the dist.seqs command parameter output to lt.
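If you are not sure which format a distance file is in, looking at its first line is usually enough: a phylip-formatted matrix starts with a line holding only the number of sequences, while a column-formatted file has three fields per row (two sequence names and a distance). A minimal Python sketch of that check follows; the function names are hypothetical and this is only a heuristic, not mothur's own parser.

```python
def guess_distance_format(first_line):
    """Guess 'phylip', 'column', or 'unknown' from a file's first line."""
    fields = first_line.split()
    # Phylip: the first line is just the number of sequences.
    if len(fields) == 1 and fields[0].isdigit():
        return "phylip"
    # Column: name1, name2, distance.
    if len(fields) == 3:
        try:
            float(fields[2])
        except ValueError:
            return "unknown"
        return "column"
    return "unknown"


def guess_file_format(path):
    """Read only the first line of the file at 'path' and guess its format."""
    with open(path) as handle:
        return guess_distance_format(handle.readline())


print(guess_distance_format("20000"))
print(guess_distance_format("seqA seqB 0.02"))
```

A result of "column" means the file belongs under the column parameter of cluster, and "phylip" means it belongs under the phylip parameter.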

My computer is 64-bit and has 16 GB of RAM.
The dist file is 1.5 GB.

Linux is installed on a dual-boot computer (Linux and Windows). Can that cause problems when running mothur?

Thank you

As you suggested, using a cutoff solved the problem.

Thank you very much