Using pre-defined OTU list

I am doing phylogenetic analysis of vaginal microbiota, using the protein-encoding cpn60 instead of 16S.

We have used an alternative approach to assembling our sequence data and have a list of OTUs with associated frequencies for a number of samples with associated metadata. I would like to use mothur to calculate α-diversity measures using this dataset, but can’t see a way into the analysis pipeline with a pre-defined list of OTUs.

Is there a way to access mothur without having to start at the “beginning”? I am not defining my OTUs based on similarity cutoffs; is this a barrier to using mothur?

Thanks

This shouldn’t be a problem. You’d have to create your own list, rabund, sabund, or shared file. The formats are pretty simple, so it shouldn’t be too hard to convert your data to our format. Let us know if you have any problems.

OK great - now I know it’s possible at least!
Right now I have a table with 3 columns that I have been using in Unifrac - 1st is OTU label, 2nd is sample label, 3rd is normalized abundance of reads/OTU/sample. It is unclear to me how to transform this into one of the formats you are talking about. The wiki refers to a dataset called AmazonData, but there are no examples of rabund files etc. in that folder. (The contents of the folder are listed below.)

96_lt_column_11_amazon.dist
96_lt_column_amazon.dist
96_lt_phylip_amazon.dist
96_sq_column_amazon.dist
98_lt_phylip_amazon.dist
98_sq_phylip_amazon.dist
amazon.fasta
amazon.groups
amazon.names
amazon1.names

Any help much appreciated!

See…

http://www.mothur.org/wiki/Rabund_file
http://www.mothur.org/wiki/List_file
http://www.mothur.org/wiki/Shared_file

OK… but I don’t see how to translate my file format into those file formats.
As I said my file has three columns - here’s the first few lines:

OTULabel SampleName #reads
050 N_B_2315 2
318 N_B_2315 2
198 N_B_2315 2
233 N_B_2315 2
246 N_B_2315 2
307 N_B_2315 2
313 N_B_2315 2
420 N_B_2315 2

Since I have 250 OTUs and 44 samples, would my shared file be a table with 44 rows and 250 columns representing the abundance of each OTU, like this?

? N_B_2315 250 2 4 19256…

But then what is the value for the first column?? Your example looks like a % identity value.
Am I on the right track here?

Almost - the shared file will have 44 rows and 253 columns…

Column 1 - pick some label (e.g. NA - it doesn’t matter what you use as long as there are no spaces in the label)
Column 2 - sample name (e.g. N_B_2315)
Column 3 - the number of columns to follow (e.g. 250)
Columns 4-253 - the abundance of each OTU
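
For anyone starting from the same kind of three-column table (OTU label, sample name, read count), here is a minimal Python sketch of the conversion into the row layout described above. It is only an illustration, not part of mothur: the function names `read_three_column` and `long_to_shared` are made up for this example, the sample `S2` in the demo is hypothetical, and it assumes whitespace-separated columns, one header line, and integer counts. It writes no header row, following the layout given in this thread; check the Shared_file wiki page for what your mothur version expects.

```python
from collections import defaultdict


def read_three_column(path):
    """Yield (otu, sample, count) tuples from a whitespace-separated file.

    Assumes the first line is a header (e.g. 'OTULabel SampleName #reads').
    """
    with open(path) as fh:
        next(fh)  # skip the header line
        for line in fh:
            parts = line.split()
            if len(parts) == 3:
                yield tuple(parts)


def long_to_shared(rows, label="NA"):
    """Turn (otu, sample, count) rows into shared-file lines.

    Each output line is: label, sample name, number of OTUs, then one
    abundance per OTU. OTUs missing from a sample get 0. Every sample
    uses the same OTU order (sorted OTU labels).
    """
    counts = defaultdict(dict)  # sample -> {otu: count}
    otus = set()
    for otu, sample, count in rows:
        # Assumption: counts are integers; repeated rows are summed.
        counts[sample][otu] = counts[sample].get(otu, 0) + int(count)
        otus.add(otu)

    otu_order = sorted(otus)
    lines = []
    for sample in sorted(counts):
        abund = [str(counts[sample].get(o, 0)) for o in otu_order]
        lines.append("\t".join([label, sample, str(len(otu_order))] + abund))
    return otu_order, lines


if __name__ == "__main__":
    # Two rows from the post plus one hypothetical sample, S2.
    demo = [("050", "N_B_2315", "2"), ("318", "N_B_2315", "2"), ("050", "S2", "5")]
    otu_order, lines = long_to_shared(demo)
    print("\n".join(lines))
```

With the real table, the lines returned by `long_to_shared(read_three_column("mytable.txt"))` (the file name is a placeholder) could be written out one per line to a file ending in `.shared`. With 250 OTUs and 44 samples that gives 44 rows of 253 columns each.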

So I can generate .rabund files for each of my samples, and a .shared file.
But I can’t get mothur to read the files (my experience with the command line is very limited).

mothur > read.otu(rabund=2315.rabund)

Unable to open 2315.rabund

I’ve tried moving the file around (desktop, home directory or parent directory), but mothur still says it can’t open the file.

In the directory that you’re running mothur from, type “dir” (on Windows) or “ls” (on Mac/*nix). You should see the file you want in the output.
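
If it is easier than the command line, the same check can be sketched in Python. The helper name `find_candidates` is made up for this example. It lists every file in a directory whose name starts with the same stem as the file you expect, which would also reveal a name that differs slightly from what you typed (for instance an extra extension added by an editor; that is only a guess at one possible cause, not something confirmed in this thread).

```python
import os


def find_candidates(filename, directory="."):
    """Return sorted names in `directory` that start with the stem of `filename`.

    The stem is the name with its last extension removed,
    e.g. '2315' for '2315.rabund'.
    """
    stem = os.path.splitext(filename)[0]
    return sorted(n for n in os.listdir(directory) if n.startswith(stem))


if __name__ == "__main__":
    # Run this from the directory you start mothur in.
    print("Current directory:", os.getcwd())
    print("Matching files:", find_candidates("2315.rabund"))
```

If the list is empty, the file is not in that directory. If it shows a slightly different name, use that exact name in read.otu or rename the file.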

Also, if you create the shared file, mothur will create all of the rabund files for you automatically when you run read.otu.