Diversity analysis: How to build and compare OTUs from different samples with same sequence ID

Lina · June 18, 2016, 1:44pm

Dear Mothur community,

I'm a Mothur newbie and am struggling a little with my data.

I have 24 samples containing varying numbers of sequences. Some sequences appear in several samples and have the same sequence ID in each.
For example:

M123
ACGGTTAG…
is in samples 1, 2 and 4 but not in 3; M345 is in 2 and 3.
First, I split my fasta file into 24 fasta files, each containing only the sequence IDs and sequences of one subsample, then aligned and clustered them into OTUs at 0.03. But I guess Mothur cannot compute diversity from merged files that contain OTU001, OTU002… several times over.
Next, I tried to sort the samples in the original fasta file using a group file, but Mothur warns that all my IDs appear more than once.

What I want Mothur to do is: sort my IDs samplewise, cluster them into OTUs and compute the diversity within and across all my samples.

Does anyone know how to solve that problem? Many thanks in advance!

dwaite · June 18, 2016, 10:38pm

The approach you’re doing, with the groups file, is the right way to do it. It’s just an issue of the identical sequence names - if you rename those in some way so that they’re unique then you won’t get the error from mothur.

Usually when I do this kind of thing I just write a simple script to rename each sequence as [File Name].[Sequence ID] so that they’re all unique, but you can also work out which original sequence they were.
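A renaming script of the kind described above could look roughly like the following. This is only a minimal sketch, not dwaite's actual script: the function names `rename_records` and `merge_samples` are made up for illustration, and it assumes one fasta file per sample, with the file name (minus extension) used as the sample name. It also writes a two-column group file (sequence name, tab, sample name) alongside the merged fasta.

```python
from pathlib import Path


def rename_records(fasta_text, sample):
    """Prefix every FASTA header with the sample name.

    Returns (renamed_fasta_text, group_file_text). Hypothetical helper,
    not part of mothur.
    """
    fasta_out = []
    group_out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a sequence ID")
            # Keep only the ID (first token) and make it unique per sample.
            new_id = f"{sample}.{fields[0]}"
            fasta_out.append(">" + new_id)
            group_out.append(f"{new_id}\t{sample}")
        elif line.strip():
            fasta_out.append(line)
    fasta = "".join(l + "\n" for l in fasta_out)
    groups = "".join(l + "\n" for l in group_out)
    return fasta, groups


def merge_samples(fasta_paths, merged_fasta, group_file):
    """Write one merged fasta and one group file from per-sample fasta files."""
    with open(merged_fasta, "w") as fa, open(group_file, "w") as gr:
        for path in fasta_paths:
            sample = Path(path).stem  # assumption: file name == sample name
            fasta, groups = rename_records(Path(path).read_text(), sample)
            fa.write(fasta)
            gr.write(groups)
```

For example, `merge_samples(["sample1.fasta", "sample2.fasta"], "merged.fasta", "merged.groups")` (file names are placeholders) would turn M123 into sample1.M123 and sample2.M123, so the original ID stays recoverable after the first dot.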

Lina · June 20, 2016, 12:02pm

Thanks for your reply!
I renamed my subsample fasta files, merged and clustered them, and used make.shared to assign OTUs to their corresponding subsamples, but the number of OTUs is much higher than when I cluster them samplewise.
Sample 1 contained 14 OTUs when I ran it as a single sample; now it has 481 OTUs.

I guess all sequences are mixed during clustering, which results in this high number of OTUs. Should I first cluster the sequences samplewise and then merge all the shared files, or is there a way to set that up beforehand?

Can anyone help?

pschloss · June 20, 2016, 12:43pm

Hi,

You will need to merge all of your data together and cluster them together. Then you’ll generate a shared file and use summary.single to calculate your alpha diversity parameters for each sample. Have you tried to follow one of the SOPs on the wiki?
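To illustrate what a per-sample calculation on a pooled OTU table involves, here is a rough Python sketch. It is not a replacement for summary.single and makes assumptions: it expects a tab-delimited shared-style table with a header row and the columns label, Group, numOtus, followed by one count column per OTU, and the function name `alpha_from_shared` is hypothetical. The point is that the table has one column for every OTU found across all samples, while a given sample's observed richness only counts the OTUs with a non-zero count in that sample's row.

```python
import math


def alpha_from_shared(shared_text):
    """Return {sample: (observed_otus, shannon)} from shared-style text.

    Assumed layout (tab-delimited, with header):
    label, Group, numOtus, Otu001, Otu002, ...
    """
    rows = [line.split("\t") for line in shared_text.splitlines() if line.strip()]
    results = {}
    for row in rows[1:]:  # skip the header row
        sample = row[1]
        counts = [int(x) for x in row[3:]]
        total = sum(counts)
        # Observed richness: OTUs with at least one sequence in this sample.
        observed = sum(1 for c in counts if c > 0)
        # Shannon index with natural logarithm; zero counts contribute nothing.
        shannon = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
        results[sample] = (observed, shannon)
    return results
```

In a toy table with three OTU columns where sample1 has counts 5, 5, 0, the sample has two observed OTUs even though the pooled table lists three.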

Pat

Lina · June 21, 2016, 10:20am

Yes, I tried the 454 and MiSeq SOPs (great for getting started!), but struggled with my own data. It seems to work now, thank you very much!
