Cluster sequences into OTUs

Hi everyone,
I intend to calculate the Shannon index using the Mothur software. This is the first time I have used this software, so I don't know how to cluster my sequences into OTUs.
I have FASTA files of n sequences. How can I use Mothur to cluster the sequences into OTUs with the unweighted pair group method with arithmetic mean (UPGMA, i.e. average linkage)?
Thank you very much!
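For readers who want to see what average-linkage (UPGMA-style) clustering does with a distance matrix, here is a minimal Python sketch. It is not Mothur's implementation (Mothur's cluster command offers an average method); the function names, the pairwise-distance dictionary, and the cutoff values are illustrative assumptions. It repeatedly merges the two clusters with the smallest mean between-cluster distance until no pair is within the cutoff, then computes the Shannon index H = -sum(p_i * ln p_i) from the OTU sizes.

```python
import math


def average_linkage_otus(names, dist, cutoff):
    """Group sequence names into OTUs by average-linkage clustering.

    dist maps frozenset({a, b}) -> pairwise distance for every pair of names.
    The two clusters with the smallest mean between-cluster distance are
    merged repeatedly until that smallest mean exceeds cutoff.
    """
    clusters = [[n] for n in names]

    def mean_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (naive search, fine for ~50 reads).
        best_d, best_i, best_j = None, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean_distance(clusters[i], clusters[j])
                if best_d is None or d < best_d:
                    best_d, best_i, best_j = d, i, j
        if best_d > cutoff:
            break
        # Merge cluster j into cluster i (i < j, so deleting j is safe).
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]
    return clusters


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTU abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

With four sequences where A and B are 0.01 apart, C and D are 0.02 apart, and every cross pair is 0.10 or more, a 0.03 cutoff gives two OTUs of two sequences each, and shannon([2, 2]) equals ln 2 (about 0.693).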

Hi there, and welcome to the forum. I'd suggest checking out either the MiSeq or 454 SOP and adapting the initial steps to your data. Are these Sanger, 454, or MiSeq reads? Do you have quality score data to accompany them?

Thank you very much!
My sequence data are Sanger reads, and I don't have quality score data. I sent the clones to a company for sequencing, and I just received the data from them with a comment that the quality is good.
I have looked at the MiSeq_SOP, but it may not fit my data: I sequenced with the forward primer only, whereas in the MiSeq SOP each sample has two reads (one forward and one reverse). Could you help me solve this problem?

I'd suggest looking at the MiSeq and 454 SOPs to get a sense of what is going on, and then seeing where you can start the process.

Pat

Dear Pat,
I used the primer set 27f and 1492r for PCR. If I use the pcr.seqs command on SILVA Release 102, which start and end points should I use? Would you mind giving me a suggestion?
Have a nice day,
Pham

I'm pretty sure it's 1044 and 47956.
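Conceptually, trimming a reference alignment to a start/end window (which is what pcr.seqs start= and end= are used for in the SOPs) keeps the same block of columns from every aligned sequence. Here is a minimal Python sketch; the function name is hypothetical, and the 1-based, inclusive position convention is an assumption, so check the exact convention before relying on specific column numbers.

```python
def trim_alignment(aligned, start, end):
    """Keep alignment columns start..end of every aligned sequence.

    aligned maps sequence name -> aligned sequence (all the same length).
    Assumption: start and end are 1-based and inclusive.
    """
    if not 1 <= start <= end:
        raise ValueError("need 1 <= start <= end")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return {name: seq[start - 1:end] for name, seq in aligned.items()}
```

Because every sequence is cut at the same columns, the trimmed sequences stay aligned to each other.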

Thank you very much.
I have already run Mothur on my sequences. I aligned them against the reference database (silva.bacteria.fasta). After align.seqs, I used screen.seqs to keep only the sequences that overlap the same region (start=5158, end=42072). As a result, only 13 sequences remained (my original data contain 52 sequences). If I continue and calculate the Shannon index, will this affect the accuracy of the Shannon index value? Would you mind giving me a suggestion?
Have a nice day,
Pham.
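screen.seqs with start= and end= removes sequences that start after the start position or end before the end position. The Python sketch below (hypothetical function name; coordinates assumed to be the first and last positions of real bases in the alignment, as summary.seqs reports them) shows why the filter can be so aggressive: any sequence that does not span the whole window is dropped, which is how 52 sequences can shrink to 13. Fewer sequences also means the Shannon index is estimated from a smaller sample.

```python
def screen_by_coordinates(coords, start, end):
    """Keep names whose aligned bases span the whole start..end window.

    coords maps name -> (first_position, last_position) of real bases.
    A sequence is dropped if it starts after start or ends before end.
    """
    return [name for name, (s, e) in coords.items() if s <= start and e >= end]
```

For example, a read spanning positions 2 to 39117 fails a window ending at 42072 even though it is a long, good-quality read.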

Dear Pat,
I used the dist.seqs command on my aligned sequences. After running it, Mothur said "It took 0 to calculate distances for … sequences". I wonder whether this is fine. Could you help me?
Have a nice day,
Pham
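A distance between two aligned sequences can only be computed over columns where both have bases. The sketch below is a deliberately simplified distance (mismatches divided by compared columns, skipping any column where either sequence has a gap); dist.seqs has its own gap and end-gap handling, so treat this as an illustration with a hypothetical function name. It returns NaN when two sequences share no columns, which is what happens when some reads cover only one end of the alignment and others only the other end. On timing: a few dozen sequences give only a few hundred pairs (28 sequences give 28 * 27 / 2 = 378), so if the message reports seconds (an assumption), 0 may simply mean the calculation finished in under a second.

```python
def pairwise_distance(a, b):
    """Simplified distance between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-' or '.') are skipped;
    distance = mismatches / compared columns. Not dist.seqs' exact method.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    gaps = set("-.")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gaps or y in gaps:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    # No shared columns means the distance is undefined.
    return mismatches / compared if compared else float("nan")
```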

Can you post the output of running summary.seqs from before you run screen.seqs?

Dear Pat,
This is the output of the summary file after I aligned my sequences against the reference database silva.bacteria.fasta (start: 1044, end: 47956): http://www.imageupload.co.uk/image/506T
Because the aligned sequences have different lengths, I removed the sequences that start from position 42007. Then I used the filter.seqs and dist.seqs commands, but Mothur gave me the same message: "It took 0 to calculate … sequence".
Here is the summary file after I removed the sequences that start from position 42007: http://www.imageupload.co.uk/image/5060.
Have a nice day,
Pham
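filter.seqs removes alignment columns: with vertical=T it drops columns that are gaps in every sequence, and with trump=. it drops any column where at least one sequence has a '.'. Here is a minimal Python sketch of those two rules (the function name is hypothetical). It also suggests one possible explanation for the "It took 0" result: if trump=. was used and some sequences cover only the start of the alignment while others cover only the end, nearly every column could be removed, leaving little or nothing for dist.seqs to compare.

```python
def filter_columns(aligned, trump=None):
    """Drop alignment columns that contain only gaps ('-' or '.').

    If trump is given (e.g. '.'), also drop any column in which at least
    one sequence has that character.
    """
    names = list(aligned)
    seqs = [aligned[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    gaps = set("-.")
    keep = []
    for i in range(length):
        column = [s[i] for s in seqs]
        if all(c in gaps for c in column):
            continue  # vertical rule: gap in every sequence
        if trump is not None and trump in column:
            continue  # trump rule: character present in any sequence
        keep.append(i)
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}
```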

Image not found. Can you just post the text output?

Dear Pat,
Would you mind showing me how to upload a file directly to the forum?
Thanks,
Pham

Dear Pat,
I'm sending you the files again, because I cannot upload my summary file directly to the forum.
This is the output of the summary file after I aligned my sequences against the reference database silva.bacteria.fasta (start: 1044, end: 47956): https://www.flickr.com/photos/29695869@N03/15693905789/in/photostream/
Because the aligned sequences have different lengths, I removed the sequences that start from position 42007. Then I used the filter.seqs and dist.seqs commands, but Mothur gave me the same message: "It took 0 to calculate … sequence".
Here is the summary file after I removed the sequences that start from position 42007: https://www.flickr.com/photos/29695869@N03/15693905809/
Have a nice day,
Pham

Can you just copy and paste the text that is printed to the screen after you run summary.seqs? I don't want the summary file itself, just the summary table that is printed to the screen after running the command.
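To see where the Start, End, NBases, Ambigs, and Polymer columns of the summary table come from, here is a per-sequence Python sketch. It is an approximation, not summary.seqs itself: the function name is hypothetical, positions are assumed to be 1-based, gaps are '-' and '.', any base other than A/C/G/T counts as ambiguous, and the polymer value is taken as the longest homopolymer run. Looking at per-sequence start and end values, rather than only the percentiles, makes it easy to spot two groups of sequences that occupy opposite ends of the alignment.

```python
GAPS = set("-.")


def summarize(seq):
    """Return (start, end, nbases, ambigs, longest_homopolymer) for one
    aligned sequence; start/end are 1-based positions of the first/last
    non-gap character, or zeros if the sequence has no bases."""
    positions = [i for i, c in enumerate(seq) if c not in GAPS]
    if not positions:
        return (0, 0, 0, 0, 0)
    bases = [c.upper() for c in seq if c not in GAPS]
    ambigs = sum(1 for c in bases if c not in "ACGT")
    # Longest run of the same base, ignoring gaps between bases.
    longest = run = 1
    for prev, cur in zip(bases, bases[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return (positions[0] + 1, positions[-1] + 1, len(bases), ambigs, longest)
```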

Dear Pat,
This is the output of the summary file after I aligned my sequences against the reference database silva.bacteria.fasta (start: 1044, end: 47956):
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     0        0        0        0       1        1
2.5%-tile:   2        4        1        0       1        2
25%-tile:    2        36649    5        0       2        14
Median:      2        39117    1106     0       5        27
75%-tile:    41977    42072    1144     0       7        40
97.5%-tile:  42071    42072    1187     0       7        51
Maximum:     42072    42072    1196     0       7        52
Mean:        10513.8  31274.7  617.212  0       4.13462

# of Seqs:   52

Because the aligned sequences have different lengths, I removed the sequences that start from position 42007. Then I used the filter.seqs and dist.seqs commands, but Mothur gave me the same message: "It took 0 to calculate … sequence".
Here is the summary output after I removed the sequences that start from position 42007:
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     2        36638    1083     0       4        1
2.5%-tile:   2        36638    1083     0       4        1
25%-tile:    2        38307    1123     0       5        8
Median:      2        39117    1140     0       7        15
75%-tile:    2        39272    1157     0       7        22
97.5%-tile:  2        39297    1196     0       7        28
Maximum:     2        39297    1196     0       7        28
Mean:        2        38544.3  1140.86  0       6.07143

# of Seqs:   28

Have a nice day,
Pham

I suspect your sequences are backwards. Can you try using flip=T in align.seqs, or using the opposite reverse option from the one you are using in trim.seqs?
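One quick way to check whether a read is backwards is to compare its k-mers, and those of its reverse complement, with a correctly oriented reference sequence; flip=T in align.seqs is meant to try the reverse complement when the forward alignment is poor. The sketch below is a generic k-mer heuristic, not Mothur's algorithm; the function names and k=8 are illustrative assumptions.

```python
# IUPAC-aware complement table (S and W are their own complements).
COMPLEMENT = str.maketrans("ACGTRYKMBDHVNSW", "TGCAYRMKVHDBNSW")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k=8):
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def needs_flip(query, reference, k=8):
    """True if the reverse complement of query shares more k-mers with
    reference than query itself does."""
    ref = kmers(reference, k)
    forward = len(kmers(query, k) & ref)
    reverse = len(kmers(reverse_complement(query), k) & ref)
    return reverse > forward
```

In practice the reference would be a full-length, correctly oriented 16S sequence, so a backwards read shares many more k-mers in its reverse-complemented form.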

Dear Pat,
I have already set flip=T in align.seqs, but that didn't fix the problem.
So I intend to use trim.seqs, but because my sequences are Sanger sequences, I don't have an oligos file or barcodes for each sequence. Could you suggest how to use the trim.seqs command for Sanger sequences, or how to create an oligos file?
Thank you very much.
Best wishes,
Pham.
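As far as I understand, barcodes are only needed when several samples are pooled in one file; for forward-only Sanger reads a primer on its own should be enough. As an assumption to check against the trim.seqs documentation, a minimal oligos file could contain a single line such as "forward AGAGTTTGATCMTGGCTCAG" (the commonly cited 27F sequence). The Python sketch below shows what matching and trimming such a primer involves, including IUPAC ambiguity codes like M (A or C); the function names and the 20-base search window are hypothetical.

```python
# Bases allowed at each IUPAC code in a primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches_at(seq, primer, pos):
    """True if primer (with IUPAC codes) matches seq starting at pos."""
    window = seq[pos:pos + len(primer)].upper()
    if len(window) < len(primer):
        return False
    return all(base in IUPAC[p] for base, p in zip(window, primer.upper()))


def trim_forward_primer(seq, primer, max_start=20):
    """Remove everything up to and including the first primer match that
    starts within the first max_start bases; return None if none is found."""
    for pos in range(min(max_start, len(seq)) + 1):
        if primer_matches_at(seq, primer, pos):
            return seq[pos + len(primer):]
    return None
```

This sketch requires an exact match apart from the ambiguity codes; real Sanger reads often have low-quality bases at the start, so a mismatch allowance may be needed.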

That is why we need people to tell us what they actually did and what type of data they have.

Can you post the output of summary.seqs from the file generated when running align.seqs?

Dear Pat,
This is the output of the summary file after I aligned my sequences against the reference database silva.bacteria.fasta (start: 1044, end: 47956):
             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     0        0        0        0       1        1
2.5%-tile:   2        4        1        0       1        2
25%-tile:    2        36649    5        0       2        14
Median:      2        39117    1106     0       5        27
75%-tile:    41977    42072    1144     0       7        40
97.5%-tile:  42071    42072    1187     0       7        51
Maximum:     42072    42072    1196     0       7        52
Mean:        10513.8  31274.7  617.212  0       4.13462

# of Seqs:   52

I have realized that nearly half of the clones were sequenced with the insert in the reverse orientation. Can I modify these sequences so that I can keep using Mothur to calculate the Shannon index, or do I have to remove them?
Thank you very much.
Have a nice day,
Pham.
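Reads from clones with the insert in the reverse orientation are not bad data; their reverse complement is in the same orientation as the rest. As one option, here is a minimal Python sketch that reverse-complements flagged reads in a FASTA file so that all sequences share one orientation before alignment. The function names and the idea of supplying the reversed read names by hand are illustrative assumptions.

```python
# IUPAC-aware complement table (S and W are their own complements).
COMPLEMENT = str.maketrans("ACGTRYKMBDHVNSW", "TGCAYRMKVHDBNSW")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def read_fasta(text):
    """Parse FASTA text into {name: sequence}; name is the first word."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def reorient(records, reversed_names):
    """Reverse-complement the reads listed in reversed_names."""
    return {n: reverse_complement(s) if n in reversed_names else s.upper()
            for n, s in records.items()}


def to_fasta(records):
    return "".join(f">{n}\n{s}\n" for n, s in records.items())
```

The reoriented FASTA can then go back through alignment and the later steps, so the reversed clones still count toward the OTUs and the Shannon index.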