Cluster sequence into OTUs

AlexPham · October 21, 2014, 5:55am

Hi everyone,
I intend to calculate the Shannon index using the mothur software. This is the first time I have used this software, so I don't know how to cluster my sequences into OTUs.
I have FASTA files of n sequences. How can I use mothur to cluster the sequences into OTUs with the unweighted pair group method using average linkage (UPGMA)?
Thank you very much!
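For readers new to the idea, here is a minimal Python sketch of what average-linkage (UPGMA-style) clustering into OTUs and the Shannon index involve. It is not mothur's implementation: the pairwise-distance dictionary, the 0.03 cutoff and the function names are illustrative assumptions, and mothur's own clustering handles ties, cutoffs and sparse distance matrices in its own way. The Shannon index here uses the natural log.

```python
import math
from itertools import combinations


def average_linkage_otus(dist, cutoff=0.03):
    """Group sequence names into OTUs by average linkage (UPGMA-style merging).

    dist: hypothetical input, a dict mapping frozenset({name_a, name_b}) to the
    pairwise distance for every pair of sequences.
    Repeatedly merges the two clusters with the smallest average pairwise
    distance, stopping once that average exceeds `cutoff`.
    """
    names = sorted({name for pair in dist for name in pair})
    clusters = [[name] for name in names]

    def average_distance(a, b):
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_distance(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def shannon(otu_sizes):
    """Shannon index H = -sum(p_i * ln p_i) over OTU abundances."""
    n = sum(otu_sizes)
    return -sum((c / n) * math.log(c / n) for c in otu_sizes if c > 0)


# Toy example: two tight pairs of sequences that are far from each other.
pairs = {
    frozenset(("a", "b")): 0.01, frozenset(("c", "d")): 0.02,
    frozenset(("a", "c")): 0.20, frozenset(("a", "d")): 0.20,
    frozenset(("b", "c")): 0.20, frozenset(("b", "d")): 0.20,
}
otus = average_linkage_otus(pairs)
print(otus)                                 # [['a', 'b'], ['c', 'd']]
print(shannon([len(otu) for otu in otus]))  # ln 2, about 0.693
```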

pschloss · October 21, 2014, 4:59pm

Hi there and welcome to the forum. I’d suggest checking out either the MiSeq or 454 SOPs and adapting the initial steps to your data. Are these Sanger, 454, or MiSeq? Do you have quality score data to accompany them?

AlexPham · October 22, 2014, 1:10am

Thank you very much!
My sequence data are Sanger reads, and I don't have quality score data. I sent the clones to a company for sequencing and just received the data, which the company reported as good.
I have looked at the MiSeq_SOP, but it may not fit my data: I only sequenced with the forward primer, whereas the MiSeq SOP has two reads per sample (one forward and one reverse). Could you help me solve this problem?

pschloss · October 24, 2014, 7:21pm

I'd suggest looking at the MiSeq and 454 SOPs, getting a sense of what is going on, and then seeing where you can start the process.

Pat

AlexPham · November 7, 2014, 6:16am

Dear Pat,
I used the primer set 27f and 1492r for PCR. If I use the pcr.seqs command with SILVA release 102, which start and end positions should I use? Could you give me a suggestion?
Have a nice day,
Pham

pschloss · November 11, 2014, 6:06pm

I'm pretty sure it's 1044 and 47956.
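As a rough illustration of what restricting an alignment to a start/end window means, here is a Python sketch that keeps alignment columns start..end of one aligned sequence. It assumes 1-based, inclusive column coordinates and, by default, masks columns outside the window with '.'. Whether mothur's pcr.seqs output keeps or drops those outer columns depends on its own options, so treat this as conceptual only.

```python
def trim_to_region(aligned_seq, start, end, keep_length=True):
    """Keep alignment columns start..end (assumed 1-based, inclusive).

    With keep_length=True the columns outside the window are replaced by '.',
    so every trimmed sequence stays as long as the full alignment; otherwise
    only the window itself is returned.
    """
    if not 1 <= start <= end <= len(aligned_seq):
        raise ValueError("start/end fall outside the alignment")
    window = aligned_seq[start - 1:end]
    if not keep_length:
        return window
    return "." * (start - 1) + window + "." * (len(aligned_seq) - end)


# Toy 11-column alignment; with the full SILVA alignment the window from the
# thread would be trim_to_region(seq, 1044, 47956).
print(trim_to_region("..ACGT--A..", 3, 9, keep_length=False))  # ACGT--A
print(trim_to_region("..ACGT--A..", 4, 6))                     # ...CGT.....
```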

AlexPham · November 12, 2014, 2:20am

Thank you very much.
I have already run mothur on my sequences. I aligned them against the reference database (silva.bacteria.fasta). After align.seqs, I used screen.seqs to keep only the sequences that overlap the same region (start=5158, end=42072). As a result, only 13 sequences remained (my original data contain 52 sequences). If I go on to calculate the Shannon index, will this affect the accuracy of the Shannon index value? Could you give me a suggestion?
Have a nice day,
Pham.
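To see why screen.seqs with start=5158 and end=42072 can discard most of the reads, here is a Python sketch of that kind of filter: a sequence is kept only if its first base sits at or before `start` and its last base at or after `end`. Treating '.' and '-' as gap characters and using 1-based columns are assumptions of this sketch, not a description of mothur's code.

```python
GAP_CHARS = set(".-")


def base_span(aligned_seq):
    """Return the (first, last) 1-based columns that hold a base, or (0, 0)."""
    columns = [i for i, ch in enumerate(aligned_seq, start=1) if ch not in GAP_CHARS]
    if not columns:
        return (0, 0)
    return (columns[0], columns[-1])


def screen_by_region(seqs, start, end):
    """Keep sequences whose bases cover every column from start to end."""
    kept = {}
    for name, seq in seqs.items():
        first, last = base_span(seq)
        if first and first <= start and last >= end:
            kept[name] = seq
    return kept


toy = {"full": "AACCGGTT", "starts_late": "---CGGTT", "ends_early": "AACCG---"}
print(sorted(screen_by_region(toy, start=2, end=7)))  # ['full']
```

If many reads start near the far end of the alignment (for example because they are in the reverse orientation), any fixed start/end window will remove them.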

AlexPham · November 17, 2014, 5:29am

Dear Pat,
I ran the dist.seqs command on my aligned sequences. Afterwards, mothur said "It took 0 to calculate distances for … sequences." I wonder whether this is fine. Could you help me?
Have a nice day,
Pham
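For context on what dist.seqs computes, here is a simplified Python sketch of a pairwise distance between two aligned sequences: the fraction of mismatches over the columns where both sequences have a base. Ignoring gap columns entirely is a simplification; mothur's distance calculators treat gaps and sequence ends according to their options (calc settings such as onegap, eachgap and nogaps), so the numbers will not necessarily match.

```python
GAP_CHARS = set(".-")


def pairwise_distance(a, b):
    """Mismatch fraction over columns where both aligned sequences have a base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip any column with a gap in either sequence
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("sequences share no aligned positions")
    return mismatches / compared


print(pairwise_distance("ACGT-A", "ACCT-A"))  # 0.2 (1 mismatch in 5 columns)
```

The `compared == 0` case is worth noticing: reads that cover non-overlapping parts of the alignment have no columns to compare at all.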

pschloss · November 19, 2014, 2:52pm

Can you post the output of running summary.seqs from before you run screen.seqs?

AlexPham · November 20, 2014, 6:07am

Dear Pat,
This is the output of the summary file after I aligned my sequences with the reference database silva.bacteria.fasta (start: 1044, end: 47956): http://www.imageupload.co.uk/image/506T
Because the aligned sequences have different lengths, I removed the sequences that start at 42007. Then I used the filter.seqs and dist.seqs commands, but I got the same message from mothur: "it took 0 to calculate … sequence."
Here is the summary file after I removed the sequences that start at 42007: http://www.imageupload.co.uk/image/5060
Have a nice day,
Pham

pschloss · November 24, 2014, 8:57pm

Image not found - can you just post the text output?

AlexPham · November 25, 2014, 4:22am

Dear Pat,
Could you show me how to upload a file directly to the forum?
Thanks,
Pham

AlexPham · November 26, 2014, 12:33am

Dear Pat,
I am sending you the file again, because I cannot upload my summary file directly to the forum.
This is the output of the summary file after I aligned my sequences with the reference database silva.bacteria.fasta (start: 1044, end: 47956): https://www.flickr.com/photos/29695869@N03/15693905789/in/photostream/
Because the aligned sequences have different lengths, I removed the sequences that start at 42007. Then I used the filter.seqs and dist.seqs commands, but I got the same message from mothur: "it took 0 to calculate … sequence."
Here is the summary file after I removed the sequences that start at 42007: https://www.flickr.com/photos/29695869@N03/15693905809/
Have a nice day,
Pham

pschloss · December 1, 2014, 3:03pm

Can you just copy and paste the text that is outputted to the screen after you run summary.seqs? I don’t want the summary file itself, just the summary table that is put on the screen after running the command.

AlexPham · December 2, 2014, 3:17am

Dear Pat,
This is the output of summary.seqs after I aligned my sequences with the reference database silva.bacteria.fasta (start: 1044, end: 47956):
             Start    End      NBases   Ambigs   Polymer  NumSeqs
Minimum:     0        0        0        0        1        1
2.5%-tile:   2        4        1        0        1        2
25%-tile:    2        36649    5        0        2        14
Median:      2        39117    1106     0        5        27
75%-tile:    41977    42072    1144     0        7        40
97.5%-tile:  42071    42072    1187     0        7        51
Maximum:     42072    42072    1196     0        7        52
Mean:        10513.8  31274.7  617.212  0        4.13462
# of Seqs:   52
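To read a table like this, it helps to know roughly how each column can be derived from one aligned sequence. The Python sketch below approximates the Start, End, NBases, Ambigs and Polymer columns (first and last base column, number of bases, non-ACGT bases, longest homopolymer) and the minimum/median/maximum rows. These definitions are a reading of the column names, not mothur's source code, so edge cases may differ.

```python
from statistics import median

GAP_CHARS = set(".-")


def seq_summary(aligned_seq):
    """Approximate summary.seqs-style columns for one aligned sequence."""
    bases = [(i, ch) for i, ch in enumerate(aligned_seq.upper(), start=1)
             if ch not in GAP_CHARS]
    if not bases:
        return {"start": 0, "end": 0, "nbases": 0, "ambigs": 0, "polymer": 0}
    ungapped = "".join(ch for _, ch in bases)
    longest = run = 1
    for prev, cur in zip(ungapped, ungapped[1:]):
        run = run + 1 if cur == prev else 1  # length of the current homopolymer
        longest = max(longest, run)
    return {
        "start": bases[0][0],
        "end": bases[-1][0],
        "nbases": len(ungapped),
        "ambigs": sum(ch not in "ACGT" for ch in ungapped),
        "polymer": longest,
    }


def summarize(aligned_seqs):
    """Minimum, median and maximum of each column across sequences."""
    rows = [seq_summary(s) for s in aligned_seqs]
    return {col: (min(r[col] for r in rows),
                  median([r[col] for r in rows]),
                  max(r[col] for r in rows))
            for col in rows[0]}


print(seq_summary("..AAAC-GN.."))
# {'start': 3, 'end': 9, 'nbases': 6, 'ambigs': 1, 'polymer': 3}
```

A Start column split between values near 2 and values above 41000, as in the table above, means the sequences do not begin in the same part of the alignment.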

Because the aligned sequences have different lengths, I removed the sequences that start at 42007. Then I used the filter.seqs and dist.seqs commands, but I got the same message from mothur: "it took 0 to calculate … sequence."
Here is the summary after I removed the sequences that start at 42007:
             Start    End      NBases   Ambigs   Polymer  NumSeqs
Minimum:     2        36638    1083     0        4        1
2.5%-tile:   2        36638    1083     0        4        1
25%-tile:    2        38307    1123     0        5        8
Median:      2        39117    1140     0        7        15
75%-tile:    2        39272    1157     0        7        22
97.5%-tile:  2        39297    1196     0        7        28
Maximum:     2        39297    1196     0        7        28
Mean:        2        38544.3  1140.86  0        6.07143
# of Seqs:   28

Have a nice day,
Pham

pschloss · December 4, 2014, 6:22pm

I suspect your sequences are backwards. Can you try setting flip=T in align.seqs, or using the opposite reverse option to the one you are using in trim.seqs?
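If some reads are backwards, one way to fix them before alignment is to reverse-complement them. Here is a minimal Python sketch for unaligned DNA sequences, including IUPAC ambiguity codes; deciding which reads need flipping is a separate step (for example, checking which orientation matches the reference).

```python
# Complement table for DNA bases and IUPAC ambiguity codes (upper and lower case).
COMPLEMENT = str.maketrans("ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
                           "TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn")


def reverse_complement(seq):
    """Return the reverse complement of an unaligned DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACG"))  # CGTT
```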

sternml700 · December 8, 2014, 4:35pm

this is sweet

AlexPham · December 19, 2014, 2:20am

Dear Pat,
I have already set flip=T in align.seqs, but that didn't fix the problem.
So I intend to use trim.seqs, but because my sequences are Sanger sequences, I don't have an oligos file or barcodes for each sequence. Could you suggest how to use the trim.seqs command for Sanger sequences, or how to create an oligos file?
Thank you very much.
Best wishes,
Pham.
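Without barcodes, the part of trimming that matters for single-direction Sanger reads is primer removal. As an illustration only (not how trim.seqs is implemented), here is a Python sketch that looks for a forward primer written with IUPAC ambiguity codes near the start of a read and trims everything up to its end. The primer below is the commonly cited 27F sequence, used here as an assumption; check it against the primer actually used. The names `PRIMER_27F`, `primer_matches` and `trim_forward_primer` are hypothetical.

```python
# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"  # assumed 27F sequence; verify for your lab


def primer_matches(primer, window):
    """True if every base in `window` is allowed by the primer code at that position."""
    return len(primer) == len(window) and all(
        base in IUPAC[code] for code, base in zip(primer.upper(), window.upper())
    )


def trim_forward_primer(read, primer, max_offset=0):
    """Remove the primer (and anything before it) if it starts within max_offset bases.

    Returns the trimmed read, or None when the primer is not found.
    """
    for offset in range(max_offset + 1):
        window = read[offset:offset + len(primer)]
        if primer_matches(primer, window):
            return read[offset + len(primer):]
    return None


print(trim_forward_primer("GG" + "AGAGTTTGATCATGGCTCAG" + "TTTT", PRIMER_27F, max_offset=2))
# TTTT
```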

pschloss · December 20, 2014, 3:31pm

That is why we need people to tell us what they actually did and the type of data they have.

Can you post the output of summary.seqs from the file generated when running align.seqs?

AlexPham · December 22, 2014, 12:24am

Dear Pat,
This is the output of summary.seqs after I aligned my sequences with the reference database silva.bacteria.fasta (start: 1044, end: 47956):
             Start    End      NBases   Ambigs   Polymer  NumSeqs
Minimum:     0        0        0        0        1        1
2.5%-tile:   2        4        1        0        1        2
25%-tile:    2        36649    5        0        2        14
Median:      2        39117    1106     0        5        27
75%-tile:    41977    42072    1144     0        7        40
97.5%-tile:  42071    42072    1187     0        7        51
Maximum:     42072    42072    1196     0        7        52
Mean:        10513.8  31274.7  617.212  0        4.13462
# of Seqs:   52

I have realized that nearly half of the clones were sequenced with the insert in the reverse orientation. Could I modify these sequences so that I can keep using mothur to calculate the Shannon index, or do I have to remove them?
Thank you very much.
Have a nice day,
Pham.
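One option consistent with the flip suggestion earlier in the thread is to reverse-complement the reversed clones before alignment instead of removing them. As a rough, hypothetical way to decide which reads to flip, the Python sketch below counts the k-mers each orientation shares with a reference sequence and keeps the orientation with more hits. The choice of k, the use of a single reference sequence and the function names are assumptions; this is a heuristic, not what align.seqs does internally.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of an unaligned DNA sequence (ACGTN only here)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of the ungapped, upper-cased sequence."""
    s = seq.upper().replace("-", "").replace(".", "")
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def orient(read, reference, k=8):
    """Return (read_in_best_orientation, was_flipped) by k-mer overlap with reference."""
    ref_kmers = kmers(reference, k)
    flipped = reverse_complement(read)
    forward_hits = len(kmers(read, k) & ref_kmers)
    reverse_hits = len(kmers(flipped, k) & ref_kmers)
    if reverse_hits > forward_hits:
        return flipped, True
    return read, False


reference = "AACACCAACCCAAACACACCCAACA"  # toy reference, not a real 16S sequence
backwards_read = reverse_complement(reference[2:20])
print(orient(backwards_read, reference, k=4))  # the original segment, flipped=True
```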
