Cutoff value for the rarefaction curve

tanvirahman · August 10, 2012, 1:03pm

Dear All,

I am trying to generate a rarefaction curve at cutoff 0.20. I generated the dist file at cutoff 0.20 using the command

dist.seqs(fasta=goodtrim.good.good.filter.unique.fasta,cutoff=0.20, processors=2)

and then clustered, also at cutoff 0.20, using the command

cluster(column=goodtrim.good.good.filter.unique.dist,name=goodtrim.good.good.filter.names,cutoff=0.20)
(Note that after reading my matrix, mothur reports “changed cutoff to 0.0903472”.)

Finally I used the following command to generate data for the rarefaction curve

make.shared(list=goodtrim.good.good.filter.unique.an.list,group=goodgroup.good.groups, label=unique-0.07-0.10-0.20)

but in the output rarefaction file I can see values only for unique, 0.03, 0.07 and 0.09 (none for 0.10 or 0.20).

Any suggestions on how to get the rarefaction curve at cutoff 0.20?

Also, what is the basic difference among unique, lci and hci?

Cheers,

Tanvir

vingomez · August 10, 2012, 1:48pm

Hi Tanvir,

One possible solution and one possible explanation:

(1) Solution: increase the cutoff from 0.20 to 0.30 or even 0.40 in the “dist.seqs” and “cluster” commands, but keep the value of 0.20 in the “make.shared” command.

(2) Explanation: it is possible that none of your sequences differ from each other by more than 0.0903472, meaning they are closely related phylogenetically.
Are these sequences functional genes? Maybe you are working with very conserved functional genes.
Hope this helps.

pschloss · August 10, 2012, 2:20pm

The reason the cutoff is changing is because you are using average neighbor with a cutoff. When you use a cutoff mothur ignores distances above the cutoff. Then when the averaging occurs, there may be attempts to average two numbers - one above and one below the cutoff. Clearly it can’t average with the number above the cutoff because it’s gone, and so the cluster command adjusts the cutoff down. If you want to get 0.20, then you can do as Vicente suggests and adjust the cutoff up. Alternatively, you can run dist.seqs(output=phylip); cluster.classic(phylip=…, name=…).

As an aside, I’m personally not so convinced anymore that these high cutoffs are really useful. It may be more useful to run phylotype and classify sequences into OTUs at the phylum, class, order levels and perform rarefaction analysis on those data.

Pat
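
Pat's explanation of the lowered cutoff can be illustrated with a toy example. This is a minimal sketch, not mothur's actual implementation: the three sequences A, B and C, their distances, and the helper `average_distance` are all made up for illustration. Once distances above the cutoff are discarded, the average distance between the merged cluster {A, B} and C can no longer be computed, because one of the two values it needs is gone.

```python
from itertools import product


def average_distance(cluster_a, cluster_b, dists):
    """Mean pairwise distance between two clusters, or None if any pair is missing."""
    values = []
    for a, b in product(cluster_a, cluster_b):
        d = dists.get(frozenset((a, b)))
        if d is None:
            return None  # a needed distance was dropped by the cutoff
        values.append(d)
    return sum(values) / len(values)


# Hypothetical pairwise distances for three sequences.
full = {
    frozenset(("A", "B")): 0.05,
    frozenset(("A", "C")): 0.15,
    frozenset(("B", "C")): 0.25,
}

cutoff = 0.20
# Mimic a distance file written with a cutoff: larger distances are not stored.
kept = {pair: d for pair, d in full.items() if d <= cutoff}

# A and B merge first (0.05). Next, compare the merged cluster with C.
full_avg = average_distance(["A", "B"], ["C"], full)
kept_avg = average_distance(["A", "B"], ["C"], kept)

print("with all distances:", full_avg)
print("with cutoff applied:", kept_avg)
```

With the full matrix the average is (0.15 + 0.25) / 2 = 0.20. With the truncated matrix it is undefined, which is the situation in which, per Pat's explanation, the cluster command lowers the cutoff. Raising the cutoff in dist.seqs keeps the 0.25 distance available.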

tanvirahman · August 10, 2012, 3:22pm

Thanks. I will increase the cutoff as you have suggested, and as you predicted, I am working on some highly conserved functional genes.

Best regards,

Tanvir

tanvirahman · August 10, 2012, 3:29pm

Thanks, Pat, for making things clear to me. I am using a high cutoff simply to compare the findings with those generated at lower cutoffs.

Best regards,

Tanvir

meamizuno · March 21, 2013, 3:17pm

I would like to ask, regarding rarefaction analysis, what is a suitable cutoff at the phylum, class and order levels for 3 samples with around 100+ sequences each?

How should I choose the best cutoff for the graph, and how should I explain it in a journal article on bacterial diversity in soil sites?

Thank you so much

pschloss · March 21, 2013, 7:30pm

If you want to do rarefaction at the phylum, class, etc levels, the best thing to do is to use the phylotype command to cluster sequences into those taxonomic levels. There is no way to map distance-based cutoffs to taxonomic levels.

meamizuno · March 24, 2013, 10:38am

Thank you for your advice. I have already used the phylotype command to cluster my samples at the taxonomic levels, and I now have tx.list, tx.rabund and tx.sabund files. What is the next step for making a rarefaction curve? Should I use the sabund data, or do I need to pass one of these files to the rarefaction.single command? Please enlighten me.

Thank you so much again

pschloss · March 24, 2013, 5:40pm

If your data are all from the same sample then you can run rarefaction.single(sabund=whatever.sabund). If they are from multiple samples, then you need to run make.shared(list=, group=) and then rarefaction.single(shared=).

Pat
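
For readers unsure what the rarefaction output represents, here is a minimal sketch of rarefaction by repeated random subsampling without replacement. It is an illustration under stated assumptions, not mothur's code: the function `rarefaction_curve`, its parameters, the toy OTU assignments, and the percentile-based lower and upper bounds are all hypothetical choices. The bounds are meant to play a role similar to the lci and hci columns asked about earlier in the thread, which, as far as I understand, bracket the estimate at each sampling depth.

```python
import random


def rarefaction_curve(otu_assignments, iters=1000, seed=0):
    """Mean and percentile bounds of OTU richness at each sampling depth.

    otu_assignments holds one OTU label per sequence (hypothetical input).
    """
    rng = random.Random(seed)
    n = len(otu_assignments)
    observed = [[] for _ in range(n)]  # observed[i]: richness after i + 1 draws
    for _ in range(iters):
        order = list(otu_assignments)
        rng.shuffle(order)  # a random order is sampling without replacement
        seen = set()
        for i, otu in enumerate(order):
            seen.add(otu)
            observed[i].append(len(seen))
    rows = []
    for i, values in enumerate(observed):
        values.sort()
        mean = sum(values) / iters
        low = values[int(0.025 * (iters - 1))]   # assumed 2.5th percentile
        high = values[int(0.975 * (iters - 1))]  # assumed 97.5th percentile
        rows.append((i + 1, mean, low, high))
    return rows


# Toy sample: 10 sequences in 3 OTUs (made-up data).
sample = ["otu1"] * 5 + ["otu2"] * 3 + ["otu3"] * 2
curve = rarefaction_curve(sample)
for depth, mean, low, high in curve:
    print(depth, round(mean, 2), low, high)
```

Each row gives the number of sequences sampled, the mean number of OTUs observed, and the lower and upper bounds. A curve that flattens as depth increases suggests that further sampling would add few new OTUs.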

meamizuno · April 8, 2013, 10:55am

Thank you.

I have now been able to get the data.

After running the rarefaction.single command, what do labels such as 0.01, 0.02 and 0.03 mean?

Thank you

pschloss · April 8, 2013, 4:11pm

These are the distance-based cutoffs that were used to form the OTUs. You likely want the data from the 0.03 cutoff.
