Cutoff value for the rarefaction curve

Dear All,

I am trying to generate a rarefaction curve at a cutoff of 0.20. I generated the dist file at cutoff 0.20 using the command

dist.seqs(fasta=goodtrim.good.good.filter.unique.fasta,cutoff=0.20, processors=2)

and then clustered, also at cutoff 0.20, using the command

cluster(column=goodtrim.good.good.filter.unique.dist,name=goodtrim.good.good.filter.names,cutoff=0.20)
(Note that after reading my matrix, mothur reports “changed cutoff to 0.0903472”.)

Finally, I used the following command to generate the data for the rarefaction curve:

make.shared(list=goodtrim.good.good.filter.unique.an.list,group=goodgroup.good.groups, label=unique-0.07-0.10-0.20)

but in the output rarefaction file I can see values only for unique, 0.03, 0.07 and 0.09 (no values for 0.10 or 0.20).

Any suggestions on how to get the rarefaction curve at cutoff 0.20?

Also, what is the basic difference among unique, lci and hci?

Cheers,

Tanvir

Hi Tanvir,


One possible solution and one possible explanation:

(1) solution: increase your cutoff value from 0.20 to 0.30 or even 0.40 in the “dist.seqs” and “cluster” commands, but keep the cutoff value of 0.20 in the “make.shared” command.


(2) explanation: it is possible that none of your sequences differ from each other by more than 0.0903472, i.e., they are closely related phylogenetically.
**Are these sequences functional genes?** Maybe you are working with very conserved functional genes.
Hope this helps

The cutoff is changing because you are using average neighbor with a cutoff. When you use a cutoff, mothur ignores distances above the cutoff. Then, when the averaging occurs, there may be attempts to average two numbers, one above and one below the cutoff. It can’t average with the number above the cutoff because it’s gone, so the cluster command adjusts the cutoff down. If you want to get 0.20, you can do as Vicente suggests and adjust the cutoff up. Alternatively, you can run dist.seqs(output=phylip); cluster.classic(phylip=…, name=…).

As an aside, I’m personally not so convinced anymore that these high cutoffs are really useful. It may be more useful to run phylotype and classify sequences into OTUs at the phylum, class, order levels and perform rarefaction analysis on those data.

Pat
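
For reference, the suggestion to raise the cutoff in dist.seqs and cluster while still requesting the 0.20 label in make.shared might look like the sketch below. It reuses the file names from the original question and assumes 0.30, one of the values suggested above; the .an.list file name is carried over from the original commands and may differ in your run.

```
# Sketch only: 0.30 is an assumed value taken from the suggestion above.
# Compute distances with a cutoff higher than the one you want to analyze.
dist.seqs(fasta=goodtrim.good.good.filter.unique.fasta, cutoff=0.30, processors=2)

# Cluster with the same higher cutoff; watch for the "changed cutoff to" message.
cluster(column=goodtrim.good.good.filter.unique.dist, name=goodtrim.good.good.filter.names, cutoff=0.30)

# Ask only for the labels you want, including 0.20.
make.shared(list=goodtrim.good.good.filter.unique.an.list, group=goodgroup.good.groups, label=unique-0.07-0.10-0.20)
```

If cluster still reports a changed cutoff below 0.20, a larger starting value such as 0.40 may be needed. With highly conserved sequences, the data may simply not contain distances that large.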

Thanks. I will increase the cutoff as you have suggested, and as you predicted, I am working on some highly conserved functional genes.

Best regards,

Tanvir






Thanks, Pat, for making things clear to me. I am using a high cutoff simply to compare the findings with those generated at lower cutoffs.

Best regards,

Tanvir





Regarding rarefaction analysis, what is a suitable cutoff to use at the phylum, class and order levels for 3 samples that have around 100+ sequences each?

How should I choose the best cutoff for the graph, and how should I explain it in a journal article on bacterial diversity at soil sites?

Thank you so much

If you want to do rarefaction at the phylum, class, etc. levels, the best thing to do is to use the phylotype command to cluster sequences into those taxonomic levels. There is no way to map distance-based cutoffs to taxonomic levels.

Thank you for your advice. I have already used the phylotype command to cluster my samples into taxonomic levels. Now I have data in tx.list, tx.rabund and tx.sabund. What is the next step to make a rarefaction curve graph? Do I use the sabund data directly, or do I need to pass this file to the rarefaction.single command? Please enlighten me.

Thank you so much again

If your data are all from the same sample then you can run rarefaction.single(sabund=whatever.sabund). If they are from multiple samples, then you need to run make.shared(list=, group=) and then rarefaction.single(shared=).

Pat
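
For the multiple-sample case, the two steps described above might be run as in the sketch below. The root name "whatever" is a placeholder in the same spirit as the reply, and the name of the shared file written by make.shared is an assumption based on the list file name, so check the command output for the actual file.

```
# Placeholder file names: substitute your own phylotype list file and groups file.
make.shared(list=whatever.tx.list, group=whatever.groups)

# Assumed output name from the previous step; use the file make.shared reports.
rarefaction.single(shared=whatever.tx.shared)
```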

Thank you.

I have been able to get the data.

After I run the rarefaction.single command, I see labels such as 0.01, 0.02 and 0.03. What do they mean?

Thank you

These are the distance-based cutoffs that were used to form the OTUs. You likely want the results from the 0.03 cutoff.