Are my numbers of sequences and OTUs weird?

nathalie.cote · November 14, 2016, 7:43pm

Hello!

I am comparing my results to the SOP, and I think they are a little bit weird!
For example, the SOP has 129 058 sequences before unique.seqs and 16 477 after.
I have 583 612 before and 486 121 after.

After pre.cluster and chimera screening, the SOP has 2609 unique sequences. I still have 241 958.

I ended up with 205 332 OTUs (including 182 929 OTUs containing only one sequence…).
Can you help me figure out whether there is a problem with my data? (V3-V4, 192 fecal samples; I used cluster.split with a cutoff of 0.15 and taxlevel=4.)

Also, is there a command to generate a shared file that excludes, for example, OTUs with fewer than 2 sequences?
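A minimal sketch of that filtering step in Python, assuming a single-label shared file in mothur's usual tab-delimited layout (label, Group, numOtus, then one count column per OTU). The function name `drop_rare_otus` and the toy table are made up for illustration; mothur also has a `remove.rare` command aimed at this task, so its documentation is worth checking before rolling your own.

```python
def drop_rare_otus(shared_text, min_total=2):
    """Drop OTU columns whose total count across all groups is below min_total.

    Assumption: one label per file, columns = label, Group, numOtus, Otu...
    """
    rows = [line.split("\t") for line in shared_text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    otu_cols = range(3, len(header))  # the first three columns are metadata
    keep = [i for i in otu_cols if sum(int(r[i]) for r in data) >= min_total]
    new_header = header[:3] + [header[i] for i in keep]
    # numOtus (third column) is rewritten to the number of OTUs kept
    new_data = [r[:2] + [str(len(keep))] + [r[i] for i in keep] for r in data]
    return "\n".join("\t".join(r) for r in [new_header] + new_data)


# Toy shared table (made-up counts): Otu2 holds a single sequence overall.
sample = (
    "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3\n"
    "0.03\tA\t3\t5\t1\t0\n"
    "0.03\tB\t3\t2\t0\t3\n"
)
print(drop_rare_otus(sample))
```

Here Otu2 (total 1) is dropped while Otu1 and Otu3 are kept, and numOtus becomes 2 in every row.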

edit: I also noticed that make.contigs doesn't remove any sequences. If the R1 and R2 input files each contain 40 000 reads, I end up with 40 000 reads after make.contigs. Is that normal?

Thank you

Kendra · November 15, 2016, 10:14pm

What are your samples: fecal, soil, …? What are your sequences: V4, V3-V4, MiSeq, HiSeq?

nathalie.cote · November 15, 2016, 11:19pm

Hello,
Thanks for answering.
I sequenced the V3-V4 region of fecal samples (mice) on a MiSeq (V3 reagents).

After more investigation, I believe that the poor quality of the run (only 50% of the reads >Q30) is the cause of my problem. When I compare the results of the SOP with mine:

Step                    Mothur SOP    Me
After make.contigs      152360        7640334
After screen.seqs       129058        583612
% reads kept            85            8
After unique.seqs       16477         486121
% unique                13            83
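The two percentage rows follow directly from the raw counts: reads kept = screen.seqs / make.contigs, and % unique = unique.seqs / screen.seqs. A short Python check, using only the counts quoted in this post (the `summarize` helper is just for illustration):

```python
# Read counts copied from the comparison in this post.
counts = {
    "Mothur SOP": {"make.contigs": 152360, "screen.seqs": 129058, "unique.seqs": 16477},
    "Me": {"make.contigs": 7640334, "screen.seqs": 583612, "unique.seqs": 486121},
}


def summarize(c):
    """Return (% reads kept by screen.seqs, % of kept reads that are unique)."""
    kept = 100 * c["screen.seqs"] / c["make.contigs"]
    unique = 100 * c["unique.seqs"] / c["screen.seqs"]
    return round(kept), round(unique)


for name, c in counts.items():
    kept, unique = summarize(c)
    print(f"{name}: {kept}% reads kept, {unique}% unique")
```

A high % unique is what you would expect when many reads carry sequencing errors, since each distinct error turns an otherwise identical read into a new unique sequence.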

What is your feeling about it?

Kendra · November 16, 2016, 3:42pm

Poor run quality, and sequencing too long an insert. Pat has written about this issue: http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix/

I think you are left with phylotype analysis as the only way to try to salvage the data.

nathalie.cote · November 16, 2016, 6:14pm

I am surprised that the overlap between the paired-end reads is an issue: with 2x300, I have ~130 bp of overlap (according to the make.contigs report).

I guess we will need to select a shorter region for the next experiment.
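The overlap arithmetic helps explain why a ~130 bp overlap can still be a problem. A rough Python sketch, ignoring primers and adapters: with two reads of length L over an amplicon of length A, the overlap is 2L - A, and A - overlap bases are covered by only one read. The ~253 bp V4 contig length is an approximate figure from the MiSeq SOP, used only for comparison.

```python
def overlap_stats(read_len, amplicon_len):
    """Return (overlap, bases covered by only one read) for a 2 x read_len run.

    Rough model: ignores primers/adapters and assumes the reads overlap at all.
    """
    overlap = 2 * read_len - amplicon_len
    single_coverage = amplicon_len - overlap
    return overlap, single_coverage


# 2x300 with ~130 bp overlap implies an amplicon of about 600 - 130 = 470 bp.
print(overlap_stats(300, 470))  # V3-V4 as reported in this thread
print(overlap_stats(250, 253))  # approximate V4 contig length from the MiSeq SOP
```

For V3-V4 this gives (130, 340): the overlap is made of the 3' ends of both reads, where MiSeq quality is lowest, and about 340 bp rest on a single read. For V4 with 2x250 nearly the whole amplicon is seen twice.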

A last question: when I run cluster.split with different taxlevel values, my OTUs (in the taxonomy file) are the same. I expected that with taxlevel=3 I would get only family names in this file, and with taxlevel=4, genus names. This is not the case…

Thanks again!

Kendra · November 16, 2016, 10:24pm

taxlevel=3 is phylum (root, kingdom, phylum); 4 is class.

nathalie.cote · November 17, 2016, 12:33am

Yes, you are right. Sorry about that.
But why are the taxonomy files identical when I use taxlevel 4 or 3? Even with level 3, I have OTUs that are identified to species.

Kendra · November 17, 2016, 3:42pm

cluster.split isn't making OTUs at a particular taxon level; it splits sequences up by their taxonomic classification before clustering. So if you use taxlevel=3, it will only calculate sequence dissimilarities and cluster among sequences that are all classified to the same phylum. This is a computational load reduction: it should give roughly the same OTUs as clustering all sequences together, just in much less time.
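A toy Python sketch of why splitting saves work, not mothur's actual implementation: sequences are grouped by the first `depth` ranks of their classification, and pairwise distances are only needed within each group. The `depth` parameter, the helper names, and the taxonomy strings are hypothetical; how mothur numbers `taxlevel` is described in its cluster.split documentation.

```python
from collections import defaultdict


def split_by_taxonomy(taxonomy, depth):
    """Group sequence names by the first `depth` ranks of their taxonomy string.

    Assumes semicolon-delimited strings; "(100)"-style confidence scores are stripped.
    """
    groups = defaultdict(list)
    for name, tax in taxonomy.items():
        ranks = [r.split("(")[0] for r in tax.split(";") if r]
        groups[tuple(ranks[:depth])].append(name)
    return dict(groups)


def n_pairs(n):
    """Number of pairwise distances needed for n sequences."""
    return n * (n - 1) // 2


# Hypothetical toy classifications.
taxonomy = {
    "seq1": "Bacteria(100);Firmicutes(100);Clostridia(99);",
    "seq2": "Bacteria(100);Firmicutes(100);Bacilli(97);",
    "seq3": "Bacteria(100);Bacteroidetes(100);Bacteroidia(100);",
    "seq4": "Bacteria(100);Bacteroidetes(100);Bacteroidia(98);",
    "seq5": "Bacteria(100);Proteobacteria(100);Gammaproteobacteria(95);",
    "seq6": "Bacteria(100);Firmicutes(100);Clostridia(100);",
}
groups = split_by_taxonomy(taxonomy, depth=2)
total = n_pairs(len(taxonomy))
split = sum(n_pairs(len(members)) for members in groups.values())
print(total, split)
```

Here all-versus-all clustering needs 15 distances, while the split needs 4. The trade-off is that two sequences classified into different groups are never compared, which is why the result is only roughly the same as clustering everything together.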

Remember that the taxon ID for an OTU tells you nothing about the level of that OTU. You could be looking at very broad OTUs and still see an ID down to species level, because classify.otu reports a consensus of the classifications of the sequences in that OTU, and that consensus goes as deep as those classifications agree.

nathalie.cote · November 17, 2016, 4:53pm

OK…
So, finally, how should I proceed to compare analyses at various taxonomic levels (I would like to make stacked bar charts from the shared file)? I think that with classify.otu on phylotypes we can do it with label 2, 3 or 4, but with OTUs it is not clear what the label 0.03 means.

Kendra · November 17, 2016, 5:50pm

0.03 means 3% sequence dissimilarity, i.e., 97% sequence similarity. You can use the *.tax.summary file to make your bar graphs.
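A minimal Python sketch of turning a *.tax.summary file into per-sample relative abundances for stacked bars, assuming its tab-delimited layout of taxlevel, rankID, taxon, daughterlevels, total, followed by one count column per group. The function name and the toy rows below are made up for illustration.

```python
import csv
import io


def relative_abundance(tax_summary_text, level):
    """Per-group relative abundance (%) of each taxon at one taxlevel.

    Assumed columns: taxlevel, rankID, taxon, daughterlevels, total, <groups...>.
    """
    reader = csv.DictReader(io.StringIO(tax_summary_text), delimiter="\t")
    fixed = {"taxlevel", "rankID", "taxon", "daughterlevels", "total"}
    groups = [f for f in reader.fieldnames if f not in fixed]
    rows = [r for r in reader if int(r["taxlevel"]) == level]
    result = {}
    for g in groups:
        total = sum(int(r[g]) for r in rows)
        # each taxon's share of that group's counts at this level
        result[g] = {r["taxon"]: 100 * int(r[g]) / total if total else 0.0 for r in rows}
    return result


# Toy file (made-up counts) with two groups, A and B.
sample = (
    "taxlevel\trankID\ttaxon\tdaughterlevels\ttotal\tA\tB\n"
    "0\t0\tRoot\t1\t20\t10\t10\n"
    "1\t0.1\tBacteria\t2\t20\t10\t10\n"
    "2\t0.1.1\tFirmicutes\t0\t12\t8\t4\n"
    "2\t0.1.2\tBacteroidetes\t0\t8\t2\t6\n"
)
print(relative_abundance(sample, level=2))
```

Each inner dictionary is one stacked bar: the taxa at the chosen level and their percentages for that sample.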

nathalie.cote · November 21, 2016, 3:03pm

Thanks, I will now work on phylotypes and maybe use only the R1 reads, whose quality is a bit better…

nathalie.cote · November 21, 2016, 5:52pm

I will continue the discussion here, even though I believe the answer is somewhere on the forum (is there a bug on the forum? When I search on Google using site:mothur.org/forum "x", it says that I am not allowed to search the forum).

I decided to work only with R1, and I would like to use mothur for single-end processing. Is that possible?

Thanks
