I am comparing the results I obtained to the SOP and I think my results are a little weird!
For example, in the SOP there are 129,058 sequences before unique.seqs and 16,477 after.
I have 583,612 before and 486,121 after.
After pre.cluster and chimera screening, the SOP has 2,609 unique sequences. I still have 241,958.
I ended up with 205,332 OTUs (including 182,929 OTUs made of only one sequence…)
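Putting the counts quoted above side by side as ratios makes the difference easier to see. This is plain arithmetic on the numbers from this post; the function name is only for illustration.

```python
def ratio(part, whole):
    """Fraction of `whole` represented by `part`."""
    return part / whole

# unique.seqs: unique sequences as a fraction of all sequences
sop_unique = ratio(16_477, 129_058)    # SOP
mine_unique = ratio(486_121, 583_612)  # this data set

# OTUs containing a single sequence, as a fraction of all OTUs
singleton_share = ratio(182_929, 205_332)

print(f"SOP unique after unique.seqs:  {sop_unique:.1%}")
print(f"Mine unique after unique.seqs: {mine_unique:.1%}")
print(f"Single-sequence OTUs:          {singleton_share:.1%}")
```

In the SOP roughly one read in eight is unique after dereplication, whereas here more than four in five are. That pattern is consistent with many reads carrying sequencing errors, since each error tends to create a "new" unique sequence and, later, a singleton OTU.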
Can you help me figure out whether there is a problem with my data? (V3-V4 region, 192 fecal samples; I used cluster.split with a cutoff of 0.15 and taxlevel=4.)
Also, is there a command to generate a shared file that excludes, for example, OTUs with fewer than 2 sequences?
Edit: I also noticed that make.contigs doesn't remove any sequences. If the R1 and R2 input files contain 40,000 reads, I end up with 40,000 reads after make.contigs. Is that normal?
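If a dedicated mothur command fits the need (remove.rare, for instance), that is probably the better route. The sketch below only illustrates the idea outside mothur, assuming a tab-separated shared table with the columns label, Group, numOtus, followed by one column per OTU, and a single label (e.g. 0.03) in the file. The function name filter_shared is hypothetical.

```python
import csv
import io

def filter_shared(text, min_total=2):
    """Drop OTU columns whose count summed over all groups is below min_total.

    Assumes one label per table; with several labels the totals would mix them.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    otu_names = header[3:]  # columns after label, Group, numOtus
    # Total abundance of each OTU across all groups (samples)
    totals = [sum(int(r[3 + i]) for r in data) for i in range(len(otu_names))]
    keep = [i for i, t in enumerate(totals) if t >= min_total]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header[:2] + ["numOtus"] + [otu_names[i] for i in keep])
    for r in data:
        # numOtus is rewritten to match the number of retained OTU columns
        writer.writerow(r[:2] + [str(len(keep))] + [r[3 + i] for i in keep])
    return out.getvalue()
```

For example, with Otu1 = 5 + 2, Otu2 = 1 + 0 and Otu3 = 0 + 0 across two groups, only Otu1 survives a threshold of 2.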
Hello,
Thanks for answering.
I sequenced the V3-V4 region of fecal samples (mice) on a MiSeq (v3 reagents).
After more investigation, I believe that the poor quality of the run (only 50% of the reads >Q30) is the cause of my problem. When I compare the results of the SOP with mine:
I am surprised that the overlap between paired-end reads is an issue: with 2x300, I have ~130 bp of overlap (according to the make.contigs report).
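Two quick checks related to the quality and overlap points above, as a hedged sketch. With 2x300 reads and ~130 bp of overlap, the implied contig length is 2 × 300 − 130 = 470 bp; that number is derived from this post, not from a primer specification. The overlap sits at the 3' ends of both reads, usually the lowest-quality part of a MiSeq read, so a per-position Q30 profile shows whether the overlap region itself is poor. Quality strings are assumed to be Phred+33 encoded; the function names are hypothetical.

```python
def expected_overlap(read_len, amplicon_len):
    """Bases shared by forward and reverse reads of equal length (negative means a gap)."""
    return 2 * read_len - amplicon_len

def fraction_q30_by_position(quality_lines, offset=33):
    """Per-position fraction of bases with Phred score >= 30.

    quality_lines: FASTQ quality strings (4th line of each record).
    """
    max_len = max(len(q) for q in quality_lines)
    hits = [0] * max_len
    counts = [0] * max_len
    for q in quality_lines:
        for i, ch in enumerate(q):
            counts[i] += 1
            if ord(ch) - offset >= 30:
                hits[i] += 1
    return [h / c for h, c in zip(hits, counts)]
```

Plotting the second function's output for R1 and R2 separately would show where along the reads the quality drops.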
I guess we will need to select a shorter region for the next experiment.
One last question: when I ran cluster.split with different taxlevel values, my OTUs (in the taxonomy file) were the same. I would have expected, with taxlevel=3, to have only family names in this file, and with taxlevel=4, genus names. This is not the case…
Yes, you are right. Sorry about that.
But why are the taxonomy files identical when I use taxlevel 4 or 3? Even with level 3, I have OTUs that are identified down to species.
cluster.split isn't making OTUs at a particular taxon level; it splits up sequences by their taxonomic classification before clustering. So if you use taxlevel=3, it will only calculate sequence dissimilarities between, and cluster, sequences that are all classified to the same class (level 3 with the usual Kingdom;Phylum;Class;… taxonomy). This is a computational load reduction: it should give roughly the same OTUs as clustering all sequences together, just in much less time.
Remember that the taxonomy assigned to an OTU tells you nothing about the level of that OTU. You could be looking at phylum-level OTUs and still see a classification down to species level, because classify.otu reports the consensus taxonomy of the sequences in that OTU, not the OTU's "rank".
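A toy sketch of the splitting idea described above (not mothur's implementation): sequences are grouped by the first taxlevel ranks of their classification, and each group is clustered on its own. Simple single-linkage clustering of aligned, equal-length sequences stands in for mothur's actual clustering method, and taxonomy strings are assumed to have confidence values already removed. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def split_by_taxon(taxonomy, taxlevel):
    """Group sequence ids by the first `taxlevel` ranks of their taxonomy."""
    groups = defaultdict(list)
    for seq_id, tax in taxonomy.items():
        ranks = tax.strip(";").split(";")
        groups[tuple(ranks[:taxlevel])].append(seq_id)
    return groups

def distance(a, b):
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def single_linkage(ids, seqs, cutoff):
    """Join any two sequences within `cutoff` into the same cluster (union-find)."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)
    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    return sorted(sorted(c) for c in clusters.values())

def cluster_split(seqs, taxonomy, taxlevel, cutoff):
    """Cluster each taxonomic group separately, then pool the OTUs."""
    otus = []
    for members in split_by_taxon(taxonomy, taxlevel).values():
        otus.extend(single_linkage(members, seqs, cutoff))
    return sorted(otus)
```

Distances are only ever computed within a group, which is where the time saving comes from. The result matches clustering everything together as long as no two sequences from different groups fall within the cutoff, hence "roughly the same OTUs". The taxlevel changes how the work is divided, not the OTU definition, which fits the identical taxonomy files reported above.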
ok…
So, finally, how should I proceed to compare analyses at various taxonomic levels (I would like to make stacked bar plots from the shared file)? I think that with classify.otu for phylotype we can do it with label 2, 3 or 4, but with OTUs it is not clear what the label 0.03 means.
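For distance-based OTUs, the label 0.03 refers to the distance cutoff used to build the OTUs (roughly 97% similarity), not to a taxonomic rank. One way to get per-rank stacked bars from such OTUs is to sum OTU counts by their consensus taxonomy truncated at the rank of interest. The sketch below assumes counts have already been read into a dict and that taxonomy strings have no confidence values; the function name is hypothetical.

```python
from collections import defaultdict

def collapse_to_level(counts, otu_taxonomy, level):
    """Relative abundance per sample after pooling OTUs that share the first `level` ranks.

    counts: {sample: {otu: count}}
    otu_taxonomy: {otu: "Bacteria;Firmicutes;Clostridia;..."}
    """
    result = {}
    for sample, otu_counts in counts.items():
        by_taxon = defaultdict(int)
        for otu, n in otu_counts.items():
            ranks = otu_taxonomy[otu].strip(";").split(";")
            by_taxon[";".join(ranks[:level])] += n
        total = sum(by_taxon.values())
        # Skip normalisation for empty samples to avoid dividing by zero
        result[sample] = {t: n / total for t, n in by_taxon.items()} if total else {}
    return result
```

Each inner dict is one bar of the plot; changing `level` moves between phylum, class, order and so on without re-clustering.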
I will continue the discussion here, even though I believe the answer is somewhere on the forum (is there a bug on the forum? When I search on Google using site:mothur.org/forum "x", it says that I am not allowed to search the forum).
I decided to work only with R1, and I would like to use mothur for single-end processing. Is that possible?