Hello
I don’t understand how the taxonomy level (the taxlevel option) is used to split the distance file in the cluster.split command. Could somebody please explain it, or point me to where I could find an explanation?
I had a quick look at what happens when I change taxlevel from 6 to 7 on a small dataset, using a well-curated species-level database. With taxlevel=6 I get a few more hits (reads identified) that I don’t get with taxlevel=7, and far fewer the other way around (i.e. hits that I get with taxlevel=7 but not with taxlevel=6). I would have expected taxlevel=7 to perform better here.
Thanks for your help!
David
Hi David,
It takes your taxonomy strings, in which the levels are separated by semicolons, and sends sequences to different groups based on the level you choose. Each level needs to be represented for all of the sequences for it to perform correctly. Do you have a minimal reproducible example that you could post that shows what you’re seeing? I could take a look at it and let you know what I see…
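To make the grouping idea concrete, here is a minimal Python sketch. It is not mothur’s implementation, only an illustration of splitting on semicolon-separated levels. The function name `split_by_taxlevel` and the record `hypothetical_seq` are made up for this example; the other two records come from the reference library quoted in this thread. The sketch assumes one `name<whitespace>taxonomy` record per line and does not strip confidence scores such as `(100)`.

```python
from collections import defaultdict


def split_by_taxlevel(taxonomy_lines, taxlevel):
    """Group sequence names by the first `taxlevel` levels of their taxonomy.

    Returns (groups, incomplete): `groups` maps a tuple of levels to the
    sequence names sharing it; `incomplete` lists names whose taxonomy
    string has fewer than `taxlevel` levels.
    """
    groups = defaultdict(list)
    incomplete = []
    for line in taxonomy_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank lines and records without a taxonomy
        name, taxonomy = parts
        # Drop the empty string left by the trailing semicolon.
        levels = [level for level in taxonomy.strip().split(";") if level]
        if len(levels) < taxlevel:
            incomplete.append(name)  # not every level is represented
            continue
        groups[tuple(levels[:taxlevel])].append(name)
    return dict(groups), incomplete


records = [
    "MNHNL145207\tEukaryota;Annelida_6340;Clitellata_42113;Haplotaxida_6382;"
    "Lumbricidae_6392;Aporrectodea;A_caliginosa_L2;",
    "MNHNL146587\tEukaryota;Annelida_6340;Clitellata_42113;Haplotaxida_6382;"
    "Lumbricidae_6392;Aporrectodea;A_caliginosa_L3;",
    # Hypothetical record with missing levels, for illustration only.
    "hypothetical_seq\tEukaryota;Annelida_6340;",
]

groups6, incomplete6 = split_by_taxlevel(records, 6)
groups7, incomplete7 = split_by_taxlevel(records, 7)
print(len(groups6), len(groups7), incomplete6)
```

With these records, level 6 puts both MNHNL sequences in one group (they share everything down to Aporrectodea), while level 7 separates them into two groups. The record with only two levels cannot be placed at either setting, which is why every level should be populated.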
Pat
Hi Pat
Thanks for your answer. So it first groups seqs according to the taxlevel option, and then makes clusters at the chosen cutoff within each of these groups?
Here is what my reference library looks like; all levels are populated. In fact, I take a wider dataset and target a small taxonomic subset with a custom library.
MNHNL145207 Eukaryota;Annelida_6340;Clitellata_42113;Haplotaxida_6382;Lumbricidae_6392;Aporrectodea;A_caliginosa_L2;
MNHNL146587 Eukaryota;Annelida_6340;Clitellata_42113;Haplotaxida_6382;Lumbricidae_6392;Aporrectodea;A_caliginosa_L3;
MNHNL146567
Here are the commands I use to get there (maybe the issue originates here):
classify.seqs(fasta=mergedfastaunique, count=mergedfastacount_table, cutoff=90, reference=lumbref.fas, taxonomy=lumb5.txt)
cluster.split(fasta=current, count=current, taxonomy=current, taxlevel=6, cutoff=0.03, runsensspec=f, processors=32)
make.shared(list=current, count=current, label=0.03)
classify.otu(list=current, count=current, taxonomy=current, label=0.03)
merge.otus(constaxonomy=current, list=current)
Here is a table of the few discrepancies I get between the two taxlevel settings (the bottom rows differ only in their counts):
sample_id (taxlevel=6) | otu (taxlevel=6) | count (taxlevel=6) | genus (taxlevel=6) | taxon (taxlevel=6) | sample_id (taxlevel=7) | otu (taxlevel=7) | count (taxlevel=7) | genus (taxlevel=7) | taxon (taxlevel=7) |
---|---|---|---|---|---|---|---|---|---|
HS_D_M_9_6 | Otu050790 | 1 | Lumbricus | L_rubellus_L2 | |||||
HS_Esc_M_10_4 | Otu002940 | 19 | Lumbricus | L_rubellus_LP1 | |||||
HS_H_M_10_4 | Otu013040 | 1 | Aporrectodea | A_caliginosa_L2 | |||||
HS_H_M_6_14 | Otu050790 | 1 | Lumbricus | L_rubellus_L2 | |||||
HS_H_M_7_26 | Otu002940 | 2 | Lumbricus | L_rubellus_LP1 | |||||
HS_W_M_7_12 | Otu053147 | 1 | Octolasion | Octolasion_sp._BIOUG32056_A02_2474609 | |||||
M_BE_M_5_3 | Otu072321 | 2 | Aporrectodea | A_icterica | |||||
M_BI_M_10_4 | Otu002940 | 86 | Lumbricus | L_rubellus_LP1 | |||||
M_BI_M_6_28 | Otu002940 | 1 | Lumbricus | L_rubellus_LP1 | |||||
M_BI_M_8_24 | Otu013040 | 1 | Aporrectodea | A_caliginosa_L2 | |||||
M_E_M_7_26 | Otu002940 | 1 | Lumbricus | L_rubellus_LP1 | |||||
M_H_M_6_14 | Otu013040 | 3 | Aporrectodea | A_caliginosa_L2 | |||||
M_H_M_6_28 | Otu002940 | 1 | Lumbricus | L_rubellus_LP1 | |||||
M_H_M_9_20 | Otu020357 | 1 | Aporrectodea | Aporrectodea_unclassified | |||||
M_HER_M_5_3 | Otu002940 | 1 | Lumbricus | L_rubellus_LP1 | |||||
O_A_M_10_4 | Otu020357 | 1 | Aporrectodea | Aporrectodea_unclassified | |||||
O_A_M_6_28 | Otu053147 | 1 | Octolasion | Octolasion_sp._BIOUG32056_A02_2474609 | |||||
O_G_M_6_14 | Otu002940 | 1 | Lumbricus | L_rubellus_LP1 | |||||
O_HAM_M_5_17 | Otu002940 | 1 | Lumbricus | L_rubellus_LP1 | |||||
O_HAM_M_9_6 | Otu050790 | 1 | Lumbricus | L_rubellus_L2 | |||||
O_HE_M_7_12 | Otu014656 | 1 | Aporrectodea | A_caliginosa_L2 | |||||
O_HE_M_9_20 | Otu020357 | 1 | Aporrectodea | Aporrectodea_unclassified | |||||
O_HO_M_5_3 | Otu020357 | 2 | Aporrectodea | Aporrectodea_unclassified | |||||
O_HO_M_7_26 | Otu021795 | 1 | Dendrobaena | Dendrobaena_unclassified | |||||
O_HU_M_6_14 | Otu053316 | 4 | Lumbricus | L_rubellus_L2 | |||||
O_V_M_8_23 | Otu021795 | 1 | Dendrobaena | Dendrobaena_unclassified | |||||
O_V_M_8_9 | Otu020357 | 1 | Aporrectodea | Aporrectodea_unclassified | |||||
O_V_M_8_9 | Otu021795 | 1 | Dendrobaena | Dendrobaena_unclassified | |||||
M_H_M_7_26 | Otu002940 | 30 | Lumbricus | L_rubellus_LP1 | M_H_M_7_26 | Otu003061 | 28 | Lumbricus | L_rubellus_LP1 |
M_HER_M_9_20 | Otu020357 | 10 | Aporrectodea | Aporrectodea_unclassified | M_HER_M_9_20 | Otu023121 | 8 | Aporrectodea | Aporrectodea_unclassified |
O_G_M_8_9 | Otu013040 | 25 | Aporrectodea | A_caliginosa_L2 | O_G_M_8_9 | Otu014656 | 21 | Aporrectodea | A_caliginosa_L2 |
O_HAC_M_5_3 | Otu013040 | 9 | Aporrectodea | A_caliginosa_L2 | O_HAC_M_5_3 | Otu014656 | 4 | Aporrectodea | A_caliginosa_L2 |
O_HAU_M_8_9 | Otu002940 | 24 | Lumbricus | L_rubellus_LP1 | O_HAU_M_8_9 | Otu003061 | 21 | Lumbricus | L_rubellus_LP1 |
O_HAU_M_9_20 | Otu013040 | 8 | Aporrectodea | A_caliginosa_L2 | O_HAU_M_9_20 | Otu014656 | 7 | Aporrectodea | A_caliginosa_L2 |
O_HE_M_6_14 | Otu002940 | 40 | Lumbricus | L_rubellus_LP1 | O_HE_M_6_14 | Otu003061 | 38 | Lumbricus | L_rubellus_LP1 |
David
It looks like these are consensus taxonomies for each OTU. cluster.split uses the output from classify.seqs, not the consensus taxonomies. I wonder if this is part of the problem.
Pat
Hi Pat
I ran analyses with taxlevel 6 and 7, comparing the results obtained from the consensus taxonomy outputs of classify.otu and merge.otus. For both taxlevels, the output from classify.otu performs much better, with many more hits (i.e. species occurrences in sites). So there was indeed a consensus taxonomy issue when using merge.otus. Why do I lose so many hits when merging OTUs with the same taxonomy? Did I make a mistake in how I used it?
Thanks for your help
David
To be honest, you might be the first person to ever use merge.otus, and I’m not really sure what it does.
Please use cluster.split as described in the MiSeq SOP. That is how we intend it to be used.
Pat
OK, it is supposed to “combine OTUs based on taxonomic assignment.” I see Sarah Westcott implemented it; I’ll try to contact her. I probably used it incorrectly.
My goal, for big datasets, was to reduce the size of the file that stitches together the shared and taxonomy files for downstream analyses. That let me run them on a regular machine instead of a server with lots of RAM.
One last question on taxlevel (following the SOP, no merging): comparing the consensus taxonomies from taxlevel 6 and 7, level 7 mostly performs better, but in a few cases I have matches at level 6 that I don’t have at level 7. Any idea why?
Thanks
David
What do you mean by performs “mostly better”? If you still have questions about the algorithm, I’d encourage you to check out the papers describing the method and its performance:
https://journals.asm.org/doi/10.1128/mspheredirect.00073-17
https://journals.asm.org/doi/10.1128/aem.02810-10
If you want to combine OTUs by taxonomy, then you’d be better off using the phylotype command to create phylotype-based OTUs. There wouldn’t be any need to go through the de novo OTU generation process.
Pat
I meant that I generally obtain more OTU matches among sites with taxlevel=7 than with taxlevel=6. But in a few cases I obtain matches with 6 that I don’t get with 7, which I find puzzling.
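One way to pin down exactly which matches are unique to each run is a set comparison. Below is a minimal Python sketch of it. The function name `compare_hits` and the dictionary layout are assumptions made for this example. Hits are keyed on (sample_id, taxon) rather than on the OTU label, because the labels differ between runs (for example Otu002940 versus Otu003061 for the same sample and taxon in the table posted earlier). The two small dictionaries are an excerpt of that table, not the full results.

```python
def compare_hits(counts_a, counts_b):
    """Compare two {(sample_id, taxon): count} mappings.

    Returns (only_a, only_b, changed): hits present in only one run, and
    hits present in both runs whose counts differ, as (count_a, count_b).
    """
    only_a = {key: counts_a[key] for key in counts_a.keys() - counts_b.keys()}
    only_b = {key: counts_b[key] for key in counts_b.keys() - counts_a.keys()}
    changed = {
        key: (counts_a[key], counts_b[key])
        for key in counts_a.keys() & counts_b.keys()
        if counts_a[key] != counts_b[key]
    }
    return only_a, only_b, changed


# Excerpt of the discrepancy table from this thread.
taxlevel6 = {
    ("HS_D_M_9_6", "L_rubellus_L2"): 1,
    ("M_H_M_7_26", "L_rubellus_LP1"): 30,
}
taxlevel7 = {
    ("M_H_M_7_26", "L_rubellus_LP1"): 28,
}

only6, only7, changed = compare_hits(taxlevel6, taxlevel7)
print(only6)
print(only7)
print(changed)
```

For this excerpt, `only6` holds the single HS_D_M_9_6 hit, `only7` is empty, and `changed` records the 30 versus 28 count for M_H_M_7_26.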
Thanks for the papers and also the phylotype advice, I’ll give it a go.
David