Cluster.split for finding Salmonella

First, here is why I am using this command.

I have Salmonella in a positive control.
Using the traditional SOP, I have Salmonella after classify.seqs.
But when I classify my OTUs, they are not there.

The only way to find them is to cluster at 0.005 (they get merged at 0.01).

I am not a fan of these near-ASVs, and not a fan of dada2 either (I lose almost all my sequences, and it gives a perfect positive control, which is technically impossible: no positive control is pure, without any contamination at all).

So I went for cluster.split to make OTUs after splitting by taxonomy at the genus level, so that I can keep my Salmonella (well, I love them).

These are the commands I ran and what I got.
"
mothur > dist.seqs(fasta=phytosynthese2021.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta,cutoff=0.03)

mothur > cluster.split(column=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=6, delta=0, iters=300,cutoff=0.03)

label cutoff numotus tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score
0.03 0.03 163578 441 1.33565e+10 29 4.90239e+07 8.996e-06 1 0.9383 0.9963 0.9383 0.9963 0.002899 1.799e-05

"

So there is essentially no clustering, as I get 163,578 OTUs from 163,741 unique sequences.

The weird thing is that when I run this command on my positive control only, I go from 350 unique sequences to 179 OTUs…

Moreover, when I run make.shared, I get a weird output (why do the numbers of sequences not match?), and even though the log says it made a .shared file, I do not see it!

"
mothur > make.shared(list=current,count=current,label=0.03)
Using phytosynthese2021.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table as input file for the count parameter.
Using phytosynthese2021.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.list as input file for the list parameter.
[ERROR]: M03992_423_000000000-CK27V_1_1113_10438_12959 is in your groupfile and not your listfile. Please correct.
[ERROR]: M03992_423_000000000-CK27V_1_2104_13401_20983 is in your groupfile and not your listfile. Please correct.
[ERROR]: M03992_423_000000000-CK27V_1_2107_21552_19489 is in your groupfile and not your listfile. Please correct.
[ERROR]: M03992_423_000000000-CK27V_1_2112_18236_19445 is in your groupfile and not your listfile. Please correct.
Your group file contains 163741 sequences and list file contains 163737 sequences. Please correct.

Output File Names:
phytosynthese2021.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.shared

"

So, any thoughts?

The classify.otu command finds the consensus taxonomy for each OTU. If the positive control contains 350 sequences out of 163,741, then it is likely they are not the dominant taxonomy in the OTUs.
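As a rough illustration of why a minority genus can disappear at the OTU level, here is a toy majority-vote consensus in Python. It is only a sketch, not mothur's classify.otu implementation: the function name `consensus_genus`, the 51% threshold, and the read counts are all made up for the example.

```python
from collections import Counter


def consensus_genus(genus_counts, cutoff=51):
    """Return the most abundant genus if it holds at least `cutoff` percent
    of the reads in the OTU, otherwise 'unclassified' (toy rule)."""
    total = sum(genus_counts.values())
    genus, count = Counter(genus_counts).most_common(1)[0]
    return genus if 100 * count / total >= cutoff else "unclassified"


# Hypothetical OTU: a few Salmonella reads merged with a more abundant relative.
otu = {"Escherichia-Shigella": 900, "Salmonella": 100}
print(consensus_genus(otu))
```

With counts like these, the OTU is labelled with the abundant genus and the Salmonella reads are no longer visible in the OTU taxonomy, even though classify.seqs assigned them correctly.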

So there is essentially no clustering, as I get 163,578 OTUs from 163,741 unique sequences.

The weird thing is that when I run this command on my positive control only, I go from 350 unique sequences to 179 OTUs…

The opti method uses metrics to find the best fit for each sequence. When you add or remove sequences from the dataset, the “best” location may change.
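To make the metrics concrete, the sketch below recomputes a few of them in Python from the tp/tn/fp/fn pair counts printed in the cluster.split output in the original post. The formulas are the standard confusion-matrix definitions; the counts are typed in as they appear in the log, so they are rounded and the results will only match the log approximately.

```python
import math

# Pair counts copied from the cluster.split log line (rounded as printed).
tp, tn, fp, fn = 441, 1.33565e10, 29, 4.90239e07

# Fraction of sequence pairs within the cutoff that share an OTU.
sensitivity = tp / (tp + fn)
# Fraction of pairs sharing an OTU that are within the cutoff.
ppv = tp / (tp + fp)
# Matthews correlation coefficient, the value in the mcc column.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
f1 = 2 * tp / (2 * tp + fp + fn)

print(f"sensitivity={sensitivity:.4g} ppv={ppv:.4g} mcc={mcc:.4g} f1={f1:.4g}")
```

The very low sensitivity reflects the roughly 4.9e+07 pairs that fall within the 0.03 cutoff but were left in different OTUs, which is consistent with almost no clustering having happened in that run.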

The error with make.shared looks strange. What version of mothur are you running? The output file name for the make.shared command should not be listed when an error occurs; mothur did not complete the command or create the shared file. Sorry for the confusion.