Summary.seqs() after performing remove.seqs() giving me different values

I’m trying to analyze fungal ITS2 sequences from Sanger sequencing, following a protocol that @kmitchell shared with me in this post I made earlier this year about getting started. I have hit a snag that doesn’t make sense.

My approach thus far has been to pretty much mimic what was in the protocol mentioned above:

summary.seqs(fasta=seqs.txt)

unique.seqs(fasta=seqs.txt)

count.seqs(name=seqs.names, group=group.txt)

pre.cluster(fasta=current, diffs=2, count=current)

summary.seqs(fasta=current, count=current)

chimera.vsearch(fasta=current, count=current, dereplicate=t)

remove.seqs(fasta=current, accnos=current, count=current)

summary.seqs(fasta=current, count=current)

When I get to the final summary.seqs(fasta=current, count=current) at the end of this list of commands, I get the following errors:

[ERROR]: 'LCl_84' is not in your name or count file, please correct.
[ERROR]: 'HCO3_120' is not in your name or count file, please correct.

[ERROR]: Your count file contains 94 unique sequences, but your fasta file contains 76. File mismatch detected, quitting command.

The interesting part is that when I rerun that command, the error message reports a different number of sequences for my FASTA file. For example, the first error message said that my count file had 94 sequences but my FASTA file had 76, and the command quit because of this mismatch. When I run the command again, I get this error:

[ERROR]: 'HCO3_120' is not in your name or count file, please correct.
[ERROR]: 'LCl_84' is not in your name or count file, please correct.

[ERROR]: Your count file contains 94 unique sequences, but your fasta file contains 53. File mismatch detected, quitting command.

And, if I run the same summary.seqs() command once more, I get yet another value in the FASTA error:

[ERROR]: 'HCO3_120' is not in your name or count file, please correct.
[ERROR]: 'LCl_84' is not in your name or count file, please correct.

[ERROR]: Your count file contains 94 unique sequences, but your fasta file contains 62. File mismatch detected, quitting command.

Why are the sample names & the number of unique sequences reported for the .count_table file consistent, while the number of sequences reported for the FASTA file varies from run to run?

Also, when I manually inspect the FASTA file I count 97 unique sequences.

I’m not really sure what is going on under the hood of the de novo clustering step, or the chimera call-out step to make this error occur, and why it has variable FASTA sequences each time I run it. Has anyone run into this issue before?
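One way to cross-check the manual count is to compare the sequence names in the FASTA file against the first column of the count table. Below is a minimal Python sketch of that comparison. It assumes the count table is tab-delimited with a header row starting with Representative_Sequence and optional comment lines starting with #; the function names and the toy data are hypothetical and are not part of mothur.

```python
def fasta_names(lines):
    """Return sequence names: the text after '>' up to the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def count_table_names(lines):
    """Return first-column names, skipping blank, comment, and header lines."""
    names = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        first = line.split("\t")[0]
        if first == "Representative_Sequence":  # assumed header label
            continue
        names.append(first)
    return names


def compare(fasta_lines, count_lines):
    """Return (names only in the FASTA, names only in the count table)."""
    in_fasta = set(fasta_names(fasta_lines))
    in_count = set(count_table_names(count_lines))
    return sorted(in_fasta - in_count), sorted(in_count - in_fasta)


# Toy data for illustration only; real use would pass open file handles.
toy_fasta = [">seq_a", "ACGT", ">seq_b", "ACGA"]
toy_count = ["Representative_Sequence\ttotal", "seq_b\t2", "seq_c\t1"]
only_fasta, only_count = compare(toy_fasta, toy_count)
print("only in fasta:", only_fasta)
print("only in count table:", only_count)
```

Names that show up on only one side are the ones worth tracing back to the step that produced each file.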

Is this the full list of commands? I run screen.seqs before unique.seqs, which would make the group file group.good.groups.

I missed a command or two in my OP, and have gotten further along in the pipeline, but now I’m running into a new issue. I can get to assessing alpha diversity with

summary.single(shared=current, calc=nseqs-sobs-coverage-shannon-shannoneven-invsimpson, subsample=T)

but when I try to analyze beta diversity with

dist.shared(shared=current, calc=braycurtis-jest-thetayc, subsample=T)

I get the error message:

Using 4 processors.
You have not provided enough valid groups.  I cannot run the command.

Not sure why I don’t have enough groups for beta diversity when I didn’t get that error message with alpha diversity.

Any insights?

How many groups (samples) do you have? You can run alpha diversity with one sample, since it measures diversity within a sample, but beta diversity compares samples and so requires at least 2. What does your shared file look like?
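A quick way to answer the "how many groups" question is to list the distinct values in the Group column of the shared file. The Python sketch below assumes the usual tab-delimited layout with a header row of label, Group, numOtus followed by one column per OTU; the function name and the toy rows are hypothetical.

```python
def groups_in_shared(lines):
    """Return the distinct group names from a shared file, in file order."""
    groups = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a data row
        if fields[0] == "label":
            continue  # header row
        if fields[1] not in groups:
            groups.append(fields[1])
    return groups


# Toy shared file with a single group, for illustration only.
toy_shared = [
    "label\tGroup\tnumOtus\tOtu01\tOtu02",
    "0.03\tsampleA\t2\t5\t1",
]
groups = groups_in_shared(toy_shared)
print(groups)
print("enough groups for beta diversity:", len(groups) >= 2)
```

If this reports fewer than two groups, a between-sample calculation has nothing to compare.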

summary.seqs(fasta=current)

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       52      52      0       3       1
2.5%-tile:      1       52      52      0       3       1
25%-tile:       1       315     315     0       5       7
Median:         1       317     317     0       5       14
75%-tile:       1       325     325     0       6       20
97.5%-tile:     1       378     378     0       8       26
Maximum:        1       378     378     0       8       26
Mean:           1       291     291     0       5
# of Seqs:      26

It took 1 secs to summarize 26 sequences.

EDIT
This is verified by this fasta file:

>HCO3_71	Otu01|49|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTGTGGTATTCCGCAGGGCATGCCTGTTCGAGCGTCATTTCAACCCATCAAGCTCACGCTTGGTCTTGGGGCCTGCGGTTTCGCAGCCTCTAAACTCAGTGGCGGTGCGATTGAGCTCTGAGCGTAGTAATTTTTCTCGCTATAGGGTCTCGGTCGTGACTTGCCAGTAACCCCCAATTTTTATCAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>SW_21	Otu02|31|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTCTGGTATTCCGGGGGGCATGCCTGTTCGAGCGTCATTACAACCCTCAAGCTCAGCTTGGTATTAGGCTTCACCCTTAGGGGCGGGCTTTAAAATCAGTGGCGGTGCCATTCGGCTTCAAGCGTAGTAATTTTCTCGCTTTGGAGATCGGGTGTGTGTTTGCCAACAACCCCATATCTTTTAAAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCO3_64	Otu03|4|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTCTGGTATTCCGGGGGGCATGCCTGTTCGAGCGTCATTACAACCCTCAAGCTCTGCTTGGTATTAGGTTTCACCCGTAAGGGCGTACCGTAAAACTAGTGGCGGTGCCATTCGGCTTCAAGCGTAGTAATTCTTCTCGCTTTAAACACCGGTTGAGTGCTTGCCAACAACCCCAATTTTTATCAAAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCl_106	Otu04|3|mothurGroup
TGGTTCTGGCATCGATGAAAAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAAAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCATTAGTACTCTAGTGGGCATGCCTGTTCGAGCGTCATTTCAACCCCTAAGCCTAGCTTAGTGTTGGGAGACTACTGTAGCGGTGCTACATGGAAGGCCACCCTGAAAGATGGGTCGGTTTACCCTGTAGCTACCCTGTAGCTCCTTAAAGCCAGTGGCGGAGACACGGAGTCCTCTGAGCGTAGTAATTATTTCTCGCTTTTGTAGGTTCTGTGGCTTTTGCCATTAAACCCCCAATTTTTAATGGTTGACCTCGGATCAGGTAGGAATACCCGCTGA
>SW_67	Otu05|3|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAACGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCATTAGTATTCTAGTGGGCATGCCTGTTCGAGCGTCATTTCAACCCTTAAGCCTAGCTTAGTGTTGGGAGCCTACTGCTTTTACTAGCTGTAGCTCCTGAAATACAACGGCGGATCTGCGATATCCTCTGAGCGTAGTAATTTTTATCTCGCTTTTGACTGGAGTTGCAGCGTCTTTAGCCGCTAAATCCCCCAATTTTTAATGGTTGACCTCGGATCAGGTAGGAATACCCGCTGA
>SW_47	Otu06|3|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCGTGGTATTCCGCGGGGCATGCCTGTTCGAGCGTCATTACAACCCCTCAAGCCTCGGCTTGGTATTGGAGCATGCGGTCTCGCAGCTCCTAAACTCAGTGGCGGTGCCATCGAGCTCTGAGCGTAGTACATTTTCTCGCTATAGGGTCTCGGTGGTTGCTTGCCAATAACCCCCCATTTTTATCAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>HCO3_59	Otu07|3|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCGTGGTATTCCGCGGGGCATGCCTGTTCGAGCGTCATTTCAACCAATCAAGCCTCGGCTTGGTATTGGGGCCTGCGCCTGCGCAGCCCTTAAACCCAGTGGCGGTGCTATTGAGCTCTGAGCGTAGTAAATCTCCTCGCTATAGGGTCTCGGTAGTTGCTTGCCAACAACCCCCAAATTCTTTCAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCO3_74	Otu08|3|mothurGroup
TCGTGACTTGCCAGTAACCCCCAATTTTTATCAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCl_120	Otu09|3|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCATTAGTATTCTAGTGGGCATGCCTGTTCGAGCGTCATTTCAACCCTTAAGCCTAGCTTAGTGTTGGGAATCTACTGTATTGTAGTTCCTGAAATACAACGGCGGATCTGTAATATCCTCTGAGCGTAGTAATTTTTTTCTCGCTTTGGTTAGGTGCTGCAGCTCTCAGCCGCTAAACCCCCCAATTTTAATGGTTGACCTCGGATCAGGTAGGAATACCCGCTGA
>LCO3_55	Otu10|2|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCATTAGTATTCTAGTGGGCATGCCTGTTCGAGCGTCATTTCAACCCTTAAGCCTAGCTTAGTATTGGGAATCGACTTTACTGTCGTTCCTCAAATTCAACGGCGGATTTATAGCAATCTCTGAACGTAGTAATTTTTTTTCTCGTTTTTGAAATACTATAAACCTCAGCCGCTAAACCCCCAATTTCTTATGGTTGACCTCGGATCAGGTAGGAATACCCGCTGA
>LCO3_72	Otu11|2|mothurGroup
TGGTTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCGAGGGGCATGCCTGTTCGAGCGTCATTACACCACTCAAGCTATGCTTGGTATTGGGCGTCGTCCCTAGTTGGGCGCGCCTTAAAGACCTCGGCGAGGCCACTCCGGCTTTAGGCGTAGTAGAATTTATTCGAACGTCTGTCAAAGGAGAGGAACTCTGCCGACTGAAACCTTTATTTTTCTAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCl_85	Otu12|2|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGGTGGTATTCCACCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAAGCCTGGCTTTGGTGTTGGAGGGATACCTGTAAAAGGGTACCCTCTGAAATTTAGTGGCGGGCTCGCTAGAATTTTGAGCGTAGTAGTTTTACCTCGTTTTTAAAGACTAGTGGGACTTCTTGCCGTAAAACCCCCCAACTTTCTGAAAATTGACCTCGGATCAGGTAGGAATACCCGCTGA
>HCl_107	Otu13|2|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGACAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGGTGGTATTCCGCCGGGCATGCCTGTTCGAGCGTCATTATAACCAATCATGCCTGGCATGGTGTTGGGGCATGCGTCTTCGCAGCCCTCAAAATCAGTGGCGGCGCCTGTAGGCTCTAAGCGTAGTAACTTCTCTCGCTATAGACGTCTGTGGGTAGCTTGTCAGAATTCCCCCCAATTTTTCAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCO3_116	Otu14|2|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGAAAAGTAGTGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCATGGGGCATGCCTGTTCGAGCGTCATTTGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCCCGCTTTGTGCGTGGAC
>LCO3_17	Otu15|1|mothurGroup
TGGTTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCTTGGTATTCCGAGGGGCATGCCTGTTCGAGCGTCATTACACCACTCAAGCACAGCTTGGTATTGGGCAACCGCCCCCGCCAGTCGGGGACGCGCCTCAAACACCTCGGCGGAGCCTCACCGGCTTTGGGCGTAGTAGATTTTCTAAACGTCCTTTAACGGAGATGGTTCCATTGCCGACTGAAGCCTTTTATTTTTCTAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCl_23	Otu16|1|mothurGroup
TGGCTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAGTGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTTTGGTATTCCTTAGGGCATGCCTGTTCGAGCGTCATTTCAAAATTCAAGCTCAGCTTGGTGATGGGCGTCTGTCCCGCCTCCGCGCGCGGACTCGCCTCAAAAGTAGTTGGCAGCTCCCTTATCGGCACTGAACGCAGCAAATTTGCGGGACGCACCGAAGAAAGGGGCTTACCAGTAAGCAAACCACCCCAGATTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCO3_14	Otu17|1|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAACGCGATAGGTATTGTGAATTGCAGAATTTAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTCTGGTATTCCGGAGGGCATGCCTGTCCGAGCGTCATTTCAACCCTCCAGCCCGGCTGGTGTGTTGGGCCTTCGTCCCCCGGGACGGGCCCCAAAGACAGGGACGGCGCCGCGTCTGACCCCCGAGCGTATGGGACCTTCGTCACTGCTCGCAAGGGAAAGCCGGCCCGGTCCAACCCCCCATCTATTTTTCCAGGTGGACCTCGGATCAGGTAAGGATACCCGCTGA
>LCl_17	Otu18|1|mothurGroup
TGGCTCTCGCATCGATGAAGAACGCAGCGAAATGTGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCCTGGTATTCCGGGGAGCATGTCTGTTTGAGTGTCATGAACTCTTCAACCCATCAATTTCTTGTAGATTGACTGGTGTTTGGATTTTGAATGTTGCTGGTCCTTGGACGTAGCTCATTCGTAATATATTAGCATCTCTAATTCGAACTCGGATTGACTCAGTGTAATAGACTATTCGCTGAGGACACTTTTATTAGTGGCCGAATGAGATAATTGTAGACGCTTCTAACCCCTATAGTCAACTTTGTATTAGACCTCAGATCAGGCAGGATTACCCGCTGA
>LCO3_96	Otu19|1|mothurGroup
CCCCCATATATTTTATCAGGTGGACCTCGGATCAGGTAAGGATACCCGCTGA
>SW_113	Otu20|1|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCATTAGTATTCTAGTGGGCATGCCTGTTCGAGCGTCATTTCAACCCTTAAGCCTAGCTTAGTGTTGGGAATCTACTTCTCTTAGGAGTTGTAGTTCCTGAAATACAACGGCGGATTTGTAGTATCCTCTGAGCGTAGTAATTTTTTTCTCGCTTTTGTTAGGTGCTATAACTCCCAGCCGCTAAACCCCCAATTTTTTGTGGTTGACCTCGGATCAGGTAGGAATACCCGCTGA
>SW_56	Otu21|1|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCGTGGCATTCCGCGGGGCATGCCTGTTCGAGCGTCATTATGACCAATCCAGCTCGCTGGGTCTTGGGCACCGCCGCCTGGCGGGCCTCAAAATCAGTGGCGGTCCGGCCGGGCTCTGAGCGTAGTACATCTTCTCGCTACAGGGTCCCGGGCGGCACTGGCCAACAACCCCCAATCTTTCACAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCl_117	Otu22|1|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCGTGGTATTCCGCGGGGCATGCCTGTTCGAGCGTCATTTCAACCCCTCAAGCCCCGGCTTGGTCTTGGGGCCTGCGGTTCCGCAGCCCTTAAACGCAGTGGCGGTGCGATCGAGCTCTGAGCGTAGTAATACTCCTCGCTATAGAGTCCCGGTCGTGGCCTGCCAACAACCCCCCATTTTTTCAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>SW_19	Otu23|1|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTCTGGTATTCCGGAGGGCATGCCTGTTCGAGCGTCATAATGACCAACTCACCCCCGTGGTGGACTTGGAG
>LCO3_76	Otu24|1|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTGTGGTATTCCGGGGGGCATGCCTGTTCGAGCGTCATTACAACCCTCCAGCTCTGCTTGGGATTATGCTTCCTGCTTATGGGCGGGCTTTAAAATCAGTGGCGGTGCCATTCAGCTTCGAGCGTAGTAATTTCTTCCCTTTGGAGACCGCGTGTGTGTTTGCCCGTAACCCCATATTTTTTAAAGGGTGACCTCCGAACAAGGAAGGATACCCGCTGA
>LCl_32	Otu25|1|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAGTGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTTCGGTATTCCGTTGGGCATGCCTGCTCGAGCGTCATTTAAACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTTCCGCCTAGTGCGTGGACTCGCCTTAAATTCATTGGCAGCCGGTAAGTTGGCTTCGTGCGCAGCACATTGTGTCGCGATCCAGTCTACCTCCCTCCATCAAGCCTCTTTTTTACTTTGACCTCGGATCAGGTAGGGATACCCGCTGA
>LCl_15	Otu26|1|mothurGroup
TGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAGTGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTTTGGTATTCCAAAGGGCATGCCTGTTCGAGCGTCATTTGTACCCTCAAGCTTTGCTTGGTGTTGGGCGTCTTGTCTCTAGCTTTGCTGGAGACTCGCCTTAAAGTAATTGGCAGCCGGCCTACTGGTTTCGGAGCGCAGCACAAGTCGCACTCTCTAT

You only have one sample: mothurGroup.
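The same thing can be read off the FASTA headers, where each description ends in the group name (for example Otu01|49|mothurGroup). A minimal Python sketch, assuming every header has the form name, whitespace, then three fields separated by |, with the group last, as in the file above; the function name is hypothetical.

```python
from collections import Counter


def groups_from_headers(lines):
    """Tally the last '|'-separated token of each FASTA header description."""
    tally = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if len(fields) < 2:
            continue  # header carries no description field
        tally[fields[1].split("|")[-1]] += 1
    return tally


# Three headers copied from the FASTA file in this thread.
headers = [
    ">HCO3_71\tOtu01|49|mothurGroup",
    ">SW_21\tOtu02|31|mothurGroup",
    ">LCO3_64\tOtu03|4|mothurGroup",
]
print(dict(groups_from_headers(headers)))
```

A single key in the tally means every representative sequence is assigned to the same group.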

I probably messed up somewhere along the way. I’ll look more into this tomorrow & see if I can identify where the problem lies. I’m sure I’m just not understanding something pretty basic about the process. More homework to do on this!

Thanks for the quick replies! I’ll let you know what I figure out.

EDIT: I started with 429 sequences across 5 sample groups. I’m not sure why it has pared everything down to just one sample.
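To find where the five groups collapse into one, it may help to tally the group file at each stage and see which step loses them. The Python sketch below assumes the two-column group file layout (sequence name, then group, separated by whitespace). The function name is hypothetical, and the group labels in the toy data are made up for illustration; they are not taken from the actual group.txt.

```python
from collections import Counter


def tally_group_file(lines):
    """Count sequences per group in a two-column (name, group) file."""
    tally = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        tally[fields[1]] += 1
    return tally


# Toy group file; the group labels here are hypothetical.
toy_groups = [
    "HCO3_71\tHCO3",
    "SW_21\tSW",
    "SW_67\tSW",
    "LCl_106\tLCl",
]
tally = tally_group_file(toy_groups)
print("groups:", len(tally), "sequences:", sum(tally.values()))
for group, n in sorted(tally.items()):
    print(group, n)
```

Running this on the original group file should show 5 groups and 429 sequences if the file matches the description above; a different result would point to the group file itself rather than a later command.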
