Upon reaching the stage in the mothur pipeline where I need to classify my sequences and remove undesirable lineages, mothur seems to remove all my sequences! I am not entirely sure what I am doing wrong, but I suspect it has something to do with my reference and taxonomy files.
[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.
Your taxonomy file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.
Your fasta file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.
Removing group: 5_PAW2 because all sequences have been removed. (and so on, for each group)
Your group file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.
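A log like this is what you would expect if every sequence had been classified as unknown: remove.lineage discards any sequence whose lineage contains one of the '-'-separated taxa, and "unknown" is one of them. The following is a simplified Python model of that filtering step, meant only as an illustration and not mothur's implementation. It assumes exact label matching and lineages written as `Rank1(conf);Rank2(conf);...`; the sequence names and lineages are hypothetical.

```python
import re


def ranks(lineage: str) -> list:
    """Split 'Rank1(conf);Rank2(conf);...' into bare labels, dropping bootstrap scores."""
    parts = lineage.strip().rstrip(";").split(";")
    return [re.sub(r"\(\d+\)$", "", p) for p in parts if p]


def remove_lineages(taxonomy: dict, taxon: str) -> dict:
    """Keep only sequences whose lineage contains none of the '-'-separated taxa.

    Simplified model (exact label matching); not mothur's actual code.
    """
    targets = set(taxon.split("-"))
    return {
        name: lineage
        for name, lineage in taxonomy.items()
        if targets.isdisjoint(ranks(lineage))
    }


if __name__ == "__main__":
    # Hypothetical sequences: when every lineage starts with "unknown",
    # nothing survives the filter.
    taxonomy = {
        "seq1": "unknown;unclassified;unclassified;",
        "seq2": "unknown;unclassified;unclassified;",
    }
    kept = remove_lineages(taxonomy, "Chloroplast-Mitochondria-unknown-Archaea-Eukaryota")
    print(len(kept))  # 0
```

In this model a sequence only survives if none of its labels appear in the taxon list, so an all-unknown taxonomy file empties the fasta, taxonomy and group files at once.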
In classify.seqs it looks like you’re using the SILVA reference, but in remove.lineage it looks like you’re using my RDP reference. Can you try running classify.seqs with the trainset files rather than the SILVA files and see what happens? Also, just to be clear, you did sequence 16S rRNA genes, right?
I’m sequencing eDNA samples of the microbiota of cultured fish on the MiSeq platform. I should clarify that I have two datasets here: one from the ponds the fish live in, and another from water samples taken from a bucket the fish were held in (before the fish were added and then afterwards). The sequences in the “remove.lineages error” thread are from the latter, fish dataset; the “warning notice from classify.otu” thread refers to the pond water samples.
With the pond water samples, apart from the warning notices mentioned above, I was able to get through the entire pipeline with virtually no issues, so you can imagine I am a little perplexed as to why the same procedure is not working with the fish samples.
Thank you for your help. I am fairly new to mothur, and with my project supervisor currently out of action, any help is appreciated.
At this step I had 25,927 sequences, and all of them were classified as unknown.
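One way to check this before running remove.lineage is to tally the top-level (kingdom) label in the .taxonomy file written by classify.seqs. This is a minimal Python sketch, assuming each line is the sequence name, a tab, then a semicolon-delimited lineage with optional bootstrap scores in parentheses; the function name `kingdom_counts` and the sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable


def kingdom_counts(lines: Iterable[str]) -> Counter:
    """Tally the first rank of each lineage in mothur-style taxonomy lines.

    Assumed line format: 'name<TAB>Rank1(conf);Rank2(conf);...'
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _name, lineage = line.split("\t", 1)
        first_rank = lineage.split(";", 1)[0]
        # Drop a trailing bootstrap score such as "(100)".
        counts[re.sub(r"\(\d+\)$", "", first_rank)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical lines; in practice, pass an open .taxonomy file instead.
    sample = [
        "seq1\tunknown;unclassified;unclassified;",
        "seq2\tunknown;unclassified;unclassified;",
        "seq3\tBacteria(100);Bacteroidetes(86);",
    ]
    print(kingdom_counts(sample))
```

With a real file, `kingdom_counts(open(path))` works the same way, since iterating a file yields its lines. If every count lands on unknown, as reported here, the problem lies in classification rather than in remove.lineage.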
The input fasta file for the classify.seqs step (pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta) had sequences in the following format:
>M01625_131_000000000-C4WTY_1_1111_18207_5125
TAC--GG-AG-GGT---GCG-A-G-C-G-T--T--AT-C-CGG-AA---TC-A-C-T--GG-GT--TT--A--AA-GG-GT-AC-G-TA-G-G-C-G--G--T-TA-A-T-T-----AA etc. etc.
Could mothur be misclassifying the sequences because the format of the sequences differs between the input fasta file and the reference file?
I’m not sure what you mean by different formats. The gap characters are fine; they’re automatically removed from the input sequences before classification. When I classify the sequence you posted against silva.bacteria.fasta (I named the sequence “test”), I get:
test Bacteria(100);Bacteroidetes(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);
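To illustrate why the alignment gaps don't affect classification, here is a minimal Python sketch of degapping. It is an illustration, not mothur's code, and it assumes '-' and '.' are the gap characters, as in mothur-style alignments. The input is a short hypothetical gapped fragment modelled on the sequence posted in this thread.

```python
def degap(aligned: str, gap_chars: str = "-.") -> str:
    """Return the sequence with alignment gap characters removed."""
    # str.maketrans with a third argument maps those characters to None (deletes them).
    return aligned.translate(str.maketrans("", "", gap_chars))


if __name__ == "__main__":
    gapped = "TAC--GG-AG-GGT---GCG-A-G-C-G-T--T--AT-C-CGG-AA"
    print(degap(gapped))  # TACGGAGGGTGCGAGCGTTATCCGGAA
```

Because the gapped and ungapped forms reduce to the same string, a gapped input fasta by itself should not explain sequences being classified as unknown.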
Can you email your input fasta file and the mothur logfile to mothur.bugs@gmail.com so we can take a look? Please include a link to this thread so we can report back.