Hello,
I am new to metagenomics, and I am having an issue similar to the one described in this past post. I am using mothur v1.48.0.
I am following the SOP and have had no problems creating contigs, screening out sequences with incorrect lengths or ambiguous bases, and isolating the unique sequences.
I am trying to align my sequences to the SILVA 138.1 reference using:
mothur > align.seqs(fasta=shrub.trim.contigs.good.unique.fasta, reference=silva.nr_v138_1.pcr.fasta, processors=1)
where silva.nr_v138_1.pcr.fasta is my reference file, produced with the pcr.seqs command as described in the SOP:
mothur > pcr.seqs(fasta=silva.nr_v138_1, start=11895, end=25318, keepdots=F)
After running this command and trying to summarize the aligned output, I keep getting an error saying that the fasta file contains fewer sequences than the count table (roughly 630,000 fewer unique sequences). This first happened when I used the default of 8 processors, and it still happens after I reduced the setting to 1 or 2 processors, as recommended in past posts.
Here are the commands I am running and the output I receive:
mothur > align.seqs(fasta=shrub.trim.contigs.good.unique.fasta, reference=silva.nr_v138_1.pcr.fasta, processors=1)
“It took 3930 secs to align 952556 sequences.
[WARNING]: 81 of your sequences generated alignments that eliminated too many bases, a list is provided in /Volumes/NO NAME/RMultiflora/shrub.trim.contigs.good.unique.flip.accnos.
[NOTE]: 35 of your sequences were reversed to produce a better alignment.
It took 3930 seconds to align 952556 sequences.”
and
mothur > summary.seqs(fasta=shrub.trim.contigs.good.unique.align, count=shrub.trim.contigs.good.count_table, processors=8)
“[ERROR]: Your count file contains 952556 unique sequences, but your fasta file contains 318861. File mismatch detected, quitting command.”
(Just more information in case it’s relevant for diagnostics/troubleshooting.)
I noticed that my shrub.trim.contigs.good.unique.align file is 4.29GB, which is almost exactly the same size as the original silva.nr_v138_1 file (4.28GB).
When I open the shrub.trim.contigs.good.unique.align_report file, I see a table with many rows and the following columns: QueryName, QueryLength, TemplateName, TemplateLength… This file looks reasonable, but it may be missing entries if the error message is correct.
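In case it helps with diagnosis, here is a minimal Python sketch I could use to count records in each file independently of mothur. It is only a rough check and rests on assumptions that may not hold: that each sequence in the fasta/align files starts with a ">" header line, and that the count table and align report are tab-delimited with one header row (lines starting with "#" are skipped). The function names are my own, not part of mothur.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count sequences by counting header lines that start with '>'."""
    n = 0
    # errors="replace" so a damaged or truncated file does not raise a decode error
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            if line.startswith(">"):
                n += 1
    return n


def count_table_rows(path):
    """Count data rows in a tab-delimited table.

    Assumption: blank lines and lines starting with '#' are not data,
    and the first remaining line is a header row.
    """
    n = 0
    header_seen = False
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            if not header_seen:
                header_seen = True  # skip the header row
                continue
            n += 1
    return n


if __name__ == "__main__":
    # File names taken from the commands in this post.
    prefix = "shrub.trim.contigs.good"
    checks = [
        (prefix + ".unique.fasta", count_fasta_records),
        (prefix + ".unique.align", count_fasta_records),
        (prefix + ".unique.align_report", count_table_rows),
        (prefix + ".count_table", count_table_rows),
    ]
    for name, counter in checks:
        if Path(name).exists():
            print(name, counter(name))
        else:
            print(name, "not found")
```

If the .align count comes out lower than the .fasta and align_report counts, that would suggest the alignment output itself is incomplete rather than the count table being wrong.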
Any advice on what I should do? Let me know if there is any more information that I could provide to help.