Hi all,
A colleague and I have been running mothur analyses on fungal ITS2 MiSeq sequence data. We worked independently on the same dataset, with the aim of writing our own ‘Fungal SOP for MiSeq data’. Because we had to combine parts of our old 454 protocol with the bacterial MiSeq SOP, we made some mistakes along the way, but we ended up with pretty similar results. However, we obviously should have exactly the same results, not only ‘pretty similar’…
We have traced a difference to the steps where the chimeras are removed. These were the steps up to and including remove.seqs:
mothur > make.contigs(file=stabilityDry.files, processors=8)
mothur > pcr.seqs(fasta=stabilityDry.trim.contigs.fasta, group=stabilityDry.contigs.groups, oligos=oligo_file.txt, pdiffs=2)
mothur > screen.seqs(fasta=stabilityDry.trim.contigs.pcr.fasta, group=stabilityDry.contigs.pcr.groups, maxambig=0, maxhomop=8, maxlength=350)
mothur > unique.seqs(fasta=stabilityDry.trim.contigs.pcr.good.fasta)
mothur > count.seqs(name=stabilityDry.trim.contigs.pcr.good.names, group=stabilityDry.contigs.pcr.good.groups)
mothur > chimera.uchime(fasta=stabilityDry.trim.contigs.pcr.good.unique.fasta, count=stabilityDry.trim.contigs.pcr.good.count_table, dereplicate=t, processors=10)
mothur > remove.seqs(fasta=stabilityDry.trim.contigs.pcr.good.unique.fasta, accnos=stabilityDry.trim.contigs.pcr.good.unique.denovo.uchime.accnos, count=stabilityDry.trim.contigs.pcr.good.denovo.uchime.pick.count_table)
My colleague had the parameter ‘pdiffs=2’ in make.contigs; I did not. I, in turn, had the parameter ‘maxhomop=8’ in screen.seqs, which my colleague did not. These were the only differences in our mothur commands.
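To double-check that these really are the only differences, the two logfiles could be compared parameter by parameter. The following is a minimal Python sketch and not part of mothur: `parse_command` and `diff_params` are hypothetical helper names, it assumes each command sits on one line in the form `name(key=value, ...)`, and it ignores file-name parameters because our files are named differently. The second command in the usage lines is made up for illustration.

```python
import re

# Parameters that only carry file names; they differ between our runs anyway.
FILE_PARAMS = ("fasta", "group", "name", "count", "accnos", "file", "oligos")


def parse_command(line):
    """Split 'name(k1=v1, k2=v2)' into (name, {k: v}).

    A leading 'mothur > ' prompt is tolerated.
    """
    line = line.strip()
    if line.startswith("mothur >"):
        line = line[len("mothur >"):].strip()
    match = re.fullmatch(r"([\w.]+)\((.*)\)", line)
    if match is None:
        raise ValueError(f"not a mothur command: {line!r}")
    name, body = match.groups()
    params = {}
    for part in body.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return name, params


def diff_params(line_a, line_b, ignore=FILE_PARAMS):
    """Return {param: (value_a, value_b)} for parameters that differ.

    A parameter missing from one command is reported as None on that side.
    """
    name_a, params_a = parse_command(line_a)
    name_b, params_b = parse_command(line_b)
    if name_a != name_b:
        raise ValueError(f"different commands: {name_a} vs {name_b}")
    keys = (set(params_a) | set(params_b)) - set(ignore)
    return {
        key: (params_a.get(key), params_b.get(key))
        for key in sorted(keys)
        if params_a.get(key) != params_b.get(key)
    }


# Usage: my screen.seqs line against an illustrative line without maxhomop.
mine = "mothur > screen.seqs(fasta=a.fasta, group=a.groups, maxambig=0, maxhomop=8, maxlength=350)"
hers = "mothur > screen.seqs(fasta=b.fasta, group=b.groups, maxambig=0, maxlength=350)"
print(diff_params(mine, hers))
```

Running this over every pair of corresponding lines in the two logfiles would also show whether any file-type parameter (for example count versus name/group) was passed in one run but not the other, if those keys are taken out of the ignore list.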
My output after chimera.uchime was:
stabilityDry.trim.contigs.pcr.good.denovo.uchime.pick.count_table
stabilityDry.trim.contigs.pcr.good.unique.denovo.uchime.chimeras
stabilityDry.trim.contigs.pcr.good.unique.denovo.uchime.accnos
and after remove.seqs was:
[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.
Removed 3897 sequences from your fasta file.
Removed 0 sequences from your count file.
Output File Names:
stabilityDry.trim.contigs.pcr.good.unique.pick.fasta
stabilityDry.trim.contigs.pcr.good.denovo.uchime.pick.pick.count_table
My colleague's output after chimera.uchime was:
Output File Names:
Naoki_stability_file_Dry.trim.contigs.pcr.good.unique.denovo.uchime.chimeras
Naoki_stability_file_Dry.trim.contigs.pcr.good.unique.denovo.uchime.accnos
She did not get a new count_table file, although I did!
Her output after remove.seqs:
[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.
#Removed 4205 sequences from your fasta file.
#Removed 490510 sequences from your count file.
In other words, my colleague did not get a new count_table file after the chimera check, and after remove.seqs she ‘lost’ almost half a million sequences from her count file (I did not lose any), although the numbers of sequences removed from our fasta files were in the same range (3897 vs. 4205).
Now to the question: can the differences in the previous commands (pdiffs=2 and maxhomop=8) cause these differences in the chimera check, and is it therefore recommended to have or not to have these parameters before chimera.uchime and remove.seqs?
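One check that might narrow this down is to count how many of the names in the .accnos file are still present in the count file that is passed to remove.seqs, and how many reads they represent. The following is a minimal Python sketch, not a mothur command. The function name `flagged_in_count_table` is hypothetical, and the file layout is an assumption: one sequence name per line in the .accnos file, and a tab-delimited count_table whose first column is the sequence name and whose second column is the total. The sequence names in the usage lines are invented.

```python
import csv


def flagged_in_count_table(accnos_lines, count_table_lines):
    """Count flagged names that still occur in a count table.

    Both arguments are iterables of text lines (open file objects work).
    Returns (number_of_flagged_names_found, total_reads_they_represent).
    """
    # Assumed .accnos layout: the sequence name is the first field on each line.
    flagged = {line.split("\t")[0].strip() for line in accnos_lines if line.strip()}

    found = 0
    reads = 0
    for row in csv.reader(count_table_lines, delimiter="\t"):
        # Skip blank lines, comment lines and the header row.
        if not row or row[0].startswith("#") or row[0] == "Representative_Sequence":
            continue
        if row[0] in flagged:
            found += 1
            reads += int(row[1])  # assumed: second column holds the total
    return found, reads


# Usage with invented names; with real data, pass open file handles instead.
accnos = ["seqA\n", "seqC\n"]
table = [
    "Representative_Sequence\ttotal\tS1\tS2\n",
    "seqA\t5\t3\t2\n",
    "seqB\t7\t7\t0\n",
]
print(flagged_in_count_table(accnos, table))
```

If this returned zero names for my .pick.count_table and a large read total for my colleague's count file, it would suggest that the two count files were simply at different stages when remove.seqs was run, but we have not confirmed this.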
Thanks a lot for any comments,
Jussi Heinonsalo
University of Helsinki, Finland