Classify.seqs with PacBio/long reads not classifying any taxa

Hello all,

I have four PacBio full-length 16S rRNA reads from an environmental sample, targeting Archaea in the dataset. I have made progress with mothur (version 1.48.1, on a Mac and on an HPC cluster from the command line) for these sequences, but I am running into trouble at the classify.seqs() step of the pipeline. None of my sequences can be classified to any taxon with Silva v138.2, which is highly unusual (I would expect some to be unclassified, but not all!). I also tried the other available repositories (RDP/greengenes) and got the same result. My intuition is that there may be a mismatch between my reads and what the classifiers are searching for, but I’m not sure. I have also tried re-downloading the classifiers several times, as suggested in other posts on this forum, but that did not fix the issue. I have been loosely following the MiSeq SOP where it applies to these reads, so please let me know if something seems off.

Here are the links to logfiles/code pertaining to this, and an example fasta file from sample_22.

Any input is greatly appreciated. I am happy to provide more information/files as needed. Thank you!

-Meghan

It looks like you’re creating and using a V4-specific reference file rather than one that covers the region you are sequencing. I don’t know what region you are sequencing, but it appears that you expect the reads to be longer than 850 nt. When you align against only the V4 region, a lot of bases get tossed. You’ll see this in the output from align.seqs and the subsequent summary.seqs. FWIW, you should run screen.seqs after align.seqs, so that you can screen on the alignment positions.
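If you want to see the effect outside of mothur, here is a rough Python sketch (not part of mothur, just an illustration) that compares each read's raw length with the number of bases that survive in the aligned file. It assumes the aligned FASTA uses '-' and '.' as gap/padding characters and that read names match between the two files; the toy sequences and the function names are made up for the example.

```python
from io import StringIO

# Assumption: '-' and '.' mark gaps/padding in the aligned FASTA.
GAP_CHARS = {"-", "."}


def read_fasta(handle):
    """Return {name: sequence} from an open FASTA file handle."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the text up to the first whitespace as the read name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def base_count(seq):
    """Count characters that are real bases (anything that is not a gap)."""
    return sum(1 for c in seq if c not in GAP_CHARS)


def retained_fraction(raw, aligned):
    """Fraction of each raw read's bases that is still present after alignment."""
    out = {}
    for name, seq in raw.items():
        if name in aligned and len(seq) > 0:
            out[name] = base_count(aligned[name]) / len(seq)
    return out


# Toy data standing in for the unaligned and aligned FASTA files.
raw = read_fasta(StringIO(">read1\nACGTACGTAC\n>read2\nGGGGCCCCAAAATTTT\n"))
aligned = read_fasta(StringIO(">read1\n..ACG-TA..\n>read2\n.GGGG--CCCC.\n"))

fractions = retained_fraction(raw, aligned)
for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} of bases kept")
```

To run it on real data, replace the two StringIO objects with open() calls on your unaligned and aligned FASTA files. If most reads keep only a small fraction of their bases, the reference is probably narrower than the region you sequenced.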

I’d encourage you to adjust the region in the reference file or to use the entire reference file without making it specific to one region. You can make it specific to one region by following the instructions here:

Pat