Working out procedure for nanopore 16S

Creating this thread for the questions that arise as I work through nanopore data, since it could be useful for others. I’ll write up an SOP if the end result is worth others repeating.

Background: I’ve finally gotten my GridION and started sequencing. I have a few Zymo mocks sequenced with Nanopore’s 16S kit on R9.? (error rate too high to use mothur), 8f/1391R with Nanopore PCR barcoding on R10.4, plus I’ve run them on MiSeq using standard V4 sequencing with the EMP version of 515f/806R.

I’m trying to get the R10.4 data through mothur, but I’m not super confident it’s going to give me anything useful. There are very few non-unique sequences, which suggests the error rate is too high. This is still running (4 days and counting on 32 cores, 500 GB RAM).
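
A quick way to put a number on "very few non-unique sequences" before committing days of compute is to count exact duplicates directly from the FASTA. This is a minimal sketch in plain Python, not a mothur command; the function names `read_fasta` and `uniqueness_summary` are made up for illustration, and it only compares exact, full-length sequence strings, so reads that differ in length or start position count as distinct.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def uniqueness_summary(lines):
    """Count total reads, distinct sequences, and sequences seen exactly once."""
    counts = Counter(seq for _, seq in read_fasta(lines))
    total = sum(counts.values())
    singletons = sum(1 for n in counts.values() if n == 1)
    return {
        "total": total,
        "unique": len(counts),
        "singletons": singletons,
        "singleton_fraction": singletons / total if total else 0.0,
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python uniq_check.py reads.fasta
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            print(uniqueness_summary(handle))
```

A singleton fraction close to 1 on a mock community with only a handful of strains would be consistent with sequencing error dominating the sequence diversity.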

I’ve also used pcr.seqs to select out just V4 from this data and am trying to run mothur on that section to get a better idea of how this compares to Illumina.

Finally, I’m going to repeat both of these analyses on just the duplex data, which is ~10% of the R10.4 run.

I’m presenting all this at ASM Microbe next month, so I have a deadline to get this done!

First few issues:
trimming data: I’m using pcr.seqs with pdiffs/rdiffs = 3 and 5
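
To make the effect of the pdiffs/rdiffs setting concrete, here is a minimal Python sketch of primer matching with a mismatch allowance. It is not mothur's implementation: it counts substitutions only (no indels, which matter for nanopore reads), and the helper names `mismatches` and `find_primer` are hypothetical. The primer string in the usage note is an assumption too (what I believe is the degenerate EMP 515F), so check it against your own oligos file.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, window):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(1 for p, b in zip(primer, window) if b not in IUPAC.get(p, ""))


def find_primer(seq, primer, max_diffs):
    """Return (position, diffs) for the best match within max_diffs, else None.

    Substitution-only scan: every window of len(primer) is scored and the
    leftmost window with the fewest mismatches is kept.
    """
    best = None
    for i in range(len(seq) - len(primer) + 1):
        d = mismatches(primer, seq[i:i + len(primer)])
        if d <= max_diffs and (best is None or d < best[1]):
            best = (i, d)
    return best


if __name__ == "__main__":
    primer = "GTGYCAGCMGCCGCGGTAA"  # assumed 515F sequence, illustration only
    read = "TTTGTGCCAGCAGCCGCGGTAGACGT"  # made-up read with one substitution
    for allowed in (0, 1, 3):
        print(allowed, find_primer(read, primer, allowed))
```

Raising the allowance from 3 to 5 keeps more reads but also raises the chance of a spurious match elsewhere in the read, which is worth checking on the mock data.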

Here’s the batch that I’m trying to run. I’m sure that I’ll have to edit this as I find more errors. I’m on a high-memory node, using 500 GB.

mothur "#summary.seqs(fasta=oct23nanoporeRecall.fasta, processors=32);
pcr.seqs(fasta=current, count=oct23nanoporeRecall.count_table, oligos=oligos8f1391r.txt, pdiffs=3, rdiffs=3, checkorient=T);
summary.seqs(fasta=current, count=current);
screen.seqs(fasta=current, count=current, maxambig=0, minlength=800, maxhomop=8);
summary.seqs(fasta=current, count=current);
align.seqs(fasta=current, reference=silva.nr_v138_1.align);
summary.seqs(fasta=current, count=current);
filter.seqs(fasta=current, vertical=T);
pre.cluster(fasta=current, count=current, diffs=14);
summary.seqs(fasta=current, count=current);
classify.seqs(fasta=current, count=current, reference=silva.nr_v138_1.align, cutoff=80);
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota);
dist.seqs(fasta=current, countends=F, cutoff=0.03, processors=16);
cluster(column=current, count=current, method=opti);
make.shared(list=current, count=current);
classify.otu(list=current, count=current, taxonomy=current);
get.oturep(fasta=current, count=current, list=current, method=abundance);
summary.single(shared=current, calc=nseqs-sobs-coverage-shannon-shannoneven-invsimpson, subsample=10000);"

Thanks for doing this, Kendra. Since you’re at a core facility that has done both the Kozich protocol and nanopore sequencing, you’ll likely be able to answer this. Starting with genomic DNA, what do you estimate it costs to generate the sequence data, and how many reads are being generated?

We were able to get very good PacBio full-length sequence data that was as good as our Kozich protocol data, but it would have been prohibitively expensive to do at the same scale. In my ranking, quality is more important than cost, but cost is still important.