I’m creating this thread for the questions that come up as I work through nanopore data, since they could be useful to others. I’ll write up an SOP if the end result is worth others repeating.
Background. I’ve finally gotten my GridION and started sequencing. I have a few Zymo mocks sequenced three ways: with Nanopore’s 16S kit on R9.? (error rate too high to use mothur), with 8f/1391R and Nanopore PCR barcoding on R10.4, and on MiSeq using standard V4 sequencing with the EMP version of 515f/806R.
I’m trying to get the R10.4 data through mothur, but I’m not super confident it’s going to give me anything useful. Very few of the sequences are non-unique, which suggests the error rate is too high. This is still running (4 days and counting on 32 cores with 500 GB RAM).
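For anyone wanting to check the "very few non-unique sequences" observation on their own data before committing days of compute, here is a minimal Python sketch that counts how many reads in a FASTA file have at least one exact duplicate. It is only a rough proxy for error rate, not something mothur does this way, and the function names (`read_fasta`, `duplication_stats`) are made up for illustration.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines; join them later.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def duplication_stats(handle):
    """Summarise how many reads share an identical sequence with another read."""
    counts = Counter(seq for _, seq in read_fasta(handle))
    total = sum(counts.values())
    in_duplicates = sum(n for n in counts.values() if n > 1)
    return {
        "total": total,
        "unique": len(counts),
        "reads_in_duplicates": in_duplicates,
        "fraction_duplicated": in_duplicates / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy input only; replace with open("your_reads.fasta") for real data.
    demo = StringIO(">a\nACGT\n>b\nACGT\n>c\nAC\nGA\n>d\nTTTT\n")
    print(duplication_stats(demo))
```

With long amplicons, even a modest per-base error rate makes exact duplicates rare, so a low fraction here is expected for full-length reads and is more informative on a short region such as V4.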
I’ve also used pcr.seqs to pull out just the V4 region from this data and am trying to run mothur on that region to get a better idea of how it compares to Illumina.
Finally, I’m going to repeat both these analyses on just the duplex data which is ~10% of the R10.4 run.
I’m presenting all this at ASM Microbe next month, so I have a deadline to get this done!
First few issues:
Trimming data: I’m using pcr.seqs with pdiffs/rdiffs = 3 and 5
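To make the pdiffs/rdiffs trade-off concrete, here is a rough Python sketch of primer matching with a mismatch allowance. It is not mothur's implementation: it only counts substitutions (no indels), treats IUPAC degenerate bases in the primer as matching any of their bases, and uses a toy primer and read rather than the real 8f/1391R sequences. The names `mismatches` and `find_primer` are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, window):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, window))


def find_primer(read, primer, max_diffs):
    """Return (start, diffs) for the best window within max_diffs, else None.

    Ties are resolved in favour of the leftmost window.
    """
    best = None
    for start in range(len(read) - len(primer) + 1):
        diffs = mismatches(primer, read[start:start + len(primer)])
        if diffs <= max_diffs and (best is None or diffs < best[1]):
            best = (start, diffs)
    return best


if __name__ == "__main__":
    toy_primer = "ACGTM"      # toy primer, M = A or C
    toy_read = "TTACCTGGG"    # toy read with two errors in the primer site
    for allowed in (1, 3):
        print(allowed, find_primer(toy_read, toy_primer, allowed))
```

Raising the allowance rescues reads with errors in the primer site, but it also raises the chance of matching the primer somewhere it does not belong, so it is worth comparing how many reads survive pcr.seqs at each setting.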
Here’s the batch that I’m trying to run. I’m sure that I’ll have to edit this as I find more errors. I’m on a high-memory node with 500 GB of RAM.
mothur "#summary.seqs(fasta=oct23nanoporeRecall.fasta, processors=32);
pcr.seqs(fasta=current, count=oct23nanoporeRecall.count_table, oligos=oligos8f1391r.txt, pdiffs=3, rdiffs=3, checkorient=T);
summary.seqs(fasta=current, count=current);
screen.seqs(fasta=current, count=current, maxambig=0, minlength=800, maxhomop=8);
unique.seqs(fasta=current, count=current);
summary.seqs(fasta=current, count=current);
align.seqs(fasta=current, reference=silva.nr_v138_1.align);
summary.seqs(fasta=current, count=current);
filter.seqs(fasta=current, vertical=T);
pre.cluster(fasta=current, count=current, diffs=14);
summary.seqs(fasta=current, count=current);
classify.seqs(fasta=current, count=current, reference=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80);
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota);
summary.tax(taxonomy=current, count=current);
dist.seqs(fasta=current, countends=F, cutoff=0.03, processors=16);
cluster(column=current, count=current, method=opti);
summary.seqs(processors=32);
make.shared(list=current, count=current);
classify.otu(list=current, count=current, taxonomy=current);
get.oturep(fasta=current, count=current, list=current, method=abundance);
count.groups(shared=current);
rarefaction.single(shared=current);
summary.single(shared=current, calc=nseqs-sobs-coverage-shannon-shannoneven-invsimpson, subsample=10000);