Hi,
For the whole workflow? OK, here it is. I’ve run this with both mothur 1.33.3 and 1.34.2 on our lab cluster, with 20 processors both times. The data are V1-V3 MiSeq reads. We often combine more than one sequencing run, which was also the case here.
-Before mothur, I ran the data through cutadapt for primer removal & quality control.
-For each run separately:
make.contigs(file=file.files, processors=20)
screen.seqs(fasta=file.trim.contigs.fasta, group=file.contigs.groups, maxambig=0, maxlength=550)
unique.seqs(fasta=file.trim.contigs.good.fasta)
align.seqs(reference=silva.nr_v119.align, fasta=file.trim.contigs.good.unique.fasta)
(Talking with a colleague, I just realized there’s actually no reason to start with separate file lists for each run as I did here: he simply hands mothur the full list of fastq files from all runs at the start and runs everything together. He still sees the same OTU numbering issue, so the separate start shouldn’t be the cause.)
-Concatenate .names, .align and .groups files from each run with cat for the rest of the workflow
-With combined files:
count.seqs(name=combined_files.names, group=combined_files.groups)
unique.seqs(fasta=combined_files.align, count=combined_files.count_table)
screen.seqs(fasta=combined_files.unique.align, count=combined_files.unique.count_table, maxhomop=8, start=1046, end=13127)
filter.seqs(fasta=combined_files.unique.good.align, vertical=T, trump=.)
pre.cluster(fasta=combined_files.unique.good.filter.fasta, count=combined_files.unique.good.count_table, diffs=4)
chimera.uchime(fasta=combined_files.unique.good.filter.precluster.fasta, count=combined_files.unique.good.filter.precluster.count_table, dereplicate=T)
remove.seqs(fasta=combined_files.unique.good.filter.precluster.fasta, accnos=combined_files.unique.good.filter.precluster.uchime.accnos)
classify.seqs(fasta=combined_files.unique.good.filter.precluster.pick.fasta, count=combined_files.unique.good.filter.precluster.uchime.pick.count_table, reference=trainset10_082014.pds.fasta, taxonomy=trainset10_082014.pds.tax, cutoff=70)
remove.lineage(fasta=combined_files.unique.good.filter.precluster.pick.fasta, count=combined_files.unique.good.filter.precluster.uchime.pick.count_table, taxonomy=combined_files.unique.good.filter.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
cluster.split(fasta=combined_files.unique.good.filter.precluster.pick.pick.fasta, count=combined_files.unique.good.filter.precluster.uchime.pick.pick.count_table, taxonomy=combined_files.unique.good.filter.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15)
make.shared(list=combined_files.unique.good.filter.precluster.pick.pick.an.unique_list.list, count=combined_files.unique.good.filter.precluster.uchime.pick.pick.count_table, label=0.03)
classify.otu(list=combined_files.unique.good.filter.precluster.pick.pick.an.unique_list.list, count=combined_files.unique.good.filter.precluster.uchime.pick.pick.count_table, taxonomy=combined_files.unique.good.filter.precluster.pick.pds.wang.pick.taxonomy, label=0.03)
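For the concatenation step between the per-run and combined parts, something along these lines should work. It is only a sketch: the run1/ and run2/ directory names and the combine_runs helper are made up for illustration, and the per-run file names are assumed from the per-run commands above, so check them against what mothur actually wrote in each run directory. It also assumes that read names and group names don't collide between runs.

```shell
#!/bin/sh
# Sketch: concatenate per-run mothur outputs into combined_files.* with cat.
# Assumptions (not from the original workflow): each run was processed in its
# own directory, and the per-run files carry the names used below.

combine_runs() {
    # Usage: combine_runs RUN_DIR [RUN_DIR ...]
    out=combined_files

    # Start from empty outputs so a rerun does not append to old results.
    : > "$out.names"
    : > "$out.groups"
    : > "$out.align"

    for run_dir in "$@"; do
        cat "$run_dir/file.trim.contigs.good.names"        >> "$out.names"
        cat "$run_dir/file.contigs.good.groups"            >> "$out.groups"
        cat "$run_dir/file.trim.contigs.good.unique.align" >> "$out.align"
    done
}

# Example call (hypothetical directory names):
#   combine_runs run1 run2
```

Running combine_runs run1 run2 in the parent directory leaves combined_files.names, combined_files.groups and combined_files.align there, which are the inputs to the count.seqs and unique.seqs calls in the combined part.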