Dear Pat,
I think I do use the frequency of the reads as I use the count input in the various commands.
This is the script I’m using (I start right after picking the representative set from my OTU table):
muscle -in stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.0.03.rep.fasta -out realignedRepSetMuscle.fasta -maxiters 2
PREFIX=realignedRepSetMuscle
FASTA=realignedRepSetMuscle.fasta
COUNT=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.0.03.rep.count_table
TAXONOMY=path_to_folder/trainset16_022016rdp/trainset16_022016.rdp.tax
REFERENCE=path_to_folder/trainset16_022016rdp/trainset16_022016.rdp.fasta
TAXCUTOFF=80
NPROC=4
for i in {1…20}; do
mothur “#set.dir(input=path_to_folder, output=path_to_folder); classify.seqs(fasta={FASTA}, count={COUNT}, taxonomy={TAXONOMY}, reference={REFERENCE}, cutoff={TAXCUTOFF}, processors={NPROC}); remove.lineage(fasta={FASTA}, count={COUNT}, taxonomy=${PREFIX}.rdp.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota); summary.tax(taxonomy=current, count=current); dist.seqs(fasta=current, cutoff=0.03); cluster(column=current, count=current); make.shared(list=current, count=current, label=0.03); classify.otu(list=current, count=current, taxonomy=current, label=0.03); get.oturep(column=current, list=current, count=current, fasta=current)” &&
muscle -in realignedRepSetMuscle.pick.opti_mcc.0.03.rep.fasta -out realignedRepSetMuscle.fasta -maxiters 2 &&
COUNT=${PREFIX}.pick.opti_mcc.0.03.rep.count_table;
done
Would this make sense?
I am now also reading something else: this paper https://onlinelibrary.wiley.com/doi/full/10.1111/1755-0998.12700 suggests to remove singleton reads (not OTUs!) during early stages of analysis. Maybe this is actually a more efficient approach than mine? Would you think this makes sense and could help reducing this substantial inflation one often gets? And if one wants to do this during the mothur pipeline, when should one do it and with which command? I was thinking this could be done right after calling the first unique.seqs (after making contigs and screening sequences) but not sure what command could do this?
Thanks a lot for your kind help!
Best,
Joanito