Hi guys,
I have clustered my 16S V4 amplicon sequences from 96 different samples according to the Mothur MiSeq SOP, and I now want to remove singletons (OTUs represented by at most one sequence (read) across all 96 samples) from my dataset. I consider this a technical question; let's leave the discussion of whether it's right or not until later.
After clustering 149,352 pre-clusters at the 97% similarity level, I have 70,376 OTUs, representing about 12 million reads.
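To pin down the definition used here: a singleton is an OTU whose reads, summed over all 96 samples, total at most one. An OTU with one read in each of two samples is therefore not a singleton. Below is a minimal Python sketch on made-up toy counts (the OTU and sample names are hypothetical), not tied to any mothur file format:

```python
# Hypothetical toy data: read counts per OTU in each sample (not the real dataset).
otu_counts = {
    "Otu001": {"S01": 120, "S02": 85, "S03": 0},
    "Otu002": {"S01": 0, "S02": 1, "S03": 0},  # one read in total -> singleton
    "Otu003": {"S01": 1, "S02": 0, "S03": 1},  # one read in two samples -> not a singleton
}


def total_reads(per_sample):
    """Sum an OTU's reads over all samples."""
    return sum(per_sample.values())


def singletons(counts):
    """Return the OTUs with at most one read summed over all samples."""
    return sorted(otu for otu, per_sample in counts.items() if total_reads(per_sample) <= 1)


def drop_singletons(counts):
    """Keep only the OTUs with two or more reads summed over all samples."""
    return {otu: per_sample for otu, per_sample in counts.items() if total_reads(per_sample) > 1}


print(singletons(otu_counts))               # ['Otu002']
print(sorted(drop_singletons(otu_counts)))  # ['Otu001', 'Otu003']
```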
From the user manual and the forum, I understand that there are at least two strategies for removing singletons (after clustering):
- split.abund using fasta file, list or count file, label=0.03 and cutoff=1
- remove.rare using list file, count file, label=0.03 and nseqs=2
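One thing worth checking is whether the two thresholds mean the same thing. The sketch below assumes, without having verified it against the mothur documentation, that both `cutoff` and `nseqs` mark an OTU as rare when its abundance is less than or equal to the threshold. Under that assumption, cutoff=1 would remove only singletons, while nseqs=2 would also remove doubletons. The OTU names and counts are invented.

```python
def rare_otus(abundance, threshold):
    """Return the OTUs whose total abundance is <= threshold.

    ASSUMPTION (unverified): split.abund's cutoff and remove.rare's nseqs
    are both treated here as inclusive upper bounds for "rare".
    """
    return {otu for otu, n in abundance.items() if n <= threshold}


# Hypothetical total read counts per OTU.
abundance = {"Otu001": 205, "Otu002": 1, "Otu003": 2, "Otu004": 2, "Otu005": 7}

removed_cutoff1 = rare_otus(abundance, 1)  # analogous to cutoff=1
removed_nseqs2 = rare_otus(abundance, 2)   # analogous to nseqs=2

print(sorted(removed_cutoff1))  # ['Otu002']
print(sorted(removed_nseqs2))   # ['Otu002', 'Otu003', 'Otu004']
print(len(abundance) - len(removed_cutoff1), len(abundance) - len(removed_nseqs2))  # 4 2
```

If the inclusive reading is correct, the two commands would only be comparable when given the same threshold value.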
Using split.abund:
Should I use the count_table (generated before clustering), or the list (generated after clustering)?
LIST:
split.abund(fasta=xxx.fasta, [b]list[/b]=xxx.list, cutoff=1, label=0.03)
outputs an .abundant.list with 10,711 “abundant” OTUs (with >1 sequence).
OR
COUNT_TABLE:
split.abund(fasta=xxx.fasta, [b]count[/b]=xxx.count_table, cutoff=1, label=0.03)
outputs an .abund.count_table with 21,103 pre-clusters (not OTUs) with >1 read.
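The different numbers may also come from *what* is being counted. The sketch below uses made-up data to contrast three ways of measuring abundance: (a) the number of names in each OTU of a list file, as if every name were one read; (b) reads summed over each OTU's members using the count table totals; and (c) filtering individual pre-clusters from the count table with no OTU assignment, which corresponds to the pre-cluster-level output described above. Which measure split.abund applies for each input is an assumption here, not something confirmed.

```python
# Hypothetical toy data (not the real files).
# Read count of each pre-cluster, as in the count_table "total" column.
precluster_reads = {"pc1": 500, "pc2": 1, "pc3": 1, "pc4": 1, "pc5": 3}

# OTU membership at label 0.03, as in one line of a list file.
otus = {
    "Otu1": ["pc1"],         # one pre-cluster, many reads
    "Otu2": ["pc2", "pc3"],  # two single-read pre-clusters
    "Otu3": ["pc4"],         # one single-read pre-cluster
    "Otu4": ["pc5"],         # one pre-cluster with three reads
}

cutoff = 1  # keep anything with abundance > cutoff

# (a) list only: abundance = number of names in the OTU
kept_by_members = sorted(o for o, members in otus.items() if len(members) > cutoff)

# (b) list weighted by reads: abundance = summed read counts of the OTU's members
kept_by_reads = sorted(
    o for o, members in otus.items()
    if sum(precluster_reads[pc] for pc in members) > cutoff
)

# (c) count_table only: filters pre-clusters, not OTUs
kept_preclusters = sorted(pc for pc, n in precluster_reads.items() if n > cutoff)

print(kept_by_members)   # ['Otu2']
print(kept_by_reads)     # ['Otu1', 'Otu2', 'Otu4']
print(kept_preclusters)  # ['pc1', 'pc5']
```

Only (b) matches the "at most one read across all samples" definition at the OTU level; (a) can discard large OTUs built from a single pre-cluster, and (c) never looks at OTUs at all.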
Is the list option compatible with the make.shared command? If so, how do I update my count_table for use in this command?
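Whatever mothur itself offers for this, one way to keep the two files consistent is to restrict the count_table to the sequence names that still belong to an OTU in the filtered list. Below is a hedged Python sketch, assuming tab-separated list and count_table layouts like the shortened, invented examples in the code (check them against your own files). `names_in_list` and `filter_count_table` are hypothetical helpers, not mothur commands.

```python
import csv
import io

# Hypothetical, heavily shortened file contents; the layouts are assumptions.
LIST_TEXT = (
    "label\tnumOtus\tOtu1\tOtu2\n"
    "0.03\t2\tpc1,pc2\tpc5\n"
)
COUNT_TEXT = (
    "Representative_Sequence\ttotal\tS01\tS02\n"
    "pc1\t400\t300\t100\n"
    "pc2\t1\t0\t1\n"
    "pc3\t1\t1\t0\n"
    "pc5\t3\t2\t1\n"
)


def names_in_list(list_text, label):
    """Collect every sequence name assigned to an OTU at the given label."""
    rows = csv.reader(io.StringIO(list_text), delimiter="\t")
    next(rows)  # skip the header line
    for row in rows:
        if row[0] == label:
            # row[1] is the number of OTUs; each remaining field is one OTU
            return {name for otu in row[2:] for name in otu.split(",")}
    raise ValueError(f"label {label!r} not found")


def filter_count_table(count_text, keep):
    """Return count_table text restricted to the sequence names in `keep`."""
    lines = count_text.splitlines()
    kept = [lines[0]] + [ln for ln in lines[1:] if ln.split("\t", 1)[0] in keep]
    return "\n".join(kept) + "\n"


keep = names_in_list(LIST_TEXT, "0.03")
result = filter_count_table(COUNT_TEXT, keep)
print(result, end="")
```

Note that pc2 survives even though it has a single read, because it belongs to a kept OTU; pc3 is dropped because no OTU in the filtered list contains it.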
Using remove.rare:
remove.rare(list=pilot.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=pilot.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, nseqs=2, label=0.03)
Here I get only 6,788 OTUs, far fewer than when running split.abund with the list option.
Can anyone help me? Does anyone have a good way to remove singletons?
All answers appreciated!
Even