pre.cluster removes >90 % of sequences?

Hello

I have used your pipeline on my 16S rRNA data. I am new to this field.
I would very much appreciate feedback on whether I have done something wrong.
When I run this command, I lose > 90 % of the sequences:
mothur > pre.cluster(fasta=stability.file.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.file.trim.contigs.good.unique.good.filter.count_table, diffs=2)

An example is shown here (from my mothur.logfile):
Processing group cDNA1:
2759 165 2594
Total number of sequences before pre.cluster was 2759.
pre.cluster removed 2594 sequences.

It took 0 secs to cluster 2759 sequences.

Processing group cDNA10:
5236 378 4858
Total number of sequences before pre.cluster was 5236.
pre.cluster removed 4858 sequences.

It took 0 secs to cluster 5236 sequences.

Processing group cDNA11:
1851 221 1630
Total number of sequences before pre.cluster was 1851.
pre.cluster removed 1630 sequences.

It took 0 secs to cluster 1851 sequences.

Processing group cDNA12:
10333 673 9660
Total number of sequences before pre.cluster was 10333.
pre.cluster removed 9660 sequences.

It took 1 secs to cluster 10333 sequences.

Why have so many sequences been removed?

Thank you very much!

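That’s not unexpected. pre.cluster merges each sequence that is within 2 differences (diffs=2) of a more abundant sequence into that sequence, so the number of unique sequences drops sharply. The merged reads are still counted in the count table, so they will go into your final OTU table.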

Great to hear, thank you very much :slight_smile:

You can double-check by running summary.seqs on the fasta and count files from before and after pre.cluster.
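
A minimal sketch of that check, written as a mothur batch file. The first pair of file names comes from the command in the question. The second pair assumes pre.cluster wrote its output with the usual `.precluster` suffix; that is an assumption, so use whatever output names your own logfile lists.

```
# before pre.cluster (file names from the original command)
summary.seqs(fasta=stability.file.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.file.trim.contigs.good.unique.good.filter.count_table)

# after pre.cluster (assumed output names; check your logfile)
summary.seqs(fasta=stability.file.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.file.trim.contigs.good.unique.good.filter.unique.precluster.count_table)
```

If the merged reads were kept, the total number of sequences reported by the two runs should match, and only the number of unique sequences should be smaller after pre.cluster.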

Pat