I have run through the following process on my sequences:
pre.cluster(fasta=@.unique.%, name=@.names, diffs=2)
where @ is the full filename of the sequence file.
The cluster command produces, naturally enough, a rabund file. I started looking at this file to figure out how many of my OTUs were singleton OTUs, i.e. OTUs formed from a single sequence. In the process I discovered that the rabund file seems to report more sequences than I started with. As an example:
The file control.fsa contains 48452 fasta sequences, going by the number of > characters in the file. I get the same number when I load the file into mothur and run the first summary.seqs on it.
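For reference, the count of > characters can be reproduced with a few lines of Python. This is only a sketch of that check: the function name count_fasta_records is made up for illustration, and it assumes every record starts with a header line beginning with > and that > does not appear at the start of any sequence line.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Tiny in-memory example; for a real file, pass open("control.fsa") instead.
example = io.StringIO(">seq1\nACGT\n>seq2\nGGCC\nTTAA\n")
print(count_fasta_records(example))
```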
Next I run this file through the commands described above. I then open the resulting rabund file and, as a check, sum the numbers from field 3 onwards. If I have understood the documentation correctly, this sum should equal the total number of sequences in the data set. However, I get varying numbers. On the first line, the one that begins with unique, the sum is 49806. Any other line I check gives a different number again.
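The check I am doing by hand amounts to something like the Python sketch below. It rests on my reading of the rabund format, which may be wrong: whitespace-separated fields, with the label in field 1, the number of OTUs in field 2, and one abundance per OTU from field 3 onwards. The function name summarize_rabund and the sample data are made up for illustration.

```python
import io


def summarize_rabund(handle):
    """Summarize each line of a rabund file.

    Assumed layout per line: label, number of OTUs, then one
    abundance value per OTU (fields 3 onwards).
    """
    results = []
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        abundances = [int(x) for x in fields[2:]]
        results.append({
            "label": fields[0],
            "n_otus": int(fields[1]),
            # Sum of fields 3 onwards: should be the total sequence count.
            "total": sum(abundances),
            # OTUs formed from exactly one sequence.
            "singletons": sum(1 for a in abundances if a == 1),
            # Sanity check: field 2 should match the number of abundance fields.
            "consistent": int(fields[1]) == len(abundances),
        })
    return results


# Made-up example data, not taken from my real rabund file.
sample = io.StringIO("unique\t3\t2\t1\t1\n0.03\t2\t3\t1\n")
for row in summarize_rabund(sample):
    print(row["label"], row["total"], row["singletons"], row["consistent"])
```

If my understanding is right, the total should come out the same on every line, since each line describes the same set of sequences clustered at a different cutoff. That is exactly what I do not see in my data.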
Have I misinterpreted what the rabund file reports?