Reduction in number of sequences

Hi community!!! I am working with 16S sequence data in MOTHUR.

  • Initially I had 526345 sequences. After all the “screen.seqs” and “remove.seqs” steps, I had around 406528 sequences going into the “classify.seqs” step. The rest were removed during the quality screening and chimera removal steps.

  • initially, # of unique seqs: 172219 and total # of seqs: 526345

  • after preclustering, # of unique seqs: 94052 and total # of seqs: 526345

  • after chimera removal, # of unique seqs: 50148 and total # of seqs: 406528

Is such a reduction in sequence numbers normal, or am I doing something wrong?

Thanks and Regards,
DC7

Losing 20% of your sequences through quality checking is completely normal.
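As a quick check on the numbers in the question, the overall loss can be worked out directly from the two totals reported there. This is only a minimal arithmetic sketch; the variable names are made up for illustration and nothing here calls mothur itself.

```python
# Read counts reported in the question above.
initial_total = 526345   # total sequences at the start
final_total = 406528     # total sequences going into classify.seqs

removed = initial_total - final_total
percent_removed = 100 * removed / initial_total

print(f"removed {removed} of {initial_total} reads ({percent_removed:.1f}%)")
```

That works out to a little under 23% of the reads, which is in the range described as normal.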


Thanks for the information, ma’am.

  • On a separate set of sequences I started with 671453 sequences. Before chimera removal I had 605417 sequences. After chimera removal, the number dropped from 605417 to 450840. That means I have lost 33% of the sequences in total, and around 26% of the 605417 were lost during chimera removal. Is that also normal?

  • Also, during the remove.lineage step, 260 sequences were removed from the fasta file and 1573 sequences from the count file. Could you please comment on that as well?

Thanks and Regards,

DC7

It is normal.

The fasta file only has unique sequences (so 260 unique sequences were flagged). The count file holds the total number of reads.
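To see why the two numbers differ, here is a small sketch that counts the same removal both ways. It assumes a tab-separated count table with a `Representative_Sequence` column and a `total` column, one row per unique sequence; the sequence names, counts, and the flagged set are all invented for illustration and are not taken from the data in this thread.

```python
import csv
import io

# Hypothetical miniature count table: one row per unique sequence,
# with "total" giving the number of reads that sequence represents.
count_table = io.StringIO(
    "Representative_Sequence\ttotal\n"
    "seqA\t1200\n"
    "seqB\t3\n"
    "seqC\t1\n"
)

# Hypothetical set of unique sequences flagged for removal.
flagged = {"seqA", "seqC"}

rows = list(csv.DictReader(count_table, delimiter="\t"))
flagged_rows = [r for r in rows if r["Representative_Sequence"] in flagged]

unique_removed = len(flagged_rows)                          # what the fasta file loses
reads_removed = sum(int(r["total"]) for r in flagged_rows)  # what the count file loses

print(unique_removed, reads_removed)
```

In this toy case two unique sequences are removed from the fasta file, but they stand for many more reads in the count file, which is the same pattern as 260 versus 1573.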