I am revising the mothur pipeline I used for my 16S rRNA pyrosequencing data and I have a question. I decided to remove sequences shorter than 250 bp from my dataset in order to improve taxonomic classification. To do so, I used the following command:
I noticed, though, that the command above removed sequences only from the fasta file and not from the group file. I believe this may cause problems in downstream analyses. For example, when I wanted to normalize the number of sequences in each of my samples, I noticed that the group file still contains all of my sequences, including those shorter than 250 bp that are no longer in the fasta file. Would I therefore be normalizing my data to a number greater than it should be?
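To make the mismatch concrete, here is a minimal Python sketch, run outside mothur, that compares the two files. It assumes the group file has two whitespace-separated columns (sequence name, then sample name) and that each fasta header starts with `>` followed by the sequence name. The function names and the toy data are hypothetical and only for illustration. This is not a mothur command.

```python
from collections import Counter


def fasta_names(fasta_lines):
    """Collect sequence names from fasta header lines.

    Assumption: the name is the text after '>' up to the first whitespace.
    """
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def filter_group_lines(group_lines, keep):
    """Keep only group-file lines whose sequence name is in `keep`."""
    kept = []
    for line in group_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in keep:
            kept.append(line.rstrip("\n"))
    return kept


def count_per_sample(group_lines):
    """Count sequences per sample (second column of the group file)."""
    counts = Counter()
    for line in group_lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


# Toy data (hypothetical): seq2 was removed from the fasta file
# but is still listed in the group file.
fasta = [">seq1", "ACGT", ">seq3", "GGCC"]
group = ["seq1\tsampleA", "seq2\tsampleA", "seq3\tsampleB"]

kept_names = fasta_names(fasta)
synced_group = filter_group_lines(group, kept_names)

print("per-sample counts before:", dict(count_per_sample(group)))
print("per-sample counts after: ", dict(count_per_sample(synced_group)))
```

With real files, the lists above would be replaced by open file handles. If the per-sample counts differ before and after, the group file is out of sync with the fasta file, and any subsampling depth chosen from the unfiltered group file would be based on sequences that no longer exist.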
Would any of you out there have any comment or suggestion on how to solve that?