After using the align.seqs command to align my sequences to the silva database, I use the screen.seqs command to filter the aligned sequences and provide the count table to get an updated sequence count.
The updated count table, however, contains duplicate sequence names, so the unique.seqs command fails with “[ERROR]: Your count table contains more than 1 sequence named M01867_162_000000000-AFMW9_1_2119_6612_17173, sequence names must be unique. Please correct.”
I checked, and the count table does contain this sequence twice (both singletons), but removing one copy just postponed the error to the next duplicated sequence. I also tried running the screen.seqs command on a single core, but the error was the same.
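To see how many names are affected at once, instead of hitting them one error at a time, a small script along these lines might help. It is only a sketch: it assumes the count table is the plain tab-separated format with a single header line and the sequence name in the first column, and the function name and example rows are made up for illustration.

```python
from collections import Counter
import csv
import io


def find_duplicate_names(lines):
    """Return {name: occurrences} for names listed more than once.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes one header line and the sequence name in column 1.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header line
    counts = Counter(row[0] for row in reader if row)
    return {name: n for name, n in counts.items() if n > 1}


# Tiny made-up table standing in for the real count table.
example = io.StringIO(
    "Representative_Sequence\ttotal\n"
    "seq_A\t1\n"
    "seq_B\t3\n"
    "seq_A\t1\n"
)
duplicates = find_duplicate_names(example)
print(duplicates)
```

For the real file, the same function could be called on an open file handle (for example `with open(path) as handle: find_duplicate_names(handle)`, where `path` points to the count table written by screen.seqs).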
I aligned the sequences to the silva database that I prepared as described in the tutorial: I predicted the E. coli fragment for the primers that I used (341F and 805R), aligned that fragment (without primers) to the silva.bacteria.fasta file, and used the aligned sequence in the pcr.seqs command.
P.S. (and unrelated to the question): I’m aware that my sequences do not fully overlap and that this can be regarded as a problem. To be honest, I wasn’t involved in the choice of primers. However, these primers are extremely widely used for environmental sequences, mostly because they are supposed to have a small phylogenetic bias and amplify both bacteria and archaea. I don’t want to imply that so many scientists can’t be wrong (sure they can), but it seems like there must be an upside, too. Otherwise, why are they so widely used?