I was processing some data and thought I should screen with maxambig=5, because it seemed like an appropriate number of ambiguities. Then I wondered what the basis was for choosing this number… So my question is: what is the actual definition of an “ambiguous base call”? Thank you!
It is when the base caller on the sequencer, or the step that makes contigs, can’t tell whether the base should be an A, T, G, or C. It’s hard to think of a reason why maxambig would be anything but 0 for 99% of applications.
So, is it represented by an “N” in the sequence? If so, is it only “N”, or can it also be other letters, for instance W (for A or T)?
Is there any way to replace ambiguous base calls with a gap instead of deleting the entire read?
Thanks a lot for your explanations
The raw sequence has an actual base call. In mothur we just turn it into an N. Because there are so many sequences in a typical run and the errors are fairly random, including sequences with known errors will artificially inflate the number of unique sequences and generate a bunch of spurious OTUs.
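To make the idea of a maxambig threshold concrete, here is a minimal Python sketch of counting ambiguous bases and screening reads on that count. It is not mothur's implementation: the function names `count_ambiguous` and `screen` and the example reads are hypothetical, and treating every letter other than A, C, G, T, or U (so N, but also IUPAC codes such as W) as ambiguous is an assumption made for illustration.

```python
# Assumption: anything that is not a standard base or a gap character
# counts as ambiguous (N, plus IUPAC codes such as W, R, Y).
UNAMBIGUOUS = set("ACGTU")
GAP_CHARS = set("-.")


def count_ambiguous(seq):
    """Return the number of ambiguous base calls in a sequence."""
    return sum(
        1
        for base in seq.upper()
        if base not in UNAMBIGUOUS and base not in GAP_CHARS
    )


def screen(seqs, maxambig=0):
    """Keep only reads with at most `maxambig` ambiguous bases."""
    return {
        name: seq
        for name, seq in seqs.items()
        if count_ambiguous(seq) <= maxambig
    }


# Hypothetical example reads
reads = {
    "read1": "ACGTACGT",  # no ambiguous bases
    "read2": "ACGNACGT",  # one N
    "read3": "ACNNWCGT",  # two Ns and one W
}

print(sorted(screen(reads, maxambig=0)))  # ['read1']
print(sorted(screen(reads, maxambig=1)))  # ['read1', 'read2']
```

With a threshold of 0 only the clean read survives, which is the setting recommended above. Raising the threshold lets reads with known uncertain positions through, and those are the reads that can show up as extra unique sequences.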