But this raises another question: I would really like to understand what happens in trim.seqs. Does trim.seqs work sequentially, first looking for barcodes, then, if found, removing them (in the solution you proposed, together with the linker), and then looking for primers in the sequence resulting from step 1? If yes, is there any way to look for the primer in the first n bp of the remaining sequence? I would think that any nucleotides between the primer and the linker are a synthesis problem, and the whole read would be better disregarded. Still, some mismatches within the primer sequence are OK, since we know the primers are not really universal.
Sorry for the late response, I missed your reply post.
But this raises another question: I would really like to understand what happens in trim.seqs. Does trim.seqs work sequentially, first looking for barcodes, then, if found, removing them (in the solution you proposed, together with the linker), and then looking for primers in the sequence resulting from step 1?
Yes, you are correct. The trim.seqs command looks for linkers, then barcodes, then spacers, then primers. If the barcode can’t be found, nothing is trimmed, and this causes the spacer and primer steps to fail as well.
If yes, is there any way to look for the primer in the first n bp of the remaining sequence? I would think that any nucleotides between the primer and the linker are a synthesis problem, and the whole read would be better disregarded. Still, some mismatches within the primer sequence are OK, since we know the primers are not really universal.
In the trim.seqs command, mothur will scrap the sequence if any part of the trimming fails.
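To make the order of operations concrete, here is a minimal Python sketch of the behaviour described above. It is not mothur's code: the function name trim_read, the STEPS list and the exact-prefix matching are assumptions made for illustration (the real command tolerates mismatches through settings such as bdiffs), and the primer is shortened to the GGTTACC start quoted later in this thread.

```python
# Illustrative sketch only, NOT mothur's implementation.
# Assumption: each element must match exactly at the start of what remains.

def trim_read(read, steps):
    """Strip elements from the 5' end in the given order.

    steps is an ordered list of (label, candidates); an empty candidate
    list means that element is not used. Returns (trimmed_read, found)
    on success, or (None, failed_label) when the read is scrapped.
    """
    rest = read
    found = {}
    for label, candidates in steps:
        if not candidates:
            continue  # element not present in this design
        match = next((c for c in candidates if rest.startswith(c)), None)
        if match is None:
            return None, label  # any failed step scraps the whole read
        found[label] = match
        rest = rest[len(match):]
    return rest, found


# Hypothetical design: linker already appended to the barcode, no spacer.
STEPS = [
    ("linker", []),
    ("barcode", ["ACGTAGCTAG"]),
    ("spacer", []),
    ("primer", ["GGTTACC"]),
]

print(trim_read("ACGTAGCTAGGGTTACCGCGG", STEPS))    # barcode and primer found
print(trim_read("ACGTAGCTAGTTGGTTACCGCGG", STEPS))  # extra bases before primer
```

In this sketch the second read is scrapped at the primer step, because two extra bases sit between the barcode and the primer. Whether mothur itself rejects such a read depends on how it aligns the primer and on the mismatch settings, so treat this only as a picture of the sequential logic.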
I have a similar problem. I added the linker to the barcode, which was a good improvement. However, I still find some unexpected assignments. Two examples. In my pool, I have no sample corresponding to these barcodes (the linker is AG). However, mothur (with bdiffs=1) assigned these sequences:
ex 1:
barcode
ACGTAGCTAG
seq
ACGTCGCTGGTTACCGCGG…
ex 2:
barcode
CAGTGAGAAG
seq
CAGTGAAGGTTACCGCGGCTG…
The barcodes are designed to differ by at least two positions between samples. Over 8 positions (10 with the linker), we have far too many differences here. For info, the primer starts at GGTTACC. Maybe it would be more stringent to read the barcode without allowing indels, since indels allow too many differences. What do you think?
Thanks in advance for your help.
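The difference between counting mismatches only and allowing indels can be checked on the two examples above with a short Python sketch. This is only an illustration under stated assumptions: the function names are made up, every substitution, insertion or deletion costs 1, and the barcode is compared against the best-matching prefix of the read. mothur's own alignment and difference counting may work differently, so this does not reproduce its assignments.

```python
# Illustrative sketch only, NOT mothur's barcode matching.

def hamming_to_prefix(barcode, read):
    """Mismatches between the barcode and the read's first len(barcode) bases."""
    prefix = read[:len(barcode)]
    mismatches = sum(b != r for b, r in zip(barcode, prefix))
    return mismatches + (len(barcode) - len(prefix))  # penalise short reads


def edit_distance_to_prefix(barcode, read):
    """Smallest edit distance between the barcode and any prefix of the read."""
    window = read[:2 * len(barcode)]
    # Row 0: empty barcode against a prefix of length j costs j insertions.
    prev = list(range(len(window) + 1))
    for i, b in enumerate(barcode, 1):
        cur = [i]  # barcode[:i] against an empty prefix costs i deletions
        for j, r in enumerate(window, 1):
            cur.append(min(prev[j] + 1,              # delete a barcode base
                           cur[j - 1] + 1,           # insert a read base
                           prev[j - 1] + (b != r)))  # match or substitute
        prev = cur
    return min(prev)  # best over all prefix lengths


examples = [
    ("ACGTAGCTAG", "ACGTCGCTGGTTACCGCGG"),
    ("CAGTGAGAAG", "CAGTGAAGGTTACCGCGGCTG"),
]
for barcode, read in examples:
    print(barcode, hamming_to_prefix(barcode, read),
          edit_distance_to_prefix(barcode, read))
```

With these assumptions, example 1 shows 2 differences either way, while example 2 drops from 4 mismatches without indels to 2 differences once gaps are allowed (the barcode minus "GA" is exactly the start of the read). Both are still above bdiffs=1 in this simple model, so the sketch supports the point that indels loosen the match but does not by itself explain the assignments.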
Thanks for your answer, and for spending time on these strange results.
I am using version 1.31.2
Do you need more info or more examples to find the origin of the problem?