barcodes get mixed up

Hello,

These two barcodes (the last two bases, AG, are the linker) seem to be indistinguishable when running trim.seqs with bdiffs=1:

ATCGTTCCAG
ACGTTCCTAG

They didn’t look similar to me, but I guess they align like this?

ATCGTTCC-
A-CGTTCCT
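To see why one allowed difference is enough for these two barcodes to collide, here is a minimal Python sketch that scores a barcode against the start of a read by edit distance (substitutions, insertions and deletions each count as one diff), letting the read continue past the barcode. This is an illustration under that assumption, not mothur's own alignment or scoring, and `prefix_edit_distance` is a hypothetical helper name.

```python
def prefix_edit_distance(barcode: str, read: str) -> int:
    """Fewest edits needed to align the whole barcode against some prefix
    of the read (the read is anchored at its start but may run on)."""
    m, n = len(barcode), len(read)
    # Row i = 0: an empty barcode vs read[:j] costs j extra read bases.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        # barcode[:i] vs an empty read prefix costs i missing bases.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if barcode[i - 1] == read[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # barcode base missing from read
                          curr[j - 1] + 1)     # extra base in the read
        prev = curr
    # The read may continue past the barcode, so keep the best prefix length.
    return min(prev)


read = "ACGTTCCTAG"  # barcode ACGTTCCT followed by the AG linker
print(prefix_edit_distance("ATCGTTCC", read))    # 1: accepted with bdiffs=1
print(prefix_edit_distance("ATCGTTCCAG", read))  # 2: barcode + linker is rejected
```

Under this scoring the bare barcodes are one deletion apart, while appending the AG linker raises the best score to 2.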

Is there any way to avoid this while still allowing bdiffs=1, say, by requiring that the barcode be directly followed by the linker, with ldiffs=0?

Many thanks,
Olga

Can you try adding the linker to the barcode?

ATCGTTCCAG
ACGTTCCTAG

With bdiffs=1, these would align like this:

A-CGTTCCTAG
ATCGTTCC-AG

which mothur would see as 2 diffs.
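The count of 2 can be read straight off the alignment above: each column where the rows differ (a gap facing a base, or a mismatch) is one diff. A minimal Python sketch of that column count, assuming every such column weighs one; mothur's exact treatment of end gaps may differ, and `count_alignment_diffs` is a hypothetical name.

```python
def count_alignment_diffs(a: str, b: str) -> int:
    """Count columns of a pairwise alignment where the two rows differ.
    A gap ('-') facing a base counts as one diff, as does a mismatch."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


print(count_alignment_diffs("A-CGTTCCTAG", "ATCGTTCC-AG"))  # 2
```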

Thanks a lot!
This particular problem was fixed.

But this raises another question: I would really like to understand what happens in trim.seqs. Does trim.seqs work sequentially? That is, does it first look for barcodes, remove them if found (in the solution you proposed, already together with the linker), and then look for primers in the sequence left over from step 1? If so, is there any way to look for the primer only in the first n bp of the remaining sequence? I would think that any nucleotides between the primer and the linker indicate a synthesis problem, and the whole read should rather be discarded. Still, some mismatches within the primer sequence are OK, since we know the primers are not really universal.

Many thanks,
Olga

Sorry for the late response, I missed your reply post.

But this raises another question: I would really like to understand what happens in trim.seqs. Does trim.seqs work sequentially? That is, does it first look for barcodes, remove them if found (in the solution you proposed, already together with the linker), and then look for primers in the sequence left over from step 1?

Yes, you are correct. The trim.seqs command looks for linkers, then barcodes, then spacers, then primers. If the barcode can’t be found, nothing is trimmed, which causes the spacer and primer searches to fail as well.
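The sequential behaviour described above can be sketched in a few lines of Python: each oligo is stripped from the front of whatever is left, and the first failure scraps the read so later steps never run. This is a simplified illustration, not mothur's code: it matches each oligo ungapped (mismatches only), whereas mothur aligns with gaps, and the helper names and example oligos are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def mismatches(a: str, b: str) -> int:
    """Ungapped mismatch count between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def trim_sequentially(read: str,
                      steps: Sequence[Tuple[str, str, int]]) -> Optional[str]:
    """Strip each (name, oligo, max_diffs) step from the front of the read.

    Returns the trimmed remainder, or None (read scrapped) as soon as one
    oligo is not found within max_diffs; later steps are never tried.
    """
    remaining = read
    for name, oligo, max_diffs in steps:
        prefix = remaining[:len(oligo)]
        if len(prefix) < len(oligo) or mismatches(prefix, oligo) > max_diffs:
            return None  # this step failed, so the whole read is scrapped
        remaining = remaining[len(oligo):]
    return remaining


# Hypothetical oligos: a barcode with the AG linker appended, then a short
# primer fragment. The order in mothur is linker, barcode, spacer, primer.
steps = [("barcode", "ATCGTTCCAG", 1), ("primer", "GGTTACC", 2)]
print(trim_sequentially("ATCGTTCCAG" + "GGTTACC" + "TTTT", steps))  # TTTT
print(trim_sequentially("ACGTTCCTAG" + "GGTTACC" + "TTTT", steps))  # None
```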

If so, is there any way to look for the primer only in the first n bp of the remaining sequence? I would think that any nucleotides between the primer and the linker indicate a synthesis problem, and the whole read should rather be discarded. Still, some mismatches within the primer sequence are OK, since we know the primers are not really universal.

In the trim.seqs command, mothur will scrap the sequence if any part of the trimming fails.
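The rule asked about in the question (primer only within the first n bases after barcode and linker, mismatches inside the primer tolerated) could be expressed as follows. This is a hypothetical Python helper illustrating that rule, not an option of trim.seqs; `max_offset=0` corresponds to requiring the primer to start immediately after the linker.

```python
from typing import Optional


def find_primer_in_window(remaining: str, primer: str, pdiffs: int,
                          max_offset: int) -> Optional[int]:
    """Return the first offset in 0..max_offset at which the primer matches
    the remaining sequence with at most pdiffs mismatches (no indels),
    or None if it is not found in that window."""
    for offset in range(max_offset + 1):
        window = remaining[offset:offset + len(primer)]
        if len(window) < len(primer):
            break  # ran off the end of the read
        diffs = sum(1 for a, b in zip(window, primer) if a != b)
        if diffs <= pdiffs:
            return offset
    return None


# One mismatch inside the primer is tolerated at offset 0...
print(find_primer_in_window("GGTAACCAAA", "GGTTACC", 1, 0))   # 0
# ...but an extra base before the primer is rejected when max_offset=0.
print(find_primer_in_window("TGGTTACCAAA", "GGTTACC", 1, 0))  # None
```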

I have a similar problem. I added the linker to the barcode, which was a good improvement, but I still find some unexpected assignments. Here are two examples. In my pool, I have no sample corresponding to these barcodes (the linker is AG). However, mothur (with bdiffs=1) assigned these sequences:

Example 1:

barcode: ACGTAGCTAG
seq:     ACGTCGCTGGTTACCGCGG…

Example 2:

barcode: CAGTGAGAAG
seq:     CAGTGAAGGTTACCGCGGCTG…

The barcodes are designed to differ from each other by at least two positions. Over 8 positions (10 with the linker), there are far more differences here. For reference, the primer starts at GGTTACC. Maybe it would be more stringent to read the barcode without allowing indels, since indels let too many differences through. What do you think?
Thanks in advance for your help.
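The suggestion to read barcodes without indels amounts to comparing the first bases of the read to each barcode position by position (Hamming distance). A minimal Python sketch on the two fragments above; the helper names are hypothetical, and treating a read that matches several barcodes as unassigned is an assumption of this sketch.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Mismatches between two equal-length strings (no indels allowed)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_ungapped(read: str, barcodes: Dict[str, str],
                    bdiffs: int) -> Optional[str]:
    """Return the sample whose barcode matches the start of the read with at
    most bdiffs mismatches; None if zero or several barcodes match."""
    hits = [sample for sample, bc in barcodes.items()
            if len(read) >= len(bc) and hamming(read[:len(bc)], bc) <= bdiffs]
    return hits[0] if len(hits) == 1 else None


examples = {
    "ex1": ("ACGTAGCTAG", "ACGTCGCTGGTTACCGCGG"),
    "ex2": ("CAGTGAGAAG", "CAGTGAAGGTTACCGCGGCTG"),
}
for name, (barcode, seq) in examples.items():
    print(name, hamming(seq[:len(barcode)], barcode))
# ex1 2
# ex2 4
```

With bdiffs=1, neither fragment would be assigned to its barcode under this ungapped rule.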

What version of mothur are you using? When I run your sequence fragments with some debug flags, I get:

[DEBUG]: reading type - barcode.
[DEBUG]: reading - ACGTAGCTAG.
[DEBUG]: reading type - barcode.
[DEBUG]: reading - CAGTGAGAAG.
[DEBUG]: 0 0 52
[DEBUG]: seq aligned fragment =ACGTCGCTGG, barcode =ACGTAGCTAG, numDiffs = 2
[DEBUG]: seq aligned fragment =ACGTC-GCTG-GT--, barcode =----CAG-TGAGAAG, numDiffs = 10
[DEBUG]: seq, trashcode= b
[DEBUG]: seq2 aligned fragment =-CAGT-G-AAG, barcode =AC-GTAGCTAG, numDiffs = 5
[DEBUG]: seq2 aligned fragment =CAGTGA-AGG, barcode =CAGTGAGAAG, numDiffs = 2
[DEBUG]: seq2, trashcode= b

Mothur scraps both sequences because the barcode can’t be found.

Thanks for your answer, and for spending time on these strange results.
I am using version 1.31.2.
Do you need more information or more examples to find the origin of the problem?