random length barcodes


I have been given some MiSeq data (V1-V3) that seems to have short, random-length barcodes before the primer sequences. Do I need to remove these before processing in mothur, or is this something mothur can handle?

Thanks for any advice


If you tell mothur what the barcode sequences are, she should be able to figure it out.

FWIW, you might want to read this:


Thanks. I ended up writing a script to just trim everything before the primers.
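For anyone wanting to do the same, here is a minimal sketch of that kind of trimming script. It is not the script actually used here. The primer shown is only an illustrative example (a commonly cited 27F variant), and the function names and the `max_offset` limit are hypothetical choices; substitute the primer your provider really used.

```python
import re

# IUPAC degenerate base codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer (possibly with degenerate bases) into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_before_primer(seq, qual, pattern, max_offset=12):
    """Drop everything before the first primer match.

    Returns (seq, qual) starting at the primer, or None if the primer is
    not found within the first max_offset bases (assumed barcode limit).
    """
    match = pattern.search(seq)
    if match is None or match.start() > max_offset:
        return None
    start = match.start()
    return seq[start:], qual[start:]


# Example primer only (assumption): replace with the real forward primer.
example_primer = primer_regex("AGAGTTTGATCMTGGCTCAG")

read = "ACGT" + "AGAGTTTGATCATGGCTCAG" + "GATTACA"
quality = "I" * len(read)
trimmed = trim_before_primer(read, quality, example_primer)
```

The primer itself is kept in the output so that mothur can still find and remove it later. Reads where the primer is missing or starts too far in return `None`, which lets you count or discard them separately.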

Thanks for the link. I had seen it last week on Twitter, and it’s very helpful. I think the V1-V3 primers were chosen for this project because others had used them before (probably on 454, though) and because they were the sequencing provider’s preferred primer set. However, this is the pilot stage of the project, so I will certainly try to persuade the decision makers to think again for the next stage. Would the better choice be V4, or maybe V1-V2? (The sample sites are fecal and skin.)

In order to rescue something from this data, I will try the phylotype approach as you suggest in your post. dist.seqs is also still running, so I will leave it and see where it gets to. FWIW, this was the summary.seqs output before going into dist.seqs:

             Start  End   NBases   Ambigs  Polymer  NumSeqs
Minimum:     1      1286  426      0       3        1
2.5%-tile:   1      1286  458      0       4        25511
25%-tile:    1      1286  475      0       5        255110
Median:      1      1286  485      0       5        510219
75%-tile:    1      1286  490      0       5        765328
97.5%-tile:  1      1286  495      0       6        994926
Maximum:     1      1286  518      0       6        1020436
Mean:        1      1286  480.944  0       4.98883

# of unique seqs: 211997

total # of seqs: 1020436
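As a rough sense of scale for why dist.seqs is taking so long, the counts above can be turned into a number of pairwise comparisons. This is only a back-of-the-envelope sketch that assumes every unique sequence is compared against every other one; the variable names are made up for illustration.

```python
from math import comb

# Counts taken from the summary.seqs output above.
unique_seqs = 211997
total_seqs = 1020436

# Number of distinct pairs among the unique sequences: n * (n - 1) / 2.
pairs = comb(unique_seqs, 2)

# Share of reads that are unique; a high value suggests many
# sequences differ from one another.
unique_fraction = unique_seqs / total_seqs

print(f"{pairs:,} pairwise comparisons")
print(f"{unique_fraction:.1%} of reads are unique")
```

That works out to about 22.5 billion pairs, with roughly one read in five being unique.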

Thanks again for your help