merge.sfffiles

Hi,

I have 5 sff files, each containing reads with a different barcode (A1 to A5).
When I tried to merge these sff files into one file, I got this error message:
[ERROR]: merge issue with common headers. Key sequences do not match. A1_B6_M.sff Key sequence is TCAGCTAAGGTAACGAT, but A2_B6_J0.sff key sequence is TCAGTAAGGAGAACGAT.
[ERROR]: merge issue with common headers. Key sequences do not match. A1_B6_M.sff Key sequence is TCAGCTAAGGTAACGAT, but A3_B6_J2.sff key sequence is TCAGAAGAGGATTCGAT.
[ERROR]: merge issue with common headers. Key sequences do not match. A1_B6_M.sff Key sequence is TCAGCTAAGGTAACGAT, but A4_B6_J7.sff key sequence is TCAGTACCAAGATCGAT.
[ERROR]: merge issue with common headers. Key sequences do not match. A1_B6_M.sff Key sequence is TCAGCTAAGGTAACGAT, but A5_M6.sff key sequence is TCAGCAGAAGGAACGAT.

So my questions are simply: why does this happen, and how can I merge them?

Many thanks!

Lionel

Hmm… That’s odd. It almost looks like the barcodes are attached to the key sequences. Can you send me the sff files at mothur.bugs@gmail.com?

You should have got the sff files via mothur.bugs@gmail.com… did you notice anything special?

… still no answer?

Sorry for the delay in getting back to you. I have been on Christmas break. The problem you are having stems from the key sequence in each sff file. Here is the decoded common header for IonXpress_001.A1-B6-M.2014-12-03.sff:

magicNumber = 779314790
version = 0001
index offset = 0
index read length = 0
numReads = 243357
readLength = 704
key length = 17
num flow Reads = 650
read format code = 1
flow chars = TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTC
key = TCAGCTAAGGTAACGAT

Here is the same for IonXpress_003.A3-B6-J2.2014-12-03.sff:

magicNumber = 779314790
version = 0001
index offset = 0
index read length = 0
numReads = 202130
readLength = 704
key length = 17
num flow Reads = 650
read format code = 1
flow chars = TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTC
key = TCAGAAGAGGATTCGAT
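
For anyone who wants to inspect these fields themselves, here is a minimal Python sketch of reading an SFF common header, assuming the usual SFF layout (a 31-byte big-endian fixed part, then the flow characters, the key sequence and zero padding up to a multiple of 8 bytes). The function names are hypothetical and this is not mothur's code. It looks as though mothur's "readLength" corresponds to the header length field and "num flow Reads" to the number of flows per read, but that mapping is an assumption.

```python
import struct

# Fixed-size part of the SFF common header (big-endian, 31 bytes):
# magic, version, index_offset, index_length, number_of_reads,
# header_length, key_length, number_of_flows_per_read, flowgram_format_code
FIXED = struct.Struct(">I4sQIIHHHB")
SFF_MAGIC = 0x2E736666  # the bytes ".sff", i.e. 779314790 in decimal


def parse_common_header(data: bytes) -> dict:
    """Decode the common header found at the start of an SFF file."""
    (magic, version, index_offset, index_length, num_reads,
     header_length, key_length, num_flows, format_code) = FIXED.unpack_from(data, 0)
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file")
    pos = FIXED.size
    flow_chars = data[pos:pos + num_flows].decode("ascii")
    pos += num_flows
    key = data[pos:pos + key_length].decode("ascii")
    return {
        "magic": magic,
        "version": "".join(str(b) for b in version),
        "index_offset": index_offset,
        "index_length": index_length,
        "num_reads": num_reads,
        "header_length": header_length,
        "key_length": key_length,
        "num_flows": num_flows,
        "format_code": format_code,
        "flow_chars": flow_chars,
        "key": key,
    }


def build_common_header(num_reads: int, flow_chars: str, key: str) -> bytes:
    """Build a header for demonstration only (no index, format code 1)."""
    unpadded = FIXED.size + len(flow_chars) + len(key)
    header_length = (unpadded + 7) // 8 * 8  # round up to a multiple of 8
    fixed = FIXED.pack(SFF_MAGIC, b"\x00\x00\x00\x01", 0, 0, num_reads,
                       header_length, len(key), len(flow_chars), 1)
    body = fixed + flow_chars.encode("ascii") + key.encode("ascii")
    return body.ljust(header_length, b"\x00")


if __name__ == "__main__":
    # Placeholder flow order of the right length (650), not the real one.
    flows = "TACG" * 162 + "TA"
    raw = build_common_header(243357, flows, "TCAGCTAAGGTAACGAT")
    header = parse_common_header(raw)
    print(header["key"], header["key_length"], header["header_length"])
```

With 650 flows and a 17-base key, the unpadded header is 31 + 650 + 17 = 698 bytes, which rounds up to 704, the value reported as readLength above.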

Mothur creates a new common header for the merged file from the headers of the input files. Everything must match except numReads, index offset and index read length, since these are updated in the new file. Mothur cannot tell which key sequence to use, since there is more than one. Your key sequences seem to be the standard TCAG followed by a barcode, AAGAGGATTCGAT or CTAAGGTAACGAT in the two files above. For the merge to work, you would need to modify the key sequence and key length in each file and adjust the padding. I added a keytrim parameter to the merge.sfffiles command. If keytrim=t, mothur will trim the key sequences to the first 4 characters, provided those match, and create a merged sff file that can be read by sffinfo. This change will be part of 1.35.0.
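
The trimming rule described here can be sketched in a few lines of Python. This is only an illustration of the idea (trim every key to its first 4 characters and proceed only if the trimmed keys agree), not mothur's actual implementation; the function name `trim_keys` is hypothetical. The keys are the ones from the error messages at the top of the thread.

```python
def trim_keys(keys, keep=4):
    """Return the shared trimmed key, or raise if the first `keep` bases differ."""
    if any(len(key) < keep for key in keys):
        raise ValueError(f"a key sequence is shorter than {keep} bases")
    prefixes = {key[:keep] for key in keys}
    if len(prefixes) != 1:
        raise ValueError(
            f"key sequences differ within the first {keep} bases: {sorted(prefixes)}"
        )
    return prefixes.pop()


# Key sequences reported in the merge errors, by file name.
keys_by_file = {
    "A1_B6_M.sff": "TCAGCTAAGGTAACGAT",
    "A2_B6_J0.sff": "TCAGTAAGGAGAACGAT",
    "A3_B6_J2.sff": "TCAGAAGAGGATTCGAT",
    "A4_B6_J7.sff": "TCAGTACCAAGATCGAT",
    "A5_M6.sff": "TCAGCAGAAGGAACGAT",
}

if __name__ == "__main__":
    print(trim_keys(keys_by_file.values()))
```

All five keys begin with TCAG, so trimming to 4 characters leaves a single key that every file agrees on.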

The new header looks like:

magicNumber = 779314790
version = 0001
index offset = 0
index read length = 0
numReads = 445487
readLength = 704
key length = 4
num flow Reads = 650
read format code = 1
flow chars = TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTC
key = TCAG
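
As a rough check of how the merged header follows from the two headers shown earlier, here is a Python sketch of the merge rule: fields other than numReads, index offset and index read length must match, and numReads is summed. The dictionary layout and the `merge_headers` function are hypothetical and not taken from mothur; flow chars is left out for brevity, and readLength is copied through unchanged, as in the header shown above.

```python
UPDATED_FIELDS = {"numReads", "index offset", "index read length"}


def merge_headers(headers, keytrim=False, keep=4):
    """Combine decoded common headers, optionally trimming the key first."""
    prepared = []
    for header in headers:
        header = dict(header)  # work on a copy
        if keytrim:
            header["key"] = header["key"][:keep]
            header["key length"] = len(header["key"])
        prepared.append(header)

    merged = dict(prepared[0])
    for header in prepared[1:]:
        for field, value in header.items():
            if field in UPDATED_FIELDS:
                continue
            if merged.get(field) != value:
                raise ValueError(
                    f"{field} does not match: {merged.get(field)!r} vs {value!r}"
                )

    merged["numReads"] = sum(h["numReads"] for h in prepared)
    merged["index offset"] = 0
    merged["index read length"] = 0
    return merged


a1 = {"magicNumber": 779314790, "version": "0001", "index offset": 0,
      "index read length": 0, "numReads": 243357, "readLength": 704,
      "key length": 17, "num flow Reads": 650, "read format code": 1,
      "key": "TCAGCTAAGGTAACGAT"}
a3 = dict(a1, numReads=202130, key="TCAGAAGAGGATTCGAT")

if __name__ == "__main__":
    merged = merge_headers([a1, a3], keytrim=True)
    print(merged["numReads"], merged["key"], merged["key length"])
```

Without keytrim the two keys differ and the sketch raises an error, which mirrors the original merge failure. With keytrim the read counts add up to 243357 + 202130 = 445487, the numReads value in the new header.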

Kindly,
Sarah

Sarah thank you!
Indeed, in this case, the key sequence is composed of TCAG + barcode (10mer) + GAT (spacer). I look forward to testing the new keytrim option.
Sincerely,
Lionel
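
To make the layout Lionel describes concrete, here is a small Python sketch that splits a 17-base key into TCAG + 10-mer barcode + GAT spacer. The function name and default arguments are hypothetical; the lengths come from the description above and would need adjusting for other barcode kits.

```python
def split_key(key, prefix="TCAG", barcode_len=10, spacer="GAT"):
    """Split a key sequence into (prefix, barcode, spacer) or raise ValueError."""
    expected = len(prefix) + barcode_len + len(spacer)
    if len(key) != expected:
        raise ValueError(f"expected {expected} bases, got {len(key)}")
    if not key.startswith(prefix) or not key.endswith(spacer):
        raise ValueError(f"key {key!r} does not look like {prefix} + barcode + {spacer}")
    barcode = key[len(prefix):len(prefix) + barcode_len]
    return prefix, barcode, spacer


if __name__ == "__main__":
    for key in ("TCAGCTAAGGTAACGAT", "TCAGAAGAGGATTCGAT"):
        print(split_key(key))
```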