Hello mothur forum,
I have one group whose read count differs based on what other groups it’s extracted with:
I run sff.info then trim.seqs (maxambig=0, maxhomop=8, flip=T, bdiffs=1, pdiffs=2, qwindowaverage=35, qwindowsize=50, minlength=200, maxlength=500) on all files in an sff. The number of reads for the group in question, which is E04, is 19629.
I then run trim.seqs with the same parameters as above, but this time only a subset of the groups is in the oligos file. After running, the number of reads for E04 is 36311.
I ran trim.seqs, same parameters, but only included E04 in the oligos file and there were 36311 reads.
I ran trim.seqs, same parameters, this time with E04 and the other groups NOT included in the original subset and 19629 reads are reported for E04.
They all have unique barcodes. Has anyone seen anything like this before or have an idea of what’s going on?
This is happening with multiple samples in multiple sff files. Some groups have the exact same number of reads each time, some are off by a few, some by a few hundred and one that almost doubles in read number.
Is there a barcode in the set you’re removing that has a similar sequence to the E04 one?
I’m using the HMP barcoded primers for the V35 region. I was able to figure out the two groups in this sff that are conflicting:
barcode TCACCTC E27
barcode TCACAC E04
When I do trim.seqs with just these two groups in the oligos file, E04 has the higher number of reads. What’s interesting is that E27 has about the same number of reads as it does when I do trim.seqs with an oligos file that has everything from the run in it.
I’m trying to understand why this is happening. I use the same barcodes for every run and I’m seeing this issue with each run. It was easier to pinpoint which groups were conflicting in this sff file because E04 was the only group that had a large increase after subgrouping. My other three sff files have two or more groups that increase with subgrouping. Is there a way I can bin the sequences by group without removing the barcode sequence so I can figure out the real number of reads for the weird groups?
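One rough way to get such counts outside mothur is to tally reads by exact barcode prefix straight from the fasta file. The sketch below is only an illustration, not a mothur feature. The function names `read_fasta` and `count_by_exact_barcode` are made up for this example, and it assumes each read in the fasta starts with its barcode. Because it only accepts exact matches, it will undercount groups whose reads have sequencing errors in the barcode, so treat the numbers as a lower bound.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of fasta lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_by_exact_barcode(records, barcodes):
    """Count reads per group, accepting only an exact barcode at the read start."""
    counts = Counter()
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    ordered = sorted(barcodes.items(), key=lambda item: -len(item[0]))
    for _, seq in records:
        seq = seq.upper()
        for barcode, group in ordered:
            if seq.startswith(barcode):
                counts[group] += 1
                break
        else:
            counts["unmatched"] += 1
    return counts


# Tiny made-up example; in practice you would pass an open fasta file instead.
demo_fasta = [
    ">r1", "TCACACCCGTCAATTC",
    ">r2", "TCACCTCCCGTCAATTC",
    ">r3", "TCACACCC", "GTCAATTC",
    ">r4", "GGGG",
]
barcodes = {"TCACAC": "E04", "TCACCTC": "E27"}
counts = count_by_exact_barcode(read_fasta(demo_fasta), barcodes)
print(dict(counts))
```

For a real file, something like `count_by_exact_barcode(read_fasta(open("myreads.fasta")), barcodes)` would do the same thing, where `myreads.fasta` is a placeholder name.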
What’s the primer sequence?
Forward sequencing primer is CCGTCAATTCMTTTRAGT.
Here’s what happens when barcode is present (no differences)…
When barcode is not present (one difference)…
Sequence TCAC-CTC [the terminal TC is then aligned to the primer]
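To make the collision concrete, here is a small Python sketch that counts the differences between each barcode and the start of a read from each sample. This is not mothur's own matching code. It is a plain edit-distance check written for illustration, and the helper `best_prefix_diffs` and the choice to try read prefixes within one base of the barcode length are assumptions. The reads are built from the barcodes plus the first ten (unambiguous) bases of the forward primer.

```python
def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # base in a with no partner in b
                           cur[j - 1] + 1,             # base in b with no partner in a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def best_prefix_diffs(barcode, read, slack=1):
    """Fewest differences between the barcode and any read prefix whose
    length is within `slack` bases of the barcode length."""
    lengths = range(max(0, len(barcode) - slack), len(barcode) + slack + 1)
    return min(edit_distance(barcode, read[:n]) for n in lengths)


E27 = "TCACCTC"
E04 = "TCACAC"
PRIMER_START = "CCGTCAATTC"  # first ten bases of CCGTCAATTCMTTTRAGT

e27_read = E27 + PRIMER_START
e04_read = E04 + PRIMER_START

print("E04 barcode vs E27 read:", best_prefix_diffs(E04, e27_read))  # 1
print("E27 barcode vs E27 read:", best_prefix_diffs(E27, e27_read))  # 0
print("E27 barcode vs E04 read:", best_prefix_diffs(E27, e04_read))  # 2
```

Under these assumptions, the E04 barcode fits an E27 read with a single difference, which bdiffs=1 allows, so E27 reads can land in E04 when the E27 barcode is missing from the oligos file. The E27 barcode needs two differences to fit an E04 read, which fits with E27's count staying about the same.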
So the moral of the story is that if a barcode is used on a run, it needs to be in the oligos file. If you aren’t interested in the sample, you could give it an “NA” group name in the oligos file and then remove those sequences after running trim.flows by removing the corresponding file name from the flow.files file.
Hope this helps,