Number of reads after trim.seqs inconsistent

bpyoumans · September 7, 2012, 10:02pm

Hello mothur forum,

I have one group whose read count differs based on what other groups it’s extracted with:

I run sff.info then trim.seqs (maxambig=0, maxhomop=8, flip=T, bdiffs=1, pdiffs=2, qwindowaverage=35, qwindowsize=50, minlength=200, maxlength=500) on all files in an sff. The number of reads for the group in question, which is E04, is 19629.

I then run trim.seqs with the same parameters as above, but this time with only a subset of the groups in the oligos file. After running, the number of reads for E04 is 36311.

I ran trim.seqs, same parameters, but only included E04 in the oligos file and there were 36311 reads.

I ran trim.seqs, same parameters, this time with E04 and the other groups NOT included in the original subset and 19629 reads are reported for E04.

They all have unique barcodes. Has anyone seen anything like this before or have an idea of what’s going on?

Bonnie

bpyoumans · September 7, 2012, 10:24pm

This is happening with multiple samples in multiple sff files. Some groups have the exact same number of reads each time, some are off by a few, some by a few hundred and one that almost doubles in read number.

pschloss · September 10, 2012, 6:52pm

Is there a barcode in the set you’re removing that has a similar sequence to the E04 one?

bpyoumans · September 10, 2012, 8:11pm

Hey Pat,

I’m using the HMP barcoded primers for the V35 region. I was able to figure out the two groups in this sff that are conflicting:
barcode TCACCTC E27
barcode TCACAC E04

When I do trim.seqs with just these two groups in the oligos file, E04 has the higher number of reads. What’s interesting is that E27 has about the same number of reads as it does when I do trim.seqs with an oligos file that has everything from the run in it.

I’m trying to understand why this is happening. I use the same barcodes for every run and I’m seeing this issue with each run. It was easier to pinpoint which groups were conflicting in this sff file because E04 was the only group that had a large increase after subgrouping. My other three sff files have two or more groups that increase with subgrouping. Is there a way I can bin the sequences by group without removing the barcode sequence so I can figure out the real number of reads for the weird groups?
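One way to get a rough sanity check on the "real" counts outside of mothur is to count reads whose first bases match a barcode exactly, leaving the reads themselves untouched. The sketch below is only an illustration and not a mothur feature: the function name `count_exact_barcodes` and the toy reads are made up, and it assumes each read string starts directly with its barcode (any sequencing key already removed).

```python
from collections import Counter

def count_exact_barcodes(reads, oligos):
    """Count reads per group by exact barcode prefix; reads are not modified.

    reads  : iterable of sequence strings (assumed to start with the barcode)
    oligos : dict mapping barcode sequence -> group name
    """
    counts = Counter()
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    ordered = sorted(oligos.items(), key=lambda kv: len(kv[0]), reverse=True)
    for seq in reads:
        for barcode, group in ordered:
            if seq.startswith(barcode):
                counts[group] += 1
                break
        else:
            counts["unmatched"] += 1
    return counts

# The two conflicting barcodes from this thread; the reads are hypothetical.
oligos = {"TCACCTC": "E27", "TCACAC": "E04"}
toy_reads = [
    "TCACCTCCCGTCAATTC",  # exact E27 barcode
    "TCACCTCCCGTCAATTA",  # exact E27 barcode
    "TCACACCCGTCAATTC",   # exact E04 barcode
    "GGGGGGGCCGTCAATTC",  # no known barcode
]
counts = count_exact_barcodes(toy_reads, oligos)
```

Because only exact matches are counted, this will undercount reads with genuine sequencing errors in the barcode, so the numbers are a lower bound rather than a replacement for trim.seqs.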

pschloss · September 11, 2012, 3:57pm

What’s the primer sequence?

bpyoumans · September 11, 2012, 4:18pm

Forward sequencing primer is CCGTCAATTCMTTTRAGT.

pschloss · September 12, 2012, 1:52pm

Here’s what happens when barcode is present (no differences)…

Barcode  TCACCTC
Sequence TCACCTC

When barcode is not present (one difference)…

Barcode  TCACAC
Sequence TCAC-CTC [the terminal TC is then aligned to the primer]

So the moral of the story is that if a barcode is used on a run, it needs to be in the oligos file. If you aren’t interested in the sample then in the oligos file you could give it a “NA” group name and then remove those sequences after running trim.flows by removing the file name from the flow.files file.

Hope this helps,
Pat
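The alignment above can be reproduced with a small edit-distance sketch. This is a simplified illustration, not mothur's actual implementation: the function names, the tie handling (first entry wins), and the example read are assumptions. It aligns the whole barcode against the start of the read and reports the fewest differences, then assigns the read to the closest barcode within bdiffs.

```python
def min_prefix_distance(barcode, read):
    """Fewest edits (substitutions or gaps) needed to align the full barcode
    against some prefix of the read, anchored at the first base."""
    window = read[: 2 * len(barcode)]       # later bases cannot help
    prev = list(range(len(window) + 1))     # empty barcode vs. read prefix
    for i, b in enumerate(barcode, start=1):
        curr = [i]                          # barcode prefix vs. empty read
        for j, r in enumerate(window, start=1):
            cost = 0 if b == r else 1
            curr.append(min(prev[j] + 1,          # gap in the read
                            curr[j - 1] + 1,      # gap in the barcode
                            prev[j - 1] + cost))  # match or mismatch
        prev = curr
    return min(prev)                        # best over all read prefixes

def assign_group(read, oligos, bdiffs=1):
    """Return the group of the closest barcode within bdiffs, else None."""
    best_group, best_dist = None, bdiffs + 1
    for barcode, group in oligos.items():
        d = min_prefix_distance(barcode, read)
        if d < best_dist:
            best_group, best_dist = group, d
    return best_group

# Hypothetical E27 read: barcode TCACCTC followed by the start of the primer.
read = "TCACCTC" + "CCGTCAATTC"

full_oligos = {"TCACCTC": "E27", "TCACAC": "E04"}
subset_oligos = {"TCACAC": "E04"}

with_both = assign_group(read, full_oligos)      # E27 wins with 0 differences
e04_only = assign_group(read, subset_oligos)     # E04 accepted with 1 difference
```

With both barcodes listed, the read goes to E27. With E27 left out, the same read is within bdiffs=1 of the E04 barcode and gets counted as E04, which is consistent with E04 growing when the oligos file is reduced.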
