trim.flows on FLX+ or Junior 3.0 data

I’m trying to trim 100,000 reads with trim.flows, but I don’t get any trimmed output: everything goes to scrap. I created the oligos file by hand and I’m fairly sure the primers and barcodes are right. Grepping for them at the beginning of the reads checks out, as does visual inspection in Vim and less. I searched by copying and pasting from the oligos file, so typos are not a factor. Is the older Titanium lookup file OK to use with FLX+ / Junior 3.0 data?

-rw-rw-r-- 1 genome genome 4 May 16 13:07 IJM7N3301.trim.flow
-rw-rw-r-- 1 genome genome 835M May 16 13:08 IJM7N3301.scrap.flow
-rw-rw-r-- 1 genome genome 0 May 16 13:08 IJM7N3301.flow.files

My oligo file #1 (w/ primer)

forward TCAATTCNTTT
barcode AGACCTCCCG long-13.8k-
barcode TCGCGGCCCG long-12.9k-
barcode TGAAGCCCGT long-9.56k-
barcode AGACACCCGT long-9.37k-
barcode ATACCACCCG long-8.03k-
barcode TCACACCCGT long-7.08k-
barcode TTCTCAACCC long-6.96k-
barcode ACGCGCCCGT long-6.92k-
barcode ACTCACCCGT long-6.82k-
barcode TGGTGAACCC long-6.73k-
barcode AACCTGGCCC long-6.61k-
barcode TCTCCGTCCC long-6.57k-

My oligo file #2 (w/o primer)
$ cat IJM7N3301.oligos

barcode AGACCTCCCG long-13.8k-
barcode TCGCGGCCCG long-12.9k-
barcode TGAAGCCCGT long-9.56k-
barcode AGACACCCGT long-9.37k-
barcode ATACCACCCG long-8.03k-
barcode TCACACCCGT long-7.08k-
barcode TTCTCAACCC long-6.96k-
barcode ACGCGCCCGT long-6.92k-
barcode ACTCACCCGT long-6.82k-
barcode TGGTGAACCC long-6.73k-
barcode AACCTGGCCC long-6.61k-
barcode TCTCCGTCCC long-6.57k-

cat IJM7N3301.trim.flow

450

cat mothur.1400260070.logfile

mothur > trim.flows(flow=IJM7N3301.flow, oligos=IJM7N3301.oligos, pdiffs=1, bdiffs=0, processors=8)

Using 8 processors.
10000

Using 8 processors.
10000

Using 8 processors.
10000

Using 8 processors.
10000

Using 8 processors.
10000

Using 8 processors.
10000

Using 8 processors.
10000

Using 8 processors.
10000
13060
13060
13059
13060
13059
13060
13060
13060

Appending files from process 6802
Appending files from process 6803
Appending files from process 6804
Appending files from process 6805
Appending files from process 6806
Appending files from process 6807
Appending files from process 6808

Output File Names:
IJM7N3301.trim.flow
IJM7N3301.scrap.flow
IJM7N3301.flow.files


mothur > quit()

In trim.flows you need to set flow=B. You might want to check this out too…

I’ll assume flow=B was a typo, because flow is a true/false flag. order=B seems to work, though. Thanks, Pat!
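For anyone wondering why the order setting matters so much: a flowgram only stores signal intensities, and the bases are recovered by cycling through the nucleotide flow order. The sketch below is a deliberate simplification (plain rounding, not mothur's actual code), and the second order string is a made-up stand-in, not mothur's real order B. It only shows that the same intensities decode to different bases under a different order, which is why barcodes and primers stop matching.

```python
def flows_to_bases(flows, order):
    """Translate flow intensities to bases by cycling through `order`.

    Simplified illustration: each intensity is rounded to the nearest
    integer and read as a homopolymer length. Real denoisers are more
    careful than this.
    """
    bases = []
    for i, intensity in enumerate(flows):
        count = int(intensity + 0.5)  # round to nearest homopolymer length
        bases.append(order[i % len(order)] * count)
    return "".join(bases)


# Hypothetical intensities, invented for this example.
flows = [1.02, 0.03, 0.98, 1.05, 0.01, 2.04, 0.00, 0.97]

print(flows_to_bases(flows, "TACG"))  # TCGAAG
# "TCAG" is a HYPOTHETICAL alternative order, used only for contrast.
print(flows_to_bases(flows, "TCAG"))  # TAGCCG
```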


../../mothur/mothur "#sffinfo(sff=IJM7N3301.sff, flow=T)"
../../mothur/mothur "#summary.seqs(fasta=IJM7N3301.fasta)"
../../mothur/mothur "#trim.flows(flow=IJM7N3301.flow, oligos=IJM7N3301.oligos, pdiffs=2, bdiffs=1, processors=8)"

I’m going over the video now, but all my sequences are going to scrap. My commands are below:

../../mothur/mothur "#sffinfo(sff=IJM7N3301.sff, flow=T)"
../../mothur/mothur "#summary.seqs(fasta=IJM7N3301.fasta)"
../../mothur/mothur "#trim.flows(flow=IJM7N3301.flow, oligos=IJM7N3301.oligos, pdiffs=2, bdiffs=1, processors=8, order=B, minflows=1050, maxflows=1050)"
../../mothur/mothur "#shhh.flows(file=IJM7N3301.flow.files, processors=8, lookup=../../mothur/LookUp_Titanium.pat)"
../../mothur/mothur "#trim.seqs(fasta=IJM7N3301.shhh.fasta, name=IJM7N3301.shhh.names, oligos=IJM7N3301.oligos, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, flip=T, processors=8)"
../mothur/mothur "#summary.seqs(fasta=IJM7N3301.shhh.trim.fasta, name=IJM7N3301.shhh.trim.names)"
../mothur/mothur "#summary.seqs()"

Probably not productive, but I’ve tried this as well:

../../mothur/mothur "#trim.seqs(fasta=IJM7N3301.shhh.fasta, name=IJM7N3301.shhh.names, oligos=IJM7N3301.oligos, pdiffs=2, bdiffs=1, maxhomop=8, minlength=1050, flip=T, processors=8)"

File size rundown:

=====================================================================
OUTPUT FROM COMMAND: sffinfo(sff=IJM7N3301.sff, flow=T)

95M IJM7N3301.fasta
280M IJM7N3301.qual
834M IJM7N3301.flow

OUTPUT FROM COMMAND: summary.seqs(fasta=IJM7N3301.fasta)

3.1M IJM7N3301.summary

OUTPUT FROM COMMAND: trim.flows(flow=IJM7N3301.flow, oligos=IJM7N3301.oligos, pdiffs=2, bdiffs=1, processors=8, order=B, minflows=1050, maxflows=1050)

522M IJM7N3301.trim.flow
6.6M IJM7N3301.scrap.flow
72M IJM7N3301.long-13.8k-.flow
67M IJM7N3301.long-12.9k-.flow
50M IJM7N3301.long-9.56k-.flow
49M IJM7N3301.long-9.37k-.flow
42M IJM7N3301.long-8.03k-.flow
37M IJM7N3301.long-7.08k-.flow
36M IJM7N3301.long-6.96k-.flow
36M IJM7N3301.long-6.92k-.flow
35M IJM7N3301.long-6.82k-.flow
35M IJM7N3301.long-6.73k-.flow
35M IJM7N3301.long-6.61k-.flow
34M IJM7N3301.long-6.57k-.flow
4.0K IJM7N3301.flow.files

OUTPUT FROM COMMAND: shhh.flows(file=IJM7N3301.flow.files, processors=8, lookup=../../mothur/LookUp_Titanium.pat)

8.5M IJM7N3301.long-6.82k-.shhh.qual
2.7M IJM7N3301.long-6.82k-.shhh.fasta
172K IJM7N3301.long-6.82k-.shhh.names
6.7M IJM7N3301.long-6.82k-.shhh.counts
184K IJM7N3301.long-6.82k-.shhh.groups
2.0M IJM7N3301.long-13.8k-.shhh.qual
616K IJM7N3301.long-13.8k-.shhh.fasta
224K IJM7N3301.long-13.8k-.shhh.names
8.7M IJM7N3301.long-13.8k-.shhh.counts
376K IJM7N3301.long-13.8k-.shhh.groups
13M IJM7N3301.shhh.fasta
du: cannot access ‘IJM7N3301.shhh.namesi’: No such file or directory

OUTPUT FROM COMMAND: trim.seqs(fasta=IJM7N3301.shhh.fasta, name=IJM7N3301.shhh.names, oligos=IJM7N3301.oligos, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, flip=T, processors=8)

0 IJM7N3301.shhh.trim.names
1.8M IJM7N3301.shhh.scrap.names
0 IJM7N3301.shhh.groups

Can you run this…

cut -f 1 -d " " IJM7N3301.scrap.flow | cut -f 2 -d "|" | sort | uniq -c

And post the results?

cut -f 1 -d " " IJM7N3301.scrap.flow | cut -f 2 -d "|" | sort | uniq -c

1 1670
1 b
530 bf
174 f
25 h
80 l
11 lbf
2 lf

I recently got feedback on the correct primer sets (that I previously reverse engineered) so I reprocessed and got this:

cut -f 1 -d " " IJM7N3301.scrap.flow | cut -f 2 -d "|" | sort | uniq -c

1 1670
1 b
467 bf
2606 f
25 h
82 l
7 lbf
4 lf

Everything is still going to scrap:

=====================================================================
OUTPUT FROM COMMAND trim.seqs(fasta=IJM7N3301.shhh.fasta, name=IJM7N3301.shhh.names, oligos=schloss.oligos, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, flip=T, processors=8)

0 IJM7N3301.shhh.trim.fasta
13M IJM7N3301.shhh.scrap.fasta
0 IJM7N3301.shhh.trim.names
1.8M IJM7N3301.shhh.scrap.names
0 IJM7N3301.shhh.groups

New primer/mid file:

forward CCGTCAATTCMTTTRAGT
barcode TTCTCAAC mock1
barcode TCACAC mock2
barcode AACCTGGC mock3
barcode AGACAC soil1
barcode ACGCGC soil2
barcode ACTCAC soil3
barcode TCTCCGTC human1
barcode TGAAGC human2
barcode TGGTGAAC human3
barcode AGACCTC mouse1
barcode ATACCAC mouse2
barcode TCGCGGC mouse3
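A plain grep cannot match a primer that contains degenerate bases such as M or R, so here is a small sketch for sanity-checking reads against an oligos file like the one above. It expands IUPAC codes and counts substitutions only; mothur's own pdiffs/bdiffs matching may behave differently (for example around gaps), so treat this as a rough check and not a reimplementation. The test reads are made up.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(oligo, read_prefix):
    """Count positions where the read base is not allowed by the oligo code."""
    return sum(base not in IUPAC[code] for code, base in zip(oligo, read_prefix))


def starts_with(read, oligo, max_diffs=0):
    """True if `read` begins with `oligo`, allowing up to `max_diffs` substitutions."""
    if len(read) < len(oligo):
        return False
    return mismatches(oligo, read[:len(oligo)]) <= max_diffs


barcode = "TTCTCAAC"            # mock1 in the oligos file above
primer = "CCGTCAATTCMTTTRAGT"   # forward primer in the oligos file above

# Hypothetical read: barcode, then primer with M=C and R=G, then some insert.
read = barcode + "CCGTCAATTCCTTTGAGT" + "ACGT"

print(starts_with(read, barcode))                 # True
print(starts_with(read[len(barcode):], primer))   # True
```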

Ah, sorry: when you run shhh.flows you also need order=B.
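Since the order has to agree between the two steps, one way to avoid forgetting it is to build both command strings from a single variable. This is just a sketch of that idea: the helper names are made up, it only produces the command text (it does not run mothur), and the lookup value is a placeholder for wherever your lookup file lives. The parameters are the ones already used in this thread.

```python
def mothur_command(name, **params):
    """Format a mothur command string such as 'trim.flows(flow=x, order=B)'."""
    args = ", ".join(f"{key}={value}" for key, value in params.items())
    return f"{name}({args})"


def denoise_commands(prefix, order, lookup, processors=8):
    """Build trim.flows and shhh.flows commands that share one `order` value."""
    trim = mothur_command(
        "trim.flows",
        flow=f"{prefix}.flow",
        oligos=f"{prefix}.oligos",
        pdiffs=2,
        bdiffs=1,
        processors=processors,
        order=order,
    )
    shhh = mothur_command(
        "shhh.flows",
        file=f"{prefix}.flow.files",
        lookup=lookup,
        processors=processors,
        order=order,
    )
    return [trim, shhh]


# "LookUp_Titanium.pat" is a placeholder path; adjust to your own setup.
for command in denoise_commands("IJM7N3301", order="B", lookup="LookUp_Titanium.pat"):
    print(command)
```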

I think I’m good. Thanks! By the way, the doc page for shhh.flows doesn’t list order= as an option, at least not in the example commands. I also think there is a typo on that page, because the example in the order section calls trim.flows and not shhh.flows:

trim.flows(fasta=GQY1XT001.flow, order=A)

http://www.mothur.org/wiki/Shhh.flows#order


My results below:

=====================================================================
OUTPUT FROM COMMAND trim.seqs(fasta=IJM7N3301.shhh.fasta, name=IJM7N3301.shhh.names, oligos=schloss.oligos, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, flip=T, processors=8)

12M IJM7N3301.shhh.trim.fasta
8.0K IJM7N3301.shhh.scrap.fasta
1.8M IJM7N3301.shhh.trim.names
4.0K IJM7N3301.shhh.scrap.names
2.1M IJM7N3301.shhh.groups

cut -f 1 -d " " IJM7N3301.scrap.flow | cut -f 2 -d "|" | sort | uniq -c

1 1670
1 b
467 bf
2606 f
25 h
82 l
7 lbf
4 lf

mothur > summary.seqs(fasta=IJM7N3301.shhh.trim.fasta, name=IJM7N3301.shhh.trim.names)

Using 1 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 450 450 0 4 1
2.5%-tile: 1 525 525 0 4 2532
25%-tile: 1 558 558 0 5 25319
Median: 1 567 567 0 5 50638
75%-tile: 1 579 579 0 5 75957
97.5%-tile: 1 597 597 0 6 98744
Maximum: 1 631 631 0 8 101275
Mean: 1 566.1 566.1 0 5.00191

# of unique seqs: 21131

total # of seqs: 101275

Output File Names:
IJM7N3301.shhh.trim.summary