I get a segfault from shhh.flows, screenshot at end of this post. Version v.1.27.0, 64-bit Windows exe. Even1.sff is an HMP Even community extracted from SRR053818 using the sff-dump utility.
Preprocessing was:
sffinfo(sff=Even1.sff, flow=T)
trim.flows(flow=Even1.flow, oligos=oligos.txt, pdiffs=2, bdiffs=1, processors=8, minflows=300, maxflows=300)
These worked ok as far as I can tell, no empty files etc:
-rw-r--r-- 1 bob None 21M Oct 16 07:26 Even1.fasta
-rw-r--r-- 1 bob None 152M Oct 16 07:26 Even1.flow
-rw-r--r-- 1 bob None 17 Oct 16 07:51 Even1.flow.files
-rw-r--r-- 1 bob None 60M Oct 16 07:26 Even1.qual
-rw-r--r-- 1 bob None 152M Oct 16 07:51 Even1.scrap.flow
-rw-r--r-- 1 bob None 125M Oct 16 07:21 Even1.sff
FYI, the HMP submissions don’t have barcodes, so the oligos.txt file only has primers.
mothur > shhh.flows(file=Even1.flow.files, processors=8)
Using 8 processors.
Processing Even1.trim.flow (file 1 of 1) <<<<<
Reading flowgrams…
Identifying unique flowgrams…
Calculating distances between flowgrams…
0 0 0
0 0 0
Total time: 0 0
Clustering flowgrams…
[ERROR]: Even1.trim.shhh.dist is blank. Please correct.
********************###########
Reading matrix: ||||||||||||||||||||||||||||||||||||||||||||||||||||
Segmentation fault
I tried the same command sequence under Linux, also without MPI as far as I’m aware. Same problem:
mothur > shhh.flows(file=Even1.flow.files, processors=4)
Using 4 processors.
Processing Even1.trim.flow (file 1 of 1) <<<<<
Reading flowgrams…
Segmentation fault
So I think the problem is in trim.flows - you don’t seem to have an Even1.trim.flow file. There should also be separate flow files for each of your primers. It looks like everything went into the Even1.scrap.flow file (it seems to be the same size as Even1.flow). I suspect there’s a problem with how you have your oligos.txt file set up. Can you post the contents of oligos.txt and we can go from there…
Pat
Thanks Pat – here is my oligos.txt:
reverse TACGGYTACCTTGTTAYGACTN
forward NGAGTTTGATCCTGGCTCAG
forward CTGCTGCCTCCCCGTAGG
reverse ATTACCGCGGCTGCTGN
reverse CCGTCAATTCMTTTRAGN
forward NACGCGAAGAACCTTAC
I got the sequences from the XML, you can verify them by clicking on the “RNA primer” box in the spot descriptor here:
http://www.ncbi.nlm.nih.gov/sra?term=SRR053818
I verified the primers are present in the sequences by converting the sra to fasta. Since orientation conventions vary, I tried all combinations of swapping forward-reverse and revcomping the primer sequences, but got the same result: nothing matched and everything goes into scrap.
Robert.
The problem is the “reverse” line. I’d remove that and try again. trim.flows/trim.seqs expects the sequence to end with the reverse primer sequence if it’s given. Since you’ll rarely if ever get to the end it’s best to just leave it out. I suspect if you look at the first column of the scrap.flow file you’ll see your sequence names followed by “|r” indicating they were scrapped because they were missing the reverse primers.
Pat
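One way to run that check over the whole scrap file is to tally the codes that follow the "|" in the first column. This is only a minimal Python sketch: it assumes whitespace-separated columns and that any line whose first column has no "|" (such as a header line) can be skipped. The function name and the sample lines are made up for illustration and are not part of mothur.

```python
from collections import Counter

def tally_scrap_codes(lines):
    """Count the code after the last '|' in the first column of each line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields or "|" not in fields[0]:
            continue  # skip blank lines and lines without a scrap code
        counts[fields[0].rsplit("|", 1)[1]] += 1
    return counts

# Hypothetical sample lines; read names and flow values are invented.
sample = [
    "300",
    "READ1|f 300 1.02 0.01",
    "READ2|f 300 0.98 0.03",
    "READ3|lf 300 1.10 0.00",
]
print(tally_scrap_codes(sample))
```

For a real file, pass the open file handle instead of the sample list, e.g. `with open("Even1.scrap.flow") as fh: print(tally_scrap_codes(fh))`.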
I removed the reverse lines. Same result, all goes to scrap. Most scrap lines (90%) have the |f suffix; the rest have |lf.
Sorry - I should have looked more closely. Your forward primers are really the reverse primers and vice versa. The HMP sequenced those datasets from the 3’ to 5’ end of the gene. When we use forward / reverse, it’s in reference to the sequence orientation.
Pat - sorry to trouble you again, but I’m still not able to get it to work. I tried these oligo.txt files with the reverse primers (per the XML) annotated as forward (per mothur):
forward TACGGYTACCTTGTTAYGACTN
forward ATTACCGCGGCTGCTGN
forward CCGTCAATTCMTTTRAGN
and revcomped:
forward NAGTCRTAACAAGGTARCCGTA
forward NCAGCAGCCGCGGTAAT
forward NCTYAAAKGAATTGACGG
Same result, all scrap. Would you be kind enough to post a reply with a correct oligo.txt file? I’d really appreciate it, thanks.
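For anyone repeating this, the reverse complements listed above can be reproduced with a few lines of Python. This is a sketch using the standard IUPAC ambiguity codes; `revcomp` is just an illustrative helper name, not a mothur command.

```python
# Complement table covering the IUPAC nucleotide ambiguity codes.
IUPAC_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")

def revcomp(seq):
    """Return the reverse complement of a primer, keeping ambiguity codes."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]

# The three primers annotated as reverse in the SRA XML.
for primer in ("TACGGYTACCTTGTTAYGACTN",
               "ATTACCGCGGCTGCTGN",
               "CCGTCAATTCMTTTRAGN"):
    print(primer, "->", revcomp(primer))
```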
forward ATTACCGCGGCTGCTGG v13
forward CCGTCAATTCMTTTRAGT v35
forward TACGGYTACCTTGTTAYGACTT v69
Using this should work… Are you sure they didn’t remove the primers as well?
Many thanks!! That worked. Feature suggestion: in my own stuff, I have a command “primer_normalize” that takes a FASTA file with any number of primers and searches them against a file with sequences (fasta, fastq etc). It reports various stats, e.g. orientations, %matched, number of diffs with each hit, and reports pairs of primers that could make amplicons. This allows automatic generation of files like oligos.txt with primer sequences copy/pasted from papers and other sources that may not follow the same conventions. Just a thought.
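A rough sketch of the orientation part of that idea, in Python. It is not the actual primer_normalize command and not a mothur feature. It only checks whether each primer, as given or reverse-complemented, matches the start of each read within a mismatch limit. All function names, the `max_diffs` parameter, and the two example reads are made up for illustration.

```python
# Bases each IUPAC code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")

def revcomp(seq):
    """Reverse complement, keeping ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def prefix_diffs(primer, read):
    """Mismatches between primer and the read start; None if read is too short."""
    if len(read) < len(primer):
        return None
    return sum(base not in IUPAC.get(p, "") for p, base in zip(primer, read))

def orientation_report(primers, reads, max_diffs=2):
    """Fraction of reads starting with each primer, as given and revcomped."""
    report = {}
    for name, primer in primers.items():
        candidates = {"as_given": primer.upper(), "revcomp": revcomp(primer)}
        hits = dict.fromkeys(candidates, 0)
        for read in reads:
            for label, candidate in candidates.items():
                diffs = prefix_diffs(candidate, read.upper())
                if diffs is not None and diffs <= max_diffs:
                    hits[label] += 1
        total = len(reads) or 1  # avoid dividing by zero on an empty input
        report[name] = {label: n / total for label, n in hits.items()}
    return report

# Invented reads: one starts with the primer, one with its reverse complement.
reads = ["ATTACCGCGGCTGCTGGCACGTAG", "CCAGCAGCCGCGGTAATACGGAG"]
print(orientation_report({"v13": "ATTACCGCGGCTGCTGG"}, reads))
```

A primer whose "revcomp" fraction is high and "as_given" fraction is near zero would be a candidate for flipping before it goes into oligos.txt.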