chimera.slayer: fatal error in Megablast

I’m running into a recurring problem with chimera.slayer on my data that I can’t seem to pin down - any suggestions would be appreciated.

I’m running simple batch processes to align and identify chimeras in a series of files with simulated reads:

unique.seqs(fasta=Even_By_Gene_1500_wFuse_1.fa)
align.seqs(fasta=current, reference=silva.bacteria.fasta, flip=t, processors=8)
chimera.slayer(fasta=current, reference=silva.both.align, processors=8)
unique.seqs(fasta=Even_By_Gene_1500_wFuse_2.fa)
align.seqs(fasta=current, reference=silva.bacteria.fasta, flip=t, processors=8)
chimera.slayer(fasta=current, reference=silva.both.align, processors=8)

But the process is inconsistently hanging - eating up CPU cycles but not going anywhere. The STDERR looks like this:

sh: Syntax error: EOF in backquote substitution
sh: Syntax error: EOF in backquote substitution
[formatdb] ERROR: Could not open Even_By_Gene_1500_wFuse_genus_3f.unique.73597974.template.unaligned.fasta

[megablast] FATAL ERROR: blast: Unable to open input file Even_By_Gene_1500_wFuse_genus_3f.unique.73597974.candidate.unaligned.fastaClostridium_beijerinckii_12_i17d13s15_11067100171

And the STDOUT looks like this:


Staphylococcus_epidermidis_1__45__Rhodobacter_sphaeroides_1_i23d18s16_1 yes
Methanobrevibacter_smithii_1__59__Lactobacillus_gasseri_6_i15d19s11_1 yes
Processing sequence: 1375
Processing sequence: 1300
Processing sequence: 1300
Bacillus_cereus_3__79__Staphylococcus_epidermidis_3_i20d18s12_1 yes
Processing sequence: 1300
Processing sequence: 1375
Acinetobacter_baumannii_4__56__Staphylococcus_epidermidis_5_i15d22s22_1 yes
Processing sequence: 1375
Rhodobacter_sphaeroides_2__67__Acinetobacter_baumannii_1_i23d23s15_1 yes
Deinococcus_radiodurans_3__56__Listeria_monocytogenes_6_i16d32s22_1 yes
Staphylococcus_aureus_1__64__Bacillus_cereus_13_i22d27s15_1 yes
Streptococcus_mutans_1__39__Clostridium_beijerinckii_4_i12d28s16_1 yes
Lactobacillus_gasseri_4__20__Bacteroides_vulgatus_7_i11d21s18_1 yes
Acinetobacter_baumannii_2__70__Pseudomonas_aeruginosa_2_i19d22s17_1 yes
Bacteroides_vulgatus_1__49__Listeria_monocytogenes_6_i15d23s12_1 yes
Clostridium_beijerinckii_3__70__Pseudomonas_aeruginosa_2_i17d17s14_1 yes
Lactobacillus_gasseri_5__40__Listeria_monocytogenes_6_i15d22s16_1 yes
Escherichia_coli_3__64__Staphylococcus_aureus_1_i22d17s18_1 yes
Clostridium_beijerinckii_1__33__Streptococcus_agalactiae_4_i15d23s6_1 yes
Clostridium_beijerinckii_12__73__Propionibacterium_acnes_3_i20d18s13_1 yes
Streptococcus_mutans_5__52__Lactobacillus_gasseri_2_i19d26s16_1 yes
Processing sequence: 1375
Rhodobacter_sphaeroides_1__47__Escherichia_coli_5_i18d32s12_1 yes
Helicobacter_pylori_1__77__Acinetobacter_baumannii_4_i22d23s18_1 yes
Processing sequence: 1375
Processing sequence: 1375

So it’s apparently hanging on sequence 1375 - but why? I can’t see any pattern when I look at the particular sequences at that position.
-It’s making it through the first 1k sequences with no problem, and the hang happens inconsistently, so it’s not a bad install of FormatDB
-Because it works inconsistently, and replacing all colons (’:’) doesn’t resolve the issue, it’s not the Megablast issue that’s been previously discussed (chimera.slayer: could not open file)
-Why always at sequence 1375, but not every time at that position?

I am confounded. Any suggestions would be appreciated.

The 1375 is printed by each of the processes. It represents the number of sequences that process has checked. When you use multiple processors with the chimera commands, mothur divides the sequences among the processes. It looks like 6 of the processes are finishing their sequences, since 1375 appears 6 times in your output. I suspect there could be an issue in the input file that is causing one or more of the processes to fail. Have you tried running the command with processors=1?
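
If you want to check the input file directly, one rough approach is to scan the sequence names for characters that a shell treats specially. The "EOF in backquote substitution" lines in the STDERR hint that a backquote might be reaching a shell command, but that is only a guess, not a confirmed diagnosis. The sketch below is plain Python and is not part of mothur; the function name find_suspect_headers and the character set are made up for illustration.

```python
import sys

# Characters a POSIX shell interprets specially when unquoted.
# This set is an illustrative assumption, not a list taken from mothur.
SHELL_SPECIAL = set("`$\"'\\|&;<>()*?[]{}")


def find_suspect_headers(lines):
    """Return (line_number, name, bad_chars) for each FASTA header line
    whose text contains at least one shell-special character."""
    suspects = []
    for line_number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence data, not a header
        name = line[1:].rstrip("\r\n")
        bad = sorted(set(name) & SHELL_SPECIAL)
        if bad:
            suspects.append((line_number, name, "".join(bad)))
    return suspects


# Only runs when a FASTA path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for line_number, name, bad in find_suspect_headers(handle):
            print(f"line {line_number}: {name!r} contains {bad!r}")
```

Saved under a name of your choosing and run with the path to one of the unique.seqs output files, it prints any header lines worth a closer look. An empty result would not rule out other problems in the file.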