I’m running into a recurring problem with chimera.slayer on my data that I can’t seem to pin down. Any suggestions would be appreciated.
I’m running simple batch processes to align and identify chimeras in a series of files with simulated reads:
unique.seqs(fasta=Even_By_Gene_1500_wFuse_1.fa)
align.seqs(fasta=current, reference=silva.bacteria.fasta, flip=t, processors=8)
chimera.slayer(fasta=current, reference=silva.both.align, processors=8)
unique.seqs(fasta=Even_By_Gene_1500_wFuse_2.fa)
align.seqs(fasta=current, reference=silva.bacteria.fasta, flip=t, processors=8)
chimera.slayer(fasta=current, reference=silva.both.align, processors=8)
…
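For anyone wanting to reproduce the setup, a batch file of this shape could be generated with a short script. This is only a sketch: the function name `make_batch` and the number of input files are assumptions for illustration, while the three mothur commands are copied from the batch above.

```python
def make_batch(n_files, prefix="Even_By_Gene_1500_wFuse_", processors=8):
    """Return mothur batch lines for the inputs <prefix>1.fa .. <prefix>N.fa."""
    lines = []
    for i in range(1, n_files + 1):
        # Same three commands as in the batch above, one group per input file.
        lines.append(f"unique.seqs(fasta={prefix}{i}.fa)")
        lines.append(
            "align.seqs(fasta=current, reference=silva.bacteria.fasta, "
            f"flip=t, processors={processors})"
        )
        lines.append(
            "chimera.slayer(fasta=current, reference=silva.both.align, "
            f"processors={processors})"
        )
    return lines


if __name__ == "__main__":
    # n_files=2 is a placeholder; the real series is longer.
    print("\n".join(make_batch(2)))
```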
But the process is inconsistently hanging - eating up CPU cycles but not going anywhere. The STDERR looks like this:
sh: Syntax error: EOF in backquote substitution
sh: Syntax error: EOF in backquote substitution
[formatdb] ERROR: Could not open Even_By_Gene_1500_wFuse_genus_3f.unique.73597974.template.unaligned.fasta
[megablast] FATAL ERROR: blast: Unable to open input file Even_By_Gene_1500_wFuse_genus_3f.unique.73597974.candidate.unaligned.fastaClostridium_beijerinckii_12_i17d13s15_11067100171
And the STDOUT looks like this:
…
Staphylococcus_epidermidis_1__45__Rhodobacter_sphaeroides_1_i23d18s16_1 yes
Methanobrevibacter_smithii_1__59__Lactobacillus_gasseri_6_i15d19s11_1 yes
Processing sequence: 1375
Processing sequence: 1300
Processing sequence: 1300
Bacillus_cereus_3__79__Staphylococcus_epidermidis_3_i20d18s12_1 yes
Processing sequence: 1300
Processing sequence: 1375
Acinetobacter_baumannii_4__56__Staphylococcus_epidermidis_5_i15d22s22_1 yes
Processing sequence: 1375
Rhodobacter_sphaeroides_2__67__Acinetobacter_baumannii_1_i23d23s15_1 yes
Deinococcus_radiodurans_3__56__Listeria_monocytogenes_6_i16d32s22_1 yes
Staphylococcus_aureus_1__64__Bacillus_cereus_13_i22d27s15_1 yes
Streptococcus_mutans_1__39__Clostridium_beijerinckii_4_i12d28s16_1 yes
Lactobacillus_gasseri_4__20__Bacteroides_vulgatus_7_i11d21s18_1 yes
Acinetobacter_baumannii_2__70__Pseudomonas_aeruginosa_2_i19d22s17_1 yes
Bacteroides_vulgatus_1__49__Listeria_monocytogenes_6_i15d23s12_1 yes
Clostridium_beijerinckii_3__70__Pseudomonas_aeruginosa_2_i17d17s14_1 yes
Lactobacillus_gasseri_5__40__Listeria_monocytogenes_6_i15d22s16_1 yes
Escherichia_coli_3__64__Staphylococcus_aureus_1_i22d17s18_1 yes
Clostridium_beijerinckii_1__33__Streptococcus_agalactiae_4_i15d23s6_1 yes
Clostridium_beijerinckii_12__73__Propionibacterium_acnes_3_i20d18s13_1 yes
Streptococcus_mutans_5__52__Lactobacillus_gasseri_2_i19d26s16_1 yes
Processing sequence: 1375
Rhodobacter_sphaeroides_1__47__Escherichia_coli_5_i18d32s12_1 yes
Helicobacter_pylori_1__77__Acinetobacter_baumannii_4_i22d23s18_1 yes
Processing sequence: 1375
Processing sequence: 1375
So it’s apparently hanging on sequence 1375 - but why? I can’t see any pattern when I look at the particular sequences at that position.
-It makes it through the first 1k sequences with no problem, and the hang happens inconsistently, so it’s not a bad install of FormatDB
-Because it works inconsistently, and replacing all colons (’:’) doesn’t resolve the issue, it’s not the Megablast issue that’s been previously discussed (chimera.slayer: could not open file)
-Why always at sequence 1375, but not every time at that position?
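Since the shell error mentions backquote substitution, one check that might help is scanning the sequence names for characters the shell treats specially, not just colons. This rests on an assumption: that sequence names end up inside a shell command. The STDERR above hints at this but does not confirm it. The character set and the function name `unsafe_names` are also assumptions for illustration, not something taken from mothur or BLAST.

```python
# Characters with special meaning to /bin/sh. This set is an assumption
# chosen for illustration; it is not taken from mothur or BLAST.
UNSAFE = set("`$:;|&<>()'\"\\*?")


def unsafe_names(fasta_lines):
    """Return (name, sorted offending characters) for each flagged FASTA header."""
    hits = []
    for line in fasta_lines:
        if line.startswith(">"):
            name = line[1:].rstrip("\r\n")
            bad = sorted(set(name) & UNSAFE)
            if bad:
                hits.append((name, bad))
    return hits


if __name__ == "__main__":
    # Hypothetical records: the first name follows the pattern in the logs,
    # the second is made up to trigger the check.
    sample = [
        ">Clostridium_beijerinckii_12_i17d13s15_1\n",
        "ACGT\n",
        ">made_up`name:1\n",
        "ACGT\n",
    ]
    for name, bad in unsafe_names(sample):
        print(name, bad)
```

To check a real file, pass an open file handle instead of the sample list, e.g. `unsafe_names(open("some_file.fa"))`, where the path is a placeholder.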
I am confounded. Any suggestions would be appreciated.