My name is Brittany Jones, and I am a research assistant at Southern Illinois University in Carbondale, IL. I am working with an environmental microbial data set for our Acid Mine Drainage project; however, I have hit a snag in processing the data set. Below are the commands I ran and the errors that appeared:
screen.seqs(fasta=current, group=current, maxambig=0, maxlength=305)
pcr.seqs(fasta=silva.nr_v123.align, start=8000, end=27000, keepdots=F, processors=20)
system(mv silva.nr_v123.pcr.align silva.v4.align)
screen.seqs(fasta=current, count=current, summary=current, start=2368, end=17316, maxhomop=8)
[ERROR]: Could not open dnr.trim.contigs.good.unique.align7029.num.temp
This was repeated for multiple align#.num.temp files.
[ERROR]: found 451212 sequences in your fasta file, and 3007015 sequences in your summary file, quitting.
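One way to narrow down this mismatch might be to count the sequences in each file directly, outside of mothur. The following Python sketch is only a hypothetical diagnostic, not a mothur feature: it assumes the summary file is tab-delimited with a single header line, and the function names are made up for illustration.

```python
# Hypothetical diagnostic (not part of mothur): compare the number of
# records in a FASTA/align file with the number of data rows in a
# tab-delimited summary file. ASSUMPTION: the summary has one header line.


def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_summary_rows(lines):
    """Count non-blank rows after the (assumed) single header line."""
    rows = [line for line in lines if line.strip()]
    return max(len(rows) - 1, 0)


def check_counts(fasta_path, summary_path):
    """Return (fasta_count, summary_count, counts_match) for two files."""
    with open(fasta_path) as fasta, open(summary_path) as summary:
        n_fasta = count_fasta_records(fasta)
        n_summary = count_summary_rows(summary)
    return n_fasta, n_summary, n_fasta == n_summary
```

For example, `check_counts("my.align", "my.summary")` (placeholder file names) would return both counts and whether they agree.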
filter.seqs(fasta=current, vertical=T, trump=., processors=20)
[ERROR]: Sequences are not all the same length, please correct.
This was repeated multiple times.
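Since filter.seqs expects every aligned sequence to have the same length, listing which lengths are actually present might show whether the alignment file was truncated or mixed. The following Python sketch assumes a plain FASTA alignment; `parse_fasta` and `alignment_length_counts` are hypothetical helpers, not mothur commands.

```python
from collections import Counter

# Hypothetical diagnostic (not part of mothur): tally aligned sequence
# lengths so that any sequences of unexpected length stand out.


def parse_fasta(lines):
    """Yield (name, sequence) pairs; a sequence may span several lines."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def alignment_length_counts(lines):
    """Return a Counter mapping aligned length -> number of sequences."""
    return Counter(len(seq) for _, seq in parse_fasta(lines))
```

In a consistent alignment the Counter should contain exactly one length; more than one key would point at the sequences filter.seqs is complaining about.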
[ERROR]: Could not open dnr.trim.contigs.good.unique.count_table
Unable to open dnr.trim.contigs.good.unique.count_table. Trying default /share/apps/mothur-1.37.6/dnr.trim.contigs. etc…
Unable to open /share/apps/mothur-1.37.6/dnr.trim.contigs.good.unique.count_table
[WARNING]: This command can take a name file and you did not provide one. The current name file is dnr.trim.contigs.good.names which seems to match dnr.trim.contigs.good.unique.uniqe.align
[ERROR]: Did not complete summary.seqs
The script file continues with more errors, but the problem seems to start with the commands above. When I break the data set down to 4 or 5 samples at a time, it works with no problem. However, if I run all 17 samples together, these are the errors I get. I'm using Mothur 1.37.6 and Silva v123, but our collaborator is getting the same errors with the updated version of Mothur and the newer Silva v128. I am using HPC resources, so computational power and memory are not issues. We also noticed that the unique.seqs command does not greatly reduce the size of the data set.
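To put a number on how little unique.seqs reduces the data, a minimal Python sketch could report the ratio of unique to total sequences. It assumes identical sequences are collapsed by exact string match, and `dereplication_summary` is a hypothetical name, not a mothur function.

```python
from collections import Counter

# Hypothetical check (not part of mothur): collapse identical sequences by
# exact string match and report how much the set shrinks.


def dereplication_summary(sequences):
    """Return (total, unique, unique/total) for a list of sequence strings."""
    counts = Counter(sequences)
    total = sum(counts.values())
    unique = len(counts)
    ratio = unique / total if total else 0.0
    return total, unique, ratio
```

A ratio close to 1.0 would mean almost every read is distinct, which matches the observation that unique.seqs barely shrinks the data set.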
Has anyone else come across this problem before, or does anyone have ideas on how we can get our data to process as a whole set? (We are using the V4 region of the 16S rRNA gene.)