How to process huge pyrosequencing data using mothur?

I was processing my 454 pyrosequencing data following the steps described in
Analysis examples - Pyrosequencing 16S rRNA sequence analysis - Sogin data analysis, but after running this command
mothur > dist.seqs(fasta=sogin.unique.filter.fasta, cutoff=0.10)
which is meant to calculate the column-formatted distance matrix, my system kept working on the command continuously for 4 days and finally crashed.
Can you help me with this problem? I am still new to this.
My data consist of 35 soil samples that were sequenced by 454 pyrosequencing; on average, about 6000-7000 reads were obtained per sample.

Thanks,
Dharmesh

Can you double check that you were giving dist.seqs your aligned sequences and not the unaligned sequences? By the way, I’d gently suggest that you follow our work up of the Costello dataset instead.
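
One quick way to check whether a fasta file is aligned is to look at the sequence lengths: in an alignment every sequence has the same length and gap characters ('-' or '.') are present. The sketch below is only a heuristic under that assumption, and the function names `read_fasta` and `looks_aligned` are made up for illustration; they are not part of mothur.

```python
from collections import Counter


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the identifier (text up to the first whitespace).
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def looks_aligned(path):
    """Return (is_aligned, length_counts) for the FASTA file at `path`.

    Heuristic, not a mothur rule: treat the file as aligned when all
    sequences share one length and at least one contains '-' or '.'.
    """
    lengths = Counter()
    has_gaps = False
    for _, seq in read_fasta(path):
        lengths[len(seq)] += 1
        if "-" in seq or "." in seq:
            has_gaps = True
    return len(lengths) == 1 and has_gaps, lengths
```

For the file in this thread, `looks_aligned("sogin.unique.filter.fasta")` should report a single length if the sequences went through the alignment and filtering steps. Many different lengths would suggest that the unaligned file was passed to dist.seqs.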

Hi,
I am processing a data set generated by 454 pyrosequencing, following the Costello analysis example.
The problem is that after the screen.seqs command, all the output files are generated normally (names, good names, bad names, good align, bad align and so on), but
after the command completes, mothur reports:
“Your namefile does not include the sequence AY281352 please correct.”
for around 680 sequences.

Next, I randomly checked some of those sequence names in the names file generated by the command, and they were present in the file.
I couldn’t understand the problem, so I redid the analysis from the beginning with newly made group files, but the problem still persists.

Regards,
Dharmesh

Please post the exact commands you are using as well as the version number of mothur you are using…

The exact commands I used are as follows:
cat *.group > china.groups
cat *.fasta > china.fasta

summary.seqs(fasta=china.fasta)

(The trim.seqs command was not used because the fasta files had already been trimmed.)

unique.seqs(fasta=china.fasta)
summary.seqs(fasta=china.unique.fasta)
align.seqs(candidate=china.unique.fasta, template=Eztaxon_bacteria_aligned.fasta, processors=6)
summary.seqs(fasta=china.unique.align)
screen.seqs(fasta=china.unique.align, group=china.groups, name=china.names, start=314, end=770, maxhomop=7, minlength=300, alignreport=china.unique.align.report)

mothur v.1.8
Last updated: 2/02/2010
This is the mothur version I used.

Regards,
Dharmesh

A couple of things… I’m not sure how you generated the groups files, but that could be a problem. I would really encourage you to start from the sff file(s) and run trim.seqs to automatically generate the groups file for you and to use the quality score trimming. Alternatively, you could use the make.groups command to make the group files for you. I suspect this is the problem.
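
To see whether hand-made groups files are out of step with the other files, the sequence names can be compared directly. The sketch below assumes the usual mothur layouts: a names file with a representative name followed by a comma-separated list of the names it stands for, and a groups file with a sequence name followed by its group. The helper names are hypothetical, and this only lists mismatches; it does not reproduce whatever check mothur 1.8 runs internally.

```python
def read_names(path):
    """Return (representatives, all_members) from a mothur names file."""
    reps, members = set(), set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            reps.add(fields[0])
            members.update(fields[1].split(","))
    return reps, members


def read_groups(path):
    """Return the set of sequence names listed in a mothur groups file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                names.add(fields[0])
    return names


def fasta_names(path):
    """Return the set of sequence identifiers in a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].split()
                if header:
                    names.add(header[0])
    return names


def cross_check(fasta_path, names_path, groups_path):
    """List sequence names that appear in one file but not in another."""
    reps, members = read_names(names_path)
    grouped = read_groups(groups_path)
    in_fasta = fasta_names(fasta_path)
    return {
        "fasta_not_in_names": sorted(in_fasta - reps),
        "groups_not_in_names": sorted(grouped - members),
        "names_not_in_groups": sorted(members - grouped),
    }
```

With the files from this thread, the call would be `cross_check("china.unique.align", "china.names", "china.groups")`. Any non-empty list points to names that differ between the files, for example because of stray whitespace or a different naming scheme in the groups files.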

Also, you are using version 1.8, which was from last February. We are up to version 1.14. There may have been a few things implemented/fixed since then that would also solve this problem.

Pat