How to process huge pyrosequencing data using mothur?

Dharmesh · April 19, 2010, 10:51am

I was processing my 454 pyrosequencing data following the steps given in
Analysis examples - Pyrosequencing 16S rRNA sequence analysis - Sogin data analysis, but after I ran this command
mothur > dist.seqs(fasta=sogin.unique.filter.fasta, cutoff=0.10)
which calculates the column-formatted distance matrix, my system kept working on it continuously for 4 days and finally crashed.
Can you help me with this problem? I am still new at this.
My data consist of 35 soil samples processed by 454 pyrosequencing, with an average of around 6000-7000 reads per sample.

Thanks,
Dharmesh
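
For a sense of scale, here is a rough Python sketch of how many sequence pairs a pairwise distance calculation would have to consider. The number of unique sequences left after unique.seqs is not given in this thread, so the totals below assume every read is unique and are only an upper bound; `pairwise_count` is a helper name made up for this illustration.

```python
def pairwise_count(n_sequences: int) -> int:
    """Number of distinct pairs among n_sequences sequences: n * (n - 1) / 2."""
    return n_sequences * (n_sequences - 1) // 2


# Figures from the post: 35 samples at roughly 6000-7000 reads each.
samples = 35
for reads_per_sample in (6000, 7000):
    total_reads = samples * reads_per_sample
    # Upper bound only: it assumes no two reads are identical.
    print(f"{total_reads} reads -> up to {pairwise_count(total_reads)} pairs")
```

The count grows with the square of the number of sequences, so anything that reduces the number of unique sequences going into dist.seqs has a large effect on the run time.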

pschloss · April 19, 2010, 9:07pm

Can you double check that you were giving dist.seqs your aligned sequences and not the unaligned sequences? By the way, I’d gently suggest that you follow our work up of the Costello dataset instead.
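
One quick way to check this outside mothur is to confirm that every sequence in the fasta file has the same length, since aligned sequences are padded with gap characters to a common width. The following is a minimal sketch in Python, not part of mothur; the function names are made up for this example, and equal lengths are only a necessary sign of an alignment, not proof of one.

```python
import sys
from collections import Counter


def read_fasta_lengths(path):
    """Return {sequence name: sequence length} for a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the sequence name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                # Sequences may be wrapped over several lines.
                lengths[name] += len(line)
    return lengths


def looks_aligned(path):
    """Return (True/False, Counter of lengths); True if all lengths are equal."""
    counts = Counter(read_fasta_lengths(path).values())
    return len(counts) == 1, counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_aligned.py sogin.unique.filter.fasta
    ok, counts = looks_aligned(sys.argv[1])
    print("all sequences have the same length" if ok else "lengths differ")
    print(counts.most_common(5))
```

If the lengths differ, the file is most likely the unaligned fasta rather than the output of align.seqs and filter.seqs.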

Dharmesh · November 4, 2010, 6:18am

Hi,
I am processing a data set generated by 454 pyrosequencing, following the Costello analysis example.
The problem is that after the screen.seqs command, all the files are generated normally (names, good names, bad names, good align, bad align and so on), but
when the command completes mothur reports:
“Your namefile does not include the sequence AY281352 please correct.”
for around 680 sequences.

When I randomly checked some of these sequence names in the names file generated by the command, they were there in the file.
I couldn’t understand the problem, so I redid the analysis from the beginning and made new group files, but the problem still persists.

Regards,
Dharmesh

pschloss · November 4, 2010, 9:53am

Please post the exact commands you are using as well as the version number of mothur you are using…

Dharmesh · November 5, 2010, 1:10am

Exact commands used are as follows…
cat *.group > china.groups
cat *.fasta > china.fasta

summary.seqs(fasta=china.fasta)

The trim.seqs command was not used because the fasta files had already been trimmed.
unique.seqs(fasta=china.fasta)
summary.seqs(fasta=china.unique.fasta)
align.seqs(candidate=china.unique.fasta, template=Eztaxon_bacteria_aligned.fasta, processors=6)
summary.seqs(fasta=china.unique.align)
screen.seqs(fasta=china.unique.align, group=china.groups, name=china.names, start=314, end=770, maxhomop=7, minlength=300, alignreport=china.unique.align.report)

mothur v.1.8
Last updated: 2/02/2010
This is the mothur version that was used.

Regards,
Dharmesh

pschloss · November 5, 2010, 10:03am

A couple of things… I’m not sure how you generated the groups files, but that could be a problem. I would really encourage you to start from the sff file(s) and run trim.seqs to automatically generate the groups file for you and to use the quality score trimming. Alternatively, you could use the make.group command to make the group files for you. I suspect this is the problem.

Also, you are using version 1.8, which was from last February. We are up to version 1.14. There may have been a few things implemented/fixed since then that would also solve this problem.

Pat
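
To see whether the group file really is out of step with the names file, the two can be compared directly. The sketch below is plain Python, not a mothur command. It assumes the usual two-column layouts: a names file with a representative name followed by a comma-separated list of the names it stands for, and a group file with a sequence name followed by its group. The function names are made up for this example, and china.names and china.groups are the file names from the commands above.

```python
import sys
from collections import Counter


def names_in_namefile(path):
    """Collect every sequence name mentioned in a names file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            names.add(fields[0])
            names.update(n for n in fields[1].split(",") if n)
    return names


def names_in_groupfile(path):
    """Count how often each sequence name appears in a group file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                counts[fields[0]] += 1
    return counts


def compare_names_and_groups(name_path, group_path):
    """Report names found in only one file and names repeated in the group file."""
    in_names = names_in_namefile(name_path)
    group_counts = names_in_groupfile(group_path)
    in_groups = set(group_counts)
    return {
        "only_in_groups": sorted(in_groups - in_names),
        "only_in_names": sorted(in_names - in_groups),
        "duplicated_in_groups": sorted(n for n, c in group_counts.items() if c > 1),
    }


if __name__ == "__main__" and len(sys.argv) > 2:
    # Example: python check_groups.py china.names china.groups
    report = compare_names_and_groups(sys.argv[1], sys.argv[2])
    for key, values in report.items():
        print(key, len(values), values[:10])
```

Names that show up under only_in_groups or duplicated_in_groups would point to a group file that was not built from the same set of sequences as the names file.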
