groups file not created after screen.seqs

Hi there,

After running screen.seqs, as in

mothur > screen.seqs(fasta=child.trim.contigs.fasta, group=child.contigs.groups, maxambig=0, minlength=167, maxlength=189, maxhomop=5, processors=14)

The following files were created:
child.trim.contigs.bad.accnos

child.trim.contigs.good.fasta

It took a long time for the command to run but a groups file was not created. Any idea why?

Thanks!

Claire

If I had to guess, it’s because none of your sequences were good. Is the *good.fasta file empty? If that’s the case, then you probably want to rethink the maxlength, minlength, and maxhomop settings.

Pat

The good.fasta file has almost 9GB of sequences and the .accnos file is around 5GB. I just realized that besides the groups file, the summary file was not created either. I am running this command over ssh on a remote server, and after checking the server’s activity I saw that mothur is still running, almost 24 hrs after I started it. Is it possible that these files are still being written and that’s why they haven’t been created? I know there are a lot of sequences, but I’ve done this before and this particular command has never taken this long.

Thanks,

Claire

How many sequences are there? Did the command tell you it was done? You might be able to speed things up by running unique.seqs before screen.seqs.

The summary file would only be generated after you run summary.seqs.

Hi again,

Here is the summary.seqs info before and after I ran screen.seqs:

mothur > summary.seqs(fasta=child.trim.contigs.fasta)

Using 14 processors.

             Start  End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1      97       97       0       2        1
2.5%-tile:   1      155      155      0       3        3586541
25%-tile:    1      165      165      0       4        35865401
Median:      1      185      185      2       4        71730802
75%-tile:    1      189      189      9       5        107596203
97.5%-tile:  1      198      198      16      6        139875063
Maximum:     1      203      202      61      90       143461603
Mean:        1      179.066  179.066  4.6127  4.37821
# of Seqs:   143461603

Output File Names:
child.trim.contigs.summary

  1. screen.seqs(fasta=child.trim.contigs.fasta, group=child.contigs.groups, maxambig=0, minlength=167, maxlength=189, maxhomop=5)

  2. mothur > summary.seqs(fasta=child.trim.contigs.good.fasta, processors=12)


Using 12 processors.

             Start  End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1      167      167      0       3        1
2.5%-tile:   1      184      184      0       4        956137
25%-tile:    1      185      185      0       4        9561368
Median:      1      185      185      0       5        19122735
75%-tile:    1      185      185      0       5        28684102
97.5%-tile:  1      188      188      0       5        37289332
Maximum:     1      189      189      0       5        38245468
Mean:        1      185.247  185.247  0       4.72236
# of Seqs:   38245468

Output File Names:
child.trim.contigs.good.summary

When I run mothur over ssh on a remote server, the terminal screen freezes after a few hours and I can’t see the current status of that process. I just opened the log file:



Department of Microbiology & Immunology University of Michigan http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

Type ‘quit()’ to exit program
Interactive Mode


mothur > screen.seqs(fasta=child.trim.contigs.fasta, group=child.contigs.groups, maxambig=0, minlength=167, maxlength=189, maxhomop=5, processors=14) (END)

That's all the info there is in this logfile, yet the process is still running (for 2 days now). In terms of running screen.seqs with a unique fasta file, wouldn't I have to repopulate the groups file afterwards? When I come to run make.shared, for example, I would have a non-unique, repopulated list file, yet the groups file would have been created from a unique fasta file.

Thanks for your help,

Claire
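
One way to check whether the FASTA and accnos outputs are already complete, rather than still being written, is to compare record counts. The sketch below is not part of mothur; the helper names are hypothetical, and it assumes a FASTA header starts with `>` and that the accnos file holds one sequence per line. Using the counts from the two summary.seqs runs above, a finished screen.seqs run should list 143461603 - 38245468 removed sequences in the bad.accnos file.

```python
def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.startswith(b">"))


def count_nonempty_lines(path):
    """Count non-empty lines (assumed: one sequence per line in an accnos file)."""
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.strip())


def outputs_account_for_input(total, good, bad):
    """True when every input sequence ended up in either the good or the bad set."""
    return good + bad == total


# Counts reported by summary.seqs in this thread.
total_seqs = 143461603  # child.trim.contigs.fasta
good_seqs = 38245468    # child.trim.contigs.good.fasta

# Number of names the bad.accnos file should contain once screening is done.
expected_bad = total_seqs - good_seqs
print(expected_bad)

# Hypothetical usage on the server (reads the full files, so it is slow):
# bad = count_nonempty_lines("child.trim.contigs.bad.accnos")
# print(outputs_account_for_input(total_seqs, good_seqs, bad))
```

If the counts add up, the sequence screening itself has probably finished and the remaining run time is being spent elsewhere, for example on the groups file. That last part is a guess, since the log shown above says nothing further.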

In terms of running screen.seqs with a unique fasta file, wouldn’t I have to repopulate the groups file afterwards? When I come to run make.shared, for example, I would have a non-unique, repopulated list file, yet the groups file would have been created from a unique fasta file.

No - unique.seqs doesn’t touch the groups file and doesn’t get rid of any sequences. All of our SOPs use unique.seqs.
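
To illustrate why dereplication does not lose any sequences, here is a toy model of the idea. It is only a sketch, not mothur's implementation: the function, the sample names, and the group labels are all made up, and it assumes every sequence name is distinct. Identical sequences collapse to one representative, a names table remembers every original name, and the name-to-group table is never modified.

```python
def dereplicate(records):
    """Collapse identical sequences while remembering every original name.

    records: list of (name, sequence) pairs with distinct names.
    Returns (unique, names):
      unique maps a representative name to its sequence,
      names maps a representative name to all names sharing that sequence.
    """
    representative_for = {}  # sequence -> name of the first read seen with it
    unique = {}
    names = {}
    for name, seq in records:
        rep = representative_for.setdefault(seq, name)
        if rep == name:
            # First time this sequence is seen: it becomes the representative.
            unique[name] = seq
            names[name] = [name]
        else:
            # Duplicate sequence: record the name under its representative.
            names[rep].append(name)
    return unique, names


# Made-up reads and groups, for illustration only.
records = [("s1", "ACGT"), ("s2", "ACGT"), ("s3", "TTGA")]
groups = {"s1": "childA", "s2": "childB", "s3": "childA"}

unique, names = dereplicate(records)

# Fewer sequences need screening, but every original name is still tracked.
print(len(unique), sum(len(members) for members in names.values()))

# The groups table is untouched, so per-group membership can still be recovered.
print([groups[member] for member in names["s1"]])
```

In this model, only the unique sequences need to be screened, and a verdict on a representative applies to every name listed under it, which is why the groups information stays usable later on.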