Issues with setting up pipeline using NCBI SRAs

Hello, I am an undergraduate senior and am very new to mothur. For a school project I am following the mothur tutorial on the Galaxy server, using published NCBI SRA runs instead of the tutorial data. I am running into an issue where my count tables (the output of Count.seqs) are missing metadata. The NCBI source page says the layout is paired, so I am skipping the pairing step and Make.contigs (when I try these steps, the command fails). These are the steps I completed before running into the issue:

Before data-cleaning steps, the samples contained 160,000 and 190,000 sequences respectively.

1. Download and Extract Reads in FASTQ/FASTA (used two different samples from same study)

2. Create collection of the two outputs

3. FASTQ to FASTA converter on collection

4. Summary.seqs on converted collection (logfile=yes)

5. Make.groups on converted collection (automatically from collection)

6. Screen.seqs on converted collection and Make.groups output (maxlength=251, maxambig=0, groupfile=Make.groups output, logfile=yes)

7. Unique.seqs on Screen.seqs output (output format=Name file, logfile=yes)

8. Count.seqs on names output from Unique.seqs and group file from Screen.seqs

This is where the issue occurs. Count.seqs usually fails (a few times, for some reason, only one of the samples failed), but if I exclude the group file, I am missing metadata for later steps. This becomes a real problem when I get to the Cluster.split command.
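One thing that could be checked outside Galaxy is whether the names file and the group file passed to Count.seqs describe the same set of sequences. The following is only a minimal sketch in Python, not part of mothur or the tutorial. It assumes the usual two-column layouts (names file: representative name, then a comma-separated list of the reads it represents; group file: read name, then group), and the function names `read_names`, `read_groups`, and `compare` are hypothetical helpers made up for this illustration.

```python
import io


def read_names(handle):
    """Collect every sequence name listed in a names file (both columns)."""
    names = set()
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        representative, members = fields[0], fields[1]
        names.add(representative)
        names.update(members.split(","))
    return names


def read_groups(handle):
    """Map each sequence name to its group from a group file."""
    groups = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        groups[fields[0]] = fields[1]
    return groups


def compare(names, groups):
    """Return names missing from the group file and from the names file."""
    grouped = set(groups)
    missing_from_groups = sorted(names - grouped)
    missing_from_names = sorted(grouped - names)
    return missing_from_groups, missing_from_names


# Tiny in-memory example; real files would be opened with open(path).
names_text = "seq1\tseq1,seq2\nseq3\tseq3\n"
groups_text = "seq1\tA\nseq2\tA\nseq4\tB\n"

names = read_names(io.StringIO(names_text))
groups = read_groups(io.StringIO(groups_text))
missing_from_groups, missing_from_names = compare(names, groups)

print("In names file but not in group file:", missing_from_groups)
print("In group file but not in names file:", missing_from_names)
```

If both lists come back empty for the real files, a mismatch between the two inputs is probably not the cause, and the problem is more likely somewhere else in the setup.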

If anyone has any feedback, I would really appreciate it.

I am happy to help. Could you be running out of disk space or memory resources?

If not, could you tell me the version of mothur you are running? Could you post the exact commands you ran in mothur, as well as the error messages mothur is reporting?
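For the disk-space part of the question, a quick check like the one below may help. It is only a rough sketch in Python using the standard library, and it only applies if the jobs run on your own machine rather than on a remote Galaxy server. The helper name `free_space_gb` and the default path are assumptions for illustration, and it does not report memory usage.

```python
import shutil


def free_space_gb(path="."):
    """Return (free, total) disk space in gigabytes for the drive holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9, usage.total / 1e9


# Check the drive that holds the current working directory.
free, total = free_space_gb(".")
print(f"{free:.1f} GB free of {total:.1f} GB")
```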

Hello, thank you for your help!

It’s possible I’m running out of space. I’m running this off my Mac laptop (but use Windows), which has 8 GB RAM. However, I was able to run the tutorial’s data just fine (similar sequence lengths and amount of data). The tutorial does not say which version of mothur it uses, but I’m assuming a more recent one?

Here is a link to the tutorial I’m using:
16S Microbial Analysis with mothur

In Galaxy, I searched for each command name in the toolbox and then adjusted its parameters, so the commands listed above are the ones I chose, with any adjustments shown in parentheses. The error messages I received were either “TERM environmental variable not set” or “required metadata values missing”.

Hopefully this helps!

The “TERM environmental variable not set” message should not cause a crash.

If you send the input files you are using with count.seqs to mothur.bugs@gmail.com, I can take a closer look for you.
