Issues with setting up pipeline using NCBI SRAs

Squid-Inks · March 12, 2021, 2:30am

Hello, I am an undergraduate senior and very new to mothur. For a school project, I am following the mothur tutorial on the Galaxy server, using published NCBI SRA runs instead of the tutorial data. My count tables (the output of Count.seqs) are missing metadata. The NCBI source page says the layout is paired, so I am skipping the pairing step and Make.contigs (when I try those steps, the command fails). These are the steps I ran before hitting the issue:

Before any data cleaning, the two samples contained 160,000 and 190,000 sequences, respectively.

1. Download and Extract Reads in FASTQ/FASTA (two different samples from the same study)

2. Create a collection of the two outputs

3. FASTQ to FASTA converter on the collection

4. Summary.seqs on the converted collection (logfile=yes)

5. Make.groups on the converted collection (automatically from collection)

6. Screen.seqs on the converted collection and the Make.groups output (maxlength=251, maxambig=0, groupfile=Make.groups output, logfile=yes)

7. Unique.seqs on the Screen.seqs output (output format=Name file, logfile=yes)

8. Count.seqs on the names output from Unique.seqs and the group file from Screen.seqs

This is where the issue occurs. Count.seqs usually fails (for some reason, a few times only one of the two samples failed), but if I leave out the group file, I am missing the metadata needed for later steps. This becomes a real problem once I reach the Cluster.split command.
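For readers less familiar with the Make.groups and Screen.seqs steps, here is a rough Python illustration of what they do. It is not mothur's code. It assumes a group file is simply a mapping from each read name to its sample, and it treats any character other than A, C, G, T, `-` or `.` as an ambiguous base, which is an assumption about how `maxambig` is counted. The point it shows is that screening has to drop the same reads from both the FASTA and the group data, so the two stay in sync for Count.seqs. The function names are hypothetical.

```python
"""Rough sketch of the Make.groups and Screen.seqs steps in plain Python.
Hypothetical helper names; not mothur's implementation."""
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (read name, bases)

# Assumption: anything other than A/C/G/T or a gap character counts as ambiguous.
UNAMBIGUOUS = set("ACGT-.")


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (name, sequence) pairs; name is the first word after '>'."""
    records: List[Record] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def make_group(samples: Dict[str, List[Record]]) -> Dict[str, str]:
    """Map every read name to the sample (collection element) it came from."""
    return {name: sample for sample, recs in samples.items() for name, _ in recs}


def screen(records: List[Record], groups: Dict[str, str],
           maxlength: int = 251, maxambig: int = 0) -> Tuple[List[Record], Dict[str, str]]:
    """Keep reads no longer than maxlength with at most maxambig ambiguous bases,
    and drop the same reads from the group mapping so both stay in sync."""
    kept = [(n, s) for n, s in records
            if len(s) <= maxlength
            and sum(c.upper() not in UNAMBIGUOUS for c in s) <= maxambig]
    kept_names = {n for n, _ in kept}
    return kept, {n: g for n, g in groups.items() if n in kept_names}
```

Because `make_group` builds a dictionary, a read name that appears in both samples would silently end up assigned to only one of them, so unique read names across samples are worth confirming.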

If anyone has any feedback, I would really appreciate it.

westcott · March 15, 2021, 1:27pm

I am happy to help. Could you be running out of disk space or memory resources?

If not, could you tell me the version of mothur you are running? Could you post the exact commands you ran in mothur, as well as the error messages mothur is reporting?

Squid-Inks · March 18, 2021, 7:22pm

Hello, thank you for your help!

It’s possible I’m running out of space. I’m running this on my Mac laptop (though I normally use Windows), which has 8 GB of RAM. However, I was able to run the tutorial’s data without problems (similar sequence lengths and amount of data). The tutorial does not say which version of mothur it uses, but I assume it is a recent one.

Here is a link to the tutorial I’m using:
16S Microbial Analysis with mothur

In Galaxy, you search the toolbox for each command by name and then adjust its parameters, so the steps above are the tools I chose, with any parameter changes in parentheses. The error messages I received were either “TERM environmental variable not set” or “required metadata values missing”.

Hopefully this helps!

westcott · March 23, 2021, 2:31pm

The “TERM environmental variable not set” should not cause a crash.

If you send the input files you are using with count.seqs to mothur.bugs@gmail.com, I can take a closer look for you.
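Before sending files, it may be worth checking them locally. The sketch below is a hypothetical Python pre-flight check, not part of mothur or Galaxy. It assumes the usual mothur layouts: a names file with a representative read name followed by a comma-separated list of member reads on each line, and a group file with a read name followed by a sample name on each line. It reports reads listed in the names file but absent from the group file, group entries the names file does not mention, and reads assigned to a group more than once. Any of these is a plausible, unconfirmed cause of Count.seqs rejecting the group file.

```python
"""Hypothetical pre-flight check of Count.seqs inputs (names file + group file)."""
from typing import Dict, List


def parse_names(text: str) -> Dict[str, List[str]]:
    """Assumed names-file layout: '<representative>\t<member1,member2,...>' per line."""
    names: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            names[fields[0]] = fields[1].split(",")
    return names


def check_count_inputs(names_text: str, groups_text: str) -> Dict[str, List[str]]:
    """Compare the reads in a names file with the reads in a group file."""
    names = parse_names(names_text)
    listed = {m for members in names.values() for m in members}

    groups: Dict[str, str] = {}
    duplicates = set()
    for line in groups_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        read, sample = fields[0], fields[1]
        if read in groups:
            duplicates.add(read)  # same read assigned more than once
        groups[read] = sample

    group_reads = set(groups)
    return {
        "missing_from_groups": sorted(listed - group_reads),
        "not_in_names": sorted(group_reads - listed),
        "duplicate_group_entries": sorted(duplicates),
    }
```

If all three lists come back empty, the mismatch is probably elsewhere (for example, in how the Galaxy datasets' metadata was set), which is worth mentioning when reporting the problem.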

system · April 2, 2021, 2:31pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
