Inquiry on 16S rRNA Analysis Using Mothur and Data Trimming Techniques

I hope this message finds you well. I am currently conducting a meta-analysis involving various studies and am in the process of compiling all the data for 16S rRNA analysis using Mothur. I’m reaching out to seek your expertise on a couple of questions I have encountered during the process.

  1. Data Trimming with pcr.seqs:
    I am working with two types of read data: single reads and paired reads. The paired reads cover the V3-V4 regions, while the single reads cover V3 and V4 separately. After aligning this data with the V3-V4 SILVA database, I’ve observed the following:

    • Paired reads align from positions 1 to 18897 (440 bp).
    • A portion of single reads aligns from positions 1 to 7481, and most align from positions 6737 to 18897.

    Given this alignment pattern, can I use the pcr.seqs command to trim the aligned files specifically from position 6737 to 18897? I aim to standardize both the paired and single reads for further analysis.

  2. Filtering Challenges:
    I encountered an issue where applying the filter.seqs command with the trump= option resulted in the deletion of all data. However, when proceeding without this step, I am unable to obtain a phylogenetic bootstrap. I am considering using alternative tools like MEGA12 to overcome this. Do you have recommendations on managing this filtering step or insights into the implications of using a different tool to achieve accurate phylogenetic analysis?

Your guidance on these matters would be greatly appreciated. I am eager to ensure the integrity and reliability of the analysis and look forward to any advice you might have.

Hi Moh,

Thanks for your question! I gather that you’re trying to pool sequencing data from all of your studies into a single dataset. Because of the issues you’re seeing, I would strongly advocate for analyzing each dataset and type of data separately. Each dataset was generated with its own methods (e.g., DNA extraction, PCR conditions, primers). If you ask each dataset your question separately, you can pool the results statistically to get a consensus answer. We’ve done this in studies looking at obesity and colorectal cancer that you might find to be a helpful guide.
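
To make "pool the results statistically" concrete, here is a minimal Python sketch of one common approach, fixed-effect inverse-variance pooling of per-study effect sizes. This is an illustration only and not necessarily the method used in the obesity or colorectal cancer studies; the function name and the effect sizes and standard errors are hypothetical. With very different methods across studies, a random-effects model is often the more appropriate choice.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect sizes.

    effects: one effect size per study (e.g., a log odds ratio).
    std_errors: the matching standard errors.
    Returns the pooled effect and its standard error.
    """
    # Studies with smaller standard errors get more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical numbers: each study was analyzed on its own first.
study_effects = [0.2, 0.4, 0.3]
study_std_errors = [0.1, 0.2, 0.1]

pooled_effect, pooled_se = pool_fixed_effect(study_effects, study_std_errors)
# Approximate 95% confidence interval for the pooled effect.
ci_low = pooled_effect - 1.96 * pooled_se
ci_high = pooled_effect + 1.96 * pooled_se
print(f"pooled effect = {pooled_effect:.3f} ({ci_low:.3f} to {ci_high:.3f})")
```

The point of the design is that each study contributes only a summary statistic, so differences in region, read type, and alignment coordinates never have to be reconciled in a single alignment.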

As you mention, some are single-read datasets for either the V3 or the V4 region. Those reads don’t overlap, so when you pool these datasets together you’ll likely have a “.” in every column for at least one sequence, and filtering on that character removes every column from your dataset. Again, I’d analyze all of the studies separately. But if a study has V3, V4, and V3V4 data, I’d handle those all separately. Alternatively, I’d use the same single read from every dataset, even for the datasets where the reads were assembled into contigs.
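
The column-removal problem can be seen with a toy example. The Python sketch below is not mothur code; it only mimics the rule that a trump character removes any alignment column in which at least one sequence has that character. The function name and the three short sequences are hypothetical.

```python
def trump_filter(aligned, trump="."):
    """Drop every column in which any sequence has the trump character."""
    length = len(next(iter(aligned.values())))
    keep = [
        i for i in range(length)
        if all(seq[i] != trump for seq in aligned.values())
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in aligned.items()}

# Toy alignment: "." marks positions a read does not cover.
toy = {
    "v3_read":   "ACGTAC......",  # covers only the first half
    "v4_read":   "......GGTACA",  # covers only the second half
    "v3v4_read": "ACGTACGGTACA",  # covers the whole region
}

# Pooled: every column has a "." in at least one read, so nothing survives.
pooled = trump_filter(toy)

# Analyzed on its own, the full-length read keeps all of its columns.
v3v4_only = trump_filter({"v3v4_read": toy["v3v4_read"]})

print(pooled)
print(v3v4_only)
```

Because the V3-only and V4-only reads share no columns, no choice of filter settings can rescue the pooled alignment, which is why handling each region separately is the safer route.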

Holler if any of this doesn’t make sense,
Pat