Merge .fasta files for OTU counting

I’m new to the forum, so I apologise if this question has already been asked.
I performed a 16S analysis of 192 samples following the MiSeq SOP, but I processed each sample separately instead of using the make.file command.

As a result, when I run dist.seqs and cluster for OTU clustering, the OTU numbers in each .fasta file do not correspond across samples.

Is there a way to merge all my filtered and aligned .fasta files while keeping track of which sample each sequence came from, so that I can obtain an OTU table with the number of reads in each OTU for every sample?
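To illustrate why OTU numbers only line up when all samples are clustered together: an OTU table is just a count of reads per (sample, OTU) pair, which needs a single read-to-OTU assignment covering every read plus a read-to-sample mapping like the one a groups file provides. The Python sketch below assumes both mappings are already available as dictionaries (in mothur, make.shared builds the equivalent .shared table from a list file and a group file, as far as I know); the read, OTU, and sample names are made up.

```python
from collections import Counter, defaultdict


def otu_table(seq_to_otu, seq_to_sample):
    """Count reads per OTU per sample.

    seq_to_otu: read name -> OTU label, from ONE clustering of all reads pooled.
    seq_to_sample: read name -> sample name (what a groups file records).
    Returns (samples, otus, counts) where counts[sample][otu] is a read count.
    """
    counts = defaultdict(Counter)
    for read, otu in seq_to_otu.items():
        counts[seq_to_sample[read]][otu] += 1
    samples = sorted(counts)
    # Column order is plain string order of the OTU labels.
    otus = sorted({otu for c in counts.values() for otu in c})
    return samples, otus, counts


if __name__ == "__main__":
    # Hypothetical toy data: read names, OTU labels and sample names are invented.
    seq_to_otu = {"r1": "Otu1", "r2": "Otu1", "r3": "Otu2", "r4": "Otu1", "r5": "Otu3"}
    seq_to_sample = {"r1": "S1", "r2": "S2", "r3": "S1", "r4": "S2", "r5": "S2"}
    samples, otus, counts = otu_table(seq_to_otu, seq_to_sample)
    print("sample\t" + "\t".join(otus))
    for s in samples:
        # A Counter returns 0 for OTUs absent from a sample.
        print(s + "\t" + "\t".join(str(counts[s][o]) for o in otus))
```

If each sample had been clustered on its own, "Otu1" in one file and "Otu1" in another would be unrelated labels, so no such table could be built from them.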

I’ve read about merge.files, but I’m not sure whether this command creates a groups file that records which sample each sequence came from…

Any help would be greatly appreciated.


You should merge the files after running make.contigs and then take the merged file through the rest of the pipeline. Otherwise, the alignments will be out of whack after running filter.seqs on each file separately.
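The alignment problem can be seen with a toy example. The function below is a simplified stand-in for removing all-gap columns (roughly what filter.seqs does with vertical=T; treating both '-' and '.' as gaps is an assumption here, and the real command has further options such as trump). Filtering two samples separately keeps different sets of columns, so the second column of one filtered file is no longer the same alignment position as the second column of the other, whereas filtering the pooled alignment keeps one shared set of columns.

```python
def vertical_filter(alignment):
    """Keep only columns where at least one sequence has a non-gap character.

    Simplified sketch of a vertical gap filter; '-' and '.' count as gaps.
    Returns the kept column indices and the filtered sequences.
    """
    gap = set("-.")
    length = len(alignment[0])
    keep = [i for i in range(length) if any(seq[i] not in gap for seq in alignment)]
    return keep, ["".join(seq[i] for i in keep) for seq in alignment]


# Hypothetical toy alignments for two samples (same alignment coordinates).
sample_a = ["AC--GT", "AC--GA"]
sample_b = ["A-TTGT", "A-TCGT"]

keep_a, filt_a = vertical_filter(sample_a)                  # filtered separately
keep_b, filt_b = vertical_filter(sample_b)                  # filtered separately
keep_all, filt_all = vertical_filter(sample_a + sample_b)   # filtered together
print(keep_a, filt_a)
print(keep_b, filt_b)
print(keep_all, filt_all)
```

Here sample A keeps columns [0, 1, 4, 5] and sample B keeps [0, 2, 3, 4, 5], so even the filtered lengths differ (4 vs 5), while the pooled filter keeps all six columns for both.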

But processing sequences from 192 samples in a single file (including heavy steps such as pre.cluster or chimera.uchime) seems computationally expensive to me…
Is there any way to split the merged file and parallelize the pipeline, at least for some steps?

The individual steps are parallelized. Also, by processing the samples together, we take advantage of the redundancy across samples to get a further speed-up.
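The redundancy point can be made concrete by counting distinct sequences. In the hypothetical toy data below, the per-sample total counts a sequence once for every sample it appears in, while the pooled count sees it only once; a step that works on unique sequences (for example, after unique.seqs) only has to handle the pooled number. This sketch does not measure how much time that saves for any particular mothur command.

```python
def unique_counts(samples):
    """Compare per-sample vs pooled dereplication.

    samples: sample name -> list of sequences (hypothetical toy data).
    Returns (sum of per-sample unique counts, unique count after pooling).
    """
    per_sample = sum(len(set(seqs)) for seqs in samples.values())
    pooled = len({s for seqs in samples.values() for s in seqs})
    return per_sample, pooled


toy = {
    "S1": ["ACGT", "ACGT", "ACGA"],
    "S2": ["ACGT", "ACGA", "TCGA"],
    "S3": ["ACGT", "ACGT", "ACGT"],
}
print(unique_counts(toy))  # (6, 3)
```

With 192 samples sharing many of the same dominant sequences, the gap between the two numbers is what pooling exploits.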

Great. Should I use merge.files for that? Will it keep the sample identity of each sequence in the merged file, or is it better to use another command?
Thank you very much for your time!

Sorry, why aren’t you using the file option in make.contigs? I think that would make your life much easier. Alternatively, you can concatenate the files however you like.
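If you concatenate the per-sample files yourself (following the earlier advice, that means the contig FASTA files right after make.contigs, not the filtered or aligned ones), the sample identity has to be recorded separately in a groups file: one line per read, giving the read name and the sample name separated by a tab. Below is a minimal Python sketch, assuming read names are unique across samples and that a read's name is the header text up to the first whitespace; the sample names and reads in the demo are hypothetical. mothur's own make.group command should also be able to build such a file from a list of FASTA files and group names, but check its documentation for the exact syntax.

```python
import os
import tempfile


def merge_fastas(sample_files, merged_fasta, groups_file):
    """Concatenate per-sample FASTA files and write a groups file.

    sample_files: sample name -> path to that sample's FASTA file.
    Writes <read name>\t<sample name> per read to groups_file.
    Raises ValueError on an empty header or a read name seen twice.
    """
    seen = set()
    with open(merged_fasta, "w") as fasta_out, open(groups_file, "w") as groups_out:
        for sample, path in sample_files.items():
            with open(path) as fasta_in:
                for line in fasta_in:
                    if line.startswith(">"):
                        # Assumption: the read name is the header up to the first whitespace.
                        parts = line[1:].split()
                        if not parts:
                            raise ValueError(f"empty FASTA header in {path}")
                        name = parts[0]
                        if name in seen:
                            raise ValueError(f"duplicate read name {name!r} in {path}")
                        seen.add(name)
                        groups_out.write(f"{name}\t{sample}\n")
                    # Copy every line, adding a newline if the file's last line lacks one.
                    fasta_out.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    # Hypothetical sample names and toy reads, written to a temporary directory.
    tmp = tempfile.mkdtemp()
    toy = {"sampleA": ">r1\nACGT\n>r2\nACGA\n", "sampleB": ">r3\nTCGA\n"}
    files = {}
    for sample, text in toy.items():
        path = os.path.join(tmp, sample + ".fasta")
        with open(path, "w") as fh:
            fh.write(text)
        files[sample] = path
    groups = os.path.join(tmp, "merged.groups")
    merge_fastas(files, os.path.join(tmp, "merged.fasta"), groups)
    with open(groups) as fh:
        print(fh.read(), end="")
```

In the demo, the groups file assigns r1 and r2 to sampleA and r3 to sampleB; the merged FASTA and this groups file can then go through the rest of the pipeline together.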
