Joining two mothur workflows: is it possible?

I have two datasets for a large number of samples (ca. 350), one from sequencing DNA and the other from sequencing cDNA from the same samples. I want to be able to compare the two datasets at the OTU level, so at the very least I need to cluster all samples together. However, the DNA samples have already been processed through mothur, so I'm wondering whether I have to start from the very beginning with all 700 samples (DNA and cDNA) or whether the two analyses can be merged at some point. Does anyone have a clever idea for how to handle this? Happy for any suggestions :smiley:

You would need to do them together. Be forewarned that the process of generating cDNA probably has a very high error rate relative to your PCR and sequencing error rates.


You can process each dataset independently up through chimera checking, then cat each pair of files together (cat cDNA.fasta DNA.fasta…).

Thank you! I will try that one - it will save a lot of work and time :smiley:

Hi, I’m also interested in this. I don’t quite understand how we can join the data using just cat. For the fasta files that works, of course, but how can we join the count tables? If I understand correctly, a count table holds the essential information on how many times each sequence was found in each sample. Hmmmm, or could I just transpose it, so that samples are in rows instead of columns?

The newer versions of mothur have a command to merge count tables together (here).

Great, thanks!

Unfortunately this doesn’t work the way I expected. It can’t REALLY merge count tables when they contain identical sequences. I have count tables from different samples whose sequences partially overlap, and I’m trying to join them for further processing.
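
For what it's worth, a "real" merge of this kind can be pictured as an outer join on the sequence name: take the union of the sample columns, add up the counts for any name that appears in both tables, and fill in zeros everywhere else. Below is a minimal Python sketch of that idea, not a mothur feature. It assumes the plain tab-separated count table layout (a Representative_Sequence column, a total column, then one column per sample), and it assumes that a name shared by both tables really does refer to the same sequence. The function names read_count_table and merge_count_tables are made up for this illustration.

```python
import csv
import io


def read_count_table(text):
    """Parse a tab-separated count table into (groups, {name: {group: count}})."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    groups = header[2:]  # assumed layout: name, total, then one column per sample
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = {g: int(c) for g, c in zip(groups, row[2:])}
    return groups, counts


def merge_count_tables(text_a, text_b):
    """Outer-join two count tables on sequence name (hypothetical helper)."""
    groups_a, counts_a = read_count_table(text_a)
    groups_b, counts_b = read_count_table(text_b)
    # Union of sample columns, keeping the order of first appearance.
    groups = groups_a + [g for g in groups_b if g not in groups_a]
    merged = {}
    for counts in (counts_a, counts_b):
        for name, per_group in counts.items():
            row = merged.setdefault(name, dict.fromkeys(groups, 0))
            for group, count in per_group.items():
                row[group] += count  # sums when a name occurs in both tables
    lines = ["\t".join(["Representative_Sequence", "total"] + groups)]
    for name, row in merged.items():
        values = [row[g] for g in groups]
        # Recompute the total from the per-sample counts.
        lines.append("\t".join([name, str(sum(values))] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"
```

Note that this only helps if the shared sequences carry the same name in both tables. If the two datasets were processed independently, identical sequences will usually have different representative names, and they have to be matched by sequence first.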

I have the same problem when joining the fasta files, although that case is simpler: is there a tool that joins fasta files while removing duplicates?
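
Deduplicating by sequence rather than by name can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two sequences count as duplicates only if their strings are exactly equal (same alignment, same case, same length), and the function names parse_fasta and dedupe_fasta are made up here. Besides the deduplicated fasta, the sketch returns a map from each dropped name to the name that was kept, which is what you would need to fold the matching count table rows together afterwards.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA text; sequences may span lines."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Keep only the first word of the header as the sequence name.
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def dedupe_fasta(*fasta_texts):
    """Join FASTA texts, keeping the first record of each exact sequence."""
    kept = {}     # sequence string -> name of the first record seen
    renamed = {}  # dropped name -> name of the record that was kept
    for text in fasta_texts:
        for name, seq in parse_fasta(text):
            if seq in kept:
                if kept[seq] != name:
                    renamed[name] = kept[seq]
            else:
                kept[seq] = name
    fasta = "".join(f">{name}\n{seq}\n" for seq, name in kept.items())
    return fasta, renamed


dna = ">d1\nACGT\n>d2\nTTGA\n"
cdna = ">c1\nACGT\n>c2\nGGCC\n"
fasta, renamed = dedupe_fasta(dna, cdna)
print(fasta, renamed)
```

Within mothur itself, running unique.seqs on the concatenated fasta together with the merged count table may achieve the same thing in one step, but I would check the documentation for your version before relying on that.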