Does this command exist?


I have samples from two different pyrosequencing runs. I am able to analyze the data from the two different SFF files individually for alpha diversity but would like to compare the two data sets using the beta diversity measurements. Is it possible to combine my two data sets to look at beta diversity? If so, how and at what point should I do this?

Thank you!

I would recommend doing this after running trim.flows, shhh.flows and trim.seqs.

And I think merge.files is the command you are after.

I’ll take advantage of this question to ask for some additional details.

I’m in the same situation. I have 5 samples to compare (each was run separately) and I would like to analyze beta diversity.
After merging the .fasta files, how can I obtain a common .names file and .group file? I’ve tried to obtain these two with the merge.files command, but the group file contains just one group, named “trim”, instead of 5.
When I tried with make.groups, the group file contains 5 groups, BUT only the unique sequences are listed.

Could you post the commands you ran?

Hello, thank you very much for replying!
I followed the SOP on the wiki and tried merging files at different points of the procedure, but the problem is always the same.

Last time, I tried as follows:

sffinfo(sff=sample1.sff, flow=T)
sffinfo(sff=sample2.sff, flow=T)
sffinfo(sff=sample3.sff, flow=T)
sffinfo(sff=sample4.sff, flow=T)
sffinfo(sff=sample5.sff, flow=T)

trim.flows(flow=sample1.flow, minflows=360, maxflows=720, fasta=T, processors=6)
trim.flows(flow=sample2.flow, minflows=360, maxflows=720, fasta=T, processors=6)
trim.flows(flow=sample3.flow, minflows=360, maxflows=720, fasta=T, processors=6)
trim.flows(flow=sample4.flow, minflows=360, maxflows=720, fasta=T, processors=6)
trim.flows(flow=sample5.flow, minflows=360, maxflows=720, fasta=T, processors=6)


Then I continued with the other commands of the SOP (trim.seqs, unique.seqs, align.seqs, screen.seqs, filter.seqs, unique.seqs, chimera.uchime, remove.seqs, classify.seqs, dist.seqs, cluster) and calculated alpha-diversity indices and rarefaction curves.

For beta diversity I read that I need a .shared file. To make a .shared file I need a group file that lists each sequence and the group (sample) it belongs to.

I did:

  • make.groups(fasta=sample1…fasta-sample2…fasta-sample3…fasta-sample4…fasta-sample5…fasta, groups=sample1-sample2-sample3-sample4-sample5)
  • merge.files(input=sample1…names-sample2…names-sample3…names-sample4…names-sample5…names, output=merged.names)
  • merge.files(input=sample1…list-sample2…list-sample3…list-sample4…list-sample5…list, output=merged.list)

When I did this I was really careful to use the .fasta, .list and .names files from the same stage, to avoid any problems. I checked every step with summary.seqs, but I also noticed that the group file contained only the unique sequences.

I ran make.shared with the merged.list file, the merged.names file and the group file, but mothur reported that the number of sequences in the names file was greater than in the group file (“please, correct”).
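The mismatch mothur reports can be checked directly on the two files. Below is a minimal Python sketch, assuming the usual two-column, tab-delimited layouts: each .names line holds a representative read name and a comma-separated list of every read it stands for, and each .groups line holds a read name and its group. The function name and toy data are hypothetical. The point it illustrates is that a group file built from a unique fasta covers only the representatives, while the names file lists every read.

```python
def reads_missing_from_groups(names_text: str, groups_text: str) -> list[str]:
    """Return read names listed in a .names file that have no entry in a .groups file."""
    # Every read the .names file accounts for (column 2, comma-separated).
    named_reads = []
    for line in names_text.splitlines():
        if not line.strip():
            continue
        _representative, members = line.split("\t")
        named_reads.extend(members.split(","))
    # Every read the .groups file assigns to a sample (column 1).
    grouped_reads = {
        line.split("\t")[0] for line in groups_text.splitlines() if line.strip()
    }
    return [read for read in named_reads if read not in grouped_reads]


# Hypothetical toy data: readA represents three identical reads, but the group
# file was made from the unique fasta, so only readA received a group.
names = "readA\treadA,readB,readC\nreadD\treadD\n"
groups = "readA\tsample1\nreadD\tsample1\n"
print(reads_missing_from_groups(names, groups))  # ['readB', 'readC']
```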

Anyway, since I had tried so many approaches, out of curiosity I also made a group file at the very beginning (after shhhinfo). Running summary.seqs on the 5 .fasta files, I saw that once again the number of sequences in the group file was just the sum of the unique sequences in the 5 fasta files.

So now I’m asking: how can I analyse beta diversity when my samples were run separately?

Okay, a few things… If you want make.groups to include all the sequences, you need to run deunique.seqs on each fasta file first.

deunique.seqs(fasta=sample1.fasta, name=names1.names) - make fasta file with all sequences in it
make.groups(fasta=current, groups=sample1) - create a group file for this sample
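To see what these two commands put into the group file, here is a minimal Python sketch of the same idea working directly from each sample's .names file: every read listed there (the full set of reads deunique.seqs writes back out) gets one read<TAB>group line. It assumes the two-column, tab-delimited .names layout (representative, then comma-separated members); the function and sample data are hypothetical, and this is not mothur's own implementation.

```python
def group_lines_from_names(names_by_sample: dict[str, str]) -> list[str]:
    """Build .groups lines (read<TAB>group) for every read in each sample's .names file."""
    lines = []
    for sample, names_text in names_by_sample.items():
        for line in names_text.splitlines():
            if not line.strip():
                continue
            # Column 2 lists the representative plus all identical reads it stands for.
            _representative, members = line.split("\t")
            for read in members.split(","):
                lines.append(f"{read}\t{sample}")
    return lines


# Hypothetical two-sample example.
samples = {
    "sample1": "r1\tr1,r2\n",
    "sample2": "s1\ts1\ns2\ts2,s3\n",
}
print("\n".join(group_lines_from_names(samples)))
```

Each sample contributes one line per read rather than one per unique sequence, which is what keeps the group file consistent with the names file.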

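The merge.files command will not work well on list files; it will cause read errors in mothur. Also, from a biological standpoint, each list file holds OTUs clustered from one sample only, so OTUs in different samples' list files do not correspond to each other, and clustering the combined sequences can form different OTUs than clustering each sample alone.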

Instead of running all the samples separately, have you considered the sff.multiple command? It runs the sffinfo, trim.flows, shhh.flows and trim.seqs commands and then combines the fasta, names and groups files for you, so you can proceed with your analysis from there.

If you do not have an oligos file, the ideal place to combine the files is after trim.seqs. Run the deunique.seqs command on each sample, then make.groups(fasta=sample1.redundant.fasta-sample2.redundant.fasta-sample3.redundant.fasta-sample4.redundant.fasta-sample5.redundant.fasta, groups=sample1-sample2-sample3-sample4-sample5).

Thank you very much for your detailed answer. I’ll try it now and let you know.
I never tried the sff.multiple command because I do not have an oligos file, but now I know how to proceed. Thanks!

Hello. I’ve processed my data as you suggested and it worked very well: now I have all the indices I could wish for! Thank you very much!

I had some problems making a merged .names file, so I quit mothur and restarted it in a new terminal. Then I generated a new .names file with unique.seqs(fasta=mergedsamples). For the remaining steps of the SOP I used only the merged .fasta file, its .names file and the .group file.
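Regenerating the .names file with unique.seqs on the merged fasta works because unique.seqs collapses identical sequences into one representative and lists all of their read names. A rough Python sketch of that dereplication step follows. The function names are hypothetical, only exactly identical sequence strings are collapsed, and taking the first-seen read as the representative is an assumption of the sketch that may not match mothur's choice.

```python
def read_fasta(fasta_text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (read name, sequence) pairs; sequences may span lines."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # The read name is the first whitespace-separated token after '>'.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def dereplicate(fasta_text: str) -> list[str]:
    """Group identical sequences and return .names-style lines (representative<TAB>members)."""
    members_by_seq: dict[str, list[str]] = {}
    for name, seq in read_fasta(fasta_text):
        members_by_seq.setdefault(seq, []).append(name)
    # Sketch assumption: the first read seen with a sequence is its representative.
    return [f"{members[0]}\t{','.join(members)}" for members in members_by_seq.values()]


# Hypothetical merged fasta: a1 and b1 are identical, b2 spans two lines.
merged = ">a1\nACGT\n>b1\nACGT\n>b2\nAC\nGA\n"
print("\n".join(dereplicate(merged)))
```

Because every read still appears in the second column, a group file listing all reads stays consistent with the regenerated .names file.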

Again, thank you!

Glad to help! 🙂