Merge duplicate MiSeq runs

We have implemented the MiSeq SOP wet lab for community profiling. Our first run consisted of 384 samples, but the coverage for some samples is lower than expected. We plan to re-run the 384 samples to boost the coverage. How can I merge these two runs within mothur? The sample names will be identical, so essentially what I need to do is merge each sample file with its duplicate from the second run. If it helps for the command, I can rename each file so that the names are not identical, or put each file from run 1 in one folder and each file from run 2 in another folder and create a merged folder.

Any help is greatly appreciated. Thanks in advance and thanks also for your ongoing excellent work in this area.

Chris

The make.contigs command will allow you to assign more than one set of fastq files to the same group. The file would look like:

group1 forwardFile1.fastq reverseFile1.fastq
group1 forwardFile2.fastq reverseFile2.fastq
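With two run folders and identical sample names, the file can be generated rather than typed: write one row per file pair and reuse the sample name as the group. The Python below is a minimal sketch, not part of mothur. The helper name `stability_rows`, the `run1/` and `run2/` folders and the `sampleA_R1.fastq` naming are hypothetical, and in practice the dictionaries would be filled by listing each run folder. It emits tab-separated columns (group, forward fastq, reverse fastq), matching the three-column layout shown above. A sample present in only one run simply gets one row.

```python
from collections import defaultdict


def stability_rows(runs):
    """Return one 'group<TAB>forward<TAB>reverse' line per file pair.

    `runs` is a list of dicts (one per MiSeq run) mapping a sample name to its
    (forward, reverse) fastq paths. A sample found in several runs gets one row
    per run, all under the same group name (hypothetical helper, not mothur).
    """
    pairs_by_sample = defaultdict(list)
    for run in runs:
        for sample, (fwd, rev) in run.items():
            pairs_by_sample[sample].append((fwd, rev))

    lines = []
    for sample in sorted(pairs_by_sample):
        for fwd, rev in pairs_by_sample[sample]:
            lines.append(f"{sample}\t{fwd}\t{rev}")
    return lines


# Hypothetical layout: identical sample names, one folder per run.
run1 = {"sampleA": ("run1/sampleA_R1.fastq", "run1/sampleA_R2.fastq"),
        "sampleB": ("run1/sampleB_R1.fastq", "run1/sampleB_R2.fastq")}
run2 = {"sampleA": ("run2/sampleA_R1.fastq", "run2/sampleA_R2.fastq"),
        "sampleB": ("run2/sampleB_R1.fastq", "run2/sampleB_R2.fastq")}

print("\n".join(stability_rows([run1, run2])))
```

Saving the printed lines as stability.files and passing that to make.contigs with the file= option should then pool both runs under each sample's group.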

Thanks,

For example, if you had 10 pairs of PE read files but the samples were in duplicate, you would make a stability.file as suggested in your example, but it would look like this?

group1 forwardFile1.fastq reverseFile1.fastq
group1 forwardFile2.fastq reverseFile2.fastq
group2 forwardFile1.fastq reverseFile1.fastq
group2 forwardFile2.fastq reverseFile2.fastq
group3 forwardFile1.fastq reverseFile1.fastq
group3 forwardFile2.fastq reverseFile2.fastq
group4 forwardFile1.fastq reverseFile1.fastq
group4 forwardFile2.fastq reverseFile2.fastq
group5 forwardFile1.fastq reverseFile1.fastq
group5 forwardFile2.fastq reverseFile2.fastq

Since this would be 10 rows in the stability.file, am I correct in saying that mothur will read this as 10 contigs but will in fact merge them into 5 samples?

Sorry if this is not a clear question; I can elaborate if needed. This is the case for my current analysis, but it is only on 175 of 1150 contigs and has taken over 12 hours so far, so I do not want to keep it running if there is a problem. It also appears the Mac may have crashed; might this be due to an error in merging files? It should be noted that my actual stability.files contains duplicates to be merged but also samples that do not need to be merged; might this affect anything? I presumed not.

Thanks in advance for your always helpful responses!

Chris

group1 forwardFile1.fastq reverseFile1.fastq
group1 forwardFile2.fastq reverseFile2.fastq
group2 forwardFile3.fastq reverseFile3.fastq
group2 forwardFile4.fastq reverseFile4.fastq
group3 forwardFile5.fastq reverseFile5.fastq
group3 forwardFile6.fastq reverseFile6.fastq
group4 forwardFile7.fastq reverseFile7.fastq
group4 forwardFile8.fastq reverseFile8.fastq
group5 forwardFile9.fastq reverseFile9.fastq
group5 forwardFile10.fastq reverseFile10.fastq

In the above example, mothur should create a merged fasta file containing the assembled reads from all 10 forward and reverse pairs of files. It should also create a group file with the 5 groups in it. The sequences in files 1 and 2 would be assigned to group1, the sequences from files 3 and 4 to group2, and so on.
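mothur does this grouping itself, but checking the file before a long make.contigs run can catch layout mistakes early. The hypothetical `parse_stability` helper below is plain Python, not a mothur command: it groups rows by their first column and lists any fastq path that appears on more than one row. On the layout above it finds 5 groups and 10 file pairs with nothing reused. On the earlier draft, where every group points at forwardFile1/2 and reverseFile1/2, it flags those four files, since the same reads would presumably end up listed under all five groups.

```python
from collections import Counter, defaultdict


def parse_stability(text):
    """Map each group to its (forward, reverse) pairs and list reused fastq paths.

    Hypothetical pre-flight check; assumes whitespace-separated columns:
    group, forward fastq, reverse fastq.
    """
    groups = defaultdict(list)
    uses = Counter()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        group, fwd, rev = fields
        groups[group].append((fwd, rev))
        uses.update([fwd, rev])
    reused = sorted(path for path, n in uses.items() if n > 1)
    return dict(groups), reused


# Rebuild the two layouts from this thread.
corrected = "\n".join(
    f"group{(i + 1) // 2} forwardFile{i}.fastq reverseFile{i}.fastq"
    for i in range(1, 11))
draft = "\n".join(
    f"group{g} forwardFile{n}.fastq reverseFile{n}.fastq"
    for g in range(1, 6) for n in (1, 2))

groups, reused = parse_stability(corrected)
print(len(groups), sum(len(p) for p in groups.values()), reused)  # 5 10 []
groups, reused = parse_stability(draft)
print(len(groups), len(reused))  # 5 4
```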

This is the case for my current analysis, but it is only on 175 of 1150 contigs and has taken over 12 hours so far, so I do not want to keep it running if there is a problem. It also appears the Mac may have crashed; might this be due to an error in merging files?

How many file pairs are in the file? How many reads in each file pair?

It should be noted that my actual stability.files contains duplicates to be merged but also samples that do not need to be merged, so might this affect anything? I presumed not.

Not sure what you mean. Could you explain?

Thanks. So it is the group file I am interested in.

I needed to restart the Mac as it had crashed. I have 1150 file pairs in the stability.files, and I would estimate around 25,000 reads per pair on average. It is essentially 4 runs on a MiSeq using the Schloss SOP wet lab, with 194 samples per run. The reason I need to merge some runs and not others is that we initially ran 384 samples in a single run, which resulted in low coverage. We have since re-run those samples, and I need to merge the corresponding samples for analysis. The analysis also includes 2 further runs in which we loaded only 194 samples to achieve better coverage.
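A back-of-the-envelope estimate puts the job size and the runtime question in context. The sketch multiplies the figures given here (1150 file pairs, roughly 25,000 reads per pair) and linearly extrapolates the earlier progress report (175 of 1150 pairs after more than 12 hours). Both inputs are rough and linear progress is an assumption, so the result is only an order of magnitude.

```python
# Rough scale of the make.contigs job described in this thread (estimates only).
file_pairs = 1150         # rows in stability.files
reads_per_pair = 25_000   # approximate average reads per fastq pair
total_read_pairs = file_pairs * reads_per_pair

pairs_done = 175          # progress reported after ~12 hours
hours_elapsed = 12        # "over 12 hours", so this is a lower bound
# Assumption: every file pair takes about the same time to assemble.
projected_hours = hours_elapsed * file_pairs / pairs_done

print(f"{total_read_pairs:,} read pairs")                    # 28,750,000 read pairs
print(f"~{projected_hours:.0f} h in total if progress is linear")  # ~79 h
```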

I hope this makes it clearer why my stability.files has some duplicates and some non-duplicates.

Thanks!

Chris