Merge duplicate MiSeq runs

dackfc · August 13, 2013, 8:23am

We have implemented the wet-lab side of the MiSeq SOP for community profiling. Our first run consisted of 384 samples, but the coverage for some samples is lower than expected. We plan to re-run the 384 samples to boost the coverage. How can I merge these two runs within mothur? The sample names will be identical, so essentially what I need to do is merge each sample file with its duplicate from the second run. If it helps for the command, I can change the name of each file so they are not identical, or put each file from run 1 in one folder and each file from run 2 in another folder and create a merged folder?

Any help is greatly appreciated. Thanks in advance and thanks also for your ongoing excellent work in this area.

Chris

westcott · August 14, 2013, 12:42pm

The make.contigs command will allow you to assign more than one set of fastq files to the same group. The input file you pass with the file parameter would look like:

group1 forwardFile1.fastq reverseFile1.fastq
group1 forwardFile2.fastq reverseFile2.fastq
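For anyone building such a file for hundreds of samples, the listing above can be generated rather than typed. The following is only a minimal Python sketch, not part of mothur: the function names, the `_R1.fastq` / `_R2.fastq` suffixes, the one-folder-per-run layout, and the tab separator are all assumptions made for illustration. It writes one row per fastq pair per run and reuses the sample name as the group, so a sample sequenced in two runs gets two rows with the same group.

```python
from pathlib import Path


def stability_rows(run_dirs, fwd_suffix="_R1.fastq", rev_suffix="_R2.fastq"):
    """Return (group, forward, reverse) rows, one per fastq pair per run.

    The suffixes and the folder-per-run layout are hypothetical; adjust
    them to match how your files are actually named.
    """
    rows = []
    for run_dir in run_dirs:
        for fwd in sorted(Path(run_dir).glob(f"*{fwd_suffix}")):
            # The sample name is the file name without the forward suffix.
            sample = fwd.name[: -len(fwd_suffix)]
            rev = fwd.with_name(sample + rev_suffix)
            if not rev.exists():
                raise FileNotFoundError(f"missing reverse file for {fwd}")
            # Paths keep the run folder, so identical file names from
            # different runs stay distinct while the group is shared.
            rows.append((sample, str(fwd), str(rev)))
    return rows


def write_stability_file(rows, out_path, sep="\t"):
    """Write the rows as 'group forward reverse' lines (separator assumed)."""
    with open(out_path, "w") as handle:
        for row in rows:
            handle.write(sep.join(row) + "\n")
```

A call such as `write_stability_file(stability_rows(["run1", "run2"]), "stability.files")` would then produce a file in the layout shown above, which is passed to make.contigs through its file parameter.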

dackfc · August 16, 2013, 12:35pm

Thanks,

For example, if you had 10 sets of paired-end reads but the samples were in duplicate, you would make a stability.files file as suggested in your example, and it would look like this?

group1 forwardFile1.fastq reverseFile1.fastq
group1 forwardFile2.fastq reverseFile2.fastq
group2 forwardFile1.fastq reverseFile1.fastq
group2 forwardFile2.fastq reverseFile2.fastq
group3 forwardFile1.fastq reverseFile1.fastq
group3 forwardFile2.fastq reverseFile2.fastq
group4 forwardFile1.fastq reverseFile1.fastq
group4 forwardFile2.fastq reverseFile2.fastq
group5 forwardFile1.fastq reverseFile1.fastq
group5 forwardFile2.fastq reverseFile2.fastq

Since this would be 10 rows in the stability.files file, am I correct in saying mothur will read this as 10 contigs, but would in fact merge them into 5 samples?

Sorry if this is not a clear question; I can elaborate if needed. This is the case for my current analysis, but it is only on 175 of 1150 contigs and has taken over 12 hours so far, so I do not want to keep it running if there is a problem. It also appears the Mac may have crashed; might this be due to an error in merging files? It should be noted that my actual stability.files contains duplicates to be merged but also samples that do not need to be merged, so might this affect anything? I presumed not.

Thanks in advance for your always helpful responses!

Chris

westcott · August 16, 2013, 1:06pm

group1 forwardFile1.fastq reverseFile1.fastq
group1 forwardFile2.fastq reverseFile2.fastq
group2 forwardFile3.fastq reverseFile3.fastq
group2 forwardFile4.fastq reverseFile4.fastq
group3 forwardFile5.fastq reverseFile5.fastq
group3 forwardFile6.fastq reverseFile6.fastq
group4 forwardFile7.fastq reverseFile7.fastq
group4 forwardFile8.fastq reverseFile8.fastq
group5 forwardFile9.fastq reverseFile9.fastq
group5 forwardFile10.fastq reverseFile10.fastq

In the above example, mothur should create a merged fasta file containing the assembled reads from all 10 forward and reverse pairs of files. It should also create a group file with the 5 groups in it. The sequences in the 1 and 2 files would be assigned to group1, the sequences from 3 and 4 would be assigned to group2, and so on.
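Before starting a long run, the row-to-group mapping described above can be checked with a few lines of code. This is a hedged sketch in Python, independent of mothur itself; `pairs_per_group` and `repeated_files` are hypothetical helper names. It counts how many file pairs each group receives and flags any fastq file that is listed on more than one row.

```python
from collections import Counter


def pairs_per_group(lines):
    """Count how many fastq pairs (rows) are assigned to each group."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        counts[fields[0]] += 1
    return counts


def repeated_files(lines):
    """Return the fastq file names that appear on more than one row."""
    seen = Counter()
    for line in lines:
        # Columns 2 and 3 are the forward and reverse files.
        for name in line.split()[1:]:
            seen[name] += 1
    return sorted(name for name, n in seen.items() if n > 1)


# The 10-row example above: files 1-2 go to group1, 3-4 to group2, and so on.
example = [
    f"group{(i + 1) // 2} forwardFile{i}.fastq reverseFile{i}.fastq"
    for i in range(1, 11)
]
counts = pairs_per_group(example)
print(f"{len(example)} file pairs -> {len(counts)} groups")
print("files listed more than once:", repeated_files(example))
```

On the example this reports 10 file pairs in 5 groups and no repeated files. A stability file that reuses the same file names under several groups would show up in the second check.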

dackfc wrote: "This is the case for my current analysis, but it is only on 175 of 1150 contigs and has taken over 12 hours so far, so I do not want to keep it running if there is a problem. It also appears the Mac may have crashed; might this be due to an error in merging files?"

How many file pairs are in the file? How many reads in each file pair?

dackfc wrote: "It should be noted that my actual stability.files contains duplicates to be merged but also samples that do not need to be merged, so might this affect anything? I presumed not."

Not sure what you mean. Could you explain?

dackfc · August 16, 2013, 5:11pm

Thanks. So it is the group file I am interested in.

I needed to restart the Mac as it had crashed. I have 1150 file pairs in the stability.files and I would estimate around 25,000 reads per pair on average. It is essentially 4 runs on a MiSeq using the Schloss SOP wet lab, with 194 samples per run. The reason I need to merge some runs and not others is that we initially ran 384 samples in a single run, which resulted in low coverage. We have since re-run the samples and I need to merge the corresponding samples for analysis. The analysis also includes 2 further runs, for which we loaded only 194 samples to get better coverage.

Hope this is clearer as to why my stability.files has some duplicates and some non-duplicates.

Thanks!

Chris
