Make.file naming

Hi, I have a quick question about what make.file is doing. I know Pat said it takes everything from the first underscore to the left as the “sample name”, however this doesn’t seem to be the case in my situation. I have multiple fastq files whose names start the same way, and I ultimately want them to be part of the same group. Here is an example:

P1	P1_D0_R1_022um_R1_sub.fastq.gz	P1_D0_R1_022um_R2_sub.fastq.gz	
P1_D0	P1_D0_R1_03um_R1_sub.fastq.gz	P1_D0_R1_03um_R2_sub.fastq.gz	
P1_D0_2	P1_D0_R2_022um_R1_sub.fastq.gz	P1_D0_R2_022um_R2_sub.fastq.gz	
P1_D0_3	P1_D0_R2_03um_R1_sub.fastq.gz	P1_D0_R2_03um_R2_sub.fastq.gz	
P1_D0_4	P1_D0_R3_022um_R1_sub.fastq.gz	P1_D0_R3_022um_R2_sub.fastq.gz	
P1_D0_5	P1_D0_R3_03um_R1_sub.fastq.gz	P1_D0_R3_03um_R2_sub.fastq.gz	

Why is the first sample name it runs into called “P1” but then the next sample that is run into is named “P1_D0” (followed by 2-5 appended afterward)? How can I fix this, or make sure all of these sequence files will be regarded as the same group?

To clarify, these are all replicates of the .22um and 3um fractions and I would like to combine them all. If it is not suggested to combine replicates, how do I ensure these are at least regarded as the same group?

Thanks!

Hi there,

This does seem a bit weird. Is it possible to rename your files to replace those _ characters with something else? Alternatively, you can definitely make your own files file, where you give the files that need to be pooled the same name. You can do this from scratch or you could edit the file that make.file generated.

Pat

Thanks Pat, I did end up using the rename command (in unix) to remove any unnecessary underscores and that seems to have done the trick. How do you specify which files get pooled? Would the first column be group, then followed by sample name, then the paired fastq files?

Close - it’s the name in the first column that’s used. So you would leave the fastq files in columns 2 and 3 and give every row you want pooled the same name in column 1. Alternatively, you could use merge.groups to pool things after going through make.contigs.

Pat
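For anyone who would rather script the files file than edit it by hand, here is a minimal Python sketch of the layout described above: pooled name in column 1, forward and reverse fastq files in columns 2 and 3, tab-separated. The function name `files_file_rows`, the `group_fields` parameter, and the rule of taking the first two underscore-separated fields as the pooled name are assumptions made for illustration; they are not part of mothur.

```python
def files_file_rows(forward_reads, group_fields=2):
    """Return tab-separated rows: pooled name, forward fastq, reverse fastq.

    Assumption: the pooled name is the first `group_fields` underscore-separated
    fields of the file name, and the reverse read differs from the forward read
    only by the "_R1_sub." / "_R2_sub." tag, as in the file names in this thread.
    """
    rows = []
    for fwd in forward_reads:
        group = "_".join(fwd.split("_")[:group_fields])
        rev = fwd.replace("_R1_sub.", "_R2_sub.")
        rows.append("\t".join([group, fwd, rev]))
    return rows


forward_reads = [
    "P1_D0_R1_022um_R1_sub.fastq.gz",
    "P1_D0_R1_03um_R1_sub.fastq.gz",
    "P1_D0_R2_022um_R1_sub.fastq.gz",
]
pooled_rows = files_file_rows(forward_reads)
print("\n".join(pooled_rows))
```

Every row here gets the name P1_D0 in column 1. Saving the printed rows to a text file gives something you can check by eye before handing it to make.contigs.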

Okay, that makes sense. Do all of the samples need to be included in a design file, or just the ones I want to group together? I have other samples that I just want to stand alone.

Yeah, they all need to be there - but you can put them in their own group. For example…

A1   A
A2   A
B    B
C    C
D1   D
D2   D
D3   D
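With hundreds of samples, a two-column file like the one above is easier to generate than to type. The Python sketch below lists every sample and gives any sample without an assigned group a group of its own. The function name `design_rows` and the dictionary of pooled samples are hypothetical, and the tab separator is an assumption about what mothur expects.

```python
def design_rows(all_samples, sample_to_group):
    """Return one "sample<TAB>group" row per sample.

    Samples missing from `sample_to_group` stand alone: their group name
    is simply their own sample name.
    """
    rows = []
    for sample in all_samples:
        group = sample_to_group.get(sample, sample)
        rows.append(f"{sample}\t{group}")
    return rows


all_samples = ["A1", "A2", "B", "C", "D1", "D2", "D3"]
pooled = {"A1": "A", "A2": "A", "D1": "D", "D2": "D", "D3": "D"}
design = design_rows(all_samples, pooled)
print("\n".join(design))
```

This reproduces the example above: B and C are not in the dictionary, so each ends up in its own group.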

Okay I got the merge to work! Merge.groups takes a while with almost 500 samples, thought it wasn’t working for a second lol.
