I’m about to start a fairly large analysis and have a question about stability.files versus the design.file. I want to make sure I’m doing this correctly before I commit a fair amount of cluster resources. Am I correct in assuming that stability.files only needs to define my individual samples and their associated sequence runs, and that the design.file can later be used to group the samples in different ways, based on the metadata, for analysis? For example, if my stability.files contains this:
Jan_AF0701 Jan_AF0701_S1_L001_R1_001.fastq Jan_AF0701_S1_L001_R2_001.fastq
Jan_AF0714 Jan_AF0714_S13_L001_R1_001.fastq Jan_AF0714_S13_L001_R2_001.fastq
Jan_AF0743b Jan_AF0743b_S25_L001_R1_001.fastq Jan_AF0743b_S25_L001_R2_001.fastq
Jan_AF0750 Jan_AF0750_S37_L001_R1_001.fastq Jan_AF0750_S37_L001_R2_001.fastq
Jan_AF0776 Jan_AF0776_S49_L001_R1_001.fastq Jan_AF0776_S49_L001_R2_001.fastq
Jan_AF0844 Jan_AF0844_S61_L001_R1_001.fastq Jan_AF0844_S61_L001_R2_001.fastq
And my design.file contains this (I know the names don’t match in the example; I’m just showing the top few lines of each file):
group subject tissue ptb afbactload gad mr afil6 spun afc
Oct_AF0701 Prosp335 AF PTB_mildIAI 3287 23.5 2 20.1 unspun Positive
Jan_AF0701 Prosp335 AF PTB_mildIAI 3287 23.5 2 20.1 unspun Positive
Oct_AF0702 Prosp336 AF PTB_mildIAI 1726 33.5 2 0.6 unspun Negative
Oct_AF0711 Prosp341 AF TB_noIAI 2419 40 1 0.6 unspun Negative
Oct_AF0714 Prosp344 AF PTB_noIAI 2280 31.2 0 1.0 unspun Negative
Jan_AF0714 Prosp344 AF PTB_noIAI 2280 31.2 0 1.0 unspun Negative
Will I be able to do an analysis comparing samples based on any of the metadata properties found in the design.file?
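One way to sanity-check the two files before a long run is to confirm that every sample name in stability.files also has a row in the design.file. The following is only a minimal Python sketch, not part of mothur. It assumes both files are whitespace-delimited as shown above, that the first column of each holds the sample name, and that the design.file starts with a header row. The function names are hypothetical.

```python
def read_stability_groups(lines):
    """Return the sample names (first column) from stability.files lines."""
    groups = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            groups.append(fields[0])
    return groups


def read_design(lines):
    """Return (header, table) where table maps sample name -> {column: value}.

    Assumes the first non-blank line is the header and the first column
    is the sample name.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    table = {}
    for fields in body:
        table[fields[0]] = dict(zip(header[1:], fields[1:]))
    return header, table


def missing_from_design(groups, design_table):
    """List the sample names that have no row in the design table."""
    return [g for g in groups if g not in design_table]


if __name__ == "__main__":
    # Excerpts copied from the example files in this post.
    stability = [
        "Jan_AF0701 Jan_AF0701_S1_L001_R1_001.fastq Jan_AF0701_S1_L001_R2_001.fastq",
        "Jan_AF0714 Jan_AF0714_S13_L001_R1_001.fastq Jan_AF0714_S13_L001_R2_001.fastq",
        "Jan_AF0743b Jan_AF0743b_S25_L001_R1_001.fastq Jan_AF0743b_S25_L001_R2_001.fastq",
    ]
    design = [
        "group subject tissue ptb afbactload gad mr afil6 spun afc",
        "Oct_AF0701 Prosp335 AF PTB_mildIAI 3287 23.5 2 20.1 unspun Positive",
        "Jan_AF0701 Prosp335 AF PTB_mildIAI 3287 23.5 2 20.1 unspun Positive",
        "Jan_AF0714 Prosp344 AF PTB_noIAI 2280 31.2 0 1.0 unspun Negative",
    ]
    groups = read_stability_groups(stability)
    header, table = read_design(design)
    print("Metadata columns:", header[1:])
    print("Samples without a design row:", missing_from_design(groups, table))
```

With the real files, the hard-coded lists would be replaced by the lines read from disk, for example `open("stability.files").read().splitlines()`.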
Thanks for any help you can provide. I suspect I need to attend the next mothur workshop in November…