make.file with data downloaded from BaseSpace

I'm downloading demultiplexed data from BaseSpace, and each sample has its own path to the data. In my sample sheet I use the sample name (a human-readable name) and the sample ID (the sample's barcode), so the paths look like this:

sample1-99999999/Data/Intensities/BaseCalls/R1.fastq.gz
sample1-99999999/Data/Intensities/BaseCalls/R2.fastq.gz
sample2-99999998/Data/Intensities/BaseCalls/R1.fastq.gz
sample2-99999998/Data/Intensities/BaseCalls/R2.fastq.gz

I would like to be really lazy and use mothur to do all the processing before make.contigs, using the files exactly as they are downloaded from BaseSpace. The sample name is the part of the top-level folder name before the dash.
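If the files don't actually need to be moved, one option is to write the mothur .files file (group name, R1 path, R2 path) straight from the BaseSpace folders. The sketch below assumes the layout shown above (`<name>-<id>/Data/Intensities/BaseCalls/`), that each R2 file name is the R1 name with `R1` swapped for `R2`, and that the sample name is everything before the first dash; the function name `basespace_to_files` is hypothetical. Run it in a terminal rather than inside mothur's system().

```shell
# Hypothetical helper: print one mothur .files line (group<TAB>R1<TAB>R2) per
# sample, reading the BaseSpace download layout in place without moving files.
basespace_to_files() {
  top=${1:-.}   # folder that holds the sample1-99999999/ style directories
  for r1 in "$top"/*-*/Data/Intensities/BaseCalls/*R1*.fastq.gz; do
    [ -e "$r1" ] || continue                   # glob matched nothing
    # Assumed naming: the R2 file is the R1 file name with R1 -> R2
    r2=$(printf '%s\n' "$r1" | sed 's/R1\([^/]*\)$/R2\1/')
    if [ ! -e "$r2" ]; then
      echo "no R2 file for $r1" >&2
      continue
    fi
    dir=${r1#"$top"/}     # path relative to the top folder
    name=${dir%%-*}       # sample name = text before the first dash
    printf '%s\t%s\t%s\n' "$name" "$r1" "$r2"
  done
}
```

Usage: `basespace_to_files . > my_samples.files`, run from the folder that contains the downloaded sample directories.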

So far I've moved all my samples into one folder and renamed them by prefixing each file name with the human-readable sample name:

system(for i in */Data/Intensities/BaseCalls/*.gz; do mv "$i" "fastq/${i%%-*}.`basename "$i"`"; done)
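Because the loop moves files, it can be worth previewing the renames first. This is a minimal dry-run sketch of the same loop that only prints the mv commands; the function name `preview_renames` is hypothetical, and it is meant for a plain terminal in the folder holding the downloaded sample directories.

```shell
# Hypothetical dry run of the rename loop: print each mv command instead of
# executing it, so the new names can be checked before anything moves.
preview_renames() {
  for i in */Data/Intensities/BaseCalls/*.gz; do
    [ -e "$i" ] || continue          # glob matched nothing
    # ${i%%-*} keeps the text before the first dash (the sample name)
    printf 'mv %s %s\n' "$i" "fastq/${i%%-*}.$(basename "$i")"
  done
}
```

For the path `sample1-99999999/Data/Intensities/BaseCalls/R1.fastq.gz` this prints `mv sample1-99999999/Data/Intensities/BaseCalls/R1.fastq.gz fastq/sample1.R1.fastq.gz`.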

Then I ran make.file; the resulting lines look like this:

fastq/sample1.R1_001.fastq.gz fastq/sample1.R2_001.fastq.gz

Now I just need to use something like gawk to create the first column (the sample name), but I can't figure it out. Help?

Not sure how good your R skills are, but you should be able to do something like this:

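f_files <- list.files("fastq", pattern = "R1_001\\.fastq\\.gz$", full.names = TRUE) # names of the R1 compressed fastq files (pattern is a regex, not a glob)
r_files <- list.files("fastq", pattern = "R2_001\\.fastq\\.gz$", full.names = TRUE) # names of the R2 compressed fastq files
samples <- sub("^fastq/(.*)\\.R1_001\\.fastq\\.gz$", "\\1", f_files) # extract the sample names from the R1 file names
stopifnot(identical(sub("R1_001", "R2_001", f_files), r_files)) # check that the R1 and R2 files pair up in the same order
write.table(cbind(samples, f_files, r_files), file = "my_samples.files", col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t") # write the three-column .files file for mothur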
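Since the original question asked about gawk, here is a rough awk equivalent that adds the group column to the two-column make.file output. It assumes each line holds the R1 path and then the R2 path, and that the sample name is the part of the R1 file name before the first dot (true after the rename above, but it breaks if sample names contain dots). The function name `add_group_column` is hypothetical.

```shell
# Hypothetical helper: prepend a sample-name column to make.file output so each
# line becomes group<TAB>R1<TAB>R2. Plain awk is enough; gawk also works.
add_group_column() {
  awk 'BEGIN { OFS = "\t" }
       NF >= 2 {
         name = $1
         sub(/^.*\//, "", name)   # drop the directory part (e.g. fastq/)
         sub(/\..*$/, "", name)   # keep the text before the first dot
         print name, $1, $2
       }' "$1"
}
```

Usage: `add_group_column input.files > my_samples.files`, where `input.files` stands for whatever file make.file wrote.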

I love R, but I didn't think to use it for this because I use it exclusively through RStudio. I'll give this a try.