split.groups

Hi,

I would like to submit my data to a repository. I figured the best format for my data set would be one fasta file for each tag/group in the groups file. Split.groups does this nicely. However, I would also like to include the read quality data from the .qual files in my submission, so that people who want to use the data can apply whatever quality filtering they prefer.

My plan was to use trim.seqs with an oligos file but without any further filtering, then use split.groups to get a ‘raw’ data file for each tag/group. The problem is that I lose the quality data along the way. What I need is to split the quality data from the .qual file into groups, and the quality files should also have the primers and tags trimmed off so that they correspond to the fasta files.

Does this make sense? Is there a way to do it with MOTHUR? Would that be worth implementing?

Thank you and keep up the great work,
Fabian

You could use the list.seqs and get.seqs commands to select the quality data for each group. It is a bit tedious, but it is a solution.

list.seqs(fasta=fastaFileGroup1)
get.seqs(qfile=yourQualityFile, accnos=current)
# rename the output file here so it's not overwritten by the next get.seqs
list.seqs(fasta=fastaFileGroup2)
get.seqs(qfile=yourQualityFile, accnos=current)
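
If repeating those two commands for every group gets too tedious, the same split can be sketched outside mothur. The Python below is only an illustration, not a mothur feature: the function names `read_groups` and `split_qual` are made up for this example, and it assumes the groups file has one "sequenceName group" pair per line and that the .qual file uses fasta-style `>name` headers followed by score lines.

```python
from collections import defaultdict


def read_groups(lines):
    """Map sequence name -> group from lines of the form 'name<whitespace>group'."""
    seq_to_group = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            seq_to_group[parts[0]] = parts[1]
    return seq_to_group


def split_qual(qual_lines, seq_to_group):
    """Collect the .qual records of each group; unassigned sequences are skipped."""
    per_group = defaultdict(list)
    current = None  # group of the record being read, or None if it has no group
    for line in qual_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            current = seq_to_group.get(name)
        if current is not None and line:
            per_group[current].append(line)
    return dict(per_group)


# Small in-memory example (hypothetical sequence and group names).
groups = read_groups(["seqA\tgroup1", "seqB\tgroup2", "seqC\tgroup1"])
qual = [">seqA", "40 40 38", ">seqB", "30 31", ">seqC", "25 25 25", ">seqD", "10"]
result = split_qual(qual, groups)
for group, records in sorted(result.items()):
    print(group, records)
```

With real files you would pass the open file objects instead of the lists and write each group's records to its own .qual file. Run it on the trim.qual output so that the scores match the trimmed fasta files.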

You could also post your raw data and your workflow as an Example Analysis and then point readers to the wiki :slight_smile:

Works! I was happy to see that trim.seqs outputs a trim.qual file. I had totally forgotten that. So my .qual files are nicely trimmed :slight_smile:. I cannot provide the raw data because some of the tags in that lane belong to other people’s projects. I could try to make a fake raw data file by first extracting my sequences with trim.seqs and get.groups, then running list.seqs on my sequences and get.seqs on the raw data. Does this make sense?
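
To make the idea concrete, here is a rough Python sketch of the selection step, i.e. keeping only the raw records whose names appear in my own list of sequences. It is just meant to show the logic that list.seqs followed by get.seqs would perform, not how mothur does it internally. The function name `filter_fasta` and the example sequence names are made up.

```python
def filter_fasta(fasta_lines, keep_names):
    """Return the lines of the fasta records whose name is in keep_names."""
    kept = []
    keep = False  # whether the record currently being read should be kept
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            # The name is the first whitespace-delimited token after '>'.
            keep = bool(parts) and parts[0] in keep_names
        if keep and line:
            kept.append(line)
    return kept


# Hypothetical raw lane with one record that belongs to someone else (seq1).
raw = [">seq1 extra", "ACGT", ">seq2", "GGCC", "TTAA", ">seq3", "AAAA"]
mine = {"seq2", "seq3"}
subset = filter_fasta(raw, mine)
print("\n".join(subset))
```

The same function works on the raw .qual file, since it uses the same `>name` headers.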

Thanks,
Fabian

Alternatively, if you make an oligos file in which the barcodes for the other people’s samples are assigned to the group “ignore”, those samples will be dropped. Keep in mind that without the metadata, the sequences would be pretty worthless to anyone else.