Update: fixed. How to make a .groups file

Hi all,

Before I ask the question, a brief introduction to the situation: we have 16S MiSeq sequencing data available in three forms: [xx.forward.fastq, xx.reverse.fastq] (raw data), xx.full.fasta (the company merged the raw data into one file for us), and xx.pr.fasta (they made the contigs and trimmed them for us).

Normally we make contigs from the fastq files, then run trim.seqs with the oligos file and the xx.contigs.fasta file to get the trimmed sequences and the groups file. Then everyone is happy. But the problem here is that when we do that, we lose half of the reads and the downstream data mining is horrible.

So we have to use their xx.pr.fasta file, which is decent in both sequence quality and number of reads. The obstacle is that we can't get a .groups file from the xx.pr.fasta file, or rather, we don't know how yet. It's my understanding that xx.pr.fasta is equivalent to the stability.trim.contigs.good.fasta file in the SOP. In the SOP, the .groups file is generated at the make.contigs step; in our previous workflow it was made by trim.seqs with the .oligos file. But with this already-trimmed data set, how do we make a .groups file, please?

To summarize, I have a file, xx.pr.fasta, which was already assembled and trimmed, is equivalent to stability.trim.contigs.good.fasta, and is ready for the unique.seqs command. How do I make a .groups file so that I can follow the SOP for the data mining, please? Thanks much!!

The xx.pr.fasta data looks like this:

CA5V::D00420:88:H5NGNBCXX:2:2206:14967:51888 1:N:0:4
TACGGAGGGTGCAAGCGTTATCCGGATTCACTGGGTTTAAAGGGTGCGTAGGCGGGTATGTAAGTCAGTGGTGAAATACCGGAGCTTAACTTCGGAACTGCCATTGATACTATATACCTTGAATATTGTGGAGGTAAGCGGAATATGTCATGTAGCGGTGAAATGCTTAGAGATGACATAGAACACCGATTGCGAAGGCAGCTTACTACGCAAATATTGACGCTGAGGCACGAAAGCGTGGGGATCAAACAGGATTAGATACCCGCGTAGTCC
CA1V::D00420:88:H5NGNBCXX:2:2205:3074:49079 1:N:0:4
TACGTAGGGTCCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTGTGCAAGACCGATGTGAAATCCCCGAGCTTAACTTGGGAATTGCATTGGTGACTGCACGGCTAGAGTGTGTCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAATACCGATGGCGAAGGCAGCCCCCTGGGATAACACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCCGGTAGTCC
CA13E::D00420:88:H5NGNBCXX:2:2114:9286:22238 1:N:0:4
TACGTAGGTGGCGAGCGTTGTCCGGATTTACTGGGCGTAAAGGGAGCGTAGGCGGATTTTTAAGTGAGATGTGAAATACTCGGGCTTAACCTGAGTGCTGCATTTCAAACTGGAAGTCTAGAGTGCAGGAGAGGAGAAGGGAATTCCTAGTGTAGCGGTGAAATGCGTAGAGATTAGGAAGAACACCAGTGGCGAAGGCGCTTCTCTGGACTGTAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCCGGTAGTC
CA19V::D00420:88:H5NGNBCXX:2:2107:12094:71921 1:N:0:4
TACATAGGTTGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGTCTGTAGGTTGTATGTTAAGTCTGGCGTGAAAACTTGGGGCTCAACCCCAAATTGCGTTGGATACTGGCATACTAGTATTGTGTAGAGGTTAGCGGAATTCCTAGCGAAGCGGTGAAATGCGTAGATATTAGGAAGAACATCAACATGGCGAAGGCAGCTAACTGGGCACATATTGACACTGAGAGACGAAAGCGTGGGGAGCAAATAGGATTAGATACCCGTGTAGTCC

Update: problem solved. I used the list.seqs command along with some other Excel tricks.
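
For anyone who lands here with the same problem, here is a minimal Python sketch of a scripted alternative to the Excel step. It is not what I actually ran. It assumes the sample name is whatever comes before the first "::" in each header (CA5V, CA1V, CA13E, ... in the data above), that the sequence name is the header up to the first space, and that the groups file is two tab-separated columns: sequence name, then group. The function names and the output file name are made up for illustration. Please check that the names it writes match what list.seqs reports for your fasta file, because mothur may not keep the header text exactly as it appears in the file.

```python
def groups_from_fasta(lines, sep="::"):
    """Yield (sequence_name, group) pairs from FASTA header lines.

    Assumption: only header lines contain `sep`; sequence lines
    (plain A/C/G/T) never do, so they are skipped.
    """
    for line in lines:
        line = line.strip()
        if sep not in line:
            continue  # blank line or sequence line
        # Drop a leading '>' if present, keep the text up to the first space.
        name = line.lstrip(">").split()[0]
        # Assumption: the sample (group) label is the part before the first '::'.
        group = name.split(sep, 1)[0]
        yield name, group


def write_groups(fasta_path, groups_path):
    """Write a two-column, tab-separated groups file for fasta_path."""
    with open(fasta_path) as fin, open(groups_path, "w") as fout:
        for name, group in groups_from_fasta(fin):
            fout.write(f"{name}\t{group}\n")


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    write_groups("xx.pr.fasta", "xx.pr.groups")
```

The parsing is kept separate from the file writing so the header rule can be checked on a few pasted lines before running it on the whole file. If your sample names are encoded differently, only the `sep` argument or the `group = ...` line should need to change.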