names file

I’m following the Costello pyrosequencing example to analyze my dataset. I’ve completed all the steps up to the alpha diversity measurements. I’m having difficulty deciding which names file I should use:
My fasta file (unique) has 33,255 sequences.
My names file has 33,255 unique sequences (I have 33,255 rows in the names file), but the second column also lists all the non-unique sequences.
To continue with the alpha diversity measurements, I need to create a groups file that contains all the sequence IDs listed in the names file (unique and non-unique). So I run list.seqs to produce an accnos file, which I can use to produce a new groups file containing all 50,970 sequence IDs listed in the names file. Is this correct? Or should I be running unique.seqs on my fasta file containing only the unique sequences to get a names file containing only unique sequence IDs? I hope this is not too confusing… Thanks
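
As a quick sanity check on the counts above, here is a minimal Python sketch (not part of mothur) that tallies the rows and the total sequence IDs in a names file. It assumes the usual two-column layout: the representative name, then a comma-separated list of every name it stands for. The function name `count_names` and the example IDs are made up for illustration.

```python
import io

def count_names(handle):
    """Return (rows, total_ids) for a two-column names file."""
    rows = 0
    total_ids = 0
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # fields[0] is the representative; fields[1] lists all member IDs
        members = fields[1].split(",")
        rows += 1
        total_ids += len(members)
    return rows, total_ids

# Tiny made-up example: 2 unique sequences standing for 4 reads in total
example = io.StringIO("seqA\tseqA,seqB,seqC\nseqD\tseqD\n")
print(count_names(example))  # (2, 4)
```

For the dataset described here, the first number should match the 33,255 rows and the second the 50,970 IDs that the groups file needs to cover.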

If you’re following the script, then you should already have the group file that you need to run read.otu. The original groups file is generated by the trim.seqs command, and sequences are then removed from it as you go along in the screen.seqs commands.

Actually, when I ran trim.seqs I did not generate a groups file. I think this is because I didn’t use the oligos option (my dataset already had the barcodes and primer sequences removed before I received it, so I don’t have an oligos file). Instead, I left out the group file option in screen.seqs and only created a groups file from my list file (created by cluster()). What should I have used to generate a groups file, and at what point should I have made it?

If you’re doing 454 sequence analysis, the easiest way to do it is to let trim.seqs make the groups file for you. Alternatively, what you described in your first post sounds right.


Sounds good. So if I don’t include the oligos= option, I will not get a group file (which is the problem I encountered). Instead, I created a group file by using list.seqs(fasta=…trim.fasta) and used this group file for downstream processing.

Right, but you’ll have to do some file manipulation to make the group file.
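
One way that file manipulation might look, as a rough Python sketch rather than anything mothur provides: read the accnos file (one sequence ID per line) and write a two-column group file of sequence ID and group name. The function `accnos_to_group`, the `group_for` callback, and the default label `sample1` are all hypothetical; without barcodes, how each read maps to a sample has to come from whatever you know about your own IDs.

```python
import io

def accnos_to_group(accnos_handle, group_handle,
                    group_for=lambda seq_id: "sample1"):
    """Write 'seq_id<TAB>group' for every ID in an accnos file.

    group_for is a placeholder rule mapping an ID to its group name;
    the default puts every read into one made-up group, 'sample1'.
    Returns the number of lines written.
    """
    written = 0
    for line in accnos_handle:
        seq_id = line.strip()
        if not seq_id:
            continue  # skip blank lines
        group_handle.write(f"{seq_id}\t{group_for(seq_id)}\n")
        written += 1
    return written

# Made-up IDs standing in for the contents of an accnos file
accnos = io.StringIO("seqA\nseqB\nseqC\n")
group_out = io.StringIO()
print(accnos_to_group(accnos, group_out))  # 3
print(group_out.getvalue(), end="")
```

With real files you would pass open file handles instead of the `io.StringIO` objects, and the returned count should equal the number of IDs in the names file.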