Sequence file lost!

Hi everyone, I am new to mothur. I went through the entire workflow and was able to get taxonomy for my organisms and relative abundance. I remember there being a file with the actual sequences that helped us know the names of the organisms in our sample. I cannot find this file and it’s driving me crazy. What I was able to find was the final fasta file and the final taxonomy file, but there are a lot of gaps and no labels in the fasta.

Should I just combine these two files in R and go from there or is there another file I am missing?

Hmmm, I’m not sure I know what you mean. Typically people end with a cons.taxonomy file that has the consensus taxonomy for the sequences in their OTUs and a shared file that has the number of times each OTU was observed in each sample. Does that describe what you’re looking for?

You can find the pipeline we recommend and use in our own lab here

Pat

Are you looking for a fasta file with representative sequences for each of your OTUs?
get.oturep will get you there.

Okay, thank you! Yeah, I have found the file with the sequences, but they have gaps. Is there a file without these gaps? My advisor believes files with gaps are harder to BLAST at NCBI. (I haven’t used get.oturep yet, so maybe this will fix it.)

When I run get.oturep as the last or near-last step in the SOP, the output contains gaps to maintain the alignment. You could always run sed to remove all of the ‘-’ characters, or even do a search and replace in a text editor if your output file isn’t too large. There’s probably a cool way to ask mothur to do this, but I don’t know what it is offhand.
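
If you would rather script the search and replace, here is a rough sketch in Python. It assumes a plain FASTA file where header lines start with ‘>’ and gaps are written as ‘-’ or ‘.’; the function name and the two example records are made up for illustration and are not mothur output.

```python
def degap_fasta(text, gap_chars="-."):
    """Strip gap characters from sequence lines, leaving header lines alone."""
    # Translation table that deletes every character listed in gap_chars.
    drop_gaps = str.maketrans("", "", gap_chars)
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Header line: keep as-is so names containing '-' or '.' survive.
            cleaned.append(line)
        else:
            cleaned.append(line.translate(drop_gaps))
    return "".join(line + "\n" for line in cleaned)


# Hypothetical aligned records, only to show the effect.
example = ">Otu001\n..AC-GT--A..\n>Otu002\n--TTG-A\n"
print(degap_fasta(example), end="")
```

The header check is the one thing to watch for: a blanket replace of every ‘-’ in the file would also change any sequence names that happen to contain a hyphen.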

Try using degap.seqs to remove your gaps.

Pat