mothur

Choose delimiter in bin.seqs FASTA header

Hello,

The bin.seqs function puts the data into a nice format that contains all the info I want to use in downstream analysis; however, I have found that the space separating the sample name and the MOTU name in the FASTA header is causing me a headache when I read the file into R. The space gets changed to “” on import, and then the simple action of splitting the header into two columns, sample and MOTU name, becomes tricky because \ acts as an escape character in R. In my case, if not more generally, being able to set the delimiter between the fields of the FASTA header would make this output more usable when it is read into other programs for further analysis.

Best,
Katie Erickson
Post-Bac Researcher at Harvey Mudd College

Hi Katie,

Thanks for your note - our understanding is that the FASTA “standard” allows for the inclusion of further description in the header row after a space. I’d suggest that when you read the FASTA file into R with scan, you use sep="\n" to split the input into lines and then parse the header row using something like the following (str_replace is from the stringr package), which keeps only the text before the first space…

str_replace(header, pattern="(>\\S*) .*", replacement="\\1")
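If both fields are needed as separate columns, the same idea can be extended. The following is only a minimal sketch in base R: the example headers, the object names, and the field order (sample first, then MOTU) are assumptions made for illustration, and real bin.seqs output may differ in detail. Splitting on any whitespace character means it does not matter whether the separator is a space or a tab.

```r
# Hypothetical FASTA lines, used here instead of a real bin.seqs file.
fasta_lines <- c(
  ">sampleA Otu001",
  "ACGTACGT",
  ">sampleB Otu002",
  "TTGACCAA"
)

# With a file on disk, the lines could instead be read with scan, e.g.
# fasta_lines <- scan("example.fasta", what = character(), sep = "\n", quiet = TRUE)
# ("example.fasta" is a placeholder file name.)

# Keep only the header rows and drop the leading ">".
is_header <- startsWith(fasta_lines, ">")
headers <- sub("^>", "", fasta_lines[is_header])

# Everything before the first whitespace character is the first field;
# everything after the first run of whitespace is the second field.
first_field <- sub("[[:space:]].*$", "", headers)
second_field <- sub("^[^[:space:]]+[[:space:]]*", "", headers)

parsed <- data.frame(
  sample = first_field,
  motu = second_field,
  stringsAsFactors = FALSE
)
```

Because the pattern uses the [[:space:]] character class rather than a literal space or "\t", no backslash escapes are needed to describe the separator.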

Pat
