Choose delimiter in bin.seqs FASTA header

kerickson · March 31, 2020, 9:16pm

Hello,

The bin.seqs command puts the data into a nice format that contains all the information I want for downstream analysis; however, the space separating the sample name and the MOTU name in the FASTA header is causing me a headache when I read the FASTA file into R. The space shows up as a backslash escape sequence in R, so the simple task of splitting the header into two columns, sample and MOTU name, becomes tricky because the backslash is R's escape character. In my case, if not more generally, being able to set the delimiter in the FASTA header would make this output more usable in other programs for further analysis.

Best,
Katie Erickson
Post-Bac Researcher at Harvey Mudd College
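A minimal base-R sketch of the two-column split described above, assuming each header is `>` followed by the sample name, then a run of whitespace (a space or a tab), then the MOTU name. The header strings and the helper name `split_header` are hypothetical. Matching with the regular-expression class `\s` avoids having to know which whitespace character the file actually contains, and no backslash ever has to be typed into the data itself.

```r
# Split FASTA headers of the (assumed) form ">sample<whitespace>motu"
# into a two-column data frame. split_header() is a hypothetical helper.
split_header <- function(headers) {
  body <- sub("^>", "", headers)                      # drop the leading ">"
  has_ws <- grepl("\\s", body, perl = TRUE)           # does a second field exist?
  data.frame(
    sample = sub("\\s.*$", "", body, perl = TRUE),    # text before the first whitespace
    motu = ifelse(has_ws,
                  sub("^\\S+\\s+", "", body, perl = TRUE),  # text after it
                  NA_character_),                     # no MOTU field present
    stringsAsFactors = FALSE
  )
}

# Hypothetical headers: tab-separated, space-separated, and one with no MOTU field.
headers <- c(">sample1\tOtu001", ">sample2 Otu002", ">sample3")
print(split_header(headers))
```

Headers without any whitespace get `NA` in the `motu` column rather than silently repeating the sample name.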

pschloss · April 2, 2020, 11:44am

Hi Katie,

Thanks for your note. Our understanding is that the FASTA “standard” allows further description in the header line after a space. I’d suggest reading the FASTA file into R with scan, using sep="\n" (along with what=character(), since scan expects numbers by default) to split it into lines, and then parsing the header line with something like…

library(stringr)
# keep only ">" plus the sequence name; \\s ends the match at a space or a tab
str_replace(header, pattern="(>\\S*)\\s.*", replacement="\\1")
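A self-contained base-R sketch of the scan-based approach above. It writes a small hypothetical FASTA file to a temporary path, reads it one line per element with `scan(..., what = character(), sep = "\n")`, keeps only the sequence name from each header using a `sub()` call equivalent to the `str_replace()` suggestion (here using `\s` rather than a literal space, so that either a space or a tab ends the name), and joins wrapped sequence lines. The file contents and object names are illustrative assumptions, not mothur output.

```r
# Write a small hypothetical FASTA file so the example runs on its own.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1\tOtu001", "ACGTAC", "GTTA",
             ">seq2 Otu002", "TTGACC"), fasta_path)

# One element per line; what = character() is needed because scan()
# otherwise expects numeric input.
lines <- scan(fasta_path, what = character(), sep = "\n", quiet = TRUE)

is_header <- startsWith(lines, ">")
headers <- lines[is_header]

# Base-R equivalent of str_replace(header, "(>\\S*)\\s.*", "\\1"):
# keep ">" plus the name, drop everything from the first whitespace on.
seq_names <- sub("(>\\S*)\\s.*", "\\1", headers, perl = TRUE)

# Group each sequence line with the header above it, then join wrapped lines.
# Assumes every header is followed by at least one sequence line.
record <- cumsum(is_header)
sequences <- vapply(split(lines[!is_header], record[!is_header]),
                    paste, character(1), collapse = "")

result <- data.frame(name = seq_names, sequence = unname(sequences),
                     stringsAsFactors = FALSE)
print(result)
```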

Pat

system · April 12, 2020, 11:49am

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
