get.oturep Output Formatting Problem

amcomeau · December 7, 2010, 10:31pm

Hello Pat (and friends),
I’ve recently upgraded our server to Mothur version 1.13 and since then, I’ve run into trouble with the get.oturep command. Previously, when I ran the command, the following type of output was generated:

GFY4EVZ03CZ90O|364|200
ACGT…

Notice the pipe characters (|) separating the seq name from the OTU# (#364) from the number of seqs contained in the OTU (200 seqs)…

Now, things are different:

GQN3EP204EOOYG 1714|1
ACGT…

Notice the tab (blank space) now separating the seq name from the OTU#…

Was this change done intentionally for some other formatting purpose I’ve not yet found? The problem is that while the first is a perfectly valid FASTA file format that I can open and manipulate in BioEdit (or others), the second format containing the tab is in no way compatible with the FASTA format (and unopenable in BioEdit, even with the use of the conversion filter). What was the reason for introducing a non-FASTA-format valid character in a command supposed to generate a FASTA output file?

I wouldn’t care so much if I were able to find a quick way to convert the file back into FASTA format, but removing the tab is not an easy thing to do: it cannot simply be “Found and Replaced” in text editors, and concatenating in Excel is not a valid option for thousands of sequences that are interleaved (every other line is a title). If someone has an easy fix (other than downgrading to a previous Mothur version for that command), it would be welcome.

Or perhaps this could be fixed in the next version (I can’t imagine it would involve many coding changes).

pschloss · December 8, 2010, 8:37pm

It’s not a bug, it’s a feature. Yes, it was done intentionally, since the way we read in sequence names looks for whitespace. If the name included the pipe instead of the tab, mothur would think everything in that chunk of text was the sequence name. This would screw up other things like mapping to the names and groups files.
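To illustrate the point about whitespace, here is a rough Python sketch. It is not mothur's actual code; it only assumes that a reader takes the first whitespace-delimited token of a header line as the sequence name, and that header lines start with ">" as in standard FASTA. The function name `sequence_name` is made up for this example.

```python
def sequence_name(header: str) -> str:
    """Return the first whitespace-delimited token of a FASTA header line.

    Illustrative only: mimics a reader that stops at the first whitespace.
    """
    return header.lstrip(">").split()[0]


# Old-style header: pipes only, so the whole string is read as the name.
old_style = ">GFY4EVZ03CZ90O|364|200"
# New-style header: a tab separates the name from the OTU number and size.
new_style = ">GQN3EP204EOOYG\t1714|1"

print(sequence_name(old_style))
print(sequence_name(new_style))
```

With the pipe-only header, the "name" would include the OTU number and count, so it would no longer match the entry in a names or groups file. With the tab, only the real sequence name is picked up.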

It actually still is in fasta format. I’m surprised that your text editor can’t do it - MSWord can even do it. In TextWrangler if you do a find and click the grep option and search for “\t” that will find all the tabs. If you do find -> \t.* and replace -> [nothing] it will remove everything after the sequence name. In MSWord do Replace -> More Options [the little triangle thing] -> Use Wild Cards -> find ^t
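If a text editor is not convenient, the same find-and-replace can be sketched in a few lines of Python. This is only a minimal example under some assumptions: header lines start with ">", and the function names and file names below are hypothetical. By default it drops everything after the sequence name (the `\t.*` replacement described above); with `restore_pipe=True` it instead swaps the first tab for a pipe to get back the old-style header.

```python
import re


def fix_header(line: str, restore_pipe: bool = False) -> str:
    """Rewrite one line of a FASTA file; sequence lines pass through unchanged."""
    if not line.startswith(">"):
        return line
    if restore_pipe:
        # Turn "name<TAB>otu|size" back into "name|otu|size".
        return line.replace("\t", "|", 1)
    # Same as find "\t.*" / replace with nothing; "." does not match the
    # newline, so the line ending is kept.
    return re.sub(r"\t.*", "", line)


def convert(in_path: str, out_path: str, restore_pipe: bool = False) -> None:
    """Copy a FASTA file, fixing header lines along the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_header(line, restore_pipe))


# Example call (hypothetical file names):
# convert("rep.fasta", "rep.fixed.fasta", restore_pipe=True)
```

Keep in mind that headers with the pipe restored will again be read by mothur as one long sequence name, so that variant is best kept for files going to BioEdit or other external tools.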

Sorry for changing the formatting! We try to keep this to a minimum, but sometimes this is necessary.
