Manipulate Sequence Identifiers

Sneulinger · January 31, 2013, 9:41am

Dear mothur team,

with some data sets, it is necessary for me to manipulate sequence identifiers in the FASTA files (e.g., change sequence ID “HY5WXKH02C2LQY” to “group1HY5WXKH02C2LQY”). This is particularly important when combining data from different sequencing runs: As the sequencer “recycles” IDs, they are not unique anymore in a data set combined from different runs unless you rename them.

I would thus find it very convenient if mothur provided a command for sequence ID manipulation, such as prefixing the group name to it. I have already combed through the list of mothur’s commands, but could not find anything like it.

Thank you very much in advance for considering my request!

Kind regards, Sven

pschloss · January 31, 2013, 11:45am

they are not unique anymore in a data set combined from different runs unless you rename them

Really? HY5WXKH is a time stamp, 02 is the location on the plate and then C2LQY is the x/y coordinate of the plate (see SFF Read names - SEQanswers). Pretty sure that every sequence name is unique.

Pat

Sneulinger · January 31, 2013, 12:11pm

Hi Pat,

Thank you for your quick reply!

Within the thread you refer to, there is also the following citation:

This identifier is guaranteed to be unique only within the context of a single sequencing Run, and may or may not be unique across specific sets of Runs.

This is exactly my problem. I would like to combine sequences across sets of runs, and it frequently happens that I end up with duplicate sequence IDs when doing that. I solve the problem by prefixing some alphanumeric code to them that is specific for the respective run using a script of my own, but would find it nicer to have this functionality directly in mothur.
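
A minimal sketch of this kind of prefixing (not the actual script mentioned above) could look like the following. The function name `prefix_fasta_ids` and the prefix `group1` are hypothetical, and the sketch assumes a plain FASTA file in which every header line starts with `>` followed directly by the sequence ID.

```python
import io


def prefix_fasta_ids(src, dst, prefix):
    """Copy FASTA records from src to dst, prepending prefix to each sequence ID.

    src and dst are text file objects; prefix is a run-specific label
    (hypothetical example: "group1").
    """
    for line in src:
        if line.startswith(">"):
            # Header line: insert the prefix right after the ">" marker.
            dst.write(">" + prefix + line[1:])
        else:
            # Sequence line: copy unchanged.
            dst.write(line)


# Small in-memory example using the ID from the first post.
run1 = io.StringIO(">HY5WXKH02C2LQY\nACGT\n")
out = io.StringIO()
prefix_fasta_ids(run1, out, "group1")
result = out.getvalue()
print(result)
```

With real files, `src` and `dst` would be opened with `open(...)`, using one prefix per sequencing run. Any group file that refers to the old IDs would need the same prefix applied so that the names still match.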

There are also other situations where custom sequence ID prefixes might be useful. For the use of down-stream applications other than mothur which cannot use mothur’s group file, group-specific sequence ID prefixes would be really great.

Best, Sven

coralsnot · May 27, 2013, 3:31am

Hello -

I think I may be having a similar problem. I’m trying to analyse a set of sequences that came from 2 separate sequencing runs with all unique barcodes. We have created a groups file that runs through to the screen.seqs step successfully, but after that I get errors every time I try to add the groups file, which means that there is no indication of ‘sample’ at the end of the analysis. Thus, I output an OTU table that has no samples/groups; it is all lumped together. Have you come up with a solution for this?

Thank you!
Kathy

westcott · May 28, 2013, 2:37pm

Could you post the error you are getting after screen.seqs, and the commands you are running up to that point?

westcott · May 29, 2013, 3:45pm

Thanks for the feature request. The rename.seqs command will be part of 1.32.0.
