To minimise file sizes (and the time spent reading and writing files), it would be helpful to be able to rename the input sequences, which tend to have rather long names, particularly for Illumina sequence data. This would matter most for the .dist files: at the moment, 88 of the 95 characters on each line of my dist files are taken up by sequence IDs. Ideally the renaming would be reversible, i.e. the renaming function would keep a reference file, along the lines of the .names files, that allows the original names to be restored if needed for analysis elsewhere. It would seem reasonable to name the sequences using either a sequential number or another sequential identifier that makes fuller use of the ASCII characters permitted in FASTA sequence names.
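To make the request concrete, here is a rough Python sketch of the kind of reversible renaming I have in mind. It is only an illustration, not how mothur does or would do it: the function names (`compact_id`, `rename_fasta`, `restore_fasta`) and the base-62 alphabet are my own assumptions, and the mapping is kept in memory here, whereas in practice it would be written out to a reference file.

```python
import string

# Assumed alphabet: digits and letters only (62 symbols), chosen to be
# safe in FASTA headers. A larger set of printable ASCII could be used.
ALPHABET = string.digits + string.ascii_letters


def compact_id(n):
    """Encode a non-negative integer as a short base-62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def rename_fasta(lines):
    """Replace each header with a sequential short ID.

    Returns (renamed_lines, mapping), where mapping maps each short ID
    back to the original header text (everything after '>').
    """
    mapping = {}
    renamed = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            short = compact_id(len(mapping))
            mapping[short] = line[1:]
            renamed.append(">" + short)
        else:
            renamed.append(line)
    return renamed, mapping


def restore_fasta(lines, mapping):
    """Undo rename_fasta using the saved mapping."""
    restored = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            restored.append(">" + mapping[line[1:]])
        else:
            restored.append(line)
    return restored


# Made-up Illumina-style names, for illustration only.
records = [
    ">M00967_43_000000000-A3JHG_1_1101_14069_1827",
    "ACGT",
    ">M00967_43_000000000-A3JHG_1_1101_18044_1900",
    "TTGA",
]
renamed, mapping = rename_fasta(records)
original = restore_fasta(renamed, mapping)
```

With 62 symbols, three characters cover 238,328 sequences and four cover about 14.8 million, so even large datasets would need only a few characters per ID instead of 40 or more.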