Does anyone have a foolproof and easy approach to creating group files in a Windows environment?
Pat has helped me twice. I used Excel and exported the columns, but some odd invisible characters get inserted that mess up subsequent processes, and I can’t see them in any text editor that I use or that has been suggested to me. I don’t have access to TextWrangler for the Mac (which Pat tells me shows the offending characters), and I feel bad asking Pat to help with such trivial issues.
There are two ways I used on Windows before switching to Linux.
You can make the entries in an Excel worksheet and then save it either as a text document or as a CSV file. This does not always get rid of the unusual characters, because of the way Excel saves the columns into the text format. When I saw these characters, I would simply copy the character and do a find/replace with nothing.
Once you have the entries in the Excel sheet, you could copy and paste these column entries into a Notepad document. This generally worked for me without any errors or unusual characters.
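If the manual find/replace gets tedious, the same cleanup can be scripted. Below is a minimal Python sketch, not anything that ships with mothur. It assumes the exported file is tab-separated with one sequence name and one group per line, and the function names (`clean_line`, `clean_group_text`, `clean_file`) are made up for this example. It drops a byte-order mark, turns non-breaking spaces into plain spaces, removes control and format characters, and writes plain Unix line endings.

```python
import unicodedata


def clean_line(line):
    """Remove invisible characters from one line, keeping tabs."""
    # Non-breaking spaces look like spaces but are a different character.
    line = line.replace("\u00a0", " ")
    # Drop control (Cc) and format (Cf) characters; the BOM U+FEFF is Cf.
    kept = [
        ch for ch in line
        if ch == "\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    ]
    return "".join(kept).strip()


def clean_group_text(text):
    """Return cleaned, tab-separated lines ending in plain '\\n'."""
    # Normalize Windows (CRLF) and old Mac (CR) line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    rows = []
    for raw in text.split("\n"):
        line = clean_line(raw)
        if not line:
            continue  # skip blank lines
        # Trim stray spaces around each column and drop empty columns.
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        rows.append("\t".join(fields))
    return "".join(row + "\n" for row in rows)


def clean_file(src, dst, encoding="utf-8"):
    """Read src, clean it, and write dst as UTF-8 with '\\n' endings.

    The default encoding is an assumption; Excel's "Unicode Text"
    export, for example, would need encoding="utf-16".
    """
    with open(src, encoding=encoding, newline="") as fh:
        text = fh.read()
    with open(dst, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(clean_group_text(text))
```

For example, `clean_file("exported.txt", "my.groups")` (both filenames are placeholders) would write a cleaned copy and leave the original export untouched.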
I recommend the FOSS Notepad++ for editing text on Windows, including the ability to view hidden characters: http://notepad-plus.sourceforge.net/
Shareware alternatives are EditPlus and UltraEdit.
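If no editor at hand will show the hidden characters, a short script can at least report where they are. This is only a sketch in Python; `find_suspect_chars` is a name invented for this example, and it treats everything other than a tab or printable ASCII as suspect, which is an assumption that suits plain group files but not text that legitimately contains accented characters.

```python
import unicodedata


def find_suspect_chars(text):
    """Return (line, column, code point, name) for each suspect character."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == "\t" or " " <= ch <= "~":
                continue  # tab and printable ASCII are expected
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "UNNAMED CONTROL CHARACTER")
            hits.append((line_no, col_no, f"U+{ord(ch):04X}", name))
    return hits
```

When reading the file to check, open it with `newline=""` (for example `open(path, encoding="utf-8", newline="")`), otherwise Python translates carriage returns before the function ever sees them.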
I was just wondering which characters you were seeing, that’s all. I agree that NotePad++ is a good one, but if you have really hard to eliminate characters, I bet you would be pretty happy with EMBOSS’s TrimSeq (http://www.molgen.mpg.de/~beck/EMBOSS/trimseq.html).
I like to use MetaPad (http://liquidninja.com/metapad/) which has great features to strip trailing whitespace AND to strip first characters.
Also, I’ve been able to see some characters at the ends of files or titles using the old software BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). I’ll usually copy them to MetaPad to fix them.
Finally, if you want, you can use grep on your Windows machine! Just download the unix-Utilities:
Between MetaPad and Grep I got everything done. Good luck!
I’m sorry it is such a pain to generate a group file. Part of the reason that we don’t have a command for this is that people seem to be starting from such diverse points. Perhaps it would be better to just jump in and go with something than to do nothing. Let me know what you guys think of this idea…
The user would enter a command like this: make.group(fasta=A.fasta-B.fasta-C.fasta, group=A-B-C)
The user provides a separate fasta file for each group, and each fasta file contains the sequences in that group
The labels (A, B, C) would be the labels for the 3 fasta files in that order
The output would be composite.fasta representing a concatenation of the 3 fasta files and composite.groups which would be a group file.
Would this work for people? I guess I don’t know whether people would have their sequences separated by groups. Any other permutations that people can think of?
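To make the proposal concrete, here is a rough Python sketch of the behavior described above. It is not how mothur implements anything; `make_group` and `make_group_files` are names chosen for this example, and taking the sequence name to be the first whitespace-delimited token after ">" is an assumption. Refusing duplicate names across files is also a choice made here, not part of the proposal.

```python
def make_group(fasta_texts, labels):
    """Concatenate FASTA texts and build the matching group-file text.

    fasta_texts: list of strings, one per input fasta file
    labels:      list of group labels, in the same order
    Returns (composite_fasta, composite_groups) as strings.
    """
    if len(fasta_texts) != len(labels):
        raise ValueError("need exactly one label per fasta file")
    fasta_out, group_out, seen = [], [], set()
    for text, label in zip(fasta_texts, labels):
        # splitlines() copes with Windows, Unix, and old Mac line endings.
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                parts = line[1:].split()
                if not parts:
                    raise ValueError("fasta header with no sequence name")
                name = parts[0]  # assumed: name is the first token
                if name in seen:
                    raise ValueError(f"duplicate sequence name: {name}")
                seen.add(name)
                group_out.append(f"{name}\t{label}")
            fasta_out.append(line)
    composite_fasta = "".join(l + "\n" for l in fasta_out)
    composite_groups = "".join(l + "\n" for l in group_out)
    return composite_fasta, composite_groups


def make_group_files(paths, labels, out_prefix="composite"):
    """File-based wrapper: writes <out_prefix>.fasta and <out_prefix>.groups."""
    texts = []
    for path in paths:
        with open(path, newline="") as fh:
            texts.append(fh.read())
    fasta, groups = make_group(texts, labels)
    with open(out_prefix + ".fasta", "w", newline="\n") as fh:
        fh.write(fasta)
    with open(out_prefix + ".groups", "w", newline="\n") as fh:
        fh.write(groups)
```

With the example from the post, `make_group_files(["A.fasta", "B.fasta", "C.fasta"], ["A", "B", "C"])` would produce composite.fasta and composite.groups, with one name-and-label line in the group file per sequence.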
I have trimmed and paired fasta files; I grouped them and merged them. But when I ran summary.seqs on the output file of the merge command, it reported a total number of sequences of 1, which isn’t possible. What am I missing?