creating group files

Does anyone have a foolproof and easy approach to creating group files in a Windows environment?

Pat has helped me twice. I used Excel and exported the columns, but some odd invisible characters get inserted that mess up subsequent processes, and I can’t see them in any text editor that I use or that has been suggested to me. I don’t have access to TextWrangler for the Mac (which Pat tells me shows the offending characters), and I feel bad asking Pat to help with such trivial issues.

cheers

Julian

I struggled quite a bit at first with group files, and would be glad to try and help. Could you upload a sample file as an attachment?

Pete

There are two ways I used on Windows before switching to Linux.

  1. You can make the entries in an Excel worksheet and then save it either as a text document or as a CSV file. This does not always get rid of the unusual characters, because of the way Excel saves the columns into the text format. When I saw these characters, I would simply copy the character and do a find/replace with nothing.

  2. Once you have the entries in the Excel sheet, you can copy and paste the column entries into a Notepad document. This generally worked for me without any errors or unusual characters.
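
For anyone who would rather script the cleanup than hunt for the characters by hand, here is a minimal Python sketch. It assumes the invisible characters are the usual suspects from a spreadsheet export (a byte-order mark, Windows or old-Mac line endings, non-breaking or zero-width spaces), which nothing in this thread confirms, and that a group file is two whitespace-separated columns: sequence name, then group name. The function name clean_group_text is made up for illustration.

```python
def clean_group_text(raw: str) -> str:
    """Return group-file text with one 'name<TAB>group' pair per line.

    Assumption: each non-empty input line holds exactly two fields.
    """
    text = raw.lstrip("\ufeff")                            # drop a leading byte-order mark
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # normalize line endings to LF
    text = text.replace("\u00a0", " ")                     # non-breaking space -> plain space
    text = text.replace("\u200b", "")                      # remove zero-width spaces

    rows = []
    for number, line in enumerate(text.split("\n"), start=1):
        fields = line.split()                              # split on any run of whitespace
        if not fields:
            continue                                       # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 fields, got {len(fields)}")
        rows.append("\t".join(fields))
    return "\n".join(rows) + "\n"
```

To run it on a file, read the export with open(path, encoding="utf-8", newline="") so Python does not translate the line endings first, and write the result with newline="\n". The UTF-8 encoding is also an assumption; some spreadsheet export options produce other encodings.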

Hope that helps.

Thanks for the help, Pete, but I need to sort this out at this end. Pat has helped, so I know what the issue is, but I’m not sure of the solution.

Jangidk’s suggestion was one I tried, but Notepad didn’t show the characters that Pat saw with TextWrangler.

Julian

I recommend the FOSS Notepad++ for editing text on Windows,
including the ability to view hidden characters:
http://notepad-plus.sourceforge.net/
Shareware alternatives are EditPlus and UltraEdit.
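
If no editor is at hand, a few lines of Python can also make hidden characters visible. This is only a sketch: report_hidden is a made-up helper name, and it treats anything other than a tab or printable ASCII as suspicious, which may be too strict if your sequence names legitimately contain other characters.

```python
def report_hidden(text: str) -> list:
    """Return (line_number, escaped_line) for every line that contains
    a character other than a tab or printable ASCII (0x20 to 0x7E)."""
    findings = []
    for number, line in enumerate(text.splitlines(keepends=True), start=1):
        # Ignore a single trailing LF; a stray CR before it still counts as hidden.
        body = line[:-1] if line.endswith("\n") else line
        if any(ch != "\t" and not (" " <= ch <= "~") for ch in body):
            findings.append((number, ascii(line)))  # ascii() shows escapes such as \r or \xa0
    return findings
```

Read the file with open(path, newline="") before passing its contents in; otherwise Python converts the line endings on the way in and the carriage returns never show up.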

Robin

I was just wondering which characters you were seeing, that’s all. I agree that Notepad++ is a good one, but if you have really hard-to-eliminate characters, I bet you would be pretty happy with EMBOSS’s TrimSeq (http://www.molgen.mpg.de/~beck/EMBOSS/trimseq.html).

I like to use MetaPad (http://liquidninja.com/metapad/) which has great features to strip trailing whitespace AND to strip first characters.

Also, I’ve been able to see some characters at the ends of files or titles using the old software BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). I’ll usually copy them to MetaPad to fix them.

Finally, if you want, you can use GREP on your Windows machine! Just download the unix-Utilities:

Between MetaPad and Grep I got everything done. Good luck!

(No spaces are allowed in group names!)
Pete

I’m sorry it is such a pain to generate a group file. Part of the reason we don’t have a command for this is that people seem to be starting from such diverse points. Perhaps it would be better to just jump in and go with something than to do nothing. Let me know what you guys think of this idea…

  • The user would enter a command like this: make.group(fasta=A.fasta-B.fasta-C.fasta, group=A-B-C)
  • The user provides a separate fasta file for each group, and each fasta file contains the sequences in that group
  • The labels (A, B, C) would be the labels for the 3 fasta files in that order
  • The output would be composite.fasta representing a concatenation of the 3 fasta files and composite.groups which would be a group file.
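
To make the proposal concrete, here is a rough Python sketch of the behaviour described in the bullets above. It is not mothur's implementation. It assumes the sequence name is the first whitespace-delimited token after ">" on each header line and that a group file is one "name<TAB>group" pair per line; the function name make_group and its parameters are hypothetical.

```python
def make_group(fasta_paths, groups,
               out_fasta="composite.fasta", out_groups="composite.groups"):
    """Concatenate fasta files and write a matching two-column group file.

    Returns the number of sequences written.
    """
    if len(fasta_paths) != len(groups):
        raise ValueError("need exactly one group label per fasta file")
    for label in groups:
        if not label or any(ch.isspace() for ch in label):
            raise ValueError(f"group name {label!r} is empty or contains whitespace")

    count = 0
    with open(out_fasta, "w", newline="\n") as fasta_out, \
         open(out_groups, "w", newline="\n") as groups_out:
        for path, label in zip(fasta_paths, groups):
            # Default text mode reads CRLF, CR, and LF line endings alike.
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line:
                        continue                      # skip blank lines
                    if line.startswith(">"):
                        fields = line[1:].split()
                        if not fields:
                            raise ValueError(f"{path}: header line without a name")
                        groups_out.write(f"{fields[0]}\t{label}\n")
                        count += 1
                    fasta_out.write(line + "\n")
    return count
```

For example, make_group(["A.fasta", "B.fasta", "C.fasta"], ["A", "B", "C"]) would write composite.fasta and composite.groups to the current directory. Rejecting labels with whitespace is a choice made for this sketch so that the group column stays a single field.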

Would this work for people? I guess I don’t know whether people would have their sequences separated by groups. Any other permutations that people can think of?

Pat

Just to update, mothur now contains the make.group Pat described above.

Hello, I tried to use the make.group command but cannot put more than 3 fasta files in it. Is that normal? Regards, Caroline.

Hi Caroline, thanks for reporting this issue. It will be fixed in the next release. As a workaround, you can do the following:

make.group(fasta=fastafile1.fasta-fastafile2.fasta-fastafile3.fasta, groups=A-B-C)
make.group(fasta=fastafile4.fasta-fastafile5.fasta-fastafile6.fasta, groups=D-E-F)
merge.files(input=groupFile1-groupFile2, output=completeGroupFile)

Hi

I have trimmed and paired fasta files, which I grouped and merged. But when I run summary.seqs on the output file of the merge command, it reports a total number of sequences of 1, which isn’t possible. What am I missing?

thank you,
nimrod
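
One quick check before digging further is to count the ">" header lines and the line-ending styles in the merged file. Non-Unix line endings are only a guess at the cause and are not confirmed anywhere in this thread. The sketch below works on raw bytes so that nothing gets translated on read; inspect_fasta_bytes is a made-up name.

```python
def inspect_fasta_bytes(data: bytes) -> dict:
    """Count line-ending styles and '>' header lines in raw fasta bytes."""
    crlf = data.count(b"\r\n")
    cr_only = data.count(b"\r") - crlf      # carriage returns not followed by LF
    lf_only = data.count(b"\n") - crlf      # line feeds not preceded by CR
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    headers = sum(1 for line in normalized.split(b"\n") if line.startswith(b">"))
    return {"crlf": crlf, "cr_only": cr_only, "lf_only": lf_only, "headers": headers}
```

Call it with the contents of open(path, "rb").read(). If headers is far larger than 1 and cr_only is nonzero, converting the file to LF line endings would be the first thing to try. If headers really is 1, the problem is more likely in what was merged than in the line endings.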