Mismatch .groups file after using chop.seqs

travisdawson123 · October 10, 2012, 6:18pm

Hi,

I’m compiling two sets of data from different sequencing runs, which used different sequencing kits (Ion Torrent 100bp vs 200bp kit). Once I merged the data, I found that the 200bp sequences aligned better to each other than to the rest, with a significant overhang that was not observed in the 100bp sequences. To deal with this, I used chop.seqs on my merged data to cut off the overhanging bases from the 200bp kit. This left me with a nice-looking alignment, but chop.seqs has no option for a .groups file, so the few sequences that were removed entirely from my .fasta file (all sequences shorter than 110bp) were not removed from my .groups file. The two files now hold different numbers of sequences, and I am unable to run pre.cluster on my data. I’ve tried using screen.seqs on my .fasta and .groups files to get the number of sequences to match, but restricting read length to 110bp (via minlength=110) doesn’t seem to have any effect on my .groups file.

I was wondering if anyone knows how I could either create a new .groups file that matches my .fasta file (keeping in mind that there are approximately 35 different samples with different names in this dataset), or trim down my .groups file so that its number of sequences matches my .fasta and .names files. I could then run pre.cluster, followed by dist.seqs and cluster. I’ve already tried using ‘make.groups’ on my .fasta file, but I don’t know how to keep my group names consistent with the names found in my .names file.
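
A trim like the one asked about here could also be sketched outside mothur. The Python below is only an illustration, not a mothur feature. It assumes the .groups file has two whitespace-separated columns (sequence name, group name) and the .names file has a representative name followed by a comma-separated list of the sequences it stands for. The function names and the sample data are made up for the example.

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA headers (text after '>' up to whitespace)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def expand_with_names(kept, names_lines):
    """Add the duplicates each kept representative stands for (assumed .names layout)."""
    expanded = set(kept)
    for line in names_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in kept:
            expanded.update(fields[1].split(","))
    return expanded


def filter_groups(group_lines, keep):
    """Keep only the group-file rows whose sequence name is in `keep`."""
    rows = []
    for line in group_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in keep:
            # The group name is copied unchanged, so sample names stay consistent.
            rows.append(fields[0] + "\t" + fields[1])
    return rows


# Hypothetical data: seqC was dropped from the FASTA, seqB is a duplicate of seqA.
fasta = [">seqA", "ACGT", ">seqD", "ACGA"]
names = ["seqA\tseqA,seqB", "seqC\tseqC", "seqD\tseqD"]
groups = ["seqA\tsample1", "seqB\tsample2", "seqC\tsample1", "seqD\tsample3"]

keep = expand_with_names(fasta_names(fasta), names)
trimmed_groups = filter_groups(groups, keep)
print("\n".join(trimmed_groups))
```

Because the rows are filtered rather than regenerated, the existing group names carry over untouched, which avoids the naming problem with rebuilding the file from scratch.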

Thanks, I have my workflow available if it will help.

-Travis

travisdawson123 · October 12, 2012, 2:13pm

Never mind, I resolved the issue. I had to run chop.seqs after screen.seqs, so that no sequences would be removed from my .fasta file during chop.seqs. That keeps the number of sequences consistent between my .fasta and .groups files.
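
A quick way to confirm that the two files agree before moving on to pre.cluster can be sketched in Python. This is a rough check under stated assumptions: the .groups file has the sequence name in its first column, and no .names file is collapsing duplicates (if one is, the .groups file would legitimately list more names than the .fasta file). The function names and the sample data are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Sequence names taken from FASTA header lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def group_ids(group_lines):
    """Sequence names taken from the first column of the group file."""
    ids = set()
    for line in group_lines:
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def compare(fasta_lines, group_lines):
    """Return (names only in the FASTA, names only in the group file), both sorted."""
    in_fasta = fasta_ids(fasta_lines)
    in_groups = group_ids(group_lines)
    return sorted(in_fasta - in_groups), sorted(in_groups - in_fasta)


# Hypothetical data: seqC is still listed in the group file but is gone from the FASTA.
fasta = [">seqA", "ACGT", ">seqB", "ACGA"]
groups = ["seqA\tsample1", "seqB\tsample2", "seqC\tsample1"]

only_fasta, only_groups = compare(fasta, groups)
print("only in fasta:", only_fasta)
print("only in groups:", only_groups)
```

Two empty lists mean the files name exactly the same sequences.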
