discrepancies between name files and group files

lycophidion · February 17, 2014, 6:33pm

I’ve had problems with discrepant data in my name and group files. First, when I get to the pre.cluster command, a discrepancy between the two files shuts me down. It seems I’m missing something. My final fasta and names files prior to pre.cluster are:

218.a2.a3.trim.unique.unique.good.filter.names
218.a2.a3.trim.unique.unique.good.filter.unique.fasta

Your groupfile at that point is:
group=GQY1XT001.shhh.good.groups

Which implies that you’ve somehow trimmed your group file as you proceeded. How? My group file at that point is the one I created at the very beginning, using the merge.files and make.groups commands.

I DID figure out a workaround: wait to make my groups and merge my reads until just before pre-clustering. The problem is that I then have to redo my alignment, filtering and screening. I’d imagine the net effect is just as though I created three separate alignments in Sequencher and then brought them together.

Then, a similar problem pops up when I generate my phylotypes. I started by using the taxonomy I generated when I first classified, and the name file equivalent to your final.names. I get “[ERROR]: … is not in your namefile, please correct.”

My workaround was to reclassify my final fasta and name files and the last group file. It seems to have worked, but in both cases, I think there must be a more reasonable, less Byzantine way to proceed.

Finally, a smaller (?) problem: when I classify my phylotypes, I input: classify.otu(list=218.a2.a3.final.an.list, name=218.a2.a3.final.names, taxonomy=218.a2.a3.final.pds.wang.taxonomy, label=1)

For some reason, I get back:
Your file does not include the label 1. I will use 0.07.

0.07 729

Output File Names:
218.a2.a3.final.an.0.07.cons.taxonomy
218.a2.a3.final.an.0.07.cons.tax.summary

Thanks,
Mike


westcott · February 24, 2014, 6:04pm

Which implies that you’ve somehow trimmed your group file as you proceeded. How?

The group file went through the shhh.flows command (shhh) and screen.seqs command (good).
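If it helps to see exactly which sequences differ between the two files, here is a minimal Python sketch (not part of mothur). It assumes the usual two-column, whitespace-separated layouts: a name file with a unique name followed by a comma-separated member list, and a group file with a sequence name followed by its group. The function names and the toy data are made up for illustration.

```python
def names_in_name_file(lines):
    """Collect every sequence name listed in column 2 of a name file."""
    names = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        names.update(parts[1].split(","))
    return names


def names_in_group_file(lines):
    """Collect every sequence name listed in column 1 of a group file."""
    names = set()
    for line in lines:
        parts = line.split()
        if parts:
            names.add(parts[0])
    return names


def compare(name_lines, group_lines):
    """Return (names missing from the group file, names only in the group file)."""
    in_names = names_in_name_file(name_lines)
    in_groups = names_in_group_file(group_lines)
    return sorted(in_names - in_groups), sorted(in_groups - in_names)


# Toy data (hypothetical): seqD was screened out of the name file
# but is still present in an untrimmed group file.
name_lines = ["seqA\tseqA,seqB", "seqC\tseqC"]
group_lines = [
    "seqA\tsample1",
    "seqB\tsample1",
    "seqC\tsample2",
    "seqD\tsample2",
]

missing_from_groups, extra_in_groups = compare(name_lines, group_lines)
print("in name file but not group file:", missing_from_groups)
print("in group file but not name file:", extra_in_groups)
```

To run it on real files, pass open file handles instead of the toy lists, for example `compare(open("my.names"), open("my.groups"))` with your own file names.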

Then, a similar problem pops up when I generate my phylotypes. I started by using the taxonomy I generated when I first classified, and the name file equivalent to your final.names. I get “[ERROR]: … is not in your namefile, please correct.”

Here's a link to a description of mothur's name file format: http://www.mothur.org/wiki/Name_file. The first column is the unique name. Only unique names should appear in fasta and taxonomy files. The second column is a comma-separated list of the sequences that the unique sequence represents. This list should start with the unique sequence itself.
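A quick way to test those two rules outside mothur is sketched below in Python. It assumes whitespace-separated files where the taxonomy file's first column is the sequence name; the function names and toy data are hypothetical, and this is only a rough check, not how mothur itself validates the files.

```python
def unique_names(name_lines):
    """Return the set of first-column (unique) names in a name file."""
    return {line.split()[0] for line in name_lines if line.split()}


def rows_not_starting_with_self(name_lines):
    """Return unique names whose member list does not begin with that name."""
    bad = []
    for line in name_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        unique, members = parts[0], parts[1].split(",")
        if members[0] != unique:
            bad.append(unique)
    return bad


def taxonomy_ids_not_unique(taxonomy_lines, name_lines):
    """Return taxonomy-file names that are not unique names in the name file."""
    uniques = unique_names(name_lines)
    return [
        line.split()[0]
        for line in taxonomy_lines
        if line.split() and line.split()[0] not in uniques
    ]


# Toy data (hypothetical): seqB is only a member of seqA's row,
# so it should not have its own line in the taxonomy file.
name_lines = ["seqA\tseqA,seqB", "seqC\tseqC"]
taxonomy_lines = [
    "seqA\tBacteria;Firmicutes;",
    "seqB\tBacteria;Firmicutes;",
]

print(taxonomy_ids_not_unique(taxonomy_lines, name_lines))
print(rows_not_starting_with_self(name_lines))
```

Any name reported by the first check is a likely candidate for the "is not in your namefile" error, since the taxonomy file and name file came from different points in the pipeline.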

classify.otu(list=218.a2.a3.final.an.list, name=218.a2.a3.final.names, taxonomy=218.a2.a3.final.pds.wang.taxonomy, label=1)

The classify.otu command finds the consensus taxonomy for the OTUs in your list file. You do not have label 1 because this list file was created by one of the cluster commands using average neighbor, hence the “an” tag, so its labels are distance cutoffs such as 0.07. To cluster using taxonomy, you want the phylotype command: http://www.mothur.org/wiki/Phylotype.
