discrepancies between name files and group files

I’ve had problems with discrepant data in my name and group files. First, when I get to the pre-cluster command, I’m getting a discrepancy between the two files that’s shutting me down. It seems I’m missing something. My final fasta and names files prior to pre-cluster are:

218.a2.a3.trim.unique.unique.good.filter.names
218.a2.a3.trim.unique.unique.good.filter.unique.fasta

Your groupfile at that point is:
group=GQY1XT001.shhh.good.groups

Which implies that you’ve somehow trimmed your group file as you proceeded. How? My group file at that point is the one I created at the very beginning, using the merge.files and make.groups commands.

I DID figure out a work-around – to wait to make my groups and merge my reads just prior to preclustering. Problem is then I have to redo my alignment, filtering and screening. I’d imagine the net effect is just as though I created three separate alignments in Sequencher and then brought them together.

Then, a similar problem pops up when I generate my phylotypes. I started by using the taxonomy I generated when I first classified, and the name file equivalent to your final.names. I get “[ERROR]: … is not in your namefile, please correct.”

My workaround was to reclassify my final fasta and name files and the last group file. It seems to have worked, but in both cases, I think there must be a more reasonable, less Byzantine way to proceed.

Finally, a smaller (?) problem: when I classify my phylotypes, I input: classify.otu(list=218.a2.a3.final.an.list, name=218.a2.a3.final.names, taxonomy=218.a2.a3.final.pds.wang.taxonomy, label=1)

For some reason, I get back:
Your file does not include the label 1. I will use 0.07.

0.07 729

Output File Names:
218.a2.a3.final.an.0.07.cons.taxonomy
218.a2.a3.final.an.0.07.cons.tax.summary

Thanks,
Mike



Which implies that you’ve somehow trimmed your group file as you proceeded. How?

The group file went through the shhh.flows command (shhh) and screen.seqs command (good).
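If it helps to see where the two files drift apart, here is a rough Python sketch (not a mothur command) that compares the sequence names in a name file against those in a group file. It assumes the usual two-column, whitespace-separated layouts: name file = representative name, then a comma-separated list of names; group file = sequence name, then group. The function names and the example data are made up for illustration.

```python
def names_in_name_file(name_lines):
    """Collect every sequence name listed in the second column of a name file."""
    all_names = set()
    for line in name_lines:
        parts = line.split()
        if len(parts) == 2:
            all_names.update(parts[1].split(","))
    return all_names


def names_in_group_file(group_lines):
    """Collect every sequence name listed in the first column of a group file."""
    all_names = set()
    for line in group_lines:
        parts = line.split()
        if len(parts) == 2:
            all_names.add(parts[0])
    return all_names


def compare_name_and_group(name_lines, group_lines):
    """Return (names missing from the group file, names missing from the name file)."""
    in_names = names_in_name_file(name_lines)
    in_groups = names_in_group_file(group_lines)
    return in_names - in_groups, in_groups - in_names


# Hypothetical data: seqX was screened out of the name file,
# but the group file was never trimmed to match.
name_lines = ["seqA\tseqA,seqB", "seqC\tseqC"]
group_lines = ["seqA\tsample1", "seqB\tsample1", "seqC\tsample2", "seqX\tsample2"]

only_in_names, only_in_groups = compare_name_and_group(name_lines, group_lines)
print("In name file but not group file:", sorted(only_in_names))
print("In group file but not name file:", sorted(only_in_groups))
```

With real files you would pass the lines of each opened file instead of the lists above. Any name that shows up in only one of the two sets points to a step where one file was trimmed and the other was not.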

Then, a similar problem pops up when I generate my phylotypes. I started by using the taxonomy I generated when I first classified, and the name file equivalent to your final.names. I get “[ERROR]: … is not in your namefile, please correct.”


Here's a link to mothur's name file format, http://www.mothur.org/wiki/Name_file. The first column is the unique name. Unique names are the only names that should be in fasta and taxonomy files. The second column is the list of the sequences the unique sequence represents. This list should start with the unique sequence itself.
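To track down which names trigger the "is not in your namefile" error, a small Python check along these lines may help. It is only a sketch: it assumes the name file is two tab-separated columns as described above, and that the taxonomy file has the sequence name in its first tab-separated column. The function names and example lines are hypothetical.

```python
def read_name_file(name_lines):
    """Map each first-column (unique) name to the list of names it represents."""
    names = {}
    for line in name_lines:
        line = line.strip()
        if not line:
            continue
        representative, members = line.split("\t", 1)
        names[representative] = members.split(",")
    return names


def taxonomy_ids_missing_from_names(taxonomy_lines, names):
    """Return taxonomy IDs that are not first-column entries of the name file."""
    missing = []
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue
        seq_id = line.split("\t", 1)[0]
        if seq_id not in names:
            missing.append(seq_id)
    return missing


# Hypothetical data: seqB is only a redundant copy of seqA,
# so it should not appear in the taxonomy file.
name_lines = ["seqA\tseqA,seqB,seqC", "seqD\tseqD"]
taxonomy_lines = [
    "seqA\tBacteria(100);Firmicutes(98);",
    "seqB\tBacteria(100);Firmicutes(97);",
]

names = read_name_file(name_lines)
missing = taxonomy_ids_missing_from_names(taxonomy_lines, names)
print(missing)
```

A taxonomy generated early in the pipeline will usually list names that later steps have merged or removed, which is consistent with reclassifying the final fasta and name files making the error go away.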

classify.otu(list=218.a2.a3.final.an.list, name=218.a2.a3.final.names, taxonomy=218.a2.a3.final.pds.wang.taxonomy, label=1)

The classify.otu command finds the consensus taxonomy for the OTUs in your list file. You do not have label 1 because this list file was created by one of the cluster commands using average neighbor, hence the “an” tag. To cluster using taxonomy, you want the phylotype command, http://www.mothur.org/wiki/Phylotype.
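If you want to see which labels a list file actually contains before passing label= to a command, something like the following Python sketch could be used. It assumes the list file layout of one tab-separated line per label: the label, the number of OTUs, then one column per OTU. It also skips a header line beginning with "label", which some mothur versions may write; treat that as an assumption. The function name and example lines are made up.

```python
def list_file_labels(list_lines):
    """Return (label, number of OTUs) for each data line of a list file."""
    labels = []
    for line in list_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and an optional header row.
        if fields[0] in ("", "label"):
            continue
        labels.append((fields[0], int(fields[1])))
    return labels


# Hypothetical list file with two labels and no label "1".
list_lines = [
    "unique\t3\tseqA\tseqB\tseqC",
    "0.07\t2\tseqA,seqB\tseqC",
]

for label, num_otus in list_file_labels(list_lines):
    print(label, num_otus)
```

A list file produced by distance-based clustering carries distance labels such as 0.07, so asking for label=1 finds nothing and mothur falls back to a label that is present.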