filenames, not what they should be? and a group problem

Hi All
A statement and some related questions about my use of the 454 SOP with v1.37.

Firstly, when I run remove.seqs the names file I get ends in precluster.pick.names. Two steps later, the remove.lineage name output has an identical file name. I gather from the online SOP info (and it seems very logical) that the latter should end in precluster.pick.pick.names, but it is missing a ‘pick’. I don’t know why this is and yes, I’ve tried it a few times. Anyone else find this?

Regarding groups, I have a specific question regarding the final.groups file mentioned in the 454 SOP. I realise in that instance that final.groups was ultimately derived (renamed) from the pick.pick.pick.groups file, but my question is at what step was that file generated please? I haven’t used a mock community in my sequencing so have obviously omitted that step of the 454 SOP. The last groups file generated therefore was a result of the remove.seqs command, ending with good.pick.groups. When I use this however in the make.shared command, mothur doesn’t like it; it appears to be missing a lot of sequences compared to the list file.

Have I gone wrong somewhere? Thanks in advance for any suggestions

>Firstly, when I run remove.seqs the names file I get ends in precluster.pick.names. Two steps later, the remove.lineage name output has an identical file name. I gather from the online SOP info (and it seems very logical) that the latter should end in precluster.pick.pick.names, but it is missing a ‘pick’. I don’t know why this is and yes, I’ve tried it a few times. Anyone else find this?

Can you post the commands you are entering with the files being generated?

>Regarding groups, I have a specific question regarding the final.groups file mentioned in the 454 SOP. I realise in that instance that final.groups was ultimately derived (renamed) from the pick.pick.pick.groups file, but my question is at what step was that file generated please? I haven’t used a mock community in my sequencing so have obviously omitted that step of the 454 SOP. The last groups file generated therefore was a result of the remove.seqs command, ending with good.pick.groups. When I use this however in the make.shared command, mothur doesn’t like it; it appears to be missing a lot of sequences compared to the list file.

In your case, it probably would have been the file generated after remove.lineage.

Pat

Hi Pat, thanks for your reply. Here are the commands and the resulting files, beginning from unique.seqs. You’ll note that I don’t get any new group file generated with remove.lineage either, hence my confusion.

unique.seqs(fasta=Kelly3832B.shhh.trim.unique.good.filter.fasta, name=Kelly3832B.shhh.trim.unique.good.names)
Outputs: Kelly3832B.shhh.trim.unique.good.filter.names
Kelly3832B.shhh.trim.unique.good.filter.unique.fasta

precluster(fasta=Kelly3832B.shhh.trim.unique.good.filter.unique.fasta, name=Kelly3832B.shhh.trim.unique.good.filter.names, group=Kelly3832B.shhh.good.groups, diffs=2)
Outputs: Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.unique.names
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.unique.fasta
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.names
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.fasta
& for each sample Kelly3832B.shhh.trim.unique.good.filter.precluster.sample.map

chimera.uchime(fasta=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.fasta, name=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.names, group=Kelly3832B.shhh.good.groups)
Outputs: Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.denovo.uchime.chimera
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.denovo.uchime.accnos

remove.seqs(accnos=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.denovo.uchime.accnos, fasta=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.fasta, name=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.names, group=Kelly3832B.shhh.good.groups, dups=T)
Outputs: Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.names
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.fasta
Kelly3832B.shhh.good.pick.groups

classify.seqs(fasta=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.fasta, name=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.names, group=Kelly3832B.shhh.good.pick.groups, template=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)
Outputs: Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pds.wang.tax.summary

remove.lineage(fasta=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.fasta, name=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.names, taxonomy=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Mitochondria-Chloroplast-Archaea-Eukaryota-unknown)
Outputs: Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.names
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pick.fasta

dist.seqs(fasta=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.15)
Output: Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pick.dist

cluster(column=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pick.dist, name=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.names)
[Note: I realise the names file should end in pick.pick.names; however, as the previous names file only had one ‘pick’, I used that one here because pick.pick.names does not exist.]
Outputs: Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pick.an.sabund
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pick.an.rabund
Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pick.an.list

I can’t progress further, as the groups files generated to date are not appropriate to use with the list file…

>remove.lineage(fasta=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.fasta, name=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.names, taxonomy=Kelly3832B.shhh.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Mitochondria-Chloroplast-Archaea-Eukaryota-unknown)

You didn’t include your group file in remove.lineage.
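
If it helps to confirm this, the mismatch can be checked outside mothur. Below is a minimal Python sketch, not part of mothur, assuming the usual tab-separated layouts: a names file with a representative name followed by a comma-separated list of the names it stands for, and a groups file with a sequence name followed by its group. The function names (read_names, read_groups, compare) and the toy sequence names are made up for illustration.

```python
import io


def read_names(handle):
    """Collect every sequence name listed in a names file.

    Assumed layout per line: representative<TAB>name1,name2,...
    """
    names = set()
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        names.update(fields[1].split(","))
    return names


def read_groups(handle):
    """Collect every sequence name listed in a groups file.

    Assumed layout per line: sequence_name<TAB>group_name
    """
    return {line.split()[0] for line in handle if line.strip()}


def compare(names_handle, groups_handle):
    """Return (names missing from the groups file, names only in the groups file)."""
    in_names = read_names(names_handle)
    in_groups = read_groups(groups_handle)
    return sorted(in_names - in_groups), sorted(in_groups - in_names)


if __name__ == "__main__":
    # Toy data only: seqD was removed from the names file but is still
    # present in the groups file, as happens when a filtering step is run
    # without the group file.
    names = io.StringIO("seqA\tseqA,seqB\nseqC\tseqC\n")
    groups = io.StringIO(
        "seqA\tsample1\nseqB\tsample1\nseqC\tsample2\nseqD\tsample2\n"
    )
    missing, extra = compare(names, groups)
    print("in names but not in groups:", missing)
    print("in groups but not in names:", extra)
```

Pointing the two readers at the names file given to cluster and the groups file given to make.shared should show which names appear in only one of them.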


Also, you're running chimera.uchime the way we have it described for 454 data. We've shifted our thinking and now think it's probably wisest to do what we do with the MiSeq data:
mothur > chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
mothur > remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos)

Fab, thanks Pat, I don’t know how I missed the group file in remove.lineage. That should work now. As for chimera.uchime, I gather you mean running it with a fasta and a count file instead of the 454 way, generating the count file by running count.seqs with the current name and group files? Is there a particular reason why this would be better? Out of curiosity I’ll run both later and see what difference it makes.

Thanks again for your keen eye.

Best,
Laura

The name/group and count table approaches should give the same or very similar results. Using a count table should use less memory and make it easier to keep track of things.
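
To illustrate why the two approaches carry the same information, here is a rough Python sketch of how a names file and a groups file can be collapsed into one row per unique sequence with a total and a count per group. It is only an illustration under assumed file layouts, not mothur's own code, and the row layout is a toy one rather than the exact count file format. The names build_count_table and as_rows are hypothetical.

```python
import io
from collections import Counter


def build_count_table(names_handle, groups_handle):
    """Map each representative sequence to a Counter of reads per group.

    Assumed layouts: names file is representative<TAB>name1,name2,...
    and groups file is sequence_name<TAB>group_name.
    """
    group_of = {}
    for line in groups_handle:
        fields = line.split()
        if len(fields) >= 2:
            group_of[fields[0]] = fields[1]

    table = {}
    for line in names_handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rep, members = fields[0], fields[1].split(",")
        # A KeyError here means a name is absent from the groups file,
        # i.e. the two files are out of sync.
        table[rep] = Counter(group_of[m] for m in members)
    return table


def as_rows(table):
    """Flatten the table into a header row plus one row per representative."""
    groups = sorted({g for counts in table.values() for g in counts})
    rows = [["sequence", "total"] + groups]
    for rep, counts in table.items():
        rows.append([rep, sum(counts.values())] + [counts[g] for g in groups])
    return rows


if __name__ == "__main__":
    # Toy data only: four reads collapse to two unique sequences.
    names = io.StringIO("seqA\tseqA,seqB,seqC\nseqD\tseqD\n")
    groups = io.StringIO(
        "seqA\ts1\nseqB\ts1\nseqC\ts2\nseqD\ts2\n"
    )
    for row in as_rows(build_count_table(names, groups)):
        print("\t".join(str(cell) for cell in row))
```

The groups file needs one line per read, whereas the combined table needs one row per unique sequence, which is where the saving comes from, and there is one file to carry through each step instead of two.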

Pat