Hello! After I finished my mothur analysis (a publication is coming), I realized that the sequences I used are not as good as I’d like. Let me explain why.
Before running the “align.seqs” command I have approximately 90,000 sequences. When I run the alignment, each sequence is reported with its best hit, but the output doesn’t discriminate between good and bad alignments.
Many sequences have a similarity between query and hit under 90%, and I’d like to use only sequences with a similarity of at least around 97% (approx. 17,000 of my sequences). I’ve tried Perl scripts to extract only the sequences I need from the fasta file that I use in the alignment and then repeat the alignment, but when I continue with the analysis I get a lot of errors related to inconsistencies between the groups, names and fasta files.
Here is an example of those errors:
mothur > pre.cluster(fasta=sff.unique.good.filter.unique.fasta, name=sff.unique.good.filter.names, group=sff.good.groups, diffs=2)
Using 1 processors.
[ERROR]: Your name file contains 73043 valid sequences, and your groupfile contains 199832, please correct.
/******************************************/
Running command: unique.seqs(fasta=sff.unique.good.filter.unique.precluster.fasta, name=sff.unique.good.filter.unique.precluster.names)
[ERROR]: sff.unique.good.filter.unique.precluster.fasta is blank, aborting.
Using sff.unique.good.filter.unique.fasta as input file for the fasta parameter.
[ERROR]: sff.unique.good.filter.unique.precluster.names is blank, aborting.
/******************************************/
My question is: is there a command that tells mothur “please remove all the sequences with alignments under XX% from my fasta, groups and names files”?
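To make it concrete, this is roughly the filtering I am trying to get right, written as a minimal Python sketch rather than my actual Perl script. It assumes I already have a set of sequence IDs to keep (for example, the ones with at least 97% similarity), that the names file has two tab-separated columns (the representative sequence, then a comma-separated list of the sequences it stands for), and that the groups file has one sequence and its group per line. The function names are my own and are not mothur commands.

```python
def _join(lines):
    """Join kept lines back into file text with a trailing newline."""
    return "\n".join(lines) + ("\n" if lines else "")


def filter_fasta(fasta_text, keep):
    """Keep only the fasta records whose ID (first word after '>') is in keep."""
    out = []
    writing = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep
        if writing:
            out.append(line)
    return _join(out)


def filter_names(names_text, keep):
    """Keep names-file lines whose representative is in keep.

    Also returns every sequence name listed on the kept lines, so the
    groups file can be filtered to exactly the same set of sequences.
    """
    out = []
    members = set()
    for line in names_text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        representative, duplicates = parts[0], parts[1]
        if representative in keep:
            out.append(line.strip())
            members.update(duplicates.split(","))
    return _join(out), members


def filter_groups(groups_text, members):
    """Keep groups-file lines whose sequence name is in members."""
    out = []
    for line in groups_text.splitlines():
        parts = line.split()
        if parts and parts[0] in members:
            out.append(line.strip())
    return _join(out)
```

The point of the sketch is that the groups file has to be filtered with the names collected from the names file, not with the IDs from the fasta file alone. Otherwise the names and groups files end up with different sequence counts, which is the kind of mismatch shown in the error above.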
I’d be really glad if somebody could help me.
Oscar.