start>end screen.seqs

Hey,
I received many fasta files and have to count their OTUs.
I searched the internet and found mothur. I tested the program, worked through the tutorials and read the forum. After that I started to analyze my fasta files.
I got all the necessary results for my first two projects.
I'm using the current mothur version and the current silva database from the mothur website.
With the third file I have a problem. After the align step, the start position that screen.seqs reports is larger than the end position. I had this problem before and read here that flip=t during the align step should help. It worked with another file, but not with this data. Can somebody with more knowledge and experience help me?

These are the commands I used:

make.group(fasta=C:\Users\TEMP\Desktop\mothur\test1.fa, groups=test)
screen.seqs(fasta=C:\Users\TEMP\Desktop\mothur\test1.fa, group=C:\Users\TEMP\Desktop\mothur\test1.groups, maxambig=0, optimize=start-end, criteria=98)
unique.seqs(fasta=C:\Users\TEMP\Desktop\mothur\test1.good.fa)
count.seqs(name=C:\Users\TEMP\Desktop\mothur\test1.good.names, group=C:\Users\TEMP\Desktop\mothur\test1.good.groups)
align.seqs(fasta=C:\Users\TEMP\Desktop\mothur\test1.good.unique.fa, reference=C:\Users\TEMP\Desktop\mothur\silva.nr_v119.align)
I also ran the same steps with flip=t in align.seqs:
align.seqs(fasta=C:\Users\TEMP\Desktop\mothur\test1.good.unique.fa, reference=C:\Users\TEMP\Desktop\mothur\silva.nr_v119.align, flip=t)

I get this in the summary file both ways (with and without flip=t):
            Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:    -1       -1       0        0       1        1
2.5%-tile:  1046     1046     1        0       1        1323
25%-tile:   1046     13858    10       0       3        13227
Median:     1142     13858    411      0       5        26454
75%-tile:   1142     13858    463      0       5        39680
97.5%-tile: 43113    43116    508      0       6        51584
Maximum:    43116    43116    613      0       12       52906
Mean:       7488.44  15991.9  302.294  0       4.12135

# of unique seqs: 47874

total # of seqs: 52906

screen.seqs(fasta=C:\Users\TEMP\Desktop\mothur\test1.good.unique.align, summary=C:\Users\TEMP\Desktop\mothur\test1.good.unique.summary, count=C:\Users\TEMP\Desktop\mothur\test1.good.count_table, optimize=start-end, criteria=98)
Using 1 processors.
Optimizing start to 43113.
Optimizing end to 1046.

Without a criteria value, using only optimize=start-end, I got:
Optimizing start to 42616
Optimizing end to 6333.

I assume I can't simply ignore it? Sorry, I don't have enough knowledge to interpret the data :-(
Does somebody have a solution? What can I do differently?

Thanks in advance!

Looks like you have some messy data. I suspect criteria=98 is too high. Instead, I’d suggest start=1142, end=13858 in screen.seqs. This means that you’ll keep sequences that start at or before 1142 and end at or after 13858. You’ll probably lose more than 2% of sequences.

Pat
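
For anyone hitting the same thing, here is a rough Python sketch of why a percentile-based cutoff can end up with start greater than end, and what the fixed start/end rule from the answer above does instead. It is not mothur code: the helper names and the toy coordinates are made up, and the percentile rule is only an assumption that happens to be consistent with the "Optimizing start to 43113 / end to 1046" output in the question.

```python
# Toy (start, end) alignment coordinates, invented for illustration.
# They are NOT the poster's reads; only the positions echo the summary above.
coords = (
    [(1046, 13858)] * 5      # reads covering the expected region
    + [(1142, 13858)] * 3    # same region, slightly later start
    + [(1046, 1046)]         # a read that barely aligned
    + [(43113, 43116)]       # a read aligned to the far end of the reference
)


def percentile_cutoffs(coords, criteria):
    """Assumed optimize-style rule (hypothetical helper, not mothur code):
    take the start near the criteria-th percentile of the sorted starts and
    the end near the (100 - criteria)-th percentile of the sorted ends."""
    starts = sorted(start for start, _ in coords)
    ends = sorted(end for _, end in coords)
    n = len(coords)
    start_idx = min(n - 1, n * criteria // 100)
    end_idx = n * (100 - criteria) // 100
    return starts[start_idx], ends[end_idx]


def keep_fixed(coords, start, end):
    """Keep reads that start at or before `start` and end at or after `end`."""
    return [(s, e) for s, e in coords if s <= start and e >= end]


opt_start, opt_end = percentile_cutoffs(coords, 98)
print(opt_start, opt_end)                           # 43113 1046
print(len(keep_fixed(coords, opt_start, opt_end)))  # 10
print(len(keep_fixed(coords, 1142, 13858)))         # 8
```

In this toy set a couple of badly aligned reads are enough to drag the high-percentile start and the low-percentile end to opposite ends of the alignment, so the "optimized" window removes nothing. Fixed cutoffs taken from the median rows of the summary drop the stray reads instead.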

Thanks, I ran it again with start and end set as you suggested, and everything else worked fine.
Thanks for the help! I'll try it the same way with my other fasta files.