Problems with screen.seqs and filter.seqs commands



I’m very new to working with mothur and 454 sequences, and I have been running into some problems.

I have my pyrosequencing results in .fna format, so I started the tutorial at the “using trim.seqs” step. I am working with fungal ITS sequences. Everything went well until the alignment step. I aligned my sequences against the UNITE database, and when I looked at the results with the summary.seqs command I realized that most of my sequences do not overlap in the same region, so I tried screen.seqs with the option minlength=400.

After that, I tried to run filter.seqs and got the message "Sequences are not all the same length, please correct."

I do not know how to fix that problem, because my sequences are quite long but are aligned badly, and so I cannot run later steps to get one .fasta file that allows me to identify and remove chimeras and contaminants and start my analysis.



Miguel Ángel

Are you sure that you’re using the aligned sequences as input to screen.seqs?

To align my sequences I am using the UNITE database, with the following command:

mothur > align.seqs(fasta=hongos_pia0.trim.unique.fasta, reference=unite.fungal.fasta, processors=1)

After doing that I get these three files:


With the first, I get this:

mothur > summary.seqs(fasta=hongos_pia0.trim.unique.align, name=hongos_pia0.trim.names)

             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1        3        3        0       1        1
2.5%-tile:   1        10       9        0       2        35
25%-tile:    1        444      316      0       4        348
Median:      1        462      447      0       5        695
75%-tile:    810      889      470      0       6        1042
97.5%-tile:  1361     1694     498      0       6        1355
Maximum:     1666     2197     534      0       8        1389
Mean:        312.05   695.058  375.318  0       4.99136
# of unique seqs: 1276
total # of seqs: 1389

Then, I do this:

mothur > screen.seqs(fasta=hongos_pia0.trim.unique.align, name=hongos_pia0.trim.names, minlength=400)

And get these:

Output File Names:

And finally, when I do:
mothur > filter.seqs(fasta=hongos_pia0.trim.unique.good.align, vertical=T, trump=., processors=1)

I get the error message:

Creating Filter…
Sequences are not all the same length, please correct.



I was wondering whether it would be possible to have screen.seqs optimize both start and end with criteria=95, instead of using minlength as the criterion for keeping sequences.

That way I would get a start and an end position shared by 95% of the sequences, regardless of their length (for now).
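
To make the idea concrete, here is a minimal Python sketch of what "a start and an end shared by 95% of the sequences" could mean for a set of aligned sequences. It is only an illustration under stated assumptions, not mothur's implementation: the function names `start_end` and `shared_window` are made up for this example, gaps are assumed to be written as `-` or `.`, and start and end are handled independently, so slightly fewer than 95% of the sequences may satisfy both at once.

```python
import math

GAP_CHARS = "-."  # assumption: alignment gaps are written as '-' or '.'


def start_end(aligned_seq):
    """Return the 1-based columns of the first and last base, or None if all gaps."""
    cols = [i for i, ch in enumerate(aligned_seq, start=1) if ch not in GAP_CHARS]
    if not cols:
        return None
    return cols[0], cols[-1]


def shared_window(aligned_seqs, criteria=95):
    """Pick a start that `criteria` percent of sequences begin at or before,
    and an end that `criteria` percent of sequences reach or pass.

    Start and end are chosen independently of each other.
    """
    positions = [p for p in map(start_end, aligned_seqs) if p is not None]
    if not positions:
        raise ValueError("no sequences containing bases")
    # Number of sequences that must satisfy each cutoff (at least one).
    need = max(1, math.ceil(len(positions) * criteria / 100))
    starts = sorted(s for s, _ in positions)               # ascending
    ends = sorted((e for _, e in positions), reverse=True)  # descending
    return starts[need - 1], ends[need - 1]


# Tiny made-up alignment, 8 columns wide.
demo = ["..ACGT..", ".ACGTA..", "ACGTACG.", "...ACGTA"]
print(shared_window(demo, criteria=75))
```

A window like this is only meaningful if the sequences really are in a common alignment, that is, if every sequence has the same number of columns.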



I’m not familiar with the unite database - is it actually aligned?
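
One quick way to check is to compare sequence lengths in the reference file: in an aligned FASTA file every record has the same length once gap characters are counted, which is the same condition the filter.seqs error message complains about. A minimal Python sketch, assuming a plain FASTA file (the function names here are made up for illustration and are not part of mothur):

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_counts(lines):
    """Count how many sequences have each length, gap characters included."""
    return Counter(len(seq) for _, seq in read_fasta(lines))


def looks_aligned(lines):
    """True if the file has at least one sequence and all share one length."""
    return len(length_counts(lines)) == 1
```

To run it on a real file, pass an open file handle, for example `with open("unite.fungal.fasta") as handle: print(length_counts(handle))`. More than one distinct length means the file is not a usable alignment.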


I’m not sure about that, I’ll check it.

Can you recommend another database for fungal ITS sequences?



I am receiving the “Sequences are not all the same length” error when trying to filter the alignment after screening sequences. I have aligned my sequences with the silva.bacteria.fasta reference set. I haven’t had any problems running this analysis previously. Do you have any suggestions on how to correct this error?

I don’t think UNITE is aligned.