I’m very new to working with MOTHUR and 454 sequences, and I have been running into some problems.
I have my pyrosequencing results in .fna format, so I started the tutorial at the “using trim.seqs” step. I am working with fungal ITS sequences. Everything went well until the alignment step. I aligned my sequences against the UNITE database, and when I looked at the results with the summary.seqs command I realized that most of my sequences do not overlap in the same region, so I ran screen.seqs with the option minlength=400.
After that, I tried to run filter.seqs and got the message "Sequences are not all the same length, please correct."
I do not know how to fix this: my sequences are quite long but badly aligned, so I cannot run the later steps to get a single .fasta file from which I can identify and remove chimeras and contaminants and start my analysis.
Thanks a lot.
Are you sure that you’re using the aligned sequences as input to screen.seqs?
To align my sequences I am using the UNITE database, with the following command:
mothur > align.seqs(fasta=hongos_pia0.trim.unique.fasta, reference=unite.fungal.fasta, processors=1)
After doing that I get these three files:
With the first, I get this:
mothur > summary.seqs(fasta=hongos_pia0.trim.unique.align, name=hongos_pia0.trim.names)
             Start   End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1       3        3        0       1        1
2.5%-tile:   1       10       9        0       2        35
25%-tile:    1       444      316      0       4        348
Median:      1       462      447      0       5        695
75%-tile:    810     889      470      0       6        1042
97.5%-tile:  1361    1694     498      0       6        1355
Maximum:     1666    2197     534      0       8        1389
Mean:        312.05  695.058  375.318  0       4.99136
# of unique seqs: 1276
total # of seqs: 1389
Then, I do this:
mothur > screen.seqs(fasta=hongos_pia0.trim.unique.align, name=hongos_pia0.trim.names, minlength=400)
And get these:
Output File Names:
And finally, when I do:
mothur > filter.seqs(fasta=hongos_pia0.trim.unique.good.align, vertical=T, trump=., processors=1)
I get the error message:
Sequences are not all the same length, please correct.
Thanks for your time
I was wondering whether screen.seqs could instead be optimized for both start and end with criteria=95, rather than using minlength as the criterion for keeping sequences.
That way I would get a start and an end shared by 95% of the sequences, regardless of length (for now).
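To make the idea concrete, here is a rough Python sketch of what a start and end "shared by 95% of the sequences" could mean. It is only an illustration, not mothur's own implementation: the function names `span` and `shared_window` are made up for this example, each sequence is counted once (a names file is ignored), and it assumes the input is a real alignment in which `-` and `.` mark gaps. The start is chosen so that the requested fraction of sequences begin at or before it, and the end so that the same fraction finish at or after it; the set of sequences meeting both conditions can be somewhat smaller than that fraction.

```python
import math

GAP_CHARS = "-."  # assumption: '-' and '.' are the gap characters in the alignment


def span(aligned_seq):
    """Return the 1-based (start, end) columns of the first and last base, or None if all gaps."""
    positions = [i for i, ch in enumerate(aligned_seq, start=1) if ch not in GAP_CHARS]
    if not positions:
        return None
    return positions[0], positions[-1]


def shared_window(aligned_seqs, criteria=95):
    """Pick a start/end such that `criteria` percent of sequences cover each boundary."""
    spans = [s for s in map(span, aligned_seqs) if s is not None]
    if not spans:
        raise ValueError("no sequences containing bases")
    # Number of sequences that must reach each boundary.
    keep = max(1, math.ceil(len(spans) * criteria / 100))
    starts = sorted(start for start, _ in spans)
    ends = sorted((end for _, end in spans), reverse=True)
    # keep-th smallest start: `keep` sequences begin at or before this column.
    # keep-th largest end: `keep` sequences finish at or after this column.
    return starts[keep - 1], ends[keep - 1]


if __name__ == "__main__":
    toy = ["ACGT--", "-CGTA-", "--GTAC", ".CGT.."]
    print(shared_window(toy, criteria=75))
```

On a badly scattered alignment like the one summarized above, a window chosen this way may turn out to be very narrow or even empty (start after end), which is itself a sign that the alignment needs attention first.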
I’m not familiar with the unite database - is it actually aligned?
I’m not sure about that, I’ll check it.
Can you recommend another database for fungal ITS sequences?
I am getting the "Sequences are not all the same length" error when trying to filter the alignment after screening sequences. I have aligned my sequences with the silva.bacteria.fasta reference set. I haven’t had any problems running this analysis previously. Do you have any suggestions on how to fix this error?
I don’t think UNITE is aligned.
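A quick way to check whether a reference file (or an .align output) is actually aligned is to see whether every record in it has the same length, since in an alignment all sequences are padded with gaps to the same number of columns. The sketch below is a minimal, standalone Python check and is not part of mothur; the helper names `read_fasta`, `length_counts`, and `looks_aligned` are invented for this example, and the file name in the usage lines is just the one from the commands above.

```python
from collections import Counter


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file, joining wrapped lines."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                parts = line[1:].split()
                name, chunks = (parts[0] if parts else ""), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_counts(path):
    """Count how many records have each sequence length (gap characters included)."""
    return Counter(len(seq) for _, seq in read_fasta(path))


def looks_aligned(path):
    """True only if the file has at least one record and all records share one length."""
    return len(length_counts(path)) == 1


if __name__ == "__main__":
    import os

    reference = "unite.fungal.fasta"  # file name taken from the align.seqs command above
    if os.path.exists(reference):
        print(looks_aligned(reference))
        print(length_counts(reference).most_common(5))
```

If the check reports many different lengths, the file holds unaligned sequences, and anything aligned against it will not end up with a common length either.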