Hey guys,
I am completely new to the “sequence analysis world” and I got 454 sequences generated with the FLX+ technique. That means very long reads (up to 950 bp), and they begin randomly with either the forward or the reverse primer. I figured out how to trim the sequences, and that I have to create the group file on my own because the barcodes were already removed. I aligned my sequences to the silva.bacteria reference, but now I am completely stuck because I can't follow the tutorials: my sequences look completely different. The summary after aligning looks like this:
                Start    End      NBases   Ambigs    Polymer  NumSeqs
Minimum:        0        0        0        0         1        1
2.5%-tile:      1044     1051     2        0         1        2633
25%-tile:       1044     25294    640      0         5        26326
Median:         1044     27654    731      0         5        52652
75%-tile:       3161     27654    822      0         6        78977
97.5%-tile:     7930     28467    874      2         7        102670
Maximum:        43116    43117    945      7         8        105302
Mean:           2824.43  25172.2  694.752  0.189332  5.17678
# of unique seqs:   99596
total # of seqs:    105302
The high number of unique sequences seems to be caused by the high variance in sequence length. What should I do now? When I try to bring the sequences to a similar start and end position, I lose about 65% of my sequences. Is this variance also caused by the sequencing having started from either the reverse or the forward primer?
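To make the orientation question more concrete, here is a minimal Python sketch of what I imagine might be needed before aligning: check which primer a read starts with and reverse-complement the reads that start with the reverse primer. The primer strings below are made-up placeholders (not my real primers), `reverse_complement` and `orient` are just names I chose for illustration, and the sketch only handles exact primer matches without mismatches or ambiguity codes. I am not sure this is the right approach at all.

```python
# Hypothetical placeholder primers for illustration only; not real primers.
FORWARD_PRIMER = "ACGTAC"
REVERSE_PRIMER = "TTGCAA"

# Translation table that maps each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(seq, forward_primer=FORWARD_PRIMER, reverse_primer=REVERSE_PRIMER):
    """Return (sequence, label) with reverse-primed reads flipped.

    Assumption: the primer sits at the very start of the read and
    matches exactly. Reads matching neither primer are left unchanged.
    """
    if seq.startswith(forward_primer):
        return seq, "forward"
    if seq.startswith(reverse_primer):
        return reverse_complement(seq), "reverse"
    return seq, "unknown"


if __name__ == "__main__":
    for read in ["ACGTACGGGG", "TTGCAAGGGC", "GGGGGGGGGG"]:
        print(orient(read))
```

If something like this is the way to go, all reads would at least point in the same direction, although reads started from opposite primers would still cover different ends of the amplicon.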
I have no idea what to do and I would be so grateful for any help!
I wish I had known beforehand that this form of sequencing causes so much trouble…
Thank you so much!