Hi,
I have found a problem with some samples from the last pyrosequencing run (all samples were ampliefied with same set of primers). I have followed the 454 sop to get the fasta, names and groups files per sample (allfiles=T in the trim.seqs command). Then I merged fasta, names and groups files according to different experiments (bagazo and purines2), and when I started to process the files (unique.seqs, then align.seqs with silva.bacteria.fasta) I found that after align.seqs samples from purines2 experiment give some strange results as compared with samples from bagazo experiment. I paste below the summary.seqs for the starting fasta, names and groups files and align.seqs for the samples of two experiments. In both experiments sequences are of at least 200 bp but after align.seqs in purines2 I found that too many sequences will be lost because in the aligment are below 200bp. After looking through the forum I could not find a response and I do not know the reason for these results.
I have uploaded to dropbox the files ( https://www.dropbox.com/sh/iz6gz1loj4z4hsf/AACU_AHqyp1ztoOY7TkTKkI1a?dl=0)
summary.seqs(fasta=bagazo.fasta, name=bagazo.names, processors=8)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 201 201 0 4 1671
25%-tile: 1 212 212 0 4 16704
Median: 1 218 218 0 4 33407
75%-tile: 1 221 221 0 5 50110
97.5%-tile: 1 238 238 0 6 65142
Maximum: 1 459 459 0 8 66812
Mean: 1 218.617 218.617 0 4.30652
of unique seqs: 17854
total # of seqs: 66812
unique.seqs(fasta=bagazo.fasta, name=bagazo.names)
align.seqs(fasta=bagazo.unique.fasta, reference=silva.bacteria.fasta, processors=8)
summary.seqs(fasta=bagazo.unique.align, name=bagazo.unique.names)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 1076 4 0 1 1671
25%-tile: 5456 13125 211 0 4 16704
Median: 6091 13125 218 0 4 33407
75%-tile: 6099 13125 220 0 5 50110
97.5%-tile: 43096 43116 237 0 6 65142
Maximum: 43116 43116 326 0 8 66812
Mean: 7015.83 13799.9 205.111 0 4.15398
of unique seqs: 13514
total # of seqs: 66812
Output File Names:
bagazo.unique.summary
summary.seqs(fasta=purines2.fasta, name=purines2.names, processors=8)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 201 201 0 3 2492
25%-tile: 1 210 210 0 4 24912
Median: 1 218 218 0 4 49823
75%-tile: 1 227 227 0 5 74734
97.5%-tile: 1 246 246 0 6 97154
Maximum: 1 456 456 0 8 99645
Mean: 1 219.047 219.047 0 4.2945
of unique seqs: 39171
total # of seqs: 99645
Output File Names:
purines2.summary
unique.seqs(fasta=purines2.fasta, name=purines2.names) align.seqs(fasta=purines2.unique.fasta, reference=silva.bacteria.fasta, processors=8) summary.seqs(fasta=purines2.unique.align, name=purines2.unique.names)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: -1 -1 0 0 1 2492
25%-tile: 1044 1106 4 0 1 24912
Median: 6091 13125 14 0 3 49823
75%-tile: 43096 43116 213 0 4 74734
97.5%-tile: 43115 43116 235 0 6 97154
Maximum: 43116 43116 332 0 8 99645
Mean: 17652.5 20364.8 86.2844 0 2.87403
of unique seqs: 33653
total # of seqs: 99645
Output File Names:
purines2.unique.summary