Hi!
I am trying to get OTUs from MiSeq data.
I am a little confused by the trump parameter of the filter.seqs command.
When I give “trump=.” I only get 1 sequence at the end, but if I omit this option I get 4430 unique sequences.
I want to know how important it is to use this parameter. Can it be ignored?
These are the results from the commands:
with trump=.:
Length of filtered alignment: 0
Number of columns removed: 50000
Length of the original alignment: 50000
Number of sequences used to construct filter: 5904
without trump:
Length of filtered alignment: 2598
Number of columns removed: 47402
Length of the original alignment: 50000
Number of sequences used to construct filter: 5904
The problem is likely that when you ran screen.seqs you didn’t pick start/end positions that would allow for the sequences to fully overlap. Can you post the output of summary.seqs run on the data going into screen.seqs along with how you are running screen.seqs?
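To illustrate why overlap matters here, below is a minimal Python sketch of a column filter. It is not mothur's implementation. It assumes that a trump character removes any column in which at least one sequence carries that character, and that columns made up only of gap characters are dropped as well. The function name `filter_columns` and the toy reads are made up for the example.

```python
def filter_columns(aligned, trump=None):
    """Drop all-gap columns and, if trump is set, any column where
    at least one sequence has the trump character.

    Illustrative only; this is an assumption about the behaviour,
    not mothur's actual code.
    """
    length = len(aligned[0])
    keep = []
    for col in range(length):
        column = [seq[col] for seq in aligned]
        all_gaps = all(ch in "-." for ch in column)
        trumped = trump is not None and trump in column
        if not all_gaps and not trumped:
            keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in aligned]


# Hypothetical toy alignment: three short reads that each cover a
# different part of the alignment, plus one column that is all gaps.
reads = [
    "ACGT.......",
    "...TTGCA...",
    "......CAGG.",
]

no_trump = filter_columns(reads)
with_trump = filter_columns(reads, trump=".")

print(no_trump)    # only the all-gap last column is removed
print(with_trump)  # every column is removed, so all reads become identical
```

Because no single column is covered by all three toy reads, the trump filter removes every column and the reads collapse to one empty, identical sequence. That is the same pattern as a filtered alignment of length 0 with 1 unique sequence.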
And the screen.seqs command is:
screen.seqs(fasta=/u2/home_u2/sulbha/Miseq/16S_v4/16S_v4_twoENd/16S_all/otu/miseq16S_all_countuniquealign, optimize=start-end, criteria=95, processors=5)
After that, if I run the summary.seqs command I get this:
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 0 0 0 0 1 1
25%-tile: 0 0 0 0 1 1
Median: 0 0 0 0 1 1
75%-tile: 0 0 0 0 1 1
97.5%-tile: 0 0 0 0 1 1
Maximum: 0 0 0 0 1 1
Mean: 1.84467e+19 1.84467e+19 0 0 1
# of Seqs: 1
and without the trump parameter I get:
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1 1 0 1 1
2.5%-tile: 1 6 3 0 1 111
25%-tile: 1 59 8 0 2 1108
Median: 1477 1950 19 0 3 2216
75%-tile: 2573 2598 75 0 4 3323
97.5%-tile: 2594 2598 290 0 5 4320
Maximum: 2597 2598 300 0 13 4430
Mean: 1391.98 1497.88 59.9607 0 3.07201
# of Seqs: 4430
Please let me know where I am going wrong.
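For anyone following along, the start/end screening idea can be sketched in Python as below. It is only a rough sketch, not mothur's code. It assumes a read is kept when it begins at or before the chosen start and ends at or after the chosen end. The helper names and the coordinates in `coords` are hypothetical and are not taken from the real data set.

```python
def spans_window(read_start, read_end, start, end):
    """True if the read covers the whole window [start, end]."""
    return read_start <= start and read_end >= end


def count_spanning(coords, start, end):
    """Count reads whose alignment coordinates cover the window."""
    return sum(spans_window(s, e, start, end) for s, e in coords)


# Hypothetical (start, end) alignment coordinates for four reads.
coords = [(1, 59), (1477, 1950), (2573, 2598), (1, 2598)]

kept = count_spanning(coords, start=1477, end=1950)
print(f"{kept} of {len(coords)} reads span positions 1477-1950")
```

If only a small fraction of reads span any one window, there is no region shared by all sequences, and a filter that requires full overlap will leave nothing behind.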
I think you might have used flip somewhere and you didn’t mean to (or you didn’t and you should have). When you run align.seqs can you use flip=T and then rerun summary.seqs?
Hi Dr. Schloss,
Thank you for the reply.
When I ran the align.seqs command with the flip=T option, I still got only one sequence after using the filter.seqs command.
Here are my commands with results:
summary.seqs(fasta=/u2/home_u2/Miseq/otu/miseq16S_all_countuniquealign)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1044 1051 2 0 1 163
25%-tile: 2543 5428 11 0 2 1628
Median: 26797 31189 65 0 3 3255
75%-tile: 43020 43116 86 0 4 4882
97.5%-tile: 43113 43116 295 0 6 6346
Maximum: 43116 43116 300 0 10 6508
Mean: 23555.4 25623.9 77.7806 0 3.34281
Length of filtered alignment: 0
Number of columns removed: 50000
Length of the original alignment: 50000
Number of sequences used to construct filter: 5960
Please let me know where I am going wrong and how important it is to use the trump parameter. Can't I omit it?
Hello Dr. Schloss,
I have not made contigs. I have MiSeq metagenome sequences and I blasted them against the RDP database. All the reads that hit RDP were taken as 16S rRNA for this analysis.
You suggested this in another post:
"One idea might be to take several bona fide 16S rRNA gene sequences and blast them against your sequence collection to identify those reads with 16S in them and then process those further. The problem with your approach is that anything will classify to something if you push it hard enough.