trump option

Hi!
I am trying to get OTUs from MiSeq data.
I am a little confused by the trump parameter of the filter.seqs command.
When I give “trump=.” I get only 1 sequence at the end, but if I omit this option I get 4430 unique sequences.
I want to know how important it is to use this parameter. Can it be ignored?
These are the results from the commands, first with trump=.:
Length of filtered alignment: 0
Number of columns removed: 50000
Length of the original alignment: 50000
Number of sequences used to construct filter: 5904

without trump:
Length of filtered alignment: 2598
Number of columns removed: 47402
Length of the original alignment: 50000
Number of sequences used to construct filter: 5904

Thanks!

The problem is likely that when you ran screen.seqs you didn’t pick start/end positions that would allow for the sequences to fully overlap. Can you post the output of summary.seqs run on the data going into screen.seqs along with how you are running screen.seqs?

Pat
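For anyone following along, here is a rough way to check whether a candidate start/end window is actually spanned by most of the reads before running screen.seqs. This is only a sketch in Python, not mothur's own logic: the function name `fraction_covering` and the coordinate pairs are made up for illustration, and it assumes you have already pulled each read's start and end alignment positions out of the summary.seqs output.

```python
def fraction_covering(coords, start, end):
    """Fraction of reads whose aligned region spans the window [start, end]."""
    covering = sum(1 for s, e in coords if s <= start and e >= end)
    return covering / len(coords)

# Hypothetical (start, end) pairs in alignment coordinates, not real data.
coords = [(1044, 1074), (1044, 43116), (32542, 37520), (43103, 43116)]

# Only reads that begin at or before `start` and finish at or after `end` count.
print(fraction_covering(coords, 1044, 1074))
print(fraction_covering(coords, 1044, 43116))
```

If no window is covered by a large share of the reads, the reads are scattered along the gene and there is no common region to keep.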

Thanks for the reply.
The output of the summary.seqs command is:

            Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:    0        0        0        0       1        1
2.5%-tile:  -1       -1       0        0       1        163
25%-tile:   1044     1074     3        0       1        1628
Median:     32542    37520    9        0       2        3255
75%-tile:   43103    43116    60       0       3        4882
97.5%-tile: 43116    43116    278      0       5        6346
Maximum:    43116    43116    300      0       13       6508
Mean:       24132.3  25176.9  41.6486  0       2.5169

# of Seqs: 6508

And the screen.seqs command is:
screen.seqs(fasta=/u2/home_u2/sulbha/Miseq/16S_v4/16S_v4_twoENd/16S_all/otu/miseq16S_all_countuniquealign, optimize=start-end, criteria=95, processors=5)

After that, if I run the summary.seqs command I get this:
            Start        End          NBases  Ambigs  Polymer  NumSeqs
Minimum:    0            0            0       0       1        1
2.5%-tile:  0            0            0       0       1        1
25%-tile:   0            0            0       0       1        1
Median:     0            0            0       0       1        1
75%-tile:   0            0            0       0       1        1
97.5%-tile: 0            0            0       0       1        1
Maximum:    0            0            0       0       1        1
Mean:       1.84467e+19  1.84467e+19  0       0       1

# of Seqs: 1


And without the trump parameter I get:

            Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:    1        1        1        0       1        1
2.5%-tile:  1        6        3        0       1        111
25%-tile:   1        59       8        0       2        1108
Median:     1477     1950     19       0       3        2216
75%-tile:   2573     2598     75       0       4        3323
97.5%-tile: 2594     2598     290      0       5        4320
Maximum:    2597     2598     300      0       13       4430
Mean:       1391.98  1497.88  59.9607  0       3.07201

# of Seqs: 4430
Please let me know where I am going wrong.

Thanks!

I think you might have used flip somewhere when you didn’t mean to (or you didn’t and you should have). When you run align.seqs, can you use flip=T and then rerun summary.seqs?

pat

Hi Dr. Schloss,
Thank you for the reply.
When I ran the align.seqs command with the flip=T option, I still got only one sequence after running filter.seqs.
Here are my commands with results:
summary.seqs(fasta=/u2/home_u2/Miseq/otu/miseq16S_all_countuniquealign)
            Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:    0        0        0        0       1        1
2.5%-tile:  1044     1051     2        0       1        163
25%-tile:   2543     5428     11       0       2        1628
Median:     26797    31189    65       0       3        3255
75%-tile:   43020    43116    86       0       4        4882
97.5%-tile: 43113    43116    295      0       6        6346
Maximum:    43116    43116    300      0       10       6508
Mean:       23555.4  25623.9  77.7806  0       3.34281

# of Seqs: 6508

screen.seqs(fasta=/u2/home_u2/sulbha/Miseq/otu/miseq16S_all_countuniquealign, optimize=start-end, criteria=95, processors=5)

summary.seqs(fasta=/u2/home_u2/otu/miseq16S_all_countuniquealigngood)

            Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:    1044     1058     3        0       1        1
2.5%-tile:  1044     1062     5        0       2        150
25%-tile:   5242     6450     15       0       3        1491
Median:     26859    32435    65       0       4        2981
75%-tile:   42993    43116    105      0       4        4471
97.5%-tile: 43107    43116    295      0       6        5812
Maximum:    43112    43116    300      0       10       5960
Mean:       24017.3  26275.5  84.6394  0       3.51225

# of Seqs: 5960

filter.seqs(fasta=/u2/home_u2/sulbha/Miseq/otu/miseq16S_all_countuniquealigngood,trump=.,vertical=T)


Length of filtered alignment: 0
Number of columns removed: 50000
Length of the original alignment: 50000
Number of sequences used to construct filter: 5960
Please let me know where I am going wrong and how important it is to use the trump parameter. Can't I omit it?

Thanks!
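To make the filter output above easier to interpret, here is a simplified Python model of what trump=. and vertical=T do to alignment columns. It is a sketch under stated assumptions, not mothur's implementation: the function `filter_columns` and the toy sequences are hypothetical, and it assumes that "." marks positions outside a read's aligned region and "-" marks internal gaps.

```python
def filter_columns(seqs, trump=None, vertical=True):
    """Return the aligned sequences with filtered columns removed.

    trump:    drop a column if ANY sequence has this character there.
    vertical: drop a column if EVERY sequence has a gap ('-' or '.') there.
    """
    keep = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        if trump is not None and trump in column:
            continue
        if vertical and all(c in "-." for c in column):
            continue
        keep.append(i)
    return ["".join(s[i] for i in keep) for s in seqs]

# Toy alignments (hypothetical, 8 columns each).
overlapping = ["..ACGT..", ".GACGTA."]
disjoint = ["ACG.....", ".....TGA"]

print(filter_columns(overlapping, trump="."))  # the shared columns survive
print(filter_columns(disjoint, trump="."))     # every column holds a '.', nothing survives
print(filter_columns(disjoint))                # vertical only: just the all-gap columns go
```

In this model, a single read that does not cover a column is enough for trump=. to remove it. Reads scattered across the whole gene therefore leave no columns at all, which would match "Length of filtered alignment: 0".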

What does summary.seqs look like after you run make.contigs? What region are you sequencing?

Hello Dr. Schloss,
I have not made contigs. I have MiSeq metagenome sequences, and I blasted them against the RDP database. All the reads that hit RDP were taken as 16S rRNA for this analysis.
This was suggested by you in another post:

"One idea might be to take several bona fide 16S rRNA gene sequences and blast them against your sequence collection to identify those reads with 16S in them and then process those further. The problem with your approach is that anything will classify to something if you push it hard enough.

Pat"


Thanks!

You have sequences that don’t overlap with each other - they cannot be made into OTUs since the distance between them will be infinite.
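A quick way to see the non-overlap problem is to count how many alignment columns two reads share. The Python sketch below is only an illustration: the function `overlap` and the coordinates are hypothetical, and each read is assumed to be summarised by its inclusive (start, end) alignment positions.

```python
def overlap(a, b):
    """Number of alignment columns shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

# Hypothetical reads from opposite ends of the gene.
read_a = (1044, 1300)
read_b = (43000, 43116)
# Hypothetical read that partly overlaps read_a.
read_c = (1200, 1500)

print(overlap(read_a, read_b))  # no shared columns, so nothing to compare
print(overlap(read_a, read_c))  # shared columns exist, so a distance can be computed
```

Shotgun metagenome reads that hit 16S land anywhere along the gene, so many pairs look like read_a and read_b.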