Which to me there are very few overlaps among the sequences, so I tried the following command with the screen.seqs:
screen.seqs(fasta=16S_001_R1_trim.unique.align, name=16S-001_S1_L001_R1_001.trim.names, optimize=start-end, criteria=60)
Which looks to me, the 3404 sequences are still not over lapping. Do you have any recommendations on what parameters I should use for screen.seqs? or any other commands I should try?
Did you ever flip your sequences? Judging from your file name (R1), I suspect these are reverse reads. If so, you likely need to do the reverse compliment of them adn then align them or at least run align.seqs with the flip=T option. Your tables don’t look like they’ve aligned right.
Thanks for the reply. The samples were prepared such that some sequences are flipped and some are not…confusing? I know, it took me awhile to figure it out too, as I didn’t prepare the samples. The alignment above was ran with flip=t
I quickly tried reverse complementing all the sequences and aligned them again with option flip=t:
it should be the v3-v4 region ~500bp, but the illumina sequences have a maximum length of 151bp. The technology used was Nextera XT, where the amplicons were cut and amplified randomly. hmm…I guess there’s the problem. In screen.seqs, i used the flag “optimize=start-end”, attempted to find the region with the most overlap. The result didn’t look very good as we saw.
Huh, yeah, that’s an interesting way to do it. Why? Most people would instead amplify a specific window within the gene and sequence that. I’m afraid what you have will be a mess to analyze. About all I can suggest is to run the classifier on everythign and see what you’ve got. Unfortunately, even that’s frought with problems since the different regions of the gene evolve at different rates.