Problem with pairwise.seqs and cd-hit

Dear mothur user,

I have trouble analysing my fungal ITS sequences. They are bidirectional: half were sequenced from the forward end and half from the reverse end (FLX+). I am running pairwise.seqs, but it takes ages, and I don't know whether the bidirectional sequences will cause problems again. If this is the best way to analyse the sequences, I am happy to wait, but I am not sure, and I worry that I am wasting my time waiting for the output. :wink: I also tried cd-hit, and it seemed to deal well with my sequences, but I cannot manage to convert its cluster files (.clstr) so that I can use them in mothur for make.shared etc.

It would be so great if somebody could help me with that!

Thank you very much!

And another point: I also had 16S sequences. After aligning them, I worked only with the overlap between the forward and reverse sequences. I am not sure this was the smartest way, but I end up with sequences about 300 bp long (the overlap), which should be okay, right? I know aligning is important for 16S, but I am wondering whether classifying the full-length sequences would be better than using this overlap procedure?

Again: thank you!

  1. For 454 you really don’t want bidirectional reads since they will not fully overlap and you cannot connect two separate reads.
  2. It is unlikely that we will ever support cd-hit since we’ve shown that it is actually quite bad at forming OTUs.
  3. I’m still unclear how one can do distance-based OTUs with ITS regions since they are non-homologous between organisms. I would think you would treat each unique ITS sequence as a separate OTU and move on.
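
To make point 3 concrete, here is a minimal sketch of "each unique sequence is its own OTU". It is only an illustration in plain Python, not mothur code; the function name `unique_otus` and the toy reads are made up for this example. Inside mothur, the unique.seqs command does the dereplication step.

```python
from collections import defaultdict


def unique_otus(seqs):
    """Group read names by identical sequence; each group is treated as one OTU.

    `seqs` maps read name -> sequence. This helper is hypothetical and only
    illustrates the idea; it is not part of mothur.
    """
    groups = defaultdict(list)
    for name, seq in seqs.items():
        # Compare case-insensitively so "acgt" and "ACGT" count as identical
        groups[seq.upper()].append(name)
    # Largest OTUs first; ties broken by read names for a stable order
    return sorted(groups.values(), key=lambda names: (-len(names), names))


# Toy reads (made up): r1 and r2 are identical, r3 differs by one base
reads = {"r1": "ACGTAC", "r2": "acgtac", "r3": "ACGTTC"}
otus = unique_otus(reads)
print(otus)
```

Note that only exact duplicates are merged here, so a forward read and a reverse read of the same ITS would still land in different OTUs.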

Thank you for answering!

It surprises me a bit, as in the forum the pairwise.seqs command has been suggested for ITS as an alternative to aligning and running dist.seqs.

I know the problem with the bidirectional reads, but don't you think it is okay to use the overlap for the 16S, as my sequences are about 900 bp long to begin with? Doesn't this give a real overlap? I end up with reads with a mean length of about 330 bp after running screen.seqs, and I removed all other sequences that didn't overlap. Actually, I asked this quite a long time ago and you advised me to use the overlap. But I should have mentioned here that my sequences are very long.

Please tell me if I am missing something really important; I am getting confused. You are the expert. :wink:

Thank you so much for trying to help!

Sorry, there are a lot of posts here and it’s impossible for me to keep track of individuals. If your reads don’t cover the same alignment coordinates, then it is hard to know when/where to trim them. If you were to use pairwise.seqs, then you’d want to set countends=F. I would probably suggest using a pretty small cutoff, since every ITS likely represents a novel taxon.
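
To show why countends=F matters for reads that only partly overlap, here is a small sketch in plain Python. It is a simplified illustration, not mothur's actual distance calculation: it counts every gap column as one difference, and the function name `distance` and the two toy reads are assumptions made up for this example.

```python
def distance(a, b, countends=True):
    """Fraction of differing columns between two aligned, equal-length sequences.

    With countends=False, leading and trailing columns where either sequence
    has a gap are ignored, so only the shared region is compared.
    Simplified sketch only; mothur's own gap handling differs in detail.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    start, end = 0, len(a)
    if not countends:
        # Trim terminal columns in which one of the sequences is still a gap
        while start < end and (a[start] == "-" or b[start] == "-"):
            start += 1
        while end > start and (a[end - 1] == "-" or b[end - 1] == "-"):
            end -= 1
    # Drop columns where both sequences have a gap; they carry no information
    cols = [(x, y) for x, y in zip(a[start:end], b[start:end])
            if not (x == "-" and y == "-")]
    if not cols:
        return 1.0  # no shared region: treat as maximally distant
    return sum(1 for x, y in cols if x != y) / len(cols)


# Toy pair (made up): the reads share 6 columns, one of which is a mismatch
fwd = "ACGTACGTAC----"
rev = "----ACGTTCGGTT"
d_ends = distance(fwd, rev, countends=True)
d_noends = distance(fwd, rev, countends=False)
print(d_ends, d_noends)
```

With the end gaps counted, the two reads look very different even though they nearly match where they overlap; with them ignored, the distance reflects only the shared region.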

Oh no, of course you cannot keep track of individuals! It is awesome that you answer questions, and I thank you a lot for your advice! :smiley: