Hi~
I ran into a problem when clustering ~10k unique V6 sequences into OTUs at divergences of 0.03, 0.05 and 0.10.
I set cutoff=0.20 in dist.seqs() and used the average method in cluster().
However, the .rabund, .sabund and .list files contain no OTU levels other than “unique”.
Is this a bug in mothur, or are the parameters I set unsuitable?
Could anyone give me some suggestions?
Thank you.
             
P.S. When I changed the clustering method from “average” to “furthest”, cluster() worked well.
             
This typically happens when the sequences are poorly aligned or when a number of sequences do not overlap with the others. This causes distances of essentially infinity between sequences and will not allow sequences to merge together into OTUs by the average neighbor algorithm, but they can be merged using the furthest and nearest neighbor algorithms. Are you using screen.seqs and filter.seqs to make sure the sequences overlap in the same alignment space?
Pat
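To make "overlap in the same alignment space" concrete, here is a minimal Python sketch. It is not mothur's code, only an illustration of the start/end check that screen.seqs(start=..., end=...) applies: a sequence is flagged if its first base falls after `start` or its last base falls before `end`. The function names and the toy alignment are hypothetical.

```python
def aligned_span(seq, gap_chars=".-"):
    """Return (start, end) as 1-based alignment positions of the first and
    last base in an aligned sequence, or None if it contains only gaps."""
    positions = [i for i, ch in enumerate(seq, start=1) if ch not in gap_chars]
    if not positions:
        return None
    return positions[0], positions[-1]


def flag_non_overlapping(seqs, start, end):
    """Return names of sequences that start after `start` or end before `end`."""
    flagged = []
    for name, seq in seqs.items():
        span = aligned_span(seq)
        if span is None or span[0] > start or span[1] < end:
            flagged.append(name)
    return flagged


# Toy alignment (made-up data): seqC only covers the right-hand part.
toy = {
    "seqA": "..ACGT-ACGT..",
    "seqB": "..ACGTTACG-..",
    "seqC": ".......CGTAC.",
}
print(flag_non_overlapping(toy, start=3, end=10))
```

Sequences flagged this way share few or no columns with the rest, which is the situation described above as giving "essentially infinite" distances.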
             
Hi Prof. Schloss, thank you for answering my question.
summary.seqs() after align.seqs():
total # of seqs: 100355358
summary.seqs() after screen.seqs(start=31189, end=33183):
total # of seqs: 100215306
summary.seqs() after filter.seqs():
total # of seqs: 100215306
             
Can you post the actual commands you used and their output? Some of this doesn’t make sense: the maximum start after screen.seqs should be 31189, and if you use filter.seqs(trump=., vertical=T), everything should more or less start and end at the same coordinates.
Pat
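As a rough illustration of why filter.seqs(trump=., vertical=T) should leave everything starting and ending at the same coordinates, here is a small Python sketch of the column filtering. It is an assumption-labeled toy, not mothur's implementation: `trump="."` drops any column in which at least one sequence has a `.`, and `vertical=True` drops columns that contain only gap characters. The function name and data are hypothetical, and all sequences are assumed to have the same aligned length.

```python
def filter_columns(seqs, trump=".", vertical=True, gap_chars=".-"):
    """Drop alignment columns the way trump/vertical filtering is described:
    - trump: remove a column if ANY sequence has the trump character there
    - vertical: remove a column if ALL sequences have a gap character there
    Assumes every sequence has the same aligned length."""
    names = list(seqs)
    rows = [seqs[name] for name in names]
    kept = []
    for column in zip(*rows):
        if trump is not None and trump in column:
            continue
        if vertical and all(ch in gap_chars for ch in column):
            continue
        kept.append(column)
    # Transpose the kept columns back into one string per sequence.
    if kept:
        filtered = ["".join(chars) for chars in zip(*kept)]
    else:
        filtered = ["" for _ in rows]
    return dict(zip(names, filtered))


# Toy alignment (made-up data) with ragged ends marked by '.'.
toy_alignment = {
    "s1": "..AC-GT.",
    "s2": ".TAT-GTA",
    "s3": "..AC-G..",
}
result = filter_columns(toy_alignment)
print(result)
```

After filtering, every remaining sequence spans the full set of kept columns, so a summary of start and end positions should be nearly uniform.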
             
Hi, here are the commands I used. I apologize for misreporting the screen.seqs() results last time.
Start  End  NBases  Ambigs  Polymer
…
Output File Names:
 
Hmm… This looks odd. Illumina data? Could you try using the default gap opening value in align.seqs (-5) and tell me how the rest of the output changes? My thinking is that your use of -1 is screwing things up. In supplementary table 3 of the align.seqs paper, we show that -1 is the worst setting for V6 and that -5 is the desired level.
Also, are the sequences possibly a mix of sequences in both directions?
             
Yes, it is Illumina data. The paired-end reads were assembled, so the sequences are all in the forward direction. Does it matter whether the data are Illumina or 454?
I will try gapopen=-5 in align.seqs() and let you know how the output changes.
Thank you.
             
Here are the commands and results using gapopen=-5. Average neighbor OTU clustering still did not work.
Start   End     NBases  Ambigs  Polymer
total # of seqs:        100178513
[baijiang@node10 Run4]$ cp MetatongueRun4.unique.good.filter.unique.fasta MetatongueRun4.final.fasta
…
Output File Names:
 
Can you email the initial (unaligned) fasta file to mothur.bugs@gmail.com?
             
Hi Prof. Schloss,
Our data are still under analysis. I will send them to you once the analysis is finished, if my supervisor authorizes it.
For now, I will work around the clustering problem by using the furthest neighbor method.
Thanks for your help and for developing mothur. It is a very useful tool.
Best.
Bai