After generating contigs and reviewing the summary, I noticed that the start and end positions of my sequences differed from those shown in the tutorial: the start was 1, but the end varied from 36 to 602. When I set the maximum sequence length to 275, I lost many of my sequences; when I raised it to around 460, I retained more. However, I lost my mock data and could not run the get.groups command because the mock group was absent. I would greatly appreciate any advice on how to resolve this.
After aligning my sequences and reviewing the summary, I noticed that the start and end positions also differed from those shown in the tutorial, with a start of 1968 and an end of 11550. I decided to use the median values for the start and end (1-13422) and was able to obtain some results. However, I could not proceed with the remaining analysis steps, possibly because I had lost my mock data, so I was unable to run the get.groups command and the steps that follow it. I would appreciate any suggestions for resolving this issue with my data.
I have attached some screenshots of the issues I am facing and hope you can suggest how to overcome them. I apologize for taking up your time, but I need assistance and do not know where else to find a good solution. Thank you for your understanding.
Hi there - what region are you sequencing? It doesn’t look like the V4 region, which is typically right at 252 nt. The numbers you’ll see in the MiSeq SOP are for the V4 region. Looking at your first screenshot I suspect you want to use something like start=425, end=475.
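To see why a 275 nt cutoff discards most contigs from a longer amplicon, it can help to look at the contig length distribution directly before picking a window like the one suggested above. The following is a minimal Python sketch, not mothur code; the function names `read_fasta_lengths` and `length_window` and the 2.5%/97.5% cutoffs are assumptions chosen for illustration, and the tiny FASTA string is made-up example input.

```python
import io


def read_fasta_lengths(handle):
    """Return the length of every sequence in a FASTA-formatted text stream."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_window(lengths, lower=0.025, upper=0.975):
    """Return (min_len, max_len) bracketing the central bulk of the lengths.

    The default fractions are an illustrative assumption, not a mothur default.
    """
    if not lengths:
        raise ValueError("no sequences found")
    ordered = sorted(lengths)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Hypothetical example input; in practice open the contigs FASTA file instead.
example = io.StringIO(">a\nACGT\nAC\n>b\nACG\n>c\nA\n")
example_lengths = read_fasta_lengths(example)
print(length_window(example_lengths, lower=0.0, upper=1.0))
```

If the bulk of the lengths sits well above 275, that is a hint the amplicon is not V4 and that the SOP's numbers should not be copied as-is.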
Also, it looks like you aren’t using the 2x250 chemistry but the 2x300 instead. I’d encourage you to consult this blog post for why you’ll likely run into challenges using a non-V4 region and the 2x300 chemistry:
Thank you for pointing out the potential challenges of using a non-V4 region and the 2x300 chemistry for 16S rRNA gene sequencing analysis. I’d like to learn more about these challenges and how to address them. Could you clarify further, or suggest best practices or resources that would help me optimize my analysis for this region and chemistry? I’m grateful for your guidance and support.
As I say in the blog post, you could try using the phylotype approach. Really though, I strongly discourage non-V4 regions and the 2x300 chemistry. I suspect that resequencing would be cheaper than paying people to analyze low-quality data.
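One way to make the concern concrete is to compare how much of the amplicon is covered by both reads of a pair, since bases read twice can be cross-checked when contigs are built. The sketch below is a rough illustration only: `read_overlap` and `fraction_covered_twice` are hypothetical helper names, the 460 nt amplicon length is taken from the cutoff mentioned in the question, and primers and quality trimming are ignored.

```python
def read_overlap(read_length, amplicon_length):
    """Bases covered by both reads of a pair (0 if the reads do not meet)."""
    overlap = 2 * read_length - amplicon_length
    # The overlap cannot be negative and cannot exceed the amplicon itself.
    return max(0, min(overlap, amplicon_length))


def fraction_covered_twice(read_length, amplicon_length):
    """Share of the amplicon that is sequenced by both reads."""
    return read_overlap(read_length, amplicon_length) / amplicon_length


# V4 (about 252 nt) with 2x250 reads versus an assumed 460 nt amplicon with 2x300 reads.
for label, read_len, amplicon_len in [("V4, 2x250", 250, 252), ("~460 nt, 2x300", 300, 460)]:
    overlap = read_overlap(read_len, amplicon_len)
    share = fraction_covered_twice(read_len, amplicon_len)
    print(f"{label}: {overlap} nt overlap, {share:.0%} covered twice")
```

Under these assumptions the V4 pair overlaps across nearly the whole fragment, while the longer amplicon is covered twice over less than a third of its length.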