80 samples help

itiago · April 16, 2024, 3:52pm

Hi
I hope you are well

I have 80 samples to analyze with mothur.
This is big data, and I have only 7 processors to do it. What do you suggest to speed up the analyses? The dist.seqs step will take forever; what should I do?
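dist.seqs compares sequences in pairs, so its run time grows roughly with the square of the number of unique sequences rather than with the number of samples. The short sketch below shows that arithmetic. The sequence counts are illustrative, not taken from this data set. Spreading the work over 7 processors divides it by at most 7, while doubling the number of unique sequences roughly quadruples the number of pairs.

```python
def pairwise_distances(n_unique: int) -> int:
    """Number of distinct pairs among n unique sequences: n(n-1)/2."""
    return n_unique * (n_unique - 1) // 2


# Illustrative counts only: show how fast the pair count grows.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} unique sequences -> {pairwise_distances(n):,} pairs")
```

This is why reducing the number of unique sequences (through better read quality, or through steps such as pre.cluster) usually matters more than adding processors.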

Thank you for any help

pschloss · April 16, 2024, 9:32pm

Hi there,

You’re not giving me much to go off of here. 80 samples isn’t necessarily “big data”. What region are you sequencing? How far into the pipeline have you gotten? You might want to give this a read…

Pat

itiago · April 18, 2024, 9:51am

Thank you, Pat. I read it several years ago. The quality of the sequences is fine, but yes, that overlap issue may be affecting the overall data.

Now I have run into another problem with this data set, related to the alignment in mothur. I got a huge number of unclassified sequences, and that reduced my data from 600k to 70k sequences.
I was very suspicious about that, so I took some of the sequences that had been labeled as unclassified and ran BLAST on them. Of 10, 3 did not match any sequence in the database, but the others had close relatives at 80% and 70%.

I looked at the alignment and noticed that, for those particular sequences, the alignment was really bad, which is why they had not been classified.

Is this a known problem in mothur? Is there any parameter I can change to address it? Thank you

EllistonV · April 19, 2024, 2:50pm

Hello,

I don’t know if this is good advice or not. That said, I work with environmental water samples, and it is not rare for a large share of my sequences to be ‘unclassified’ or ‘unknown’, which I later remove with the remove.lineage command (the number of unclassified sequences is typically around 200K). I also work with a large number of samples, around 50-100.

I also had trouble creating OTUs from my data (even though we have a very decent computer in the lab). I found two solutions: (1) increase the diffs value in the pre.cluster step, or (2) create ASVs instead of OTUs (which is what I do). I know there is an ongoing debate between ASVs and OTUs, but in my field of research ASVs are accepted. Be aware that you should know what question you want to answer with your data and decide whether ASVs make sense for your problem.
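The diffs setting in pre.cluster is what makes option (1) work: a less abundant sequence that is within diffs mismatches of a more abundant one is folded into it, which shrinks the number of unique sequences that later steps such as dist.seqs must handle. The sketch below is a simplified, greedy illustration of that idea, not mothur's implementation. It assumes the sequences are already aligned to the same length and treats every column, gaps included, as an ordinary character; the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def precluster(counts: dict[str, int], diffs: int) -> dict[str, int]:
    """Greedy, abundance-ordered merging (simplified sketch of the pre.cluster idea).

    Sequences are visited from most to least abundant. Each one is merged into
    the first already-kept (at least as abundant) sequence that is within
    `diffs` mismatches; otherwise it is kept as a new center.
    """
    kept: dict[str, int] = {}
    for seq, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        for center in kept:
            if hamming(seq, center) <= diffs:
                kept[center] += n  # absorb the rarer sequence's reads
                break
        else:
            kept[seq] = n  # nothing close enough: keep as its own center
    return kept


reads = {"ACGTACGT": 50, "ACGTACGA": 3, "TTGTACGA": 2, "GGGGCCCC": 10}
print(precluster(reads, diffs=1))
```

Raising diffs merges more sequences, which speeds up clustering but also risks merging real, closely related variants.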

You can also start with ASVs to understand your data and later group them into OTUs. For example, you could start with ASVs, look at your data, and identify which ASVs are low abundance (if an ASV is a single sequence across all 80 samples, I don’t think removing it will do any harm). Then remove these low-abundance sequences and try clustering into OTUs again.
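The low-abundance filter described above can be sketched on a count table held as a plain dictionary that maps each ASV name to its per-sample counts. The layout and names here are hypothetical stand-ins for a real count table. The same function with max_total=3 drops singletons, doubletons and tripletons.

```python
def drop_rare(counts: dict[str, list[int]], max_total: int = 1) -> dict[str, list[int]]:
    """Keep only features whose total count across all samples exceeds max_total."""
    return {name: row for name, row in counts.items() if sum(row) > max_total}


# Hypothetical table: ASV name -> counts in three samples.
table = {
    "ASV1": [120, 85, 40],
    "ASV2": [1, 0, 0],  # one read in one sample
    "ASV3": [0, 2, 1],  # three reads in total
}
print(drop_rare(table))               # removes ASV2 only
print(drop_rare(table, max_total=3))  # removes ASV2 and ASV3
```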

I hope this helps or makes sense. Again, I don’t know if it is good advice, but it works for us. I’m also here to learn and to hear everyone’s opinion on this method, because it is not the best.

Good luck!!!

itiago · April 19, 2024, 2:59pm

Hi, thank you for replying. I have worked with quite different environments, and this is a very particular one, with a low microbial load to begin with. But my concern now is this: if I can find a close relative for my sequences with BLAST, but they failed to classify in mothur because the alignment in mothur was poor, that worries me.
I will try other alignment parameters and see if that does the trick.

Moreover, I already increase diffs by default, since I am using V4-V5 for this particular set and mismatches tend to occur.
I don’t want to use ASVs because, in this particular case, the number of mismatches would artificially inflate the diversity. And at the end I always remove singletons, doubletons and tripletons.

EllistonV · April 19, 2024, 3:24pm

Did you remove low-abundance sequences before making OTUs? Singletons are usually removed after making OTUs. What if you removed the sequences that appear only once, in a single sample, before clustering? This could eliminate more noise and make it easier to create OTUs. Is this possible? If so, how could it be done?

pschloss · April 23, 2024, 8:48pm

Just to be clear… I do not advocate removing rare sequences (however defined) and do not advocate removing “unclassified” sequences.

V4-V5 sequences will give you horrible quality on the assembled reads because the reads do not fully overlap. The 2x300 chemistry is still bad, with error rates climbing after ~500 total nucleotides. This will result in inflated numbers of unique sequences/ASVs/OTUs. This is why I posted the blog post. This is not a mothur problem. This is a MiSeq problem.
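To make "do not fully overlap" concrete, the sketch below counts how many amplicon positions are covered by both reads, by one read, or by neither. It assumes R1 starts at the first amplicon position and R2 ends at the last. The amplicon lengths in the example are illustrative assumptions, not measurements, so substitute the length of your own product. When the reads only partially overlap, the shared region is made up of the 3' ends of both reads, which is where MiSeq quality is lowest, so a disagreement between the two reads cannot be reliably settled by a high-quality base.

```python
def overlap_summary(amplicon_len: int, read_len: int) -> dict[str, int]:
    """Classify amplicon positions by how many reads of a pair cover them.

    Assumes R1 covers positions from the start of the amplicon and R2 covers
    positions ending at the last base of the amplicon.
    """
    covered_by_r1 = set(range(min(read_len, amplicon_len)))
    covered_by_r2 = set(range(max(0, amplicon_len - read_len), amplicon_len))
    both = covered_by_r1 & covered_by_r2
    one = (covered_by_r1 | covered_by_r2) - both
    return {
        "both_reads": len(both),
        "one_read": len(one),
        "uncovered": amplicon_len - len(both) - len(one),
    }


# Illustrative lengths only.
print(overlap_summary(410, 300))  # partial overlap: shared region is both reads' 3' ends
print(overlap_summary(250, 300))  # reads longer than the amplicon: full overlap
```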

Pat

itiago · May 24, 2024, 11:29am

Hi Pat
Is there a way to use mothur to analyze just the R1 files?
Thank you

pschloss · May 27, 2024, 5:02pm

Hi there,

I’d probably do something with trim.seqs using the quality scores to help set the trimming point.
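One common way to let quality scores choose the trimming point is a sliding window: move a fixed-size window along the read and cut where its mean Phred score first falls below a threshold. The function below is a standalone sketch of that rule. The window size and threshold are hypothetical defaults, and this is not how trim.seqs itself is coded, so check the trim.seqs documentation for the quality options it actually offers. Running it over a sample of R1 reads and looking at the distribution of trim points is one way to choose a single truncation length.

```python
def trim_point(quals: list[int], window: int = 50, min_avg: float = 30.0) -> int:
    """Number of bases to keep before the sliding-window mean quality drops.

    Returns the start of the first window whose mean Phred score is below
    min_avg, or the full read length if no window falls below it. Reads
    shorter than the window are kept or dropped whole based on their mean.
    """
    if len(quals) < window:
        return len(quals) if quals and sum(quals) / len(quals) >= min_avg else 0
    running = sum(quals[:window])
    for start in range(len(quals) - window + 1):
        if start > 0:
            # Slide the window one base: add the new base, drop the old one.
            running += quals[start + window - 1] - quals[start - 1]
        if running / window < min_avg:
            return start
    return len(quals)


# Hypothetical read: 200 good bases followed by 100 poor ones.
read_quals = [38] * 200 + [20] * 100
print(trim_point(read_quals))
```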

Pat

itiago · May 28, 2024, 11:07am

Hi Pat, thank you for your reply, but my question was about an earlier step:

I have already done the quality screening of the raw data and have two files, R1 and R2, with filtered sequences.

But assembling the contigs will have the same problem as before, and I wanted to test whether using only the forward file gives a different result in terms of the number of sequences I have at the end of the pipeline. However, I don’t know how to make mothur use just one file, since make.contigs gives an error if I provide a *.file with just one file per sample instead of two.

So, how can I make mothur use only one file per sample?

Thank you

pschloss · May 28, 2024, 1:51pm

Right - I would skip make.contigs and only use trim.seqs. You can look at our old 454 SOP to get a sense of what this might look like.
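Working from R1 alone means treating the reads like single-end data, as in the 454 workflow, where sequences and quality scores are carried as a FASTA file plus a matching quality file. The sketch below shows that conversion for a FASTQ string. It assumes simple four-line records and Phred+33 quality encoding, and the function name is hypothetical. mothur's fastq.info command already performs this conversion for real files, so treat this only as an illustration of the two formats.

```python
def fastq_to_fasta_qual(fastq_text: str, offset: int = 33) -> tuple[str, str]:
    """Convert four-line FASTQ records into FASTA text and quality-file text.

    Assumes every record is exactly four lines and qualities use Phred+offset.
    """
    lines = fastq_text.strip().splitlines()
    fasta, qual = [], []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qstr = lines[i:i + 4]
        name = header[1:].split()[0]  # drop '@' and any description
        scores = [ord(c) - offset for c in qstr]
        fasta.append(f">{name}\n{seq}")
        qual.append(f">{name}\n{' '.join(map(str, scores))}")
    return "\n".join(fasta) + "\n", "\n".join(qual) + "\n"


fa, qu = fastq_to_fasta_qual("@read1 sample=A\nACGT\n+\nII#5\n")
print(fa, qu, sep="")
```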

Pat
