Dist.seqs taking too much time

Hi
I am using mothur v.1.47.0 on macOS. I had 90 samples in total, including the mock files. I have a few questions regarding the analysis.

  1. When I ran the following command, I got some warnings:
    mothur > classify.seqs(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vseach.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax)
    Using 8 processors.
    Generating search database… DONE.
    It took 6 seconds generate search database.

Reading in the /users/hiraabid/desktop/mothur/M06339_Run_192-Semrau_Soil_Microcosm/trainset9_032012.pds.tax taxonomy… DONE.
Calculating template taxonomy tree… DONE.
Calculating template probabilities… DONE.
It took 13 seconds get probabilities.
Classifying sequences from /users/hiraabid/desktop/mothur/M06339_Run_192-Semrau_Soil_Microcosm/stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.fasta …

[WARNING]: M06339_192_000000000-KBLJK_1_1111_27355_17100 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_2107_16345_28171 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_1102_26823_10250 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_2108_8133_24247 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_1109_17267_18577 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_1108_9142_3033 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_1110_23375_19566 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_2104_12019_12043 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_2104_12545_20141 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_1113_11938_14950 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

**** Exceeded maximum allowed command warnings, silencing warnings ****
[WARNING]: M06339_192_000000000-KBLJK_1_2107_21484_25899 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M06339_192_000000000-KBLJK_1_2101_19780_9518 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

A lot of warnings like these showed up. Could someone please explain what they mean? I did run the remove.lineage command after this.

  2. In the following command, it says “Your file does NOT contain sequences from the groups you wish to get.”

mothur > get.groups(count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denvo.vsearch.pick.fasta, groups=Mock)
Your file does NOT contain sequences from the groups you wish to get.
Selected 0 sequences from your count file.
Your file does NOT contain sequences from the groups you wish to get.
Selected 0 sequences from your fasta file.

Output File names:
/users/hiraabid/desktop/mothur/M06339_Run_192-Semrau_Soil_Microcosm/stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.fasta

The resulting “stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.fasta” is empty.

  3. As shown in the step above, the get.groups command did not work. The following two commands were for assessing error rates:
    i) get.groups(count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, groups=Mock)
    ii)
    seq.error(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table, reference=zymo.mock_.16S_fasta.txt, aligned=F)
    I skipped the above two because they produced empty output files, and instead used the input fasta and count files given in the get.groups command for the next step. The next command I ran was as follows:
    dist.seqs(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, cutoff=0.03)
    This command is taking too long. It has been four days and it still has not completed. I reduced the number of processors, but it still doesn’t finish. I also tried it on a Windows computer with 128 GB of RAM, and it has not completed there either; on Windows it has been two days now. Can someone suggest a solution? I need to generate an OTU table as soon as possible.

I hope I was able to explain my questions clearly. I am using this software for the first time, which is why I have a lot of questions. Thanks.
Hira

Hi,

What did you sequence? Are these 16S rRNA gene sequences? What type of environment are they from?

What is your mock community called? It doesn’t seem to be called “Mock”. Can you double check what it’s called by running count.groups() on your count file?

Pat

Yes, these are 16S rRNA gene sequences (V4 region) from forest soil DNA samples. The mock community is from the sequencing company: they included mock community DNA in their sequencing runs and sent us a fastq file of mock 16S sequences.
I ran the count.groups command as follows:
mothur > count.groups(count=stability.trim.contigs.unique.good.filter.unique.pecluster.denovo.vsearch.pick.count_table)
MockZymoPos contains 83117.
S1N3dA contains 93210.
S1N3dB contains 101763.
S1N3dC contains 123378.
S1N6dA contains 95888.
S1N6dB contains 85043.
S1N6dC contains 120545.
S1N9dA contains 91544.
S1N9dB contains 76981.
S1N9dC contains 106122.
S1O3dA contains 89471.
S1O3dB contains 95563.
S1O3dC contains 98987.
S1O6dA contains 89563.
S1O6dB contains 75520.
S1O6dC contains 115488.
S1O9dA contains 100994.
S1O9dB contains 96544.
S1O9dC contains 106170.
S1SB3dA contains 83382.
S1SB3dB contains 77245.
S1SB3dC contains 98083.
S1SB6dA contains 93807.
S1SB6dB contains 84213.
S1SB6dC contains 108966.
S1SB9dA contains 83590.
S1SB9dB contains 72065.
S1SB9dC contains 94577.
S1SoilA contains 75997.
S1SoilB contains 82328.
S1SoilC contains 64934.
S2N3dA contains 85343.
S2N3dB contains 75254.
S2N3dC contains 76460.
S2N5dA contains 80294.
S2N5dB contains 81664.
S2N5dC contains 97361.
S2N8dA contains 94401.
S2N8dB contains 76192.
S2N8dC contains 79156.
S2O3dA contains 73920.
S2O3dB contains 82527.
S2O3dC contains 81649.
S2O5dA contains 88544.
S2O5dB contains 95740.
S2O5dC contains 91724.
S2O8dA contains 72242.
S2O8dB contains 80480.
S2O8dC contains 83682.
S2SB3dA contains 73362.
S2SB3dB contains 86727.
S2SB3dC contains 78357.
S2SB5dA contains 85055.
S2SB5dB contains 102298.
S2SB5dC contains 91753.
S2SB8dA contains 85630.
S2SB8dB contains 91527.
S2SB8dC contains 59642.
S2SoilA contains 56315.
S2SoilB contains 73742.
S2SoilC contains 73035.
S3N15dA contains 64924.
S3N15dB contains 60306.
S3N15dC contains 71301.
S3N4dA contains 82148.
S3N4dB contains 38162.
S3N4dC contains 82005.
S3N8dA contains 64436.
S3N8dB contains 50326.
S3N8dC contains 77608.
S3O15dA contains 91433.
S3O15dB contains 93240.
S3O15dC contains 159981.
S3O4dA contains 91333.
S3O4dB contains 117370.
S3O4dC contains 118219.
S3O8dA contains 83325.
S3O8dB contains 96295.
S3O8dC contains 122285.
S3SB15dA contains 96360.
S3SB15dB contains 78591.
S3SB15dC contains 90259.
S3SB4dA contains 101796.
S3SB4dB contains 89206.
S3SB4dC contains 81606.
S3SB8dA contains 80679.
S3SB8dB contains 96233.
S3SBdC contains 80611.
S3SoilA contains 74633.
S3SoilB contains 88300.
S3SoilC contains 72368.
WaterNeg contains 165.

Size of smallest group: 165.

Total seqs: 7914658.

Output File Names:
/users/hiraabid/desktop/mothurcopy/M06339_Run_192-Semrau_Soil_Microcosm/stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.count.summary

Your mock is called MockZymoPos
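
That would explain the "Your file does NOT contain sequences from the groups you wish to get" message, since groups=Mock does not match any group name in the count file. As a rough sketch only, reusing the file names from the original post and not tested against this dataset, the two error-rate commands would presumably look like this with the actual group name:

```
get.groups(count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, groups=MockZymoPos)
seq.error(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table, reference=zymo.mock_.16S_fasta.txt, aligned=F)
```

The group name has to match the count.groups output exactly, and the file names passed to get.groups have to match the files on disk, so it is worth checking both for typos before rerunning.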

Pat
