Hi all,
I’m new to mothur, and it has been a bumpy road so far.
I’m following the MiSeq SOP to analyze our own data. We happened to sequence the v6 region as well. However, I have run into a few problems:
- We got HiSeq data with only one .fastq and one .fna file per sample, but the sequencing was paired-end (PE). I don’t know whether the company already merged the reverse and forward reads (the rfastq and ffastq files) into contigs for us. How can I tell by looking at the fastq or fna data, please?
For example, the fna data looks like this:
H1_0
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGACGGGTCCTTAAGTCAGTTGTGAAAGTTT
GCGGCTCAACCGTAAAATTGCAGTTGATACTGGGGACCTTGAGTGCGGCAGAGGCAGGCGGAATTCGTGGTGTAGCGGTG
AAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTTGCTGGACCGTAACTGACGTTGATGCTCGAAAGTGCG
GGTATCAAACAGG
H1_1
TACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCC
CCGGCTTAACCGGGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAAGAGGAAAGCGGAATTCCATGTGTAGCGGTG
AAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTG
GGGAGCAAACAGG
H1_2
TACGGGGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGCGCGTAGGCGGGACGCCAAGTCAGCGGTAAAAGACT
GCAGCTAAACTGTAGCACGCCGTTGAAACTGGCGCCCTCGAGACGAGACGAGGGAGGCGGAACAAGTGAAGTAGCGGTGA
AATGCTTAGATATCACTTGGAACCCCGATAGCGAAGGCAGCTTCCCAGGCTCGATCTGACGCTGATGCGCGAGAGCGTGG
GTAGCGAACAGG
H1_3
TACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGGACAGCAAGTCTGATGTGAAAGGCG
GGGGCTCAACCCCCGGACTGCATTGGAAACTGCTGACCTGGAGTACCGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTG
AAATGCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTG
GGGAGCAAACAGG
and the fastq data looks like this:
@H1_0 SN7001328:511:H2WKLBCXX:1:1101:4739:2211 1:N:0:GACAGTGC orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGACGGGTCCTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGGGACCTTGAGTGCGGCAGAGGCAGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTTGCTGGACCGTAACTGACGTTGATGCTCGAAAGTGCGGGTATCAAACAGG
+
IGIEIHHHHHEEF@DHDHH@FFHFHGHHHI@HHHH00DCC@FHH?EHHIHCGHIIFHHHIHHHHHIEHHFIHHHEHHIHGIHHEHIGHGH?HIIIHIHHHHHIHHGG@HHFHHEGDGHHIIIHHHIIHHHHHHICHHHEHIHHIGHFFIHGIIIIIIIIIHIIFIHHHIHIHHG@AGDG@HIIHDIIHEDEHIIIIHFHHHHEHEHIHHGIIIIIIIIIIIIIIIHDIHGIIIIHDCIHIIHHHHIIIIIIII
@H1_1 SN7001328:511:H2WKLBCXX:1:1101:8024:2126 1:N:0:GACAGTGC orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
TACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCCCCGGCTTAACCGGGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAAGAGGAAAGCGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
+
IIIIIIIIHHFHFHHIIIGIGHHHIIIIIIGHIIIGHIIIGII<GIIIIIIIIIHHIIIIIIIICGIIIIIIHGIIEHHHHIIIIHHHHHHHHHIHIIIHIGIHHIIHHHHHHHHHGHIIIHHHEHIIIIHHGGIIIIIIIIIIIIHHIHHIHIIHGIIIIIIIIHHIIIIIIHIIHHHIIIHIIIHIIIIIHHHIHGIHIIIIIIIIIIIHHEHEHEHHHIHHIIIHEIHIIIIGIIIHIIIIIHIIIIIHH
@H1_2 SN7001328:511:H2WKLBCXX:1:1101:14224:2150 1:N:0:GACAGTGC orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
TACGGGGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGCGCGTAGGCGGGACGCCAAGTCAGCGGTAAAAGACTGCAGCTAAACTGTAGCACGCCGTTGAAACTGGCGCCCTCGAGACGAGACGAGGGAGGCGGAACAAGTGAAGTAGCGGTGAAATGCTTAGATATCACTTGGAACCCCGATAGCGAAGGCAGCTTCCCAGGCTCGATCTGACGCTGATGCGCGAGAGCGTGGGTAGCGAACAGG
+
IIIIIIIIIIIGIIIIIIIIIIIIIHGHIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIGIIIIHIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIHIIIIIIGIGIIHIGIIIIIIIIIIIIIIIIIIHGIIIIIIIIIIIIHIIIIIIIIIIIIIIHIIHIHFIIIIIIGIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIGIIIIHIIIIIIIIIIIIHIIIIIIIGIIIIIHIII
@H1_3 SN7001328:511:H2WKLBCXX:1:1101:18363:2111 1:N:0:GACAGTGA orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
TACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGGACAGCAAGTCTGATGTGAAAGGCGGGGGCTCAACCCCCGGACTGCATTGGAAACTGCTGACCTGGAGTACCGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
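One way to look at such a file systematically is sketched below. It is only an illustration: the function name `summarize_fastq` is hypothetical, and it assumes plain four-line FASTQ records whose header comment contains an Illumina-style field such as `1:N:0:GACAGTGC`, where the first value is the read number (1 or 2). It tallies those read numbers and the sequence lengths. It cannot prove on its own whether reads were merged upstream; that still needs to be confirmed with the sequencing provider.

```python
from collections import Counter
import io


def summarize_fastq(handle):
    """Tally read-number flags and sequence lengths in a 4-line-per-record FASTQ stream."""
    read_flags = Counter()
    lengths = Counter()
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        lengths[len(seq)] += 1
        # Look for a field shaped like "1:N:0:GACAGTGC" (assumed Illumina-style comment).
        for field in header.split()[1:]:
            parts = field.split(":")
            if len(parts) == 4 and parts[0] in ("1", "2"):
                read_flags[parts[0]] += 1
                break
    return read_flags, lengths


# Shortened, made-up record in the same header style as the excerpt above.
example = io.StringIO(
    "@H1_0 SN7001328:511:H2WKLBCXX:1:1101:4739:2211 1:N:0:GACAGTGC\n"
    "TACGGAGGATGCGAGC\n"
    "+\n"
    "IGIEIHHHHHEEF@DH\n"
)
flags, lengths = summarize_fastq(example)
print(dict(flags), dict(lengths))
```

For a real file, the same function could be called on `open("H1.fastq")`, where the file name is assumed from the sample name used in this post.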
- Since we only have one fastq file per sample instead of separate rfastq and ffastq files, I didn’t use the make.contigs command. I merged all the fastq and fna files into one big fastq file and one big fna file per treatment (we have 3 groups), so I now have 3 huge fna files and 3 fastq files. I skipped make.contigs and went straight to screen.seqs and the rest. Is that correct?
- I don’t know how to make a .file file like stability.file in the SOP. I also don’t use the group= parameter, because I don’t have a group file. Will that affect my analysis?
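For context, a mothur group file is a two-column, tab-separated text file that maps each sequence name to its group. The sketch below shows one possible way to derive such a file from sequence names like `H1_0`. It rests on two assumptions: that the FASTA headers start with `>`, and that the sample name is everything before the last underscore. The function name `write_group_file` is hypothetical.

```python
import io


def write_group_file(fasta_handle, out_handle):
    """Write 'sequence_name<TAB>group' for every FASTA header; return the number written."""
    written = 0
    for line in fasta_handle:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # skip a bare '>' with no name
        name = fields[0]
        # Assumption: "H1_0" belongs to sample "H1" (text before the last underscore).
        group = name.rsplit("_", 1)[0]
        out_handle.write(f"{name}\t{group}\n")
        written += 1
    return written


# Made-up miniature FASTA in the naming style used in this post.
fasta = io.StringIO(">H1_0\nTACGGAGG\n>H1_1\nTACGTAGG\n>H2_0\nTACGGGGG\n")
groups = io.StringIO()
count = write_group_file(fasta, groups)
print(count)
print(groups.getvalue(), end="")
```

If the group should be the treatment rather than the sample, the `group = ...` line would need a lookup from sample to treatment instead.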
- At this step:
mothur > remove.lineage(fasta=Lgroup.good.unique.good.filter.unique.precluster.pick.fasta,count=Lgroup.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=Lgroup.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
mothur stopped with the message: Unable to open Lgroup.good.unique.good.filter.unique.precluster.uchime.pick.count_table
[ERROR]: did not complete remove.lineage.
I checked my output files, and there is no file named Lgroup.good.unique.good.filter.unique.precluster.uchime.pick.count_table; I only have Lgroup.good.unique.good.filter.unique.precluster.count_table. But I did run chimera.uchime:
mothur > chimera.uchime(fasta=Lgroup.good.unique.good.filter.unique.precluster.fasta, count=Lgroup.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
Using 1 processors.
uchime by Robert C. Edgar
http://drive5.com/uchime
This code is donated to the public domain.
Checking sequences from Lgroup.good.unique.good.filter.unique.precluster.fasta …
It took 6932 secs to check 46130 sequences. 23222 chimeras were found.
Output File Names:
Lgroup.good.unique.good.filter.unique.precluster.uchime.chimeras
Lgroup.good.unique.good.filter.unique.precluster.uchime.accnos
So the process is stuck here until I figure out what went wrong.
- I tried the stability.batch file using just one sample’s fna file, starting with screen.seqs(fasta=H1.fna, minlength=252, maxhomop=8, maxambig=0, maxlength=275). That went surprisingly well, except for this message:
mothur > count.seqs(name=current, group=current)
[WARNING]: no file was saved for group parameter.
Using H1.good.names as input file for the name parameter.
This is because I don’t have a group file.
I ran the data on an HPC cluster, but the job was terminated. The last lines of the log were:
Output File Names:
H1.good.unique.good.filter.unique.precluster.pick.pick.fasta.30.dist
It took 0 to calculate the distances for 2 sequences.
It took 38 seconds to split the distance file.
Reading H1.good.unique.good.filter.unique.precluster.pick.pick.fasta.3.dist
Any idea why it didn’t go through?
Sorry for the many questions. I’m brand new and still learning.
Many thanks in advance for your help!
Cheers