Problem with Make.contigs

Hi everyone,

I have paired-end sequencing data from MiSeq. When I try to run make.contigs, all the data goes to the scrap file and the trim file is blank.
Here is how I did it:

Method 1:
make.contigs(ffastq=R1_001.fastq, rfastq=R2_001.fastq, findex=I1_001.fastq, oligos=Paleo.oligos, bdiffs=1, pdiffs=2)

The oligo file looks like this:
primer GTGTGYCAGCMGCCGCGGTAA NONE
BARCODE TAGGACGGGAGT NONE F_1_001
BARCODE AAGTCTTATCTC NONE F_1_002
BARCODE TTGCACCGTCGA NONE F_1_003

I have 50 samples. I did the sequencing with Argonne National Lab; they use barcodes only with the forward primers.

The primer and linker info they provided to me is:
Reverse primer pad: AGTCAGCCAG
Reverse primer linker: CC
806 reverse primer: GGACTACHVGGGTWTCTAAT
Forward primer pad: TATGGTAATT
Forward primer linker: GT
515 forward primer: GTGYCAGCMGCCGCGGTAA

I played with the data for quite a while: removing the linker from the primer, adding the reverse primer instead of NONE, renaming the samples and rerunning, but it still threw everything into the scrap file.

I also tried creating a four-column file as indicated in the MiSeq SOP:
R1_001.fastq R2_001.fastq I1_001.fastq none

Then run:
make.contigs(file=stability.files, oligos=Paleo.oligos, bdiffs=1, pdiffs=2)
But everything still ended up in the scrap file.

I feel there is something wrong with the way I wrote my primer, but I could not figure it out. I even contacted Argonne to confirm the primer and barcodes again.

I would be really grateful if anyone could provide some suggestions!
Thanks very much in advance!! :slight_smile:

Xikun

Another thing is:

I could not open the scrap file (fasta); it is about 6.5 GB, and the programs I tried (MEGA, a plain text editor, and jEdit) could not handle it. So I have no idea what is going on inside… Does anyone have an idea of how to open it? I am using a Windows system.
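(One workaround for a file this size is not to open it at all, but to stream just the first few records with a few lines of Python. A minimal sketch, assuming the scrap fasta is named scrap.contigs.fasta, so substitute the real file name:)

# peekScrap.py: print only the first few records of a large fasta,
# streaming the file instead of loading it all into memory (works on Windows too).
# "scrap.contigs.fasta" is a placeholder; substitute the actual scrap file name.
records_to_show = 5
shown = 0
with open("scrap.contigs.fasta") as fasta:
    for line in fasta:
        if line.startswith(">"):
            if shown == records_to_show:
                break
            shown += 1
        print(line.rstrip())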

Thanks!

Xikun

Hi everyone,

I finally got some sequences out of the scrap file by running trim.seqs with the scrap fasta and my oligos file. The scrap entries look like this one:
#M02149_215_000000000AGTL9_1_1101_19141_5094|bf_bdiffs_4(noMatch)fpdiffs_32(noMatch)__|_bf_fbdiffs_1000(noMatch)_rbdiffs_1000(noMatch)_fpdiffs_1000(noMatch)_rpdiffs_1000(noMatch)1
GACGGAGGGCGCAAACGTTGTTCGGAATCACTGGGCGTAAAGGGCGCGTAGGCGGATCGGTAAGTCAGACGTGAAATCCCGGGGCTCAACTCCGGGTCTGCGTTTGAAACTGTCGATCTAGAGTGCAGGAGAGGAAGGCGGAATTCCAGGTGTGGCGGTGGAATGCGTAGATATCTGGAAGAACACCAGTGGCGAAGGCGGCCTTCTGGACTGACACTGACGCTGAGGCGCGAAAGCTAGGGGAGCAAACAGG

Could you send a sample of your input files to mothur.bugs@gmail.com?

Hi Sarah and Pat,

Thanks for replying to me on the forum (make.contigs problem posted by Xikun).
I am now using the demultiplexed data from Argonne and processing them separately (50 samples~lol)…

The original file is rather large (~5 GB) and I don't know how to break it into smaller files. Do you want me to send you the original file, or could you give me some suggestions on how to make a sample out of it? Sorry! I don't have much experience with mothur and computing yet~
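(One simple way to make such a sample is to take the first few thousand reads from each of the three raw files; raw R1, R2, and index fastq files from the same run list reads in the same order, so the sample stays paired. A minimal Python sketch, reusing the file names from earlier in the thread; substitute your own if they differ:)

# sampleFastq.py: copy the first n_reads fastq records (4 lines each) from each
# raw file into a smaller "sample_" file, keeping R1, R2, and the index file in sync.
# File names below are placeholders taken from earlier in this thread.
n_reads = 10000
for name in ["R1_001.fastq", "R2_001.fastq", "I1_001.fastq"]:
    with open(name) as src, open("sample_" + name, "w") as out:
        for i, line in enumerate(src):
            if i >= 4 * n_reads:
                break
            out.write(line)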

I have two more questions about processing my demultiplexed files with mothur:

  1. When I run cluster.split, I set the cutoff to 0.03, but sometimes it changes to 0.02 in the output:

mothur > cluster.split(fasta=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta,name=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.names,taxonomy=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy,splitmethod=classify,taxlevel=4,cutoff=0.03)


Using 1 processors.
Using splitmethod fasta.
Splitting the file...
/******************************************/
Running command: dist.seqs(fasta=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.temp, processors=1, cutoff=0.035)

Using 1 processors.
/******************************************/

Output File Names:
F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.dist

It took 880 seconds to calculate the distances for 14424 sequences.
/******************************************/
Running command: dist.seqs(fasta=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.1.temp, processors=1, cutoff=0.035)

#Mothur repeats similar output until the end.

Clustering F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.dist
Cutoff was 0.035 changed cutoff to 0.02
Cutoff was 0.035 changed cutoff to 0.02
It took 413 seconds to cluster
Merging the clustered files…
It took 1 seconds to merge.

Output File Names:
F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.sabund
F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.rabund
F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.list


Why does it change the cutoff?
  2. When I run classify.otu, it gives me “XXX is not in the taxonomy file…”. But in the end it still generates the tax summary and taxonomy info for me, and when I open them, they look pretty normal:

mothur > classify.otu(list=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.list,taxonomy=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy)


reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.names which seems to match F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy.
unique 20498
HWI-M20149_215_000000000-AGTL9_1_2114_26811_16714 is not in your taxonomy file. I will not include it in the consensus.
HWI-M20149_215_000000000-AGTL9_1_2114_24066_11585 is not in your taxonomy file. I will not include it in the consensus.
#mothur continues this output until the end.

Output File Names:
F1_01.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique.cons.taxonomy
F1_01.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique.cons.tax.summary
F1_01.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.0.02.cons.taxonomy
F1_01.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.0.02.cons.tax.summary

Why does the taxonomy file not match up? It seems, though, that this does not affect my results.

I will also post the questions on the forum so other people can see the answers.

Thanks very much!

Xikun

Hi Xikun,
We get asked question 1 a lot; it's a product of running the average neighbor algorithm on a sparse distance matrix. Here's Pat's explanation: http://www.mothur.org/wiki/Frequently_asked_questions#Why_does_the_cutoff_change_when_I_cluster_with_average_neighbor.3F. For question 2, you just need to include the name or count file. The list file contains all of the names in your dataset, but the taxonomy file contains only the uniques, so when mothur finds a redundant name it does not know anything about its taxonomy without the names or count file, and it ignores it. This could change the consensus taxonomy for the OTUs quite a bit.
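For example (a sketch reusing the file names from your log above; swap in whatever your current files are), the question 2 command would become something like:

mothur > classify.otu(list=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.list, taxonomy=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, name=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.names)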
Kindly,
Sarah

Hi, I’m experiencing the same problems as Xikun with Illumina HiSeq 16S rRNA data from Argonne.
Can you please share what was learned from this experience?
Was the only solution to ask for individual sample R1 and R2 fastq files from the sequencing facility?
I am using Mothur v 1.36.1.
Thank you

You will need to increase the threshold to 0.20 to get the 0.03 label when you use the average neighbor algorithm.
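For example (a sketch based on the cluster.split command earlier in this thread, with only the cutoff changed):

mothur > cluster.split(fasta=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, name=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.names, taxonomy=F1_01R1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.20)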

Pat

Hi Pat,
Thanks for the reply. I apologise; I should have been clearer. My question was referring to the original make.contigs problems in this post experienced by Xikun. Can you offer any assistance as to how this was resolved? Like Xikun, when I run the command with my R1 and R2 fastq files from Argonne, I end up with more reads in the scrap.contigs file. I used the advice posted previously suggesting the use of the reverse complement of the barcode, but once again experienced the same problem. Is the only solution to acquire the individual sample R1 and R2 sequences from Argonne?
Thanks so much!

When you have a setup like the following:

The oligo file looks like this:
primer GTGTGYCAGCMGCCGCGGTAA NONE
BARCODE TAGGACGGGAGT NONE F_1_001
BARCODE AAGTCTTATCTC NONE F_1_002
BARCODE TTGCACCGTCGA NONE F_1_003

What you want to do is run make.contigs to remove the barcodes and assemble the reads, and then run trim.seqs to remove the primer. Adding an easier way to handle oligos file setups like the one above is on our short list. Coming soon… In the meantime, here's the workaround…

makeContigs.oligos:
BARCODE TAGGACGGGAGT NONE F_1_001
BARCODE AAGTCTTATCTC NONE F_1_002
BARCODE TTGCACCGTCGA NONE F_1_003

mothur > make.contigs(ffastq=R1_001.fastq, rfastq=R2_001.fastq, findex=I1_001.fastq, oligos=makeContigs.oligos, bdiffs=1) - assemble paired reads assigning reads to samples.

trimSeqs.Oligos:
primer GTGTGYCAGCMGCCGCGGTAA NONE

mothur > trim.seqs(fasta=fastaFileFromMakeContigs, oligos=trimSeqs.Oligos, pdiffs=2) - trim primer from assembled sequences

Now putting them together:

mothur > list.seqs(fasta=outputFromTrimSeqs) - list the sequences from which the barcode and primer were successfully removed
mothur > get.seqs(group=groupFileFromMakeContigs, accnos=current) - remove from the group file any sequences whose primer could not be removed