Make.contigs() problem with the oligos option?

Dear All,

Can anybody explain the following problem with make.contigs()?

I used mothur v.1.48.0.

  1. When the oligos option is NOT included in make.contigs(), the result looks normal and correct.
make.file(inputdir=D:\City_bumblebees\z_analysis\microbes\datasets, type=fastq, prefix=framgement)  
make.contigs(file=framgement.files)
summary.seqs(fasta=framgement.trim.contigs.fasta, count=framgement.contigs.count_table)

The groups include Group_0, and the total is 1259009 sequences (see below).

Group count:
Group_0 62092
Group_1 65104
Group_10        64332
Group_11        66286
Group_12        50398
Group_13        58625
Group_14        42531
Group_15        42162
Group_16        63950
Group_17        64915
Group_18        68091
Group_19        45224
Group_2 64619
Group_20        48305
Group_3 68941
Group_4 59792
Group_5 62512
Group_6 65391
Group_7 65739
Group_8 66878
Group_9 63122

Total of all groups is 1259009

It took 433 secs to process 1259009 sequences.

Output File Names:
framgement.trim.contigs.fasta
framgement.scrap.contigs.fasta
framgement.contigs_report
framgement.contigs.count_table


mothur >  summary.seqs(fasta=framgement.trim.contigs.fasta, count=framgement.contigs.count_table)


Using 12 processors.

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       301     301     0       3       1
2.5%-tile:      1       456     456     0       4       31476
25%-tile:       1       479     479     0       4       314753
Median:         1       481     481     0       6       629505
75%-tile:       1       481     481     0       6       944257
97.5%-tile:     1       481     481     4       6       1227534
Maximum:        1       601     601     48      291     1259009
Mean:   1       476     476     0       5
# of unique seqs:       1259009
total # of seqs:        1259009

It took 17 secs to summarize 1259009 sequences.

Output File Names:
framgement.trim.contigs.summary

  2. However, when the oligos option is included in make.contigs(), the first group (Group_0) is dropped and no longer appears in the result.

The same data were used as before.

make.file(inputdir=D:\City_bumblebees\z_analysis\microbes\datasets, type=fastq, prefix=framgement)  
make.contigs(file=framgement.files, oligos=oligos_file_framgement.txt)
summary.seqs(fasta=framgement.trim.contigs.fasta, count=framgement.contigs.count_table)

My oligos_file_framgement.txt is:
primer	ACTCCTACGGGAGGCAGCAG		GGACTACHVGGGTWTCTAAT	v3-v4
barcode	ATGAAG	TGCAAG	LYH01
barcode	AGCATG	TTGACG	LYH02
barcode	GTGAAC	CTGTTC	LYH10
barcode	CGCATA	GTACTC	LYH11
barcode	TGTGCA	CCGTAA	LYH12
barcode	AGTTCC	TGAATG	LYH13
barcode	GTACTT	CCAGCT	WXC01
barcode	CAGATC	GTGAAA	WXC02
barcode	TAATCG	ACTTGA	WXC03
barcode	ATCACG	TACAGC WXC04
barcode	GAGATA	CTAGCT	WXC05
barcode	CGCGGT	GAGTGG	WXC06
barcode	ACCTAA	TCATTC	km01
barcode	GTTTCG CTATAC	km02
barcode	CATTCG GACTTC	km04
barcode	TCCACA ATTGCG	km05
barcode	CGGAAT	GGTAGC	km11
barcode	TAACGA	ATATGT	km12
barcode	AGAGTA	TTAGGC	km13
barcode	AGAGCT 	TGCCAA	km14
barcode	GCACAA 	CCTTCT	km15

Group count:
Group_1        57332
Group_10        55196
Group_11        56969
Group_12        41310
Group_13        53239
Group_14        37998
Group_15        35616
Group_16        56717
Group_17        58338
Group_18        61664
Group_19        40230
Group_2 57177
Group_20        42633
Group_3 60340
Group_4 52715
Group_5 53496
Group_6 56403
Group_7 56783
Group_8 57145
Group_9 53064

Total of all groups is 1044365
It took 259 secs to process 1259009 sequences.

Here, the first group (Group_0) is no longer included in the results, and the total drops to 1044365 sequences. The contigs are also shorter because the primers and barcodes have been trimmed off.
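As a quick sanity check on the numbers reported in the two runs, the short Python sketch below compares the totals and the median contig lengths. All figures are copied from the output in this post; the variable names are only illustrative. It suggests two things: the drop in reads is larger than Group_0 alone (the other groups also lose reads, e.g. Group_1 goes from 65104 to 57332), and the shift in median length matches the combined length of the two primers plus two 6-nt barcodes.

```python
# Figures copied from the two runs reported in this post.
total_without_oligos = 1259009   # make.contigs(file=...) only
total_with_oligos = 1044365      # make.contigs(file=..., oligos=...)
group_0_without_oligos = 62092   # Group_0 count in the first run

# Reads missing from the second run, and how many of them are NOT
# explained by the disappearance of Group_0.
lost_total = total_without_oligos - total_with_oligos
lost_beyond_group_0 = lost_total - group_0_without_oligos

# Oligo lengths taken from oligos_file_framgement.txt.
forward_primer = "ACTCCTACGGGAGGCAGCAG"
reverse_primer = "GGACTACHVGGGTWTCTAAT"
barcode_length = 6  # every barcode listed in the file is 6 nt long
oligo_bases = len(forward_primer) + len(reverse_primer) + 2 * barcode_length

# Median contig lengths from the two summary.seqs outputs.
median_without_oligos = 481
median_with_oligos = 429
median_shift = median_without_oligos - median_with_oligos

print("reads lost in total:        ", lost_total)
print("reads lost beyond Group_0:  ", lost_beyond_group_0)
print("primer + barcode bases:     ", oligo_bases)
print("shift in median length:     ", median_shift)
```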

mothur > summary.seqs(fasta=framgement.trim.contigs.fasta, count=framgement.contigs.count_table)


Using 12 processors.

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       277     277     0       3       1
2.5%-tile:      1       404     404     0       4       26110
25%-tile:       1       428     428     0       4       261092
Median:         1       429     429     0       6       522183
75%-tile:       1       429     429     0       6       783274
97.5%-tile:     1       429     429     4       6       1018256
Maximum:        1       549     549     37      264     1044365
Mean:   1       424     424     0       5
# of unique seqs:       1044365
total # of seqs:        1044365

It took 10 secs to summarize 1044365 sequences.

Output File Names:
framgement.trim.contigs.summary

Zhenghua.

Where are the Group_# names coming from when you run make.contigs with an oligos file? Those group names should be the names in the 4th column of the oligos file. If you take the output of make.contigs without the oligos file and then run it through trim.seqs, what do you get? Alternatively, if you make a separate oligos file for each sample that contains only the barcodes for that sample, I’d be curious what you get.
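One way to build the per-sample oligos files for that second test is sketched below in Python. This is only a minimal sketch, not part of mothur: the function name `split_oligos_by_sample` is hypothetical, and it assumes every barcode line ends with the sample name and that all other non-empty lines (such as the primer line) should be copied into every per-sample file. Fields are rewritten tab-separated.

```python
def split_oligos_by_sample(oligos_text):
    """Return {sample_name: oligos_text} for a combined oligos file.

    Assumption: the last field of each 'barcode' line is the sample name.
    Every other non-empty, non-comment line is treated as shared.
    """
    shared_lines = []    # primer lines etc., copied into every output
    barcode_lines = {}   # sample name -> list of its barcode lines
    for raw_line in oligos_text.splitlines():
        fields = raw_line.split()  # tolerate mixed tabs and spaces
        if not fields or fields[0].startswith("#"):
            continue
        line = "\t".join(fields)   # rewrite fields tab-separated
        if fields[0].lower() == "barcode":
            barcode_lines.setdefault(fields[-1], []).append(line)
        else:
            shared_lines.append(line)
    return {
        sample: "\n".join(shared_lines + lines) + "\n"
        for sample, lines in barcode_lines.items()
    }


# Small demonstration using the first lines of the oligos file from this post.
example = (
    "primer\tACTCCTACGGGAGGCAGCAG\t\tGGACTACHVGGGTWTCTAAT\tv3-v4\n"
    "barcode\tATGAAG\tTGCAAG\tLYH01\n"
    "barcode\tAGCATG\tTTGACG\tLYH02\n"
)
per_sample = split_oligos_by_sample(example)
for sample, text in per_sample.items():
    print(f"--- {sample}.oligos ---")
    print(text, end="")
```

Each value in the returned dictionary can then be written to its own file (for example `LYH01.oligos`, a hypothetical name) and used when running that sample on its own.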

I suspect you might be having problems because you have samples in separate files and are using an oligos file. Our original intention was to use oligos if all of the forward reads were in one file and all of the reverse reads were in a second file - not separate forward and reverse files for each sample.
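To try the layout described above (one forward file and one reverse file for the whole run), the per-sample fastq files could be pooled first. The Python sketch below is one possible way to do that, under several assumptions: the .files table has either two columns (forward, reverse) or three (group, forward, reverse), file names contain no spaces, the fastq files are plain text ending in a newline, and relative paths are relative to the .files table. The function names and the demo file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def read_files_table(files_path):
    """Return a list of (forward, reverse) names from a 2- or 3-column table."""
    pairs = []
    for line in Path(files_path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 2:
            forward, reverse = fields
        elif len(fields) == 3:
            _group, forward, reverse = fields
        else:
            raise ValueError(f"unexpected number of columns: {line!r}")
        pairs.append((forward, reverse))
    return pairs


def pool_fastq(files_path, forward_out, reverse_out):
    """Concatenate all forward files and all reverse files, in the same order."""
    base = Path(files_path).parent
    pairs = read_files_table(files_path)
    with open(forward_out, "wb") as f_out, open(reverse_out, "wb") as r_out:
        for forward, reverse in pairs:
            with open(base / forward, "rb") as src:
                shutil.copyfileobj(src, f_out)
            with open(base / reverse, "rb") as src:
                shutil.copyfileobj(src, r_out)
    return len(pairs)


# Tiny self-contained demonstration with made-up reads.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a_R1.fastq").write_text("@a1\nACGT\n+\nIIII\n")
    (tmp / "a_R2.fastq").write_text("@a1\nTTTT\n+\nIIII\n")
    (tmp / "b_R1.fastq").write_text("@b1\nGGGG\n+\nIIII\n")
    (tmp / "b_R2.fastq").write_text("@b1\nCCCC\n+\nIIII\n")
    (tmp / "demo.files").write_text(
        "Group_0\ta_R1.fastq\ta_R2.fastq\nGroup_1\tb_R1.fastq\tb_R2.fastq\n"
    )
    n_pairs = pool_fastq(
        tmp / "demo.files",
        tmp / "pooled_forward.fastq",
        tmp / "pooled_reverse.fastq",
    )
    pooled_forward_text = (tmp / "pooled_forward.fastq").read_text()
    pooled_reverse_text = (tmp / "pooled_reverse.fastq").read_text()

print(n_pairs, "pairs pooled")
```

The pooled files could then be given to make.contigs through its ffastq and rfastq options together with the oligos file, so that the sample names come from the barcode lines rather than from the .files table.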

Pat
