sub.sample() taxonomy error message "read missing"

yingeddi2008 · March 18, 2015, 3:08pm

Hi all,

I am using the sub.sample() command to subsample a set of MiSeq sequences. The full set is too big to process for OTU analysis: I have 1043161 unique reads after denoising and chimera checking, and I only want around 500-1500 unique reads.

My command looks like this:

sub.sample(fasta=all.otu.unique.fasta, name=all.otu.names, group=all.otu.groups, taxonomy=all.otu.rdp.taxonomy.80,size=500)

Here the fasta file contains all unique reads, the name file lists the names of reads that are identical, the group file records which sample each read belongs to, and the taxonomy file contains the taxonomy assignment for each read (all reads, not only the unique ones).

After running this command, I got the following output and error messages:

Sampling 500 from 13879970.
Deconvoluting subsampled fasta file...
/******************************************/
Running command: unique.seqs(fasta=all.otu.unique.subsample.fasta)
500     331

Output File Names:
all.otu.unique.subsample.names
all.otu.unique.subsample.unique.fasta

/******************************************/
Done.
[ERROR]: M01246_29_000000000-A461D_1_1101_13732_3258 is missing, please correct.
[ERROR]: M01246_29_000000000-A461D_1_1101_14913_3491 is missing, please correct.
[ERROR]: M01246_29_000000000-A461D_1_1101_20287_3764 is missing, please correct.

This is not the complete list of error messages.

I wasn’t sure what the message meant by a read being missing, so I checked using the very first read.

It is in the original name file, and also in the group file and the taxonomy file. It is not in the fasta file, since it is a duplicate read. So how can it be missing if the read name is in the input files I provided? Am I doing something wrong? Should I use the non-unique fasta file in this command?

However, I did get output files that seem OK:

Output File Names:
all.otu.subsample.names
all.otu.rdp.taxonomy.subsample.80
all.otu.unique.subsample.fasta
all.otu.subsample.groups

But this error message bothers me. I am afraid it could have some effect on the subsampled reads. Has anyone else seen the same error message? What is the effect of just using these output files and ignoring the error? I welcome any suggestions.

PS: I tried different versions of mothur, and they all give the same error message.

Thank you,

Eddi
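
The manual check described in this post (looking up one read name in each input file) could be scripted roughly as follows. This is only a sketch: the helper name files_containing is hypothetical and not part of mothur, and it assumes plain-text files in which read names are separated by whitespace or commas, with fasta headers starting with ">".

```python
import re


def files_containing(read_name, paths):
    """Return the paths in which read_name occurs as a whole token.

    Hypothetical helper, not a mothur function. Assumes names are
    separated by whitespace or commas and fasta headers start with '>'.
    """
    hits = []
    for path in paths:
        with open(path) as handle:
            for line in handle:
                # Drop the '>' of a fasta header, then split into tokens.
                tokens = re.split(r"[\s,]+", line.strip().lstrip(">"))
                if read_name in tokens:
                    hits.append(path)
                    break  # one match is enough for this file
    return hits


# Example call, using the file names from the post above:
# files_containing(
#     "M01246_29_000000000-A461D_1_1101_13732_3258",
#     ["all.otu.unique.fasta", "all.otu.names",
#      "all.otu.groups", "all.otu.rdp.taxonomy.80"],
# )
```

Matching whole tokens rather than substrings avoids false hits on read names that share a prefix, such as 1101_13732_3258 and 1101_13732_32581.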

westcott · March 19, 2015, 1:20pm

I suspect it’s an issue with the names file format. Could you post a line from the names file that contains one of the “missing” names?

yingeddi2008 · March 19, 2015, 6:48pm

Thanks for your reply. I have taken M01246_29_000000000-A461D_1_1101_13732_3258 as the missing name. It has the same sequence as M01246_29_000000000-A461D_1_1101_6986_10337. The line below is truncated because the complete list is very long.

M01246_29_000000000-A461D_1_1101_6986_10337     M01246_29_000000000-A461D_1_1101_6986_10337,M01246_29_000000000-A461D_1_1102_21228_3750,M01246_29_000000000-A461D_1_1103_7955_10631,M01246_29_000000000-A461D_1_1104_16068_11765,M01246_29_000000000-A461D_1_1104_17118_13212,M01246_29_000000000-A461D_1_1104_5934_16054,M01246_29_000000000-A461D_1_1104_21029_23119,M01246_29_000000000-A461D_1_1105_22950_6934,M01246_29_000000000-A461D_1_1105_24399_11228,M01246_29_000000000-A461D_1_1106_20861_19961,M01246_29_000000000-A461D_1_1107_21427_5556,M01246_29_000000000-A461D_1_1107_14126_11284,M01246_29_000000000-A461D_1_1107_14178_12538,M01246_29_000000000-A461D_1_1107_22649_12745,M01246_29_000000000-A461D_1_1107_7644_17247,M01246_29_000000000-A461D_1_1107_23830_18850,M01246_29_000000000-A461D_1_1108_10610_21847,M01246_29_000000000-A461D_1_1110_10744_15878,M01246_29_000000000-A461D_1_1112_8166_10114,M01246_29_000000000-A461D_1_1113_11396_26386,M01246_29_000000000-A461D_1_1114_13721_20352,M01246_29_000000000-A461D_1_2101_19564_5987,M01246_29_000000000-A461D_1_2101_25070_12450,M01246_29_000000000-A461D_1_2101_6907_12477,M01246_29_000000000-A461D_1_2101_23602_14438,M01246_29_000000000-A461D_1_2101_6541_16101,M01246_29_000000000-A461D_1_2101_15154_23084,M01246_29_000000000-A461D_1_2102_7457_5266,M01246_29_000000000-A461D_1_2102_17362_20797,M01246_29_000000000-A461D_1_2103_25747_19114,M01246_29_000000000-A461D_1_2103_7672_20752,M01246_29_000000000-A461D_1_2103_13431_28150,M01246_29_000000000-A461D_1_2104_21647_9101,M01246_29_000000000-A461D_1_2104_10423_15152,...

The read M01246_29_000000000-A461D_1_1101_13732_3258 appears about 25% of the way into this name list.

The names file was generated with the unique.seqs() command. Let me know if there is any problem with the format.

Eddi

westcott · March 19, 2015, 7:16pm

The format looks fine. Could you send your log file, fasta, name, group and taxonomy files to mothur.bugs@gmail.com?

yingeddi2008 · March 20, 2015, 2:32pm

Hi, thanks for looking into this issue. However, the files are very big, about 5 GB in total (see below). I don’t know whether Gmail can handle this. If it cannot, is there another way to transfer the files?

-rw-r--r-- 1 hl0333 pi_qd0005 651M Mar 10 16:22 all.otu.unique.fasta
-rw-r--r-- 1 hl0333 pi_qd0005 2.1G Mar 10 16:21 all.otu.rdp.taxonomy.80
-rw-r--r-- 1 hl0333 pi_qd0005 631M Mar 10 16:21 all.otu.names
-rw-r--r-- 1 hl0333 pi_qd0005 700M Mar 10 16:21 all.otu.groups

And the logfile is another 1 GB because a lot of missing-read errors are listed in it.
Eddi

westcott · March 23, 2015, 6:22pm

Hi Eddi,
The error is coming from the taxonomy file. It does not seem to match the other files. It only contains 199 sequences.
Kindly,
Sarah Westcott
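
A mismatch like the one described in this reply can be spotted before running sub.sample() by comparing the read names in the names file with those in the taxonomy file. The sketch below is an illustration only: the function names are hypothetical and not part of mothur, and it assumes the usual layout of a two-column names file (representative name, then a comma-separated list of all identical reads) and a taxonomy file with the read name in the first column.

```python
def reads_in_names_file(path):
    """Collect every read name from the second column of a names file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            names.update(n for n in fields[1].split(",") if n)
    return names


def reads_in_first_column(path):
    """Collect the first whitespace-separated field of each line."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                names.add(fields[0])
    return names


def compare_names_and_taxonomy(names_path, taxonomy_path):
    """Return (reads in names file, reads in taxonomy file, unclassified reads)."""
    expected = reads_in_names_file(names_path)
    classified = reads_in_first_column(taxonomy_path)
    return len(expected), len(classified), expected - classified


# Example call, using the file names from this thread:
# total, classified, missing = compare_names_and_taxonomy(
#     "all.otu.names", "all.otu.rdp.taxonomy.80")
# print(total, classified, len(missing))
```

If the second number is far smaller than the first, the taxonomy file does not cover the reads in the names file. Both sets are held in memory, so with millions of reads this needs a machine with enough RAM.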
