Hi all,
First, I'd be very grateful to anyone who can help out!
-
I had to merge my paired-end reads with PEAR, because my data is from Mr. DNA and the original fastq files each contain a mix of forward and reverse sequences. There’s a way to do this in mothur by first running make.contigs without an oligos file specified, but that returns an error saying that some sequence names appear in the forward fastq but not in the reverse fastq. I’ve read that I can take care of this with remove.seqs, but that requires me to first run unique.seqs. Since I want to remove the sequences that are present in only one of the two fastq files, I would need to tell this command to look in two files at once. Is this possible?
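To make it concrete what I mean by looking in two files at once, here is a minimal Python sketch that works outside of mothur. It assumes strict 4-line FASTQ records and that mates share the same name up to the first space (or an old-style /1 or /2 suffix). The function names are my own for illustration and are not mothur commands.

```python
from itertools import islice

def read_names(lines):
    """Collect read names from FASTQ lines, assuming strict 4-line records."""
    names = set()
    # Every 4th line, starting with the first, is a header line
    for header in islice(lines, 0, None, 4):
        # Drop the leading '@' and anything after the first whitespace
        fields = header.strip()[1:].split()
        if not fields:
            continue
        name = fields[0]
        # Drop an old-style /1 or /2 mate suffix if present
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        names.add(name)
    return names

def unpaired_names(forward_lines, reverse_lines):
    """Return the sorted names that appear in only one of the two inputs."""
    fwd = read_names(forward_lines)
    rev = read_names(reverse_lines)
    # Symmetric difference: names in exactly one of the two sets
    return sorted(fwd ^ rev)
```

Opening both fastq files and passing the two handles to `unpaired_names` would give a list of names that could be written one per line to a text file, which is the sort of list I would then want to remove from both files.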
-
Alternatively, using the data merged with PEAR, I have gotten as far as trim.seqs in order to de-multiplex, but this step leaves me with only a few hundred reads per sample (am I understanding the number given next to the sample ID in the output correctly?). The file containing these sequences is 3.0 MB, while the scrap file is 2.0 GB. I know the reads are of very high quality. I have realized that because the reads exist in both forward and reverse orientation, I can search for the reverse complements of my barcodes. If I can incorporate these reverse reads, I should double my reads per sample, but that still leaves the vast majority of reads in the scrap. Any thoughts? Here are the first few entries of the scrap file:
M01522_77_000000000-AR1WK_1_1101_16596_1576|b fbdiffs=1000(noMatch), rbdiffs=1000(noMatch) fpdiffs=0(match), rpdiffs=0(match)
AATAACGACCTGCATGAAACATAGAACGATTCGATAGCTGTCTTGCACACACATCTTGGTAAAATATTCATCAGTACATAGAATATGCTCGTCAGCTAATACCACTATAAAGTACAAAACGACCCTCTTAATCTTTATAGGTATAAAATCTAGTACACATCCAGGCGTTTTTATACAAGGTCATGGCCCCAAGAAGCGAGACGATGACGATAAAGAGCCAGAAGATGCAGCGTGGTAAACACAAGACTGTTTAATTGGGGGGATAAGCTTATCTAATCCACATAAAGTGAACAAAGGGATGAAAAGGAGACGAAGTAAAAAGGGGAAAGAAAGAAGTAAAGAAGAATGGAAAAAGAAGAAAAAGAAAGTAGAAAAAAGAAAACGGAAAGAGGAAAAAAAAAAAAAAAAAAAAAAAAGGAAGGGGAAGGAAAAAAGGCGGAAAG
M01522_77_000000000-AR1WK_1_1101_20602_1604|bf fbdiffs=1000(noMatch), rbdiffs=1000(noMatch) fpdiffs=1000(noMatch), rpdiffs=1000(noMatch)
AGCATTTAAATACTGGGAAGGACTTCTTGGGCCATTAAGCGAAACATGTATCTTTGCACTTATTAGTTTGTCTCCATTGGATGGCTTTGTTCACCTTAGGTGGATTAGATAAGCTTATCGCCCCAATTAAACAGTCTTGTGTTTACCACGCTGCATCTTCTGGCTCTTTATCGTCATCGTCTCGCTTCTTGGGGCCATCACCTTGTATACAAACGCCTGGATGTGTACTAGATTTTATCCCTATAAAGCTTCAGAGGGTCTTTTTGTCCTTTATTGTGGTCTTAGGTCACCAGAATATTCGGTGTAATGATGAATATTGTAAAATGATGTGTGAGAAAGAAAGATAGCGAATAGGTCTATGTATCATGAAGGTAGTTATG
M01522_77_000000000-AR1WK_1_1101_14624_1609|bf fbdiffs=1000(noMatch), rbdiffs=1000(noMatch) fpdiffs=1000(noMatch), rpdiffs=1000(noMatch)
TCCGTAGAGTAGCTTGGTGTTGGTAGATTAATTTTAAGCTCATTAAGATAGCATTTAAATACTTGGAAGGACTTCTTGTGCCCTTAAGCGAAACATGTATCTTTGCACTTATTAGTTGGGCTCCATTGGGTGGCTTTGTTCACCTTAGTTTGATTAGATAAGCTTATCTCCCCAATTAAACAGTCTTGTGTTTACCACGCTGCATCTTCTGGCTCTTTATCGTCATCGTCTCGCTTCTTGGGGCCATCACCTTGTATACAAACGCCTGGATGTGTACTAGATTTTATCCCTATAAAGCTTAAGAGGGTCTGATTGGAAGTTAATGTGGTGTTAGCTAACAAGCAGATTATATGTAAAGAAGAATATTTTAAAATGAGGTGAGAGAAAGAAAGCTATCGAATCGGAATAAGTAACATGCAGGTAGTTATG
M01522_77_000000000-AR1WK_1_1101_16028_1611|b fbdiffs=1000(noMatch), rbdiffs=1000(noMatch) fpdiffs=0(match), rpdiffs=0(match)
AATAACGACCTGCATGAAACATAGAACGATTCGATAGCTGTCTTGCACACACATCATGGTTAAATATTCATCAGTACTAAGAATATGCTGGTGTGCTAAGACCTCAATCAAGGACAAAAAGACCCTCTTATGCTTTATCGGGCTAAAATCTAGTACACATCCAGTCGTTTTTATACAAGGTGATGGCCCCAAGAAGCGAGACGATGACGATAAAGAGCCAGAAGATGCAGCGTGGTAAACACAAGACTGTTTAATTGGGGAGATAAGCTTATCTAATCCACCTAAGGTGAACAAAGAAATCAAATGGAGAGAAACAAATAAGAGAAAAGATAAAGGTGTAGAGTAATGGAGAAAGAAGTGATTCAAAGAATTGAAATGCTATCAAAAAGAGATGAAAAGGAAAAAAAAAAGAACAAGCGAAACGACGGATAAAAGGCAAAGACTG
M01522_77_000000000-AR1WK_1_1101_19243_1611|f fbdiffs=0(match), rbdiffs=0(match) fpdiffs=1000(noMatch), rpdiffs=1000(noMatch)
GAGGAGTAGCCTGTTATCCGTAGAGTAGCTTGGTGTTGGTAGATTAATTTTAAGCTCATTAAGATATCATTTAAATACTGGGAAGGACTTCTTTGGCCATTAAGCGAAACATGTATCTTTGCACTTATTAGTTGGTCTCCATTGGATGGCTTTGTTCACCTTAGGTGGATTAGATAAGCTTATCGCCCCAATTAAACAGTCTTGTGTTTACCACGCTGCATCTTCTGGCTGTTTATCGTCATCGTCTCGCTTCTTGGGGCCATCACCTTGTATACAAACGCCTGGATGTGTAGTAGATTTTAGAGCTATAAAGATTCAGAGGGTATTTTGGTGATATATAGTGGTATGAGATAAACAGCAGATTCTGTGTACTGATGAATAATGTAAAATGATGGGTGTGAAAGAGAGATATAGAATCGTTCTAG
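For reference, this is roughly how I would generate the reverse complements of my barcodes mentioned above. It is only a sketch: the sample names and barcode sequences in the usage note are made up for illustration, and it assumes the barcodes contain only A, C, G, and T.

```python
# Translation table for complementing DNA bases (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement each base, then reverse the whole string
    return seq.translate(COMPLEMENT)[::-1]

def with_reverse_complements(barcodes):
    """Map each sample name to a (barcode, reverse complement) pair."""
    return {
        sample: (barcode, reverse_complement(barcode))
        for sample, barcode in barcodes.items()
    }
```

With a hypothetical input such as `{"sampleA": "AAGGCT"}`, `with_reverse_complements` pairs the original barcode `AAGGCT` with its reverse complement `AGCCTT`, so both orientations can be searched for in the merged reads.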
Thanks so much!!
Dan