Next Generation Sequencing technologies produce large amounts of sequence data, but at the cost of shorter read lengths and higher error rates. A main challenge with short, error-prone reads is identifying small insertions and deletions (from two to tens of nucleotides). Algorithms that can accurately identify these small mutations are in great demand. However, computational approaches to finding them using DNA fragment libraries or mate-pair libraries have so far been unsuccessful.
This technology provides an algorithm that identifies nucleotide insertions and deletions in DNA fragments and reconstructs small insertion/deletion mutations from libraries of small DNA fragments.
The algorithm produces a list of candidate small genomic insertions together with their nucleotide sequences. Given a set of reads that are partially aligned to a reference genome, the method uses multiple statistical analyses to find the possible positions of genomic insertions relative to the reference. In a test, chromosome X samples from 12 adult male patients with T-cell acute lymphoblastic leukemia were analyzed; mutations in various genes, including small insertions and deletions, were identified and verified by traditional Sanger sequencing. Ongoing research focuses on further developing the method, particularly for genomic data from cancer samples obtained with Next Generation Sequencing systems.
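The general idea of locating insertion candidates from partially aligned reads can be illustrated with a minimal Python sketch. This is not the patented method: the summary does not specify its statistical analyses, so the sketch assumes a single simple test (a Poisson tail probability against a caller-supplied background rate). The read format `(ref_start, aligned_len, sequence)`, the function names, and the parameters `background_rate`, `min_support`, and `alpha` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import zip_longest
from math import exp, factorial


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))


def consensus(tails):
    """Column-wise majority vote over the unaligned read tails."""
    out = []
    for column in zip_longest(*tails):
        bases = Counter(b for b in column if b is not None)
        out.append(bases.most_common(1)[0][0])
    return "".join(out)


def candidate_insertions(reads, background_rate, min_support=3, alpha=0.01):
    """Sketch: list candidate insertion sites from partially aligned reads.

    reads: iterable of (ref_start, aligned_len, sequence) tuples, meaning the
    first aligned_len bases of sequence match the reference from ref_start
    and the remainder is unaligned (hypothetical input format).
    background_rate: assumed mean number of spurious breakpoints per position.
    """
    tails_by_pos = {}
    for ref_start, aligned_len, seq in reads:
        if aligned_len < len(seq):  # read is only partially aligned
            breakpoint = ref_start + aligned_len
            tails_by_pos.setdefault(breakpoint, []).append(seq[aligned_len:])

    candidates = []
    for pos in sorted(tails_by_pos):
        tails = tails_by_pos[pos]
        p_value = poisson_tail(len(tails), background_rate)
        if len(tails) >= min_support and p_value < alpha:
            # The consensus tail is only a rough proxy for the inserted
            # sequence; it may also contain downstream reference bases.
            candidates.append((pos, len(tails), p_value, consensus(tails)))
    return candidates


# Toy input: three reads break at position 105, one at 302, one is fully aligned.
reads = [
    (100, 5, "ACGTAGGG"),
    (101, 4, "CGTAGGGT"),
    (102, 3, "GTAGGC"),
    (200, 6, "TTTTTT"),
    (300, 2, "AACC"),
]
candidates = candidate_insertions(reads, background_rate=0.1)
for pos, support, p_value, seq in candidates:
    print(pos, support, seq)
```

In this toy input only position 105 is reported, because the single read breaking at position 302 falls below the support threshold. A realistic pipeline would take breakpoints from an aligner's output, estimate the background rate from the data, and correct for multiple testing.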
Patent Pending
Tech Ventures Reference: IR 2642