Columbia Technology Ventures

Automated clone detection algorithm for characterizing programs

This technology provides a dynamic analysis algorithm that detects exact- and near-match code clones while reducing computation time.

Unmet Need: Efficient identification of exact-match and near-match clones

Code clone detection is a central tool for software update technology and code plagiarism detection. Because detection is complex and computationally expensive, current methods often rely on simplified algorithms and simplified data representations, and as a result fail to identify all exact-match and near-match clones. No available method effectively and automatically identifies syntactically similar code fragments, or programs and processes that behave similarly even when their code is not alike.

The Technology: High-speed, accurate algorithm for automated clone detection

This technology, termed DyCLINK, is a system for detecting code relatives: code segments with dynamically similar execution features. A code relative detector can also detect code clones, which are syntactically similar programs, and it supports tasks such as implementation-agnostic code search and classification of code with similar behavior for human understanding. The method generates instruction dependency graphs that represent the behavior of code segments, then uses these graphs to compare the similarity of different processes. Because the graph structure records dependency relationships, the technology is more robust and lets users find groups of program fragments that share code idioms or patterns in data reuse, control flow, and context.
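
The graph-based comparison described above can be illustrated with a minimal sketch. It is not the DyCLINK implementation: the trace format, the function names `dependency_edges` and `similarity`, and the multiset Jaccard score are all assumptions chosen for illustration, standing in for whatever graph construction and matching the actual system uses. The sketch only shows why dependency structure, rather than source text, can reveal that two fragments behave alike.

```python
from collections import Counter

# Hypothetical trace format (an assumption for this sketch): each executed
# instruction is a tuple (opcode, destination variable, source variables).

def dependency_edges(trace):
    """Build a multiset of data-dependency edges (producer opcode, consumer opcode)."""
    last_writer = {}  # variable name -> opcode of the instruction that last wrote it
    edges = Counter()
    for opcode, dest, sources in trace:
        for src in sources:
            if src in last_writer:
                edges[(last_writer[src], opcode)] += 1
        last_writer[dest] = opcode
    return edges

def similarity(a, b):
    """Multiset Jaccard similarity of two edge multisets, from 0.0 to 1.0.

    This score is a simple stand-in, not the matching used by DyCLINK.
    """
    union = sum((a | b).values())          # element-wise maximum of counts
    if union == 0:
        return 1.0                         # two empty graphs are trivially alike
    return sum((a & b).values()) / union   # element-wise minimum of counts

# Two fragments computing x*x + y*y with different names and instruction order.
trace_a = [
    ("load", "x", ()), ("load", "y", ()),
    ("mul", "t", ("x", "x")), ("mul", "u", ("y", "y")),
    ("add", "r", ("t", "u")),
]
trace_b = [
    ("load", "p", ()), ("mul", "s", ("p", "p")),
    ("load", "q", ()), ("mul", "w", ("q", "q")),
    ("add", "z", ("s", "w")),
]
# A different computation: x*y + x.
trace_c = [
    ("load", "x", ()), ("load", "y", ()),
    ("mul", "t", ("x", "y")), ("add", "r", ("t", "x")),
]

edges_a = dependency_edges(trace_a)
edges_b = dependency_edges(trace_b)
edges_c = dependency_edges(trace_c)

print(similarity(edges_a, edges_b))
print(similarity(edges_a, edges_c))
```

In this toy setup, `trace_a` and `trace_b` differ in variable names and instruction order but produce the same dependency edges, so they score as identical, while `trace_c` shares only part of that structure and scores lower.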

This technology has been validated and shown to be robust in identifying exact- and near-match clones.

Applications:

  • Data mining
  • Code refactoring for API updates
  • Open-source code plagiarism detection

Advantages:

  • Exact- and near-match clone detection
  • Provides a large search space of possible clones
  • Optimizes computation time

Lead Inventor:

Simha Sethumadhavan, Ph.D.

Patent Information:

Patent Issued

Related Publications:

Tech Ventures Reference: