Columbia Technology Ventures

Novel algorithms for population structure inference

This technology is a suite of statistical and machine learning methods for population structure inference and translational biomedical applications.

Unmet Need: Methodology for capturing underlying population structures

The increasing availability of whole genome sequencing provides large datasets for population and medical genetics studies. Identifying latent population structure is crucial both to account for the variation in allele frequencies between subpopulations and to avoid confounding when testing genetic associations with disease. Principal component analysis (PCA) is commonly applied to extract principal components (PCs) that capture the population structure, but existing methods have several limitations at this scale of data.

The Technology: Novel algorithms for population structure inference from whole genome sequencing data

This technology, called ERStruct, is a software package that infers the latent population structure of whole-genome datasets accurately and efficiently. ERStruct is a suite of statistical and machine learning methods, including Mendelian randomization, causal mediation analysis, statistical genetics, and deep learning. The algorithm runs in MATLAB and Python, estimates the number of top informative principal components, and processes ultra-dimensional whole-genome data in a computationally efficient way.
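
To make the idea of estimating the number of informative principal components concrete, the following is a minimal Python sketch on simulated data. It is not the ERStruct implementation and does not use the ERStruct package: the function names `simulate_genotypes` and `estimate_num_pcs`, the simulation settings, and the selection rule (choosing the largest ratio between consecutive eigenvalues among the first few) are all assumptions made for illustration. The simulation draws three subpopulations, for which two informative PCs would be expected in theory.

```python
import numpy as np


def simulate_genotypes(n_per_pop=50, n_pops=3, n_markers=2000, seed=0):
    """Simulate 0/1/2 genotypes for several subpopulations (hypothetical toy data)."""
    rng = np.random.default_rng(seed)
    # Shared ancestral allele frequencies, perturbed independently per subpopulation.
    base = rng.uniform(0.1, 0.9, size=n_markers)
    freqs = np.clip(base + rng.normal(0.0, 0.1, size=(n_pops, n_markers)), 0.05, 0.95)
    blocks = [
        rng.binomial(2, freqs[k], size=(n_per_pop, n_markers)) for k in range(n_pops)
    ]
    return np.vstack(blocks).astype(float)


def estimate_num_pcs(genotypes, max_k=10):
    """Pick the number of top PCs by the largest consecutive eigenvalue ratio.

    This is a simple illustrative heuristic, assumed for this sketch only.
    max_k must be well below the number of individuals, because centering
    makes the smallest eigenvalue close to zero.
    """
    # Drop monomorphic markers, then standardize each marker (column).
    sd = genotypes.std(axis=0)
    X = genotypes[:, sd > 0]
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    n_markers = X.shape[1]
    # Work with the n-by-n Gram matrix: individuals are far fewer than markers.
    gram = X @ X.T / n_markers
    eigvals = np.linalg.eigvalsh(gram)[::-1]  # sorted in descending order
    ratios = eigvals[:max_k] / eigvals[1:max_k + 1]
    return int(np.argmax(ratios)) + 1


if __name__ == "__main__":
    genotypes = simulate_genotypes()
    print("Estimated number of informative PCs:", estimate_num_pcs(genotypes))
```

Working with the individuals-by-individuals Gram matrix rather than the markers-by-markers covariance keeps the eigendecomposition small when markers vastly outnumber individuals, which is the typical situation for whole-genome data.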

Applications:

  • Drug discovery
  • Drug target identification
  • Structure-based drug design
  • Causal protein biomarker validation
  • Biomarker discovery and validation
  • Personalized medicine
  • Genomics
  • Health AI
  • Clinical trial analytics
  • Fairness-aware AI in health
  • Real-world evidence analysis
  • Risk factor identification (e.g., COVID-19 severity, cardiometabolic disease)

Advantages:

  • Efficient and accurate structure inference
  • Can process ultra-dimensional whole genome data
  • Runs in MATLAB and Python environments
  • User-friendly software package
  • Outperforms traditional methods for estimating the number of informative principal components

Lead Inventor:

Zhonghua Liu, Sc.D.
