This technology is a suite of statistical and machine learning methods for population structure inference and translational biomedical applications.
The increasing availability of whole-genome sequencing provides large datasets for population and medical genetics studies. Identifying latent population structure is crucial both to account for differences in allele frequencies between subpopulations and to avoid confounding in genetic association studies of disease. Principal component analysis (PCA) is commonly applied to extract principal components (PCs) that capture population structure, but existing methods face several limitations at the scale of whole-genome data.
This technology, called ERStruct, is a software package for inferring the latent population structure of whole-genome datasets accurately and efficiently. ERStruct is a suite of statistical and machine learning methods spanning Mendelian randomization, causal mediation analysis, statistical genetics, and deep learning. The algorithm is implemented in both MATLAB and Python; it estimates the number of top informative principal components and processes ultra-high-dimensional whole-genome data in a computationally efficient way.
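For readers unfamiliar with PCA-based structure inference, the sketch below illustrates the general idea on simulated genotypes: standardize the genotype matrix, compute its eigenvalues and sample PC scores, then pick how many top PCs look informative. Everything here is an assumption for illustration. The toy simulator and the function names are hypothetical, and the selection rule is a generic eigenvalue-ratio heuristic swapped in as an example; it is not ERStruct's published procedure, and ERStruct's actual interface is not shown.

```python
import numpy as np

def simulate_genotypes(pop_sizes=(100, 100, 100), n_snps=2000, drift_sd=0.15, seed=0):
    """Hypothetical toy simulator: each subpopulation perturbs a shared
    ancestral allele frequency, then genotypes (0/1/2 copies of the
    alternate allele) are drawn as Binomial(2, p)."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_snps)
    blocks, labels = [], []
    for k, n in enumerate(pop_sizes):
        # Subpopulation-specific allele frequencies, kept inside (0, 1).
        p = np.clip(ancestral + rng.normal(0.0, drift_sd, size=n_snps), 0.01, 0.99)
        blocks.append(rng.binomial(2, p, size=(n, n_snps)))
        labels.extend([k] * n)
    return np.vstack(blocks).astype(float), np.array(labels)

def standardize(X):
    """Center and scale each SNP column; drop monomorphic SNPs (zero variance)."""
    sd = X.std(axis=0)
    keep = sd > 0
    return (X[:, keep] - X[:, keep].mean(axis=0)) / sd[keep]

def pca_eigenvalues_and_scores(X):
    """Eigenvalues of Z Z^T (descending) and the sample PC scores U * S."""
    Z = standardize(X)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return S**2, U * S

def estimate_num_pcs(eigvals, k_max=10):
    """Generic eigenvalue-ratio rule (an assumption, NOT ERStruct's method):
    choose the k in 1..k_max that maximizes lambda_k / lambda_{k+1}."""
    ratios = eigvals[:k_max] / eigvals[1:k_max + 1]
    return int(np.argmax(ratios)) + 1

X, labels = simulate_genotypes()
eigvals, scores = pca_eigenvalues_and_scores(X)
k_hat = estimate_num_pcs(eigvals)
print("estimated number of informative PCs:", k_hat)
```

With three simulated subpopulations, mean-centering leaves two between-population axes whose eigenvalues stand well above the noise bulk, so a ratio rule of this kind should select two PCs. Capping the search at `k_max` avoids the near-zero trailing eigenvalue that centering introduces. Real whole-genome data raises issues this toy omits, such as missing genotypes and the memory cost of the full genotype matrix.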
IR CU25377
Licensing Contact: Joan Martinez