
One interest of my lab is identifying disease-causing genes using genome-wide data. Through collaborations, we are conducting several studies aimed at understanding the genetic etiology of Mendelian and complex diseases using genome sequencing data. Here are a few examples.

 

Understanding the genetic risk of egg aneuploidy in IVF patients

The female reproductive lifespan is highly dependent on egg quality. Mistakes in meiosis leading to egg aneuploidy are frequent in humans. Yet, knowledge of the precise genetic landscape that causes egg aneuploidy in women is limited, as phenotypic data on the frequency of human egg aneuploidy are difficult to obtain and therefore absent in public genetic datasets. Here, we identify genetic determinants of reproductive aging via egg aneuploidy in women using a biobank of individual maternal exomes linked with maternal age and embryonic aneuploidy data. Using the exome data, we identified 404 genes bearing variants enriched in individuals with pathologically elevated egg aneuploidy rates. Analysis of the gene ontology and protein–protein interaction network implicated genes encoding the kinesin protein family in egg aneuploidy. We interrogate the causal relationship of the human variants within candidate kinesin genes via experimental perturbations and demonstrate that motor domain variants increase aneuploidy in mouse oocytes. Finally, using a knock-in mouse model, we validate that a specific variant in kinesin KIF18A accelerates reproductive aging and diminishes fertility. These findings reveal additional functional mechanisms of reproductive aging and shed light on how genetic variation underlies individual heterogeneity in the female reproductive lifespan, which might be leveraged to predict reproductive longevity. Together, these results lay the groundwork for noninvasive biomarkers of egg quality, a first step toward personalized fertility medicine.
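To illustrate what "variants enriched in individuals with elevated aneuploidy rates" can mean for a single gene, here is a minimal sketch of a one-sided Fisher's exact test on carrier counts. The summary above does not say which statistical test the study used, so the choice of test, the function name, and the counts are all assumptions made for illustration only.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Returns the probability of seeing at least `case_carriers` carriers
    among the cases if carrier status were unrelated to case status
    (hypergeometric null). This is an illustrative choice of test, not
    necessarily the one used in the study.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    upper = min(carriers, case_total)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts for one gene: 12 of 100 high-aneuploidy individuals
# carry a qualifying variant, versus 3 of 100 individuals in the comparison group.
p_value = fisher_one_sided(12, 100, 3, 100)
print(f"one-sided p = {p_value:.4g}")
```

In a genome-wide setting this would be repeated per gene, and the resulting p-values would need correction for multiple testing.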

 

 

Whole exome sequencing identifies genes associated with Tourette’s Disorder in multiplex families

Tourette’s Disorder (TD) is a neurodevelopmental disorder that affects about 0.7% of the population and is one of the most heritable neurodevelopmental disorders. Nevertheless, because of its polygenic nature and genetic heterogeneity, the genetic etiology of TD is not well understood. In this study, we combined the segregation information in 13 TD multiplex families with high-throughput sequencing and genotyping to identify genes associated with TD. Using whole-exome sequencing and genotyping array data, we identified both small and large genetic variants within the individuals. We then combined multiple types of evidence to prioritize candidate genes for TD, including variant segregation pattern, variant function prediction, candidate gene expression, protein–protein interaction networks, and candidate genes from previous studies. From the 13 families, 71 strong candidate genes were identified, including both known genes for neurodevelopmental disorders and novel genes, such as HTRA3, CDHR1, and ZDHHC17. The candidate genes are enriched in several gene ontology categories, such as dynein complex and synaptic membrane. Candidate genes and pathways identified in this study provide biological insight into TD etiology and potential targets for future studies.
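The segregation step can be sketched as a simple filter: keep a variant if every affected family member carries it and few or no unaffected members do. This is only an assumed, simplified rule. The study's actual criteria are not given here, and the function name, variant labels, and family members below are hypothetical.

```python
def segregating_variants(carriers_by_variant, affected, unaffected=(),
                         max_unaffected_carriers=0):
    """Return variants carried by all affected members of one family.

    carriers_by_variant maps a variant label to the set of family members
    carrying it. max_unaffected_carriers lets unaffected carriers through,
    which can matter when penetrance is incomplete. The rule itself is an
    illustrative assumption, not the study's published criterion.
    """
    affected = set(affected)
    unaffected = set(unaffected)
    kept = []
    for variant, carriers in carriers_by_variant.items():
        carriers = set(carriers)
        shared_by_all_affected = affected <= carriers
        unaffected_carriers = len(carriers & unaffected)
        if shared_by_all_affected and unaffected_carriers <= max_unaffected_carriers:
            kept.append(variant)
    return sorted(kept)


# Hypothetical family: two affected siblings and one unaffected parent.
family_carriers = {
    "variant_1": {"sibling_a", "sibling_b", "parent_u"},
    "variant_2": {"sibling_a", "sibling_b"},
    "variant_3": {"sibling_a"},
}
strict = segregating_variants(
    family_carriers, {"sibling_a", "sibling_b"}, {"parent_u"}
)
relaxed = segregating_variants(
    family_carriers, {"sibling_a", "sibling_b"}, {"parent_u"},
    max_unaffected_carriers=1,
)
print(strict, relaxed)
```

Variants passing such a filter would then be weighed against the other evidence types listed above, such as predicted function and expression.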

 

 

Prevalence of neuronal ceroid lipofuscinosis

The neuronal ceroid lipofuscinoses (NCLs) are a group of fatal, typically recessive neurodegenerative lysosomal storage diseases. While clinically similar, they are genetically distinct and result from mutations in at least twelve different genes. We investigated mutations in twelve NCL genes in ~61,000 individuals represented in the Exome Aggregation Consortium (ExAC) whole exome sequencing database.

Variants extracted from ExAC were separated into pathogenic alleles and neutral polymorphisms using a decision flowchart. The analysis identified numerous variants that are annotated as pathogenic in public repositories but have a predicted frequency that is not consistent with patient studies. After filtering out the neutral polymorphic variants, carrier frequencies calculated from ExAC vary across populations and correlate well with incidence estimated from numbers of living NCL patients in the US.
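As a rough illustration of how carrier frequency and expected incidence relate to allele frequencies for a recessive disease, the sketch below applies Hardy-Weinberg proportions to one gene. It assumes random mating and that the pathogenic alleles of the gene are rare and distinct, so their frequencies can be summed. The summary does not state that the study used this calculation, and the frequencies shown are invented for the example, not taken from ExAC.

```python
def carrier_and_incidence(pathogenic_allele_freqs):
    """Estimate carrier frequency and disease incidence for one recessive gene.

    Under Hardy-Weinberg equilibrium (an assumption), with q the combined
    frequency of all pathogenic alleles in the gene:
      carrier frequency = 2 * q * (1 - q)
      incidence         = q ** 2
    """
    q = sum(pathogenic_allele_freqs)
    if not 0.0 <= q <= 1.0:
        raise ValueError("combined allele frequency must lie in [0, 1]")
    carrier_frequency = 2.0 * q * (1.0 - q)
    incidence = q * q
    return carrier_frequency, incidence


# Hypothetical pathogenic allele frequencies for a single NCL gene.
example_freqs = [0.001, 0.002, 0.002]
carrier, incidence = carrier_and_incidence(example_freqs)
print(f"carrier frequency ~ 1 in {1 / carrier:.0f}")
print(f"expected incidence ~ 1 in {1 / incidence:.0f}")
```

The calculation is done per gene because an affected individual needs two pathogenic alleles in the same gene. It also shows why misclassified variants matter: a single common polymorphism wrongly counted as pathogenic inflates q, and the predicted incidence grows with its square.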

Decision flowchart for identifying pathogenic NCL variants.