One interest of my lab is identifying disease-causing genes using genome-wide data. Through collaborations, I am conducting several studies aimed at understanding the genetic etiology of Mendelian and complex diseases using genome sequencing data. Here are a few examples:


Genetics of Chronic Obstructive Pulmonary Disease

Chronic obstructive pulmonary disease (COPD) is characterized by an irreversible airflow limitation in response to inhalation of noxious stimuli, such as cigarette smoke. However, only 15-20% of smokers manifest COPD, suggesting a role for genetic predisposition. We performed whole exome sequencing in 62 highly susceptible smokers and 30 exceptionally resistant smokers to identify rare variants that contribute to disease risk or to resistance to cigarette smoke.

Using an integrative approach that combined in silico analyses (whole exome sequencing and airway transcriptomics) with in vitro experiments (cigarette smoke-induced cytotoxicity in miRNA knockdown cell lines), we identified two candidate genes (TACC2 and MYO1E) that augment cigarette smoke-induced cytotoxicity and potentially COPD susceptibility.

Identifying candidate COPD genes through genomic and functional approaches.


Prevalence of Neuronal Ceroid Lipofuscinosis

The neuronal ceroid lipofuscinoses (NCLs) are a group of fatal, typically recessive neurodegenerative lysosomal storage diseases. While clinically similar, they are genetically distinct and result from mutations in at least twelve different genes. We investigated mutations in twelve NCL genes in ~61,000 individuals represented in the Exome Aggregation Consortium (ExAC) whole exome sequencing database.

Variants extracted from ExAC were separated into pathogenic alleles and neutral polymorphisms using a decision flowchart. The analysis identified numerous variants that are annotated as pathogenic in public repositories but whose predicted frequency is inconsistent with patient studies. After the neutral polymorphic variants were filtered out, the carrier frequencies calculated from ExAC varied across populations and correlated well with incidence estimated from the number of living NCL patients in the US.

Decision flowchart for identifying pathogenic NCL variants.
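
As a rough illustration of how a carrier frequency can be derived from exome allele counts, here is a minimal Python sketch. It assumes Hardy-Weinberg equilibrium and that the pathogenic alleles of a gene can be summed into a single combined allele frequency, which is only a reasonable approximation when each variant is rare. The class and function names are hypothetical, the numbers in the usage note are invented for illustration, and this is not necessarily the calculation used in the study.

```python
from dataclasses import dataclass


@dataclass
class PopulationCounts:
    """Hypothetical summed counts for one gene in one population."""
    pathogenic_allele_count: int  # pathogenic alleles observed, summed over variants
    total_alleles: int            # chromosomes genotyped (2 x number of individuals)


def allele_frequency(counts: PopulationCounts) -> float:
    """Combined pathogenic allele frequency q for the gene."""
    if counts.total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return counts.pathogenic_allele_count / counts.total_alleles


def carrier_frequency(q: float) -> float:
    """Expected heterozygote (carrier) frequency 2pq, assuming Hardy-Weinberg."""
    return 2.0 * q * (1.0 - q)


def expected_incidence(q: float) -> float:
    """Expected frequency of affected genotypes q^2 for a recessive disease,
    again assuming Hardy-Weinberg proportions."""
    return q * q


# Illustrative numbers only (not taken from ExAC):
example = PopulationCounts(pathogenic_allele_count=61, total_alleles=122_000)
q = allele_frequency(example)
print(f"q = {q:.6f}")
print(f"carrier frequency = {carrier_frequency(q):.6f}")
print(f"expected incidence = {expected_incidence(q):.3e}")
```

With these made-up counts, 61 pathogenic alleles among 122,000 chromosomes give q = 0.0005, a carrier frequency of roughly 1 in 1,000, and an expected incidence of about 1 in 4 million births. Comparing such per-population estimates with observed patient numbers is one way to flag variants whose "pathogenic" annotation implies an implausibly high disease frequency.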