Xiting Yan, PhD

Assistant Professor of Medicine (Pulmonary)

Research Interests

Genetics; Lung Diseases; Respiratory Hypersensitivity; Computational Biology; Genomics; Biostatistics; Molecular Medicine

Public Health Interests

Bioinformatics; Biomarkers; Biostatistics; Data analysis; Genetic epidemiology; Genetics; Genomics; Microarray; Microbial Ecology; Risk assessment; Statistical genetics; Statistical models

Research Organizations

Pulmonary, Critical Care & Sleep Medicine: Kaminski Lab

School of Public Health

Research Summary

My current research focuses on developing novel statistical and computational models to analyze large-scale genetic and genomic data from patients with chronic lung diseases, including asthma, idiopathic pulmonary fibrosis (IPF), sarcoidosis, and pediatric cystic fibrosis.

In the asthma study, conducted in collaboration with Dr. Geoffrey Chupp, we identified three subtypes of asthma, or TEA clusters, using gene expression data from induced sputum and blood: patients at high risk of near-fatal asthma attacks, patients with severe asthma symptoms, and patients with milder asthma. In addition, by analyzing gene expression in the blood, we could design a blood test that identifies a patient's asthma subtype and helps optimize the choice of treatment or drugs. Ultimately, this could lead to personalized treatment for asthma patients. To achieve these results, we developed a novel pathway-based clustering method, which showed better robustness and accuracy than traditional pathway-based clustering methods on both simulated and real datasets. Currently, longitudinal induced sputum and whole blood samples are being collected from patients and prepared for RNA sequencing. To analyze these data, we are developing novel statistical and computational approaches that extract genetic information from the longitudinal RNA sequencing data and integrate it with the transcriptional profiles from the same data set to identify time-invariant molecular endotypes of asthma.

In the study on IPF and sarcoidosis, conducted in collaboration with Dr. Naftali Kaminski, we are trying to understand the genomics and genetics of the patients. Second-generation sequencing technology was used to measure both gene expression levels and sequence mutations in the patients. My computational team is currently preprocessing and analyzing these sequencing data to better understand disease heterogeneity and pathogenesis using network analysis, data integration analysis, and longitudinal data analysis.

In the study on pediatric cystic fibrosis, patients complete weekly surveys and provide sputum and stool samples at clinical visits. These samples were sequenced to understand which bacteria are present, how they change over time, and whether they behave differently in children with and without cystic fibrosis. My computational team is currently developing statistical and computational approaches to analyze the longitudinal 16S rRNA sequencing data.

Extensive Research Description

Analysis of longitudinal RNA sequencing data from asthma patients;

Analysis of longitudinal gene expression data from asthma patients undergoing bronchial thermoplasty;

Analysis of longitudinal microbiome sequencing data from children with cystic fibrosis;

RNA sequencing of IPF, A1AT and SARC patients using Ion Torrent technology;

Single-cell RNA sequencing data analysis.

Research Image 1

B) Heatmap showing the clustering results by KEGG pathways using MCLUST. The color represents the cluster assignment of each sample by the KEGG pathways. C) Pathway-based distance matrix among the clusters. The color of each entry represents the pathway-based distance between the corresponding two samples. Red represents a small distance (samples are strongly related) and white represents a larger distance (samples are weakly related), showing the strength of the clusters. Samples within TEA cluster 3 are the most strongly related and most homogeneous, followed by clusters 1 and 2, in that order.
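
To make the idea of a pathway-based distance in panel C more concrete, here is a minimal Python sketch. It is not the published method: it assumes, purely for illustration, that each sample has already been assigned a cluster label within every pathway (for example by a model-based clustering step) and that the distance between two samples is the fraction of pathways in which they land in different clusters. The function name `pathway_distance_matrix`, the sample names, and the toy labels are all hypothetical.

```python
from itertools import combinations


def pathway_distance_matrix(assignments):
    """Return pairwise sample distances from per-pathway cluster labels.

    assignments maps a sample name to a list of cluster labels, one per
    pathway, with pathways in the same order for every sample.
    Assumed definition (illustrative only): distance = fraction of pathways
    in which the two samples are assigned to different clusters.
    """
    samples = list(assignments)
    n_pathways = len(assignments[samples[0]])
    # Start with zeros everywhere, so each sample is at distance 0 from itself.
    dist = {s: {t: 0.0 for t in samples} for s in samples}
    for s, t in combinations(samples, 2):
        disagreements = sum(
            a != b for a, b in zip(assignments[s], assignments[t])
        )
        dist[s][t] = dist[t][s] = disagreements / n_pathways
    return dist


# Toy input: three samples, four pathways (labels are made up).
toy = {
    "sample_1": [0, 1, 1, 2],
    "sample_2": [0, 1, 1, 0],
    "sample_3": [2, 0, 2, 1],
}
distances = pathway_distance_matrix(toy)
print(distances["sample_1"]["sample_2"])  # 0.25: they differ in one of four pathways
```

Under this assumed definition, samples that co-cluster in most pathways get a small distance (the red entries in the figure), while samples that rarely co-cluster get a distance close to 1.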