SysCLFT (Systems tool for Cleft Lip/palate gene discovery)
Orofacial clefts are one of the most common birth defects in the U.S., occurring in 1/800 live-births with a lifetime cost for medical treatment, educational services and lost productivity of more than $100,000 per affected person. Orofacial clefts can occur as primary phenotype – classified as “isolated” or “non-syndromic” – or as one among several phenotypes in a syndrome. There are over 400 syndromic forms of orofacial clefting, however, efforts thus far using linkage analyses have only successfully identified around 40 loci. Moreover, the genetic etiology of non-syndromic manifestation of these malformations has remained largely elusive. Thus a large deficit exists in our knowledge of genes that are associated to these structural birth defects.
The identification of genetic mutations is traditionally based on linkage analysis and sequencing of candidate genomic regions in patient tissue or animal model. This is typically time-consuming and labor-intensive, and prioritization of candidate genes within an interval is often tricky. Since many human genetic diseases are caused by aberrant developmental processes, we hypothesize that innovative processing of expression datasets in embryonic development of mammalian models will allow the identification of human disease-associated genes.
Based on this principle, we have developed an effective strategy to prioritize candidate disease associated genes based on microarray gene expression profiling on embryonic tissues. To make this tool available to the community, we have developed a web-based public resource termed SysCLFT (Systems tool for Cleft Lip/palate gene discovery) that allows efficient identification of genes associated with congenital orofacial clefts.
What is SysCLFT based on?
SysCLFT is based on a strategy termed “in silico subtraction” which is analogous to the principle of identification of cell type enriched genes by cDNA subtraction. SysCLFT presently utilizes microarray gene expression profiles of existing FaceBase mouse embryonic isolated craniofacial (CF) tissue datasets (palate, maxillary, mandible, and frontonasal). We also generated a microarray dataset from total RNA extracted and pooled in equimolar ratios from mouse whole body (WB) tissue at E10.5, E11.5 and E12.5. We identified differentially regulated genes by comparing CF microarray profiles to those representing WB. The CF specific profiles were “subtracted” from the WB control using a moderated t-test and a CF-tissue enrichment p-value was assigned to each gene. t-statistics were used to rank genes for CF enrichment. These were then utilized to generate a ranked list of CF-enriched genes, which can be viewed as SysCLFT tracks in the UCSC Genome browser to aid identification of genes with function associated with CF biology and morphogenesis.
The significance of using WB reference datasets for in silico subtraction
WB datasets as described above are comprised of multiple developing tissues and thus represent ideal “basal” gene expression profile datasets, Therefore, comparison of tissue-specific profiles against the WB control facilitates identification of tissue-specific gene expression. In a different study, we have shown that an in silico subtracted mouse lens database is an elegant tool to identify lens-enriched genes (http://bioinformatics.udel.edu/Research/iSyTE) that play key roles in lens biology, and for identification and prioritization of potential candidate genes harboring mutations at mapped human cataract loci. This is consistent with the idea that selective gene expression in a tissue may be reflective of a function in the development or function of the tissue.
How well does SysCLFT work for orofacial cleft gene discovery?
We tested the utility of SysCLFT to identify genes associated with face development and human craniofacial defects. These analyses demonstrate that the top 500 highly ranked genes after WB subtraction were enriched for candidate genes relevant to face and palate development, without being enriched for genes encoding miscellaneous house keeping factors. Interestingly, 41 of 45 genes (91%) linked to CF morphogenesis are identified at higher expression ranks with WB-subtraction compared to only 4 of 45 (9%) without WB-subtraction, highlighting effectiveness of this approach. Specifically, majority of genes with established role in CF development (e.g. Fgf7, Irf6, Pax7, Pax9, Pbx2, Pbx3, Msx1, Runx2, Satb2, Tbx22, etc.) were highly enriched in WB-subtracted datasets from different facial tissues. Most significantly, we find that in silico subtraction successfully identifies a majority (85%, n=45) of clefting and CF candidate genes within the top 5 minRank genes in mean chromosomal intervals of 13Mb, each containing 106 genes on average (Table 1).
These data demonstrate that our in silico subtraction method: 1) can be successfully applied to isolated CF tissues (palate, maxillary, mandible, frontonasal), 2) specifically identifies genes involved in CF development regardless of high or low absolute expression, while filtering out genes with high expression that are not CF tissue-specific (housekeeping genes), and 3) significantly improves prioritization of potential disease genes.