Introduction and objectives New methods of high-throughput sequencing provide unparalleled access to the human genome and transcriptome. We hypothesised that next-generation DNA sequencing technologies would allow us to identify an elusive novel disease gene for a pulmonary vascular disease inherited as an autosomal dominant trait: the HHT3 interval on chromosome 5 is predicted by linkage studies to contain a mutation causing pulmonary arteriovenous malformations and hereditary haemorrhagic telangiectasia.1,2
Methods Published expressed sequence tag (EST) databases and tiling array data were used to supplement sequencing analysis of the HHT3 interval. Sheared, Agilent SureSelect adaptor-ligated genomic DNAs from six related patients and four controls were hybridised to single-stranded biotinylated RNA baits. Samples were pooled for multiplexed sequencing on an Illumina HiSeq2000. Sequence data were processed with RTA version 1.7.45, the CASAVA Eland pair algorithm, and CASAVA 1.7 demultiplexing algorithms. Sequence variants were validated using conventional PCR and Sanger sequencing.
Results Conventional exon-based sequencing strategies did not identify the HHT3 causative gene mutation. For individual candidate genes, up to 108 alternatively spliced transcripts per gene were predicted from EST databases. For intergenic regions, tiling array data indicated that up to 44 different transcribed fragments were present in the nucleus and/or cytoplasm of different cell types. For each DNA sample analysed by NextGen sequencing, ~8 million reads mapped uniquely to the HHT3 interval, which represents ~1/5,000 of the genome. Using a 2:1 threshold, an average of ~4,000 differences from NCBI36/hg18 were identified in each sample. A total of 113 differences from NCBI36/hg18 were present in all six HHT3-affected individuals and absent in all four controls. 60% of novel shared variants were validated by wet lab PCR. After exclusion of variants found in 100 normal chromosomes, and computational predictions of potential function, multiple candidate sequence variants remain.
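The shared-variant filter described above (present in every affected individual, absent from every control) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name, the tuple representation of a variant, and the toy coordinates are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def shared_case_only_variants(
    cases: Dict[str, Set[Variant]],
    controls: Dict[str, Set[Variant]],
) -> Set[Variant]:
    """Return variants present in every case and absent from every control."""
    if not cases:
        return set()
    # Variants carried by all affected individuals.
    shared = set.intersection(*cases.values())
    # Variants seen in at least one control.
    seen_in_controls = set().union(*controls.values())
    return shared - seen_in_controls


# Toy data with made-up coordinates (not real HHT3 variants).
v1 = ("chr5", 1001, "A", "G")
v2 = ("chr5", 2002, "C", "T")
v3 = ("chr5", 3003, "G", "A")

cases = {"P1": {v1, v2, v3}, "P2": {v1, v2}, "P3": {v1, v2, v3}}
controls = {"C1": {v2}, "C2": set()}

candidates = shared_case_only_variants(cases, controls)
```

In this toy example only `v1` survives: `v3` is missing from one case and `v2` appears in a control. Applied to real call sets, the same set logic would correspond to the step that reduced ~4,000 differences per sample to the 113 shared differences reported.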
Conclusions Genomic sequencing that captures intronic sequences yields a challenging number of sequence variants for wet lab validation, even when multiple replicate chromosome strategies are employed.
1. Cole et al. J Med Genet 2005;42(7):577–582.
2. Govani et al. J Angiogen Res 2010;11(2):15.