A new method for autozygosity mapping using single nucleotide polymorphisms (SNPs) and ExcludeAR

C G Woods; E M Valente; J Bond; E Roberts

doi:10.1136/jmg.2003.016873

Electronic letters

C G Woods¹, E M Valente², J Bond¹, E Roberts¹

¹Molecular Medicine Unit, St James’s University Hospital, Leeds, UK
²IRCCS CSS, San Giovanni Rotondo and CSS Mendel, Rome, Italy

Correspondence to: C G Woods, Molecular Medicine Unit, Clinical Sciences Building, St James’s University Hospital, Beckett Street, Leeds LS9 7TF, UK; msjcgw@leeds.ac.uk

SNP, single nucleotide polymorphism

The development of a silicon chip, such as the Affymetrix 10K Xba 131, bearing sufficient oligonucleotides to analyse 10 913 single nucleotide polymorphisms (SNPs) presents a new method for seeking autosomal recessive loci.¹ This letter describes a practical strategy to analyse the data output of such an “SNP-chip” for this purpose.

Autozygosity mapping, first suggested by Lander and Botstein, is the method of choice for the discovery of autosomal recessive gene loci.² The methodology seeks homozygous regions in consanguineous families. The greater the number of affected individuals who share a homozygous region, and the larger that region, the more likely it is to harbour the mutation that causes the disease. Mueller and Bishop modelled the use of a single multi-affected family and suggested that this was the most efficient strategy to determine a disease locus, particularly given the complexities of genetic heterogeneity.³ Autozygosity mapping became practical with the discovery of multiple highly polymorphic microsatellite repeat markers spread throughout the genome.⁴ Most researchers currently use optimised panels of approximately 400 markers for an initial genome-wide screen for linkage, giving 10–12 cM coverage of the autosomal genome; this process has led to the discovery of many recessive loci.⁵

The currently available SNP-chip detects SNPs spread throughout the genome (with the exception of the Y chromosome) and is analysed following a single hybridisation reaction with one individual’s genomic DNA. The results are produced as a simple spreadsheet of the SNP allele calls. Each SNP has far less power to detect a homozygous chromosomal segment than a microsatellite marker. What suggested the potential use of SNPs in autozygosity mapping is both their number (10 913 SNPs are equivalent to a 3–4 cM microsatellite marker map⁶) and their ability to detect a heterozygous region, and hence exclude linkage. An average microsatellite marker has a 70% chance of detecting a heterozygous region, but the approximately 30 SNPs within the same region have a >99% chance of detecting heterozygosity. (The chance that at least one of the 30 SNPs will be heterozygous is 1 − 0.7³⁰, which is >0.999, and on average nine of the 30 SNPs will be heterozygous.)
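The arithmetic in the parenthesis above can be checked with a few lines of Python. This is only a sketch: it assumes that each SNP is homozygous with probability 0.7 (the figure given in Appendix 2) and that the 30 SNPs behave independently, which is a simplification. The function name `detection_stats` is ours, not part of ExcludeAR.

```python
def detection_stats(p_homozygous: float = 0.7, n_snps: int = 30) -> tuple[float, float]:
    """Return (chance that at least one SNP is heterozygous, expected heterozygous SNPs).

    Assumes each SNP is homozygous with probability `p_homozygous`,
    independently of its neighbours (a simplifying assumption).
    """
    p_any_het = 1.0 - p_homozygous ** n_snps        # 1 - 0.7^30
    expected_het = n_snps * (1.0 - p_homozygous)    # 30 * 0.3
    return p_any_het, expected_het


p_any_het, expected_het = detection_stats()
print(f"P(at least one heterozygous SNP) = {p_any_het:.6f}")
print(f"Expected number of heterozygous SNPs = {expected_het:.1f}")
```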

We have designed the following method to adapt Affymetrix SNP-chip output for autozygosity mapping.

1. Only affected individuals from a multi-affected pedigree are analysed; in general the minimum number of samples needed is four from a single sibship or three from two or more sibships. Parents and unaffected siblings are not analysed.
2. The primary results from each individual’s SNP-chip hybridisation are produced as a simple Excel spreadsheet. For each individual the results are sorted by chromosome and genetic distance using Excel’s “data sort” command, and the column of sorted SNP allele calls is copied.
3. The sorted data are processed using ExcludeAR, a freeware spreadsheet we created for this purpose (see Appendix 1).
4. There are four versions of ExcludeAR, for the interpretation of data from one (AR1), two (AR2), three (AR3), and four (AR4) affected individuals. The sorted primary SNP allele data from step 2 are pasted into ExcludeAR1 cell F:19 for one individual; into ExcludeAR2 cells F:19 and G:19 for two individuals; into ExcludeAR3 cells F:19, G:19, and H:19 for three individuals; or into ExcludeAR4 cells F:19, G:19, H:19, and I:19 for four individuals.
5. ExcludeAR first detects runs of consecutive homozygous SNP allele calls that are identical in all of the affected individuals analysed. It then determines whether each run is statistically significant (Table 1 and Appendix 2 explain how we derived these data). For instance, for a pair of individuals, say consanguineous cousins, a run of 12 or more SNPs homozygous in both would be expected to occur by chance, rather than through identity by descent, only once in 1000 analyses.
6. ExcludeAR lists the 10 largest homozygous SNP runs by genetic size. For each result the following are given: genetic size, chromosome, genetic location on the chromosome, number of homozygous SNPs in the run, number of “NoCalls” in the run, and whether the result reaches statistical significance. Two graphs are generated: the first shows genetic size versus number of homozygous SNPs for the statistically significant results; the second shows the size of all statistically significant results by chromosome.
7. The autosomal recessive disease gene sought could be located within any of the statistically significant homozygous segments detected.
8. Table 1 gives an estimate of the minimum size of a homozygous region that could be detected using this method for different family structures. ExcludeAR also scans for potential homozygous deletions present in all affected individuals analysed (see Table 1 and Appendix 1).
9. Any regions of statistically significant homozygosity may be further analysed by conventional polymorphic microsatellite analysis. The location of the SNPs is given by reference to the Human Genome Browser, and as a DeCode genetic distance, enabling the design or selection of suitable markers.⁷
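The run detection and ranking described in the method above can be sketched in Python. This is not the ExcludeAR spreadsheet itself, only an illustration of the idea under our own assumptions: the names `Snp`, `Run`, `find_runs`, and `largest_runs` are hypothetical, the input is assumed to be already sorted by chromosome and genetic position, and the toy data are made up.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    chrom: str      # chromosome label
    cm: float       # genetic position in cM
    calls: tuple    # one allele call per affected individual, e.g. ("AA", "AA")


class Run(NamedTuple):
    chrom: str
    start_cm: float
    end_cm: float
    n_snps: int

    @property
    def size_cm(self) -> float:
        return self.end_cm - self.start_cm


def is_concordant_homozygous(calls: tuple) -> bool:
    """True if every individual has the same homozygous call (all AA or all BB)."""
    return calls[0] in ("AA", "BB") and all(c == calls[0] for c in calls)


def find_runs(snps: list, min_run: int) -> list:
    """Collect runs of consecutive concordant homozygous SNPs.

    `snps` must already be sorted by chromosome and genetic position.
    """
    runs, current = [], []

    def close_run() -> None:
        if len(current) >= min_run:
            runs.append(Run(current[0].chrom, current[0].cm, current[-1].cm, len(current)))
        current.clear()

    for snp in snps:
        if current and snp.chrom != current[-1].chrom:
            close_run()              # a run cannot span two chromosomes
        if is_concordant_homozygous(snp.calls):
            current.append(snp)
        else:
            close_run()              # heterozygous or discordant SNP ends the run
    close_run()
    return runs


def largest_runs(runs: list, top: int = 10) -> list:
    """Rank runs by genetic size, largest first, and keep the top few."""
    return sorted(runs, key=lambda r: r.size_cm, reverse=True)[:top]


# Toy data for two affected individuals (invented for illustration only).
toy = [
    Snp("1", 1.0, ("AA", "AA")),
    Snp("1", 2.5, ("BB", "BB")),
    Snp("1", 4.0, ("AA", "AA")),
    Snp("1", 5.0, ("AB", "AA")),   # heterozygous in one individual
    Snp("2", 0.5, ("BB", "BB")),
    Snp("2", 0.9, ("BB", "BB")),
    Snp("2", 1.2, ("AA", "BB")),   # homozygous but discordant
]
best = largest_runs(find_runs(toy, min_run=2))
for run in best:
    print(run.chrom, run.start_cm, run.end_cm, run.n_snps, round(run.size_cm, 2))
```

A real analysis would use the minimum run sizes given in Appendix 1 rather than the toy value of 2 used here.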

Table 1

The probability that a run of consecutive, concordant and homozygous SNP allele calls will occur by chance

Key points

Autosomal recessive disease gene loci can be found by analysis of affected members of consanguineous families using autozygosity mapping.
Autozygosity mapping can be performed using SNPs, particularly SNP-chips bearing thousands of SNPs.
We have devised a method to analyse the raw output of an Affymetrix 10K SNP-chip using a freely available spreadsheet, ExcludeAR.
ExcludeAR detects significant regions of homozygosity in one, two, three, or four affected family members.

The approach outlined here is undoubtedly a simplification. For instance, the possibility of genetic interference between neighbouring SNPs is ignored. It should, however, provide a practical method for analysis of SNP-chip output to detect significant homozygous segments, and hence locate recessive gene loci. ExcludeAR is free and available by download from http://leedsdna.info/science/Autozygosity/spreadsheets.htm

APPENDIX 1

The ExcludeAR program was designed using the principle of the first Exclude program written by JH Edwards to detect linkage in autosomal dominant conditions; namely, that after excluding regions of non-linkage any chromosomal regions that remain must contain the locus sought. This is achieved in ExcludeAR by the detection of heterozygous, or homozygous but discordant, results for each consecutive SNP. The SNPs that remain are homozygous and concordant, and when consecutive can be summed. The genetic distance from the start of the run to the end can then be calculated. The minimum number of consecutive and concordant homozygous SNPs detected is set at 10 for analysis of one individual, but is reduced for versions of ExcludeAR analysing data from more than one person (see below). The results are ranked by genetic distance. The 10 largest regions of SNP homozygosity are shown together with the chromosome and number of SNPs involved. SNPs for which there is no result (“NoCall”) are scored permissively: as AA if no individual has a result, or otherwise as the call of another individual being analysed, so that “NoCall”/AB would be scored AB/AB. This permissive approach may lead to an overestimation of the number of homozygous SNPs in a run, but as >92% of SNPs are called per analysis the effect should be small. Furthermore, when the spreadsheet assesses the significance of a result it does so for the number of homozygous SNPs minus the number of “NoCalls”.

The ExcludeAR program is provided in four versions, for the data from one, two, three, and four individuals. ExcludeAR2, for the analysis of two individuals, is set to detect a minimum run size of 12 SNPs; ExcludeAR3 a minimum of seven SNPs; and ExcludeAR4 a minimum of four SNPs (see Tables 1 and 2). ExcludeAR will also alert the user to the possibility of homozygous deletions, detected as runs of “NoCall” in all individuals analysed. Graphs illustrate the major findings by SNP number and genetic size, and by genetic size on each chromosome.
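The “NoCall” handling and minimum run sizes described in this appendix might be expressed as follows. This is a sketch of our reading of the text, not the spreadsheet’s actual formulae: the function names are hypothetical, and filling a “NoCall” with the first available call from another individual is our assumption about how “scored as any other individual” is resolved when more than one other call exists.

```python
NO_CALL = "NoCall"

# Minimum run sizes quoted in the text, keyed by number of individuals analysed.
MIN_RUN = {1: 10, 2: 12, 3: 7, 4: 4}


def score_no_calls(calls: tuple) -> tuple:
    """Permissively replace "NoCall" entries for one SNP.

    If nobody has a result the SNP is scored AA; otherwise each NoCall takes
    the first available call from another individual (our assumption).
    """
    called = [c for c in calls if c != NO_CALL]
    fill = called[0] if called else "AA"
    return tuple(fill if c == NO_CALL else c for c in calls)


def effective_run_length(n_homozygous: int, n_no_calls: int) -> int:
    """Run length used when assessing significance: homozygous SNPs minus NoCalls."""
    return n_homozygous - n_no_calls


def all_no_call(calls: tuple) -> bool:
    """True if no individual has a result; runs of such SNPs hint at a homozygous deletion."""
    return all(c == NO_CALL for c in calls)


print(score_no_calls((NO_CALL, "AB")))       # NoCall paired with AB
print(score_no_calls((NO_CALL, NO_CALL)))    # nobody has a result
print(effective_run_length(14, 3), MIN_RUN[2])
```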

Table 2

The probability that consecutive SNPs would be homozygous and concordant by chance in one, two, three and four siblings

APPENDIX 2

The probability that a number of consecutive SNPs would be concordant and homozygous by chance was assessed for a singleton; two, three, and four siblings; a non-sibling pair; a sib pair and a non-sibling; three non-siblings; and four non-siblings. The results are summarised in Table 1 and shown in full in Tables 2 and 3. They were generated by taking the first individual’s chance of being homozygous for an SNP, which is 0.70 (0.35 for AA and 0.35 for BB; our data from six consanguineous northern Pakistani individuals), multiplying it by [10 913 − (n − 1)] (where 10 913 is the total number of autosomal SNPs on the SNP-chip and n is the number of SNPs in the run being analysed), and multiplying again by the chance that the other individuals would be concordantly homozygous. This result for one SNP is then multiplied by the result for the next consecutive SNP also being homozygous, and so on iteratively, until a probability of <1 in 1000 is reached. The probability of 1 in 1000 was chosen because it corresponds to a LOD score of 3, which is regarded as significant when seeking conventional linkage.
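The iterative calculation can be sketched in Python for the non-sibling case. This is our reading of the description above, with one loudly stated assumption: each additional unrelated individual is taken to share the first individual’s homozygous genotype with probability 0.35 (the AA or BB frequency given in the text). Sibling configurations need a different concordance value, which is not derived here, so this sketch is not guaranteed to reproduce the published tables. All names are hypothetical.

```python
TOTAL_SNPS = 10913              # autosomal SNPs on the chip (from the text)
P_HOM_FIRST = 0.70              # first individual's chance of being homozygous (from the text)
P_CONCORDANT_UNRELATED = 0.35   # ASSUMPTION: chance an unrelated individual matches AA or BB


def chance_probability(n: int, n_individuals: int) -> float:
    """Chance of a run of n concordant homozygous SNPs arising somewhere on the chip."""
    per_snp = P_HOM_FIRST * P_CONCORDANT_UNRELATED ** (n_individuals - 1)
    positions = TOTAL_SNPS - (n - 1)    # number of places a run of length n could start
    return per_snp ** n * positions


def smallest_significant_run(n_individuals: int, alpha: float = 1e-3) -> int:
    """Smallest run length whose chance probability drops below alpha (1 in 1000)."""
    n = 1
    while chance_probability(n, n_individuals) >= alpha:
        n += 1
    return n


for k in (2, 3):
    print(k, "unrelated individuals:", smallest_significant_run(k), "SNPs")
```

Under these assumptions a pair of non-siblings gives a threshold of 12 SNPs and three non-siblings a threshold of 7, in line with the minimum run sizes quoted for ExcludeAR2 and ExcludeAR3.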

Table 3

The probability that consecutive SNPs would be homozygous and concordant by chance in families with two to four individuals in different sibships

The probability that a number of consecutive SNPs would be concordant, homozygous and identical by chance is given for the following: a singleton, two, three and four siblings and a non-sibling pair, a sib pair and non-sibling, three non-siblings, and four non-siblings. In each family situation the smallest number of consecutive homozygous SNPs to occur less commonly than 1 in 1000 by chance is shown in bold.

Acknowledgments

We thank Andrew Dearlove and Jo McBride at the MRC Geneservice for performing the SNP-chip analysis that led to this letter and Graham Taylor, Chris Inglehearn, Carmel Toomes, and Tim Bishop for helpful comments and discussions. We also thank the Wellcome Trust for funding this research.

REFERENCES

1. Affymetrix Genechip Human Mapping 10K array Xba1 131 details at: http://www.hgmp.mrc.ac.uk/geneservice/DNAservices/MRCg_10Kv1_2.pdf (accessed 14 May 2004).
2. Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science 1987;236:1567–70.
3. Mueller RF, Bishop DT. Autozygosity mapping, complex consanguinity, and autosomal recessive disorders. J Med Genet 1993;30:798–9.
4. Weber JL, May PE. Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet 1989;44:388–96.
5. Sheffield VC, Stone EM, Carmi R. Use of isolated inbred human populations for identification of disease genes. Trends Genet 1998;14:391–6.
6. Sellick GS, Garrett C, Houlston RS. A novel gene for neonatal diabetes maps to chromosome 10p12.1-p13. Diabetes 2003;52:2636–8.
7. Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML, Thorgeirsson TE, Gulcher JR, Stefansson K. A high-resolution recombination map of the human genome. Nat Genet 2002;31:241–7.
8. Collins A, Frezal J, Teague J, Morton NE. A metric map of humans: 23 500 loci in 850 bands. Proc Natl Acad Sci USA 1996;93:14771–5.

Footnotes

Conflict of interest: none declared
