Abstract
Clustering is the unsupervised, semisupervised, or supervised classification of patterns into groups. The clustering problem has been addressed in many contexts and disciplines. Cluster analysis encompasses a range of methods and algorithms for grouping similar objects into categories. In this chapter, we describe a number of these methods and algorithms in a stepwise framework. A typical cluster analysis proceeds through the following steps in sequence: pattern representation, the choice of the similarity measure, the choice of the clustering algorithm, the assessment of the output, and the representation of the clusters.
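The five steps can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the chapter: the toy records and their field names `f1` and `f2`, Euclidean distance as the similarity measure, plain k-means with a deterministic start as the clustering algorithm, the within-cluster sum of squares as the assessment, and centroid plus size as the cluster representation.

```python
import math

# Step 1: pattern representation. Hypothetical records are turned into
# numeric feature vectors (tuples).
records = [
    {"f1": 1.0, "f2": 1.0}, {"f1": 1.2, "f2": 0.8}, {"f1": 0.8, "f2": 1.1},
    {"f1": 5.0, "f2": 5.0}, {"f1": 5.2, "f2": 4.9}, {"f1": 4.8, "f2": 5.1},
]
patterns = [(r["f1"], r["f2"]) for r in records]


def euclidean(a, b):
    """Step 2: similarity measure (a distance: smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(patterns, k, max_iter=100):
    """Step 3: clustering algorithm (plain k-means, evenly spaced initial centroids)."""
    n = len(patterns)
    centroids = [patterns[i * n // k] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every pattern to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in patterns
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, lab in zip(patterns, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


def within_cluster_ss(patterns, labels, centroids):
    """Step 4: assessment of the output (lower means tighter clusters)."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(patterns, labels))


def summarize(labels, centroids):
    """Step 5: representation of the clusters as centroid and size."""
    return [{"centroid": c, "size": labels.count(j)} for j, c in enumerate(centroids)]


labels, centroids = kmeans(patterns, k=2)
wss = within_cluster_ss(patterns, labels, centroids)
summary = summarize(labels, centroids)
print(labels, round(wss, 3), summary)
```

Each function maps to one step, so an alternative choice (a correlation-based measure, a hierarchical algorithm, a different validity index) can be swapped in without touching the others.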
© 2010 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Frades, I., Matthiesen, R. (2010). Overview on Techniques in Cluster Analysis. In: Matthiesen, R. (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_5
DOI: https://doi.org/10.1007/978-1-60327-194-3_5
Publisher Name: Humana Press
Print ISBN: 978-1-60327-193-6
Online ISBN: 978-1-60327-194-3
eBook Packages: Springer Protocols