Overview on Techniques in Cluster Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 593))

Abstract

Clustering is the unsupervised, semisupervised, and supervised classification of patterns into groups. The clustering problem has been addressed in many contexts and disciplines. Cluster analysis encompasses a range of methods and algorithms for grouping similar objects into categories. In this chapter, we describe a number of these methods and algorithms within a stepwise framework. A typical cluster analysis proceeds through five steps in sequence: pattern representation, choice of the similarity measure, choice of the clustering algorithm, assessment of the output, and representation of the clusters.
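The five steps named in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the toy data, the function names (`standardize`, `euclidean`, `kmeans`, `within_cluster_ss`, `summarize`), the use of Euclidean distance as the similarity measure, k-means as the clustering algorithm, and the within-cluster sum of squares as the assessment criterion. Other choices at each step are equally valid.

```python
import math
import random


def standardize(rows):
    """Step 1, pattern representation: scale each feature to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]


def euclidean(a, b):
    """Step 2, similarity measure: Euclidean distance (one possible choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100, seed=0):
    """Step 3, clustering algorithm: a basic k-means loop (one possible choice)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels
        # Move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def within_cluster_ss(points, labels, centers):
    """Step 4, assessment: within-cluster sum of squared distances (lower is tighter)."""
    return sum(euclidean(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def summarize(labels):
    """Step 5, representation: map each cluster label to the indices of its members."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    return clusters


# Hypothetical toy data: two well-separated groups of three 2-D patterns.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
points = standardize(data)
labels, centers = kmeans(points, k=2)
wss = within_cluster_ss(points, labels, centers)
clusters = summarize(labels)
print(clusters, round(wss, 4))
```

Keeping each step in its own function makes it straightforward to swap in a different similarity measure, algorithm, or validity index without touching the rest of the pipeline.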

Copyright information

© 2010 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Frades, I., Matthiesen, R. (2010). Overview on Techniques in Cluster Analysis. In: Matthiesen, R. (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_5

  • DOI: https://doi.org/10.1007/978-1-60327-194-3_5

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-193-6

  • Online ISBN: 978-1-60327-194-3

  • eBook Packages: Springer Protocols
