Clustering Qualitative Data Based on Binary Equivalence Relations: Neighborhood Search Heuristics for the Clique Partitioning Problem

Michael J. Brusco; Hans-Friedrich Köhn

doi:10.1007/s11336-009-9126-z

Published online by Cambridge University Press: 01 January 2025

Michael J. Brusco*: Florida State University
Hans-Friedrich Köhn: University of Missouri-Columbia
*: Requests for reprints should be sent to Michael J. Brusco, Department of Marketing, College of Business, Florida State University, Tallahassee, FL 32306-1110, USA. E-mail: [email protected]

Abstract

The clique partitioning problem (CPP) requires the establishment of an equivalence relation for the vertices of a graph such that the sum of the edge costs associated with the relation is minimized. The CPP has important applications for the social sciences because it provides a framework for clustering objects measured on a collection of nominal or ordinal attributes. In such instances, the CPP incorporates edge costs obtained from an aggregation of binary equivalence relations among the attributes. We review existing theory and methods for the CPP and propose two versions of a new neighborhood search algorithm for efficient solution. The first version (NS-R) uses a relocation algorithm in the search for improved solutions, whereas the second (NS-TS) uses an embedded tabu search routine. The new algorithms are compared to simulated annealing (SA) and tabu search (TS) algorithms from the CPP literature. Although the heuristics yielded comparable results for some test problems, the neighborhood search algorithms generally yielded the best performances for large and difficult instances of the CPP.
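The following is a minimal sketch of the kind of computation the abstract describes; it is not the authors' NS-R or NS-TS algorithm. It assumes a common CPP convention in which the cost of the edge between two objects is the number of attributes on which they disagree minus the number on which they agree, and it uses a plain single-object relocation pass as the improvement step. The function names (`edge_costs`, `partition_cost`, `relocate`) and the toy data are hypothetical.

```python
from itertools import combinations


def edge_costs(data):
    """Assumed cost: (# attributes where i and j disagree) - (# where they agree)."""
    n = len(data)
    cost = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        agree = sum(a == b for a, b in zip(data[i], data[j]))
        cost[i][j] = cost[j][i] = (len(data[i]) - agree) - agree
    return cost


def partition_cost(cost, labels):
    """Sum of edge costs over all pairs placed in the same cluster."""
    return sum(cost[i][j]
               for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])


def relocate(cost, labels):
    """Move one object at a time to its best cluster until no move lowers the cost."""
    labels = list(labels)
    n = len(labels)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            # Total cost of linking object i to each existing cluster.
            link = {}
            for j in range(n):
                if j != i:
                    link[labels[j]] = link.get(labels[j], 0) + cost[i][j]
            current = link.get(labels[i], 0)   # 0 if i is alone in its cluster
            link[max(labels) + 1] = 0          # option: open a new singleton cluster
            best = min(link, key=link.get)
            if link[best] < current:           # accept strictly improving moves only
                labels[i] = best
                improved = True
    return labels


# Toy data: four objects measured on three nominal attributes (illustrative only).
data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "q"), ("b", "y", "p")]
cost = edge_costs(data)
labels = relocate(cost, list(range(len(data))))  # start from all singletons
print(labels, partition_cost(cost, labels))
```

On this toy data the relocation pass groups the first two and the last two objects, for a total cost of -2. A relocation pass of this kind only reaches a local optimum, which is the motivation for embedding it in a broader search such as the neighborhood search, tabu search, or simulated annealing methods named in the abstract.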

Keywords

equivalence relation, clique partitioning, clustering, heuristics, tabu search, simulated annealing, neighborhood search

Type: Theory and Methods
Information: Psychometrika, Volume 74, Issue 4, December 2009, pp. 685–703

DOI: https://doi.org/10.1007/s11336-009-9126-z
Copyright: © 2009 The Psychometric Society
