Using evolutionary algorithms to select text features for mining design rationale

Published online by Cambridge University Press: 30 January 2020

Miriam Lester, Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT, USA
Miguel Guerrero, Department of Mathematics and Computer Science, Colorado College, Colorado Springs, CO, USA
Janet Burge*, Department of Mathematics and Computer Science, Colorado College, Colorado Springs, CO, USA

*Author for correspondence: Janet Burge, E-mail: [email protected]

Abstract

At its heart, design is a decision-making process. These decisions, and the reasons for making them, comprise the design rationale (DR) for the designed artifact. When available, DR provides a comprehensive record of the reasoning behind the decisions made during design. Unfortunately, although this information is potentially quite valuable, it is usually not captured explicitly; instead, it is often buried in other design and development artifacts. In this paper, we study how to identify rationale in text documents, specifically software bug reports and design discussion transcripts. The method we examined was statistical text mining, in which a model uses document features to classify sentences. Because the model's predictions depend on those features, choosing the ones most likely to be good predictors is important. We studied two evolutionary algorithms for optimizing feature selection: ant colony optimization and genetic algorithms. We found that, for many types of rationale, models built with an optimized feature set outperformed those built using all of the document features.
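The abstract does not specify the classifier, the fitness measure, or the operator settings, so the following is only a minimal sketch of wrapper-style feature selection, assuming each candidate feature set is a boolean mask and that the caller supplies a fitness function (in practice, something like the cross-validated score of a classifier trained on the selected features). The names `genetic_select`, `ant_colony_select`, `toy_fitness`, `INFORMATIVE`, and every parameter value are hypothetical. The ant colony search shown is a simplified binary variant (one "include" and one "exclude" pheromone trail per feature), not necessarily the variant studied in the paper. Both searches start from the all-features mask, so on the fitness used they never return a set that scores worse than that baseline.

```python
import random
from typing import Callable, List

Mask = List[bool]  # mask[i] is True when feature i is given to the classifier
Fitness = Callable[[Mask], float]


def genetic_select(n_features: int, fitness: Fitness, pop_size: int = 30,
                   generations: int = 40, mutation_rate: float = 0.05,
                   seed: int = 0) -> Mask:
    """Bit-string GA: tournament selection, one-point crossover, bit-flip
    mutation and elitism. Assumes n_features >= 2 and pop_size >= 2."""
    rng = random.Random(seed)
    # Seed the population with the all-features baseline plus random masks.
    pop = [[True] * n_features]
    pop += [[rng.random() < 0.5 for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)

    def tournament() -> Mask:
        # Binary tournament over the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


def ant_colony_select(n_features: int, fitness: Fitness, n_ants: int = 20,
                      iterations: int = 40, evaporation: float = 0.1,
                      deposit: float = 1.0, seed: int = 0) -> Mask:
    """Simplified binary ACO: each ant includes feature i with probability
    tau_in[i] / (tau_in[i] + tau_out[i])."""
    rng = random.Random(seed)
    tau_in = [1.0] * n_features   # pheromone for "use feature i"
    tau_out = [1.0] * n_features  # pheromone for "skip feature i"
    best: Mask = [True] * n_features  # start from the all-features baseline
    best_fit = fitness(best)
    for _ in range(iterations):
        for _ in range(n_ants):
            ant = [rng.random() < tau_in[i] / (tau_in[i] + tau_out[i])
                   for i in range(n_features)]
            ant_fit = fitness(ant)
            if ant_fit > best_fit:
                best, best_fit = ant, ant_fit
        # Evaporate every trail, then reinforce the best-so-far choices.
        for i in range(n_features):
            tau_in[i] *= 1.0 - evaporation
            tau_out[i] *= 1.0 - evaporation
            if best[i]:
                tau_in[i] += deposit
            else:
                tau_out[i] += deposit
    return best


# Hypothetical stand-in for "train a classifier on the selected features and
# score it": features 1, 3, 4 and 7 are pretend-informative, the rest are noise.
INFORMATIVE = {1, 3, 4, 7}


def toy_fitness(mask: Mask) -> float:
    gain = sum(1.0 for i, on in enumerate(mask) if on and i in INFORMATIVE)
    noise = sum(0.5 for i, on in enumerate(mask) if on and i not in INFORMATIVE)
    return gain - noise


if __name__ == "__main__":
    baseline = [True] * 10
    for name, select in (("GA", genetic_select), ("ACO", ant_colony_select)):
        mask = select(10, toy_fitness)
        chosen = [i for i, on in enumerate(mask) if on]
        print(name, chosen, toy_fitness(mask),
              "vs all features:", toy_fitness(baseline))
```

Because `fitness` is a parameter, a real experiment would swap `toy_fitness` for a function that trains and evaluates a sentence classifier on the masked features; both search procedures stay unchanged.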

Type: Research Article
Copyright © Cambridge University Press 2020
