Hostname: page-component-586b7cd67f-dsjbd Total loading time: 0 Render date: 2024-11-25T05:01:09.482Z Has data issue: false hasContentIssue false

Generic top-down discrimination for sorting and partitioning in linear time*

Published online by Cambridge University Press:  28 June 2012

FRITZ HENGLEIN*
Affiliation:
Department of Computer Science, University of Copenhagen (DIKU), Copenhagen, Denmark (e-mail: [email protected])
Rights & Permissions [Opens in a new window]

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

We introduce the notion of discrimination as a generalization of both sorting and partitioning, and show that discriminators (discrimination functions) can be defined generically, by structural recursion on representations of ordering and equivalence relations. Discriminators improve the asymptotic performance of generic comparison-based sorting and partitioning, and can be implemented not to expose more information than the underlying ordering, respectively equivalence relation. For a large class of order and equivalence representations, including all standard orders for regular recursive first-order types, the discriminators execute in the worst-case linear time. The generic discriminators can be coded compactly using list comprehensions, with order and equivalence representations specified using Generalized Algebraic Data Types. We give some examples of the uses of discriminators, including the most-significant digit lexicographic sorting, type isomorphism with an associative-commutative operator, and database joins. Source code of discriminators and their applications in Haskell is included. We argue that built-in primitive types, notably pointers (references), should come with efficient discriminators, not just equality tests, since they facilitate the construction of discriminators for abstract types that are both highly efficient and representation-independent.

Type
Articles
Copyright
Copyright © Cambridge University Press 2012

References

Abramsky, S. & Jung, A. (1992) Domain theory. In Handbook of Logic in Computer Science Semantic Structures, Abramsky, S., Gabbay, Dov M. & Maibaum, T. S. E. (eds), vol. 3. New York, NY: Oxford University Press, pp. 1168.Google Scholar
Aho, A., Hopcroft, J. & Ullman, J. (1983) Data Structures and Algorithms. Boston, MA: Addison-Wesley.Google Scholar
Ajtai, M., Komlós, J. & Szemerédi, E. (1983) Sorting in c log n parallel steps. Combinatorica 3, 119.CrossRefGoogle Scholar
Al-Badarneh, A. & El-Aker, F. (2004) Efficient adaptive in-place radix sorting. Informatica 15 (3), 295302.CrossRefGoogle Scholar
Ambus, T. (2004, July) Multiset Discrimination for Internal and External Data Management. M.Phil. thesis, DIKU, University of Copenhagen, Denmark. Available at: http://plan-x.org/projects/msd/msd.pdf.Google Scholar
Andersson, A., Hagerup, T., Nilsson, S. & Raman, R. (1998) Sorting in linear time? J. Comput. Syst. Sci. (JCSS) 57 (1), 7493.CrossRefGoogle Scholar
Andersson, A. & Nilsson, S. (1994) A new efficient radix sort. In Proceedings of the 35th Anniual IEEE Symposium on Foundations of Computer Science (FOCS), Santa Fe, NM, USA. pp. 714721.CrossRefGoogle Scholar
Andersson, A. & Nilsson, S. (1998) Implementing radixsort. J. Exp. Algorithmics 3, 7.CrossRefGoogle Scholar
Batcher, K. E. (1968) Sorting networks and their applications. In Proceedings of AFIPS Spring Joint Computer Conference, vol. 32. Montvale, NJ: AFIPS Press, pp. 307314.Google Scholar
Bentley, J. (1983) Programming pearls: Aha! algorithms. Commun. ACM 26 (9), 623627.CrossRefGoogle Scholar
Bentley, J. (1986) Programming pearls: Little languages. Commun. ACM 29 (8), 711721.CrossRefGoogle Scholar
Cai, J. & Paige, R. (1991) Look ma, no hashing, and no arrays neither. In Proceedings of the 18th Annual ACM Symposium on Principles of Programming Languages (POPL), Orlando, FL, USA, January, pp. 143154.Google Scholar
Cai, J. & Paige, R. (1995) Using multiset discrimination to solve language processing problems without hashing. Theor. Comput. Sci. (TCS) 145 (1–2), 189228.CrossRefGoogle Scholar
Cormen, T. H., Leiserson, C. E., Rivest, R. L. & Stein, C. (2001) Introduction to Algorithms. 2nd ed., the MIT Electrical Engineering and Computer Science Series. ISBN 0-262-03293-7 (MIT Press) and 0-07-013151-1 (McGraw-Hill).Cambridge, MA and New York, NY: MIT Press and McGraw-Hill.Google Scholar
Danvy, O., Henglein, F., Mairson, H. & Pettorossi, A. (eds). (2008) Automatic Program Development – A Tribute to Robert Paige. Netherlands: Springer. ISBN 978-1-4020-6584-2 (Print), 978-1-4020-6585-9 (Online).CrossRefGoogle Scholar
Dean, J. & Ghemawat, S. (2004, December) MapReduce: Simplified data processing on large clusters. In Proceedings of 6th Symposium on Operating Systems Design and Implementation (OSDI), San Francisco, CA, USA, pp. 137150.Google Scholar
Dershowitz, N. & Manna, Z. (1979) Proving termination with multiset orderings. Commun. ACM 22 (8), 465476.CrossRefGoogle Scholar
Franceschini, G., Muthukrishnan, S. & Pǎtraşcu, M. (2007) Radix sorting with no extra space. In Proceedings of 15th European Symposium on Algorithms (esa), Eilat, Israel. Lecture Notes in Computer Science (LNCS), vol. 4698. Berlin, Germany: Springer, pp. 194205.Google Scholar
Fredman, M. L. & Willard, D. E. (1993) Surpassing the information-theoretic bound with fusion trees. J. Comput. Syst. Sci. (JCSS) 47, 424436.CrossRefGoogle Scholar
Gil, Y. & Zibin, Y. (2005) Efficient algorithms for isomorphisms of simple types. Math. Struct. Comput. Sci. (MSCS) 15 (05), 917957.CrossRefGoogle Scholar
Haskell, Glasgow. (2005) The Glasgow Haskell Compiler. Available at: http://www.haskell.org/ghc.Google Scholar
Grust, T., Sakr, S. & Teubner, J. (2004) XQuery on SQL hosts. In Proceedings of the 30th Int'l Conference on Very Large Databases (VLDB 2004), Toronto, Canada, vol. 30, 263 pp.Google Scholar
Han, Y. & Thorup, M. (2002) Integer sorting in o(n expected time and linear space. In Proceedings of the 43rd Annual IEEE Sympositum on Foundations of Computer Science (FOCS). Washington, DC: IEEE Computer Society, pp. 135144.Google Scholar
Henglein, F. (2003, September) Multiset Discrimination. Manuscript (incomplete). Denmark: Department of Computer Science, University of Copenhagen (DIKU).Google Scholar
Henglein, F. (2008) Generic discrimination: Sorting and partitioning unshared data in linear time. Proceeding of the 13th ACM Sigplan International Conference on Functional Programming (ICFP '08), Hook, J. & Thiemann, P. (eds). New York, NY: ACM, pp. 91102. Nominated by ACM SIGPLAN for CACM Research Highlights (available at: http://sigplan.org/CACMPapers.htm).CrossRefGoogle Scholar
Henglein, F. (2009) What is a sorting function? J. Log. Algebr. Program. (JLAP) 78 (5), 381401. Invited submission to special issue on 19th Nordic Workshop on Programming Theory (NWPT).CrossRefGoogle Scholar
Henglein, F. (2010) Optimizing relational algebra operations using discrimination-based joins and lazy products. In Proceedings of ACM Sigplan 2010 Workshop on Partial Evaluation and Program Manipulation. New York, NY: ACM, pp. 7382. Also DIKU TOPPS D-report no. 611.CrossRefGoogle Scholar
Henglein, F. & Larsen, K. F. (2010a) Generic multiset programming for language-integrated querying. In Proceedings of the 6th ACM Sigplan Workshop on Generic Programming (WGP). New York, NY: ACM, pp. 4960.CrossRefGoogle Scholar
Henglein, F. & Larsen, K. (2010b) Generic multiset programming with discrimination-based joins and symbolic Cartesian products. Higher-Order Symb. Comput. (HOSC) 23, 337370. Publication date: November 24, 2011.CrossRefGoogle Scholar
Hinze, R. (2000) Generalizing generalized tries. J. Funct. Program. 10 (4), 327351.CrossRefGoogle Scholar
Hoare, C. A. R. (1961) Algorithm 63: Partition. Commun. ACM 4 (7), 321.Google Scholar
Hudak, P., Peterson, J. & Fasel, J. H. (1999, May) A Gentle Introduction to Haskell, Version 98. Online tutorial. Available at: http://www.haskell.org/tutorial.Google Scholar
Jeuring, J. & Jansson, P. (1996) Polytypic programming. In Advanced Functional Programming. Lecture Notes in Computer Science, vol. 1129. London, UK: Springer-Verlag, pp. 68114.CrossRefGoogle Scholar
Jha, S., Palsberg, J., Zhao, T. & Henglein, F. (2008) Efficient type matching. In: Automatic Program Developmen, Henglein, D., and Pettorossi, M (eds.). Netherlands: Springer, ISBN 978-1-4020-6584-2 (Print), 978-1-4020-6585-9 (Online).Google Scholar
Jouannaud, J. P. & Lescanne, P. (1982) On multiset orderings. Inf. Process. Lett. 25 (2), 5763.CrossRefGoogle Scholar
Knuth, D. (1998) The Art of Computer Programming: Sorting and Searching. 2nd ed., vol. 3. Boston, MA: Addison-Wesley.Google Scholar
Maus, A.. (2002) ARL, a faster in-place, cache-friendly sorting algorithm. Proceedings of the Norwegian Informatics Conference (NIK), Kongsberg, Norway, Tapir Akademisk Forlag. ISBN 82-91116-45-8.Google Scholar
Mehlhorn, K. (1984) Data Structures and Algorithms 1: Sorting and Searching. EATCS Monographs on Theoretical Computer Science, vol. I. Berlin, Germany: Springer-Verlag.CrossRefGoogle Scholar
Paige, R. (1991) Optimal Translation of User Input in Dynamically Typed Languages. (Draft).Google Scholar
Paige, R. (1994) Efficient translation of external input in a dynamically typed language. In Proceedings of 13th World Computer Congress, Pehrson, B. & Simon, I. (eds), vol. 1. North Holland: Elsevier Science B.V. pp. 603608.Google Scholar
Paige, R. & Tarjan, R. E. (1987) Three partition refinement algorithms. Siam J. Comput. 16 (6), 973989.CrossRefGoogle Scholar
Paige, R. & Yang, Z. (1997) High-level reading and data structure compilation. In Proceedings of 24th ACM Sigplan-Sigact Symposia on Principles of Programming Languages (POPL), Paris, France. New York, NY: ACM Press, pp. 456469. Available at: http://www.acm.org.Google Scholar
Peyton Jones, S. (2003) The Haskell 98 language. J. Funct.Program. (JFP) 13 (1), 0146.Google Scholar
Shell, D. L. (1959) A high-speed sorting procedure. Commun. ACM 2 (7), 3032.CrossRefGoogle Scholar
Sinha, R. & Zobel, J. (2003) Efficient Trie-Based Sorting Of Large Sets Of Strings. In Oudshoorn, Michael J. (ed), CRPIT, vol. 16. Sydney, NSW: Australian Computer Society (ACS), pp. 1118.Google Scholar
Strachey, C. (2000) Fundamental concepts in programming languages. Higher-Order Symb. Comput. 13 (1), 1149.CrossRefGoogle Scholar
Tarjan, R. (1983) Data Structures and Network Flow Algorithms. Regional Conference Series in Applied Mathematics, vol. CMBS 44. Philadelphia, PA: SIAM.CrossRefGoogle Scholar
Trinder, P. & Wadler, P. (1988, August) List comprehensions and the relational calculus. In Proceedings of 1988 Glasgow Workshop on Functional Programming, pp. 115–123.Google Scholar
Williams, J. W. J. (1964) Algorithm 232 – Heapsort. Commun. ACM 7 (6), 347348.Google Scholar
Zibin, Y., Gil, J. & Considine, J. (2003) Efficient algorithms for isomorphisms of simple types. In Proceedings of 30th Annual ACM Sigplan-Sigact Symposium on Principles of Programming Languages (POPL), SIGPLAN Notices, vol. 38, no. 1. New York, NY: ACM Press, pp. 160171.CrossRefGoogle Scholar
Submit a response

Discussions

No Discussions have been published for this article.