Fixpoint semantics and optimization of recursive Datalog programs with aggregates*

Published online by Cambridge University Press:  23 August 2017

CARLO ZANIOLO
Affiliation:
University of California, Los Angeles, Los Angeles, CA, USA
MOHAN YANG
Affiliation:
University of California, Los Angeles, Los Angeles, CA, USA
ARIYAM DAS
Affiliation:
University of California, Los Angeles, Los Angeles, CA, USA
ALEXANDER SHKAPSKY
Affiliation:
University of California, Los Angeles, Los Angeles, CA, USA
TYSON CONDIE
Affiliation:
University of California, Los Angeles, Los Angeles, CA, USA
MATTEO INTERLANDI
Affiliation:
Microsoft, Redmond, WA, USA

Abstract

A highly desirable Datalog extension, investigated by many researchers over the last 30 years, is to allow the basic SQL aggregates min, max, count and sum in recursive rules. In this paper, we propose a simple, comprehensive solution that extends the declarative least-fixpoint semantics of Horn Clauses, along with the optimization techniques used in the bottom-up implementation approach adopted by many Datalog systems. We start by identifying a large class of programs of great practical interest in which the use of min or max in recursive rules does not compromise the declarative fixpoint semantics of the programs using those rules. Then, we revisit the monotonic versions of the count and sum aggregates proposed by Mazuran et al. (2013b, The VLDB Journal 22, 4, 471–493), named mcount and msum, respectively. Since mcount, and also msum on positive numbers, are monotonic in the lattice of set-containment, they preserve the fixpoint semantics of Horn Clauses. However, in many applications of practical interest, their use can lead to inefficiencies that can be eliminated by combining them with max, whereby mcount and msum become the standard count and sum. Therefore, the semantics and optimization techniques of Datalog are extended to recursive programs with min, max, count and sum, making possible the advanced applications of superior performance and scalability demonstrated by BigDatalog (Shkapsky et al. 2016. In SIGMOD. ACM, 1135–1149) and Datalog-MC (Yang et al. 2017. The VLDB Journal 26, 2, 229–248).
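To make the idea of using min inside a recursive rule concrete, the following is a minimal Python sketch of a naive bottom-up fixpoint computation for shortest paths. It is an illustration only, not the evaluation strategy of BigDatalog or Datalog-MC. The function name `shortest_paths`, the edge representation, and the Datalog-style rules in the docstring are assumptions introduced here, and the sketch assumes positive integer edge weights so that the iteration terminates on cyclic graphs.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (source, target, positive integer weight)


def shortest_paths(edges: List[Edge]) -> Dict[Tuple[str, str], int]:
    """Naive bottom-up fixpoint with min applied inside the recursion.

    Loosely mirrors these illustrative (hypothetical) rules:
        path(X, Y, min<D>) <- edge(X, Y, D).
        path(X, Z, min<D>) <- path(X, Y, D1), edge(Y, Z, D2), D = D1 + D2.
    Only the current minimum per (X, Z) pair is kept at each step.
    """
    inf = float("inf")
    best: Dict[Tuple[str, str], int] = {}

    # Exit rule: every edge is a candidate path.
    for x, y, d in edges:
        if d < best.get((x, y), inf):
            best[(x, y)] = d

    # Recursive rule: extend known paths by one edge until nothing improves.
    changed = True
    while changed:
        changed = False
        for (x, y), d1 in list(best.items()):  # snapshot, since best is updated
            for src, z, d2 in edges:
                if src != y:
                    continue
                d = d1 + d2
                if d < best.get((x, z), inf):
                    best[(x, z)] = d
                    changed = True
    return best


if __name__ == "__main__":
    graph = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "a", 1)]
    for pair, dist in sorted(shortest_paths(graph).items()):
        print(pair, dist)
```

Keeping only the minimum per pair is what lets the loop stop on a cyclic graph: without the min, ever-longer paths around the cycle would keep producing new facts.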

Type
Regular Papers
Copyright
Copyright © Cambridge University Press 2017 

Footnotes

Work done while at UCLA.

* This work was supported in part by NSF grants IIS-1218471, IIS-1302698 and CNS-1351047 and U54EB020404 awarded by NIH Big Data to Knowledge (BD2K).

References

Aref, M. et al. 2015. Design and implementation of the LogicBlox system. In Proc. of SIGMOD. ACM, 1371–1382.
Arni, F., Ong, K., Tsur, S., Wang, H. and Zaniolo, C. 2003. The deductive database system LDL++. Theory and Practice of Logic Programming 3, 1, 61–94.
Chimenti, D., O'Hare, A. B., Krishnamurthy, R., Tsur, S., West, C. and Zaniolo, C. 1987. An overview of the LDL system. IEEE Data Engineering Bulletin 10, 4, 52–62.
Condie, T., et al. 2017. Advanced Applications by Least-Fixpoint Algorithms Specified using Aggregates in Datalog. Technical Report 170012, UCLA CSD.
Faber, W., Pfeifer, G. and Leone, N. 2011. Semantics and complexity of recursive aggregates in answer set programming. Artificial Intelligence 175, 1, 278–298.
Furfaro, F., Greco, S., Ganguly, S. and Zaniolo, C. 2002. Pushing extrema aggregates to optimize logic queries. Information Systems 27, 5 (July), 321–343.
Ganguly, S., Greco, S. and Zaniolo, C. 1995. Extrema predicates in deductive databases. Journal of Computer and System Sciences 51, 2, 244–259.
Gelfond, M. and Zhang, Y. 2014. Vicious circle principle and logic programs with aggregates. Theory and Practice of Logic Programming 14, 4–5, 587–601.
Greco, S., Zaniolo, C. and Ganguly, S. 1992. Greedy by choice. In Proc. of PODS. ACM, 105–113.
Kemp, D., Meenakshi, K., Balbin, I. and Ramamohanarao, K. 1989. Propagating constraints in recursive deductive databases. In North American Conference on Logic Programming, 981–998.
Kemp, D. B. and Stuckey, P. J. 1991. Semantics of logic programs with aggregates. In Proc. of ISLP, 387–401.
Liu, Y. A., Stoller, S. D., Lin, B. and Gorbovitski, M. 2012. From clarity to efficiency for distributed algorithms. SIGPLAN Notices 47, 10 (October), 395–410.
Mazuran, M., Serra, E. and Zaniolo, C. 2013a. A declarative extension of Horn clauses, and its significance for Datalog and its applications. Theory and Practice of Logic Programming 13, 4–5, 609–623.
Mazuran, M., Serra, E. and Zaniolo, C. 2013b. Extending the power of Datalog recursion. The VLDB Journal 22, 4, 471–493.
Morris, K. A., Ullman, J. D. and Gelder, A. V. 1986. Design overview of the NAIL! system. In Proc. of ICLP, 554–568.
Mumick, I. S., Pirahesh, H. and Ramakrishnan, R. 1990. The magic of duplicates and aggregates. In VLDB. Morgan Kaufmann Publishers Inc., 264–277.
Mumick, I. S. and Shmueli, O. 1995. How expressive is stratified aggregation? Annals of Mathematics and Artificial Intelligence 15, 3–4, 407–435.
Pelov, N., Denecker, M. and Bruynooghe, M. 2007. Well-founded and stable semantics of logic programs with aggregates. Theory and Practice of Logic Programming 7, 3, 301–353.
Przymusinski, T. C. 1988. Perfect model semantics. In Proc. of ICLP/SLP, 1081–1096.
Ramakrishnan, R., Srivastava, D. and Sudarshan, S. 1992. CORAL - control, relations and logic. In Proc. of PVLDB, 238–250.
Ross, K. A. and Sagiv, Y. 1992. Monotonic aggregation in deductive databases. In Proc. of PODS, 114–126.
Seo, J., Park, J., Shin, J. and Lam, M. S. 2013. Distributed socialite: a Datalog-based language for large-scale graph analysis. Proceedings of the VLDB Endowment 6, 14, 1906–1917.
Shkapsky, A., Yang, M., Interlandi, M., Chiu, H., Condie, T. and Zaniolo, C. 2016. Big data analytics with Datalog queries on Spark. In Proc. of SIGMOD. ACM, 1135–1149.
Shkapsky, A., Yang, M. and Zaniolo, C. 2015. Optimizing recursive queries with monotonic aggregates in DeALS. In Proc. of ICDE. IEEE, 867–878.
Shkapsky, A., Zeng, K. and Zaniolo, C. 2013. Graph queries in a next-generation Datalog system. Proceedings of the VLDB Endowment 6, 12, 1258–1261.
Son, T. C. and Pontelli, E. 2007. A constructive semantic characterization of aggregates in answer set programming. Theory and Practice of Logic Programming 7, 3, 355–375.
Srivastava, D. and Ramakrishnan, R. 1992. Pushing constraint selections. In Journal of Logic Programming, 301–315.
Sudarshan, S. and Ramakrishnan, R. 1991. Aggregation and relevance in deductive databases. In Proc. of VLDB, 501–511.
Swift, T. and Warren, D. S. 2010. Tabling with answer subsumption: Implementation, applications and performance. In Proc. of JELIA, 300–312.
Vaghani, J., Ramamohanarao, K., Kemp, D. B., Somogyi, Z., Stuckey, P. J., Leask, T. S. and Harland, J. 1994. The Aditi deductive database system. VLDB Journal 3, 2, 245–288.
van Emden, M. H. and Kowalski, R. A. 1976. The semantics of predicate logic as a programming language. Journal of the ACM 23, 4, 733–742.
Van Gelder, A. 1993. Foundations of aggregation in deductive databases. In Deductive and Object-Oriented Databases. Springer, 13–34.
Wang, J., Balazinska, M. and Halperin, D. 2015. Asynchronous and fault-tolerant recursive Datalog evaluation in shared-nothing engines. Proceedings of the VLDB Endowment 8, 12, 1542–1553.
Yang, M., Shkapsky, A. and Zaniolo, C. 2015. Parallel bottom-up evaluation of logic programs: DeALS on shared-memory multicore machines. In Technical Communications of ICLP.
Yang, M., Shkapsky, A. and Zaniolo, C. 2017. Scaling up the performance of more powerful Datalog systems on multicore machines. The VLDB Journal 26, 2, 229–248.
Yang, M. and Zaniolo, C. 2014. Main memory evaluation of recursive queries on multicore machines. In Proc. of IEEE Big Data, 251–260.
Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R. T., Subrahmanian, V. S. and Zicari, R. 1997. Advanced Database Systems. Morgan Kaufmann.
Zaniolo, C., Yang, M., Das, A. and Interlandi, M. 2016. The magic of pushing extrema into recursion: Simple, powerful Datalog programs. In Proc. of AMW.
Zhou, N.-F., Barták, R. and Dovier, A. 2015. Planning as tabled logic programming. Theory and Practice of Logic Programming 15, 4–5, 543–558.
Zhou, N.-F., Kameya, Y. and Sato, T. 2010. Mode-directed tabling for dynamic programming, machine learning, and constraint solving. In Proc. of ICTAI '10. Washington, DC, USA, 213–218.