Published online by Cambridge University Press: 23 August 2017
A very desirable Datalog extension investigated by many researchers in the last 30 years consists in allowing the use of the basic SQL aggregates min, max, count and sum in recursive rules. In this paper, we propose a simple comprehensive solution that extends the declarative least-fixpoint semantics of Horn Clauses, along with the optimization techniques used in the bottom-up implementation approach adopted by many Datalog systems. We start by identifying a large class of programs of great practical interest in which the use of min or max in recursive rules does not compromise the declarative fixpoint semantics of the programs using those rules. Then, we revisit the monotonic versions of count and sum aggregates proposed by Mazuran et al. (2013b, The VLDB Journal 22, 4, 471–493) and named, respectively, mcount and msum. Since mcount, and also msum on positive numbers, are monotonic in the lattice of set-containment, they preserve the fixpoint semantics of Horn Clauses. However, in many applications of practical interest, their use can lead to inefficiencies, that can be eliminated by combining them with max, whereby mcount and msum become the standard count and sum. Therefore, the semantics and optimization techniques of Datalog are extended to recursive programs with min, max, count and sum, making possible the advanced applications of superior performance and scalability demonstrated by BigDatalog (Shkapsky et al. 2016. In SIGMOD. ACM, 1135–1149) and Datalog-MC (Yang et al. 2017. The VLDB Journal 26, 2, 229–248).
Work done while at UCLA.
This work was supported in part by NSF grants IIS-1218471, IIS-1302698 and CNS-1351047 and U54EB020404 awarded by NIH Big Data to Knowledge (BD2K).