1 Introduction
Model-based reasoning is one of the most prominent areas of research in artificial intelligence (AI). In model-based AI approaches, solvers accept input instances written in a given logical language and automatically compute their solutions (Geffner Reference Geffner2018).
A pillar of model-based AI is the decoupling between the knowledge model and the reasoning side, which is usually referred to as domain-independent reasoning. This supports the use of knowledge engineering approaches that separate the modelling part from the reasoning part. The main advantage of such separation is that it is possible to “optimise” one of the two parts without changing the other for obtaining overall advantage in the whole process. We follow this path, further evidencing that this modular approach also supports the use of reformulation and configuration techniques which can automatically re-formulate, re-represent or tune the knowledge model, while keeping the same input language, in order to increase the efficiency of a general solver (see, e.g. (Vallati et al. Reference Vallati, Hutter, Chrpa and McCluskey2015) still for the case of automated planning, where reformulation techniques have been widely applied). The idea is to make these techniques to some degree independent of domain knowledge and solver (that is, applicable to a range of domains and solvers technology), and use them to form a wrapper around a solver, improving its overall performance for the problem to which it is applied.
In this paper, we investigate how the configuration of knowledge models, that is, the order in which elements are listed in the considered model, can affect the performance of general automated solvers in the wider field of logic programming. In particular, we focus on two areas that can be considered to end of the spectrum: Propositional Satisfiability (SAT) and Answer Set Programming (ASP) (Baral Reference Baral2003; Gelfond and Lifschitz Reference Gelfond and Lifschitz1988; Gelfond and Lifschitz Reference Gelfond and Lifschitz1991; Brewka et al. Reference Brewka, Eiter and Truszczynski2011). In both areas, configuration has been traditionally exploited to modify the behaviour of solvers to improve performance on a considered (class of) instance(s) (Eggensperger et al. Reference Eggensperger, Lindauer and Hutter2019). With regard to the knowledge models characteristics, SAT, particularly in its CNF connotation, has a limited expressivity in terms of the syntax of the models. On the other hand, ASP has instead a great level of expressivity, having a rich syntax (Calimeri et al. Reference Calimeri, Dodaro, Fuscà, Perri and Zangari2020) aiming for better readability.
In fact, the two approaches, while sharing similar aspects, for example (i) presence of a somehow similar input format for the propositional part, (ii) the reasoning part of state-of-the-art SAT and ASP solvers employ variations of the CDCL algorithm (Mitchell Reference Mitchell2005; Gebser et al. Reference Gebser, Kaufmann and Schaub2012), and (iii) the existence of linear techniques for rewriting ASP programs falling in a certain syntactic class (cf. tight, (Erdem and Lifschitz Reference Erdem and Lifschitz2003)) and solvers that exploit such property to use SAT as ASP solver, are also very different on several respects. Among others: (a) ASP is a first-order language allowing for variables, that are eliminated during grounding, and during grounding some kind of reformulation already happens, (b) ASP rules are somehow more “constrained” than CNF clauses, since they need to preserve the head-body structure, (c) ASP allows for a number of additional constructs, like aggregates, and (d) propositional ASP is strictly more expressive than SAT.
Building on the experience gained in our prior work on SAT (Vallati and Maratea Reference Vallati and Maratea2019), the large experimental analysis presented in this paper provides a collection of results that help to understand the impact of knowledge model configuration on automated solvers from these two subareas of the logic programming field, and to provide valuable support to knowledge engineers. In particular, in this work we:
-
1. define a number of SAT and ASP syntactic features useful for analysing the structure of the formula/program at hand,
-
2. introduce a framework that, leveraging on the introduced features, allows the automated reconfiguration of the input formula/program,
-
3. employ SMAC (Hutter et al. Reference Hutter, Hoos and Leyton-Brown2011) as a configuration tool to reformulate the input formula/program, and
-
4. compare the performance of state-of-the-art SAT and ASP solvers on basic and reformulated formulae/programs coming from well-known benchmark domains taken for respective competitions (see. e.g. the reports of the last competitions available (Heule et al. Reference Heule, Järvisalo and Suda2019; Gebser et al. Reference Gebser, Maratea and Ricca2020)).
Results show that SAT solvers can greatly benefit from such reformulation, being able to solve a consistent number of additional instances or in shorter time, and that same happens to ASP solvers, to a lesser degrees, despite the limitations (a) and (b), and the wider degrees of parameters to analyse, cf. (c).
This paper is organised as follows. First, in Section 2, we present needed preliminaries about SAT and ASP, their input languages, and the configuration techniques we are going to exploit in the paper. Then, Section 3 is devoted to the configuration of SAT formulae and ASP programs, by defining the syntactic features we are going to employ for reformulation. Further, Section 4 presents the results of our experimental analysis on both SAT and ASP domains. The paper ends in Sections 5 and 6 with the analysis of related literature and conclusions, respectively.
2 Preliminaries
This section provides the essential background with regard to SAT and ASP fields, and with regard to automated configuration techniques.
2.1 SAT formulae and answer set programming
We define (ground) disjunctive ASP programs and SAT formulae so as to underline similarities, in order to make it easier in later sections to compare the presented techniques.
Syntax.
Let $\mathcal{A}$ be a propositional signature. An element $p \in \mathcal{A}$ is called atom or positive literal. The negation of an atom p, in symbols $\lnot p$ , is called negative literal. Given a literal l, we define $\overline{l}=p$ , if $l=p$ and $\overline{l}=\neg p$ , if $l=p$ for some $p\in\mathcal{A}$ . Given a set of literals M, we denote by $M^+$ the set of positive literals of M, by $M^-$ the set of negative literals of M, and by $\overline{M}$ the set $\{\overline{l}\ :\ l\in M\}$ . A clause is a finite set of literals (seen as a disjunction), and a SAT formula is a finite set of clauses (seen as a conjunction).
Example 1 Let $\varphi_{run}$ be the following SAT formula:
where $c_1, c_2, c_3$ are clauses. $\lhd$
An aggregate atom is of the form:
where $n \geq 1$ , $l_1,\ldots,l_n$ are distinct literals, and $b,w_1,\ldots,w_n$ are positive integers. For an atom p of the form (1), $\mathit{elem}(p) := \{(w_i,l_i) | i \in [1..n]\}$ , $\mathit{lits}(p) := \{l | (w, l) \in \mathit{elem}(p) \}$ , and $\mathit{bound}(p) := b$ . Moreover, ${\rm {\small COUNT}}\{l_1, \ldots, l_n\} \geq b$ denotes a shortcut for ${\rm {\small SUM}}\{w_1: l_1, w_2: l_2, \ldots, w_n: l_n\} \geq b$ where $w_1 = w_2 = \ldots = w_n = 1$ .
An ASP program $\Pi$ is a finite set of rules of the following form:
where $n > 0$ and $n \geq m$ , $p_1,\ldots,p_m$ are atoms, $p_{m+1}, \ldots, p_n$ are atoms or aggregate atoms. For a rule r of the form (2.1), let H(r) denote the set $\{p_1,\ldots,p_m\}$ of head atoms, and B(r) denote the set $\{\lnot p_{m+1},\ldots, \lnot p_k, p_{k+1}, \ldots,p_n\}$ of body literals. A rule r of the form (2) is said to be disjunctive if $m \geq 2$ , normal if $m=1$ , and a constraint if $m=0$ . Moreover, a rule r of the form $\{p_1, p_2, \ldots, p_m\} \leftarrow{} \lnot p_{m+1}, \ldots, \lnot p_{k}, p_{k+1}, \ldots, p_n$ is called choice rule and defined here as a shortcut for the following rules: $p_1 \vee p_1' \leftarrow{} B(r),$ $p_2 \vee p_2' \leftarrow{} B(r),$ $\ldots,$ $p_m \vee p_m' \leftarrow{} B(r),$ where $p_1', \ldots, p_m'$ are fresh atoms not appearing in other rules. Note that modern ASP solvers do not usually create any auxiliary atom to handle choice rules. For an expression (SAT formula or ASP program) $\gamma$ , $\mathit{atoms}(\gamma)$ denotes the set of (aggregate) atoms occurring in $\gamma$ .
Example 2 Let $\Pi_\mathit{run}$ be the following program:
Note that $r_1$ and $r_2$ are disjunctive rules, $r_3$ is a normal rule, and ${\rm {\small SUM}}\{1 : p_1, 2 : p_2, 4 : p_4\} \geq 7$ is an aggregate atom. $\lhd$
The dependency graph $G_\Pi$ of $\Pi$ has nodes $\mathit{atoms}(\Pi)$ , and an arc xy, where x and y are (aggregate) atoms, for each rule $r \in \Pi$ such that $x \in H(r)$ and $y \in B(r)$ . An atom is recursive in $\Pi$ if it is involved in a cycle of $G_\Pi$ . In the following, every program $\Pi$ is assumed to have no recursive aggregate atoms. Note that $\Pi_\mathit{run}$ has such a property.
Semantics.
An interpretation I is a set of (aggregate) atoms. Given an interpretation I, relation $\models$ is defined as follows:
-
for an atom p, $I \models p$ if $p \in I$ ; while $I \models \lnot p$ if $p\not\in I$ ;
-
for an aggregate atom p of the form (1), $I \models p$ if $\sum_{(w,l) \in \mathit{elem}(p),\ I \models l}{w} \geq \mathit{bound}(p)$ ; while $I \models\lnot p$ if $\sum_{(w,l) \in \mathit{elem}(p),\ I \models l}{w} < \mathit{bound}(p)$ ;
-
for a clause c, $I \models c$ if $I \models l$ for some $l \in c$ ;
-
for a rule r of the form (2), $I \models B(r)$ if $I \models l$ for all $l \in B(r)$ , $I \models H(r)$ if $I \models p$ for some $p \in H(r)$ , and $I \models r$ if $I \models H(r)$ whenever $I \models B(r)$ ;
-
for a SAT formula $\varphi$ , $I \models \varphi$ if $I \models c$ for all $c \in \varphi$ ;
-
for a program $\Pi$ , $I \models \Pi$ if $I \models r$ for all $r \in \Pi$ .
For an expression (SAT formula or ASP program) $\gamma$ , I is a model of $\gamma$ if $I \models \gamma$ .
Example 3 Consider $\varphi_{run}$ of Example 1 and $\Pi_{run}$ of Example 2, and $I=\{p_4\}$ . $I \models \varphi_{run}$ and $I \models \Pi_{run}$ . $\lhd$
The reduct $\Pi^I$ of a program $\Pi$ with respect to an interpretation I is $\{H(r) \leftarrow{} B(r) \ :\ r \in \Pi, I \models B(r)\}$ (Faber et al. Reference Faber, Pfeifer and Leone2011). A model I is a stable model of a program $\Pi$ if there is no model J of $\Pi^I$ such that $J \subset I$ .
Example 4 Consider $\Pi_{run}$ of Example 2 and $I=\{p_4\}$ . The reduct $\Pi_{run}^{I}$ is equal to $p_1 \vee p_4 \leftarrow{}$ . Thus, I is a stable model. $\lhd$
2.2 Input format
Modern SAT and ASP solvers usually take as input CNF formulae and ASP programs represented by means of a numeric format. Concerning SAT, the numeric format is called DIMACS. Figure 1 shows the representation of the formula $\varphi_{run}$ of Example 1 in the DIMACS format. The first line, starting by p, gives information about the formula: the instance is in CNF, and the numbers of atoms and clauses, respectively, are provided. In the DIMACS format each atom is uniquely identified by a number. After the initial descriptive line, clauses are listed. Each clause is a sequence of distinct non-null numbers ending with 0 on the same line. Positive numbers denote the corresponding positive literals, while negative numbers represent negative literals.
Concerning ASP programs, they are usually represented in the lparse format (Syrjänen Reference Syrjänen2002). As in the DIMACS format, each atom is uniquely identified by a number and each rule is represented by a sequence of numbers. Rules are listed, and each rule starts with an identifier of the rule type, as follows:
-
1 represents normal rules and constraints;
-
2 represents aggregate atoms of the type count;
-
3 represents choice rules;
-
5 represents aggregate atoms of the type sum;
-
8 represents disjunctive rules.
Figure 2 shows the representation of the program $\Pi_{run}$ of Example 2 in the lparse format. In particular, consider line 8 2 4 5 2 1 2 3 representing the disjunctive rule $r_2$ . The first number, 8, is the identifier of the rule type, the number 2 represents $|H(r_2)|$ , and 4 and 5 are the numeric identifiers of atoms $p_2$ and $p_3$ , respectively. Then, 2 and 1 represent $|B(r_2)|$ and $|B(r_2)^-|$ , respectively. Finally, 2 and 3 are the identifiers of the atoms $p_4$ and $p_1$ , respectively. Concerning the aggregate atom appearing in rule $r_3$ , this is represented by the line 5 6 7 3 0 3 5 2 1 2 4, where 5 is the identifier of the rule type, the number 6 represents the numeric identifier of the aggregate atom, 7 is the bound of the aggregate atom, then 3 and 0 represent the number of literals and negative literals in $\mathit{elem}()$ , respectively. Then, literals $p_1$ , $p_2$ , and $p_4$ are listed followed by their corresponding weights. Note that rule $r_3$ is represented by 1 7 1 0 6, where 7 and 6 are the identifiers of $p_5$ and of the aggregate atom, respectively.
2.3 Automated configuration techniques
Many algorithms have parameters that can be adjusted to optimise performance (in terms of, e.g. solution cost, or runtime to solve a set of instances). Formally, this problem can be stated as follows: given a parameterised algorithm with possible configurations $\mathcal{C}$ , a benchmark set $\Pi$ , and a performance metric $m(c,\pi)$ that measures the performance of a configuration $c\in\mathcal{C}$ on an instance $\pi\in\Pi$ (the lower the better), find a configuration $c\in\mathcal{C}$ that minimises m over $\Pi$ , that is, that minimises
The AI community has developed dedicated algorithm configuration systems to tackle this problem Hutter et al. (Reference Hutter, Hoos, Leyton-Brown and Stützle2009); Ansótegui et al. (2009); Yuan et al. (Reference Yuan, Stützle and Birattari2010) In this work, we exploit the sequential model-based algorithm configuration method SMAC (Hutter et al. Reference Hutter, Hoos and Leyton-Brown2011), which represents the state of the art of configuration tools and, differently from other existing tools, can handle continuous parameters. SMAC uses predictive models of algorithm performance (Hutter et al. Reference Hutter, Xu, Hoos and Leyton-Brown2014) to guide its search for good configurations. It uses previously observed $\langle{}$ configuration, performance $\rangle{}$ pairs $\langle{}c, f(c)\rangle{}$ and supervised machine learning (in particular, random forests (Breiman Reference Breiman2001)) to learn a function $\hat{f}:\mathcal{C} \rightarrow \mathbb{R}$ that predicts the performance of arbitrary parameter configurations, and is used to select a promising configuration. Random forests are collections of regression trees, which are similar to decision trees but have real values (here: CPU-time performance) rather than class labels at their leaves. Regression trees are known to perform well for categorical input data. Random forests share this benefit and typically yield more accurate predictions; they also allow to quantify the uncertainty of a prediction. The performance data to fit the predictive models are collected sequentially.
In a nutshell, after an initialisation phase, SMAC iterates the following three steps: (1) use the performance measurements observed so far to fit a random forest model $\hat{f}$ ; (2) use $\hat{f}$ to select a promising configuration $c \in \mathcal{C}$ to evaluate next, trading off exploration of new parts of the configuration space and exploitation of parts of the space known to perform well; and (3) run c on one or more benchmark instances and compare its performance to the best configuration observed so far.
In order to save time in evaluating new configurations, SMAC first evaluates them on a single training instance; additional evaluations are only carried out (using a doubling schedule) if, based on the evaluations to date, the new configuration appears to outperform SMAC’s best known configuration. Once the same number of runs has been evaluated for both configurations, if the new configuration still performs better then SMAC updates its best known configuration accordingly.
SMAC is an anytime algorithm (or interruptible algorithm) that interleaves the exploration of new configurations with additional runs of the current best configuration to yield both better and more confident results over time. As all anytime algorithms, SMAC improves performance over time, and for finite configuration spaces it is guaranteed to converge to the optimal configuration in the limit of infinite time.
SMAC has been used for configuring knowledge models in the fields of AI Planning (Vallati et al. Reference Vallati, Hutter, Chrpa and McCluskey2015; Vallati and Serina Reference Vallati and Serina2018) and Abstract argumentation (Cerutti et al. Reference Cerutti, Vallati and Giacomin2018).
3 Knowledge configuration
In both SAT formulae and ASP programs, clauses, and rules are usually not ordered following a principled approach, but they are ordered according to the way in which the randomised generator has been coded, or following the way in which information from the application domain has been collected, or deliberately shuffled to prevent potential biases. This is also generally true for the order in which literals of a given clause are presented in the formula, or in the program, with some differences among SAT clauses and ASP rules. However, we should consider that rules must preserve the head-body structure, so only rule’s bodies are amenable to configuration; moreover, literals in the positive and negative parts of the body can not be mixed. On the other hand, ASP programs contain further degrees of freedom given that the ASP language allows for a number of additional constructs, like aggregates.
In this section, we focus on the following question: given the set of clauses/rules, and the set of corresponding literals, in which order should they be listed to maximise the performance of a given solver, taking into account for ASP existing constraints above-mentioned and more constructs? The underlying hypothesis is that the order in which clauses, rules, and literals are listed can be tuned to highlight elements that are important for satisfying, or demonstrating the unsatisfability, of the considered instance by the considered solver. To answer the above question, here we explain how we have configured SAT formulae and ASP programs, that is, what features have been considered, and how related scores have been computed. Noteworthy, there is a significant body of work in both SAT and ASP that deal with features selection and computation, and they are outlined in Section 5; they are mainly concerned at instance-level, while our goal is to analyse the structure also at clause/rule level, for ordering among those elements.
3.1 Configuration of SAT formulae
The CNF configuration has to be performed online: as soon as a new formula is provided as input, the formula has to be configured before being presented to the solver. In a nutshell, given a set of parameters that can be used to modify the ordering of some aspect of the CNF formula, and given the value assigned to each parameter, the online configuration is performed by re-ordering clauses and literals accordingly. Notably, the value of each parameter has to be provided and can be identified via an appropriate off-line learning step.
Given the depicted online scenario, we are restricted to information about the CNF that can be quickly gathered and that are computationally cheap to extract. Furthermore, the configuration must consider only general aspects that are common to any CNF. As it is apparent, the use of a computationally expensive configuration of a single CNF, that considers elements that are specific to the given CNF, would nullify the potential performance improvement, by drastically reducing the time available for the solver to find a solution (or to demonstrate unsatisfiability).
In this work, we consider the possibility to order clauses according to the following criteria, denoted as $\mathcal{F}_c$ :)(c)
-
(c1) the number of literals of the clause ( $\mathit{size}$ );
-
(c2) the fact that the clause is binary ( $\mathit{bin}$ );
-
(c3) the fact that the clause is ternary ( $\mathit{ter}$ );
-
(c4) the number of positive literals of the clause ( $\mathit{positive}$ );
-
(c5) the number of negative literals of the clause ( $\mathit{negative}$ );
-
(c6) the fact that the clause is binary, and both literals are negative ( $\mathit{bin\_neg}$ );
-
(c7) the fact that the clause has only one negative literal ( $\mathit{only\_one\_neg}$ ).
Atoms can be listed in clauses according to the following criteria, denoted as $\mathcal{F}_m$ :)
-
(m1) the number of clauses in which the atom appears ( $\mathit{occ}$ );
-
(m2) the average size of the clauses in which the atom is involved ( $\mathit{occ\_avg}$ );
-
(m3) the number of binary clauses in which the atom in involved ( $\mathit{occ\_bin}$ );
-
(m4) the number of ternary clauses in which the atom is involved ( $\mathit{occ\_ter}$ );
-
(m5) the number of times the atom appears in clauses as positive ( $\mathit{occ\_pos}$ );
-
(m6) the number of times the atom appears in clauses as negative ( $\mathit{occ\_neg}$ );
-
(m7) the number of times the atom is involved in clauses where all literals are positive ( $\mathit{occ\_all\_pos}$ );
-
(m8) the number of times the atom is involved in clauses where all literals are negative ( $\mathit{occ\_all\_neg}$ ).
Moreover, we also include two additional categorical selectors, denoted as $\mathcal{F}_s$ :
-
(s1) to enable/disable the ordering of literals in the clauses ( $\mathit{ord\_lit}$ );
-
(s2) to order clauses according to the ordering (direct or inverse) followed by the involved literals ( $\mathit{ord\_cl}$ );
The set of proposed ordering criteria, denoted as $\mathcal{F}= \mathcal{F}_c \cup \mathcal{F}_m \cup \mathcal{F}_s$ , is aimed at being as inclusive as possible, so that different characterising aspects of clauses and atoms can be taken into account, at the same time, for the configuration process.
It is easy to notice that many of the introduced criteria focus on aspects of binary and ternary clauses. This is due to their importance in the search process. For instance, binary clauses are responsible, to a great degree, of unit propagation. There are also criteria that aim at identifying potentially relevant aspects. For instance, criterion (c7) aims at identifying clauses that may represent implication relations between literals.
There are different ways for encoding the identified degrees of freedom in CNFs as parameters. This is due to the fact that orders are not natively supported by general configuration techniques (Hutter et al. Reference Hutter, Hoos and Leyton-Brown2011; Kadioglu et al. Reference Kadioglu, Malitsky, Sellmann and Tierney2010a). Results presented by Vallati et al. (Reference Vallati, Hutter, Chrpa and McCluskey2015) suggest that purely categorical parametrisations are not indicated for the configuration of models, as they tend to fragment the configuration space and to introduce discontinuities. Those combined aspects make the exploration of the configuration space particularly challenging for learning approaches. For this reason, here we generate 7 continuous parameters for configuring the order of clauses and 8 continuous parameters for configuring the order of variables in clauses. Each parameter corresponds to one of the aforementioned criteria, and they have to be combined to generate different possible orderings of clauses and literals in CNFs. Each continuous parameter in $\mathcal{F}_c$ and $\mathcal{F}_m$ has associated a real value in the interval $[-10.0, +10.0]$ which represents (in absolute value) the weight given to the corresponding ordering criterion. Concerning selectors, $\mathit{ord\_lit}$ can assume a Boolean value, 0 or 1, whereas $\mathit{ord\_cl}$ can be 0 if clauses must be ordered using only the features of the literals appearing in the clause, 1 if clauses must be ordered using only the features of the clause, 2 if clauses must be ordered using both the features of the literals and the features of the clauses. Thus, the configuration space is $\mathcal{C} = [-10.0,+10.0]^{15} \times \{0, 1\} \times \{0, 1, 2\}$ . A (total) function $\omega : \mathcal{F} \mapsto [-10.0,+10.0]$ maps parameters in $\mathcal{F}$ to a weight, where $\omega(\mathit{ord\_lit})$ is restricted to be in $\{0,1\}$ and $\omega(\mathit{ord\_cl})$ is restricted to be in $\{0,1,2\}$ , respectively.
The configuration criteria mentioned above can be used to order any CNF. In particular, given a CNF $\varphi$ and a weight function $\omega$ , the corresponding configuration of the formula is obtained as follows. For each atom p occurring in $\varphi$ , an ordering score of p is defined as:
where c is a criterion for configuring literals’ order in the set $\mathcal{F}_m$ (i.e., from (m1) to (m8)), and $value(p, \varphi, c)$ is the numerical value of the corresponding aspect for the atom p. If $\omega(\mathit{ord\_lit}) = 1$ , then, for every clause, the involved literals are ordered (in descending order) following the score $O_{at}$ . Ties are broken following the order in the original CNF configuration. As it is apparent from equation (3), a positive (negative) value of $\omega(c)$ can be used to indicate that the aspect corresponding to the parameter c is important for the SAT solver, and that literals with that aspect should be listed early (late) in the clause to improve performance. If $\omega(\mathit{ord\_lit}) = 0$ , literals follow the order as in the provided initial CNF.
Similarly to what is presented in equation (4) for literals, clauses are ordered according to a corresponding score $O_{cl}(\mathit{cl}, \varphi, \omega)$ , defined as follows:
Example 5 Let us consider again the CNF $\varphi_{run}$ of Example 1 and reported, using the DIMACS format, in Figure 3. Suppose that we are interested in listing clauses according to their length (criterion (c1)) and to the number of involved negative literals (criterion (c5)). Similarly, we are interested in listing the literals of a clause according to the number of clauses in which they appear (criterion (m1)). In this case, we have to set $\omega(\mathit{size}) = 10.0$ , $\omega(\mathit{negative}) = 10.0$ , $\omega(\mathit{occ}) = 10.0$ , $\omega(\mathit{ord\_lit}) = 1$ , and $\omega(\mathit{ord\_cl}) = 1$ , whereas $\omega(c) = 0.0$ for all other criterion c in $\mathcal{F}$ . Then, $O_{\mathit{cl}}(``\texttt{2 3 -1 -4 0}", \varphi_{run}, \omega) = 60.0$ , since it involves 4 literals, and 2 of them are negative, thus $4 \cdot 10.0 + 2 \cdot 10.0 = 60.0$ . According to the same criteria, $O_{\mathit{cl}}(``\texttt{1 -3 0}", \varphi_{run}, \omega) = 30.0$ . In a similar way, but considering the corresponding criterion, the score of literals can be calculated, and literals are then ordered accordingly in each clause. Result is $\varphi_{\mathit{conf}}$ reported in Figure 3. Note that the first line of the considered CNF formula is unmodified, as the DIMACS format require it to be the first, and to present information in a given order. $\lhd$
The way in which the considered ordering criteria are combined, via equations (4) and (5), gives a high degree of freedom for encoding and testing different configurations. Very specific aspects can be prioritised: for instance, it would be possible to present first clauses that are binary, and where both literals are positive, by penalising criterion (c5) and giving a high positive weight to criterion (c2). Furthermore, additional criteria can be added, with no need to modify or update the overall configuration framework.
3.2 Configuration of ASP programs
In this subsection, instead, we turn our attention to the configuration of ASP programs. Similarly to SAT, we generate 23 continuous parameters for configuring the order of rules and aggregates. Each parameter corresponds to a feature that is syntactic and easy to compute, and they have to be combined to generate different possible orderings of rules and aggregates. Each continuous parameter has an associated real value in the interval $[-10.0, +10.0]$ which represents (in absolute value) the weight given to the corresponding ordering criterion. The continuous parameters are detailed in the following:)
-
(k1) occurrences of a literal in heads ( $\mathit{head\_occ}$ )
-
(k2) occurrences of a literal in bodies ( $\mathit{body\_occ}$ )
-
(k3) occurrences of a literal in positive part of bodies ( $\mathit{pos\_body\_occ}$ )
-
(k4) occurrences of a literal in negative part of bodies ( $\mathit{neg\_body\_occ}$ )
-
(k5) occurrences of a literal in bodies of “short” size ( $\mathit{short\_body\_occ}$ )
-
(k6) occurrences of a literal in positive part of bodies of “short” size (short_pos_body_occ)
-
(k7) occurrences of a literal in negative part of bodies of “short” size (short_neg_body_occ)
-
(k8) occurrences of a literal in aggregates ( $\mathit{aggregate\_occ}$ )
-
(k9) constraints ( $\mathit{constraints}$ )
-
(k10) normal rules ( $\mathit{normal}$ )
-
(k11) disjunctive rules ( $\mathit{disjunctive}$ )
-
(k12) choice rules ( $\mathit{choice}$ )
-
(k13) literals in the body ( $\mathit{body}$ )
-
(k14) literals in the positive part of the body ( $\mathit{p\_body}$ )
-
(k15) literals in the negative body of the body ( $\mathit{n\_body}$ )
-
(k16) ratio between positive and negative body literals ( $\mathit{ratio\_pos\_neg}$ )
-
(k17) Horn bodies ( $\mathit{horn}$ )
-
(k18) recursive atoms in heads ( $\mathit{rec\_head}$ )
-
(k19) recursive atoms in bodies ( $\mathit{rec\_body}$ )
-
(k20) binary or ternary rules ( $\mathit{short}$ )
-
(k21) aggregates ( $\mathit{aggregate}$ )
-
(k22) aggregate size ( $\mathit{aggregate\_size}$ )
-
(k23) ratio between aggregate size and bound ( $\mathit{aggregate\_ratio\_bound\_size}$ )
Given the structure of ASP programs, richer than SAT formulae, it is not as straightforward and compact to calculate scores as for SAT formulae; in order to calculate the final score, we have introduced three functions O l , O r , and O a for calculating scores for literals, rules and aggregate atoms, respectively, that take as input a program, an element (a literal, a rule or an aggregate atom, respectively), and a weight function $\omega : \mathcal{F} \mapsto [-10.0, 10.0]$ , where $\mathcal{F} = \{\mathit{head\_occ}, \mathit{body\_occ}, \ldots, \mathit{aggregate\_ratio\_bound\_size}\}$ , that is, it includes all the features reported from (k1) to (k23). The output of the three functions is the score of the element, computed as a sum of individual contributions brought by the features linked to the element. The score of rules and aggregates is later on used to order the rules of ASP programs.
Function O l computes the score of a given literal l by summing up, from line 2 to 9, all single contributions of features (k1)–(k8), by multiplying the number of times l “falls” in the category described by the respective feature to the weight of the feature. As an example, line 4, related to feature $\mathit{pos\_body\_occ}$ , gives a contribution to s obtained by multiplying the number of times literal l occurs in positive bodies of the program $\Pi$ and the weight of the feature. computes the score of a rule r.
Function Or computes the score of a rule r.
Depending of whether r is a constraint, a normal or disjunctive, or a choice rule, one of the lines from 2 to 5 is activated. If r is a choice, an additional factor $t_1$ is considered, which is an arbitrary large value, set to $10^5$ in our experiments, and (from the configuration side) means to put priorities to such rules. Then, lines from 6 to 12 contribute further to the score, as a bonus, for features (k13)–(k20): lines 6, 9, and 10 work similarly as within function O 1, while lines 7, 8, and 11 behave similarly to lines 2–5 in this function for the respective feature. Finally, line 12 employs function O l to compute a score that is later on divided by the number of literals appearing in the rule.
Function O u computes the score for an aggregate atom a. In particular, line 1 takes into account features (k21) and (k22), related to the presence of aggregates and its size, giving a high reward (value $t_2$ set to $10^5$ ) to the presence of aggregates as for choice rules before, line 2 considers the ratio between bound and size of the aggregate, while line 3 has a similar behaviour as of line 12 of the function O r .
Example 6 Consider again the program $\Pi_{run}$ of Example 2 and its lparse representation, reported in Figure 4. Suppose that we are interested in ordering the program by giving a high priority to aggregates and then to give additional priorities to rules according to the atoms that occur in negative bodies. This can be done by leaving all the parameters to the default value $0.0$ , but $\omega(\mathit{aggregate})$ and $\omega(\mathit{p\_body})$ that are both set to 10.0. In particular, the atom with id 2 occurs in the negative body of the second rule of $\Pi_{run}$ , while other atoms do not occur in the negative body. Thus, O t (2, $\Pi_{run}$ , $\omega$ ) returns 10.0, whereas O l (l, $\Pi_{run}$ , $\omega$ ) returns 0.0 for $l \in \{1,3,4,5,6,7\}$ . Then, O r (“8 2 2 3 0 0", $\Pi_{run}$ , $\omega$ ) and O r (“8 2 4 5 2 1 2 3", $\Pi_{run}$ , $\omega$ ) return 5.0 and 2.5, respectively, whereas O r (“1 7 1 0 6", $\Pi_{run}$ , $\omega$ ) returns 0.0, and O a (“5 6 7 3 0 3 5 2 1 2 4", $\Pi_{run}$ , $\omega$ ) returns 100003.33. Result is $\Pi_{\mathit{conf}}$ reported in Figure 4. $\lhd$
4 Experimental analysis
This experimental analysis aims at evaluating the impact of the proposed automated approach for performing the configuration of knowledge models, on state-of-the-art domain-independent solvers’ performance from SAT and ASP.
4.1 Experimental settings
In this work, we use the state-of-the-art SMAC (Hutter et al. Reference Hutter, Hoos and Leyton-Brown2011) configuration approach for identifying a configuration of the knowledge model that aims at improving the number of solved instances and the PAR10 performance of a given solver. PAR10 is the average runtime where unsolved instances count as $10 \times$ cutoff time. PAR10 is a metric commonly exploited in machine learning and algorithm configuration techniques, as it allows to consider coverage and runtime at the same time (Eggensperger et al. Reference Eggensperger, Lindauer and Hutter2019).
For each solver, a benchmark-set specific configuration was generated using SMAC 2.08. A dedicated script, either in Python 2.7 or in C++, is used as a wrapper for extracting information from a knowledge model and, according to the parameters’ values, reconfigure it and provide it as input for the solver.
Experiments, on both SAT and ASP instances, were run on a machine executing Linux Ubuntu 4.4.0-104 and equipped with Intel Xeon 2.50 Ghz processors. Each SMAC configuration process, that is, for each pair $\langle solver, benchmark\ set\rangle$ , has been given a budget of 7 sequential CPU-time days, and run on a dedicated processor.
To compare performance, as mentioned we rely on the number of solved instances, the PAR10, and the IPC score. For a solver $\cal R$ and an instance p to be solved, Score $({\cal R},p)$ is defined as:
where $T^*_{p}$ is the minimum amount of time required by any compared system to solve the instance, and $T_{p}({\cal R})$ denotes the CPU time required by $\cal R$ to solve the instance p. Higher values of the score indicate better performance, where the best performing solver obtains a score equals to 1.
All the executables, benchmarks, instances, and generators used in the experiments are available at https://www.mat.unical.it/~dodaro/research/aspsatconfig.
4.2 Configuration of SAT formulae
We selected 3 SAT solvers, based on their performance in recent SAT competitions and their widespread use: cadical version sc17 (Biere Reference Biere2017), glucose 4.0 (Audemard et al. Reference Audemard, Lagniez, Simon, Järvisalo and Gelder2013), and lingeling version bbc (Biere Reference Biere2017).
In designing this part of the experimental analysis, we followed the Configurable SAT Solver Challenge (CSSC) (Hutter et al. Reference Hutter, Lindauer, Balint, Bayless, Hoos and Leyton-Brown2017). The competition aimed at evaluating to which extent SAT solvers’ performance can be improved by algorithm configuration for solving instances from a given class of benchmarks. In that, the CSSC goals are similar to the goals of this experimental analysis, that is, assessing how performance can be improved via configuration, thus their experimental settings are deemed to be appropriate for our analysis. However, CSSC focused on the configuration of SAT solvers’ behaviour by modifying exposed parameters of solvers. In this work, we do not directly manipulate the behaviour of SAT solvers via exposed parameters, but we focus on the impact that the configuration of a CNF formula can have on solvers.
Following CSSC settings, a cut-off of 5 CPU-time minutes, and a memory limit of 8 GB of RAM, has been set for each solver run on both training and testing instances. This is due to the fact that many solvers have runtime distributions with heavy tails (Gomes et al. Reference Gomes, Selman, Crato and Kautz2000), and that practitioners often use many instances and relatively short runtimes to benchmark solvers for a new application domain (Hutter et al. Reference Hutter, Lindauer, Balint, Bayless, Hoos and Leyton-Brown2017). There is also evidence that rankings of solvers in SAT competitions would remain similar if shorter runtimes are enforced (Hutter et al. Reference Hutter, Hoos and Leyton-Brown2010).
We chose benchmark sets from the CSSC 2014 edition (Hutter et al. Reference Hutter, Lindauer, Balint, Bayless, Hoos and Leyton-Brown2017), and the benchmarks used in the Agile track of the 2016 SAT competition. Footnote 1 These two competitions provide benchmarks that can highlight the importance of configuration (CSSC) even though a different type of configuration than the one considered in this paper and that include instances that have to be solved quickly (Agile). In particular, CSSC benchmarks can allow us to compare the impact of the proposed CNF configuration with regard to the solvers’ configuration.
Selected CSSC 2014 benchmark sets include: Circuit Fuzz (Industrial track), 3cnf, K3 (Random SAT+UNSAT Track), and Queens and Low Autocorrelation Binary Sequence (Crafted track). Footnote 2 Benchmark sets were selected in order to cover most of the tracks considered in CSSC, and by checking that at least 20% of the instances were solvable by considered solvers, when run on the default CNFs. Benchmarks were randomly divided into training and testing instances, aiming at having between 150 and 300 instances for testing purposes, and a similar amount of benchmarks for training. The size of each testing set is shown in Table 1.
Table 1 summarises the results of the selected SAT solvers on the considered benchmark sets. Results are presented in terms of the number of timeouts on testing instances, achieved by solvers run using either the default or the configured CNFs. Indeed, all of the considered solvers benefited from the configuration of the CNFs. Improvements vary according to the benchmark sets: the Agile16 set is, in general, the set where the solvers gained more by the use of configured CNFs. Remarkably, the improvements observed in Table 1 are comparable to those achieved in CSSC 2013 and 2014, that were achieved by configuring the solvers’ behaviour (Hutter et al. Reference Hutter, Lindauer, Balint, Bayless, Hoos and Leyton-Brown2017). In fact, these results may confirm our intuition that the way in which clauses and literals are ordered has an impact on the way in which solvers explore the search space. Listing “important” clauses earlier may lead the solver to tackle complex situations early in the search process, making it then easier to find a solution. In that, it may be argued that a solver’s behaviour can be controlled internally, by modifying its exposed parameters, and externally by ordering the CNF in a suitable way.
Interestingly the overall results (last row of Table 1) indicate that the CNF configuration does not affect all the solvers in a similar way, and that can potentially lead to rank inversions in competitions or comparisons. This is the case of lingeling (on configured formulae) and cadical on “default” formulae. This may suggest that current competitions could benefit by exploiting a solver-specific configuration, in order to mitigate any implicit bias due to the particular CNF configuration exploited. Randomly listing clauses and variables may of course remove some bias, but it can also be the case that different biases are introduced. In that sense, allowing solvers to be provided with a specifically-configured CNF may lead to a better comparison of performance. Finally, it is worth noting that the way in which the CNFs are configured varies significantly between solvers, as well as according to the benchmark set. In other words, there is not a single ordering that allows to maximise the performance of all the SAT solvers at once.
In Tables 2 and 3, the performance of a solver run on the default and configured formulae are compared in terms of IPC score and PAR10. Results indicate that the configuration provides, for most of the benchmark sets, a noticeable improvement.
To shed some light on the most relevant aspects of the SAT formula configuration, we assessed the importance of parameters in the considered configurations using the fANOVA tool (Hutter et al. Reference Hutter, Xu, Hoos and Leyton-Brown2014). We observed that in most of the cases, improvements are mainly due to the effect of the correct configuration of a single criterion, rather than to the interaction of two or more criteria together. In terms of clauses, parameters controlling the weight of criteria (c4) and (c5) are deemed to be the most important: in other words, the number of positive (or negative) literals that are involved in a clause are a very important aspect for the performance of SAT solvers. The solver that can gain the most by ordering the clauses is lingeling. In particular, this solver shows best performance when clauses with a large number of negative literals are listed early.
Parameters related to criteria (m2), (m6), and (m8) have shown to have a significant impact with regard to the literals’ ordering in clauses. For glucose and cadical, criterion (m2), that is, the average size of the clauses in which the literal is involved, is the most important single criterion that has to be correctly configured. However, it is a bit hard to derive some general rules, as their impact on orderings vary significantly with regard to the solver and the benchmark set. Generally speaking, also in the light of the criteria that are most important for clauses, the ordering of literals appears to be the most important in a CNF formulae: this is also because, in many cases, clauses are ordered according to the (separately-calculated) weight of the involved literals. This behaviour can be due to the way in which data structures are generated by solvers: usually literals are the main element, that is also the focus of heuristic search used by SAT solvers. Instead, clauses from the CNF tend to have a less marked importance during the exploration of the search space, as they are related to literals mostly via lists, and are exploited only for checking satisfiability and performing unit propagation. Clauses learnt during the search process are not included in our analysis, as they are not part of the CNF formula–but are generated online by the solver.
Finally, we want to test if there is a single general configuration that improves the performance of a solver on any formula, despite of the benchmark and underlying structure. Therefore, we trained each of the considered solvers on a training set composed by an equal proportion of instances from each of the 6 benchmark sets. As for previous configurations, we gave 5 days of sequential CPU-time for each learning process and obtained configurations have been tested on an independent testing set that includes instances from all the benchmark sets. Results are presented in Table 4.
Results on the independent testing set indicate that this sort of configuration has a very limited impact on solvers’ performance. This seems to confirm our previous intuition that solvers require differently configured formulae according to the underlying structure of the benchmark: it is therefore the case that structurally different sets of instances require a very different configuration. Intuitively, this seems to point to the fact that, in different structures, the characteristics that identify challenging elements to deal with, vary. Solvers, when dealing with different sets of benchmarks, are then sensitive to different aspects of the CNF formulae, that should be appropriately highlighted and configured. On the one hand, this result may be not fully satisfying, as it suggests that there is not a quick way to improve the performance of SAT solvers. On the other hand, the results of the other experiments indicate that, for real-world applications of SAT where instances share some underlying structure, there is the possibility to further improve the SAT solving process by identifying a specific configuration for the solver at hand. As a further observation, we remark that the presented results, achieved on testing set instances, are comparable to those observed on the training set used by SMAC for the configuration process. This confirms the ability of the learned knowledge to generalise on different instances.
4.3 Configuration of ASP programs
We selected 3 ASP solvers, based on their performance in recent competitions, on the different approaches implemented, and for their widespread use: clasp, (Gebser et al. Reference Gebser, Kaufmann and Schaub2012), lp2sat (Janhunen Reference Janhunen2018), and wasp (Alviano et al. Reference Alviano, Dodaro, Leone, Ricca, Calimeri, Ianni and Truszczynski2015).
We chose benchmark sets used in ASP competitions for which either a sufficiently large number of instances or a generator is available. Selected benchmark sets include: Graceful Graphs, Graph Colouring, Hamiltonian, Incremental Scheduling, and Sokoban. Concerning Incremental Scheduling and Sokoban, we use all the instances submitted to the 2015 ASP Competition (Gebser et al. Reference Gebser, Maratea and Ricca2017), whereas for Graceful Graphs, Graph Colouring and Hamiltonian instances were randomly generated. Benchmarks were randomly divided into training and testing instances, aiming at having between 50 and 100 instances for testing purposes, and a 200–500 instances for training. In this setting, we considered a cutoff of 10 CPU-time minutes, and a memory limit of 8 GB of RAM. Table 5 summarises all the ASP constructs that are available for each tested benchmark. Moreover, concerning lp2sat, there are several levels of configurations. In particular, it is possible (i) to configure the ASP input and then to configure the SAT formula, (ii) to configure only the SAT formula, (iii) to configure only the ASP input. We have conducted a preliminary experiment analysis and in the following we report only the results of (iii) since they are the ones that obtained the best performance. Note that no domain contains (stratified) disjunction, so in this restricted setting ASP is as expressive as SAT (i.e., point (d) in the introduction is not leveraged). We are not aware of any publicly available benchmark containing disjunctive rules with a huge number of instances and/or generators, therefore we could not extend our analysis to this kind of programs.
Table 6 summarises the results of the selected solvers on the considered benchmark sets. Results are presented in terms of number of solved testing instances, achieved by solvers run using either the default or the configured ASP program. It is interesting to notice that, also for ASP, solvers are differently affected by the use of the configured knowledge. On the one hand, clasp does not benefit by the configuration in terms of coverage. On the other hand, the configuration has a widespread beneficial impact on the performance of wasp. lp2sat sits between the two sides of the spectrum: the configuration provided a significant improvement to the coverage performance on a single domain, Sokoban.
In Tables 7 and 8, the performance of a solver run on the default and configured programs are compared in terms of IPC score and PAR10, respectively. Here it is possible to observe that the configuration has a beneficial impact on the considered solvers in most of the benchmark domains. When considering the performance of clasp in Hamiltonian, for instance, we observed an average runtime drop from 32 to 26 CPU-time seconds. Similar improvements have been observed also for the other solvers. The presented results indicate that the configuration of ASP programs can improve the runtime performance of ASP solvers.
Finally, we also test if there is a single general configuration that improves the performance of a solver on any ASP program, despite of the benchmark and underlying structure. As in the SAT counterpart, we trained each of the considered solvers on a training set composed by an equal proportion of instances from each of the benchmark sets. It should come as no surprise that the results indicate that this sort of configuration has no significant impact on ASP solvers’ performance. For no one of the considered solvers it has been possible to identify a configuration able to improve the average performance.
Also for ASP solvers, we used the fANOVA tool to identify the most relevant aspects of the configuration process. The lack of a general configuration that allows to improve average performance of the solvers, suggests that the importance of a configuration parameter depends on the benchmark domain. In other words, the same element can be more important in a domain, but almost irrelevant in another, according the the structure of the instances to be solved. Looking at the configured ASP program identified for lp2sat in the Sokoban domain, it appears that features related to occurrences of literals in heads (k1), and to occurrences of literals in bodies are among the most relevant criterion. In particular, the 5 most important parameters identified by the fANOVA tool are (k1), (k23), (k7), (k15), and (k13). By analysing the configurations obtained for wasp it is possible to derive that, for this solver, most of the important criteria are different than those of lp2sat. Considering a domain where the configuration allowed wasp to obtain a significant improvement, Graph Colouring, the 5 most important parameters identified by the fANOVA tool are: (k16), (k1), (k13), (k14), and (k9). Only (k1), that focuses on the importance of the occurrences of a literal in heads, is shared between the configuration of lp2sat and wasp. This suggests that different solvers are more sensitive to different aspects of the ASP program, and therefore require different configurations. To shed some light into the relevant parameters for the same solver across different domains, we analysed the configuration of wasp on the Graceful Graphs domain. In this case, the 5 most important parameters identified by the fANOVA tool are: (k8), (k2), (k9), (k18), (k7). In this case, it is easy to notice that (k1) is not deemed to be relevant for improving the performance of the solver, and only (k9) is relevant for wasp on both Graceful Graphs and Graph Colouring. It is worth reminding that the fANOVA analysis does not provide information on the actual value of the parameters, but only on the impact that a parameter can have on the performance of a solver. On this regards, we observed that also in cases where the same parameter is identified as very important by the fANOVA analysis, its best selected value by SMAC can be different in different domains.
Note that it is difficult to find a high level explanation on why the solvers benefit of this different order, since they present a complex structure, where each (small) change in the order of input might have an impact on several different components. However, in our view, this represents one of the strengths of our approach, since tools for automatic configuration might understand some hidden properties of the instances that are not immediately visible even to developers.
Discussion.
ASP solvers are more complex than SAT solvers, since they have to deal with many additional aspects that are not present in SAT, such as the minimality property of answer sets, and additional constructs, as for example the aggregates. Albeit this additional complexity, we observed that reordering is beneficial for both lp2sat and wasp, while the impact on the performance of clasp is less noticeable. We conducted an additional analysis on the implementation of the solvers and we identified several reasons that can explain the results:
-
The considered solvers do not work directly on the input program, since they use a technique, called Clark’s Completion, that basically produces a propositional SAT formula which is then used to compute the answer sets of the original program. In this context, the impact of the reorder might be mitigated by the additional transformation made by the solvers.
-
Data structures employed by the ASP solvers might deactivate some of the parameters used for the configuration. As an example, wasp uses a dedicated data structure for storing binary clauses which are then checked before other clauses, independently from their order in the input program. clasp extends this special treatment also to ternary clauses. Concerning SAT solvers, as far as we know, only glucose has a similar data structure, whereas lingeling and cadical have an efficient memory management of binary clauses but they do not impose an order among the clauses.
-
Our tool considers the solvers as black boxes and uses their default configurations. Most of them employ parameters that were tuned on instances without specific reordering. Different results might be obtained by employing other configurations or by tuning the heuristics with instances processed by our reordering tool. However, from our perspective, the configuration tool can be already incorporated into grounders/preprocessors as an additional feature, to have a configured instance given to the solvers.
-
Differently from other ASP and SAT solvers, the default configuration of clasp uses the Maximum Occurrence of clauses of Minimum size (MOMS) heuristic to initialize its heuristic parameters. This led to a more uniform behaviour when it is executed on the same instance with different orders. We observed that the behaviour of wasp is much more dependent on the order of the instances as confirmed in our experiment and as also shown in Section 4.4.
4.4 Synthetic experiment
In this section, we report the results of an experimental analysis conducted on a synthetic benchmark. Goal of the experiment is to give an explanation of the different performance between CLASP and WASP, and investigate the different qualitative results achieved in SAT and ASP. As a side effect, we show that it is possible to improve the performance of the ASP solver wasp using a proper ordering of the input program.
In particular, we focused on the following (synthetic) problem:
where rules $r_1$ , $r_2$ and $r_3$ encode the pigeonhole problem with p pigeons and h holes, rules $r_4$ and $r_5$ encode the k-graph colouring problem, and $r_6$ encodes a complete graph with n nodes provided as input to the graph colouring problem. It is possible to observe that $r_1$ , $r_2$ , and $r_3$ admit no stable model when $p > h$ , whereas $r_4$ , $r_5$ , and $r_6$ admit no stable model when $n > k$ . Concerning the pigeonhole problem, it is well known that the performance of CDCL and resolution-based solvers are poor when $p > h$ and p is greater than a given threshold (Biere et al. Reference Biere, Heule, van Maaren and Walsh2009; Haken Reference Haken1985). For instance, clasp terminates after 1.51 s when $h=9$ and $p=10$ , after 17.81 s when $h=10$ and $p=11$ , and it does not terminate within 5 min when $h=11$ and $p=12$ . Similarly, concerning the k-graph colouring problem, large values of k (e.g. $k \geq 10$ ) with $n > k$ are associated with poor performance of the solver. Such properties are important in our case, since we are now able to control the hardness of the instances by properly selecting values of h, p, k, and n. In particular, if the two sub-problems are combined, that is, when the solver is executed on rules from $r_1$ to $r_6$ , then we are able to create hard and easy sub-programs. For instance, if we consider the case with $h=11$ , $p=12$ , $k=5$ , and $n=100$ , then we have that the rules from $r_1$ to $r_6$ admit no stable model, which is hard to prove for rules from $r_1$ to $r_3$ and easy to prove for rules from $r_4$ to $r_6$ . In this case the performance of the solver depends on the sub-program considered at hand. As noted in Section 4, the heuristic of clasp is not dependent on the ordering of the input program and, in this case, it is able to automatically focus on the easy subprogram. On the other hand, we observed that the performance of wasp depends on the processed order of the variables. This behaviour of wasp can be explained by looking at its branching heuristic, which first selects literals with the lowest ids and then it focuses on the sub-problem related to such literals. Clearly, this might lead to poor performance on the programs described above if the hard sub-problem is considered first.
In the following we show that the performance of wasp can be improved by performing an additional step after that the program has been configured, that is, ids of the literals can be sorted according to the value of $O_l$ in descending order. In particular, we report the results of an experimental analysis conducted on instances of the rules $r_1$ – $r_6$ , where $h=10$ , $k=5$ , $p=[20,\ldots,40]$ and $n=[7,\ldots,29]$ . Overall, we considered 483 instances, where 433 were used for the configuration and 50 were used for the testing. Results show that wasp without configuration solves 33 instances out of 50 with a PAR10 equal to 2108.63, whereas wasp after the configuration solves 39 instances out of 50 with a PAR10 equal to 1351.94.
It is important to emphasise that the results are obtained without changing the implementation of the solver. Such changes might be directly included in the grounders (e.g. as additional parameter of the system dlv (Alviano et al. Reference Alviano, Calimeri, Dodaro, Fuscà, Leone, Perri, Ricca, Veltri and Zangari2017)) or in specific tools dedicated to preprocess the input programs.
5 Related work
In both SAT and ASP there have been numerous papers where machine-learning-based configuration techniques based on features computation have been employed. Traditionally, such approaches aimed at modifying the behaviour of the solvers by either configuring their exposed parameters, or by combining different solvers into portfolios. In this work we consider an orthogonal perspective, where we configure the way in which the input knowledge, that is, formula or program, is presented to the solver. This is done with the idea that the way in which instances are formulated and ordered can carry some knowledge about the underlying structure of the problem to be solved. In the following, we present main related literature in SAT and ASP, in two different paragraphs.
SAT.
A significant amount of work in the area has focused on approaches for configuring the exposed parameters of SAT solvers in order to affect their behaviour (Eggensperger et al. Reference Eggensperger, Lindauer and Hutter2019). Well-known examples include the use of ParamILS for configuring SAPS and SPEAR (Hutter et al. Reference Hutter, Hoos, Leyton-Brown and Stützle2009), and of ReACTR for configuring lingeling (Fitzgerald et al. Reference Fitzgerald, Malitsky and O’Sullivan2015). Some approaches also looked into the generation of instance-specific configurations of solvers (Kadioglu et al. Reference Kadioglu, Malitsky, Sellmann and Tierney2010b). This line of work led to the development of dedicated tools, such as SpySMAC (Falkner et al. Reference Falkner, Lindauer and Hutter2015) and CAVE (Biedenkapp et al. Reference Biedenkapp, Marben, Lindauer and Hutter2018), and to the organisation of the dedicated Configurable SAT Solver Challenge (Hutter et al. Reference Hutter, Lindauer, Balint, Bayless, Hoos and Leyton-Brown2017). It also lead to the design and development of solvers, such as SATenstein (KhudaBukhsh et al. Reference KhudaBukhsh, Xu, Hoos and Leyton-Brown2016), that are very modular and natively support the use of configuration to combine all the relevant modules. A large body of works also focused on techniques for automatically configuring portfolios of solvers, such as satzilla (Xu et al. Reference Xu, Hutter, Hoos and Leyton-Brown2008), based on the use of empirical prediction models of the performance of the considered solvers on the instance to be solved (Hutter et al. Reference Hutter, Xu, Hoos and Leyton-Brown2014). Tools for assessing the contribution of different solvers to a portfolio has been introduced (Xu et al. Reference Xu, Hutter, Hoos and Leyton-Brown2012). With regard to portfolio generation, the interested reader is referred to Hurley et al. (Reference Hurley, Kotthoff, Malitsky, Mehta and O’Sullivan2016). The use of configuration techniques to generate portfolios of solvers has also been extensively investigated: HYDRA (Xu et al. Reference Xu, Hoos and Leyton-Brown2010) builds a set of solvers with complementary strengths by iteratively configuring new algorithms; AutoFolio (Lindauer et al. Reference Lindauer, Hoos, Hutter and Schaub2015) uses algorithm selection to optimise the performance of algorithm selection systems by determining the best selection approach and its hyperparameters; finally, in Lindauer et al. (Reference Lindauer, Hoos, Leyton-Brown and Schaub2017) an approach based on algorithm configuration for the automated construction of parallel portfolios has been introduced. The approach we follow relates also to the problem of discovering a backdoor, that is, an ordering that will allow the problem to be solved faster, see, for example Kilby et al. (Reference Kilby, Slaney, Thiébaux, Walsh, Veloso and Kambhampati2005). However, it has to be noted that this is not a characteristic of our approach to configuration, but common to many approaches mentioned above.
ASP.
Inspired by the solver satzilla in the area of SAT, the claspfolio system (Gebser et al. Reference Gebser, Kaminski, Kaufmann, Schaub, Schneider and Ziller2011; Hoos et al. Reference Hoos, Lindauer and Schaub2014) uses support vector regression to learn scoring functions approximating the performance of several clasp variants in a training phase. Given an instance, claspfolio then extracts features and evaluates such functions in order to pick the most promising clasp variant for solving the instance. This algorithm selection approach was particularly successful in the Third ASP Competition (Calimeri et al. Reference Calimeri, Ianni and Ricca2014), held in 2011, where claspfolio won the first place in the NP category and the second place overall (without participating in the BeyondNP category). The measp system (Maratea et al. Reference Maratea, Pulina and Ricca2014; Maratea et al. Reference Maratea, Pulina and Ricca2015b) goes beyond the solver-specific setting of claspfolio and chooses among different grounders as well as solvers. Grounder selection traces back to Maratea et al. (Reference Maratea, Pulina and Ricca2013), and similar to the QBF solver aqme (Pulina and Tacchella Reference Pulina and Tacchella2009), measp uses a classification method for performance prediction. Notably, “bad” classifications can be treated by adding respective instances to the training set of measp (Maratea et al. Reference Maratea, Pulina and Ricca2015a), which enables an adjustment to new problems or instances thereof. Some of the parameters reported in Section 3.2 were also adopted by claspfolio and measp, since they were recognised to be important to discriminate the properties of the input program.In the Seventh ASP Competition (Gebser et al. Reference Gebser, Maratea and Ricca2017), the winning system was i-dlv+s (Calimeri et al. Reference Calimeri, Dodaro, Fuscà, Perri and Zangari2020) that utilises i-dlv (Calimeri et al. Reference Calimeri, Fuscà, Perri and Zangari2017) for grounding and automatically selects back-ends for solving through classification between clasp and wasp. Going beyond the selection of a single solving strategy from a portfolio, the aspeed system (Hoos et al. Reference Hoos, Kaminski, Lindauer and Schaub2015) indeed runs different solvers, sequentially or in parallel, as successfully performed by ppfolio. Given a benchmark set, a fixed time limit per instance, and performance results for candidate solvers, the idea of aspeed is to assign time budgets to the solvers such that a maximum number of instances can be completed within the allotted time. In other words, the goal is to divide the total runtime per computing core among solvers such that the number of instances on which at least one solver successfully completes its run is maximised. The portfolio then consists of all solvers assigned a non-zero time budget along with a schedule of solvers to run on the same computing core. Calculating such an optimal portfolio for a benchmark set is an Optimisation problem addressed with ASP in aspeed. In Dingess and Truszczynski (Reference Dingess and Truszczynski2020), an approach for encoding selection was presented as a strategy for improving the performance of answer set solvers. In particular, the idea is to create an automated process for generating alternative encodings. The presented tool, called Automated Aggregator, was able to handle non-ground count aggregates. Other automatic non-ground program rewriting techniques are presented in Hippen and Lierler (Reference Hippen and Lierler2019).
Further, there has been a recent growing interest in techniques for knowledge model configuration in the areas of AI Planning and Abstract Argumentation. In AI Planning, it has been demonstrated that the configuration of either the domain model (Vallati et al. Reference Vallati, Hutter, Chrpa and McCluskey2015; Vallati et al. Reference Vallati, Chrpa, McCluskey and Hutter2021) or the problem model (Vallati and Serina Reference Vallati and Serina2018) can lead to significant performance improvement for domain-independent planning engines. On the argumentation side, it has been shown that even on syntactically simple models that represent directed graphs, the configuration process can lead to performance improvements (Cerutti et al. Reference Cerutti, Vallati and Giacomin2018).
6 Conclusions
In this paper, we proposed an approach for exploiting the fact that the order in which the main elements of CNF formulae and ASP programs, that is, literals, clauses, and rules, are listed carries some information about the structure of the problem to be solved, and therefore affect the performance of solvers. The proposed approach allows to perform the automated configuration of formulae and programs. In SAT, we considered as configurable the order in which clauses are listed and the order in which literals are listed in the clauses, while for ASP we considered as configurable similar entities, that is, literals and rules, but taking into account that some rule’s structure has to be maintained, and that other powerful constructs like aggregates come into play. In our experimental analysis we configured formulae and programs for improving number of solved instances and PAR10 performance of solvers. The performed analysis, aimed at investigating how the configuration affects the performance of state-of-the-art solvers: (i) demonstrates that the automated configuration has a significant impact on solvers’ performance, more evident for SAT but also significant for ASP, despite the constraints on rule’s structure; (ii) indicates that the configuration should be performed on specific set of benchmarks for a given solver; and (iii) highlights what are the main features and aspects of formulae and programs that have a potentially strong impact on the performance of solvers. Such features can be taken into account by knowledge engineers to encode formulae or programs in a way that supports solvers that will reason upon them.
Our findings can have implications on both solving and encoding side. Given the positive results obtained, solver’s developers should consider rearranging internally the structure of the formula/program in order to optimise their performance, and/or users could their-selves consider these hints while writing the encoding. However, given that such positive results are obtained per-domain, and varies with solvers, care should be taken in doing so.
We see several avenues for future work. We plan to evaluate the impact of configuration on optimisation variants of SAT and ASP, that is, weighted max SAT, or ASP including soft constraints, where the weight of the elements can provide another important information to the configuration process. We are also interested in evaluating if ordering clauses (and literals) that are learnt during the search process of a SAT solver can be beneficial for improving performance, given that all of the solvers employed are based on (variant of) the CDCL algorithm. In this paper, we focused on exact solvers based on CDCL, as future work it can be interesting to consider also SAT solvers based on local search. Concerning ASP solvers, it can be also interesting to check if reordering the non-ground program can have a positive impact on the performance. This would also open the combination of reordering and systems based on lazy-grounding (Weinzierl et al. Reference Weinzierl, Taupe and Friedrich2020). Another interesting future work can be to investigate the joint tuning of ordering and solver parameters. Moreover, in this paper we focused on parameters that are based on our knowledge of existing solvers, involving parameters, among others, that proved to be important to characterised input programs; of course, the inclusion of additional parameters is possible, for example, taking into account theoretical properties of the input programs (see, e.g., (Fichte et al. Reference Fichte, Truszczynski and Woltran2015; Janhunen Reference Janhunen2006; Lifschitz et al. Reference Lifschitz, Pearce and Valverde2001; Gebser and Schaub Reference Gebser and Schaub2013)), or adding symmetric counterparts of current criteria, for example, (c6) and (c7) of the input formula. Finally, we plan to incorporate the re-ordering into existing approaches for configuring portfolios of SAT solvers, such as SATenstein, which works in a similar way as aspeed, but differently from aspeed is not tailored on different configurations of the same solver, in order to further improve performance, and to investigate the concurrent configuration of formulae/programs and solvers.
Conflicts of interest
The authors declare none.