Shortest Violation Traces in Model Checking
Based on Petri Net Unfoldings and SAT*

Victor Khomenko

School of Computing Science, University of Newcastle
Newcastle upon Tyne NE1 7RU, U.K.
e-mail: Victor.Khomenko@ncl.ac.uk

Abstract. Model checking based on the causal partial order semantics of Petri nets is an approach widely applied to cope with the state space explosion problem. One of the possibilities for the verification process is to build a finite and complete prefix and use it for constructing a Boolean formula such that any satisfying assignment to its variables yields a trace violating the property being checked. (And if there are no satisfying assignments then the property holds.)

In this paper a method for computing the shortest violation traces (which can greatly facilitate debugging) is proposed. Experimental results demonstrate that it can achieve significant reductions in the size of the Boolean formula as well as in the time required to compute a shortest violation trace, when compared with a naïve approach.

Keywords: Shortest trace, model checking, Petri net unfolding, SAT, Boolean circuit.

1 Introduction and basic notions

A distinctive characteristic of reactive concurrent systems is that their sets of local states have descriptions which are both short and manageable, and the complexity of their behaviour comes from highly complicated interactions with the external environment rather than from complicated data structures and manipulations thereon. One way of coping with this complexity problem is to use formal methods and, especially, computer aided verification tools implementing model checking — a technique in which the verification of a system is carried out using a finite representation of its state space.

The main drawback of model checking is that it suffers from the state space explosion problem. That is, even a relatively small system specification can (and often does) yield a very large state space. To cope with this, several techniques have been developed, which usually aim either at a compact representation of the full state space of the system, or at the generation of its reduced (though sufficient for a given verification task) state space. Among them, a prominent technique is McMillan’s (finite prefixes of) Petri Net unfoldings (see, e.g., [5, 7]). They rely on the partial order view of concurrent computation, and represent system states implicitly, using an acyclic net, called a prefix.

Most ‘interesting’ problems for safe Petri nets are $\mathcal{PSPACE}$-complete [2], but the same problems for prefixes are often in $\mathcal{NP}$ or even $\mathcal{P}$. Though the size of a finite and complete unfolding prefix can be exponential in the size of the original Petri net, in practice it is often relatively small.

* The full version of this paper [6] is available on-line.

A model checking problem formulated for a prefix can usually be translated into some canonical problem, e.g., Boolean satisfiability (SAT). Then an off-the-shelf SAT solver can be used for efficiently solving it. Such a combination ‘unfolder & solver’ turns out to be quite powerful in practice.

**Petri nets** A net is a triple \( \mathcal{N} = (P, T, F) \) such that \( P \) and \( T \) are disjoint sets of respectively places and transitions, and \( F \subseteq (P \times T) \cup (T \times P) \) is a flow relation. A marking of \( \mathcal{N} \) is a multiset \( M \) of places, i.e., \( M : P \to \mathbb{N} \), where \( \mathbb{N} \equiv \{0, 1, 2, \ldots \} \). The standard rules about drawing nets are adopted in this paper, viz., places are represented as circles, transitions as boxes, the flow relation by arcs, and the marking is shown by placing tokens within circles. As usual, \( {}^\bullet z \equiv \{ y \mid (y, z) \in F \} \) and \( z^\bullet \equiv \{ y \mid (z, y) \in F \} \) denote the pre- and postset of \( z \in P \cup T \), and \( {}^\bullet Z \equiv \bigcup_{z \in Z} {}^\bullet z \) and \( Z^\bullet \equiv \bigcup_{z \in Z} z^\bullet \), for all \( Z \subseteq P \cup T \). In this paper, the presets of transitions are restricted to be non-empty, i.e., \( {}^\bullet t \neq \emptyset \) for every \( t \in T \). A net system is a pair \( \mathcal{T} = (\mathcal{N}, M_0) \) comprising a finite net \( \mathcal{N} \) and an initial marking \( M_0 \). It is assumed that the reader is familiar with the standard notions of Petri net theory, such as the enabledness and firing of a transition, marking reachability and deadlock.

**Unfolding prefix** A finite and complete unfolding prefix \( \pi \) of a Petri net \( \mathcal{T} \) is a finite acyclic net which implicitly represents all the reachable states of \( \mathcal{T} \) together with transitions enabled at those states. Intuitively, it can be obtained through unfolding \( \mathcal{T} \), by successive firings of transitions, under the following assumptions: (a) for each new firing a fresh transition (called an event) is generated; (b) for each newly produced token a fresh place (called a condition) is generated. The unfolding is infinite whenever \( \mathcal{T} \) has an infinite run; however, if \( \mathcal{T} \) has finitely many reachable states then the unfolding eventually starts to repeat itself and can be truncated (by identifying a set of cut-off events) without loss of information, yielding a finite and complete prefix. The sets of conditions, events and cut-off events of the prefix are denoted by \( B, E \) and \( E_{\text{cut}} \), respectively. (Note that \( E_{\text{cut}} \subseteq E \).)

Efficient algorithms exist for building such prefixes [5], which ensure that the number of non-cut-off events \( |E \setminus E_{\text{cut}}| \) in a complete prefix can never exceed the number of reachable states of \( \mathcal{T} \). Moreover, complete prefixes are often exponentially smaller than the corresponding state graphs, especially for highly concurrent Petri nets, because they represent concurrency directly rather than by multidimensional ‘diamonds’ as it is done in state graphs. For example, if the original Petri net consists of 100 transitions which can fire once in parallel, the state graph will be a 100-dimensional hypercube with \( 2^{100} \) vertices, whereas the complete prefix will coincide with the net itself. Another example, viz. a Petri net modelling two dining philosophers, and a finite and complete prefix of its unfolding, are shown in Fig. 1. One can observe that if this example is scaled up, the size of the prefix is linear in the number of dining philosophers, even though the number of reachable states grows exponentially.

Since $\pi$ is acyclic, the transitive closure of its flow relation is a partial order $\prec$ on $B \cup E$, called the causality relation. (The reflexive order corresponding to $\prec$ will be denoted by $\preceq$.) Intuitively, all the events which are smaller than an event $e \in E$ w.r.t. $\prec$ must precede $e$ in any valid execution containing $e$.

Two nodes $x, y \in B \cup E$ are in conflict, denoted $x \# y$, if there are distinct events $e, f \in E$ such that ${}^\bullet e \cap {}^\bullet f \neq \emptyset$ and $e \preceq x$ and $f \preceq y$. Intuitively, no valid execution can contain two events in conflict. Two nodes $x, y \in B \cup E$ are concurrent, denoted $x \bowtie y$, if neither $x \# y$ nor $x \preceq y$ nor $y \preceq x$. Intuitively, two concurrent events can be enabled simultaneously, and executed in any order, or even concurrently. For example, in the prefix shown in Fig. 1(b) the following relationships hold: $e_1 \prec e_7$, $e_7 \# e_8$ (due to the choices at $c_2$ and $c_3$) and $e_3 \bowtie e_4$.

Fig. 2. Conversion of a Boolean circuit into a Boolean expression in the CNF.

The reachable markings of $\mathcal{T}$ can be represented using configurations of $\pi$. A configuration is a set of events $C \subseteq E \setminus E_{\text{cut}}$ such that for all $e, f \in C$, $\neg(e \# f)$ and, for every $e \in C$, $f \prec e$ implies $f \in C$. For example, in the prefix shown in Fig. 1(b), $\{e_1, e_3, e_4\}$ is a configuration, whereas $\{e_1, e_2, e_3, e_5\}$ and $\{e_1, e_3, e_7\}$ are not (the former includes events in conflict, $e_3 \# e_5$, while the latter does not include $e_4 \prec e_7$). Intuitively, a configuration is a partial-order execution, i.e., an execution where the order of firing of some of its events (viz. concurrent ones) is not important; e.g., the configuration $\{e_1, e_3, e_4, e_7\}$ corresponds to two totally ordered executions: $e_1 e_3 e_4 e_7$ and $e_1 e_4 e_3 e_7$. Since a configuration can correspond to multiple executions, it is often much more efficient in model checking to explore configurations rather than executions.
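The two conditions in the definition of a configuration (conflict-freeness and downward closure) can be checked directly. The sketch below is only an illustration: the dictionaries `PREDS` and `DIRECT_CONFLICTS` describe a hypothetical toy prefix whose shape is assumed so as to be consistent with the examples in the text, and the function names are not taken from any tool.

```python
from itertools import combinations

# Hypothetical toy prefix (an assumption, not a transcription of Fig. 1(b)):
# immediate causal predecessors of each event, and pairs of events
# that share a condition in their presets (direct conflicts).
PREDS = {
    "e1": set(), "e2": set(),
    "e3": {"e1"}, "e4": {"e1"},
    "e5": {"e2"}, "e6": {"e2"},
    "e7": {"e3", "e4"}, "e8": {"e5", "e6"},
}
DIRECT_CONFLICTS = {frozenset({"e3", "e5"}), frozenset({"e4", "e6"})}

def causes(e):
    """Return all events f with f below or equal to e in the causal order."""
    past, stack = {e}, [e]
    while stack:
        for f in PREDS[stack.pop()]:
            if f not in past:
                past.add(f)
                stack.append(f)
    return past

def in_conflict(x, y):
    """x # y: two directly conflicting events lie in the pasts of x and y."""
    return any(frozenset({a, b}) in DIRECT_CONFLICTS
               for a in causes(x) for b in causes(y))

def is_configuration(events):
    """events is a set of event names; check both defining conditions."""
    downward_closed = all(PREDS[e] <= events for e in events)
    conflict_free = not any(in_conflict(e, f)
                            for e, f in combinations(events, 2))
    return downward_closed and conflict_free
```

On this toy prefix the checker accepts `{"e1", "e3", "e4"}` and rejects both `{"e1", "e2", "e3", "e5"}` (a conflict) and `{"e1", "e3", "e7"}` (not downward closed).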

After starting $\pi$ from the implicit initial marking (whereby one puts a single token in each condition which does not have an incoming arc) and executing all the events in $C$, one reaches the marking denoted by $Cut(C)$. $Mark(C)$ denotes the corresponding marking of $\mathcal{T}$, reached by firing a transition sequence corresponding to the events in $C$. It is remarkable that each reachable marking of $\mathcal{T}$ is $Mark(C)$ for some configuration $C$ of $\pi$, and, conversely, each configuration $C$ of $\pi$ generates a reachable marking $Mark(C)$. Thus various behavioural properties of $\mathcal{T}$ can be re-stated as the corresponding properties of $\pi$, and then checked, often much more efficiently.

**Boolean satisfiability** The **Boolean satisfiability problem (SAT)** consists in finding a satisfying assignment, i.e., a mapping $A : Var_\varphi \rightarrow \{0, 1\}$ defined on the set of variables $Var_\varphi$ occurring in a given Boolean expression $\varphi$ such that $\varphi$ evaluates to 1. This expression is often assumed to be given in the conjunctive normal form (CNF) $\varphi = \bigwedge_{i=1}^n \bigvee_{l \in L_i} l$, i.e., it is represented as a conjunction of clauses, which are disjunctions of literals, each literal $l$ being either a variable or the negation of a variable. It is assumed that no two literals in the same clause correspond to the same variable.

In order to solve a Boolean satisfiability problem, SAT solvers perform exhaustive search assigning the values 0 or 1 to the variables, using heuristics to reduce the search space [10]. Some of the leading SAT solvers, e.g., zChaff [8], can be used in the incremental mode, i.e., after solving a particular SAT instance the user can slightly change it (e.g., by adding and/or removing a small number of clauses) and execute the solver again. This is often much more efficient than solving these related instances as independent problems, because on the subsequent runs the solver can use some of the useful information (e.g., learnt clauses [10]) collected so far.

**Boolean circuits** A **Boolean circuit** (see, e.g., [9]) computes a multiple-output Boolean function of Boolean input variables $x_1, \ldots, x_n$. It consists of a finite
number $k$ of gates $G_1, \ldots, G_k$. Each gate $G_i$ is labelled by a Boolean function $f_i$ chosen from some fixed set of Boolean functions $\mathcal{F}$. (In this paper, $\mathcal{F}$ comprises all the unary and binary Boolean functions and conjunctions and disjunctions of arbitrary arity with arbitrary input inversions.) A Boolean circuit can be represented by an acyclic directed graph, where the input variables and the constants 0 and 1 are its sources, and the vertex representing the gate $G_i$ has arity($f_i$) numbered incoming edges from its predecessors in the graph. (If $f_i$ is commutative, the numbering of edges does not have to be specified.) In pictures, each gate is represented as a circle with the function shown within it, and input inversions are shown as "bubbles". Note that $\mathcal{F}$ is closed w.r.t. input inversions, and so they can be incorporated into the corresponding gate function.

The Boolean function $f_v$ computed at a vertex $v$ of this acyclic graph is defined inductively as follows. If $v$ is an input variable $x_j$, then $f_v(x_1, \ldots, x_n) \equiv x_j$, and if it is a constant $c \in \{0, 1\}$ then $f_v(x_1, \ldots, x_n) \equiv c$. Otherwise, the vertex is some gate $G_i$, and $f_v(x_1, \ldots, x_n) \equiv f_i(p_1, \ldots, p_{\text{arity}(f_i)})$, where $p_1, \ldots, p_{\text{arity}(f_i)}$ are the functions computed at the predecessors of this vertex in the graph. The output vector $(v_1, \ldots, v_m)$, where each $v_i$ is some vertex of the graph, describes what the circuit computes, viz. the multiple-output Boolean function $(f_{v_1}, \ldots, f_{v_m})$.

In particular, any Boolean formula over the signature $\mathcal{F}$ can be represented as a circuit.

It turns out that a Boolean circuit can be efficiently encoded by a Boolean expression $\varphi$ in the CNF depending on the variables $\text{Var}_\varphi$ corresponding to the vertices of the graph representing the circuit (except 0 and 1) such that for any assignment $A : \text{Var}_\varphi \to \{0, 1\}$, $A$ is a satisfying assignment of $\varphi$ iff for every $v \in \text{Var}_\varphi$, $f_v(A(x_1), \ldots, A(x_n)) = A(v)$ (where the variables are denoted by the same symbol as the corresponding vertices of the graph) and $A(0) \equiv 0$ and $A(1) \equiv 1$.

The expression $\varphi$ is constructed as follows. For each gate $G_i$, a new Boolean variable $g_i$ representing its output is created, a Boolean equation relating $g_i$ to the inputs of $G_i$ is written down, and these equations are converted into the CNF. This process is illustrated in Fig. 2. Note that for a gate labelled with a Boolean function of bounded arity, the size of the corresponding equation (and its CNF) is bounded by a constant; moreover, for a gate labelled with a multiple-input conjunction or disjunction, the size of the equation (and its CNF) is linear in the number of gate inputs. Thus the size of the resulting Boolean expression in the CNF is linear in the size of the circuit.
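The gate-by-gate translation can be sketched as follows for conjunction and disjunction gates. This is a minimal illustration, assuming the common convention of representing a literal as a signed integer (a negative number denotes a negated variable, which also covers input inversions); the function names are illustrative.

```python
from itertools import product

def gate_clauses(out, kind, inputs):
    """CNF clauses stating that variable `out` equals kind(inputs)."""
    if kind == "and":
        # out implies every input; all inputs together imply out
        return [[-out, x] for x in inputs] + [[out] + [-x for x in inputs]]
    if kind == "or":
        # every input implies out; out implies some input
        return [[out, -x] for x in inputs] + [[-out] + list(inputs)]
    raise ValueError(kind)

def satisfies(clauses, assignment):
    """assignment maps variable numbers to booleans."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def all_assignments(n):
    """Enumerate every assignment to variables 1..n."""
    for bits in product([False, True], repeat=n):
        yield dict(zip(range(1, n + 1), bits))

# Example circuit: g4 = x1 AND (NOT x2), g5 = g4 OR x3
CLAUSES = gate_clauses(4, "and", [1, -2]) + gate_clauses(5, "or", [4, 3])
```

An assignment satisfies `CLAUSES` exactly when the values given to variables 4 and 5 agree with the gate outputs. A disjunction gate with n inputs yields n + 1 clauses and 3n + 1 literals, which illustrates the linear size bound.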

**Model checking based on Petri net unfoldings** This paper concentrates on the following approach to model checking. First, a finite and complete prefix of the Petri net unfolding is built, and it is then used for constructing a Boolean formula encoding the model checking problem at hand. (It is assumed that the property being checked is the unreachability of some 'bad' states, e.g., deadlocks.) This formula is unsatisfiable iff the property holds, and such that any satisfying assignment to its variables yields a trace violating the property being checked.
Typically such a formula would have for each non-cut-off event $e$ of the prefix
a variable $\text{conf}_e$ (the formula might also contain other variables), and for every
satisfying assignment $A$, the set of events $C \equiv \{ e \mid \text{conf}_e = 1 \}$ is a configuration
such that $Mark(C)$ violates the property being checked. The formula often has the form
$\text{CONF} \land \text{VIOL}$. The role of the configuration constraint, $\text{CONF}$, is to
ensure that $C$ is a configuration of the prefix (not just an arbitrary set of
events). $\text{CONF}$ can be defined as the conjunction of the formulae
\[
\bigwedge_{e \in E \setminus E_{cut}} \ \bigwedge_{f \in {}^\bullet({}^\bullet e)} (\text{conf}_e \rightarrow \text{conf}_f)
\quad \text{and} \quad
\bigwedge_{e \in E \setminus E_{cut}} \ \bigwedge_{f \in (({}^\bullet e)^\bullet \setminus \{e\}) \setminus E_{cut}} \neg (\text{conf}_e \land \text{conf}_f).
\]
The former formula ensures that if $e \in C$ then its immediate predecessors are
also in $C$, i.e., $C$ is downward closed w.r.t. $\prec$. The latter one ensures that $C$
contains no conflicts. $\text{CONF}$ can be transformed into the CNF by applying the
rules $x \rightarrow y \equiv \neg x \lor y$ and $\neg(x \land y) \equiv \neg x \lor \neg y$. For example, the configuration
constraint for the prefix shown in Fig. 1(b) is
\[
(\text{conf}_{e_3} \rightarrow \text{conf}_{e_1}) \land (\text{conf}_{e_4} \rightarrow \text{conf}_{e_1}) \land (\text{conf}_{e_5} \rightarrow \text{conf}_{e_2}) \land
(\text{conf}_{e_6} \rightarrow \text{conf}_{e_2}) \land
(\text{conf}_{e_7} \rightarrow \text{conf}_{e_3}) \land (\text{conf}_{e_7} \rightarrow \text{conf}_{e_4}) \land
(\text{conf}_{e_8} \rightarrow \text{conf}_{e_5}) \land (\text{conf}_{e_8} \rightarrow \text{conf}_{e_6}) \land
\neg (\text{conf}_{e_3} \land \text{conf}_{e_5}) \land \neg (\text{conf}_{e_4} \land \text{conf}_{e_6}).
\]
The role of the violation constraint, $\text{VIOL}$, is to express the property violation
condition for a configuration $C$, so that if a configuration $C$ satisfying this
constraint is found then the property does not hold, and any ordering of events
in $C$ consistent with $\prec$ is a violation trace. For example, for deadlock checking
$\text{VIOL}$ can be defined as
\[
\bigwedge_{e \in E} \bigg( \bigvee_{f \in {}^\bullet({}^\bullet e)} \neg \text{conf}_f \lor \bigvee_{f \in ({}^\bullet e)^\bullet \setminus E_{cut}} \text{conf}_f \bigg).
\]
This formula requires for each event $e$ (including cut-off events) that at least one of the
direct causal predecessors of $e$ has not fired or at least one of the non-cut-off events
(including $e$ unless it is cut-off) consuming tokens from ${}^\bullet e$ has fired, and thus $e$
is not enabled. This formula is already in the CNF. For example, the violation
constraint for the deadlock checking problem formulated for the prefix shown in
Fig. 1(b) is
\[
\text{conf}_{e_1} \land \text{conf}_{e_2} \land (\neg \text{conf}_{e_2} \lor \text{conf}_{e_3}) \land (\neg \text{conf}_{e_3} \lor \text{conf}_{e_1}) \land
(\neg \text{conf}_{e_2} \lor \text{conf}_{e_3}) \land (\neg \text{conf}_{e_3} \lor \text{conf}_{e_2}) \land
(\neg \text{conf}_{e_1} \lor \text{conf}_{e_3}) \land (\neg \text{conf}_{e_1} \lor \text{conf}_{e_2}).
\]
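Both constraints can be generated mechanically from the presets and postsets of the events. The following sketch does this for a small hypothetical prefix (the dictionary `EVENTS`, the condition names and the helper functions are assumptions made for illustration) and enumerates the satisfying assignments by brute force instead of calling a SAT solver.

```python
from itertools import product

# Hypothetical toy prefix: event -> (preset, postset) over conditions.
# b1 and b2 are initial conditions; e1 and e2 compete for b1.
EVENTS = {
    "e1": ({"b1"}, {"b3"}),
    "e2": ({"b1"}, {"b4"}),
    "e3": ({"b2", "b3"}, {"b5"}),
}
CUT_OFFS = set()

def producers(conds):
    return {e for e, (_, post) in EVENTS.items() if post & conds}

def consumers(conds):
    return {e for e, (pre, _) in EVENTS.items() if pre & conds}

def conf_clauses():
    """Clauses of CONF; a literal is (event, polarity)."""
    clauses = []
    for e in sorted(EVENTS.keys() - CUT_OFFS):
        pre = EVENTS[e][0]
        for f in sorted(producers(pre)):
            clauses.append([(e, False), (f, True)])    # conf_e -> conf_f
        for f in sorted(consumers(pre) - {e} - CUT_OFFS):
            clauses.append([(e, False), (f, False)])   # not both e and f
    return clauses

def viol_clauses():
    """Clauses of VIOL for deadlock checking: no event is enabled."""
    clauses = []
    for e in sorted(EVENTS):
        pre = EVENTS[e][0]
        clause = [(f, False) for f in sorted(producers(pre))]
        clause += [(f, True) for f in sorted(consumers(pre) - CUT_OFFS)]
        clauses.append(clause)
    return clauses

def models(clauses):
    """Brute-force enumeration of the configurations encoded by clauses."""
    names = sorted(EVENTS.keys() - CUT_OFFS)
    for bits in product([False, True], repeat=len(names)):
        a = dict(zip(names, bits))
        if all(any(a[v] == pol for v, pol in c) for c in clauses):
            yield {v for v in names if a[v]}
```

For this toy prefix, `models(conf_clauses() + viol_clauses())` yields the two deadlocked configurations `{"e2"}` and `{"e1", "e3"}`.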

**Shortest violation traces** Note that in general the computed violation
trace can be quite long, which might make it difficult to locate the error, as the designer
has to inspect this trace in order to find and eliminate the source of the problem.
(And parts of such long traces often describe incidental system activity which
is unrelated to the problem.) Thus computing shortest possible violation traces
can greatly facilitate the debugging process.

A quite obvious algorithm for computing the shortest violation trace is shown in Fig. 3, where $\text{SatAssignment}(\varphi)$ is a function computing a satisfying assignment for a Boolean formula $\varphi$ and returning $\text{UNSAT}$ if $\varphi$ is unsatisfiable (it is usually implemented by a call to some off-the-shelf SAT solver, e.g., zChaff [8]), $\text{ExtractTrace}(A)$ is a function extracting the violation trace from a satisfying Boolean assignment \( A \), and $\text{Threshold}_t$ is the threshold constraint \( |\{ e \mid \text{conf}_e = 1 \}| \leq t \). This algorithm uses a binary search to compute the length of the shortest trace still exhibiting the violation. If the property holds (i.e., if \( \varphi \) is unsatisfiable) then this algorithm does not have any additional overhead compared with the original model checking algorithm, but in the case of errors the SAT solver is called several times with larger formulae, and so the overhead might be quite significant. This situation is somewhat alleviated by the fact that the SAT instances are very similar to each other (in fact, even the formulae of the form $\text{Threshold}_t$, described in detail further in this paper, change very little when \( t \) changes) and thus can be efficiently solved in the incremental mode. Moreover, the user can always terminate the execution of the algorithm and get the shortest violation trace computed so far.

input: \( \varphi \), a Boolean formula
output: \( T \), a shortest violation trace, or UNSAT

\( A \leftarrow \text{SatAssignment}(\varphi) \)
if \( A = \text{UNSAT} \) then
    \( T \leftarrow \text{UNSAT} \)
    stop
\( T \leftarrow \text{ExtractTrace}(A) \)
\( r \leftarrow |T| \)
\( l \leftarrow 0 \)
while \( l < r \) do
    \( t \leftarrow \lfloor (l + r)/2 \rfloor \)
    \( A \leftarrow \text{SatAssignment}(\varphi \land \text{Threshold}_t) \)
    if \( A = \text{UNSAT} \) then
        \( l \leftarrow t + 1 \)
    else
        \( T \leftarrow \text{ExtractTrace}(A) \)
        \( r \leftarrow |T| \)

Fig. 3. An algorithm for computing shortest violation traces.
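The binary search of Fig. 3 can be sketched in a few lines. In this illustration a brute-force enumeration stands in for the SAT solver, the `max_true` argument plays the role of the threshold constraint, and the result is the set of events of the configuration rather than an ordered trace; all names are hypothetical.

```python
from itertools import product

def sat_assignment(variables, clauses, max_true=None):
    """Brute-force stand-in for a SAT solver; returns None for UNSAT.

    A literal is (variable, polarity); max_true bounds the number of
    variables set to True, mimicking the conjunct Threshold_t.
    """
    for bits in product([True, False], repeat=len(variables)):
        if max_true is not None and sum(bits) > max_true:
            continue
        a = dict(zip(variables, bits))
        if all(any(a[v] == pol for v, pol in c) for c in clauses):
            return a
    return None

def shortest_violation(variables, clauses):
    """Return a smallest set of true variables satisfying clauses, or None."""
    a = sat_assignment(variables, clauses)
    if a is None:
        return None
    best = {v for v in variables if a[v]}
    lo, hi = 0, len(best)
    while lo < hi:
        t = (lo + hi) // 2
        a = sat_assignment(variables, clauses, max_true=t)
        if a is None:
            lo = t + 1                      # no violation of size <= t
        else:
            best = {v for v in variables if a[v]}
            hi = len(best)                  # a shorter violation was found
    return best
```

Since `t` is always strictly smaller than `hi` inside the loop, every satisfiable call shortens the best known violation, so the loop terminates after at most logarithmically many solver calls.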

What still needs describing is the construction of the formula \text{Threshold}_t, for a given \( t \). It turns out that one can exploit some problem-specific optimisations in order to significantly reduce the size of this formula as well as the computation effort required for solving the corresponding SAT instances. This is the main topic of this paper.

2 Basic translation of a threshold constraint

\( \text{Threshold}_t \) can be expressed as a pseudo-Boolean constraint \( \sum_{e \in E \setminus E_{\text{cut}}} \text{conf}_e \leq t \), where arithmetical operations are used instead of logical ones. The other constraints can also be converted into a similar form, and the problem can be solved by a 0-1 integer linear programming solver. However, SAT solvers tend to be more efficient in practice, and so in many cases it would be advantageous to express \( \text{Threshold}_t \) as a purely Boolean constraint.

Fig. 4. Implementations of a threshold constraint (a); a comparator (b), where the inputs $y_1, \ldots, y_k$ are interpreted as the binary representation of a non-negative integer (least significant digit first) and $t_1, \ldots, t_k$ is the binary representation of $t$; a counter as a balanced tree of adders (c); a $k$-bit adder $\Sigma_k$ comprising a half-adder cell and $k-1$ full-adder cells (d); and half-adder and full-adder cells (e,f).

A possible implementation of \( \text{Threshold}_t \) as a Boolean circuit is shown in Fig. 4(a). It consists of two parts: the counter and the comparator. The counter circuit has \( n \) inputs and \( \lfloor \log_2 n \rfloor + 1 \) outputs, and its purpose is to count the number of ones among its inputs and return the result as a binary number. The purpose of the comparator is to compare this number with a given constant \( t \).

Note that the counter circuit does not depend on \( t \) and so the corresponding part of the formula does not have to be changed between the calls to the SAT solver in the algorithm shown in Fig. 3. A possible implementation of the comparator is shown in Fig. 4(b). Note that it does depend on \( t \), and so the corresponding part of the formula has to be amended from call to call. However, the size of the comparator is just \( O(\log n) \). Thus this implementation of the threshold constraint is beneficial if the SAT solver is used in the incremental mode. The rest of this section is devoted to the counter circuit.

Fig. 4(c) illustrates an implementation of the counter as a tree of adders, where each adder is built of half-adder and full-adder cells, as shown in Fig. 4(d). A half-adder cell adds up two one-bit numbers, producing a one-bit result and a carry bit. A full-adder cell adds up two one-bit numbers and a carry from the previous cell of the adder, producing a one-bit result and a carry bit. Fig. 4(e,f) shows possible implementations of these cells.
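The behaviour of such a counter can be simulated at the bit level. The sketch below is an illustration only (function names are hypothetical): numbers are little-endian lists of 0/1 values, each adder consists of one half-adder cell followed by full-adder cells, and the adders are arranged in a balanced tree.

```python
def half_adder(x, y):
    """Add two bits; return (sum bit, carry bit)."""
    return x ^ y, x & y

def full_adder(x, y, c):
    """Add two bits and an incoming carry; return (sum bit, carry bit)."""
    return x ^ y ^ c, (x & y) | (x & c) | (y & c)

def add(a, b):
    """Ripple-carry addition of two little-endian bit lists."""
    width = max(len(a), len(b))
    a = a + [0] * (width - len(a))
    b = b + [0] * (width - len(b))
    s, carry = half_adder(a[0], b[0])
    out = [s]
    for i in range(1, width):
        s, carry = full_adder(a[i], b[i], carry)
        out.append(s)
    return out + [carry]

def count_ones(bits):
    """Count the ones among bits (at least one input) with an adder tree."""
    layer = [[b] for b in bits]
    while len(layer) > 1:
        nxt = [add(layer[i], layer[i + 1])
               for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])       # odd element is passed up unchanged
        layer = nxt
    return layer[0]

def to_int(bits):
    """Interpret a little-endian bit list as a non-negative integer."""
    return sum(b << i for i, b in enumerate(bits))
```

For eight inputs the result has four bits, in line with the output width stated for the counter circuit.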

The described circuit can be converted to a linear-size formula in the CNF, as described in Section 1. However, somewhat shorter formulae can be obtained using Boolean minimisation when translating half-adder and full-adder cells. It yields the formulae

\[
(-x \lor -y \lor -z) \land (x \lor -y \lor -z) \land (y \lor -x \lor -z) \land (-x \lor c_v \lor -z) \land (c_v \lor -x \lor -z)
\]

with 2 new variables, 6 clauses and 16 literals for a half-adder cell, and

\[
(c_v \lor -x \lor y \lor z) \land (c_v \lor x \lor -y \lor z) \land (-c_v \lor -x \lor y \lor z) \land (c_v \lor x \lor y \lor z) \land (c_v \lor x \lor -y \lor z) \land (c_v \lor x \lor y \lor z)
\]

with 2 new variables, 10 clauses and 36 literals for a full-adder cell.
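That a half-adder cell admits an encoding of the stated size can be confirmed by exhaustive checking. The clause set below is one possible encoding with 6 clauses and 16 literals, found by hand; it is an assumption for illustration and is not claimed to be the formula printed in the paper. Here `s` denotes the sum output and `c` the carry output.

```python
from itertools import product

# One candidate encoding (an assumption); a literal is (variable, polarity).
HALF_ADDER_CNF = [
    [("c", False), ("x", True)],                  # c -> x
    [("c", False), ("y", True)],                  # c -> y
    [("x", True), ("y", True), ("s", False)],     # s -> x or y
    [("x", True), ("y", False), ("s", True)],     # y and not x -> s
    [("x", False), ("s", True), ("c", True)],     # x -> s or c
    [("x", False), ("y", False), ("s", False)],   # x and y -> not s
]

def encodes_half_adder(cnf):
    """Check that cnf holds exactly when s = x xor y and c = x and y."""
    for x, y, s, c in product([False, True], repeat=4):
        a = {"x": x, "y": y, "s": s, "c": c}
        sat = all(any(a[v] == pol for v, pol in clause) for clause in cnf)
        if sat != ((s == (x != y)) and (c == (x and y))):
            return False
    return True
```

The same exhaustive check, extended to three inputs, can be used to validate any candidate encoding of the full-adder cell.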

It is shown in [6] that if \( n \) is a power of 2 then the resulting CNF formula for the counter contains \( 4n - 2\log_2 n - 4 \) auxiliary variables (corresponding to gate outputs), \( 16n - 10\log_2 n - 16 \) clauses and \( 52n - 36\log_2 n - 52 \) literals, i.e., even though the size of the formula is linear in the number of the circuit's inputs, the multiplicative constants hidden in this \( O(n) \) translation are quite large. The next section tries to remedy this situation by exploiting the structure of the prefix to improve the described translation.
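These totals can be re-derived by counting cells. The sketch assumes that level j of the balanced tree (counting from the inputs) holds n/2^j adders, each consisting of one half-adder cell and j - 1 full-adder cells, and uses the per-cell sizes quoted above; the function names are illustrative.

```python
HA_COST = (2, 6, 16)     # (new variables, clauses, literals) per half-adder
FA_COST = (2, 10, 36)    # the same for a full-adder cell

def counter_size(n):
    """Size of the adder-tree counter for n inputs, n a power of 2."""
    levels = n.bit_length() - 1          # log2(n)
    half = full = 0
    for j in range(1, levels + 1):
        adders = n >> j                  # n / 2^j adders on level j
        half += adders                   # one half-adder cell per adder
        full += adders * (j - 1)         # j - 1 full-adder cells per adder
    return tuple(half * h + full * f for h, f in zip(HA_COST, FA_COST))

def closed_form(n):
    """The closed-form expressions quoted in the text."""
    log_n = n.bit_length() - 1
    return (4 * n - 2 * log_n - 4,
            16 * n - 10 * log_n - 16,
            52 * n - 36 * log_n - 52)
```

Under these assumptions the tree contains n - 1 half-adder cells and n - log2(n) - 1 full-adder cells, and `counter_size(n)` agrees with `closed_form(n)` for every power of 2.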

3 Exploiting the structure of the prefix

The content of this section is the main contribution of this paper. It turns out that the structure of the prefix can be exploited to reduce the size of the counter circuit. Below, two heuristics are described, one utilising the conflicts between the events in the prefix, and the other making use of the causality relation.

**Exploiting the conflicts** One can observe that if $E' \subseteq E \setminus E_{\text{cut}}$ is a set of events which are in conflict with each other (i.e., $E'$ is a clique in the graph corresponding to the relation #), then no two events from $E'$ can belong to the same configuration. The configuration constraint ensures that at most one of the variables $\text{conf}_e$ corresponding to the events in $E'$ is assigned the value 1, i.e., $1 \geq |\{ e \in E' \mid \text{conf}_e = 1 \}| = \bigvee_{e \in E'} \text{conf}_e$, and so a single $\lor$-gate is sufficient to count the number of variables assigned the value 1.

**Definition 1** (#-cluster). A set of events $E' \subseteq E \setminus E_{\text{cut}}$ is a #-cluster if for all distinct events $e, f \in E'$, $e \# f$.

Thus the non-cut-off events of the prefix are partitioned into #-clusters, then $\lor$-gates are used to count in each #-cluster the number of variables corresponding to its events and assigned the value 1, and a counter (hopefully, of a much smaller size) is used to count the number of outputs of these $\lor$-gates having the value 1. Since the translation of an $\lor$-gate into a Boolean expression is much smaller than the translation of a counter, one can expect reductions in the size of the resulting formula. For example, $\{\{e_1\}, \{e_2\}, \{e_3, e_5\}, \{e_4, e_6\}, \{e_7, e_8\}\}$ is a possible partition into #-clusters of the non-cut-off events of the prefix shown in Fig. 1(b).

When partitioning the non-cut-off events of the prefix into #-clusters, it is advantageous to make the number of such #-clusters as small as possible. (When the number of #-clusters is large, the size of the counter grows; in particular, for the trivial partition with each event forming its own #-cluster the translation degrades to the one described in the previous section.) Thus one can formulate an optimisation problem of partitioning the non-cut-off events of a prefix into the smallest number of #-clusters. Unfortunately, a decision version of this problem turns out to be NP-complete.

**Proposition 1** (NP-completeness of the Partition into #-clusters problem). Given an unfolding prefix $\pi$ and a $k \in \mathbb{N}$, the problem of deciding whether the set of non-cut-off events of $\pi$ can be partitioned into at most $k$ #-clusters is NP-complete.

The proof is by reduction from the Partition into Cliques problem, which is known to be NP-complete [3, Problem GT15], and can be found in [6].

When computing the shortest violation trace, one does not want to spend too much effort on building the threshold constraints, as the process of building them can easily become more time consuming than model checking itself. Therefore, in the actual implementation, a fast ‘greedy’ algorithm for partitioning the set of events into #-clusters was adopted, which is justifiable in view of the above result. This algorithm is described in [6].
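To give a flavour of what a greedy partitioning can look like, the sketch below places each event into the first existing cluster all of whose members conflict with it. This is a generic first-fit heuristic written for illustration; it is not the algorithm of [6], and the predicate `in_conflict` is assumed to be supplied by the caller.

```python
def greedy_clusters(events, in_conflict):
    """Partition events into sets of pairwise conflicting events (first fit)."""
    clusters = []
    for e in events:
        for cluster in clusters:
            if all(in_conflict(e, f) for f in cluster):
                cluster.append(e)       # e conflicts with every member
                break
        else:
            clusters.append([e])        # no suitable cluster: open a new one
    return clusters

# Hypothetical conflict relation used only to exercise the function.
PAIRS = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]}

def example_conflict(x, y):
    return frozenset({x, y}) in PAIRS
```

With the example relation, `greedy_clusters("abcdef", example_conflict)` returns `[["a", "b", "c"], ["d", "e"], ["f"]]`. The result of a first-fit heuristic depends on the order in which the events are visited and need not be optimal.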

**Exploiting the causality relation** The method described above allowed for simplification of the threshold constraint by exploiting the conflict relation between the events in the prefix. It turns out that the causality relation can also be exploited to reduce the size of the translation even further.

**Definition 2.** Let $Cl$ and $Cl'$ be two #-clusters. $Cl \ll Cl'$ if for each event $e' \in Cl'$ there exists an event $e \in Cl$ such that $e \prec e'$. A sequence of #-clusters $Cl_1 \ll Cl_2 \ll \cdots \ll Cl_k$ is called a $\ll$-chain.

For example, $\{e_4, e_6\} \ll \{e_7, e_8\}$ is a $\ll$-chain of the prefix shown in Fig. 1(b).

It follows from this definition that if $Cl \ll Cl'$ and an event $e' \in Cl'$ belongs to a configuration $C$ then some event $e \in Cl$ also belongs to $C$. Suppose $Cl_1 \ll Cl_2 \ll \cdots \ll Cl_k$ is a $\ll$-chain and $y_1, \ldots, y_k$ are the outputs of the $\lor$-gates corresponding to these #-clusters. The configuration constraint ensures that in any satisfying assignment the sequence of values of $y_1, \ldots, y_k$ is non-increasing. This allows one to count the number of ones among these values much more efficiently than by a counter described in the previous section. Indeed, the encoding of the inputs is very similar to the 1-hot encoding, which can be obtained from $y_1, \ldots, y_k$ as $\neg y_1, y_1 \land \neg y_2, y_2 \land \neg y_3, \ldots, y_{k-1} \land \neg y_k, y_k$ and subsequently converted into the binary code using an encoder. A somewhat smaller circuit is shown in Fig. 5.
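The 1-hot route described above can be simulated as follows. The sketch covers only the conversion via the 1-hot code and an encoder (not the smaller circuit of Fig. 5), assumes a non-empty non-increasing input sequence of 0/1 values, and uses illustrative function names.

```python
def one_hot(ys):
    """1-hot code of the number of ones in a non-increasing 0/1 sequence."""
    k = len(ys)
    hot = [1 - ys[0]]                                   # not y_1
    hot += [ys[i - 1] & (1 - ys[i]) for i in range(1, k)]  # y_i and not y_{i+1}
    hot.append(ys[-1])                                  # y_k
    return hot      # hot[j] == 1 iff exactly j of the inputs are 1

def encode(hot):
    """Binary code (least significant bit first) of the position of the 1."""
    width = max(1, (len(hot) - 1).bit_length())
    # each output bit is a disjunction of the 1-hot lines whose index has it set
    return [int(any(hot[j] for j in range(len(hot)) if (j >> b) & 1))
            for b in range(width)]

def to_int(bits):
    return sum(bit << i for i, bit in enumerate(bits))
```

For the input `[1, 1, 0, 0]` the 1-hot code is `[0, 0, 1, 0, 0]` and the encoder outputs `[0, 1, 0]`, i.e., the number 2.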

Thus one can partition the acyclic directed graph $G_{\ll}$ corresponding to the $\ll$ relation on the $\#$-clusters into $\ll$-chains, then build for each $\ll$-chain a circuit similar to the one shown in Fig. 5, and finally construct an adder tree similar to that in Fig. 4(c), but with the bottom layer comprised of the built counters rather than half-adders. The algorithm shown in Fig. 6 does this trying to balance the resulting adder tree. $ExtractMin(Q)$ extracts and returns a pair $(c, m) \in Q$ (where $c$ is a circuit and $m \in \mathbb{N}$ is the maximum value this circuit can output) with the minimum value of $m$, and $Add(c_1, c_2)$ constructs a circuit which computes the sum of values computed by $c_1$ and $c_2$ (i.e., an adder is put ‘on top’ of $c_1$ and $c_2$). Note that $Q$ is a priority queue and can be efficiently implemented as either a binary heap or by keeping a list of circuits for each $m$.

When partitioning $G_{\ll}$ into $\ll$-chains, it is advantageous to make the number of such $\ll$-chains as small as possible, in order to reduce the number of adders in the adder tree. Thus one can formulate an optimisation problem of partitioning \(G_{\ll}\) into the smallest number of \(\ll\)-chains. This is essentially the well-known minimum vertex-disjoint path cover problem (zero-length paths comprising a single vertex are admissible).

input: \(Q\), a non-empty set of pairs \((c, m)\), where \(c\) is a circuit and \(m \in \mathbb{N}\)
output: \(c\), a circuit

while \(|Q| > 1\) do
    \((c_1, m_1) \leftarrow \text{ExtractMin}(Q)\)
    \((c_2, m_2) \leftarrow \text{ExtractMin}(Q)\)
    \(Q \leftarrow Q \cup \{(\text{Add}(c_1, c_2), m_1 + m_2)\}\)

/* now \(|Q| = 1\) */
\((c, m) \leftarrow \text{ExtractMin}(Q)\)
return \(c\)

Fig. 6. An algorithm for building a tree of adders.
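The algorithm of Fig. 6 maps naturally onto a binary heap. In the sketch below circuits are represented by nested tuples purely for illustration (a real implementation would build adder gates), and a running counter is used as a tie-breaker so that the heap never has to compare two circuits.

```python
import heapq
from itertools import count

def build_adder_tree(circuits):
    """circuits: non-empty list of (circuit, max_value) pairs, as in Fig. 6."""
    tie = count()                           # insertion order breaks ties
    heap = [(m, next(tie), c) for c, m in circuits]
    heapq.heapify(heap)
    while len(heap) > 1:
        m1, _, c1 = heapq.heappop(heap)     # the two circuits with the
        m2, _, c2 = heapq.heappop(heap)     # smallest maximum outputs
        heapq.heappush(heap, (m1 + m2, next(tie), ("add", c1, c2)))
    return heap[0][2]
```

For four one-bit inputs `a`, `b`, `c`, `d` the result is `("add", ("add", "a", "b"), ("add", "c", "d"))`, i.e., a balanced tree. Always merging the two circuits with the smallest maximum values keeps the operands of each adder at similar widths.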

This problem is NP-complete for general graphs, since checking the existence of a Hamiltonian path is equivalent to checking whether it is possible to cover the vertices of a given graph by a single vertex-disjoint path. Nevertheless, for acyclic graphs (note that \(G_{\ll}\) is acyclic) it can be reduced to the maximum matching problem on a bipartite graph, and solved in polynomial time [4]. However, one should bear in mind that \(G_{\ll}\) is given implicitly, and can be very large. (It is not uncommon to have an unfolding prefix with hundreds of thousands of events.) Therefore, using an exact algorithm for solving this problem might be either too memory demanding (if \(G_{\ll}\) is built explicitly), or too slow due to the need to work with an implicitly represented graph (checking whether there is an arc between two vertices of \(G_{\ll}\) is quite expensive in such a case, as one might have to traverse the whole prefix). Thus a fast ‘greedy’ algorithm for partitioning the set of #-clusters into \(\ll\)-chains has been designed. It is described in [6].
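For an explicitly given acyclic graph, the reduction to bipartite matching can be sketched as follows. Each vertex is split into an ‘out’ copy and an ‘in’ copy, every matched arc becomes a step of some path, and the number of paths equals the number of vertices minus the size of the matching. The code uses a simple augmenting-path matching and is an illustration of the exact method only; it is not the greedy algorithm of [6].

```python
def min_path_cover(n, arcs):
    """Cover vertices 0..n-1 of an acyclic graph by the fewest disjoint paths."""
    succ = {u: [] for u in range(n)}
    for u, v in arcs:
        succ[u].append(v)
    match_to = {}                   # v -> u: arc (u, v) is used by the cover

    def augment(u, seen):
        """Try to give u an outgoing arc, re-routing earlier choices."""
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                if v not in match_to or augment(match_to[v], seen):
                    match_to[v] = u
                    return True
        return False

    for u in range(n):
        augment(u, set())

    nxt = {u: v for v, u in match_to.items()}
    paths = []
    for start in range(n):
        if start not in match_to:   # no chosen incoming arc: a path starts here
            path = [start]
            while path[-1] in nxt:
                path.append(nxt[path[-1]])
            paths.append(path)
    return paths
```

For the diamond-shaped graph with arcs (0,1), (0,2), (1,3), (2,3) two paths are needed, e.g., `[0, 1, 3]` and `[2]`.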

4 Experimental results

The proposed method was tested with the zChaff SAT solver [8] on the popular set of deadlock checking benchmarks collected by J.C. Corbett [1]. (For obvious reasons, only examples with deadlocks from this collection were used.) All the experiments were conducted on a PC with a Pentium™ IV/2.8GHz processor and 512 MB of RAM.

The experimental results are shown in Table 1, where the meaning of the columns is as follows (from left to right): the name of the problem; the number of non-cut-off events in the prefix; the lengths of the first computed violation trace and of a shortest one; the numbers of #-clusters and \(\preceq\)-chains computed by the heuristic algorithms described in [6]; the size (the number of new variables, clauses and literals) of the translation of the counter circuit for the basic translation described in Section 2 and for the improved one described in Section 3; and the time taken by the SAT solver to compute the first violation trace and the time taken by the algorithm in Fig. 3 to compute a shortest violation trace using the basic and the improved translations of the counter.

The experiments show that in many cases the first computed violation trace was much longer than a shortest one, with the results for the Sent benchmarks being particularly impressive. This confirms that in practice computing shortest violation traces can indeed greatly facilitate the debugging process.

One can see that the number of #-clusters and $\preceq$-chains is usually quite small compared to the number of non-cut-off events in the prefix, and thus the reduction in the size of the formula is quite significant. It is possible to evaluate the maximum reduction which can be achieved by the improved translation over the basic one as follows. In the ideal case, all the events in the prefix would be in conflict with each other, and so the counter circuit can be implemented as a single $\vee$-gate. Such an implementation results in one new variable (for the gate's output), $n + 1$ clauses and $3n + 1$ literals in the corresponding CNF formula, where $n = |E \setminus E_{cut}|$. The corresponding parameters for the basic translation are given in Section 2, and the improvement ratios for new variables, clauses and literals are $(4n - 2 \log_2 n - 4)/1 \approx 4n$, $(16n - 10 \log_2 n - 16)/(n + 1) \approx 16$ and $(52n - 36 \log_2 n - 52)/(3n + 1) \approx 17\frac{1}{3}$, respectively. Thus the reduction factor for variables can grow unboundedly with $n$, whereas for clauses and literals it is bounded by $16$ and $17\frac{1}{3}$, respectively.
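
The counts for the ideal case can be checked with a small Python sketch. It assumes the usual clausal encoding of an $n$-input $\vee$-gate with DIMACS-style integer literals; the function names `or_gate_cnf` and `improvement_ratios` are hypothetical, and the numerators of the ratios are simply the expressions quoted above for the basic translation.

```python
from math import log2


def or_gate_cnf(inputs, out):
    """Clauses for out <-> (x1 or ... or xn); literals are non-zero integers."""
    clauses = [[-out] + list(inputs)]        # out implies that some input is true
    clauses += [[out, -x] for x in inputs]   # every input implies out
    return clauses


def improvement_ratios(n):
    """Ideal-case ratios (basic / improved) for variables, clauses, literals."""
    lg = log2(n)
    return (
        (4 * n - 2 * lg - 4) / 1,
        (16 * n - 10 * lg - 16) / (n + 1),
        (52 * n - 36 * lg - 52) / (3 * n + 1),
    )
```

For an 8-input gate the encoding has one long clause and eight binary ones, i.e. $n + 1 = 9$ clauses and $3n + 1 = 25$ literals, and evaluating `improvement_ratios` for growing $n$ shows the last two ratios approaching $16$ and $17\frac{1}{3}$ from below.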

The improvement ratios for the benchmarks in Table 1 are plotted in Fig. 7. One can see that for the number of new variables, the reduction ratio indeed grows with the size of the prefix (though not as fast as in the ideal case), and is

<table>
<thead>
<tr>
<th rowspan="2">Problem</th>
<th>Prefix</th>
<th colspan="2">Trace</th>
<th colspan="2">Partitions</th>
</tr>
<tr>
<th>$|E \setminus E_{cut}|$</th>
<th>1st</th>
<th>sh</th>
<th>cl</th>
<th>ch</th>
</tr>
</thead>
<tbody>
<tr>
<td>QA</td>
<td>1663</td>
<td>24</td>
<td>4</td>
<td>30</td>
<td>9</td>
</tr>
<tr>
<td>Dac(6)</td>
<td>53</td>
<td>6</td>
<td>6</td>
<td>23</td>
<td>11</td>
</tr>
<tr>
<td>Dac(9)</td>
<td>95</td>
<td>9</td>
<td>9</td>
<td>35</td>
<td>17</td>
</tr>
<tr>
<td>Dac(12)</td>
<td>146</td>
<td>12</td>
<td>12</td>
<td>47</td>
<td>23</td>
</tr>
<tr>
<td>Dac(15)</td>
<td>206</td>
<td>13</td>
<td>15</td>
<td>59</td>
<td>29</td>
</tr>
<tr>
<td>Dr(6)</td>
<td>90</td>
<td>9</td>
<td>9</td>
<td>18</td>
<td>8</td>
</tr>
<tr>
<td>Dr(9)</td>
<td>120</td>
<td>8</td>
<td>8</td>
<td>24</td>
<td>8</td>
</tr>
<tr>
<td>Dr(10)</td>
<td>190</td>
<td>10</td>
<td>10</td>
<td>30</td>
<td>10</td>
</tr>
<tr>
<td>Dr(12)</td>
<td>276</td>
<td>12</td>
<td>12</td>
<td>36</td>
<td>12</td>
</tr>
<tr>
<td>Emr(4)</td>
<td>59</td>
<td>9</td>
<td>9</td>
<td>15</td>
<td>9</td>
</tr>
<tr>
<td>Emr(5)</td>
<td>496</td>
<td>22</td>
<td>12</td>
<td>24</td>
<td>7</td>
</tr>
<tr>
<td>Emr(6)</td>
<td>2269</td>
<td>30</td>
<td>15</td>
<td>32</td>
<td>9</td>
</tr>
<tr>
<td>Emr(7)</td>
<td>9598</td>
<td>23</td>
<td>18</td>
<td>40</td>
<td>11</td>
</tr>
<tr>
<td>Hant(15)</td>
<td>101</td>
<td>25</td>
<td>20</td>
<td>20</td>
<td>25</td>
</tr>
<tr>
<td>Hant(30)</td>
<td>201</td>
<td>51</td>
<td>51</td>
<td>51</td>
<td>51</td>
</tr>
<tr>
<td>Hant(75)</td>
<td>301</td>
<td>76</td>
<td>76</td>
<td>76</td>
<td>76</td>
</tr>
<tr>
<td>Hant(150)</td>
<td>401</td>
<td>101</td>
<td>101</td>
<td>101</td>
<td>101</td>
</tr>
<tr>
<td>Key(2)</td>
<td>64</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>Key(3)</td>
<td>4057</td>
<td>53</td>
<td>43</td>
<td>223</td>
<td>41</td>
</tr>
<tr>
<td>Key(4)</td>
<td>35905</td>
<td>65</td>
<td>44</td>
<td>407</td>
<td>83</td>
</tr>
<tr>
<td>Mmgt(1)</td>
<td>381</td>
<td>6</td>
<td>6</td>
<td>11</td>
<td>11</td>
</tr>
<tr>
<td>Mmgt(2)</td>
<td>385</td>
<td>8</td>
<td>8</td>
<td>26</td>
<td>7</td>
</tr>
<tr>
<td>Mmgt(3)</td>
<td>3312</td>
<td>10</td>
<td>10</td>
<td>36</td>
<td>9</td>
</tr>
<tr>
<td>Mmgt(4)</td>
<td>29545</td>
<td>12</td>
<td>12</td>
<td>44</td>
<td>7</td>
</tr>
<tr>
<td>Sent(25)</td>
<td>176</td>
<td>34</td>
<td>3</td>
<td>40</td>
<td>3</td>
</tr>
<tr>
<td>Sent(50)</td>
<td>201</td>
<td>59</td>
<td>3</td>
<td>65</td>
<td>3</td>
</tr>
<tr>
<td>Sent(75)</td>
<td>226</td>
<td>84</td>
<td>3</td>
<td>90</td>
<td>3</td>
</tr>
<tr>
<td>Sent(100)</td>
<td>251</td>
<td>109</td>
<td>3</td>
<td>115</td>
<td>3</td>
</tr>
</tbody>
</table>

Table 1. Experimental results for deadlock checking.

between two and three orders of magnitude for large benchmarks. For clauses and literals, the improvement ratio also grows with the size of the prefix, and comes surprisingly close to the best possible ratio for large benchmarks. Moreover, it should be noted that since the improved translation uses a lot of multiple-input $\vee$-gates, the corresponding CNF formula has many clauses of length two, which makes the SAT instance easier for the solver.

The comparison of the running times of the algorithms shows that, except for one test case, it was not too time-consuming to compute a shortest violation trace. (This is probably due to the fact that only a few benchmarks are large.) Moreover, the improved approach has a clear advantage over the basic one in terms of time. The only benchmark where computing the shortest violation trace by the improved method took significantly more time than just solving the original model checking problem was Key(4). (Note that for Mmgt(4) the increase in time was quite modest, which can be explained by the fact that the first computed violation trace was already optimal and very short.) In general, however, one can expect a significant increase in time when computing the shortest violation traces, due to the following phenomenon, related to phase transition. Let $t^*$ be the length of the shortest violation trace. If $t$ is significantly larger than $t^*$, adding the constraint $Threshold_t$ to the formula will exclude only a few satisfying assignments, and the resulting formula will not be much harder for the solver than the original one. On the other hand, if $t$ is significantly smaller than $t^*$, adding $Threshold_t$ to the formula will yield an overconstrained SAT instance which usually can be quickly proven unsatisfiable. A hard situation can occur when $t$ is close to $t^*$. In such a case, if the SAT instance is satisfiable, it often has only a small number of satisfying assignments (and thus such an assignment might be difficult to find), and if it is unsatisfiable, it might be hard to show this. The last part of Section 1 discusses how the impact of this phenomenon can be alleviated in practice.
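
The role of the bound $t$ can be illustrated with a toy Python sketch. It is not the algorithm of Fig. 3: the SAT solver is replaced by brute-force enumeration, the threshold constraint is modelled as a plain upper bound on the number of true variables, and the names `find_trace` and `shortest_trace` are hypothetical. The sketch only shows how a bound is repeatedly tightened until the instance becomes unsatisfiable.

```python
from itertools import product


def find_trace(violates, n, bound):
    """Brute-force stand-in for one SAT call.

    Returns an assignment to n Boolean variables that satisfies `violates`
    and sets at most `bound` of them to True, or None if there is none.
    """
    for bits in product([True, False], repeat=n):
        if sum(bits) <= bound and violates(bits):
            return bits
    return None


def shortest_trace(violates, n):
    """Tighten the bound until no shorter satisfying assignment exists."""
    best = find_trace(violates, n, n)  # no effective bound: first trace found
    if best is None:
        return None                    # the property holds
    while True:
        shorter = find_trace(violates, n, sum(best) - 1)
        if shorter is None:
            return best                # the last bound was below t*
        best = shorter
```

In this toy setting every call costs the same, whereas with a real solver the calls whose bound is close to $t^*$ are the ones expected to be hard, as explained above.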

5 Conclusions and future work

Although the testing performed was limited in scope, one can draw some conclusions about the efficiency of the proposed approach. Computing shortest violation traces can facilitate the debugging process and save the designer a lot of time, since in many cases the first computed violation trace is much longer than a shortest one. According to the experimental results, for large problem instances the improved translation can reduce the number of new variables in the formula by two to three orders of magnitude, and achieve an almost optimal reduction in the number of clauses and literals, i.e., the length of the CNF formula corresponding to the threshold constraint was surprisingly close to that for a single multiple-input $\vee$-gate.

The possible directions for future research include using a Boolean minimiser to derive short formulae not only for half-adder and full-adder cells but also for adders with a small number of inputs, and exploiting the structure of the prefix to reduce the size of other pseudo-Boolean constraints encountered when dealing with various model checking problems.

Acknowledgements The author would like to thank Keijo Heljanko for fruitful discussions. This research was supported by an EC IST grant 511599 (ROBEX).

References