Victor Khomenko, School of Computing, University of Newcastle

Design and Verification of Asynchronous Circuits (DAVAC)

Funding bodies: The Royal Academy of Engineering and EPSRC (EP/C53400X/1)

Date: from 1st of September 2005 for five years.

Principal investigator: Dr Victor Khomenko

DAVAC project summary (based on the project proposal):

Description of the Proposed Research and its Context

Abstract

The proposed research centres on synthesis and verification of asynchronous (self-timed) circuits. It is argued that asynchrony is important for future development of methods for large-scale design, and thus new synthesis and verification techniques have to be developed. They will be aimed at combating the state space explosion problem, which is the main obstacle for efficient and robust design. This problem reflects the fact that even a relatively small system specification can (and often does) yield a very large state space, requiring memory and computational power for its management beyond the effective capability of available computers. In particular, the verification of both a circuit's data and control paths will be addressed. Relevant tools will be developed and integrated into existing asynchronous design flows.

Background

Asynchronous circuits

Asynchronous (also known as self-timed) circuits (ACs) are circuits without clocks. They have been around, in one form or another, for quite some time, and recently started to gain popularity both in industry and academic circles. The often cited advantages of ACs are:

Low power consumption In traditional synchronous designs up to 40% of power is consumed by the clock. This leads to chip overheating and short battery life. The power is often wasted on unproductive clock cycles, e.g., the clock ticks even when there is no data to process. (The latter problem can be remedied by clock gating, but this increases the complexity of the circuit and requires very careful design to avoid clock skews when there are many gates on the clock path.) ACs inherently avoid this problem, as they do not perform unneeded transitions, particularly the clock pulses, and the power is used only for processing. (And if there is no data to process, the switching activity within such a circuit stops and the circuit consumes virtually no power.) Thus ACs can provide the same or better levels of power saving compared with clock gating techniques without special effort. Power saving is cited as one of the main advantages of Sharp's DDMP asynchronous multi-media processor.
Average-case rather than worst-case performance In synchronous circuits the clock period is chosen as the worst-case delay of an operation (with some margin). Thus, if an operation is completed before the end of the clock cycle, the circuit wastes time waiting for a new clock cycle to start. This problem can be alleviated by splitting the main clock cycle into sub-cycles or by partitioning pipeline stages into equal-duration logic, but this considerably increases the complexity of the design. ACs do not encounter this problem and start a new operation as soon as the previous one is completed. Though this requires completion detection from the components of the data path, careful design allows one to obtain high-speed solutions. One of the most successful examples of an asynchronous data path is Ted Williams' iterative divider that was used in a commercial microprocessor.
Low electro-magnetic emission Asynchronous circuits have much lower level of electro-magnetic emission compared with synchronous ones. They tend to be quite because they avoid unneeded transitions and spread out the switching activity without the strict period and phase relationships that are typical of synchronous circuits (where it tends to concentrate near falling and rising edges of the clock signal). This allows one to pack ACs more tightly (complicated synchronous designs often involve several clocks working at different frequencies, which may lead to crosstalk if the clocks are located too close to each other) and reduces the need for expensive electro-magnetic interference shielding in order to meet regulatory requirements imposing limits on the the maximum energy that can be emitted at a given frequency. Reduced electro-magnetic interference was cited as a main motivation for Philips to include asynchronous microcontroller chips into commercial pagers. In addition, the absence of a regular clock pattern makes ACs advantageous for security applications (preventing the eavesdropping device from using clock pulses as the frame of reference), e.g., for devices reading credit cards.
No problems with clock skew and time closure Integration of several synchronous IP cores designed, perhaps, by different manufactures, is often a difficult problem, particularly if they use different clock frequencies. Asynchronous interfaces can be used to solve this problem, providing the `plug and play' possibility for large systems on a chip. The capability to `absorb clock skew variations inherent in tens-of-million transistor class chips' was cited as the key reason for Sun Microsystems to incorporate an asynchronous FIFO into the UltraSPARC IIIi design.
Robustness Asynchronous circuits tend to be more robust than synchronous ones because they do not encounter problems with the clock skew and the related subtle issues. Moreover, ACs are fundamentally more tolerant of process, voltage and temperature variations. However, there is a possibility to trade off this robustness for improved speed and area of the circuit, by introducing local timing assumptions about the relative delays of gates, which can be enforced by transistor sizing etc.

ACs are increasingly seen by industry as a way to overcome some of the difficulties, particularly in the applications where productivity and time-to-market is of main concern. For example, Philips, whose products are critical to these demands, is now the world leader in the exploitation of asynchronous design principles. Other microelectronic giants such as Intel, Sun, IBM and Infineon follow the trend and gradually allow some of their new products involve asynchronous parts. A smaller `market niche' companies appear to be successful in down-streaming the results of the recent investment in asynchronous design methods (e.g., Theseus Logic's Null-Convention Logic and and Fulcrum's PivotPoint).

The International Technology Roadmap for Semiconductors (ITRS 2003) cites asynchronous designs as a possible solution for some of the problems impairing contemporary large-scale circuits:

At 65nm and below, communication architectures and protocols for on-chip functional processing units will require significant change from today's approaches. As it becomes impossible to move signals across a large die within one clock cycle or in a power-effective manner, or to run control and data flow processes at the same clock rate, the likely result is a shift to asynchronous (or, globally asynchronous and locally synchronous (GALS)) design style. In such a regime, islands of self-timed functionality communicate via network-oriented protocols. This matches requirements of system-level power optimisation schemes, like the avoidance of switching large clock nets at the maximum chip frequency. (page 19) [. . . ]

Minimum clock periods scale as roughly 12 FO4 inverter delays (approximately 170 the CV/I metric in the PIDS Chapter) for high-speed designs while die sizes remain constant; across-chip communication hence requires increasingly many clock cycles. [. . . ] However, clock distribution in purely synchronous designs can account for over 40% of dynamic power and is subject to increasing stress (for example, shielding resource requirements, limits of edge rates and jitter, and parametric yield loss due to variability-induced skew). As a result, there is a clear trend to more robust and power-efficient hybridisations of synchronous and asynchronous designs. (page 21)

However, the advantages listed above do not come for free: asynchronous circuits are more difficult to design than synchronous ones. The complexity of their synthesis and verification is widely recognised as a major stumbling block in this key area of system design. One way of coping with the complexity problem is to use formal methods supported by computer aided synthesis and verification tools.

Signal Transition Graphs (STGs) is a formalism widely used for describing the behaviour of asynchronous control circuits. Typically, they are used as a specification language for the synthesis of such circuits or obtained by translation from some high-level hardware description language. STGs are a class of interpreted Petri nets, in which transitions are labelled with the names of rising and falling edges of circuit signals. Circuit synthesis based on STGs involves: (i) checking the necessary and sufficient conditions for the STG's implementability as a logic circuit; (ii) modifying, if necessary, the initial STG to make it implementable; and (iii) finding appropriate boolean covers for the next-state functions of output and internal signals and obtaining them in the form of boolean equations for the logic gates of the circuit (logic synthesis). One of the commonly used STG-based synthesis tools, Petrify, performs all of these steps automatically, after first constructing the state space of the initial STG specification. A vivid example of its use was the design of many circuits for the Amulet-3 microprocessor. Since popularity of this tool is steadily growing, it is very likely that STGs and Petri nets will increasingly be seen as an intermediate (back-end) notation for the design of large controllers.

While the state-based approach is relatively simple and well-studied, it suffers from the combinatorial state space explosion problem. That is, even a relatively small system specification may (and often does) yield a very large state space. Despite being finite, it requires memory and computational power for its management beyond the effective capability of available computers. This puts practical bounds on the size of control circuits that can be synthesised using such techniques, which are often restrictive, especially if the STG models are not constructed manually by a designer but rather generated automatically from high-level hardware descriptions. (For example, designing a control circuit with more than 20-30 signals with Petrify is often impossible.)

The state space explosion is a consequence of the fundamental fact that even the simplest verification problems are computationally very hard (e.g., deadlock detection is PSPACE-complete). However, in many practical applications one often can complete the task using various heuristics. To help cope with the state space explosion problem a number of techniques have been proposed:

Direct mapping The STG specifying a circuit is element-by-element mapped onto a circuit mimicking its behaviour. This altogether avoids building the state space of the circuit specification.
Restriction of the specification class The class of Petri nets underlying the STG is structurally restricted (e.g., to marked graphs or free-choice nets) so that it can be handled by specialised efficient algorithms.
Condensed representation of the state space The state space is represented in a condensed way, e.g., using Binary Decision Diagrams or Petri net unfoldings.

The first technique, although relatively simple and efficient, often yields low-quality circuits, with large area and low performance, which are often inferior to traditional synchronous circuits. The second technique also has a number of disadvantages; in particular, the restricted subclasses do not allow one to express some commonly encountered circuit operations, e.g., marked graphs cannot express choices, and free-choice nets cannot naturally express controlled choices and arbitration.

The last technique in the form of Binary Decision Diagrams (BDDs) has for long been employed in the areas of model checking and circuit synthesis. However, although BDDs do provide significant compression of the state space, they are still prone to state space explosion. Our experiments have shown that the BDD-based computation of the state space of even relatively small STGs can easily overflow the available memory. Thus it makes sense to try other compression techniques, which exploit the features commonly encountered in practical STGs.

One of the most promising techniques for compressing state spaces is based on McMillan's unfoldings, where the entire state space of a circuit is represented implicitly, using a finite acyclic Petri net - called unfolding - representing system's actions and local states (and the global states are represented as maximal sets of mutually concurrent local states). According to our experiments, such a representation is often much more efficient than that based on BDDs. This can be explained by the fact that practical STGs usually contain a lot of concurrency but relatively few choices, which is the ideal case for unfoldings. In fact, in many of the performed experiments the built unfoldings were not much bigger then the STGs themselves. As a result, unfolding-based methods had clear advantage both in terms of memory usage and running time.

STG unfoldings

There are two basic advantages of the unfolding approach when compared with other state space compression techniques. First, many instances of practically relevant problems can be solved much more efficiently than by using, for example, exhaustive exploration of the state space or BDDs. Second, by explicitly representing causality, concurrency and conflicts, unfoldings yield a direct visual representation of complex concurrent behaviours. (This is in a sharp contrast with BDDs and explicit representations of state graphs: the former are completely unintelligible, and the latter are hard to visualise due to their large sizes and the tendency to obscure causal relationships and concurrency between the events.)

This is particularly useful for the debugging and development process, especially at its early stages. In the area of asynchronous VLSI design, where STGs are rapidly gaining popularity as system behaviour notation, the application of unfoldings is also very promising and is being intensively researched. In particular, unfoldings have already demonstrated their potential for model visualisation and effective detection and resolution of state encoding conflicts, as well as for deriving Boolean equations implementing the circuit's signals in complex-gate synthesis.

Since its introduction, the unfolding-based approach has undergone several improvements and extensions, and is steadily gaining more power and versatility to deal with ever more complex and varied applications. As a result, it has reached a point where one can make a concerted effort to realise its full potential, and this proposal identifies and addresses a number of areas where unfolding-based verification and synthesis has not yet been developed at all, or not to a satisfactory degree. In particular, our resent research has shown that many verification and synthesis problems for unfoldings can be formulated in terms of Boolean satisfiability (SAT) and very efficiently dealt with by existing SAT solvers.

Project goal and aims

The proposed work will contribute to a long term goal of our research, which is the development of techniques and methods for the design and verification of ACs based on structures representing causality and concurrency in system's behaviour, rather than on the global state-space view of it. And, as a consequence, to make the unfoldings as usable as the state graphs. In concrete terms, the project's goal is to develop new theories and algorithms based on the unfolding approach. The expected outcomes will allow synthesis of ACs and verification of their key behavioural properties at early stages of their development, with greater efficiency than what could be achieved with state traversal techniques. The planned tools will also provide the designer with compact and informative feedback about defects in the circuit specification, enabling its efficient debugging. The specific objectives of the project are (in order of priority):

(a) To develop efficient synthesis techniques for ACs based on unfoldings.
(b) To develop efficient verification techniques to deal with data-intensive ACs.
(c) To provide a suite of algorithms for the process of automatic synthesis and verification of ACs.
(d) To provide links to existing Hardware Description Languages (HDLs).
(e) To conduct extensive case studies and test experiments in the area of VLSI circuits.

Victor.Khomenko@ncl.ac.uk