Victor Khomenko, School of Computing, University of Newcastle

VERification-Driven Asynchronous Design (VERDAD)

Funding bodies: EPSRC (EP/G037809/1)

Date: from 01/08/2009 to 31/10/2012.

Principal investigator: Dr Victor Khomenko

Project summary (based on the project proposal)

Abstract

This project redefines conventional asynchronous VLSI tool flows, by incorporating verification into the heart of the design process. Assertion-based verification is used to drive the synthesis procedure by analysing the behaviour of circuits to prove properties which may be exploited for optimisation. Using verification as part of the synthesis procedure represents a major paradigm shift from existing approaches where the validity of optimisations is determined statically and verification is performed after synthesis. Combining synthesis and verification will provide practical and theoretical advances in both areas extending beyond the focus of the project domain. Results of the project will be disseminated into the VLSI community in the form of tutorials and workshops with the aim of promoting asynchronous circuits as an effective solution to the problem of process variation.

Background

Verification is now a vital part of synchronous VLSI design flows. All the major EDA vendors provide assertion-based verification tools, to provide functional verification, and logic equivalence checkers, which perform structural verification by ensuring that the synthesis process maintains the original circuit specification. The formal methods community reignited the interest in asynchronous circuits, by providing a theoretical basis for reasoning about concurrency, modularity, traces, and verification. Engineers subsequently built on these foundations and are now able to generate significant circuits, both research and commercial. It is now timely for the maturing engineering and theoretical work to reconverge.

Applying optimisations to asynchronous circuits is often behaviour-changing, requiring detailed knowledge of the operation of the circuit and its environment and the effect of the optimisation on it. By incorporating verification into the synthesis procedure, the behavioural properties necessary for an optimisation to be valid may be proved, allowing a much wider range of optimisations to be applied than is currently possible. The sound formal foundations of asynchronous synthesis are likely to make exploitation of verification techniques more natural and fruitful than with clocked circuits.

This proposal is directly related to the GC3 “Moore for Less” Grand Challenge in Microelectronic Design signposted by EPSRC.

Industrial Context

Process variation is the most significant challenge currently facing the VLSI community. Parameter variations for transistors across a die may be as much as 45% in geometry and 15% in threshold voltage for 70nm technologies, in smaller geometries such variations may occur between adjacent transistors. Conventional design methodologies struggle to cope with highly variable technologies as assumptions about the relative speed of transistors are no longer valid. Typically, synchronous designs are clocked at 50% of their theoretical potential and in smaller geometries this may decrease to 25%. Asynchronous circuit styles are adaptable to process variation and are increasingly seen as a possible solution to global clock constraints. The ITRS report on Design predicts that 22% of designs will be driven by “handshake clocking” (i.e. asynchronous) in 2013, and this percentage will rise to 40% in 2020. Companies such as ARM, Boeing, Epson, Intel, Infineon, AMD, MBDA, Philips and Sun are all either making products containing asynchronous technology or have active research in the area. Start-up companies exploiting asynchronous technology include Fulcrum Microsystems, Handshake Solutions, Theseus Logic, Silistix and Situs Logic.

However, despite these advantages, industry take up, outside of these pioneering companies, is low. Reasons for the lack of interest include a lack of adequate EDA tools and the difficulty in designing efficient asynchronous systems. Achieving substantial performance increases over comparable synchronous circuits is very difficult without experience and non-standard semi-custom design flows. Haste is a commercial, and successful, EDA asynchronous synthesis tool, but is targeted at moderate performance, low power, devices. Attaining competitive performance is a sine qua non for widespread industrial acceptance.

Current Research Activity

A survey of asynchronous EDA tools commissioned for the European Commission by the ACiD working group reveals several tools capable of synthesising large-scale asynchronous systems. These tools may be categorised thus:

Handshake Circuits – The compilation mechanism of both the Haste and Balsa systems is based on van Berkel’s Handshake Circuits. A Handshake Circuit is a network of Handshake Components connected by channels. Designs are specified in a CSP-based language, which is mapped, using syntax-directed translation, into a Handshake Circuit. Each component in the graph corresponds to a language primitive of the original description. The use of handshake circuits abstracts physical implementation details away from the original description. Each handshake circuit may be implemented in a number of different circuit styles. Handshake circuits are very robust allowing the synthesis of complex circuits, e.g. processors. However, the syntax-directed nature of the compilation mechanism means that high performance is difficult to achieve without subsequent optimisation.
Communicating Hardware Processes – Several research groups have been influenced by Martin's work on process decomposition using CHP – a CSP subset for hardware specification. Algorithmic descriptions are decomposed into dynamic pipeline structures that are Quasi Delay-Insensitive, implemented using semi-custom (nonstandard) cells. CAST (Caltech) and TAST (TIMA) can synthesise large systems. CHP has been used at Caltech and Fulcrum Microsystems to develop several high-performance ICs. Process decomposition is complex and can only be automated for designs that uphold several complex properties; the onus is on the designer to understand and ensure such properties are satisfied.
Desynchronisation – In this approach, designs are specified using conventional, synchronous, HDLs and then converted into asynchronous designs. However, as circuits are specified using conventional methods, many of the advantages of employing asynchronous circuits, such as fine-grained concurrency, cannot be exploited. Therefore, desynchronised circuits struggle to match the performance of their synchronous implementations.

The theoretical underpinnings of asynchronous circuits make them a target for formal analysis, and a number of verification tools have been developed. The importance of timing assumptions has led to several tools that analyse the behaviour of circuit specifications to ensure they satisfy the constraints determined by the delay model classification. The finegrained concurrency inherent in asynchronous circuits makes modelling large circuits difficult, because of the large number of possible states, and most of the existing verification tools are used to model small to medium-sized control circuits.

There have been several attempts to create tools to integrate with the system-level asynchronous synthesis systems. The VDS group at TIMA have integrated verification into the TAST tool flow by translating CHP into the IF format and using the CADP toolkit for verification, however the size of circuits that may be verified is limited. Subsequent work at INRIA also uses CADP, but translates CHP into LOTOS, instead of IF, which increases the size of circuits that may be verified.Work at Birmingham University translates Balsa descriptions into CSP and uses FDR2 for verification. While these techniques present significant advances in attempts to incorporate verification into asynchronous design flows, all of them are post-hoc and require the designer to have significant knowledge of the verification tool being used; if non-standard properties are required, the designer must be able to express them using temporal logic, restricting their widespread adoption by the general engineering community. Furthermore, the size of circuits that may be verified is still severely limited by the state-space explosion due to concurrency, which remains a major problem in the verification of asynchronous systems.

Description of Proposed Research

Project Objectives

The focus of the proposed research is to use verification to drive synthesis and optimisation algorithms for asynchronous circuits. This work builds on existing research undertaken by the applicants into asynchronous system synthesis and optimisation using Balsa and verification tools based on Petri net unfoldings. By providing practical examples for verification tools and formal analysis for synthesis techniques, the project will advance current knowledge in both fields beyond the confines of the immediate project proposal.

In order to develop a verification system capable of handling large asynchronous circuits, much of the research will be concerned with the internals of the model checker, in particular, with the development of efficient verification algorithms oriented at asynchronous circuits and based on a novel condensed representation of state space, called merged processes. A framework will be created which will allow optimisation routines to interact with a model checker, using state-based and temporal logic predicates, to ascertain whether the causal relationships between channels will permit behaviour changing optimisations to be applied. This will allow such optimisations to be applied safely in areas where their validity cannot be determined by static analysis.

The measurable objectives of the work are:

A verification-driven asynchronous system synthesis flow.
Data structures and verification algorithms for condensed representation of state space.
Assertion-based synthesis and optimisation techniques for all stages of design flow.
Tutorials and best practice guides for designing efficient asynchronous systems.

Background

Optimisation

The syntax-directed nature of the handshake circuit compilation mechanism means that high performance is difficult to achieve without subsequent optimisation. Several attempts to optimise handshake circuits have been attempted, mainly in the form of “peep-hole optimisation” or control resynthesis, where the behaviour of clusters of components are modelled, typically using either STGs or burst-mode specifications, and the control circuit is re-synthesised. The external behaviour of the resynthesised controllers is preserved and so the operation of the circuit remains unchanged.

While resynthesis decreases the latency of control and, in control-dominated circuits, has been shown to cause a significant improvement in performance, in large systems, such as microprocessors, the effect of the optimisations is limited, as the latency of control circuitry has little bearing on the overall performance of the circuit. In such circuits however, the interaction of the control and the datapath has a great effect on the performance of the circuit. Plana et. al. demonstrated that significant improvements (x2) in the performance of large scale asynchronous circuits could be achieved by altering the specifications of control circuits rather than just their construction. In these optimisations, the original behaviour is not preserved and so they may only be applied in circumstances where changing the order of events does not cause erroneous behaviour in the circuit. For example, the introduction of eager semantics allows the control for datapath operations to be initiated before the arrival of data. This decreases the latency of many datapath operations by allowing steering components to be configured in parallel with sourcing the operands. However, decoupling the control and the data can be unsafe if the control can initiate operations that are not dependent on the data.

Further work, using a variety of techniques (not all directly relevant to this proposal), has achieved an overall x9 performance increase. Many of the techniques are known to not be generally safe and although in some cases the conditions for safe use of the optimisations can be determined within the compilation process, this is not always so, and presently some optimisations can only be performed if the designer is confident that certain properties are maintained. Further optimisations of Handshake Circuits have been suggested by Taylor and Hansen. These procedures involve redefining the structure of the handshake graph, by changing the algorithmic description, to radically alter the behaviour of the handshake circuit, for example to introduce fine-grained pipelining. Both require significant input from the circuit designer to fully exploit data-driven properties. In the former designs must be rewritten manually. In the latter the transformation is automated but the designer must input the details of external relationships between the ports of each procedure, which may be extremely complex.

Handshake Reshuffling and Process Decomposition are behaviour changing optimisations that form the basis of Martin’s CHP synthesis. Martin’s handshake reshuffling is equivalent to some of the optimisations described by Plana and cannot be guaranteed to work for arbitrary circuits. Both techniques employ reshufflings only in common situations where they are known to be valid, such as pipeline latch templates. Many of the structure changing optimisations of Taylor and Hansen are similar to Martin’s process decomposition. However, Martin’s work is based on modelling circuits as closed systems, where the environment is specified along with the circuit. In large circuits with many concurrent processes, determining the behaviour of the environment around the process being optimised is non-trivial. It is unclear to what extent Martin’s decompositions are automated and how much of the synthesis method relies on experienced designers creating pipeline style descriptions that allow the environment of processes to be easily modelled. The proposed research will develop a system that can analyse complex environments allowing optimisations to be implemented more easily.

The optimisation of circuits using information about its environment was the basis of Kapoor’s work on Restrictive Environments where the circuit along with the environment was modelled together, using DI-algebra, and the circuit synthesised using Petrify. Kapoor’s work used the environment to reduce the size of control circuits; in our research, the effect of reorganisation upon the performance and behaviour of large circuits will be studied in order to learn more about the trade-offs in the optimisation space.

Verification

The focus of the verification research will be developing a system capable of representing the state-spaces of large asynchronous circuits effectively. This representation will be related to Petri net unfoldings. Unfolding a concurrent system (represented as a Petri net or a set of communicating automata) into an acyclic (behaviourally equivalent) Petri net resembles the unwinding of a sequential transition system into its execution tree. Complete unfoldings are usually infinite, and so are unsuitable for automatic verification. Therefore, an essential feature of the unfolding approach is the use of cut-off events beyond which the unfolding starts to repeat itself, and so can be truncated without loss of information, yielding a complete prefix.

Complete prefixes are often exponentially smaller than the corresponding state graphs, especially for highly concurrent systems, because they represent concurrency directly rather than by multidimensional “diamonds” in state graphs. For example, if the original Petri net consists of 100 transitions which can fire once in parallel, the state graph will be a 100-dimensional hypercube with 2¹⁰⁰ vertices, whereas the complete prefix will coincide with the net itself. As circuits usually exhibit a lot of concurrency, but have rather few choice points, their unfolding prefixes are often exponentially smaller than the corresponding state graphs; in fact, in many cases they are just slightly bigger than the original Petri nets themselves. Therefore, unfolding prefixes are well-suited for alleviating the state space explosion problem. Unfolding based verification can be divided into two phases:

Generating a complete prefix.
Checking whether a given behavioural property is satisfied by the complete prefix.

Model checking of concurrent reactive systems is intrinsically hard, and exhibits a trade-off between the compactness of the representation of the system and resources it takes to verify behavioural properties. For example, the deadlock detection problem is PSPACE-complete for a compact (bounded) Petri net representation, but polynomial for transition system representation. However, transition system representation is often exponentially larger than the Petri net, which can make the algorithm impractical. A complete prefix offers a compromise: its size is in between those of the original specification and the corresponding transition system, while typical verification problems are NP-complete. Experimental results demonstrate that unfoldings can be very efficient for several crucial verification problems and application areas, including telecommunication systems, VLSI circuits and GALS systems. They work especially well when the system exhibits a high degree of concurrency. An important aspect of net unfoldings is their explicit representation of the fundamental aspects of concurrent behaviour: concurrency, causality and conflicts. This enables a shift from state-based reasoning and methods (typically expressed in terms of a global state transition diagram, and where the original structure of the system is completely lost) to causal partial order reasoning and methods (unfolding reflects precisely the causality and conflict structures present in the system being analysed).

Victor.Khomenko@ncl.ac.uk