Using ITL and Tempura for Large Scale Specification and Simulation

Antonio Cau \(^{\dagger}\)  Hussein Zedan \(^{\dagger}\)  Nick Coleman  Ben Moszkowski

School of Computing  and Mathematical Sciences  Liverpool John Moores University  Liverpool L3 3AF, UK

Department of Electrical  and Electronic Engineering  University of Newcastle upon Tyne  Newcastle NE1 7RU, UK

Abstract

ITL and Tempura are used for respectively the formal specification and simulation of a large scale system, namely the general purpose multi-threaded data-flow processor EP/3. This paper shows that this processor can be specified concisely within ITL and simulated with Tempura. But it also discusses some problems encountered during the specification and simulation, and indicates what should be added to solve those problems.

1 Introduction

There has been a considerable debate about the use and relevance of formal methods in the development of computing systems (both software and hardware). Some claim that formal methods offer a complete solution to the problems encountered in such development. Others put the claim that formal methods are of little or no use, or that their utility is severely hindered by their cost.

However, it would be over-enthusiastic to claim that a formal development technique could provide a panacea for the problem involved in developing useful computer systems. Indeed, there are too many aspects which are not amenable to formal representation or reasoning. For example, it is hard to envisage a way in which the process of requirements elicitation could be totally formalised; it is true that some requirements can be denoted using a formal notation, and possible inconsistencies found, but the completeness of the requirements (with respect to client's intentions) cannot be formally proven.

As digital devices are deployed in a growing number of high integrity applications there is increased anxiety about the dependability of such systems. Furthermore, the rapid growth of the VLSI market has meant that manufacturers are under pressure to deliver increasingly complex, reliable and cost-effective products within a short time scale. Formal techniques have clearly had an impact on the design of safety critical systems, and have been shown to be commercially advantageous as demonstrated in the production of Inmos IMS T800 floating point unit [8]. We believe that using formal techniques in the production of systems should be viewed as a means of delivering correctness (with respect to requirements) and hence enhanced quality.

One such formal technique is ITL which has been developed over a range of years and has been applied to a various number of systems. However it has not been applied to the design of large scale hardware systems. The aim of this paper is an analysis and discussion of the benefits of the use of ITL and its associated executable language Tempura [10] for such large scale hardware systems. Our chosen hardware is a general-purpose multithreaded dataflow computer known as EP/3 (Event Processor/3) [3]. EP/3 is intended primarily as a vehicle for exploratory research in high-performance computer structures.

1.1 Related work

There have been a plethora of formalisms proposed and used in conjunction with digital system specification and verification. A complete overview of these formalisms is outside the scope of the present paper, however, in this section we will only highlight some of the well known formalism classes used in this field.

A number of hardware specific calculi have been developed and used. Some of these are supported by theorem proving or other checking tools. Barrow's seminal work on VERIFY [1] (a Prolog program for checking the correctness of finite state machines) and Milne's Circle (a calculus based on CCS) [9] for specifying and analysing circuit behaviour.

General purpose logics have been proposed. The
Boyer-Moore theorem prover is a notable example [2]. Recently, CLInc has demonstrated that Boyer-Moore can be used successfully for non-trivial hardware verification cases. Higher order logic was first used by Hanna and Daeche who developed the VERITAS theorem proving system [6]. HOL [5] is also a machine-oriented formulation of higher logic based on Church’s lambda calculus.

Algebraic specification languages have also been used; a notable example is the use of OBJ specifications with hardware. OBJ-T [13] version of the language was used to specify and test hardware building blocks. UMIST OBJ [4] has also been used to specify simple devices with theorem proving support from REVE [7].

Computer hardware description languages were the first textual descriptive techniques to be used in the design of hardware. Examples include ELLA [12] and VHDL.

Many ideas originating from reactive systems theory including temporal logics, are relevant to the specification and verification of synchronous and asynchronous digital systems. Although propositional temporal logic can be used for reasoning about hardware, interval temporal logic is particularly interesting for hardware verification.

To our best knowledge, there have not been any serious attempts to specify, verify and design a large-scale hardware system in interval temporal logic.

1.2 Paper Organisation

In Section 2 we will describe the EP/3 processor architecture and in section 3 we will present an overview of ITL and its associated programming language Tempura. The specification and simulation of EP/3 in ITL are discussed in section 4. We give our evaluations in section 5 and indicate future work.

2 The EP/3 Processor

The system used in this case-study is the Event Processor [3]. Its processing elements are based on the multithreaded principle, in which a main memory is available for explicit storage of data (known as static operands), in addition to the normal circulation of tokens (flow operands). The processors are designed for individual high speed (150MHz with \( \approx 1 \) cycle instruction execution) with the facility for assembly into a small multiprocessor array with shared memory. The latter aims at almost 100\% load-balancing efficiency by dynamic scheduling at instruction thread level.

For the purpose of this case-study, a single processing element will be described, as shown in figure 1. It consists of a circular pipeline in which the 8 units communicate via highways\(^1\). Each unit will be described below.

1: The Instruction Issue unit (Inst-Iss) receives instructions from the Cache unit. It will decode these instructions and issue these decoded instructions onto the MI field of the memory highway (Mhwy). All instructions consists of a Command Field and a Destination Field. In the case of multi-target instructions, the Inst-Iss unit will issue the instruction once for each destination, using separate cycle for each. An Interlock signal from the control highway (Zhwy) will prevent any further input into the Inst-Iss unit during this time.

2: The Alu1 unit receives executable parcels of work from the Stack unit via the processor highway (Phwy). These parcels consist of a command field PI and data operands Pda and Pdb. If the PI is a logical or arithmetic command then the Alu1 unit will calculate the low-order of the result within one cycle and send this to the Alu2 unit together with the command otherwise it will complete the high-order

\(^1\)Note: the Zhwy in figure 1 is omitted because it is connected to each unit.
of an arithmetic operation from the Alu unit and send the result onto the Mda field of the Mhw.

4: The Memory Address unit (Mem-Addr) receives from the Mhw the instruction and its flow operand. The Mem-Addr unit calculates the effective address of the static operand. This address together with the instruction and the flow operand are sent to the Memory unit.

5: The Memory unit will fetch the contents of the address received from the Mem-Addr unit and will issue them (Sdb) together with the instruction (SI) and the flow operand (Sda) onto the stack highway (Shwy).

6: The Stack unit receives complete executable parcels of work destined for the Alu units via the Phwy, but in order to buffer variations in the rate of flow between the Mhw and Phwy the Stack unit is interposed. If the Interlock signal from the Zhwy is present it will store the parcel, and wait until some future time when the Phwy is clear but nothing is present to store. If the Interlock signal from the Zhwy is present it will store the parcel, and wait until some future time when the Phwy is clear but nothing is present on the Shwy. It will then issue any stored instruction parcels. In the absence of an Interlock signal, the Stack unit will play a special role during cache and I/O operations (see the description of the I/O unit).

7: The Cache unit receives via the Phwy instructions PI. The destination field of PI is used by the Cache unit to fetch the corresponding target instruction which is sent to the Inst-Iss unit. When a target instruction is requested which is not present in the Cache unit, a stream of memory accesses must be made to fill the relevant cache block. This loading function will be performed by the I/O unit.

8: The I/O unit is an interface to an external I/O processor IOP. It accesses the Memory unit by issuing dummy instructions, together with a flow operand, onto the Mhw. These cause the Memory unit either to read out a word onto Sdb, or to write the flow operand. In the case of a read, the data issued by the Memory unit enters the Stack unit. A second function performed by the I/O unit is related to cache loading. When a target instruction is requested which is not present in the Cache unit, a stream of memory accesses must be made to fill the relevant cache block. Since the I/O unit already contains hardware for generating memory accesses, it is convenient to use the unit for this purpose. The PI field is therefore carried into the I/O unit, which contains the cache tag memory and address comparator. When a cache miss is detected by the I/O unit, the Interlock signal is asserted and several consecutive memory accesses are made onto the Mhw. The requested data is passed out of the Memory unit into the Stack unit, as before. I/O and cache operations are identified by a special tag on the Mhw and Shwy; this causes the Stack unit to ignore any such data.

I/O transfers and cache loading each involve serial operations, and are somewhat too complex to handle with hardwired control. The I/O is therefore microcoded.

3 Interval Temporal Logic and Tempura

This section describes the syntax and informal semantics of the Interval Temporal Logic (ITL) and gives the syntax of the executable part of ITL, i.e., the Tempura language. For a more succinct exposition and the formal semantics see [10].

An interval is considered to be a finite sequence of states, where a state is a mapping from variables to their values. The length of an interval is equal to one less than the number of states in the interval (i.e., a one state interval has length 0).

The syntax of ITL is defined in Table 1 where a is a static variable (doesn’t change within an interval), A is a state variable (can change within an interval), v a static or state variable, g a function symbol, p a predicate symbol.

Table 1: Syntax of ITL

<table>
<thead>
<tr>
<th>Expressions</th>
<th>Formulas</th>
</tr>
</thead>
<tbody>
<tr>
<td>exp ::= a</td>
<td>f ::= p(exp1, ..., expn)</td>
</tr>
<tr>
<td>A</td>
<td>f1 ∧ f2</td>
</tr>
<tr>
<td>g(exp1, ..., expn)</td>
<td>f1 : f2</td>
</tr>
</tbody>
</table>

The informal semantics of the most interesting constructs are as follows:

- \( \mu a : f \): the value of \( a \) such that \( f \) holds.
- \( \forall v · f \): for all \( v \) such that \( f \) holds.
- \( \text{skip} \): unit interval (length 1).
- \( f1 : f2 \): holds if the interval can be decomposed ("chopped") into a prefix and suffix interval, such that \( f1 \) holds over the prefix and \( f2 \) over the suffix.
- \( f* \): holds if the interval is decomposable into a finite number of intervals such that for each of them \( f \) holds.

Frequently used abbreviations are listed in table 2. Tempura is an executable subset of ITL, its syntax resembles that of ITL. It has as data-structures
Table 2: Frequently used abbreviations

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>$f_1 \sqcap f_2$</td>
<td>$\equiv \neg (f_1 \land \neg f_2)$</td>
</tr>
<tr>
<td>$f_1 \lor f_2$</td>
<td>$\equiv \neg (p_1 \land \neg p_2)$</td>
</tr>
<tr>
<td>$f_1 \equiv f_2$</td>
<td>$\equiv (f_1 \sqcap f_2) \land (f_1 \sqcup f_2)$</td>
</tr>
<tr>
<td>$\exists v \cdot f$</td>
<td>$\equiv \forall v \cdot \neg f$</td>
</tr>
<tr>
<td>$\bigcirc f$</td>
<td>$\equiv \text{skip} \cdot f$ (next $f$)</td>
</tr>
<tr>
<td>$\bigtriangledown f$</td>
<td>$\equiv \text{true} \cdot f$ (sometimes $f$)</td>
</tr>
<tr>
<td>$\land f$</td>
<td>$\equiv \neg \bigcirc \neg f$ (always $f$)</td>
</tr>
<tr>
<td>$\bigtriangledown f$</td>
<td>$\equiv \neg \bigtriangledown f$ (weak next $f$)</td>
</tr>
</tbody>
</table>

if $f_0$ then $f_1$ else $f_2$ $\equiv (f_0 \land f_1) \lor (\neg f_0 \land f_2)$

$A := exp$ $\equiv \Box A = exp$

more $\equiv \circ true$ (non-empty interval)

empty $\equiv \neg more$ (empty interval)

fin $f$ $\equiv \bigcirc (\land \neg f)$

while $f_0$ do $f_1$ $\equiv (f_0 \land f_1)^* \land fin \neg f_0$

repeat $f_0$ until $f_1$ $\equiv f_0$ (while $\neg f_0$ do $f_0$)

Integers and booleans and the list construct to build more complex ones. The standard operations on expressions are available like $+, -, \ast, /, \text{div}, \text{mod}, =, >, \text{or}, \text{and}, \ldots$. The basic statements are as follows with $b$ a formula without temporal operators (with the corresponding ITL construct in parentheses behind it): $f_1$ and $f_2$ ($f_1 \land f_2$), $A := exp$ ($A := exp$), more ($more$), empty ($empty$), sometimes ($\bigcirc$), always ($\Box$), true ($true$), false ($false$), if $b$ then $f_1$ else $f_2$ (if $b$ then $f_1$ else $f_2$), while $b$ do $f$ (while $b$ do $f$), repeat $b$ until $f$ (repeat $b$ until $f$). See [10] for more statements. With the define name$(p_1, \ldots, p_n) = f$ and define name$(p_1, \ldots, p_n) = exp$ one can define “procedures” and functions (recursive definitions are allowed).

4 The Specification and Simulation of the EP/3

This section describes some parts of the specification of the EP/3 in ITL. The whole specification consists of a temporal formula of about 3400 lines, so this won’t be given.

4.1 The Specification

The description of the processor from which we started to write the formal specification, consisted of what is given in section 2 plus a Pascal program that simulates the processor. The first formal specification was therefore of sequential nature, i.e., the units of the processor were sequentially composed. This however was not a faithful specification of the processor because each unit should work in parallel. Luckily the specification was such that the transformation into a “parallel” version was not so difficult. Only the interlock signal caused some difficulties. The specification of the processor is a big ITL formula of about 3400 lines. The general structure of the formula is as follows:

\[
\begin{align*}
\text{epsim}(IO) & \equiv \exists \text{Stopped}, \text{Zlocka}, \text{Zlockb}, \text{Zvalid}, \text{Zioreq}, \\
& \text{Phwy, Ishwy, Mhwy, Shwy} : ( \\
& \text{init() } \land \\
& \text{repeat(} \\
& \text{skip } \\
& \text{cache(Phwy, Zlockb } \lor \text{ Zlocka, Ishwy) } \land \\
& \text{instis(Ishwy, Zlocka, Zvalid, Zioreq, } \\
& \text{Mhwy, Zlockb, Stopped) } \land \\
& \text{alu(Phwy, Zlockb } \lor \text{ Zlocka, Zioreq, } \\
& \text{Mhwy, Zvalid) } \land \\
& \text{io(IO, Phwy, Zvalid, Zlockb, } \\
& \text{Mhwy, Zioreq, Zlocka) } \land \\
& \text{mem(Mhwy, Shwy) } \land \\
& \text{stk(Shwy, Zlockb } \lor \text{ Zlocka, Phwy) } \\
& \text{until Stopped })
\end{align*}
\]

The init() formula initializes the values of the variables. The repeat (S) until Stopped repeats “executing” the S formula until Stopped is true which is the case if an stop instruction is executed by the processor. The S formula is an $\land$ (and) formula of 7 sub-formulas which means that these sub-formulas are “executed” in parallel.

- The skip formula describes that we use an interval of two states namely the state of the processor before and after each clock cycle.
- The cache(Phwy, Zlockb $\lor$ Zlocka, Ishwy) formula describes the behavior of the Cache unit. It has as inputs the processor highway Phwy and the interlock signal which consists of the logical or of respectively the lock signal Zlocka issued by the I/O unit and the lock signal Zlockb issued by the Inst-Iss unit. This means that both units can lock the pipeline. The Cache outputs on the instruction highway Ishwy.
- The instis(Ishwy, Zlocka, Zvalid, Zioreq, Mhwy, Zlockb, Stopped) formula describes the behavior of the Inst-Iss unit. It has as inputs the instruction
highway $\text{Mhwy}$ and the lock signal $\text{Zlocka}$ from the I/O unit. It has furthermore as input the signal $\text{Zvalid}$ from the Alu unit indicating if data is valid, and the signal $\text{Zioreq}$ from the I/O unit indicating if it is allowed to output on the memory highway $\text{Mhwy}$. It issues also the lock signal $\text{Zlockb}$ for interlocking. If the stop instruction is issued the $\text{Stopped}$ signal will be on.

- The $\text{alu}(\text{Phwy}, Z\text{lockb} \lor Z\text{locka}, Z\text{ioreq}, M\text{hwy}, Z\text{valid})$ formula describes the behavior of the Alu1 and Alu2 unit. It is defined as

$$\exists \text{alu2in} \cdot (\text{alu1}(\text{Phwy}, Z\text{lockb} \lor Z\text{locka}, \text{Alu2in}) \land \text{alu2}(\text{Alu2in}, Z\text{lockb} \lor Z\text{locka}, Z\text{ioreq}, M\text{hwy}, Z\text{valid}) \).$$

So the Alu1 unit has as inputs the processor highway $\text{Phwy}$ and the interlock signal $\text{Zlockb} \lor \text{Zlocka}$. It outputs $\text{Alu2in}$ to the Alu2 unit.\footnote{The current specification of the Alu units is such that Alu1 computes the arithmetic and logical operations as opposed to the informal specification of section 2} The Alu2 unit has as further inputs the interlock signal $\text{Zlockb} \lor \text{Zlocka}$ and the signal $\text{Zioreq}$ from the I/O unit indicating if it is allowed to output on the memory highway $\text{Mhwy}$. It outputs furthermore the signal $\text{Zvalid}$ indicating if the computed data is valid.

- The $\text{io}(\text{IO}, \text{Phwy}, \text{Zvalid}, \text{Zlockb}, M\text{hwy}, Z\text{ioreq}, Z\text{locka})$ formula describes the behavior of the I/O unit. It has as inputs the stream $\text{IO}$ from the external I/O processor IOP, and the processor highway $\text{Phwy}$, and the signal $\text{Zvalid}$ from the Alu unit, and the lock signal $\text{Zlockb}$ from the Inst-Iss unit. It outputs on the memory highway $\text{Mhwy}$ and it generates the signal $\text{Zioreq}$ during this output preventing that the Alu and the Inst-Iss unit generate output on the $\text{Mhwy}$. It also generates the lock signal $\text{Zlocka}$ for interlock.

- The $\text{mem}(\text{Mhwy}, \text{Shwy})$ formula describes the behavior of the Mem-Addr and Memory unit. It is defined as

$$\exists \text{mem2in} \cdot (\text{mem1}(\text{Mhwy}, \text{Mem2in}) \land \text{mem2}(\text{Mem2in}, \text{Shwy}) \).$$

$\text{mem1}(\text{Mhwy}, \text{Mem2in})$ describes the Mem-Addr unit and has as input the memory highway $\text{Mhwy}$ and as output $\text{Mem2in}$ to the Memory unit. The Memory unit is described by the $\text{mem2}(\text{Mem2in}, \text{Shwy})$ formula and has as output the stack highway $\text{Shwy}$.

- The $\text{stk}(\text{Shwy}, Z\text{lockb} \lor Z\text{locka}, \text{Phwy})$ formula describes the behavior of the Stack unit. It has as inputs the stack highway $\text{Shwy}$ and the interlock signal $\text{Zlockb} \lor \text{Zlocka}$ and it outputs on the processor highway $\text{Phwy}$.

### 4.2 The Simulation

We will simulate the execution of a small machine code program on the processor. This program consists of two threads running in parallel on the processor, one thread subtracts from 0 the value 31 and the other thread adds 31 to 0. The machine code program is as follows:

1. QS F1, Z / X2 / X1
2. X1 S F2, F1
3. WS F2, Y1 / END
4. X2 A F3, F1
5. WS F3, Y2 / END
6. END STOP
7. Z DCW 31
8. Y1
9. Y2

The program performs the following: 1. load flow F1 with the value of Z (i.e., 31) and “goto” /X1 and /X2, 2. subtract from flow F2 the value of flow F1, 3. write the value of flow F2 to Y1 and goto END, 4. add to flow F3 the value of flow F1, 5. write the value of flow F3 to Y2 and goto END, 6. stop the computation, 7. the variable Z (with value 31), 8. the variable Y1, 9. the variable Y2. The stages of the pipeline are illustrated in figure 2 where

- A is Cache and Alu1
- B is Inst-Iss and Alu2
- C is Mem-Addr
- D is Memory
- E is Stack

and the numbers correspond to the instructions.

At stage 1 the Inst-Iss unit gets the first instruction from the Cache and because the instruction has two destinations this instruction is split into two instructions (1.1 and 1.2). It first issues the instruction with the first destination to the Mem-Addr unit (stage 2) and then the same instruction with the second destination (stage 3). Meanwhile the Mem-Addr has sent the first instruction to the Memory (stage 3) where the contents of the Z variable are fetched. The contents of the flows and the value of the Z variable are sent to the Stack unit (stage 4) and the Memory unit fetches the contents of the Z variable for the second instruction.
Figure 2: The stages of the pipeline during execution.

The Stack sends the first instruction to the Alu1 and the Cache (stage 5). The Cache fetches the destination instruction where to the Alu unit should send the result of the first instruction (this instruction is 4. X2 A F3,F1) and sends it to the Inst-Iss unit (stage 6). At stage 7 this instruction is sent to the Mem-Addr unit together with the result of the Alu unit (flow F1 is loaded with the contents of the Z variable). Meanwhile the Cache has fetched the destination instruction for the second instruction (namely 2. X1 S F2,F1) and this is sent at stage 8 to the Mem-Addr unit together with the result of the Alu1 unit (the same as the first instruction). These two “destination” instructions migrate through the Mem-Addr, the Memory and the Stack unit (stages 7-10) towards the Alu1 unit and the Cache. The same procedure as before is followed: the Cache fetches the destination instruction and the Alu unit computes the result. For the first instruction this result is “add the contents of flow F1 to the contents of flow F3 and write them to flow F3” (stage 12) and the destination instruction is 5. WS F3,Y2 END. For the second instruction this result is “subtract the contents of flow F1 from flow F2 and write them to flow F2” (stage 13). The two destination instructions again migrate through the Mem-Addr and the Memory unit. The Memory unit writes at the arrival of the first instruction (stage 13) the contents of flow F3 to variable Y2 and at the arrival of the first instruction (stage 14) the contents of flow F2 to variable Y1. The two instructions then migrate through the Stack to arrive at the Alu1 and the Cache unit (stages 15-16) where the Cache fetches the destination instruction of the first instruction, i.e., 6. END STOP and sends it to the Inst-Iss unit (stage 16). Because this is the halt instruction the processor stops. The output as generated by the simulator is shown in the appendix.

This sample machine program works fine on the processor but there is a problem. This problem is you can be observe when you execute the following program:

1. X AS F0,Y /X /X
2. Y DCW 1

This program adds 1 to flow F0 and then sends the result twice to itself (it constructs a binary tree of adding instructions).

The stages of the pipeline are illustrated in figure 3. Whenever an instruction enters the Inst-Iss (B) it will be split into 2 instructions. Because the pipeline has a length five this will eventually result in a situation (for instance stage 12) that there is “no room” in the pipeline: the instruction (i.e., 1.2.1) in the Stack (E) unit will then be put on the stack. In this example because the stack is of finite length, this will result eventually in a stack overflow. This problem should then be solved on operating system level.

Figure 3: The stages of the pipeline during execution.

5 Evaluation and Future Work

This section evaluates ITL and Tempura as vehicle for the specification and simulation of large scale systems. It also indicates future work.

5.1 Evaluation

ITL and Tempura are suitable for the specification and simulation of the EP/3, i.e., for large scale systems. But the following problems were encountered during the specification of the processor:

- In ITL/Tempura only integers, booleans and lists are available as data-types. One would like to have
the data-structures of Pascal like records, reals, files etc. Although these could be simulated using lists, it would ease usability if these data-types were explicitly expressed.

- The units of the processor communicate with each other over the highways. This is modeled with shared variables. The use of special kind of variables like channels is more appropriate. Plus allowing communication actions to be explicitly expressed.

- In normal programming languages like Pascal, a program like if $y > 0$ then $x := x + 1$ the value of $x$ increases by 1 if $y > 0$ and $x$ doesn’t change otherwise. In ITL/Tempura the information that $x$ doesn’t change has to be coded explicitly: if $y > 0$ then $x := x + 1$ else $x := x$; i.e., one has to state explicitly that a variable doesn’t change. If one has to update one memory cell, this will be a very costly operation. This is the so-called framing problem.

- Within ITL/Tempura it is hard to model timing constrains like delay or time-out. In the specification of the processor each clock cycle corresponds to a state in ITL, i.e., the length of the interval corresponds to the number of cycles of the processor. We couldn’t model properties like: the Alu unit computes the result within 1 clock cycle.

A problem that did not show up in this example is whether to extend Tempura (the executable part of ITL) with high level constructs like the “or” for non-determinism. The advantage is that more specifications become executable but a disadvantage is that the simulator becomes then very complicated. It may be better to construct a refinement tool that refines a high level specification (written in ITL) into an executable specification (written in Tempura).

### 5.2 Future work

Future work will consists in extending ITL/Tempura with more data-structures and data-types, and constructs for the description of communicating and timing constrains. Furthermore the framing problem should be solved, this will result in an increase of the speed of the simulator because an update of the memory would then only cost 1 statement instead of as many as there are memory cells. Also we will investigate the use of the PVS[11] system as refinement tool for high level specifications written in ITL to executable specifications written in Tempura.

An issue that isn’t addressed in this paper is the correctness of the EP/3, i.e., the proof that certain properties like the pipeline doesn’t “overwrite” an unit when there is for some reason “no room” in the pipeline. This kind of properties should be formalized within ITL. With the proof system of ITL these properties then can be proven.

### References

## A Simulator output of first example

The simulator has a more detailed output namely the values of the registers of the various units of the processor before the start of the cycle. The format is as follows of the output is given in table 3.

<table>
<thead>
<tr>
<th>Cycle</th>
<th>Z1</th>
<th>Z2</th>
<th>Z3</th>
<th>B1</th>
<th>B2</th>
<th>B3</th>
<th>B4</th>
<th>B5</th>
<th>B6</th>
<th>B7</th>
<th>B8</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

where NR is a positive integer, B a boolean and D one of the values 0, 1 or 2 (the destination index). The first line describes the registers comprising the Alu1, i.e., the instruction, the active, the flow and the static operand registers. The second line describes the registers comprising the Inst-Iss and the Alu2. The third line describes the registers comprising the Mem-Add. The fourth line describes the registers comprising the Memory. And the fifth and last line describes the registers comprising the Stack.

The output generated by the simulator of the most interesting cycles is given in the following table. Stage i (in figure 2) corresponds with cycle 20 + i. Variables Z, Y1 and Y2 corresponds to respectively Memory locations 00000020, 00000024 and 00000028.

### Table 3: Format of the simulator output

<table>
<thead>
<tr>
<th>Cycle</th>
<th>Z1</th>
<th>Z2</th>
<th>Z3</th>
<th>B1</th>
<th>B2</th>
<th>B3</th>
<th>B4</th>
<th>B5</th>
<th>B6</th>
<th>B7</th>
<th>B8</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Reading from Memory[00000020] the value 0000000F

### Table 4: Total Reductions

<table>
<thead>
<tr>
<th>Stage</th>
<th>Reductions</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
</tr>
</tbody>
</table>

### Table 5: Computation length

<table>
<thead>
<tr>
<th>Stage</th>
<th>Computation length</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0000000000000000</td>
</tr>
<tr>
<td>2</td>
<td>0000000000000000</td>
</tr>
<tr>
<td>3</td>
<td>0000000000000000</td>
</tr>
<tr>
<td>4</td>
<td>0000000000000000</td>
</tr>
</tbody>
</table>

Stop Instruction Executed


Cumulative Statistics

Total Cycles: 36 Processing Cycles = 34 during which Instructions Executed = 6 Mean Cycles / Instruction = 5