
Application Programmer to Compiler

  The second-stage contract negotiations are between the application programmer and the compiler. These negotiations occur while the algorithm suite is being developed. In many ways, this contract is similar to, but much more detailed than, the first-stage contract. The application programmer creates a set of machine-independent algorithms. These algorithms are written in an English-like style, and all data dependencies are explicitly stated. Included is a set of requirements, namely the speed, cost, accuracy, and reliability of the algorithm and hardware. These same requirements may also be applied to the compilation process itself.
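As a concrete illustration, such a requirement set might be captured in a small record. This is only a sketch; the field names and units below are our assumptions, since no particular representation is prescribed here.

    /* Hypothetical representation of a second-stage requirement set.
       The field names and units are illustrative assumptions only. */
    typedef struct {
        double max_time_sec;      /* speed: worst-case running time allowed */
        double max_cost_dollars;  /* cost: budget for performing the run    */
        double max_rel_error;     /* accuracy: tolerated relative error     */
        double min_reliability;   /* reliability: e.g. 0.999 success rate   */
    } Requirements;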

The application programmer bases these requirements on estimates for a hypothetical machine. If the programmer demands too much performance, all end users would be satisfied, but no compiler will agree to the contract. If the programmer is too conservative, every compiler will agree to the terms, but the end users' demands will not be met. In this way we make explicit what is currently implicit. Note also that this allows the application programmer to track decades of hardware change simply by altering the performance demands, without making any changes to the set of algorithms.

The algorithms and requirements are then passed to the compiler. In addition to the usual syntactic checks and translation, the compiler also uses its knowledge of the target hardware to determine whether the application programmer's performance requirements are feasible. If they are, the object code is generated and tagged with the various performance values. If the requirements are not feasible, the compiler rejects the contract and notifies the application programmer of the reasons for rejection.
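A minimal sketch of this accept/reject decision follows, reusing the hypothetical Requirements record from the earlier sketch; the Estimates fields and the reporting format are likewise our assumptions, not part of the contract as specified.

    #include <stdio.h>

    /* Compiler-side estimates for the generated object code (hypothetical). */
    typedef struct {
        double est_time_sec;
        double est_rel_error;
    } Estimates;

    /* Accept the contract only if every estimate meets its requirement;
       otherwise report each violated clause back to the programmer. */
    int contract_feasible(const Requirements *req, const Estimates *est)
    {
        int ok = 1;
        if (est->est_time_sec > req->max_time_sec) {
            printf("rejected: estimated time %g s exceeds limit %g s\n",
                   est->est_time_sec, req->max_time_sec);
            ok = 0;
        }
        if (est->est_rel_error > req->max_rel_error) {
            printf("rejected: estimated error %g exceeds limit %g\n",
                   est->est_rel_error, req->max_rel_error);
            ok = 0;
        }
        /* ...cost and reliability checks would follow the same pattern... */
        return ok;  /* 1: tag object code with estimates; 0: reject */
    }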

In general, low effort by the application programmer will result in a failed contract. The more effort the application programmer expends in writing efficient code and in giving the compiler information on how portions of the code should be compiled, the more likely a successful compilation becomes.

Consider the quadratic equation as an example. One way to express its solution is as:

    x = (-b ± sqrt(b^2 - 4ac)) / (2a)

Suppose that the application programmer does not give the compiler any information about the placement of values in memory, leading to a contract failure. The compiler might respond by suggesting that the timing requirement be relaxed. Suppose that the application programmer decides that the timing requirement should not be changed. Then extra effort must be expended to help the compiler find a way to evaluate the same equation in less time. One way to achieve this would be for the application programmer to suggest to the compiler that the values of a and b be stored in registers. Another suggestion could be to calculate b^2 as b*b instead of invoking a power function. Yet another suggestion would be to inform the compiler of the various parts of the equation that can be calculated in parallel.
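A sketch in C of what these suggestions amount to; since the algorithm language here is described only informally, ordinary C is used, and the suggestions appear as code changes and comments.

    #include <math.h>

    /* Straightforward version: the discriminant invokes a power function. */
    double root_naive(double a, double b, double c)
    {
        return (-b + sqrt(pow(b, 2.0) - 4.0 * a * c)) / (2.0 * a);
    }

    /* Version reflecting the programmer's suggestions: a and b are hinted
       into registers, b*b replaces the power function, and -b and the
       discriminant are independent subexpressions that the compiler is
       free to evaluate in parallel. */
    double root_suggested(double a, double b, double c)
    {
        register double ra = a, rb = b;        /* register placement hint */
        double disc = rb * rb - 4.0 * ra * c;  /* b*b instead of pow(b,2) */
        return (-rb + sqrt(disc)) / (2.0 * ra);
    }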

The point is that we expect the application programmer, not the compiler, to do most of the ``dirty work''. The more details and suggestions the compiler is given, the more tenable the contract becomes.

 
Figure 4: Second-Stage Contract

In the second-stage contract (see Figure 4), the following items are included:

The application programmer should give the compiler information of true utility. Included is a section containing suggestions for variable placement. Currently, we see two different approaches to specifying this. The first is the relative memory distance approach, in which variables are ranked from those needing the fastest access down to those that can tolerate the slowest access. This is done by declaring levels, so that variables with like access-speed needs can be grouped together under a common level.

The second approach to specifying variable latencies is in terms of absolute distance, as measured by the elapsed wall-clock time of an access. For certain kinds of applications, this approach is more useful because it places definite, real-time limits on the system. However, this very precision also ties the specification more closely to current technology, with no anticipation of future improvements. We believe that this information could be changed from a suggestion to a requirement if need be, but doing so could be counterproductive to overall speed.
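As an illustration, the two approaches might look something like the following; no concrete syntax is given here, so these pragma forms are entirely hypothetical (a conforming C compiler simply ignores pragmas it does not recognize). The variable names echo the quadratic example above.

    /* Relative memory distance: rank variables by access-speed level. */
    #pragma placement level 1 (x, disc)      /* fastest access needed   */
    #pragma placement level 2 (a, b, c)      /* moderately fast access  */
    #pragma placement level 3 (history)      /* slowest access is fine  */

    /* Absolute distance: bound each access in wall-clock time. */
    #pragma placement latency (x) 10ns       /* register/cache speed    */
    #pragma placement latency (history) 1ms  /* disk residence is fine  */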

Next comes the application program itself. Our goal here is to provide just the basic algorithm, without supplying any unnecessary dependencies or extra ``tricks'' to circumvent the compiler. As a rule, we never reuse any variable on the left-hand side of an assignment statement. Also note that each line is numbered for later use in the dependency specification.

To aid the compiler in making timing and error estimates, we also require that probability estimates be included for conditional statements. If a probability is not known in advance, this too should be stated. For loops, we recommend that bounds be included in the form of a triple containing: the guaranteed least number of times through the loop, the average expected number of times through the loop, and the guaranteed maximum number of times through the loop. With this information, both average and worst-case timing estimates can be calculated; a sketch of these annotations appears after this discussion.

Finally, the application programmer is permitted, and strongly encouraged, to use library subroutines when possible. The idea is to keep a collection of ``best known solutions to date'' on line, to help ensure maximum performance or accuracy while freeing the application programmer from having to continuously ``reinvent the wheel''. In some sense, this would be like keeping the existing BLAS routines on line, but it is also very different in many ways. For example, instead of having one routine to do a particular task, there would be several versions emphasizing different precision, performance, and architectural requirements. Also, these routines would come with attached error and performance ratings, so that the system software could easily match the appropriate version to the end user's needs. Again, these details, and the fact that a collection of similar routines exists, need not be known by the application programmer. Finally, the subroutines would have to pass strict requirements before being added to the system and would absolutely be forbidden from having extra ``tricks'' embedded in them. After all, it was these extra tricks that led to the problem example described in Section 2.2.2.
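The sketch below illustrates the timing annotations in C; the bounds-triple and probability notations are our own comment conventions, since no syntax is prescribed for them here.

    /* Sketch of the required timing annotations, expressed as comments. */
    double annotated_sum(const double *v, int n, double threshold)
    {
        double sum = 0.0;
        int i;

        /* loop bounds triple (least, average, maximum): (1, 50, 100) */
        for (i = 0; i < n; i++)
            sum += v[i];

        /* conditional: estimated probability taken = 0.9; if the
           probability were unknown in advance, that would be stated */
        if (sum > threshold)
            sum /= threshold;    /* stand-in for the real work */

        return sum;
    }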

Then, to make life easier for the compiler, we require that a data dependency specification be included that explicitly states which line of code is dependent on which other lines of code. However, some dependencies can be omitted due to their transitive nature. For example, if line 1 is dependent on lines 2 and 3, and line 2 is also dependent on line 3, we could denote this as 1 -> 2, 1 -> 3, 2 -> 3, or alternatively as 1 -> [2, 3], 2 -> 3. However, since the dependency 1 -> 3 is implied by the other two dependencies, it is safe to omit it and simply say 1 -> 2, 2 -> 3. Furthermore, we permit the shorthand notation of combining dependencies into chains of the form 1 -> 2 -> 3. Finally, we also permit multiple sources to a destination, or a source to multiple destinations, to be expressed with ranges of lines using combinations of commas, hyphens, and square brackets. Hence, the expression 1 -> [2, 3] -> [4, 6, 8, 13] is actually shorthand for 1 -> 2, 1 -> 3, 1 -> 4, 1 -> 6, 1 -> 8, 1 -> 13, 2 -> 4, 2 -> 6, 2 -> 8, 2 -> 13, 3 -> 4, 3 -> 6, 3 -> 8, 3 -> 13.
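To make the shorthand concrete, the following C fragment expands a chained set expression into its individual pairwise dependencies; the array representation and the expand helper are ours, purely for illustration.

    #include <stdio.h>

    /* Expand S -> D, where S and D are sets of line numbers, into its
       individual pairwise dependencies. */
    static void expand(const int *src, int ns, const int *dst, int nd)
    {
        int i, j;
        for (i = 0; i < ns; i++)
            for (j = 0; j < nd; j++)
                printf("%d -> %d\n", src[i], dst[j]);
    }

    int main(void)
    {
        int first[]  = {1};
        int second[] = {2, 3};
        int third[]  = {4, 6, 8, 13};

        /* 1 -> [2, 3] -> [4, 6, 8, 13], expanded as listed in the text */
        expand(first, 1, second, 2);   /* 1 -> 2, 1 -> 3               */
        expand(first, 1, third, 4);    /* 1 -> 4, 1 -> 6, 1 -> 8, ...  */
        expand(second, 2, third, 4);   /* 2 -> 4, ..., 3 -> 13         */
        return 0;
    }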

For comparison, Figure 5 illustrates the traditional process of compiling an application program.

  
Figure 5: Traditional Compilation of an Application Program

Figure 6 illustrates the same process when a second-stage contract is in effect.

  
Figure 6: (Second level) Compilation of an Application Program with Contract





