% MonoProc; Monotone Data-Flow Analysis for a Simple Procedural Language
% Pepijn Kokke
*In this paper I will provide a brief overview of the MonoProc system, and its
relation to the framework for data-flow analysis as described in [@NNH05].*
# Language
## Syntax
A program in MonoProc consists of a number of statements and a number of
procedure declarations. These are allowed to occur in any order.
Statements have the following forms, with their usual interpretations.[^bexprs]
> Stmt
> ::= "skip" ";"
> | Name "=" AExpr ";" -- assignment
> | Name "(" AExpr* ")" ";" -- call statement
> | "if" "(" BExpr ")" "then" Block
> | "if" "(" BExpr ")" "then" Block "else" Block
> | "while" "(" BExpr ")" Block
Aside from these syntactic constructs, two forms of syntactic sugar are
implemented.
The syntactic construct
> | "return" Name ";"
is transformed to an assignment to a special variable called **return**,
and
> | Name "=" Name "(" AExpr* ")" ";"
is transformed to a call statement, followed by an assignment of the *value*
of **return** to the variable in question.
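Concretely, the two sugared forms are transformed roughly as follows (an
illustration based on the description above, not taken from the repository):

> return x;     -- becomes: return = x;
> y = f(a);     -- becomes: f(a); y = return;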
Furthermore, **AExpr**s and **BExpr**s have the following form.
> AExpr
> ::= Name -- variable
> | Integer -- primitive integer
> | "-" AExpr
> | AExpr "+" AExpr
> | AExpr "-" AExpr
> | AExpr "*" AExpr
> | AExpr "/" AExpr
> BExpr
> ::= "true"
> | "false"
> | "~" BExpr
> | AExpr "<" AExpr
> | AExpr "<=" AExpr
> | AExpr ">" AExpr
> | AExpr ">=" AExpr
> | AExpr "==" AExpr
> | AExpr "!=" AExpr
Aside from the above, another constructor for **AExpr**s exists:
*null* values. While *null*-assignments are not syntactically
allowed, they are used to model function entry and exit in the analyses.
Finally, procedure declarations have the following form.
> Decl
> ::= Name "(" Name* ")" Block
*NB.*
A **Name** is any string of letters, digits and underscores that starts
with a lowercase letter.
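By way of illustration, here is a small program that fits the grammar above (a
hypothetical example, assuming brace-delimited blocks, which the grammar
excerpt does not spell out):

> fac(n) {
>   r = 1;
>   while (n > 1) {
>     r = r * n;
>     n = n - 1;
>   }
>   return r;
> }
>
> x = fac(5);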
[^bexprs]:
    While **BExpr**s are not statements in their own right syntactically, they
    are allowed under the **Stmt** constructor in my implementation, as this
    greatly simplifies the code.
# Semantics
Semantically, the MonoProc language does not stray far from the \textsc{While}
language as defined in [@NNH05, pp. 3-5].
A notable exception is that in MonoProc the user cannot choose the name for a
function's return value. This does not, however, limit the expressiveness of the
language, as the user can assign to global variables or use extra parameters to
simulate a chosen return value.
# Framework
My implementation of monotone frameworks is based on the description provided
in [@NNH05, pp. 33-97; @APA13]. I therefore refer the reader to those sources
for a detailed description of data-flow analysis and monotone frameworks.
In the paragraphs below I will
1. provide a mapping from the descriptions in [@NNH05] to modules and functions in my implementation;
2. give a brief overview of the combinators that are used in the construction of framework instances;
3. discuss the deviations from [@NNH05] in my implementation.
## From Nielson, Nielson and Hankin to MonoProc {#mapping}
Basic data-flow functions as described in §2.1 (*init*, *final*, *flow*, ...) can be found in the
*Flowable*[^module-names] module.
The $\mathbf{AExp}(\_)$ function can be found in the *Available* module as $\mathit{available}(\_)$,
and is defined on any term that can contain arithmetic expressions (including arithmetic expressions).
The $\mathit{FV}(\_)$ function can be found in the *FreeNames* module as $\mathit{freeNames}(\_)$.
As with $\mathit{available}(\_)$, it is defined on any term that can contain variable names.[^used-names]
The definition of a monotone framework and of a lattice can be found in the *Analysis* module,
together with some basic functions for building monotone frameworks---a description of which will
follow below under [Constructing Monotone Frameworks](#construction).
The analyses (available expressions, live variables, ...) are defined in the *Analysis* module,
under their usual abbreviations (*AE*, *LV*, ...). These will be discussed in [Analyses](#analyses).
Several versions of the *MFP* algorithm have been implemented in the *Algorithm.MFP* module.
`mfp`:
~ Computes a pointwise maximal fixed-point---using call strings[^call-strings] as a context---and
collapses all pointwise results using the join operator ($\sqcup$).
`mfpk`:
~ As *mfp*, but allows the user to provide a parameter $k$ that is used to bound the length of the
call strings.[^max-bound]
`mfp'`:
~ As *mfp*, but it returns a pointwise analysis.
`mfpk'`:
~ As *mfpk*, but it returns a pointwise analysis.
Furthermore, several versions of the *MOP* (actually *MVP*) algorithm have also been implemented.
These can be found in the *Algorithm.MOP* module.
`mop`:
~ Computes a *meet over all (valid) paths* analysis by computing all valid paths, and
composing the transfer functions along these paths.
`mopk`:
~ As *mop*, but allows the user to provide a parameter $k$ that is used to bound the lengths
of the paths *by bounding the number of times that a path is allowed to visit a single
program point.*
*NB.*
In general, *MOP* isn't guaranteed to terminate on recursive or looping programs.
In my implementation, however, *MOP* is guaranteed to terminate on all programs, but is not
guaranteed to return correct results. More precisely, with recursive calls and loops *mopk*
will examine at most $k$ recursive calls or iterations.
In general this is not a safe extension, but for practical purposes it suffices.
As *mop* is internally based on *mopk*, this weakened correctness guarantee
also applies to my implementation of *mop*.
[^module-names]:
    All modules mentioned in this paragraph live under *PROC.MF*, the module
    hierarchy that contains the implementation of the monotone framework.
[^used-names]:
A new module, *UsedNames*, has also been implemented under *PROC.MF*. However,
as this module is only used during evaluation---and not analysis---this does not
seem an appropriate time to discuss it.
[^call-strings]:
In the implementation, call strings are referred to as call *stacks*, because their
implementation more closely resembles a stack (where the top element is the latest function call).
[^max-bound]:
In the implementation, plain `mfp` calls `mfpk` using `maxBound` as a value for $k$.
While this is technically still a bounded analysis in terms of call string length, `maxBound`
usually evaluates to around $2147483647$, which should suffice for most practical purposes.
The same holds for `mop`/`mopk`; however, since meeting over all paths with at most $2147483647$
repetitions of the same program point can take... rather long, I'm assuming nobody will
encounter this in practice.
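The bounding of call strings used by `mfpk` can be sketched as follows (a
minimal illustration, assuming labels are integers; `pushCall` is a
hypothetical name, not the repository's actual code):

```haskell
-- Sketch of k-bounded call strings (hypothetical; not the actual MonoProc code).
-- A call string is kept as a stack of call-site labels, most recent call on top.
type Lab = Int
type CallString = [Lab]

-- Entering a procedure from call site c pushes c, keeping at most k labels.
pushCall :: Int -> Lab -> CallString -> CallString
pushCall k c cs = take k (c : cs)

main :: IO ()
main = do
  -- With k = 2, the oldest call site is forgotten once the bound is exceeded.
  print (pushCall 2 3 [2, 1])  -- [3,2]
```

With `k = maxBound`, as in plain `mfp`, the truncation is a no-op for any
realistic program.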
## Constructing Monotone Frameworks {#construction}
Below I will discuss the functions that can be used to construct instances of monotone frameworks,
in order of relevance.
`framework`:
~ The empty monotone framework, of which all properties are left undefined. Using this will
result in error messages stating which properties you still have to provide and---if
possible---what functions you can use to do this.
  *NB.* It is common to supply the lattice instance at the invocation of `framework`.
`embelished`:
~ Provides the instance with a table of declarations, and precomputes the blocks in the program.
`distributive`:
~ Allows for an easy definition of the transfer functions of distributive analyses that return
sets (*AE*, *LV*, ...). It defines the transfer function in terms of a *kill* and a *gen*
function [see @NNH05, fig. 2.6].
`forwards`, `backwards`:
~ Defines the direction of the analysis. Practically, it sets the *direction* value in the instance
and precomputes the *flow*, *interflow* and *extremal values* for the analysis.
See below for a concrete definition of an instance, taken from *Analysis.AE*.
> mfAE :: Prog -> MF (Set AExpr)
> mfAE p
> = forwards p
> $ distributive killAE genAE
> $ embelished p
> $ framework
> { getI = empty
> , getL = Lattice
> { join = intersection
> , refines = isProperSubsetOf
> , bottom = available p
> }
> }
The final type of the "instance" is $\mathit{Prog} \to \mathit{MF}$, and it will in fact
only compute the actual instance upon application to a program. This is internalized in the
`analyse` functions, however, and therefore the user will rarely have to do this.
## Deviations from Nielson, Nielson and Hankin
While my implementation is often true to the description in [@NNH05], there are a few
exceptions. These will be briefly discussed below.
### Limiting MOP to MOP-k
This has been discussed [above](#mapping), and is only restated here for completeness.
### Procedure Entry and Exit Label
The most notable deviation is that a procedure's entry and exit points do not receive any
explicit labels. Instead the values for *init* and *final* for the procedure's body are
used in interflow expressions.
This effectively means that the transfer functions for a procedure call and its entry
are forced to be the same function---and likewise for exit and return---which limits the
analytical capabilities of my implementation. However, the distinction between these
pairs of transfer functions was not required for the simple analyses that were implemented.
# Analyses {#analyses}
Below I will provide a brief overview of the implemented analyses.
Available Expressions (AE):
~ Available Expression analysis computes, for every program point, which expressions are
  available, i.e. have already been computed on every path to that point and have not had
  any of their variables reassigned since.
  This is useful for, for instance, eliminating redundant recomputation of expressions.
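As a rough illustration of the kill/gen style used for such analyses, here is a
simplified sketch over a toy expression type (the real `killAE`/`genAE` in
*Analysis.AE* operate on the full **AExpr** type and differ in detail):

```haskell
import qualified Data.Set as Set

-- Toy arithmetic expressions (simplified; the real AExpr type has more cases).
data AExpr = Var String | Lit Int | Add AExpr AExpr
  deriving (Eq, Ord, Show)

-- An assignment to x kills every expression in the universe that mentions x.
killAE :: Set.Set AExpr -> String -> Set.Set AExpr
killAE universe x = Set.filter (mentions x) universe

-- A statement generates the non-trivial subexpressions it evaluates.
genAE :: AExpr -> Set.Set AExpr
genAE e = Set.fromList (subExprs e)
  where
    subExprs a@(Add l r) = a : subExprs l ++ subExprs r
    subExprs _           = []

mentions :: String -> AExpr -> Bool
mentions x (Var y)   = x == y
mentions _ (Lit _)   = False
mentions x (Add l r) = mentions x l || mentions x r

main :: IO ()
main = do
  let e = Add (Var "a") (Lit 1)                         -- a + 1
  print (Set.toList (genAE e))                          -- generates a + 1
  print (Set.toList (killAE (Set.fromList [e]) "a"))    -- assigning a kills it
```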
Live Variables (LV):
~ Live Variable analysis computes, for every program point, which variables may still be
  used later in the computation before being redefined.
  This is useful for, for instance, removing assignments whose results are never used.
Very Busy Expressions (VB):
~ Very Busy Expression analysis computes, for every program point $p$, which expressions are
  guaranteed to be evaluated on every path from $p$ before any of the variables they use is
  reassigned.
This is useful for, for instance, eager evaluation of very busy expressions.
Reaching Definitions (RD):
~ Reaching Definitions analysis computes, for every program point, which assignments may
  still be in effect at that point, i.e. may have been executed on some path leading there
  without the assigned variable having been overwritten since.
  This is useful for, for instance, removing definitions that reach no use.
Constant Propagation (CP):
~ Constant Propagation analysis computes, for every program point, which variables are known
  to have a constant value on all paths leading to that program point.
  This is useful for, for instance, replacing such variables with their constant values.
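The per-variable value lattice underlying such an analysis can be sketched as a
flat lattice (an illustrative sketch; the lattice in the actual implementation
may differ in its details):

```haskell
-- Flat lattice for constant-propagation values (illustrative sketch).
data CPVal = Bot         -- no information yet
           | Const Int   -- known constant value
           | Top         -- possibly several different values
  deriving (Eq, Show)

-- Join: two agreeing constants stay constant; disagreement goes to Top.
joinCP :: CPVal -> CPVal -> CPVal
joinCP Bot v = v
joinCP v Bot = v
joinCP (Const a) (Const b)
  | a == b    = Const a
  | otherwise = Top
joinCP _ _ = Top

main :: IO ()
main = do
  print (joinCP (Const 1) (Const 1))  -- Const 1
  print (joinCP (Const 1) (Const 2))  -- Top
```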
# References