-
Notifications
You must be signed in to change notification settings - Fork 31
GhostVariables
Ghost variables are variables that do not appear in the
program. They are created dynamically (i.e., during the analysis) by
abstract domains to express complex properties which typically help to
improve the expressiveness of the analysis. In Crab, memory and array
domains create ghost variables. For instance, memory domains use ghost
variables to represent the offset of a pointer or the size of an
allocated object. Array domains can represent each array cell with a
ghost variable that represents its contents. Ghost variables are also
very useful when low-level properties are modeled involving precise
bitwise reasoning (e.g., to represent the k
least or most
significant bits separately) although Crab does not provide such a
domains at the moment.
However, the implementation of ghost variables in Crab has been tricky
because there is a tradeoff between modularity and
performance. Modularity is a very important property for an abstract
domain. This is the idea that complex abstract domains can be composed
of a stack and/or products of simpler abstract domains without making
any assumptions apart from providing sound abstract
operations. However, when one of the simpler domains operates on a
ghost variable created by another domain, modularity can be
broken. Whenever the domain that created a ghost variable renames it
with another ghost variable (this can happen during binary operations
such as join, meet, etc) the simpler domain that uses that ghost
variable needs to be notified. One way of doing this is by heavily using
the abstract rename
operation which is implemented by all the Crab
abstract domains. At each binary operation, the domain that
creates ghost variables generates a new map from program variables to
their ghost variables and communicate recursively to all its dependent
domains by calling rename
. Unfortunately, this operation can be very
expensive. For instance, environment maps which are implemented by
Patricia trees do not implement this operation efficiently and their
sharing properties can be completely destroyed if too many rename
operations take place.
In Crab, we can provide three different strategies to manage ghost variables that when uses with care it can ensure modularity without performance degradation:
- Crab variable factory provides a method
get
that returns always the same variable for a given fixed pair of variable and string. With this method, we can create ghost variables from an existing variablev
and a string that defines the role of the ghost variable. For instance,get(v, ".offset")
,get(v, ".size")
, etc create ghost variables to represent the offset of a pointer, the size of an allocated object, etc. However, this solution only works under two assumptions:
a. The roles can be efficiently named. For instance, this might not be the case if we use ghost variables to represent symbolically array slices and these slices are determined by complex linear constraints which cannot be efficiently extracted.
b. This solution is sound only if the variable v
is a program
variable or if v
is another ghost variable then it cannot be
changed during the analysis.
Currently, Crab memory domains use this approach because they never act as subdomains or reduced product with the other domains. Note that this might change in the future and modularity in that case should be revisited.
- [NOT IMPLEMENTED] Avoid the use of ghost variables whenever possible. To help with that, a Crab variable has a normal type that represents the type of the variable as defined in the program CFG, and an abstract type that represents the type as seen by the abstract domain. Initially, the normal and abstract types coincide but an abstract domain can change the abstract type of a variable during the analysis.
For instance, the Array Smashing
domain is a subdomain of the
Array Adaptive
domain which in turn, it is a subdomain of the
Region
domain. The Array Smashing
domain only uses one
(numerical) ghost variable to represent the content of a whole
array. That ghost variable can be avoided if the Region
domain
changes the abstract type of a region to an array and the Array Adaptive
domain changes the abstract type of an array to a scalar
type.
- Keeping an internal map from program variables to ghost variables
and use abstract
rename
operation to communicate to other domains changes in the ghost variables. This should be used when 1 and 2 cannot be used.
- Sharing Ghost Variables in a Collection of Abstract Domains by Marc Chevalier and Jerome Feret published in VMCAI 2020. In this work, authors address exactly the same problem described above and provide a clean and elegant way to deal with ghost variables by defining the Reduced Product with Ghost Variables. Authors reported 3x slowdown wrt to a baseline implementation that provided support for ghost variables in a hacky way.