Skip to content
kevinjalbert edited this page Mar 27, 2011 · 20 revisions

#Purpose The goal of ARC - Automatically Repair Concurrency is to provide a tool that attempts to repair concurrency bugs in source code. This tool is aimed towards Java due to its maturity in concurrency and known concurrency bugs and idioms.

A genetic algorithm is used to progressively mutate the buggy program's source code to a state where it executes in a timely fashion without errors. The mutations used are all related to adding and removing synchronization to the source code. The mutations used can only effect the thread schedules, and as a side-effect the performance overhead/reduction of added/removed synchronization. Two main criteria used in determining the fitness of a mutant are the:

  • Correctness of the code -> No concurrency bugs
  • Performance of the code -> Executes as quick as possible

Essentially the end purpose is to automatically repair concurrency by minimizing the bugs and maximizing the performance.

#Features

  • Adjustable number of generations to perform
  • Adjustable number of runs per generation to perform (population)
  • Adjustable timeout period for testing phase
  • Toggle-able set of operators that can be used for mutation
  • Adjustable mutation rate
  • Adjustable crossover rate
  • Testing phase is concurrent
  • Testing is augmented through the noise making technique (random thread delays)

#Problem Definition The problem resides in the source code of a Java program that contains elements of concurrency. The following segment of example code illustrates a write to a unprotected shared variable.

obj.write ( var1 ); 

The execution of this segment might result in a data race during the execution of the program. This is caused if there are two or more threads trying to write to the value. To prevent this problem synchronization is used as a solution, this can be shown in the next segment of now protected writing to the shared variable.

synchronized ( lock ) { 
    obj.write ( var1 ); 
}

This is just one simple example of how a concurrency bug is fixed, the addition of synchronization where necessary. One problem with adding synchronization to source code is that it imposes additional overhead in the lock acquisition and thread contention. In addition, it may also be possible that adding synchronization might result in another concurrency bug called deadlock. If threads are blocking each other from proceeding, the execution halts. A fine balance between adding synchronization to ensure correctness while maintaining a sane level of performance is needed.

The problem is defined as:

Finding the best configuration of a concurrent program, that results in a highly correct and performing execution.

The problem is of a search based type, thus genetic algorithms will be used in an attempt to mutate individual versions of the source code towards the best configuration. Genetic algorithms require the following components, which are explained in the following sections:

  1. Genetic representation
  2. Mutation operators
  3. Crossover function
  4. Fitness function
  5. Terminating condition

The following Poster shows an overview of the problem and the general approach to finding a solution.

Genetic Representation

A buggy program's source code is the target of mutations. Before mutations are applied, an initial parse is performed to understand the domain of the mutations that can be applied to the source code.

The following illustrates an example, for each line there is the possibility that a mutation operator can be applied to it.

line_1 Mutation_2a
line_2
line_3 Mutation_1a
line_4
line_5 Mutation_2b Mutation_3a
line_6 Mutation_3b 
line_7 Mutation_2c

There are three mutation operators so three arrays are created to represent the mutation operators. In each array there is a bit for each place a mutation could occur. The ordering of the bits represent the location of where a mutation can occur (in order that TXL parses the source code, not necessarily in the order of line number). Thus, for this example three arrays look like:

Mutation_1   0
Mutation_2 000
Mutation_3  00

The genetic representation is just an array of bits, where a flipped bit represents an applied mutation. For this example, if the mutation was on the first bit of the second mutation operator the resulting genetic representation afterwards is then:

line_1 Mutation_2a_APPLIED
line_2
line_3 Mutation_1a
line_4
line_5 Mutation_2b Mutation_3a
line_6 Mutation_3b 
line_7 Mutation_2c
Mutation_1   0
Mutation_2 001
Mutation_3  00

This genetic representation allows for a simple encoding of the state of a mutant. The only problem is that after a mutation occurs it may result in new locations that can be mutated. Thus, after each mutation the potential number and locations of mutants might change. This requires a re-parse of the mutant to find the new set of possible mutations and locations. The following example is of the new state:

line_1 Mutation_2c
line_2
line_3 Mutation_1a
line_4
line_5 Mutation_3a_APPLIED
line_6 Mutation_1b_NEW 
line_7 Mutation_2a_APPLIED

Notice how one of the second mutation operators disappeared on line 5, the ordering of the second mutation operators has changed, and finally a new mutation operator location appeared. The way this is handled is as follows

  1. Keep track internally of the mutation operators in a unique global method (every possible mutation location is uniquely identified)
  2. When a mutation location has changed in order of being parsed, the unique id will maintain the order within the bit array.
  3. When a new mutation is found, it is assigned the correct next unique id.

This unique position id combined with the bit array storage creates a consistent genetic representation. In the following encoding, a _ represents that the corresponding mutation is not possible from this mutant:

Mutation_1  00
Mutation_2 0_1
Mutation_3  _1

Essentially the genetic representation is an array of bit arrays, where each bit corresponds to an unique id of a mutation location. The concept of keeping track of each unique mutation location allows for a unified global set of states, and allows for the application of crossover within a changing set of possible mutations.

Mutation Operators

A mutation operators is a single function that mutates the source code with a single change. The change for this algorithm is related to the concurrency aspects of the source code. Three mutation operators have been found to fix data races and deadlocks, a limited set for now. These three mutation operators deal with the addition of synchronization in the source code, these are functional mutations (correcting the source code). In addition, another set of non-function (increasing performance of the source code) mutation operators, which removes the addition of synchronization. As aforementioned, mutation operators have the effect of flipping a bit in the genetic representation. The mutations operators are broken down into the following categories.

NOTE: The mutation operators are illustrated as follows:

original source code
mutated source code

Functional

  • Synchronize an unprotected shared resource

One cause of a data race is that a shared resource is un-protected. By synchronizing around a shared resource data races may be fixed.

...
obj.write ( var1 ); 
...
synchronized ( lock ) { 
    obj.write ( var1 ); 
}
  • Expand synchronization regions to include unprotected source code

Data races can sometimes be caused if the synchronization region does not fully encapsulate access to the shared resources. Expanding the synchronization region may also fix the data race.

synchronized ( lock ) { 
    obj.write ( var1 ); 
}
obj.write ( var2 );
synchronized ( lock ) { 
    obj.write ( var1 ); 
    obj.write ( var2 );
}
  • Interchange nested lock objects

Common deadlocks occur due to the ordering of lock acquisition. By interchanging nested lock objects common deadlocks can be fixed

synchronized ( lock1 ) { 
    synchronized ( lock2 ) { 
        obj.write ( var1 ); 
    }  
}
synchronized ( lock2 ) { 
    synchronized ( lock1 ) { 
        obj.write ( var1 ); 
    }  
}

Non-Functional

  • Remove unnecessary synchronization regions

Synchronization regions create overhead due the time required in acquiring/releasing the lock and delays due to waiting for the lock. Removing unnecessary synchronization regions will improve performance.

synchronized ( lock ) { 
    obj.write ( var1 ); 
}
...
obj.write ( var1 ); 
...
  • Shrink synchronization region

Reducing the number of statements encapsulated in a synchronization region will allow the lock to be released quicker. The less time a thread holds the lock the less thread contention will exist, thus improving performance.

synchronized ( lock ) { 
    obj.write ( var1 ); 
    obj.write ( var2 );
}
synchronized ( lock ) { 
    obj.write ( var1 ); 
}
obj.write ( var2 );

Crossover Function

When a generation has completed the current best programs in respect of the fitness function will be breed with each other to provide the bases of the next set of programs to mutation. The idea here is that through crossover, the population will benefit from successful mutations and hopefully reach an optimal state quicker. The crossover function is relatively simple in the sense that the set of mutations are encoded into a genetic representation.

Given mutants A and B of the following encodings:

Mutant A Mutant B
mutation_1 _110 1_01
mutation_2 _001 01__
mutation_3 1_00 1100

To ensure that the children are all valid mutations (can only consider the valid bits from both mutatns), the encodings are reduced to shared bits:

Mutant AB_1 Mutant AB_2
mutation_1 __10 __01
mutation_2 _0__ _1__
mutation_3 1_00 1_00

Finally, all possible encoded combinations can exist from these compatible mutants(TODO have a strategy on how mutants reproduce):

Mutant C_1 Mutant C_2 Mutant C_3 Mutant C_4
mutation_1 __11 __10 __01 __00
mutation_2 _1__ _1__ _0__ _0__
mutation_3 1_00 1_00 1_00 1_00

Fitness Function

Throughout the evolution process of moving towards a possible solution for the source code, the concept of ranking each mutation is necessary. A fitness function is used to rank an instance of a mutation. For this problem there are two fitness functions, one which evaluates the functional behaviour of the mutation and a second which evaluates the non-functional behaviour.

Functional

The functional fitness function is used to rank the mutation in respect to the correctness of the source code:

[functional\ fitness(P) = \sum\limits_{i=0}^n\frac{interleavings\ without\ a\ bug}{interleavings\ tested}] [n = #\ of\ Test\ Cases]

To ensure a high-level of confidence, IBM's ConTest is used as a testing tool on the mutation program. The program is instrumented with random delays, which has the effect of forcing unusual thread schedules. By exploring more thread schedules it is possible to explore more of the paths which might lead to a bug.

Non-Functional

The non-functional fitness function is used to rank the mutation in respect to the performance of the source code:

[non-functional\ fitness(P) = \sum\limits_{i=0}^n\frac{average\ CPU\ time}{interleavings\ tested}] [n = #\ of\ Test\ Cases]

The correction of the program can lead to unnecessary amounts of synchronization. Performance of the program will degrade due to the overhead of lock acquisition and thread contention. By evaluating the non-functional fitness of the program it becomes possible to find the most efficient program mutation. Typically, the non-functional fitness will improve synchronization is reduced by applying the non-functional mutation operators.

Terminating Condition

There are two terminating conditions in place for this algorithm. One is when a successful state has been reached by maximizing the functional fitness (no more bugs detected) with a minimal non-functional fitness. The second is when a state has been reached without any improvements in sight (over successive generations of mutations the functional fitness is not improving), which then this state is minimized in respect to the non-functional fitness. The first condition results in the optimal state, while the second condition is a sub-optimal state.

#State Space

#Initial State

#Action

#Goal

#Cost

Clone this wiki locally