davidkelk edited this page Oct 13, 2011 · 20 revisions

Things to consider:

  • If no improvements are made in phase 1 (bug fixing), should the algorithm proceed to phase 2 (optimization) or stop? Is there any point in attempting to optimize a program that doesn't work correctly?
  • Should the user have the option to run phase 1 only, or phase 2 only? Say the program works and they want to only try optimizing it. (Or time is short and they only want to try and bug fix it.)
  • We should update our terminology. We are really using an Evolutionary Programming approach: EP is mutation-driven, doesn't use crossover, and every parent passes a child into the next generation. (http://en.wikipedia.org/wiki/Evolutionary_programming)

#TODO

  • When mutated code is compiled, indicate whether it succeeded or failed
  • Capture the Java classpath
  • Somehow differentiate directory structure (/generation/member/...) for both phases
    • Save project in pristine state AND bug fixed state AND optimized state

#FUTURE WORK

  • Make testing phase concurrent
  • Add more stopping criteria to the GA (Time, no progress)
  • Increase functionality of the TXL operators
  • Recognize more Java project types (Maven?)
  • Allow phases 1 and 2 to operate independently of each other

#Purpose

The goal of ARC (Automatically Repair Concurrency) is to provide a tool that attempts to repair some concurrency bugs in Java source code. Java was selected for its mature concurrency libraries and idioms, and for the availability of projects with known concurrency bugs.

ARC operates in two stages. The first attempts to fix concurrency bugs, the second attempts to improve the performance of the concurrent code. A genetic algorithm is used to progressively mutate the buggy program's concurrent code blocks to a state where it executes in a timely fashion without errors. Two main criteria are used to determine the fitness of a mutant:

  • Correctness of the code -> No concurrency bugs
  • Performance of the code -> Executes as quickly as possible

#Features

  • Uses tools with known track records: Python, TXL, ConTest, ConMan, Java and Ant
  • Genetic algorithm (GA) runs for a user-defined number of generations
  • User-definable parameters:
    • Population size
    • Mutation rate
    • Mutation operators used
    • Timeout for detecting deadlocks
  • Testing is augmented through a noise making technique (random thread delays)

#Problem Definition

Concurrent Java programs are hard to debug. A small number of the possible interleavings of the concurrent code may cause the problem, making it hard to verify, find and fix. This segment of code illustrates a concurrency bug: writing to an unprotected shared variable:

obj.write ( var1 ); 

Note that this bug is specific to concurrent programs; it doesn't occur in programs that use only a single thread. This segment might result in a data race during execution if two or more threads try to write to the variable, because which one writes first is nondeterministic. To prevent this problem, synchronization is used to control access to the variable:

synchronized ( lock ) { 
    obj.write ( var1 ); 
}

This is a simple example of how a concurrency bug is fixed: the addition of synchronization where necessary. One problem with adding synchronization is that it imposes additional overhead through lock acquisition and thread contention. Acquiring the lock takes time, and while the lock is held no other thread can access the locked code. Adding synchronization might also introduce another concurrency bug: deadlock. If threads block each other from proceeding, execution halts. A fine balance between adding synchronization to ensure correctness and maintaining a high level of performance is desirable. This is what ARC attempts to achieve.
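The same unprotected-write bug can be reproduced in any threaded language. Since ARC itself is written in Python, here is a minimal Python sketch (illustrative only, not part of ARC) showing a data race on a shared counter and the lock-based fix:

```python
import threading

def run(iterations, use_lock):
    """Increment a shared counter from two threads, with or without a lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(iterations):
            if use_lock:
                with lock:  # the "synchronized" fix: serialize the read-modify-write
                    counter["value"] += 1
            else:
                # unprotected read-modify-write: two threads may interleave here
                v = counter["value"]
                counter["value"] = v + 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

# With the lock every update survives; without it, updates may be lost.
print(run(100_000, use_lock=True))   # always 200000
print(run(100_000, use_lock=False))  # may be less than 200000
```

As in the Java example, the fix is to make the read-modify-write of the shared variable atomic with respect to the other threads.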

The problem is defined as:

Finding the configuration of a concurrent program that results in the highest possible correctness and performance.

Search-Based Software Engineering (SBSE) techniques are often used to tackle problems of this type. [Cite Arcuri and Weimer et al.] Genetic algorithms (GAs) are used to guide the evolution of the concurrent code towards the most correct configuration. In the second step the GA attempts to optimize the execution of the concurrent parts of the source code without sacrificing correctness. Genetic algorithms consist of a number of parts:

  1. Genetic representation of the problem
    • A member is a project with mutations applied to it
  2. Mutation operators that act on the members
    • Apply mutation operators to the source code of the project in such a way that it still compiles
  3. Fitness evaluation of the candidate fixes
  4. Terminating condition for the algorithm

###Algorithm Overview

The following poster gives a high-level overview of the problem and the general approach to finding a solution.

High-Level Overview:

Phase 1: Bug Fixing:

  1. Parse the project's source to find the concurrent code
  2. Create an initial population of proposed solutions and mutate each one once
    • Mutate each project once or mutate each Java file in the project once?
  3. Run each member (solution) a number of times and evaluate how it does in terms of: Successful executions, deadlocks detected, data races detected (and others?)
  4. If one of the members runs without errors, the fix has been found. Proceed to step 7.
  5. Based on the results of step 3, choose an appropriate mutation operator and apply it to each solution
  6. If search time remains, return to step 3, otherwise stop.
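The phase 1 steps above can be sketched as a loop. This is a hypothetical Python outline; the function names (`parse`, `mutate`, `evaluate`) and the result fields are illustrative, not ARC's actual API:

```python
def phase1_bug_fix(project, population_size, max_generations,
                   parse, mutate, evaluate):
    """Outline of the phase 1 loop: mutate, test, select, repeat."""
    spots = parse(project)                          # step 1: find concurrent code
    population = [mutate(project, spots)            # step 2: initial population,
                  for _ in range(population_size)]  # each member mutated once
    for _ in range(max_generations):                # step 6: bounded search time
        for member in population:                   # step 3: run and evaluate
            result = evaluate(member)
            if result["deadlocks"] == 0 and result["data_races"] == 0:
                return member                       # step 4: fix found
        # step 5: choose an operator based on the observed bug classes and
        # mutate each member again (re-parsing after every mutation)
        population = [mutate(m, parse(m)) for m in population]
    return None                                     # search time exhausted, no fix
```

The re-parse inside the loop matters because, as described under Genetic Representation below, each mutation can change the set of possible mutation locations.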

Phase 2: Optimization:

  7. As in step 1, parse the modified source to find the concurrent code
  8. Select the subset of the operators from phase 1 useful for optimization
  9. Create an initial population of proposed solutions and mutate each one once
    • Any mutated project that decreases correctness is discarded
  10. Run each solution a number of times and evaluate its performance
  11. If one of the solutions meets a predefined performance goal, return it and stop
  12. If time remains, return to step 9. Otherwise, return the fixed program with the best performance found so far
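Phase 2's key constraint, discarding any mutant that decreases correctness, can be sketched as follows (hypothetical Python, not ARC's real interface; `correctness` and `performance` stand in for the fitness functions defined later):

```python
def phase2_optimize(fixed_project, population_size, max_generations,
                    correctness, performance, mutate):
    """Optimize performance without ever accepting a correctness regression."""
    baseline = correctness(fixed_project)
    best, best_perf = fixed_project, performance(fixed_project)
    population = [fixed_project] * population_size
    for _ in range(max_generations):
        candidates = [mutate(m) for m in population]
        # discard any mutant whose correctness drops below the fixed baseline
        survivors = [c for c in candidates if correctness(c) >= baseline]
        for c in survivors:
            perf = performance(c)
            if perf > best_perf:
                best, best_perf = c, perf
        population = survivors or population  # keep searching from the last good set
    return best
```

Because every candidate is re-checked against the phase 1 baseline, the worst case is that phase 2 returns the unmodified bug-fixed program.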

Genetic Representation

A buggy project's concurrent source code is the target of mutations. Before any fixing can begin, all of the concurrent code must be identified. In the example below, lines of Java code are referred to as line_1, line_2, and so on. Concurrent lines are identified by the presence of Mutation_OP entries. In English: "Line 1 has concurrent code in it because Mutation_OP_2 can operate on it. As mutation operator 2 can also operate on lines 5 and 7, we differentiate each instance of operator 2 by calling them spots A, B and C. Line 2 has no concurrent code."

line_1     Mutation_OP_2_spot_A
line_2
line_3     Mutation_OP_1_spot_A
line_4
line_5     Mutation_OP_2_spot_B, Mutation_OP_3_spot_A
line_6     Mutation_OP_3_spot_B 
line_7     Mutation_OP_2_spot_C

There are three mutation operators, so three arrays are created to represent them. In each array there is a bit for each place the mutation could occur. The ordering of the bits represents the locations where a mutation can occur (in the order that TXL parses the source code, not necessarily in order of line number). Thus, for this example the three arrays look like:

TODO: Kevin update this with new representation.

Mutation_1   0
Mutation_2 000
Mutation_3  00

The genetic representation is just an array of bits, where a flipped bit represents an applied mutation. For this example, if the mutation flips the first bit of the second mutation operator, the resulting state is:

line_1     Mutation_OP_2_spot_A_APPLIED
line_2
line_3     Mutation_OP_1_spot_A
line_4
line_5     Mutation_OP_2_spot_B, Mutation_OP_3_spot_A
line_6     Mutation_OP_3_spot_B 
line_7     Mutation_OP_2_spot_C

TODO: Kevin update this with new representation.

Mutation_1   0
Mutation_2 100
Mutation_3  00

This genetic representation allows for a simple encoding of the state of a mutant. The only complication is that a mutation may create new locations that can be mutated. Thus, after each mutation the number and locations of potential mutations might change. This requires re-parsing the mutant to find the new set of possible mutations and locations. The following example shows the new state:

line_1     Mutation_OP_2_spot_A_APPLIED
line_2
line_3     Mutation_OP_1_spot_A
line_4
line_5     Mutation_OP_3_spot_A
line_6     Mutation_OP_3_spot_B 
line_7     Mutation_OP_2_spot_B, Mutation_OP_1_spot_B

Notice how one instance of the second mutation operator disappeared on line 5. The ordering of the second mutation operator's instances has changed: as 2_B has disappeared, 2_C on line 7 becomes 2_B. Finally, a new mutation operator location, 1_B, appeared on line 7. This is handled as follows:

TODO: Kevin update this with new representation.

  1. Keep track of the mutation operators with a global scheme in which every possible mutation location is uniquely identified
  2. When a mutation location changes its position in parse order, its unique id maintains its place within the bit array
  3. When a new mutation location is found, it is assigned the next unique id

This unique position id combined with the bit-array storage creates a consistent genetic representation. In the following encoding, a _ indicates that the corresponding mutation is not possible from this mutant:

TODO: Kevin update this with new representation.

Mutation_1  00
Mutation_2 0_1
Mutation_3  _1

Essentially the genetic representation is an array of bit arrays, where each bit corresponds to a unique id of a mutation location. Keeping track of each unique mutation location allows for a unified global set of states, and allows crossover to be applied within a changing set of possible mutations.
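The unique-id scheme above might be represented as follows (a hypothetical Python sketch; ARC's real data structure may differ). Each operator keeps a map from a globally unique location id to one of three states: `0` (possible, not applied), `1` (applied), or `None` (no longer possible after re-parsing, the `_` in the encoding above):

```python
def make_representation(locations):
    """locations: {operator: [unique location ids]} -> bit-array-like dict."""
    return {op: {loc: 0 for loc in ids} for op, ids in locations.items()}

def apply_mutation(rep, op, loc):
    """Flip the bit for one mutation location (the GA's single action)."""
    rep[op][loc] = 1

def reparse(rep, new_locations):
    """After a mutation, re-parse the mutant: keep the states of ids that
    survive, mark vanished ids as impossible (None), add new ids as 0."""
    for op, ids in new_locations.items():
        old = rep.get(op, {})
        rep[op] = {loc: old.get(loc, 0) for loc in ids}
        for loc, state in old.items():
            if loc not in rep[op]:
                rep[op][loc] = None  # id is kept so the global ordering stays stable
    return rep
```

Because vanished ids are retained as `None` rather than deleted, a location's bit never changes position, which is what makes states from different mutants comparable.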

Mutation Operators

A mutation operator is a function that makes one change to the concurrent code.

TODO: Update operator list

Three mutation operators have been found to fix data races and deadlocks; this is a limited set for now. These three operators add synchronization to the source code and are the functional mutations (correcting the source code). A second set of non-functional mutation operators (improving the performance of the source code) removes unnecessary synchronization. As mentioned above, mutation operators have the effect of flipping a bit in the genetic representation. The mutation operators break down into the following categories.

NOTE: In the source the before and after effects of a mutation look like:

#Original:

  synchronized (lock1) { 
    ...
  }

#Mutated:

  /* MUTANT : "ASAS (Added Sync Around Sync)" */
  synchronized (this) {
    synchronized (lock1) { 
      ...
    }
  }
  /* MUTANT : "ASAS (Added Sync Around Sync)" */

Functional Mutations

  • Synchronize an unprotected shared resource

One cause of a data race is that a shared resource is unprotected. By synchronizing around the shared resource, data races may be fixed.

Original:

obj.write ( var1 );

Mutated:

synchronized ( lock ) { 
    obj.write ( var1 ); 
}
  • Expand synchronization regions to include unprotected source code

Data races can sometimes be caused if the synchronization region does not fully encapsulate access to the shared resources. Expanding the synchronization region may also fix the data race.

Original:

synchronized ( lock ) { 
    obj.write ( var1 ); 
}
obj.write ( var2 );

Mutated:

synchronized ( lock ) { 
    obj.write ( var1 ); 
    obj.write ( var2 );
}
  • Interchange nested lock objects

Deadlocks commonly occur due to the ordering of lock acquisition. By interchanging nested lock objects, such deadlocks can be fixed.

Original:

synchronized ( lock1 ) { 
    synchronized ( lock2 ) { 
        obj.write ( var1 ); 
    }  
}

Mutated:

synchronized ( lock2 ) { 
    synchronized ( lock1 ) { 
        obj.write ( var1 ); 
    }  
}

TODO: Add the rest

Non-Functional

  • Remove unnecessary synchronization regions

Synchronization regions create overhead due to the time required to acquire and release the lock, and delays from waiting for the lock. Removing unnecessary synchronization regions will improve performance.

Original:

synchronized ( lock ) { 
    obj.write ( var1 ); 
}

Mutated:

obj.write ( var1 );
  • Shrink synchronization region

Reducing the number of statements encapsulated in a synchronization region will allow the lock to be released quicker. The less time a thread holds the lock the less thread contention will exist, thus improving performance.

Original:

synchronized ( lock ) { 
    obj.write ( var1 ); 
    obj.write ( var2 );
}

Mutated:

synchronized ( lock ) { 
    obj.write ( var1 ); 
}
obj.write ( var2 );

TODO: Add the rest

Fitness Function

To evolve a fix for concurrency bugs, the proposed fixes must be evaluated and ranked. In the GA a fitness function is used to rank each proposed solution after the mutation step. Separate fitness functions are used for the fixing (functional) and optimization (non-functional) steps.

Functional

The functional fitness function is used to rank GA members in respect to the correctness of the source code:

[functional\ fitness(P) = \sum\limits_{i=1}^{n}\frac{interleavings\ without\ a\ bug_i}{interleavings\ tested_i}] [n = number\ of\ test\ cases]

To ensure a high level of confidence, IBM's ConTest tool is used to evaluate each mutated program. ConTest inserts random delays into the code. These delays increase the chances of unusual thread schedules occurring. By exploring more thread schedules it is possible to explore more of the paths which might lead to a bug.

Non-Functional

The non-functional fitness function is used to rank the mutation in respect to the performance of the source code:

[non-functional\ fitness(P) = \sum\limits_{i=1}^{n}\frac{average\ CPU\ time_i}{interleavings\ tested_i}] [n = number\ of\ test\ cases]

The corrected program can contain unnecessary amounts of synchronization. Performance degrades due to the overhead of lock acquisition and thread contention. Phase 2 attempts to improve performance by shrinking or removing unnecessary synchronization. Note that any loss of correctness is unacceptable during this phase.
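The two fitness functions might be computed as follows (a hypothetical Python sketch; in ARC the per-test-case numbers would come from repeated ConTest runs):

```python
def functional_fitness(test_results):
    """Sum over test cases of (bug-free interleavings / interleavings tested).
    test_results: list of (bug_free_count, tested_count) per test case."""
    return sum(ok / tested for ok, tested in test_results)

def non_functional_fitness(perf_results):
    """Sum over test cases of (average CPU time / interleavings tested).
    Lower is better: phase 2 searches for the fastest correct mutant."""
    return sum(cpu / tested for cpu, tested in perf_results)

# Example: 2 test cases, each run 10 times; the second shows 3 buggy runs.
print(functional_fitness([(10, 10), (7, 10)]))  # 1.7 (a perfect mutant scores n)
```

Note the asymmetry: functional fitness is maximized (its ceiling is the number of test cases), while non-functional fitness is minimized.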

Terminating Condition

There are two terminating conditions for this algorithm. The most desirable is when a fix is found in phase 1 and optimizations that don't impact correctness are found in phase 2. The second is when no improvements in correctness are found: after a set number of generations without progress, the algorithm quits, indicating that no (or minimal) improvements were made.

#State Space

TODO: Kevin update from there to the end of the document with the new non-0s representation.

The state space of this problem is quite interesting due to the changing state situation. Initially the source code is parsed for the set of possible mutation locations. These mutation locations are then used to encode the program into the bit arrays.

For example, consider a situation where there are three mutation operators and three possible mutation locations.

[possible\ locations\ to\ mutate = \sum\limits_{i=1}^{n} bits\ in\ array\ i] [n = number\ of\ mutation\ operators]

There are 9 possible locations that can be mutated, assuming none of the locations has been mutated yet. Since each location's bit can independently be set or unset, all combinations of the mutation locations must be considered:

[possible\ states = 2^{\ possible\ locations\ to\ mutate}]

From this there are 2^9 = 512 possible states, and reaching the state in which every mutation is applied takes 9 mutations, since only one mutation is applied at a time. This assumes that the number of mutation locations does not change after a mutation occurs. This is where the unusualness of the state space comes from: when a mutation occurs, the number of mutation locations may decrease, increase or stay the same.

From this an unusual situation arises: the state space grows or shrinks on every mutation. There is no way to quantify this difference, as a change in the state space can loop back to a previous state. Additionally, it is difficult to bound the number of mutations that can occur (it is possibly infinite).

[all\ possible\ states = \sum\limits_{i=0}^{n} (\ possible\ locations\ to\ mutate \times n\ )^2] [n = number\ of\ mutation\ actions]

#Initial State

The initial state of a program is one without any mutations applied to it. An initial parse of the program yields all the possible mutations and the initial genetic representation, in which each bit of every mutation operator array is set to 0.

#Action

An action in this algorithm is the application of a mutation to the program's source code. A mutation applies a single mutation operator, resulting in a single change with respect to concurrency. After the mutation has been applied, the source code must be re-parsed so that the new genetic representation can be found. The action is reflected in the genetic representation by flipping the corresponding bit.

#Goal

The mutants are evaluated using the two fitness functions defined above (functional and non-functional). When a state has been found that satisfies the functional fitness function (no bugs left, or no progress made in a few generations), it is used as the starting point for finding a state that retains the functional fitness while minimizing the non-functional fitness. The product of this approach is the state that maximizes the functional fitness and minimizes the non-functional fitness.

#Cost

The cost of a solution is the number of mutations required to go from the original state to the goal state. The cost can be improved upon: if a mutation reverses another mutation, the pair cancels out.

[cost\ of\ path = number\ of\ mutations - (\ cancelled\ pairs\ of\ mutations \times 2\ )]

This gives the cost of the shortest path from the initial state to the goal state; a lower-cost path is preferred over a higher-cost one.
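The cost calculation can be sketched as follows (hypothetical Python; it assumes each recorded mutation notes which operator and location it touched and whether it applied or reverted that mutation):

```python
def path_cost(mutations):
    """mutations: list of (operator, location, applied) actions.
    An apply/revert pair on the same spot cancels out, removing 2 from the cost."""
    cancelled_pairs = 0
    pending = {}  # (operator, location) -> count of un-cancelled applies
    for op, loc, applied in mutations:
        key = (op, loc)
        if applied:
            pending[key] = pending.get(key, 0) + 1
        elif pending.get(key, 0) > 0:
            pending[key] -= 1       # this revert cancels an earlier apply
            cancelled_pairs += 1
    return len(mutations) - 2 * cancelled_pairs

# Applying and then reverting OP_2 at spot A contributes nothing to the cost.
print(path_cost([("OP_1", "A", True), ("OP_2", "A", True), ("OP_2", "A", False)]))  # 1
```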
