Skip to content

Writing Analyses

Daniil Tiganov edited this page Dec 14, 2021 · 1 revision

This page describes how to write analyses for SWAN. We expect those who want to write analyses for SWAN in its current state to be familiar with static analysis. However, we provide some requisite basic information here, too.

Note: We use the terms method and function interchangeably here and throughout SWAN because all Swift methods are represented as functions in SWAN.

SWAN uses Synchronized Pushdown Systems (SPDS). Therefore, to write more advanced analyses for SWAN, you will need to be familiar with static analysis concepts and SPDS. The following resources should be helpful in learning more about SPDS.

The implementation of SPDS is called Boomerang. Boomerang has two types of on-demand queries: forward (where does a value flow to?) and backward (where does a value come from?). You can use it for taint analysis, which finds dataflow from sources (user-controlled information) to sinks (places where user-controlled information can potentially have negative effects if not properly sanitized with sanitizers). "Do values returned from method X (source) ever flow to method Y (sink) without first flowing through method Z (sanitizer)?"

The implementation of Weighted Pushdown Systems (WPDS) is called IDEal. This is basically an analysis driver for Boomerang. It is designed for typestate analysis, which tracks the state of a value through the program. You can define state transitions with a finite state machine, where edges represent method calls on the value.

The analysis is on-demand and not whole-program. Therefore, you must know exactly what you are looking for in the program when writing an analysis. Canonical SWIRL converts 1:1 to SPDS rules, and therefore we use SWIRL mutations in coordination with Boomerang queries to achieve our desired analysis.

Use driver.jar to analyze the SIL in the swan-dir/. You can use -h to view the driver options.

Taint Analysis

We'll use the following Swift example for this section (based on tests/examples/Simple).

1: func source() -> String { return "I'm tainted" }
2: func sink(sunk: String) { print(sunk) }
3: let sourced = source()
4: sink(sunk: sourced)

Use the -t option and provide a taint analysis JSON specification file like the following:

[
  {
    "name": "testing",
    "description": "",
    "advice": "",
    "sources": [
      {
        "name": "test.source() -> Swift.String",
        "description": "generic source"
      }
    ],
    "sinks": [
      {
        "name": "test.sink(sunk: Swift.String) -> ()",
        "description": "generic sink"
      }
    ],
    "sanitizers": [
      {
        "name": "test.sanitizer(tainted: Swift.String) -> Swift.String",
        "description": "generic sanitizer"
      }
    ]
  }
]
java -jar driver.jar -t taint-spec.json swan-dir/

JSON Specification Format

The specification JSON is an array of specifications of the following form:

  • name: Name of the specification.
  • description: What issues the specification is supposed to find.
  • advice: How to fix issues the specification finds.
  • sources: Array
    • name: Method name of the source.
    • description: Why this method is a source.
    • regex: (optional) true if you want to use regex for the source name.
      • See tests/testing-schemes/custom-spec-regex for an example.
  • sinks: Array
    • name: Method name of the sink.
    • description: Why this method is a sink.
    • regex: (optional) true if you want to use regex for the sink name.
    • args: (optional) Integer array specifying which arguments should not come from sources.
      • See tests/testing-schemes/custom-spec-args for an example.
  • sanitizers: (optional) Array Only works with path-tracking enabled (-p).
    • name: Method name of the sanitizer.
    • description: Why this method is a sanitizer.

If you want to use the full name of a method (no regex), you can dump all full function signatures to swan-dir/function-names.txt using -n with the driver.

Analysis Results

SWAN will dump the analysis results as a JSON to swan-dir/taint-results.json. We expect that future user interfaces will consume this JSON to display warnings to the user.

[
  {
    "name": "testing",
    "description": "",
    "advice": "",
    "paths": [
      {
        "source": {
          "name": "test.source() -> Swift.String",
          "description": "generic source"
        },
        "sink": {
          "name": "test.sink(sunk: Swift.String) -> ()",
          "description": "generic sink"
        },
        "path": [
          "test.swift:3:15",
          "test.swift:4:12"
        ]
      }
    ]
  }
]

Typestate Analysis

SWAN provides two methods of writing typestate analysis: JSON-only and JSON + code.

JSON-only

We will use the following Swift example for this section (based on tests/typestate/file-open-close).

1: class File {
2:   init() {}
3:   func open() {}
4:   func close() {}
5: }
6: func foo() {
7:   let f = File()
8:   f.open()
9: }

Use the -e option and provide a typestate analysis JSON specification file like the following:

[
  {
    "style": 0,
    "name": "FileOpenClose",
    "type": "File",
    "class": true,
    "description": "Not closing file resources can cause resource leaks.",
    "advice": "Close file resources.",
    "states": [
      {
        "name": "INIT",
        "error": false,
        "initial": true,
        "accepting": true
      },
      {
        "name": "OPENED",
        "error": true,
        "message": "file left open",
        "initial": false,
        "accepting": false
      },
      {
        "name": "CLOSED",
        "error": false,
        "initial": false,
        "accepting": true
      }
    ],
    "transitions": [
      {
        "from": "INIT",
        "method": ".*File.open.*",
        "param": "Param1",
        "to": "OPENED",
        "type": "OnCall"
      },
      {
        "from": "INIT",
        "method": ".*File.close.*",
        "param": "Param1",
        "to": "CLOSED",
        "type": "OnCall"
      },
      {
        "from": "OPENED",
        "method": ".*File.close.*",
        "param": "Param1",
        "to": "CLOSED",
        "type": "OnCall"
      }
    ]
  }
]

java -jar driver.jar -e typestate-spec.json swan-dir/

You can check out this file to see the state machine on which the above specification is based. You can also look at Example 26 starting on page 94 of Johannes Späth's dissertation on SPDS.

JSON-only Specification Format

The specification JSON is an array of specifications of the following form:

  • style: 0 for JSON-only format.
  • name: Name of the specification.
  • type: The type of the values that will hold the state (supports regex).
  • class: (optional) Whether the type is a class. If this is off and the type is a class, the typestate analysis will not be object sensitive. This is because this triggers the analysis to attach to the result of the constructor function rather than the new instruction result.
  • description: What issues the specification is supposed to find.
  • advice: How to fix issues the specification finds.
  • states: Array that defines states in the finite-state machine.
    • name: Name of the state.
    • error: Whether it is an error state.
    • initial: Whether it is an initial state.
    • accepting: For our purposes, it is the inverse of error.
    • message: (only required when error is true) Why the state is an error state.
    • severity: (optional for error states) How severe this issue is (higher is more severe).
  • transitions: Array that defines state transitions in the finite-state machine.
    • from: Name of the from state (corresponds to a state in states).
    • method: Name of the method which activates the state transition (supports regex).
    • param: Which paramater of the method call represents the state value (This, Param1, or Param2). SWAN doesn't ever use This because methods are functions and therefore there is no "this" value.
    • to: Name of the to state (corresponds to a state in states).
    • type: The type of transitions (OnCall, OnCallToReturn, OnCallOrOnCallToReturn, or None).

JSON + code

Sometimes a JSON-only configuration is not enough. Some complicated typestate analyses may have many transitions and be very tedious to write in JSON. You may also want to have conditional transitions that require querying method call arguments. Therefore, SWAN provides a way to write a programmatic typestate analysis and complement it with a JSON configuration. This method is far more advanced than the JSON-only option because it requires writing code that interfaces with SPDS and SWAN.

This section will examine the typestate analysis that SWAN provides for finding inefficient configurations of the Visits Location Service. You can find more information about this analysis here. You can find the tests for this analysis in tests/typestate/energy/visits-location-service and the analysis itself in jvm/ca.ualberta.maple.swan.spds/src/scala/ca/ualberta/maple/swan/spds/analysis/typestate/statemachines/VisitsLocationService.scala.

In short, the Visits Location Service is enabled by calling startMonitoringVisits on a CLLocationManager. Then, you can set the activityType, which can be configured non optimally (this is what we want to detect).

We will attach the state to CLLocationManager values and change the state based on the activityType. The activityType can be configured without calling startMonitoringVisits(). Even if the activity type is nonoptimal, we should not report this as an error because the user never started the service.

In the following example, there is an error on line 4 because CLActivityType.fitness is nonoptimal.

1: func test_fitness() {
2:   let locationManager = CLLocationManager()
3:   locationManager.startMonitoringVisits()
4:   locationManager.activityType = CLActivityType.fitness
5: }

In SIL/SWIRL, the activity type is set using a black-box method.

  %7 = new $`CLActivityType`
  %7i0 = literal [string] "#CLActivityType.fitness!enumelt", $`Builtin.RawPointer`
  field_write %7i0 to %7, type
  %8 = builtin_ref @`#CLLocationManager.activityType!setter.foreign`, $`@convention(objc_method) (CLActivityType, CLLocationManager) -> ()`
  %9 = apply %8(%7, locationManager), $`()`

We only want to change the state of locationManager if %7 holds a nonoptimal value. A JSON-only configuration would not work here because we need conditional transitions. We could try using a backwards query on %7 to check if we should transition, but we quickly realize that we're actually interested in %7.type (another reason why JSON-only doesn't work). We can't simply model #CLLocationManager.activityType!setter.foreign to read the type field of the first argument because we would lose the calling context. We can instead replace the call to the function with our own function SWAN.CLLocationManager.setActivityType and give our function the actual activity type string.

%11 = new $`CLActivityType`
%11i0 = literal [string] "#CLActivityType.fitness!enumelt", $`Builtin.RawPointer`
field_write %11i0 to %11, type
%12 = builtin_ref @`#CLLocationManager.activityType!setter.foreign`, $`@convention(objc_method) (CLActivityType, CLLocationManager) -> ()`
%12m0 = field_read %11, type, $`Builtin.RawPointer` // <-- GET VALUE WE REALLY WANT HERE
%12m1 = function_ref @`SWAN.CLLocationManager.setActivityType`, $`Any`
%13 = apply %12m1(locationManager, %12m0), $`()`

You can see the model for SWAN.CLLocationManager.setActivityType in jvm/ca.ualberta.maple.swan.drivers/src/resources/models.swirl and the mutation code in ca/ualberta/maple/swan/ir/canonical/Mutations.scala.

Now we can set the state transition method to be SWAN.CLLocationManager.setActivityType and query the second argument for allocation sites. If the allocation site is a string literal with the value #CLActivityType.fitness!enumelt, then we can do the transition. Therefore, we have modified the source code to detect which activity type the user is setting and transition conditionally based on that information.

There are five different types of activity types, 4 of which or nonoptimal, and the default type is nonoptimal. We create a state for each type and then create transitions from every transition to every other transition. Nonoptimal configurations are not errors until startMonitoringVisits() is called on the location manager, which can happen at any time. Therefore, we create two levels of states: active and non-active. All non-active states are non-error states. When startMonitoringVisits() is called, we elevate the current non-active state to an active state through a non-conditional (simple) transition. The current state is now possibly an error state.

Programmatic typestate analysis go in jvm/ca.ualberta.maple.swan.spds/src/scala/ca/ualberta/maple/swan/spds/analysis/typestate/statemachines and they need to be added to TypeStateJSONProgrammaticSpecification.make() in jvm/ca.ualberta.maple.swan.spds/src/scala/ca/ualberta/maple/swan/spds/analysis/typestate/TypeStateSpecification.scala.

JSON Specification

We must also provide a complementary JSON configuration to activate the typestate and also provide text information about the typestate errors.

[
  {
    "style": 1,
    "name": "VisitsLocationService",
    "description": "Certain configurations of the Visits Location Service are not optimal.",
    "advice": "Use the airborne configuration (most optimal).",
    "states": [
      {
        "name": "#CLActivityType.automotiveNavigation!enumelt (active)",
        "message": "automotiveNavigation is nonoptimal.",
        "severity": 2
      },
      {
        "name": "#CLActivityType.fitness!enumelt (active)",
        "message": "fitness is very nonoptimal.",
        "severity": 3
      },
      {
        "name": "#CLActivityType.other!enumelt (active)",
        "message": "other, the default Visits Location Service configuration, is nonoptimal."
        ,
        "severity": 2
      },
      {
        "name": "#CLActivityType.otherNavigation!enumelt (active)",
        "message": "otherNavigation is very nonoptimal, especially when used in the background."
        ,
        "severity": 3
      }
    ]
  }
]

The specification JSON is an array of specifications of the following form:

  • style: 1 for programmatic format.
  • name: Name of the specification.
  • description: What issues the specification is supposed to find.
  • advice: How to fix issues the specification finds.
  • states: Array that defines information about the states in the finite-state machine.
    • name: Name of the state (must correspond to the toString() of programmatically defined states).
    • message: Why the state is an error state.
    • severity: (optional) How severe this issue is (higher is more severe).

Analysis Results

SWAN will dump the analysis results as a JSON to swan-dir/typestate-results.json.

[
  {
    "name": "VisitsLocationService",
    "description": "Certain configurations of the Visits Location Service are not optimal.",
    "advice": "Use the airborne configuration (most optimal).",
    "errors": [
      {
        "pos": "test.swift:4:32",
        "message": "fitness is very nonoptimal.",
        "severity": 3
      }
    ]
  }
]