Commit

Merge branch 'master' of github.com:GaloisInc/mctrace
jtdaugherty committed Jun 27, 2024
2 parents 28c3f6f + e3a1bed commit 8118607
Showing 5 changed files with 108 additions and 116 deletions.
16 changes: 8 additions & 8 deletions DTRACE.md
@@ -2,10 +2,9 @@
Supported DTrace Language Features
==================================

-MCTrace uses the DTrace language as the means for expressing how it
-should modify its input binary. While MCTrace does not implement all of
-the DTrace language, the following core DTrace language features are
-supported:
+MCTrace uses a subset of the DTrace language to allow users to describe
+how input binaries should be instrumented. The following DTrace language
+features are supported:

* Probe descriptions and probe syntax
* Probe name pattern-matching. Supported metacharacters are `*`, `?`,
@@ -26,14 +25,15 @@ supported:
* `long ucaller`

Example DTrace probe scripts demonstrating MCTrace's features can be
-found in `mctrace/tests/eval/` in the MCTrace GitHub repository, as well
-as in the `examples` directory in the release Docker image.
+found in `mctrace/tests/eval/` in the MCTrace repository as well as in
+the `examples` directory in the release Docker image.

MCTrace-specific DTrace extensions
----------------------------------

-In addition to the core language features listed above, MCTrace's
-support for DTrace includes the following additional features:
+In addition to the language features listed above, MCTrace's support for
+DTrace includes the following additional features that are specific to
+MCTrace's version of DTrace:

* A `send(int channel_id)` action for telemetry exfiltration (see
`MCTRACE.md` for details)
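
As a quick illustration of how these pieces fit together, here is a
minimal sketch of a probe script written in the supported subset. It is
not taken from the repository: the probe description, variable name, and
channel number are made up for illustration, so consult the scripts in
`mctrace/tests/eval/` for authoritative examples.

```
/* Count entries to any function whose name starts with "read" and
 * exfiltrate the current set of global variables on channel 1. */
::read*:entry
{
    read_like_calls = read_like_calls + 1;  /* DTrace global variable */
    send(1);                                /* MCTrace-specific action */
}
```
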
94 changes: 47 additions & 47 deletions MCTRACE.md
@@ -3,11 +3,11 @@ Introduction
============

The MCTrace tool enables users to insert instrumentation into binaries
-in order to collect fine-grained tracing information. MCTrace functions
-similarly to DTrace but does not require any operating support (or even
-an operating system). The input format of MCTrace is a subset of the
-DTrace probe script language. Prior knowledge of DTrace concepts and
-terminology is assumed in this document.
+in order to collect fine-grained tracing information about their
+execution. MCTrace functions similarly to DTrace but does not require
+any operating system support (or even an operating system). The input
+format of MCTrace is a subset of the DTrace probe script language. Prior
+knowledge of DTrace concepts and terminology is assumed in this document.

Concept of Operations
=====================
@@ -37,27 +37,26 @@ Using MCTrace requires:
* A DTrace probe script containing the probes that will be used to
modify the provided ELF binary.

-MCTrace works by producing a modified version of its input binary that
-calls Dtrace probes at the points described in the DTrace probe script.
-MCTrace compiles DTrace probe scripts into native code using a compiler
-backend (e.g., LLVM). It then uses binary rewriting to insert the
-generated probes into the binary. Through static analysis, it identifies
-program locations corresponding to DTrace probe providers; at each
-provider site, it inserts calls to the compiled probes.
+MCTrace works by compiling a DTrace probe script into native code using
+a compiler backend (e.g., LLVM). It then uses binary rewriting to
+insert the generated probes into the binary at points indicated by the
+probe script. Through static analysis, it identifies program locations
+corresponding to DTrace probe providers; at each provider site, it
+inserts calls to the compiled probes.

Some supported DTrace language features need access to platform-specific
-functionality such as memory allocation. Since the DTrace code will run
-within the context of the modified binary rather than an operating
-system kernel, MCTrace requires some additional code to provide access
-to such platform-specific features. The MCTrace tool provides the input
-program with access to this platform-specific functionality by way of an
-object file of compiled code called the Platform Implementation. The
-object code that implements the required functions must conform to a set
-of C function prototypes called the Platform API. A complete
-implementation of the Platform API must provide implementations of all
-of the functions the in header file
+functionality such as memory allocation. Since the DTrace code
+will run within the context of the modified binary rather than an
+operating system kernel, MCTrace requires some additional code to
+provide access to such platform-specific features. The MCTrace tool
+provides the input program with access to this platform-specific
+functionality by way of an object file of compiled code called the
+Platform Implementation. The object code that implements the required
+functions must conform to a set of C function prototypes called the
+Platform API. A complete implementation of the Platform API must
+provide implementations of all of the functions in the header file
`mctrace/tests/library/include/platform_api.h` provided in the MCTrace
-GitHub repository (also in `library\include` in the release Docker
+GitHub repository (also in `library/include` in the release Docker
image). Once compiled, the platform API implementation must be provided
to `mctrace` as the `--library` argument when invoking the `mctrace`
tool.
@@ -94,7 +93,7 @@ DTrace global variables (containing only `read_calls` in this case) is
then transmitted as telemetry according to the platform implementation
of `send()`.
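
For reference, a probe script consistent with this description might
look like the following sketch. The actual `probes.d` used in the
example ships with the repository (and in the release image's `examples`
directory); the probe description and the channel number passed to
`send()` below are illustrative assumptions.

```
/* Increment a global counter on each call to read() and transmit the
 * current set of global variables on telemetry channel 1. */
::read:entry
{
    read_calls = read_calls + 1;
    send(1);
}
```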

-Once the probes are written, `mctrace` would be invoked as follows.
+Once the probes are written, `mctrace` would be invoked as follows:

- The `foo` binary and the `probes.d` probe script are provided to
`mctrace` as inputs.
@@ -122,9 +121,9 @@
```

Once the instrumented binary has finished running, the file
-`telemetry.bin` will contain the binary telemetry data corresponding
-to the value of `read_calls` after each invocation of the probe. The
-telemetry data must be decoded:
+`telemetry.bin` in this case will contain the binary telemetry data
+corresponding to the value of `read_calls` after each invocation of the
+probe. The telemetry data must be decoded:

```
$ extractor.py foo.mapping.json --columns < telemetry.bin
```
@@ -155,23 +154,24 @@ MCTrace has the following limitations:
a 32-bit value). As a result, these work best on the PowerPC 32-bit
platform since the argument and return value width match the
architecture.
-- Platform API implementations are subject to the following
-restrictions:
-- The Platform API implementation must be provided as a single object file
-to MCTrace.
-- The Platform API implementation must be self-contained. From a technical
-point-of-view, we require that the `.text` section of the object file
-contains code that does not refer to anything outside of that section.
-In essence this implies that:
-- Functions cannot make use of global variables.
-- Functions in the implementation are allowed to call other functions
-in the object file, but cannot call functions outside of it, including
-functions from the standard library. NOTE: syscalls are
-supported. See the architecture-appropriate version of
-`platform_api.c` for examples.
-- We require that the calls between functions be made via direct relative
-offsets (and not via relocation tables).
-- The precise mechanism to induce a compiler to generate calls using relative
-offsets are somewhat compiler and platform specific, however `gcc`
-on both `x86-64` and PowerPC 32-bit platforms appear to generate such code
-as long as the functions being called have a `static` scope.

+Platform API implementations are subject to the following restrictions:
+
+- The Platform API implementation must be provided as a single object
+file to MCTrace.
+- The Platform API implementation must be self-contained. From a
+technical point of view, this means that the `.text` section of the
+object file contains code that does not refer to anything outside of
+that section. In essence this implies that:
+- Functions cannot make use of global variables.
+- Functions in the implementation are allowed to call other functions
+in the object file, but cannot call functions outside of it,
+including functions from the standard library. Note that system
+calls are supported. See the architecture-appropriate version of
+`platform_api.c` for examples.
+- Calls between functions must be made via direct relative offsets (and
+not via relocation tables).
+- The precise mechanism to induce a compiler to generate calls using
+relative offsets is compiler- and platform-specific; however, `gcc`
+on both `x86-64` and PowerPC 32-bit platforms generates such code
+as long as the functions being called have a `static` scope.
53 changes: 33 additions & 20 deletions README.md
@@ -4,18 +4,18 @@ Introduction

This repository contains the source code and build system for the
MCTrace binary instrumentation tool. The MCTrace tool enables users
-to modify binaries, inserting instrumentation into them in order to
-collect fine-grained tracing information. For information on the
-MCTrace tool's design and usage, please see `MCTRACE.md`. This document
-covers instructions for building MCTrace from source for development or
-releases.
+to modify binaries, inserting instrumentation into them in order
+to collect fine-grained tracing information. This document covers
+instructions for building MCTrace from source for development or
+releases. For information on the MCTrace tool's design and usage, please
+see `MCTRACE.md`.

Building MCTrace
================

MCTrace can be built for one of two purposes: either for local
development in a Haskell build environment, or for release as a Docker
-image. Instructions for each method are detailed below.
+image. Instructions for each method are provided below.

Development Build Instructions
------------------------------
@@ -29,7 +29,8 @@ System requirements for building mctrace are as follows:

To perform a one-time setup of the development environment including the
installation of LLVM, cross compilers, and other required tools, run the
-development setup script:
+development setup script. This script requires `sudo` privileges since
+it installs system packages.

```
./dev_setup.sh
```
@@ -85,13 +86,13 @@ This will build two docker images:

- `mctrace.tar.gz`, a self-contained image that contains MCTrace, its
dependencies, associated tools, and examples. For information on using
-former image, please see `release/README.md`.
+this image, please see `release/README.md`.

- `mctrace-tool.tar.gz`, a minimal image containing just MCTrace and its
-dependencies. A helper script, `release/mctrace` has been provided to
-run the command in a container. Note that paths passed to this script
-should be relative to the root of the repository and paths outside of
-the repository will not accessible.
+dependencies. A helper script, `release/mctrace`, has been provided to
+run the command in a container based on this image. Note that paths
+passed to this script must be relative to the root of the repository
+and paths outside of the repository will not be accessible.

Status Information
------------------
@@ -213,10 +214,10 @@ configurations if there is one already populated for the MPC5777C. This
may happen if you have created other projects.

Select the desired configuration (likely just created), and click on the
-"PEMicro Debugger" tab. For "Interface" select the "USB Multilink..."
+"PEMicro Debugger" tab. For "Interface" select the `USB Multilink...`
option. For port, select the port that the Multilink is connected to.
-Likely some COMX type variant. For "Device Name", be sure "MPC5777C"
-is selected and "Z7_0" for "Core". Default options should work for the
+Likely some COMX type variant. For "Device Name", be sure `MPC5777C`
+is selected and `Z7_0` for "Core". Default options should work for the
rest. Click "Flash".

A similar workflow should be possible by selecting the project,
@@ -241,16 +242,28 @@ Acknowledgements
This material is based upon work supported by the United States Air
Force AFRL/SBRK under Contract No. FA8649-21-P-0293, and by the Defense
Advanced Research Projects Agency (DARPA) and Naval Information Warfare
-Center Pacific (NIWC Pacific) under Contract Number N66001-20-C-4027 and 140D0423C0063.
-Any opinions, findings and conclusions or recommendations expressed in
-this material are those of the author(s) and do not necessarily reflect
-the views of the DARPA, NIWC Pacific, or its Contracting Agent, the U.S. Department of the Interior, Interior Business Center, Acquisition Services Directorate, Division III..
+Center Pacific (NIWC Pacific) under Contract Number N66001-20-C-4027 and
+140D0423C0063. Any opinions, findings and conclusions or recommendations
+expressed in this material are those of the author(s) and do not
+necessarily reflect the views of the DARPA, NIWC Pacific, or its
+Contracting Agent, the U.S. Department of the Interior, Interior
+Business Center, Acquisition Services Directorate, Division III.

SBIR DATA RIGHTS
Contract No. 140D0423C0063
Contractor Name: Galois, Inc.
Contractor Address: 421 SW Sixth Ave., Suite 300, Portland, OR 97204
Expiration of SBIR Data Protection Period: 06/07/2042
-The Government's rights to use, modify, reproduce, release, perform, display, or disclose technical data or computer software marked with this legend are restricted during the period shown as provided in paragraph (b)(5) of the Rights in Noncommercial Technical Data and Computer Software-Small Business Innovation Research (SBIR) Program clause contained in the above identified contract. After the expiration date shown above, the Government has perpetual government purpose rights as provided in paragraph (b)(5) of that clause. Any reproduction of technical data, computer software, or portions thereof marked with this legend must also reproduce the markings.
+
+The Government's rights to use, modify, reproduce, release, perform,
+display, or disclose technical data or computer software marked with
+this legend are restricted during the period shown as provided in
+paragraph (b)(5) of the Rights in Noncommercial Technical Data and
+Computer Software-Small Business Innovation Research (SBIR) Program
+clause contained in the above identified contract. After the expiration
+date shown above, the Government has perpetual government purpose rights
+as provided in paragraph (b)(5) of that clause. Any reproduction of
+technical data, computer software, or portions thereof marked with this
+legend must also reproduce the markings.

(c) 2022-2024 Galois, Inc.
22 changes: 0 additions & 22 deletions dev_setup.sh
@@ -162,27 +162,6 @@ function build_ppc_musl_compiler {
fi
}

-function build_arm_musl_compiler {
-    cd $HERE
-
-    if [ ! -f "$HERE/musl-gcc-arm/output/bin/arm-linux-musleabi-gcc" ]
-    then
-        notice "Cloning and building ARM GCC cross compiler (this will take a while)"
-
-        logged git clone $MUSL_CROSS_MAKE_REPO musl-gcc-arm
-        cd musl-gcc-arm
-        logged git checkout $MUSL_CROSS_MAKE_REF
-
-        logged cp -f config.mak.dist config.mak
-        echo "TARGET = arm-linux-musleabi" >> config.mak
-
-        logged make
-        logged make install
-    else
-        notice "ARM GCC cross compiler already built, skipping"
-    fi
-}

function install_docker {
if [ ! -z "$SKIP_DOCKER" ]
then
@@ -233,7 +212,6 @@ install_ghcup
symlink_cabal_config
update_submodules
build_ppc_musl_compiler
-build_arm_musl_compiler
install_docker

notice "Done."
39 changes: 20 additions & 19 deletions release/README.md
@@ -10,7 +10,7 @@ docker image load -i mctrace.tar.gz
```
docker run -it -w /mctrace-test mctrace
```

-This will drop you into a bash shell within the Docker container in the
+This will drop you into a `bash` shell within the Docker container in the
directory `/mctrace-test` where you can use `mctrace` to instrument
binaries. We discuss the details of the `mctrace` tool in the following
sections. All relative paths mentioned in this document are relative to
@@ -24,22 +24,20 @@ test programs and example probes that can be used to exercise MCTrace.

Two documentation files are available in the top-level directory:

-* `MCTRACE.md` describes the features and limitations of the current
-MCTrace tool.
+* `MCTRACE.md` describes the features and limitations of the MCTrace
+tool.

* `DTRACE.md` describes the subset of the DTRACE language supported by
-the current implementation
+the current implementation.

Important folders are as follows:

* `examples/eval` contains a collection of probes primarily derived
-from those provided to us by WebSensing. The probes have been
-modified slightly after discussions with WebSensing to fit the
-currently supported DTrace syntax in MCTrace.
+from those provided by users.
* `examples/full` contains source code and binaries for bundled test
programs.
* `examples/binaries` contains binaries from a statically compiled
-version of GNU coreutils for use with `mctrace`.
+version of GNU `coreutils` for use with `mctrace`.

Using MCTrace in this demonstration
-----------------------------------
@@ -70,24 +68,27 @@ mctrace instrument --binary=/mctrace-test/examples/full/read-write-syscall-PPC \
- The `--var-mapping` option tells `mctrace` where to record metadata
that allows it to later interpret the collected telemetry.

-The above command instruments the binary with probes that triggers
-at the start and end of the `write` function and computes timing
+The above command instruments the binary with probes that trigger
+at the start and end of the `write` function and compute timing
information for the call. Note that the instrumentation command produces
-a significant amount of DEBUG logs, that can be ignored at the moment.
+a significant amount of log output that may be ignored.
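
For orientation, a probe script along those lines might look like the
sketch below. The bundled script under `examples/eval` is the
authoritative version; the probe descriptions, variable names, and the
use of a `timestamp`-style builtin here are assumptions made for
illustration only.

```
/* Record the time on entry to write() and compute the elapsed time
 * on return, then exfiltrate the globals on channel 1. */
::write:entry
{
    write_entry_time = timestamp;
}

::write:return
{
    write_elapsed = timestamp - write_entry_time;
    send(1);
}
```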

When probes call the DTrace `send` action, the current test
implementation of `send` pushes the set of telemetry variables, in a
-compact binary format, to the standard error. A script `extractor.py`
-has been included with the image to help interpret this data.
+compact binary format, to the process's standard error. A script
+`extractor.py` has been included with the image to interpret this
+data.

To invoke the instrumented binary and use the `extractor.py` script to
decode any emitted telemetry:

/mctrace-test/examples/full/read-write-syscall-PPC.4.inst 2>&1 >/dev/null | \
extractor.py /mctrace-test/examples/full/read-write-syscall-PPC.4.json --extract --big-endian

-NOTE: if running the instrumented binary as above fails for a PPC
-binary, run this through `qemu-ppc` as follows:
+The Docker image's environment is configured to run PPC binaries such
+as this one in a PPC build of QEMU automatically. However, if that
+automatic emulation isn't working for some reason, QEMU can be run
+manually as follows:

qemu-ppc /mctrace-test/examples/full/read-write-syscall-PPC.4.inst 2>&1 >/dev/null | \
extractor.py /mctrace-test/examples/full/read-write-syscall-PPC.4.json --extract --big-endian
@@ -104,17 +105,17 @@ This produces output similar to the following:

- Note that `2>&1 >/dev/null` has the effect of piping the standard
error to the next command while suppressing the standard output of the
-command. We do this because the provided platform API implementations
-writes `send()` data to `stderr` and we need that data to be piped to
-the extractor script.
+command. This is done because the provided platform API
+implementations write `send()` data to `stderr` and the resulting
+telemetry data needs to be piped to the extractor script.

- When extracting telemetry data from instrumented PowerPC binaries, the
flag `--big-endian` must be passed to the extractor script as in the
command above. The flag should be elided when working with `x86_64`
binaries.

- The `extractor.py` script offers a few other conveniences when
-extracting data from instrumented programs; for example it can produce
+extracting data from instrumented programs; for example, it can produce
columnar outputs and filter columns. See `extractor.py --help` for
details on these options.
