v1.8.0-beta1
Pre-release
About beta1
Four months have gone by with zero commits, so it is as good a time as any to release beta1. It extends identifier encoding and adds various forms of identifier mapping using map files similar to those produced by ProGuard. The v1.7.0-to-v1.8.0-beta1 transition was coded in a spree of two months; I brought the code to a releasable state, then I was too burned out to release and document the changes.
Intro
After years of avoiding the ever-present issue of obfuscation, with v1.8.0 DexPatcher-tool gets a new system of transforms that is completely orthogonal to its legacy workings and allows tackling obfuscated code effectively. Rather than imposing 'the right way' of doing things, it supports several different ideas and workflows. To be most useful, these workflows need to be supported in future DexPatcher-gradle plugins, but creative users of the tool have already started to roll their own flows around the new features.
Because of the expansion in scope, v1.8.0 turned out to be a huge release, by far the biggest since the project started. DexPatcher-tool is a 5-year-old project. Still, in the move from v1.7.0 to v1.8.0, its source code comfortably more than doubled in size, an observation that holds for code, comments, test harness, etc., and does not even count the substantial amount of existing code that was changed.
A change of this magnitude should not be released without real-world testing, and this release has had almost no testing done on it. An untested codebase this complex is most likely teeming with all sorts of nasty bugs. It is almost irresponsible of me to move it out of alpha this early, but my motives are:
- I want to communicate that the architecture of the v1.8.0 release as well as its command line interface should finally be stable. What follows should be bug fixes. Lots of them.
- Though I would like to, I do not think I will be able to embark on an app patching project to test this release in the field myself.
- By the same token, testing will have to happen as users take up patching projects, and I do not want to delay future developments until feedback trickles in.
- I exercised extreme caution when modifying existing code, so likely no regressions were introduced in preexisting functionality.
The release is v1.8 instead of v2 because, from the perspective of the patch writer, the patching language has not changed; only better tooling for dealing with obfuscated code is provided. I am reserving v2 for when DexPatcher tool supports patching the actual bytecode inside of methods, instead of simply moving it around or replacing it.
In the months after I coded this release, some new nice-to-have features cropped up, but I decided against including them in v1.8.0. These are:
- A 'hollow' code transform that would create legal but mostly empty skeleton bytecode to be fed to decompilers to create skeleton classes that DexPatcher patch writers can use as templates.
- Switching to dexlib2 v2.4.0, which includes better support for Android 10.
- A convenience code transform to support defining Android 10's `hiddenapi_class_data_item` using Java annotations.
Obfuscation support
This release adds support for processing obfuscated code. It allows patching all bytecode in which illegal identifiers (eg: containing illegal characters or matching reserved words) or clashing identifiers (eg: clashing package and class names) are used.
Support is provided by optional code transforms that deal with identifier coding and mapping.
Identifier decoding transforms
Identifier decoding transforms decode instances of identifier codes found in bytecode. The following new command line options control these transforms:
--decode-source decode identifiers in source
--decode-patches decode identifiers in patches
--decode-output decode identifiers in output
--code-marker <marker> identifier code marker (default: '_$$_')
--no-decode-errors treat decode errors as warnings
Identifier codes can be used to define patches in legal Java that reference illegal identifiers in obfuscated bytecode. The patches can be decoded prior to patching (producing patching diagnostics with obfuscated names) or, alternatively, obfuscated identifiers present in the source bytecode can be encoded before patching and decoded after patching (producing patching diagnostics with encoded names).
A sample patching supposedly obfuscated code using the former strategy is available: see source and patch. Identifier code samples can be found here. The result of running the patched code is here.
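The former strategy (decoding the patches before patching) can be sketched as a command line. This is a minimal sketch that uses only options listed above; the file names are hypothetical, and the commands are printed rather than executed so that the snippet runs without the tool installed.

```shell
# Hypothetical paths; replace with real files.
SOURCE=source.dex
PATCH=patch.dex
OUTPUT=patched.dex

# Decode identifier codes in the patches before patching.
# Patching diagnostics will then show the obfuscated names.
decode_patches_cmd() {
  echo dexpatcher --decode-patches --output "$OUTPUT" "$SOURCE" "$PATCH"
}

# Same, but decode errors are reported as warnings instead of errors.
decode_patches_lenient_cmd() {
  echo dexpatcher --decode-patches --no-decode-errors \
    --output "$OUTPUT" "$SOURCE" "$PATCH"
}

decode_patches_cmd
decode_patches_lenient_cmd
```

Remove the `echo` to actually invoke the tool.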
Identifier encoding transforms
Identifier encoding transforms encode some names found in bytecode as instances of identifier codes. The following new command line options control these transforms:
--code-marker <marker> identifier code marker (default: '_$$_')
--encode-source encode identifiers in source
--encode-map <file> encode map file (repeatable option)
--invert-encode-map use inverse of encode map file
--escape-non-ascii escape non-ASCII characters
--escape-non-latin escape non-ASCII/Latin-1 characters
--no-ascii-escapes do not output ASCII escapes
--no-code-point-escapes do not output code point escapes
--obfuscated-types <ptrn> pattern for binary type names
(form: '[<pkg>/...][<cls>$...]<cls>')
--obfuscated-packages <ptrn> pattern for non-qualified package names
--obfuscated-classes <ptrn> pattern for non-qualified class names
(form: '[<cls>$...]<cls>')
--obfuscated-members <ptrn> pattern for member names
--encode-all-classes encode all class names
--encode-obfuscated-packages encode obfuscated package names
--encode-obfuscated-classes encode obfuscated class names
--encode-obfuscated-members encode obfuscated member names
--encode-reserved-chars encode names with reserved characters
--encode-reserved-words encode names matching reserved words
--encode-class-hints encode type hints in classes
--encode-member-hints encode type hints in members
--encode-member-type encode member type in members
--no-identifier-type do not encode identifier type
--no-multiple-hints only allow unique type hints
--no-nested-classes disable nested class processing
--ignored-hint-type <type> fully qualified name of type
(use '-' to remove defaults)
(repeatable option)
--ignored-hint-types <ptrn> pattern for binary type names
(form: '[<pkg>/...][<cls>$...]<cls>')
--encode-compilable allow recompile of obfuscated code
Identifier encoding can be used to make obfuscated identifiers legal before patching and restore the obfuscation afterwards, with the patch code and patching diagnostics referring to the legal encoded names exclusively. Encoding can also help with code analysis by assigning descriptive encoded names to obfuscated names based on automatic bytecode analysis and/or identifier map files that the user provides (which would typically be generated using external AI deobfuscation tools). The encoded bytecode is typically fed to interactive viewers and decompilers, presenting the user with a unified encoded view of the codebase being worked on.
Several identifier encoding samples are available: by encode map (source, encode map, output), by pattern matching (source, output parts 1 and 2), by double encoding (source, output), and by character escaping (source, output). In these samples the encoding is not reverted after patching to allow dumping the generated bytecode for documentation purposes, but in real-world usage a decode transform would typically revert all encodings before the output is generated.
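The two uses of encoding described in this section might look as follows on the command line. This is a sketch under assumptions: the file names (including the `names.map` encode map) are hypothetical, the particular combination of `--encode-*` selectors is only one plausible choice, and the commands are printed rather than executed.

```shell
# Hypothetical paths; replace with real files.
SOURCE=source.dex
PATCH=patch.dex
OUTPUT=patched.dex
ENCODE_MAP=names.map
ENCODED_VIEW=encoded.dex

# Round trip: encode illegal names in the source, patch against the
# encoded names, then decode the output to restore the obfuscation.
encode_round_trip_cmd() {
  echo dexpatcher \
    --encode-source --encode-reserved-chars --encode-reserved-words \
    --decode-output \
    --output "$OUTPUT" "$SOURCE" "$PATCH"
}

# Analysis view: encode the source using a user-provided encode map and
# write the encoded bytecode (no patches) for viewers and decompilers.
encoded_view_cmd() {
  echo dexpatcher --encode-source --encode-map "$ENCODE_MAP" \
    --output "$ENCODED_VIEW" "$SOURCE"
}

encode_round_trip_cmd
encoded_view_cmd
```

Whether additional selectors (for example the `--obfuscated-*` patterns) are needed depends on the obfuscation at hand; their pattern syntax is not covered by this sketch.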
Identifier mapping transforms
Identifier mapping transforms map or unmap names found in bytecode according to map files that the user provides. The following new command line options control these transforms:
--map-source apply map to identifiers in source
--unmap-source apply map inverse to identifiers in source
--unmap-patches apply map inverse to identifiers in patches
--unmap-output apply map inverse to identifiers in output
--map <file> identifier map file (repeatable option)
--invert-map use inverse of identifier map file
--compose-map <file> compose map file (repeatable option)
--invert-compose-map use inverse of compose map file
Identifier mapping can be used to make obfuscated identifiers legal before patching and restore the obfuscation afterwards, with the patch code and patching diagnostics referring to the legal mapped names exclusively. Mapping helps with code analysis by assigning descriptive names to obfuscated names based on user-provided map files. The mapped bytecode is then typically fed to interactive viewers and decompilers, presenting the user with a unified deobfuscated view of the codebase being worked on. The map files are typically created by hand based on an interactive 'map/decompile/update map' cycle, but could also be generated using external AI deobfuscation tools. Or hand-made maps could be composed over auto-generated maps, so that newer versions of the obfuscated code being worked on (with a completely different obfuscation) can reuse the hand-made map as long as the AI tools can provide a stable enough base for it to map from.
A sample patching supposedly obfuscated code using map files is available: see source, map, composing map, and patch. After forward mapping, patching happens in the mapped realm. Then reverse mapping restores obfuscation, and running the patched code confirms that obfuscation is present.
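The forward-map, patch, reverse-map flow described above can be sketched as a single invocation. The file names are hypothetical and the commands are only printed; map composition (`--compose-map`) is left out because the roles of the two map files are not spelled out here.

```shell
# Hypothetical paths; replace with real files.
SOURCE=source.dex
PATCH=patch.dex
OUTPUT=patched.dex
MAP=app.map
MAPPED_VIEW=mapped.dex

# Map the source, patch in the mapped realm, then unmap the output
# so that the original obfuscation is restored.
map_patch_unmap_cmd() {
  echo dexpatcher --map-source --unmap-output --map "$MAP" \
    --output "$OUTPUT" "$SOURCE" "$PATCH"
}

# Produce a mapped (deobfuscated) view of the source, without patches,
# to feed to interactive viewers and decompilers.
mapped_view_cmd() {
  echo dexpatcher --map-source --map "$MAP" \
    --output "$MAPPED_VIEW" "$SOURCE"
}

map_patch_unmap_cmd
mapped_view_cmd
```

The second command fits the 'map/decompile/update map' cycle: rerun it after each edit of the map file and reload the result in the decompiler.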
Map file templates
As a convenience, the tool can generate a map file template for all identifiers present in the output bytecode when invoked with the following new command line option:
--create-map <file> create template map file based on output
This is typically invoked out-of-workflow to create a starting point for a map file that will be hand-generated during development. A sample generated template is available here.
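A template might be generated with an invocation like the one below. This is a sketch: the file names are hypothetical, the command is printed rather than executed, and passing `--output` to a scratch file is an assumption, since it is not stated here whether the template is also written when no output file is produced.

```shell
# Hypothetical paths; replace with real files.
SOURCE=source.dex
TEMPLATE=template.map
SCRATCH=scratch.dex

# Run the tool on the source alone (no patches) and ask for a map
# template covering the identifiers present in the output.
create_map_template_cmd() {
  echo dexpatcher --create-map "$TEMPLATE" --output "$SCRATCH" "$SOURCE"
}

create_map_template_cmd
```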
Anonymous class support
Special thanks to @andrewleech whose work inspired these changes.
This release adds support for processing anonymous classes. It allows easier patching of existing anonymous classes, and also defining new anonymous classes in patches without risk of them name-clashing with the existing ones.
Support is provided by optional type renaming code transforms that deanonymize and reanonymize classes. The following new command line options control these transforms:
--deanon-source deanonymize anonymous classes in source
--deanon-source-alt deanonymize source with alternate plan
--deanon-patches deanonymize anonymous classes in patches
--deanon-patches-alt deanonymize patches with alternate plan
--reanon-source reanonymize anonymous classes in source
--reanon-patches reanonymize anonymous classes in patches
--reanon-output reanonymize anonymous classes in output
--main-plan <anon-plan> main anonymization plan (default: 'Anon[_]')
--alt-plan <anon-plan> alternate plan (default: '[_]_patch')
--no-reanon-errors treat reanonymize errors as warnings
The `<anon-plan>` argument is a string that represents an anonymous class renaming template and is described here. Sample code involving anonymous classes is available: see source and patch.
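One plausible combination of these options is sketched below: the source is deanonymized with the main plan, the patches with the alternate plan so that their anonymous classes cannot clash with those of the source, and the output is reanonymized. The combination itself is an assumption rather than a documented recipe, the file names are hypothetical, and the command is printed rather than executed.

```shell
# Hypothetical paths; replace with real files.
SOURCE=source.dex
PATCH=patch.dex
OUTPUT=patched.dex

# Deanonymize source (main plan) and patches (alternate plan),
# then reanonymize the patched output.
deanon_patch_reanon_cmd() {
  echo dexpatcher \
    --deanon-source --deanon-patches-alt --reanon-output \
    --output "$OUTPUT" "$SOURCE" "$PATCH"
}

deanon_patch_reanon_cmd
```

The plans are left at their defaults here; `--main-plan` and `--alt-plan` can override them.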
Pre-transform stages
Code transforms run on-demand as pieces of bytecode are internally operated upon. This means that it is not required to keep models of the transformed bytecode in memory, and thus memory usage is low and mostly independent of bytecode size. However, as the transforms execute spread out in time, their log output will also be spread throughout the patching process, making the logs difficult to analyze.
To mitigate this effect, this release adds optional pre-transform stages to the processing pipeline that fully run a transform or set of transforms ahead of time to ensure their side effects occur before or during a particular stage, but never after. These stages are configured with the following new command line option:
--pre-transform <set> add pre-transform stages (default: 'out')
(<set>: 'none'|'dry'|'out'|'inout'|'all')
The added pre-transform stages depend on the chosen set:
- `all`: a pre-transform stage is added after each transform is applied. Each transform produces a completely separate log output. This is the slowest mode and is only recommended for debugging transform issues.
- `inout`: a pre-transform stage is added per input dex file after the set of transforms corresponding to each file is applied. A pre-transform stage is also added after the set of output transforms is applied to the result of the patching stage, but before the output dex file is written, if any. Each pre-transform stage produces a separate log output for each input and output dex (even if no output file is written). Within each stage, the logs of the corresponding transforms will be spread out. This is the fastest mode that will detect and log all input transform errors, including those errors that occur in input sections that will be discarded during the patch process.
- `out`: a single pre-transform stage is added to the pipeline, right after the set of output transforms is applied to the result of the patching stage, but before the output dex file is written, if any. All transform side effects and their output will be spread throughout the patching process, but they are guaranteed to occur before the output dex file is written. This is the fastest mode that will detect all errors in the tool's processing pipeline before the output file is created, and thus is the fastest mode that guarantees that no output file is generated in case of errors. This is the default mode.
- `dry`: a single pre-transform stage is added to the pipeline, right after the set of output transforms is applied to the result of the patching stage, but only if a dry run is detected (ie: no output dex file will be written). Dry runs may be explicitly requested, or a normal run may become dry if an error is detected before the output file is created. Note that transform errors may still occur after that point. This is the fastest mode that will detect all errors in the tool's processing pipeline that would occur if an output dex file were generated, even if it is not. No mode is faster for successful, non-dry runs. This mode is recommended for use with the DexPatcher Gradle plugins.
- `none`: no pre-transform stages are added to the pipeline. Transform errors may be masked during dry runs, as large sections of bytecode may not be fully processed during those runs. Conversely, transform errors may occur after the output dex file is written during non-dry runs. This mode is recommended for embedded uses of the DexPatcher tool where only a success/failure exit status is required.
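The recommendations above can be condensed into a small helper that picks a `--pre-transform` set by purpose. The purpose labels (`debug`, `inputs`, `gradle`, `embedded`) and file names are hypothetical names introduced for this sketch, and the command is printed rather than executed.

```shell
# Hypothetical paths; replace with real files.
SOURCE=source.dex
PATCH=patch.dex
OUTPUT=patched.dex

# Map a purpose (labels invented for this sketch) to a pre-transform set.
pre_transform_for() {
  case "$1" in
    debug)    echo all ;;    # separate log per transform, slowest
    inputs)   echo inout ;;  # detect and log all input transform errors
    gradle)   echo dry ;;    # recommended for the Gradle plugins
    embedded) echo none ;;   # only the exit status matters
    *)        echo out ;;    # the tool's default
  esac
}

# Print the patching command for a given purpose.
patch_cmd() {
  echo dexpatcher --pre-transform "$(pre_transform_for "$1")" \
    --output "$OUTPUT" "$SOURCE" "$PATCH"
}

patch_cmd debug
patch_cmd gradle
```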
About 'release' patches
Many transforms could be used during patch development to handle code obfuscation and for other reasons. These are typically applied (JIT or AOT) to the code being patched before the patch is applied. Patch development happens under the transformed view of the code, including patch writing and patch application (providing transformed diagnostics). Further down the workflow pipeline, after patching and before code generation, transforms are typically reverted to preserve the original code obfuscation and other characteristics.
This is fine for development, but distributing such a development patch is awkward: it requires accompanying the patch with transform metadata in the form of complex command lines, map files, etc, in order for it to apply as designed to the targeted code. Such distribution may also reveal confidential details about the targeted code or the patch itself.
A solution to this problem is to design the workflow pipeline so that all necessary transforms can be pre-applied to release versions of the patch. Such patches can be applied to the targeted code using a standard command line and no additional metadata.
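The difference shows up in the command line needed to apply each kind of patch. The sketch below contrasts the two, using identifier mapping as the example transform; how the release patch is produced is outside its scope, the file names are hypothetical, and the commands are printed rather than executed.

```shell
# Hypothetical paths; replace with real files.
SOURCE=source.dex
OUTPUT=patched.dex
DEV_PATCH=patch-dev.dex
RELEASE_PATCH=patch-release.dex
MAP=app.map

# Development patch: needs the transform options and the map file.
apply_dev_patch_cmd() {
  echo dexpatcher --map-source --unmap-output --map "$MAP" \
    --output "$OUTPUT" "$SOURCE" "$DEV_PATCH"
}

# Release patch: transforms were pre-applied, so a standard command
# line with no additional metadata is enough.
apply_release_patch_cmd() {
  echo dexpatcher --output "$OUTPUT" "$SOURCE" "$RELEASE_PATCH"
}

apply_dev_patch_cmd
apply_release_patch_cmd
```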
The automated tests run during DexPatcher-tool builds used to involve a single invocation of the tool. Now the tool is invoked twice, effectively running all tests a second time as if a release patch had been generated and then applied to the untransformed targeted code. (The release patch file itself is not actually generated for efficiency reasons, but operations do happen on an internal representation of it.) Finally, after independently applying the development and release patches to the targeted test code, the resulting bytecode files are compared bit-for-bit as an extra correctness verification for the system.
Using code transforms with the DexPatcher Gradle plugins
The DexPatcher Gradle plugins have not yet been updated to support the DexPatcher tool's new command line options. However, code transforms can still be used by manually adding the required command line options to the `extraArgs` properties of Gradle tasks of type `lanchon.dexpatcher.gradle.tasks.DexpatcherTask`.
Other changes
- Retention policy of DexPatcher annotations changed from `CLASS` to `RUNTIME`. This change is required for compatibility with the new `r8`/`d8` dexers. Contrary to the old `dx`, these dexers remove annotations with `CLASS` retention, presumably because it is expected that all bytecode processing happens before dexing. Or, who knows, it could be a bug. Note that DexPatcher always strips its annotations during processing, so this change does not affect the bytecode produced by the tool.
- This release fixes a bug existing since the introduction of cross-class edits in v1.5.0, which involves arrays of the type being rewritten. This bug stems from a sample code bug in the documentation of smali's dexlib2 library. More info here.
- Minor command line changes:
  - The `-d` option is now accepted as a short for `--debug`.
  - The tool will no longer print usage information when it cannot parse the command line.
- Client API change:
  - The builder pattern is now used to construct core `Context` objects.
- Changes to the build system:
  - Reproducible builds.
  - Gradle updated from v2 to v6.
  - Switched to the `java-library` plugin.
  - Shadow plugin updated from v1 to v5.
  - Deduplication of resources.
  - About info moved to `META-INF/about`.
  - Rewritten tests:
    - Locally installed tools are no longer required to run tests:
      - Required tools are dynamically downloaded.
      - All tests now run in build servers.
    - Switched dexer from `dx` to `d8`.
    - Improved output of test results.
  - New build diagnostics.
  - Build code sharing with other DexPatcher projects.
  - Signed artifacts.
  - Publish artifacts to local and remote repositories.
    - Compatible with Sonatype and JitPack.
  - Added Travis CI setup with full testing.
$ dexpatcher --help
DexPatcher version 1.8.0-beta1 by Lanchon (https://dexpatcher.github.io/)
usage: dexpatcher [<option> ...] [--output <patched-dex-or-dir>]
<source-dex-apk-or-dir> [<patch-dex-apk-or-dir> ...]
main options:
-a,--api-level <n> android api level (default: auto-detect)
-m,--multi-dex enable multi-dex support
-M,--multi-dex-threaded multi-threaded multi-dex (implies: -m)
-J,--multi-dex-jobs <n> multi-dex thread count (implies: -m -M)
(default: available processors up to 4)
--max-dex-pool-size <n> maximum size of dex pools (default: 65536)
--annotations <package> package name of DexPatcher annotations
(default: 'lanchon.dexpatcher.annotation')
--no-auto-ignore no trivial default constructor auto-ignore
-o,--output <dex-or-dir> name of output file or directory
--create-map <file> create template map file based on output
--dry-run do not write output files (much faster)
-q,--quiet do not output warnings
-v,--verbose output extra information
-d,--debug output debugging information
-p,--path output relative paths of source code files
-P,--path-root <root> output absolute paths of source code files
--stats output timing statistics
-h,--help print this help message and exit
--version print version information and exit
code transform options:
--map-source apply map to identifiers in source
--unmap-source apply map inverse to identifiers in source
--unmap-patches apply map inverse to identifiers in patches
--unmap-output apply map inverse to identifiers in output
--map <file> identifier map file (repeatable option)
--invert-map use inverse of identifier map file
--compose-map <file> compose map file (repeatable option)
--invert-compose-map use inverse of compose map file
--deanon-source deanonymize anonymous classes in source
--deanon-source-alt deanonymize source with alternate plan
--deanon-patches deanonymize anonymous classes in patches
--deanon-patches-alt deanonymize patches with alternate plan
--reanon-source reanonymize anonymous classes in source
--reanon-patches reanonymize anonymous classes in patches
--reanon-output reanonymize anonymous classes in output
--main-plan <anon-plan> main anonymization plan (default: 'Anon[_]')
--alt-plan <anon-plan> alternate plan (default: '[_]_patch')
--no-reanon-errors treat reanonymize errors as warnings
--decode-source decode identifiers in source
--decode-patches decode identifiers in patches
--decode-output decode identifiers in output
--code-marker <marker> identifier code marker (default: '_$$_')
--no-decode-errors treat decode errors as warnings
--pre-transform <set> add pre-transform stages (default: 'out')
(<set>: 'none'|'dry'|'out'|'inout'|'all')
identifier encode options:
--encode-source encode identifiers in source
--encode-map <file> encode map file (repeatable option)
--invert-encode-map use inverse of encode map file
--escape-non-ascii escape non-ASCII characters
--escape-non-latin escape non-ASCII/Latin-1 characters
--no-ascii-escapes do not output ASCII escapes
--no-code-point-escapes do not output code point escapes
--obfuscated-types <ptrn> pattern for binary type names
(form: '[<pkg>/...][<cls>$...]<cls>')
--obfuscated-packages <ptrn> pattern for non-qualified package names
--obfuscated-classes <ptrn> pattern for non-qualified class names
(form: '[<cls>$...]<cls>')
--obfuscated-members <ptrn> pattern for member names
--encode-all-classes encode all class names
--encode-obfuscated-packages encode obfuscated package names
--encode-obfuscated-classes encode obfuscated class names
--encode-obfuscated-members encode obfuscated member names
--encode-reserved-chars encode names with reserved characters
--encode-reserved-words encode names matching reserved words
--encode-class-hints encode type hints in classes
--encode-member-hints encode type hints in members
--encode-member-type encode member type in members
--no-identifier-type do not encode identifier type
--no-multiple-hints only allow unique type hints
--no-nested-classes disable nested class processing
--ignored-hint-type <type> fully qualified name of type
(use '-' to remove defaults)
(repeatable option)
--ignored-hint-types <ptrn> pattern for binary type names
(form: '[<pkg>/...][<cls>$...]<cls>')
--encode-compilable allow recompile of obfuscated code