-
-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Bytecode reachability analysis for fine-grained target invalidation #2417
Conversation
…hods or perform field operations on a class
…n quickly and subprocess overhead dominates
@lefou I think this PR is more or less ready to merge. I've tried to flesh out the PR description with all the information necessary, including debugging information of how to understand what's happening in the analysis and why specific methods are getting invalidated. I've also done a bunch of manual testing and it seems to do what it's supposed to, given the known limitations of the approach. Even with the limitations, it should be a strict improvement over the status quo which just invalidates everything in the common case of a single It's plausible we'll hit unexpected edge cases once people start using it, but I think that pre-merge testing is reaching it's limits and we'll need to roll this out to see if people hit any such issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I really like the large set of tests and your inline comments. Without it I'd be pretty lost, tbh. I'd say lets merge and test it in the wild. Given the complexity of the topic and the amount of code, we should be prepared to seen some glitches, but thanks to the well prepared test setup, we can include reproducers pretty easily and also have the option to fall back to coarse grained invalidation as a last resort. So, yeah, let's give your great work a go.
I think we should merge this with a merge commit to keep the progress in the history. It might help in understanding the approach. |
I don't think I've kept the history clean enough for a merge commit; there are 171 commits too many of which are meaningless WIP. The actual code is pretty small (~1000 lines), and hopefully I've inline-commented most of the things worth taking note of |
As long as Mill is hosted on Github you can still see the commit history via the pull request history, even if it's not on the main branch. |
You've named the issue. ;-)
Also, without any anchor (URL) in the commit message, it's hard to find the branch even when Mill is hosted on Github. Also your IDE will not assist you. This later issue of finding the PR should be covered by our contributing/maintenance guide, though. Line 313 in 2c7fdfd
|
This PR replaces the coarse-grained script-import based target-invalidation-on-code-change (introduced in #1663) with fine-grained invalidation based on the JVM-level method callgraph (previous discussion). This callgraph is constructed by using ASM to parse the bytecode generated by compiling
build.sc
: We compute a code-hash for every method based on its bytecode, and propagate that hash throughout the call graph Merkle-Tree-style, resulting in amethodCodeHashSignatures: Map[String, Int]
dictionary that associates each method with a hash signature representing both its own code as well as the code of all methods that it calls (transitively). A more limited analysis is applied to upstream library code. This hash signature is then used as part of theinputsHash
for each target, causing a Target to invalidate after a code change only if the code change affected the Target's implementation method or some other method that it calls (transitively)Of the 7,000+ lines in this PR, only about ~1,000 are implementation code, the other ~6,000 are test cases. Most of the 100+ files added in this PR are also just tiny standalone Java/Scala files serving as test cases, with the main logic in
main/codesig/
being about a half-dozen files and a handful more for the test utilitiesAlgorithm
At a high level, the analysis logic living in the
main/codesig/
folder does the following:LocalSummary
parses the compile output ofbuild.sc
. This extracts the class hierarchy, computes method code hashes, and identifies method callsites within method bodies.ExternalSummary
parses any super-classes of locally-defined classes that are in the upstream classpath - Java/Scala stdlib, Mill code, or things that areimport $ivy
ed - and extracts the class hierarchy and method signatures without analyzing method bodies.InputStream#read
, can be properly traced to local implementations even if they do not extend the interface directly (e.g. aMyInputStream extends ByteArrayInputStream
class)ResolvedCalls
uses the class hierarchy to identify what potential implementations each callsite will resolve to.ReachabilityAnalysis
converts the method resolution results into a heterogenous call graph (comprisingLocalDef
s,Call
s, andExternalClsCall
s) that we feed into Tarjan's to generate topo-sorted strongly-connected-components. We then walk the SCCs in topological order to do the Merkel-Tree hash propagation in one pass, with every method within a SCC having the same hash (because they all call each other circularly, and so they must have the same transitive call graph)For every
Target
, we look up the hash of it'sdef
method, as well as the<init>
constructor methods for each of its enclosingModule
classes, and combine them to form the final hash of theTarget
Target
s must live onModule
s which form a tree of staticobject
s. We walk the object tree from theTarget
's directly enclosing module to the root module using a newenclosingModule
property on themill.define.Ctx
value, and look up all their<init>
methods. They should all only have one<init>
method since they're static objectsFrom the Call Graph Analysis literature, this is a relatively straightforward Class Hierarchy Analysis (CHA), with some tweaks called out above. Notably we aren't able to generate callgraphs via more precise algorithms such as Rapid Type Analysis (RTA), which would generalize our the lambda special-casing for all user-implemented traits.
Limitations
We do not do any kind of dataflow analysis.
def
s andval
s in the body of a class are handled differently, despite looking superficially similar.def
s are full methods and participate in the call graph, so changes would only invalidate downstream Targets that transitively end up calling that method.val
s are simply fields that are initialized in the class'<init>
constructor method or<clinit>
class initialization, so changing inval
ends up invalidating any downstream Targets that create anew
instance of that class: probably a much broader set of TargetsBoth the handling of JVM lambdas and the handling of virtual methods are approximations
InputStream#read
, we will assume that Target depends on allInputStream#read
s across the codebase, even if for subclasses ofInputStream
that never get instantiatedWe avoid detailed callgraph analysis of external/upstream libraries because the entire transitive codebase would be very large: e.g. all of Mill, all of
com.lihaoyi
, all of Scala std lib, all of Java std lib. Limiting our detailed analysis to the relatively-smallbuild.sc
compile output simplifies things and greatly speeds things up, at the expense of some precision in the callgraphApart from the question of CHA v.s. RTA, it is probably also possible to get more precise callgraphs by performing the analysis at the Scala level rather than at the JVM bytecode level (Constructing Call Graphs of Scala Programs). I chose to work at the JVM bytecode level because it's simple/stable/familiar to me, but trying to do the analysis at the Scala-level is potential future work.
Notes
I special-case handling of
new sourcecode.Line(_)
and explicitly ignore the argument (~always a literal number). This is because we explicitly do not want line number changes to force things to re-compile: even if they can in theory affect the output of a target, the assumption is that it's just there for debugging/inspect purposes and won't. The way I do this in a bit hacky and relies on the bytecode pattern generated by the Scala compiler, but it seems to work.This analysis in ignores fields entirely. In general, any change to instance or static fields that matters would be accompanied by corresponding changes to method bodies, even if just changes to the
<init>
constructor method.The approximations we use to handle InvokeDynamic lambdas and virtual method calls is necessary for this analysis to be precise enough to be useful.
def foo = T{}
Target to every other Target in thebuild.sc
.mill.Module
someone instantiates (e.g. by referencing the module name, which invokes the class static initializer, which instantiates the class itself) would have all its methods and Targets depended-on by default by the instantiating method.We ignore calls to
Target
-returning methods as part of the static callgraph analysis:Target
s,Input
s,Worker
s, etc. ThoseNamedTask
s will themselves get invalidated if necessary if the non-NamedTask
code they depend on changes, and invalidation will already get propagated through the runtimeTask
graph without needing to also model it in the static callgraph. This would improve precision while also greatly shrinking the static callgraphsThis relies on an additional assumption over the current analysis: it assumed that methods returning
NamedTask
s are pure and do not generate side effects. This is probably an OK assumption to make, because proper operation of the runtimeTask
graph also relies on that assumptionThis greatly improves precision for our use case. Virtual methods implemented in many different places is a big weakness of CHA since it has to assume any implementation could be called, especially combined with calls to methods on external library traits that are also treated conservatively and have to assume that any virtual method defined in that library (and thus any local implementation) can be called. However, in a Mill build, the vast majority of "virtual methods implementations" and "external library method calls" are the
mill.define.Target[A]
/T[A]
methods we override and call. Skipping static callgraph analysis for these and relying on the exact runtime build graph is thus a huge improvement to the precision of target invalidation.I wire up JSON logging (to disk in the
T.dest
folder) to the--debug
flag. It's too expensive to generate all the time (~doubles the time taken formethodCodeHashSignatures
), but we definitely want it to be available when things are misbehaving and we need to debug thingsAutomated Testing
main/codesig/test/cases/callgraph/
contains unit tests that assert the (simplified) call graphs are the right shape for a variety of minimal scenarios. This also includes a few less trivial scenarios in therealistic/
folder: some of my old Java games, a parallel merge-sort algorithm and some Castor actor code from my book Hands-on Scala, a Fastparse parser. Both Java and Scala code work because we perform the analysis at the bytecode level.We also exercise the transitive hash-propagation logic in some of the
callgraph/
tests by replacing the "hash sum" logic with "Set union", using it to compute transitive closures of the graph, and asserting that the transitive closures are what we expect. This adds test coverage for most of the logic around the hash propagation, without the expense of performing runtime code-changes and re-compilation that we need to perform in the integration tests.main/codesig/test/cases/methodhash/
contains tests that assert that the method code hash signature has the correct properties: that it changes when it should change and is un-changed when it should be un-changed (e.g. when there are only formatting/line-number/comment changes)integration/feature/codesig{simple,trait,scalamodule}
contain some integration tests in that test the workflow end-to-end including the integration into Mill's evaluator. These ensure that when you edit a file and re-run Mill, the correct targets are re-evaluated. Test cases include a single Target, Targets inside moduleobject
s andtrait
s, a singleScalaModule
, and a bunch of dependent/unrelatedScalaModule
sI had to update a bunch of existing tests - e.g. those in
integration/feature/invalidation/
- because now random code changes no longer invalidate targets within that file unless you actually change the target or method bodies that the target calls.This is a binary compatible change. I had to add some MIMA exclusions due to MIMA not correctly considering things nested within a
private object
asprivate
(lightbend-labs/mima#771).I added an opt-out flag
--disable-callgraph-invalidation
to swap back to the previous script-import-graph-based invalidation system, as a change-management measure to mitigate the risk of the novel invalidation algorithmManual Testing
I did some end-to-end tests on this PR, running
__.compile
on a few different projects:On the com-lihaoyi/fansi build, the incrementality seems to work as expected: e.g. changing the
fansi.JsFansiModule#scalaJSVersion
from1.10.1
to1.10.0
invalidates 87/796 targets which all seem Scala.js related, while changing theFansiTestModule#ivyDeps
uTest version from 0.8.1 to 0.8.0 invalidates 67/875 targets that are all test suite related. This is working as expectedOn the
com-lihaoyi/upickle
build, changingbench.scalaJSVersion
from1.13.0
to1.13.1
invalidates 28/3695 targets, while changingCommonJsModule#scalaJSVersion
invalidates 412/3695 targetsOn the
com-lihaoyi/mill
codebase:Adding a whitespace at the top of
build.sc
, to offset everyone's line numbers without changing their logic, invalidates 16/8154 targets, all downstream ofmillVersion
which is invalidated by the dirty hash of the repo checkout changingChanging
scalajslib.worker[1].ivyDeps
by re-ordering them, invalidates 17/8154 targets (just the ones above, + the one edited)Changing
MillScalaModule#scalacOptions
by removing-feature
, invalidates 836/8154 targetsChanging
contrib.playlib.WorkerModule#sources
generating a temporary file, invalidates 37/8154 targets, mostly stuff incontrib.playlib.worker[_]
Changing a constant
Deps.scalaVersion
ends up invalidating ~4343/8154 targets. I skimmed through the results and they seem reasonable: all.scalaVersion
and.compile
targets end up invalidated, while many external targets aren't invalidated because they don't have any local code to be affected by the callgraph analysis and their upstream build graph is not affected byDeps.scalaVersion
(e.g.integration.test.javacOptions
,contrib.codeartifact.mandatoryScalacOptions
)Adding a new
val abc = 123
to the root module invalidates everything, which is expected since it changes the constructor ofbuild
which could potentially affect any module nested within it.lazy val
s. Maybe we can do it automatically via a compiler plugin.val
s while still distinguishing whichval
is used in what method, but that is much more complicated analysis than the callgraph analysis this PR implementsThe Fansi and uPickle results shows that this approach does work for non-trivial builds, and the Mill results show that it works even on pretty large builds like Mill's own
build.sc
codebase.Debugging
The easiest way to understand what is going on with the callgraph analysis is to run Mill with the
--debug
flag. This generates extra output JSON files inout/mill-build/methodCodeHashSignatures.dest/
which give you an insight into what is going on inside the callgraph analyzer: thelocalSummary.json
,externalSummary.json
,resolvedMethodCalls.json
, etc. These are opt-in with--debug
due to the added overhead it takes to generate them (0.3-0.5s for the Mill codebase, would be larger for larger builds)On item of interest is the
spanningInvalidationForest.json
that is generated the second time you run Mill with--debug
enabled. This file contains a tree of nested JSON dictionaries containing every method definition or call that was invalidated, with the roots of the tree being the invalidation "roots" (i.e. methods which were invalidated without any parents being invalidated), and the tree structure indicating an arbitrary path from an invalidation root to each specific invalidated method. This is very useful for answering the question "why was this method/target/etc. invalidated in response to code changes"Performance
Performance-wise, ad-hoc benchmarks on
com-lihaoyi/mill
's own build show a ~5% increase inbuild.sc
compilation times due to this. Not nothing, but probably acceptable: the cost is only paid when thebuild.sc
is re-compiled, and it will likely end up saving much more time in tasks that we can avoid running (e.g. a single no-op Zinc incremental compile may be 100s of milliseconds)It's taken some amount of optimizations to reach this point. There are definitely further optimizations that can be done, e.g. replacing the various
Map
s we pass around withArray
or parallelizing parts of the analysis (It's mostly pure functional code and should be easy to parallelize)What Users can do to improve incrementality
Module
val
s should generally be avoided in favor ofdef
s orlazy val
s.val
s are all bundled together in theModule
constructor, so any change to anyval
invalidates any downstream code who depends on anything in thatModule
Abstract members of
Module
s should beTarget
s -T[V]
s - whenever possible, rather than plainV
s. We make a stronger assumption thatT[V]
methods are pure that we cannot assume in general for any method returning an un-wrappedV
, and can rely fully on runtime build-graph analysis which is a lot more precise than the static class-hierarchy-analysis we do no non-Target
abstract methods.BuildInfo#buildInfoMembers
invalidates the code in only that target, because we assume thatTarget
method dependencies and invalidation will be handled by the runtime callgraphBuildInfo#buildInfoPackageName
invalidates the code in almost everything on everyJavaModule extends BuildInfo
because we cannot guarantee that the<init>
method of thoseModule
's does not depend onbuildInfoMembers
.These restrictions are fundamental given the simple reachability/class-hierarchy analysis that we do in this PR, and lifting them would involve a more complicated dataflow analysis or purity analysis that would be much less simple to implement