-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Enable straight-to-jar compilation #597
Conversation
A validation involving this pull request is in progress... |
The validator has checked the following projects, tested using dbuild, projects built on top of each other.
❌ The result is: FAILED |
A validation involving this pull request is in progress... |
The validator has checked the following projects, tested using dbuild, projects built on top of each other.
✅ The result is: SUCCESS |
@lukaszwawrzyk Thanks for the contribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments.
import java.nio.file.Paths | ||
import java.util.zip.ZipFile | ||
|
||
class STJ(outputDirs: Iterable[File]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add Scaladoc explaining what STJ stand for, and what this class does before we forget?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this class be marked final?
type JaredClass = String | ||
type RelClass = String | ||
|
||
def init(jar: File, cls: RelClass): JaredClass = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add Scaladoc here, and perhaps consider renaming this method?
I normally use "init" to mean initialize STJ class, but it doesn't look like you're doing that.
Maybe a better name would be "jaredClassString"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that name is unfortunate. It was the first method I created when I started working on it and it kinda stayed like that for the whole time.
I think ideally for sake of readability I could have a value class JaredClass
wrapping the string with method that extract stuff and companion object with factories. Though this is actually mostly kept as File I suspect there would be plenty of conversions. I will try to experiment with that and see if I can improve readability with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for scaladoc I will add it in places you suggested. But aside from that is there a general rule to put it e.g. on every public method I added or something like that, or just for less obvious code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to generally document public methods for Zinc project since many people are involved.
@@ -240,6 +240,22 @@ class IncrementalCompilerImpl extends IncrementalCompiler { | |||
case Some(previous) => previous | |||
case None => Analysis.empty | |||
} | |||
|
|||
val compileStraightToJar = STJ.isEnabled(output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any unit tests for straight to jar behaviors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no unit tests, I was relying on scripted tests. I will take a look on what should potentially be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scripted tests are fine too.
Thanks for the contribution. I'll have a look at this at some point in the week. Two first thoughts:
|
|
Yes, they are in theory compatible. What I don't understand is why we should accept code that comes with another license, and what the implications for the future of the project are if we do so. We've already accepted several PRs with different licenses before and I'm not happy with it. I'd like to know what @eed3si9n thinks of this.
I'd like to see more tests but not necessarily unit tests. I'll go through the code at some point and suggest places where we could test better. This is a big PR and we'll need to maintain it if in the future something changes, so I'm just trying to be more conservative than I'm usually. That being said, good job 👍 |
As per license, what I care is if the Lightbend CLA (https://www.lightbend.com/contribute/cla) has been agreed or not. |
I refactored |
I've had the first pass through the PR but, before discussing concrete technical points, I feel it's important that we discuss this whole approach from a global perspective. You're doing lots of things here and it's quite a scary PR to merge, especially because you touch lots of things that can have unexpected side effects.
|
I will address what I can quickly and go back to this after the weekend.
|
Could you look into a way we can reuse JDK's
Great, I'll have a look through them on Monday. Regarding my first point, I would be much more comfortable with all this logic if it's removed completely from the bridge and added in a new interface
I would propose another way to achieve this then: let's have a stamp index that is populated after every incremental compiler cycle (we don't need to read the timestamps from the JAR because we know no other process can write to it, let's just have a stateless stamper that takes a pre-computed list of class file names and jars). This approach would need to work for both Scala and Java generated classes. It would take more book-keeping but I think it's a cleaner approach and we would avoid the performance problem of opening the jar, reading the timestamp from the central zip index (I think it's called central directory?) and then closing it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for posting this. I'm strongly in favor of including support for jar'd outputs in zinc.
I had previously attempted this, so some of the approach here looks familiar. I think that if this lands, it might be good to follow up immediately to make the larger (but more-mechanical/less-errorprone) change to move from File
to an enum of the two cases.
* This is enough as stamps are only read from the output jar. | ||
*/ | ||
class CachedStampReader { | ||
private var cachedNameToTimestamp: Map[String, Long] = _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not be threadsafe unless it is volatile, probably. Alternatively, should it just be a lazy val
?
Regardless: I agree with Jorge that if it's possible to refactor this such that either 1) stamps are read eagerly and are immutable, 2) the stamps API has external caching ... would be preferable.
* @param jar a jar file where the class is located in | ||
* @return the class inside a jar represented as `JaredClass` | ||
*/ | ||
def fromURL(url: URL, jar: File): JaredClass = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an odd method... it feels like if the caller has already split the URL to find the file, re-splitting it here isn't necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is odd. Take a look at sbt.internal.inc.classfile.Analyze#urlAsFile. It previously was simply IO.urlAsFile
and it was enough. When compiling to plain files, classloader would either return a plain file or url to a class in jar. IO.urlAsFile
would just ignore the class file and take the jar. But now we can get url to file in jar and we want to keep the class part as long as the jar is the output jar that we are compiling to. So I reuse IO.urlAsFile
to convert url to file properly and then, if the extracted file is our output jar I know that I need to create a JaredClass
. To do this, I extract the relative path to class and combine it with already extracted file to avoid doing it twice. I just did not want to copy code from sbt.io.IO
, especially that it handles some odd corner cases. I'd feel more comfortable that it was in one place. What do you think about this?
@@ -118,7 +119,11 @@ final class API(val global: CallbackGlobal) extends Compat with GlobalHelpers wi | |||
if (!symbol.isLocalClass) { | |||
val classFileName = s"${names.binaryName}.class" | |||
val outputDir = global.settings.outputDirs.outputDirFor(sourceFile).file | |||
val classFile = new java.io.File(outputDir, classFileName) | |||
val classFile = if (STJ.enabled) { | |||
new java.io.File(STJ.JaredClass(outputDir, classFileName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like it would significantly improve type safety to use an enum to represent the two cases. But I could also imagine that that would be an even larger refactoring than this already is... so perhaps better as a followup.
Also, it would be good to confirm that you see some performance benefit on your usecase (if you're able to share that data). I believe that at least part of the perf benefit that we've seen when we've used jar usage is due to a combination of #547 and the fact that less analysis was actually stored without the analysis fixes included here. Of course, we'd absolutely expect to see some additional IO benefit. If possible, I'll try to run this build through our perf suite in the next few days as well. |
On Windows (since this where most of incremental compilation happen) we observed speedups ranging from 10 to 20% in compilation time. |
As for licensing, I guess I know everything now. So I was looking at zipfs from openjdk8. In that version it was located here
ZipFS as it is could be used to provide all the operations we need for STJ, though as I mentioned not as efficient. e.g. merging involves at a minimum creating temp file, copying there the content of both files, appending new index and moving temp file onto the original one. I think it won't reapply compression but I am not 100% sure. Removing files from jar will work the same way, it will create a copy without given entries. |
I agree with your sentiment of tweaking performance, but are we sure that we're not trading performance by correctness? On the one hand, I'm skeptical about including zip file system code into this repository (without any unit tests) that was meant to be a demo. On the other hand, we cannot include the zip file system implementation in JDK9 because that would be an outright license violation. I think it would be good we look into reusing the implementation that Scalac uses for writing jars (which uses under the hood the good old Aside from this observation, I see two other possibilities to move forward:
I'm not sure what the benchmark results for the use of this custom zip file system look like, so some numbers could perhaps help me make up my mind and convince me adding this code to the repo is a risk worth taking. When we're talking about more efficient, how much are we talking about? Only in Windows? Or also in Linux? |
Fundamentally I think I didn't add like conceptually new logic to bridge that was not there (except for obvious handling jars instead of files). For example the When I look at it now I could e.g. change API of
Which logic exactly you would like to move? How do you see the interface of JarManager? Should the bridge code still have code like
I am afraid that I won't have enough time to do that.
Looking through the usages of |
Well, I am never 100% sure if what I write is correct, even with tests I could miss something, but after running multiple builds and tests compiled with modified zinc I am pretty confident that zinc part of it is correct. Also the scripted tests actually test it indirectly, in my opinion rather extensively.
I feel like a link for the patch is missing? Anyway if you look at
As for
I was testing it on both linux and windows. The difference on windows was AFAIR ususally bigger than on Linux. I run for you the usual case of merging jars i.e. merging a fairly small (9 classes) jar to a big one (scala library).
Here are results of deleting like 20 classes from scala-library jar (also linux):
|
For a cleaner(?) approach for collecting stamps I can simply add say With such approach it wouldnt need to be lazy. Now it has to be as the stamper is created before the compilation, but it can only collect valid data after the compilation (on first call). What do you think about this idea? What should ideally the api be? |
The same benchmark for merging jars on windows:
Deleting:
Note that absolute numbers are notably bigger. |
I altered the stamping part with what I mentioned. Stamps from jars are read explicitly in places they are needed without altering the existing stamper apis. |
Looks like a spurious error, I restarted the CI: https://ci.scala-lang.org/sbt/zinc/131 |
Let's see if waiting a little bit more makes the CI pass. I don't see how your code could make dependency resolution of the bridges fail. Locally, coursier fetch works. |
@jvican The build passed today (without changes), I also used matrix to run scripted tests with --to-jar. |
@jvican : I see that this went to the |
@jvican I am also interested in this being published :) :) :) |
I am planning to get sbt and Zinc 1.3.x RC-1 within a month or so. |
I would love a M1 release with what we have now, including the straight-to-jar compilation, if that's at all possible. |
1.3.0-M1 is out - https://repo1.maven.org/maven2/org/scala-sbt/zinc_2.12/1.3.0-M1/ |
Thanks eugene, will take a look |
I just tried this in our build. Unfortunately, I'm seeing a significant increase in downstream compilations (touch a file in an upstream module, incrementally re-compile downstream module) when I enable this compile-to-jar functionality. Using 1.3.0-M1 without compile-to-jar is fine (no regression) The Zinc integration in question is here https://github.com/lihaoyi/mill/blob/master/scalalib/worker/src/ZincWorkerImpl.scala#L256-L338, in case anyone wants to review it and tell me if i'm doing anything wrong @jvican @lukaszwawrzyk |
@lihaoyi : A thing to watch out for would be your downstream modules being treated as monolithic jars. Having integrated this into pants, and dumping the analysis in text mode, I see downstream deps for this example being stamped as:
...which indicates that rather than stamping the classfiles in the jar, it's currently stamping the whole jar. |
I wonder if it's the way I'm looking up analysis files based on val analysisMap0: Map[os.Path, os.Path] = upstreamCompileOutput.map(_.swap).toMap
def analysisMap(f: File): Optional[CompileAnalysis] = {
analysisMap0.get(os.Path(f)) match{
case Some(zincPath) => FileAnalysisStore.binary(zincPath.toIO).get().map[CompileAnalysis](_.getAnalysis)
case None => Optional.empty[CompileAnalysis]
}
} Do I need to write some logic to map between classfile-in-jars and actual filesystem paths? |
@lihaoyi: I doubt that this is an issue with your particular integration; rather, some more work that needs to be done following up on this patch. |
I'd gladly take any suggestions on how to improve performance, or what seems to impact performance the most. While testing overall performance I was focusing on windows and we are very happy with the performance on windows. The compilation itself is slightly faster. The whole workflow that this feature enabled is much faster. |
@lukaszwawrzyk just to be clear, we're not talking about a performance issue, but a feature regression: multi-module incremental compilation simply does not work with compile-to-jar enabled |
Ah, sorry. I misunderstood the issue. I don't see anything obvious in the implementation provided. Relevant timestamps are in the jar. The stamps of jar should not matter. I don't remember doing anything special in integrations we had wrt to analysis and things like that. It would be best to just debug the issue. |
More info here. With jars, I see module dependencies that were previously treated as |
Could someone create a GitHub issue summarizing the regression/feature interactions involved in compile-to-JAR please? It sounds like the feature is not ready to be used without some more tweaking. |
This PR represents my work on implementing straight to jar compilation in zinc (#305).
To enable the feature, one should specify output as .jar file. This will cause scalac to write files to jar directly. This worked before the PR, but obviously some stuff had to be adjusted for the actual incremental compilation to work.
Most important things that had to be handled
/develop/zinc/target/output.jar!sbt/internal/inc/Compile.class
Most of the details related to the feature are in
sbt.internal.inc.STJ
object and something similar inxsbt.STJ
as it is difficult to share code between those.Optimizations
ZipFileSystem was not good enough as it was rewriting jars. What I done is I took that code and kept only the code that allows to manipulate the index. Reading and writing the index to file from ZipFileSystem was most efficient implementation I could find. On top of that I implemented the required operations in performant manner.
Other changes
API of
ReadStamps
was altered - I addedreset()
method to easly allow reseting cache of product timestamps between compilations. This is because I made the stamper stateful to avoid reopening jar for each product (that would kill the performance).Scripted tests were updated to run for both STJ and regular compilation. Running scripted with STJ requires changing a hardcoded flag in
IncHandler
. I was just using it for development. I am open for discussion on how to do it properly and whether we should e.g. run all scripted tests twice for each build and how to implement that.Slight addition was to allow disabling compression with a flag when exporting analysis (performance).
Another small addition is to ignore change of
-d
option for javac as it is overriden anyway.This PR also contains multiple changes related to Windows file system. Mostly about closing things properly to avoid locks (mostly jars). This includes fixing scripted tests on Windows (large percentage of them were failing because of inability to clear the temp dir between tests). There are still a couple of them that are flaky, but I didn't investigate it further.