Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

remove lazy analysis from zig #2529

Closed
tse-gratis opened this issue May 20, 2019 · 8 comments
Closed

remove lazy analysis from zig #2529

tse-gratis opened this issue May 20, 2019 · 8 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@tse-gratis
Copy link

I suggest that these should be compile time errors. A real usecase is for importing a file such as std/c/dragonfly.zig, and not having a test case for each define

I stopped using scheme in the past because an accidental typo landed in a rarely tested code-path, and it took weeks before it was discovered. On the other hand I guess zig compile time is faster by skipping dead code, but shouldn't a sensible program have little dead code

valid.zig:

const happy = @import("idontexist");
const other = @import("import.zig");
const time = @import("std").os.time;

pub fn main() void {
    var hi = "hi";
    if (1 == 0) { hi = 5; }
    @import("std").debug.warn(other.c);
    var bye = time.timestamp() / 0;
}

pub fn temp() void {
    var x = 12 + banana;
    i_am_a_banana;"sf";
}

import.zig:

pub const a = anything_you_like;
pub const b = 12 / 0;
pub const c = "welcome\n";
@SamTebbs33
Copy link
Contributor

What are the issues in the code that you think should produce compile-time errors?

@tse-gratis
Copy link
Author

tse-gratis commented May 22, 2019

Something like:

valid.zig:1: file does not exist
valid.zig:7: expected [2]u8
valid.zig:13: undeclared identifier
valid.zig:14: undeclared identifier
valid.zig:14: expression value ignored
import.zig:1: undeclared identifier
import.zig:2: divide by zero

I'm not including the divide by zero on line 10. All of the others can I believe be gained by switching on comptime evaluation for dead code (outside of tests)

I realize these errors don't reduce bugs in the binary. But they certainly reduce bugs in the code! Which can only be a good thing

As I understand it, currently code can be either 1) maximally checked 2) checked when zig test is run 3) maximally unchecked

I propose that option number 3 be removed

Assuming I've understood things correctly, imagine importing any external library into your code. Currently the guarantee is just (3) that identifiers are syntactically correct

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label May 28, 2019
@andrewrk andrewrk added this to the 0.5.0 milestone May 28, 2019
@andrewrk andrewrk changed the title should give compile time error remove lazy analysis from zig May 28, 2019
@andrewrk
Copy link
Member

Lazy analysis is one of the core design premises of zig; it's going to be very difficult to change at this point. Lots of things rely on it. I recognize there is a problem that it can be difficult to refactor code that is not covered by tests. I do want to handle this use case better, and I have some ideas. I'll leave this open to track the existence of this use case / unsolved problem scenario, until it's decided to do something about it or not.

@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 May 28, 2019
@tse-gratis
Copy link
Author

Zig is extremely well designed in my opinion, and

Lots of things rely on it [lazy analysis]

So there are probably things I'm missing, but:

Again, thanks to lazy analysis, this can allow [zig test] to narrow a build to only a few functions in isolation

zig test is actually eager, at least in user facing semantics, since it defaults to compiling all top level identifiers within the test block, and any filtering has to be explicit. Those semantics could be retained, and even become slightly more natural without the idea of laziness, and zig build would work equivalently

Zig has lazy top level declaration analysis, which means that if a function is not called, or otherwise used, it is not analyzed. This means that there may be an undiscovered compile error in a function because it is never called

I wonder if compiler caching could be done not at the file level, but at the top declaration level -- with the cache keys being either the strings input into the parser, or a compacted output from the parser. This could be interesting to consider

Apart from the first compilation, this is in one sense lazier than the current design, since only changed declarations would get recompiled, and yet it would also mean zig validates 1:1 what the programmer sees, rather than a subset

I'm sure that has technical challenges though. I've thought of a potential way to solve link-time slow down. Other issues might be employing lazy analysis for evaluating infinite loops and other things. I don't know. But I wonder:

Being simplistic, lazy analysis is really, I assume, a time/correctness tradeoff. Saving time, for the sake of correctness. What are the actual % savings by being lazy in Debug mode? I imagine the major cost is (for the first compile) in llvm code-gen for each import, which would NOT appreciate eager analysis

Can we fix that by being lazy just for out-of-tree imports. Or share caches across projects, by placing caches relative to the imported files -- or have a common cache with the std library???

So, many ideas, but my least favorite of which is zig telling me code is correct, when it isn't...

@tse-gratis
Copy link
Author

Sorry, one more comment:

Lazy analysis is generally not an optimization

Yes, for ReleaseFast/ReleaseSmall I can imagine people downloading green code, and simply compiling it once. In that case lazy analysis makes sense, and becomes some kind of comptime equivalent of undefined behavior

But imagine a programmer in Debug mode using N functions from a library. They will add one function; compile/test; repeat. With eager analysis those functions will be parsed/compiled once. With lazy analysis I believe they will be parsed every time, and compiled factorial(N). Whatever the precise details, in this common case eager analysis is a massive saving.....

-- even if the net saving is unclear, the programmer is paying more upfront, and so skips payments during the compile/test cycle. Also lazy analysis increases the risk of the Nth function not working

To quote:

  • Compile errors are better than runtime crashes
  • Compile errors today are better than errors in the future

Sorry if I'm pressing the point too much, I think this is my last comment ;)

@andrewrk
Copy link
Member

I believe this is addressed by #3028.

@tse-gratis
Copy link
Author

Hiya thanks for the #3028 proposal. This comment should maybe go there, but I don't want to presume, or stand in your way

This gives the compiler the ability to assume that any branches not taken, are actually dead even across all the desired set of builds.

Mmm. Yes only if every branch not taken has a target/test. So I think mostly No... For two reasons:

  1. The last C functionality I wrote had to be built up in a series of steps; both within functions, then linking them together, and then lockless threading to call them from the final executable

If I had started with the lockless call from the executable, I'd of just gained an executable that crashes or doesn't compile -- and also that's the easy step. So instead I first built up the functionality using C compiler errors to check and guide what I was doing

While I see good points in your proposal, it would not have helped me implement that functionality

  1. The errors I most want to catch is in code that's wrong. Truly dead code is more likely to be wrong. Which (I assume in zig) can be caused by elided maths skipping parts of an equation; a switch case not encountered; etc, etc. Basically mistakes are easy to make and hard to spot

So while proposal #3028 reduces the amount of dead code left unchecked, it certainly does not remove all of it; and just by looking at the code, it can be very hard to know what code the compiler considers dead

-- major bugs have entered software through optimization passes doing elision for instance


I don't know if you will find those thoughts helpful or not. But assuming they contain some validity, what might I do about them?

For each target/build options, the user can choose whether this is just for analysis purposes or whether it should actually produce output.

I've got no experience of running multibuilds. So no real input to give here: Whether it makes the ui better, or if were just simpler to call the compiler once for each build?

That's a separate question, but something that may retain a little of the multibuild concept, but also resolve issue #2595 (at least for me) would be a single cli option to follow every path and check everything

Of course I would advocate and assume that programmers would want that to be the default... But anyway

I hope this feedback is at least a little useful, thanks

@tse-gratis
Copy link
Author

-- Sorry, I misrepresented that first quote:

This gives the compiler the ability to assume that any branches not taken, are actually dead even across all the desired set of builds.

Yes the branches are 100% dead. This is true

I guess I'm trying to say this is just in the current binary, but it is not dead to the programmer, nor is it dead to future binaries as the programmer updates the code

-- The code I am updating generally has the most dead branches, and is also the place I most need typechecking

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

3 participants