Spurious appveyor 32-bit test timeouts #46903

Closed
arielb1 opened this issue Dec 21, 2017 · 9 comments
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-slow Issue: Problems and improvements with respect to performance of generated code. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@arielb1
Contributor

arielb1 commented Dec 21, 2017

The 32-bit MinGW test builders on AppVeyor are sometimes slower than expected, which causes some of them to exceed the 3 hour limit and time out (I think this also happened at the start of December, if someone can bother digging up those PRs).

It appears that a "good" build (e.g. https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5766) takes 150 minutes, while a "bad" build on the same code can exceed the 3 hour (180 minutes) limit.
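To make the margin concrete, here is a small arithmetic sketch using only the two figures quoted above (150 minutes for a good build, 180 minutes for the limit). The function name is made up for illustration.

```python
def headroom(good_minutes, limit_minutes):
    """Return (spare minutes, tolerable slowdown as a fraction of the good time)."""
    spare = limit_minutes - good_minutes
    return spare, spare / good_minutes


spare, slowdown = headroom(150, 180)
# A good build leaves 30 minutes of slack, i.e. it can only get 20% slower.
print(f"{spare:.0f} min spare, {slowdown:.0%} slowdown tolerated")
```

In other words, any slowdown of more than about a fifth on a builder that already takes the full 150 minutes is enough to hit the limit.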

It appears that in some cases (e.g. https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5551) other builders also get close to the limit, but I haven't seen any of them hitting it yet. The reason appears to be that the 32-bit test builders (both MSVC and GNU) are the slowest, taking the "full" 150 minutes even on a good day.

I'm not that sure what the best solution is - eventually we could play with checkpoint/restart, but I would not want to do that on Windows first.

Maybe it's possible to investigate the cause of the slowness, or to bump the time limit, or to split the pc-windows-gnu builders (the latter would also speed up the cycle time).

However, the Windows 32-bit test builders being the slowest of our entire group seems to be a good reason to split them (this also makes some sense, because they spawn a lot of processes, which is slow on Windows).

Cases:

@arielb1 arielb1 added A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Dec 21, 2017
@kennytm kennytm added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Dec 21, 2017
@pnkfelix pnkfelix added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 21, 2017
@alexcrichton
Member

Picking two random logs (one good, one bad), the major difference seems to be that the good log finishes compiling the compiler at 01:01:30, whereas the bad log finishes at 01:21:05, a roughly 20 minute delay relative to the good one. AFAIK no real extra work was done in the bad log. I believe that AppVeyor doesn't guarantee a constant level of performance (shared hosting and whatnot), so I think that we just get less CPU time during peak hours (or at least that's what I think).
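The gap between the two logs can be checked with a short sketch. It assumes the two stamps quoted above are elapsed times since the start of the job in HH:MM:SS form; `parse_elapsed` is a hypothetical helper, not part of any CI tooling.

```python
from datetime import timedelta


def parse_elapsed(stamp):
    """Parse an elapsed-time stamp of the form HH:MM:SS into a timedelta."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


good = parse_elapsed("01:01:30")  # compiler finished in the good log
bad = parse_elapsed("01:21:05")   # compiler finished in the bad log
delay = bad - good
print(delay)  # 0:19:35
```

That is 19 minutes 35 seconds lost before any tests start, which is most of the 30 minutes of slack a 150 minute build has against a 180 minute limit.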

In that sense I think the only real solution here is to do less work per job. That may mean cutting tests from 32-bit MinGW tests or sharding the builder.
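As a rough illustration of what sharding could look like, the sketch below assigns test suites to builders greedily, longest first, always giving the next suite to the least loaded builder. The suite durations are invented for the example, and this is not a description of how the AppVeyor builders are configured.

```python
import heapq


def shard(durations, num_shards):
    """Greedy longest-first assignment: each suite goes to the least loaded shard."""
    heap = [(0.0, index) for index in range(num_shards)]  # (load, shard index)
    shards = [[] for _ in range(num_shards)]
    for name, minutes in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + minutes, index))
    return shards


# Hypothetical durations in minutes, for illustration only.
example = {"run-pass": 40, "compile-fail": 25, "fulldeps": 10, "other": 20}
print(shard(example, 2))
```

With these made-up numbers the two shards end up with 50 and 45 minutes of work. Note that each shard would still pay the cost of building the compiler first, so the saving applies only to the test portion.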

@kennytm
Member

kennytm commented Jan 6, 2018

#47154 may be a cause of the recent explosion in timeouts. The timing also matches, since #46278 was merged at 2018-01-01T19:04:27Z. There is a fix in #47161.

#46910 has caused about a 40–50% increase in time spent on fulldeps tests. But that is not sufficient to explain the previous timeouts, since it just means an additional 4 minutes at most.

@kennytm
Member

kennytm commented Jan 9, 2018

#47161 has landed but the error rate is still not decreasing 😢

@alexcrichton
Member

I've done some analysis of our historical trends to see what's going on here. This is specifically for the i686-pc-windows-msvc builder that's running tests on AppVeyor.

First up we have the trend of the total build time over time:

https://i.imgur.com/ePWXNQX.png

Clearly we're on the up and up!

Next I broke it down by stage. Here I was taking a look at various stages in the build:

https://i.imgur.com/akskz7I.png

Here we can see for sure that various stages are getting slower, and if we look at each of them in isolation (not stacked up) we get:

https://i.imgur.com/1pmu9m8.png

which from this seems to indicate:

  • The run-pass test suite is getting steadily slower over time. I'm not sure if this is a slower compiler or more tests, but my guess is a slower compiler.
  • The bootstrap itself is getting steadily slower over time. Both stage0 and stage1 are getting slower at what appears to be roughly the same pace.
  • Something I haven't focused on here (the "other" blob) has added nearly half an hour to the build time over the past month or so.

The raw data (not smoothed, but stacked and not stacked) is unfortunately pretty hard to decipher. I also unfortunately don't quite know where to go from here..
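For anyone wanting to reproduce this kind of trend from raw per-build times, a trailing moving average is one simple way to smooth them. The smoothing method used for the graphs above is not stated, so this is only a sketch of one option, and the sample values are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; the first window-1 points use a shorter window."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Hypothetical total build times in minutes, for illustration only.
times = [150, 148, 170, 152, 155, 180, 158]
print(moving_average(times, 3))
```

A wider window hides more of the per-build noise from shared hosting, at the cost of reacting more slowly to a real regression.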

@withoutboats
Contributor

Surely the size of the code base and test suite is growing over time. I think this is the expected result unless compiler speed is improving at a greater rate than the code base is growing (which seems unlikely).

@alexcrichton
Member

@withoutboats I agree, yeah, but there's been a severe uptick over the past ~200 builds, which means our build time is increasing way faster than it was before, which seems worrisome.

@Aaron1011
Member

This seems to be another example: https://ci.appveyor.com/project/rust-lang/rust/build/1.0.6426/job/do1stdu2mywwkyf7 MSYS_BITS=32, RUST_CONFIGURE_ARGS=--build=i686-pc-windows-gnu

bors added a commit that referenced this issue Feb 24, 2018
Split MinGW tests into two builders on AppVeyor

Run-pass and compile-fail tests appear to take the most significant chunk of time, so split them into their own builder.

Should help with #46903.

r? @kennytm
cc @alexcrichton
@Mark-Simulacrum
Member

Closing as fixed. We've had multiple successful builds on AppVeyor, and the 32-bit MinGW builders now both take around 2 hours.

8 participants