Spurious appveyor 32-bit test timeouts #46903

arielb1 · 2017-12-21T11:30:42Z

The 32-bit MinGW test builders on AppVeyor are sometimes slower than expected, which causes some of them to exceed the 3 hour limit and time out (I think this also happened at the start of December, if someone can bother digging up those PRs).

It appears that a "good" build (e.g. https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5766) takes 150 minutes, while a "bad" build on the same code can exceed the 3 hour (180 minutes) limit.

It appears that in some cases (e.g. https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5551) other builders also get close to the limit, but I haven't seen any of them hitting it yet. The reason appears to be that the 32-bit test builders (both MSVC and GNU) are the slowest, taking the "full" 150 minutes even on a good day.

I'm not that sure what the best solution is - eventually we could play with checkpoint/restart, but I would not want to do that on Windows first.

Maybe it's possible to investigate the cause of the slowness, or to bump the time limit, or to split the pc-windows-gnu builders (the latter would also speed up the cycle time).

However, the Windows 32-bit test builders being the slowest of our entire group seems to be a good reason to split them (this also makes some sense, because they spawn a lot of processes, which is slow on Windows).

Cases:

https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5550 (Update RLS and Rustfmt #46144) - MSYS_BITS=32 test pc-windows-gnu, 2017-11-29
https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5550 (Revert #46360, re-enable macOS dist images. #46366) - MSYS_BITS=32 test pc-windows-gnu, 2017-11-29
https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5559 (incr.comp.: Remove ability to produce incr. comp. hashes during metadata export. #46370) - MSYS_BITS=32 test pc-windows-gnu, 2017-11-30
https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5743 (Point at def span on redefined name diagnostic #46802) - MSYS_BITS=32 test pc-windows-gnu, 2017-12-18
https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5756 (nll part 5 #46733) - MSYS_BITS=32 test pc-windows-gnu, 2017-12-20
https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5765 (Issue #46589 - Kill borrows on a local variable whenever we assign ov… #46752) - MSYS_BITS=32 test pc-windows-gnu, 2017-12-21

alexcrichton · 2017-12-23T18:37:08Z

Picking two random logs, one good and one bad, the major difference seems to be that the good log finishes compiling the compiler at 01:01:30, whereas the bad log finishes at 01:21:05, a delay of about 20 minutes. AFAIK no real extra work was done in the bad log. I believe that AppVeyor doesn't guarantee a constant level of performance (shared hosting and whatnot), so I think that we just get less CPU time during peak hours.
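To make the arithmetic concrete, here is a minimal sketch that computes the delay between the two quoted timestamps and compares it with the headroom implied by the figures in the issue description (about 150 minutes for a good build, 180 minutes for the limit). The helper name `parse_hms` is made up for illustration; only the four numbers come from this thread.

```python
from datetime import timedelta


def parse_hms(stamp: str) -> timedelta:
    """Parse an elapsed-time stamp of the form HH:MM:SS."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Elapsed times quoted in the comment for "finished compiling the compiler".
good = parse_hms("01:01:30")
bad = parse_hms("01:21:05")
delay = bad - good

# Figures from the issue description: a good build takes about 150 minutes
# and the job limit is 180 minutes.
headroom = timedelta(minutes=180) - timedelta(minutes=150)

print(f"delay in compiler build:   {delay}")
print(f"headroom on a good day:    {headroom}")
print(f"fraction of headroom used: {delay / headroom:.0%}")
```

Under those figures the compiler build alone eats 19 minutes 35 seconds of a 30 minute margin, roughly 65%, so a proportional slowdown in the test phase would plausibly be enough to cross the limit.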

In that sense I think the only real solution here is to do less work per job. That may mean cutting tests from 32-bit MinGW tests or sharding the builder.
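As a rough illustration of what sharding could look like, the sketch below balances test suites across builders with a greedy longest-first assignment. This is only an assumption about one possible approach, not how the Rust CI picks its split; the function name `shard_suites` and all durations are hypothetical, not measured values.

```python
def shard_suites(durations: dict[str, float], shards: int = 2) -> list[list[str]]:
    """Greedily assign suites, longest first, to the currently lightest shard."""
    buckets: list[list[str]] = [[] for _ in range(shards)]
    totals = [0.0] * shards
    longest_first = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, minutes in longest_first:
        lightest = totals.index(min(totals))
        buckets[lightest].append(name)
        totals[lightest] += minutes
    return buckets


# Hypothetical per-suite durations in minutes, for illustration only.
example = {
    "run-pass": 40,
    "compile-fail": 25,
    "std": 20,
    "ui": 10,
    "fulldeps": 10,
    "rustdoc": 8,
}

for index, suites in enumerate(shard_suites(example)):
    total = sum(example[name] for name in suites)
    print(f"builder {index}: {total} min -> {suites}")
```

A greedy split like this is not optimal in general, but it keeps the slowest shard close to half of the total, which is what matters when the constraint is a per-job time limit.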

kennytm · 2018-01-06T17:44:02Z

#47154 may be a cause of the recent explosion in timeouts. The timing also matches, since #46278 was merged at 2018-01-01T19:04:27Z. There is a fix in #47161.

#46910 has caused about a 40–50% increase in time spent on fulldeps tests, but that is not sufficient to explain the previous timeouts, since it means an additional 4 minutes at most.

kennytm · 2018-01-09T07:37:09Z

#47161 has landed but the error rate is still not decreasing 😢

alexcrichton · 2018-01-11T21:06:00Z

I've done some analysis of our historical trends to see what's going on here. This is specifically for the i686-pc-windows-msvc builder that's running tests on AppVeyor.

First up we have the trend of the total build time over time:

Clearly we're on the up and up!

Next I broke it down by stage. Here I was taking a look at various stages in the build:

Here we can see for sure that various stages are getting slower, and if we look at each of them in isolation (not stacked up) we get:

which from this seems to indicate:

- The run-pass test suite is getting steadily slower over time. I'm not sure if this is a slower compiler or more tests, but my guess is a slower compiler.
- The bootstrap itself is getting steadily slower over time. Both stage0 and stage1 are getting slower at what appears to be roughly the same pace.
- Something I haven't focused on here (the "other" blob) has added nearly half an hour to the build time over the past month or so.

The raw data (not smoothed, but stacked and not stacked) is unfortunately pretty hard to decipher. I also unfortunately don't quite know where to go from here..
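For readers who want to reproduce a smoothed trend from raw per-build timings, a trailing moving average is one simple option. The sketch below is an assumption about a reasonable approach, not the method used to produce the graphs in this comment; the function name and sample numbers are hypothetical.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for index, value in enumerate(values):
        running += value
        if index >= window:
            # Drop the value that just fell out of the window.
            running -= values[index - window]
        smoothed.append(running / min(index + 1, window))
    return smoothed


# Hypothetical total build times in minutes, for illustration only.
build_minutes = [150.0, 170.0, 155.0, 180.0, 160.0]
print(moving_average(build_minutes, window=3))
```

A larger window hides more of the per-build noise from shared hosting but also delays the point at which a genuine regression becomes visible.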

withoutboats · 2018-01-11T23:23:07Z

Surely the size of the code base and test suite is growing over time; I think this is the expected result unless compiler speed is improving at a greater rate than the code base is growing (which seems unlikely).

alexcrichton · 2018-01-12T17:23:03Z

@withoutboats I agree, yeah, but there's been a severe uptick over the past ~200 builds, which means our build time is increasing way faster than it was before, which seems worrisome..

Aaron1011 · 2018-02-21T21:13:32Z

This seems to be another example: https://ci.appveyor.com/project/rust-lang/rust/build/1.0.6426/job/do1stdu2mywwkyf7 MSYS_BITS=32, RUST_CONFIGURE_ARGS=--build=i686-pc-windows-gnu

@kennytm

Split MinGW tests into two builders on AppVeyor

Run-pass and compile-fail tests appear to take the most significant chunk of time, so split them into their own builder. Should help with #46903.

r? @kennytm cc @alexcrichton

Mark-Simulacrum · 2018-02-25T15:57:14Z

Closing as fixed. We've had multiple successful builds on AppVeyor, and the 32-bit MinGW builders are both now around 2 hours.

arielb1 added A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Dec 21, 2017

kennytm added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Dec 21, 2017

pnkfelix added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 21, 2017

kennytm mentioned this issue Jan 7, 2018

Provide suggestion when trying to use method on numeric literal #47171

Merged

This was referenced Jan 8, 2018

Remove unused LLVM related code #47233

Merged

Add iterator method specialisations to Range* #47180

Merged

petrochenkov mentioned this issue Jan 13, 2018

Remove impl Foo for .. {} in favor auto trait Foo {} #47416

Merged

This was referenced Jan 16, 2018

Implement repr(transparent) #47158

Merged

Rollup of 6 pull requests #47492

Closed

alexcrichton mentioned this issue Jan 17, 2018

Remove dep-info files as targets in themselves #47035

Merged

petrochenkov mentioned this issue Jan 17, 2018

Update rustfmt to 0.3.6 #47454

Merged

This was referenced Jan 18, 2018

[beta] Update cargo on beta. #47431

Closed

rustc: Lower link args to @-files on Windows more #47507

Merged

kennytm mentioned this issue Jan 23, 2018

Rollup of 14 pull requests #47678

Merged

This was referenced Feb 18, 2018

Remove "static item recursion checking" in favor of relying on cycle checks in the query engine #47987

Merged

[beta] Backport #48252 #48340

Merged

Update nightly to 1.26.0 and bootstrap from beta. #48343

Merged

Update RLS #48349

Merged

Aaron1011 mentioned this issue Feb 21, 2018

Consider using sccache to cache Rust code on CI builds #48412

Closed

This was referenced Feb 22, 2018

rustc_mir: handle all aggregate kinds in, and always run, the deaggregator. #48052

Merged

rustdoc: Foldable impl blocks #47894

Merged

Mark-Simulacrum mentioned this issue Feb 23, 2018

Split MinGW tests into two builders on AppVeyor #48487

Merged

Mark-Simulacrum closed this as completed Feb 25, 2018

This was referenced Feb 26, 2018

[beta] temporarily disable rust-lang/rust#46833 due to rust-lang/rust#48251 #48379

Merged

[beta] Backport "Fix rustdoc test ICE" #48454

Merged

petrochenkov mentioned this issue Feb 26, 2018

Rustc explain #48337

Merged

This was referenced Feb 27, 2018

[stable] 1.24.1 stable release #48445

Merged

Rollup of 13 pull requests #48577

Closed

Provide context for missing comma in match arm and if statement without block #48338

Merged

Backport LLVM fixes for a JumpThreading / assume intrinsic bug #48583

Merged

mati865 mentioned this issue Mar 7, 2018

Replace all const evaluation with miri #46882

Merged

This was referenced Mar 8, 2018

[beta] backport #48181 and #48362 #48793

Merged

Clarify interfaction between File::set_len and file cursor #48480

Merged

pietroalbini mentioned this issue Mar 12, 2018

[beta] rustbuild: pass datadir to rust-installer #48930

Merged
