
Osx + pi broken on ci-release #1484


Closed
MylesBorins opened this issue Sep 3, 2018 · 24 comments

Comments

@MylesBorins
Contributor

We have a release planned for tomorrow, please halp

@refack refack added the incident label Sep 3, 2018
@refack
Contributor

refack commented Sep 3, 2018

From the logs this seems to be related to #1479
requireio workers have trouble connecting to GitHub (to git pull) and our CI and WWW servers (to push binaries)

@MylesBorins
Contributor Author

This is blocking tomorrow's release.

/cc @nodejs/build

@rvagg
Member

rvagg commented Sep 4, 2018

🤦‍♂️ I'll try and get it sorted but I'm banging my head here a bit trying to figure it out

@rvagg
Member

rvagg commented Sep 4, 2018

So I'm fairly certain now that this is my ISP's fault, but also that there's not much I can do about it in the short term because of how bad ISPs are at dealing with subtle technical issues that don't involve "turn it off and on again". They've tried to blame it on "work in my area", which I know is garbage. I'll keep working on them and if all else fails I may have to find a new ISP.

The interesting thing about the release host is that ssh to the web host works only occasionally, like a dice roll: the host is uncontactable some of the time but works just fine at others. I can run ssh node-www over and over on that host and it'll occasionally work. I'm having similar problems with other key hosts, including GitHub, which explains some (not all) of the flakiness we're having with arm-fanned.
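
For anyone wanting to put a number on the "dice roll", here is a minimal sketch that repeats a TCP connection attempt and counts the successes. It is not part of the build tooling; the function name `probe` and its parameters are made up for illustration, and it only tests the TCP handshake on port 22, not a full ssh login.

```python
import socket
import time


def probe(host, port=22, attempts=20, timeout=5.0, pause=0.0):
    """Try to open a TCP connection `attempts` times; return the success count."""
    successes = 0
    for _ in range(attempts):
        try:
            # create_connection resolves the name and completes the TCP handshake
            with socket.create_connection((host, port), timeout=timeout):
                successes += 1
        except OSError:
            # covers refused, unreachable, timed out, and failed name resolution
            pass
        time.sleep(pause)
    return successes
```

A plain socket does not read `~/.ssh/config`, so an alias such as `node-www` would have to be replaced with the real hostname. Running it from the affected host and from a known-good network should show whether the failure rate follows the connection rather than the target.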

@rvagg
Member

rvagg commented Sep 4, 2018

I've tried a whole bunch of things like setting up jump/proxy hosts to ssh through, using ssh on a different port, but everything is flaky to some degree and doesn't provide 100% reliability. Something's seriously messed up with my connection.

I've unticked "Archive artifacts only if build is successful" in iojs+release so I think this means we'll have .pkg and .tar.?z files in Jenkins regardless of whether they fail to upload to staging. If that works then I could manually upload to staging if they fail. So @MylesBorins, ping me when you have builds if they fail and need manual insertion. You should be able to look into the failed build in Jenkins to see an artifact or two.

Other than that, my only other suggestion is that we could switch to the new macOS release builder for these old branches. There's a small amount of concern about whether we'll have consistency in binaries, but I don't think it'll be a problem with macOS. Pis are a different matter, unfortunately, and I'm not sure our cross-compiler is going to produce the same level of compatibility in binaries.

@MylesBorins
Contributor Author

I'm a bit confused about how your ISP is affecting the OSX release... Are you hosting a Mac mini?

I can manually get this release working but would like to see a long-term solution to this. Any idea on a timeline?

@MylesBorins
Contributor Author

Not seeing any of the assets in the latest ci-release build. The 8.12.0 release cannot move forward until this is resolved.

@gdams
Member

gdams commented Sep 4, 2018

@MylesBorins I was under the impression that we were using the macstadium 10.11 machine for Node 8 releases? Without having access I can't check.

@MylesBorins
Contributor Author

@gdams we have historically used 10.10. I personally would prefer that we not change things mid-release.

@gdams
Member

gdams commented Sep 4, 2018

I thought that was agreed in #1391? Perhaps I misread it. Either way, if we need 10.10 we should just provision a new machine rather than relying on @rvagg to host it himself.

@MylesBorins
Contributor Author

I'm ok with however we move forward. I would very much like to see this release go out today and not be delayed by infrastructure... what are our options?

@gdams
Member

gdams commented Sep 4, 2018

I can possibly provision a new 10.10 machine at macstadium and run the playbooks? We'll need to manually add the additional release stuff to the machine though?

@MylesBorins
Contributor Author

Looks like another potential issue with the OSX machines... something about timestamps

https://ci-release.nodejs.org/job/iojs+release/3727/nodes=osx1010-release-tar/console

SIGN="Developer ID Application: Node.js Foundation" PKGDIR="node-v8.12.0-darwin-x64" bash tools/osx-codesign.sh
+ set -e
+ '[' 'XDeveloper ID Application: Node.js Foundation' == X ']'
+ codesign -s 'Developer ID Application: Node.js Foundation' node-v8.12.0-darwin-x64/bin/node
node-v8.12.0-darwin-x64/bin/node: A timestamp was expected but was not found.
make: *** [node-v8.12.0-darwin-x64.tar] Error 1
Build step 'Conditional steps (multiple)' marked build as failure
Skipped archiving because build is not successful
Sending e-mails to: michael_dawson@ca.ibm.com
Notifying upstream projects of job completion

@KoenLav

KoenLav commented Sep 4, 2018

My comment in the other repo was not constructive in nature, apologies for that.

However I do think that the way Apple "facilitates" developers (locking everything down) is a root cause of (these kinds of) problems and this concretely holds back (rapid) development in many teams.

I would love to see some improvements on this, hence I pointed it out (in the wrong place).

ps deleted the comment :)

=====

On a more constructive note, trying to troubleshoot the issue:

The issue with the timestamps not being available appears to be a known issue which could be related to the network condition of the build machines.

Someone suggests using: --timestamp=none in the codesign command to sign without timestamps (not sure how that impacts anything else).

Examples:
https://forums.developer.apple.com/thread/8187
https://bugs.eclipse.org/bugs/show_bug.cgi?id=445050
sindresorhus/create-dmg#25
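
To make the `--timestamp=none` suggestion concrete, here is a rough sketch of a retry-then-fallback wrapper around `codesign`. The helper names `codesign_command` and `sign` are hypothetical, and the fallback policy is an assumption of this sketch, not what `tools/osx-codesign.sh` does. Whether an untimestamped signature is acceptable for release binaries is not settled in this thread.

```python
import subprocess


def codesign_command(identity, path, timestamp=True):
    """Build the codesign argv; timestamp=False adds --timestamp=none."""
    cmd = ["codesign", "-s", identity]
    if not timestamp:
        # sign without contacting the timestamp service
        cmd.append("--timestamp=none")
    cmd.append(path)
    return cmd


def sign(identity, path, retries=3, run=subprocess.run):
    """Retry timestamped signing, then fall back to an untimestamped signature."""
    for _ in range(retries):
        if run(codesign_command(identity, path)).returncode == 0:
            return "timestamped"
    # Assumed policy: accept an untimestamped signature rather than fail the build.
    run(codesign_command(identity, path, timestamp=False), check=True)
    return "untimestamped"
```

Retrying first keeps the timestamp whenever the network cooperates, and the return value lets the caller log which kind of signature a given artifact received.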

@MylesBorins
Contributor Author

> I've unticked "Archive artifacts only if build is successful"

This didn't seem to stick. I've flipped the bit again and I'm trying to run the build one more time... if this works and I can manually upload the artifacts to dist, we are unblocked for today.

@MylesBorins
Contributor Author

MylesBorins commented Sep 4, 2018

Assets are now appearing, but we have a new edge case on the pkg installer build job:

 > git fetch --tags --progress https://github.com/nodejs/node.git +refs/heads/*:refs/remotes/origin/* # timeout=30
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from https://github.com/nodejs/node.git
	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
	at hudson.scm.SCM.checkout(SCM.java:504)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
	at hudson.model.Run.execute(Run.java:1798)
	at hudson.matrix.MatrixRun.run(MatrixRun.java:153)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:429)

https://ci-release.nodejs.org/job/iojs+release/3729/nodes=osx1010-release-pkg/console

edit:

The next run worked but ran into the timestamp issue...

@MylesBorins
Contributor Author

MylesBorins commented Sep 4, 2018

Seems like it is require-io 1 that is failing to connect to GitHub.

https://ci-release.nodejs.org/job/iojs+release/3730/nodes=osx1010-release-tar/console

edit:

Digging a bit more, it seems like this worker has not had a successful build in over a week. Do we have anything in place to notify the build team when something like this happens?

https://ci-release.nodejs.org/computer/release-requireio-osx1010-x64-1/builds

@rvagg
Member

rvagg commented Sep 5, 2018

> Do we have anything in place to notify the build team when something like this happens?

no, and we'd get a lot of noise because these v8-canary jobs fail so often

@MylesBorins
Contributor Author

> no, and we'd get a lot of noise because these v8-canary jobs fail so often

It seems like we could monitor just particular release streams going red. Lots of noise from v8-canary doesn't seem like a good reason not to have a 6.x and 8.x canary, so that we aren't chasing heisenbugs the day of a release.
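
As a rough illustration of that kind of targeted monitoring, the sketch below flags workers with no successful build in the last week. It assumes build records shaped as dicts with a `result` string and a `timestamp` in milliseconds since the epoch (for example, fetched from a CI server's JSON API); the function names and the seven-day threshold are assumptions for illustration, and nothing like this exists in the build repo as far as this thread shows.

```python
from datetime import datetime, timedelta, timezone


def last_success(builds):
    """Return the time of the most recent successful build, or None."""
    stamps = [b["timestamp"] for b in builds if b.get("result") == "SUCCESS"]
    if not stamps:
        return None
    # timestamps are assumed to be milliseconds since the Unix epoch
    return datetime.fromtimestamp(max(stamps) / 1000, tz=timezone.utc)


def stale_workers(builds_by_worker, now, max_age=timedelta(days=7)):
    """List workers with no successful build within `max_age` of `now`."""
    stale = []
    for worker, builds in sorted(builds_by_worker.items()):
        latest = last_success(builds)
        if latest is None or now - latest > max_age:
            stale.append(worker)
    return stale
```

Feeding it only the release workers (and leaving out the v8-canary jobs) would keep the noise down while still catching a worker that has been red for a week.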

@MylesBorins
Contributor Author

While we are able to make a collection of assets that I could manually upload, I don't have time today to review all the assets and make sure they work. There were also some oddities that showed up in CITGM that I have not been able to reproduce outside of our infra. Because of this, I've opted to delay the release by a week.

@nodejs/build do you think we will be able to get the ci release infrastructure stable by next Tuesday?

@rvagg
Member

rvagg commented Sep 5, 2018

> @nodejs/build do you think we will be able to get the ci release infrastructure stable by next Tuesday?

Yes, I think we can make that commitment. If I can't get things solved on my end we'll come up with work-arounds of some kind.

@mhdawson
Member

mhdawson commented Sep 5, 2018

Work on alternate osx release machine: #1486

@rvagg
Member

rvagg commented Sep 6, 2018

I think I've managed to find someone at my ISP that has a clue, or at least I have a ticket escalated to the people that might have a clue. Currently evidence suggests that things have improved on my end even though I haven't heard back from them yet, so crossing fingers that it's sorted. I've re-submitted the original build job to try it out: https://ci-release.nodejs.org/job/iojs+release/3734/

@MylesBorins
Contributor Author

This appears to now be fixed 🎉


6 participants