Skip to content

Support builds of OpenSSL from vendored source (take 2) #963

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Jul 30, 2018

Conversation

alexcrichton
Copy link
Collaborator

This is a revival of #684 to see if I can help push it across the finish line!

Closes #580

@alexcrichton
Copy link
Collaborator Author

@sfackler ok I think it's almost ready to go! For whatever reason musl fails to link on 1.20.0 but succeeds on 1.21.0. The remaining piece is two failing tests on musl. I'm not really seeing what's going on there, do you recognize these errors?

@sfackler
Copy link
Owner

The MUSL failures seem kind of strange and not perfectly deterministic: https://circleci.com/gh/sfackler/rust-openssl/9150

I'll try poking around and see what's going on.

@sfackler
Copy link
Owner

This seems pretty weird - I can repro this issue locally in a Docker container using the vendored feature, but not when manually building OpenSSL with what I think should be an equivalent build config?

CC=musl-gcc ./Configure --prefix=/openssl linux-x86_64 no-dso no-ssl3 no-unit-test no-comp no-zlib no-zlib-dynamic no-engine no-async no-shared -O2 -ffunction-sections -fdata-sections -fPIC -g -m64

Is there anything that's missing?

@alexcrichton
Copy link
Collaborator Author

Hm I assume that the test failures are something to do with system cert store things, perhaps when building locally those are picked up and/or are available, but when building in docker they can't be found?

@sfackler
Copy link
Owner

No, the errors are coming from loading specific files, nothing to do with the system root: https://github.com/sfackler/rust-openssl/blob/master/openssl/src/ssl/test.rs#L660.

I'm using the same container for the manually built and vendor-provided OpenSSLs too.

@alexcrichton
Copy link
Collaborator Author

Hm weird, your invocation does look exactly like what the build script does as well. You're sure as well that the extra support for loading an external openssl can't be involved? The paths there are quite different in openssl-sys's build script, but in theory in non-essential fashions...

@sfackler
Copy link
Owner

Not sure but I don't think it is.

@alexcrichton
Copy link
Collaborator Author

Hm so different errors each time may indicate a bug of some form like a race condition...

@sfackler
Copy link
Owner

It definitely seems like a race condition. We could maybe try using a newer rust distro? Might be a MUSL bug that's since been fixed or something?

@alexcrichton
Copy link
Collaborator Author

Hm I actually can't reproduce the test failures at all localy in the rust:1.21.0 container, cargo test --target ... is always working for me

@sfackler
Copy link
Owner

Hmm no 1.27.2 still hits it.

It seems to be really rare for me running locally in a docker - you may need to run cargo test in a loop for a while.

@alexcrichton
Copy link
Collaborator Author

I get totally different failures if I take the failing musl binary and then run it on my host system (ubuntu 16.04).

The strace logs of it failing on my system show it's trying to stat /co/target/x86_64-unknown-linux-musl/debug/build/openssl-sys-0ce663444c7e2f3c/out/openssl-build/install/ssl/certs/6a909d98.0 which is clearly wrong, and doesn't exist in the target folder anyway.

Confusion!

@sfackler
Copy link
Owner

That stat is expected - that's where the vendored install is configured to look for certs but the failure to open it shouldn't cause issues.

Part of the difference when running via cargo test vs not is that cargo leaks the openssl-probe environment variables so those tests you've ignored on OSX will only pass on Linux when run through cargo.

@sfackler
Copy link
Owner

sfackler commented Jul 28, 2018

The tests also depend on the test directory in the root of the repo if you're not bringing that along with the binary.

@alexcrichton
Copy link
Collaborator Author

Aha that definitely makes sense. I got the tests running locally on my native system with that, and sure enough with enough loops I get weird failures.

What's baffling me now as well is that I'm getting segfaults sometimes as well. I managed to catch one with a core dump and saw some interesting things in GDB

I have no idea where the SIGSEGV came from unless hlt does that, and if so it could be connected with musl's a_crash() function (from poking around the source) which sounds like internal memory corruption.

Unfortunately rr isn't able to capture a failure and valgrind immediately dies with an unrecognized instruction.

It definitely looks like race conditions and/or memory corruption going on here. If this is due to how we're compiling OpenSSL we've got problems landing this PR, but if it's otherwise just an existing bug in rust-openssl, OpenSSL, or musl I'm less certain of how to keep tracking this down...

@alexcrichton
Copy link
Collaborator Author

er, didn't mean to close

@sfackler
Copy link
Owner

hlt can only be run in ring0 so that seems pretty suspicious! I wonder if musl uses that instead of ud2 to abort?

@alexcrichton
Copy link
Collaborator Author

@sfackler perhaps the musl vendored build should be left out for now?

@sfackler
Copy link
Owner

It could but I'm a bit worried that we're giving people a broken OpenSSL at least with MUSL.

@alexcrichton
Copy link
Collaborator Author

FWIW this is basically how we're already distributing OpenSSL with Cargo, so it's in that sense not that much more broken than before? It may be that rust-openssl's test suite uses much more of OpenSSL than Cargo by default does (I wouldn't be surprised at all by this) and maybe some other part tickles musl in the wrong way?

In any case I've removed from now. I'd be down for adding a warning in the readme that musl may be buggy as well

@sfackler
Copy link
Owner

It's theoretically basically how we're already distribution OpenSSL but I can only get these test failures to happen when the OpenSSL is built via the vendored feature so there's seems to be something specifically broken there.

@alexcrichton
Copy link
Collaborator Author

Ok using the exact same build invocation I've been able to reproduce using an external build of OpenSSL. It looks like the culprit may be -static as a flag to musl-gcc. No idea. I've pushed a commit to openssl-src-rs to filter that out and a commit here to test it.

@alexcrichton
Copy link
Collaborator Author

hm @sfackler is there a different invocation to get git deps working on CircleCI?

@sfackler
Copy link
Owner

Huh, that's weird! Could it have anything to do with conflicting MUSL "runtime" versions getting pulled into OpenSSL vs the Rust side of things? I've always been a bit unclear about how that was supposed to work.

Oh yeah I can fix the git dep thing - just needs an SSH key with permissions to more than just this repo.

@sfackler
Copy link
Owner

Oh looks like that wasn't it...

@alexcrichton
Copy link
Collaborator Author

Yeah I'm guessing that maybe it tweaked the header files or something like that and maybe a different malloc was pulled in? I have no idea what -static does for object files...

@sfackler
Copy link
Owner

Running with --verbose, looks like cargo might not handle auth for submodules correctly?

error: failed to load source for a dependency on `openssl-src`

Caused by:
  Unable to update https://github.com/alexcrichton/openssl-src-rs

Caused by:
  failed to update submodule `openssl`

Caused by:
  [23/-1] authentication required but no callback set

@sfackler
Copy link
Owner

It specifically has to do with the .gitconfig circle uses:

[url "ssh://git@github.com"]
	insteadOf = https://github.com

Deleting that, the checkout works.

@alexcrichton
Copy link
Collaborator Author

Ah yeah this is all related to Cargo's "sorry I don't parse ~/.ssh/config" issue, I'll try just removing the gitconfig

@sfackler
Copy link
Owner

run: git config --global --unset 'url.ssh://git@github.heygears.com.insteadOf' is the way to do that I think.

@alexcrichton
Copy link
Collaborator Author

Looks like that did the trick for CircleCI at least! I'll publish a new version of openssl-src

This is a revival of sfackler#684 to see if I can help push it across the finish line!

Closes sfackler#580
@alexcrichton
Copy link
Collaborator Author

Alright that should do the trick!

@sfackler
Copy link
Owner

Yay! There's some documentation to write and decisions made about versions but we can follow up on an issue.

@sfackler sfackler merged commit aba2969 into sfackler:master Jul 30, 2018
@alexcrichton alexcrichton deleted the vendored branch July 31, 2018 17:24
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants