Cirrus: Run checks directly on the host #1334
Conversation
Ahh right, most of the tests are skipped when not running in a container; there was something I needed to tweak to get around that...
Force-pushed from 8dfc215 to 910bd20
Wait...what?!?!?! It passed?????? @mtrmac PTAL (it's still a prototype).
Oh! Wait, I made a typo, and we probably want developers to be able to build a container for testing in...
Force-pushed from 0b2848e to e2d1464
The integration tests fail on OpenShift startup… Assuming that’s not transient (I have now triggered a re-run), debugging that doesn’t look like much fun.
Another sort-of concern is that while running a custom container inside a custom VM is rather wasteful, and running CI directly on a VM is certainly nice, for local development it is convenient that `make test-integration` (and `make test-system`?) can be run in a container, so it is useful for CI to enforce that those two continue to work inside a container.
So, would running vendor+build+validate+unit directly on the VM, and integration+system in a container, be (even possible and) worth the effort?
Yes, we need to do both; the challenge is keeping the environments and experience as similar as possible.
There's no reason we can't/shouldn't do both in parallel (on separate VMs).
Force-pushed from 1953bfa to a792502
The OpenShift thingie seems to be failing to start due to SSL/TLS crap:
But I also see a fair number of what look like errors binding to port 8443 of the local IP... I wonder if maybe something in the GCP networking setup is (somehow) blocking this? I would think the system/kernel is smart enough to recognize that as the local address and not actually send anything out.
Yeah. I don’t know what that’s about :/ It’s quite possible that a networking difference changes the detected primary IP address inconsistently in some parts of the code vs. others, or something like that.
Binding or connecting? Connection retries are expected; that’s just line 117 in 71e7a58
Binding to …
Yes — OTOH OpenShift might be trying to be “useful” on the local network; we just run an …
Oh, good thought. I can check this manually by comparing the logs from a job run from master.
Interesting, I didn't notice that.
Hmmm, so the VMs definitely have TWO IP addresses. One is internal-network only, the other is external. IIRC, the logs mentioned the 10.x.x.x (internal) network. That should work, but maybe not. We're using a very vanilla/default GCP networking setup; it's possible the default is to block inter-VM communications. In any case, I think this is going to require some hands-on work.
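For that hands-on check, a rough Go sketch (purely hypothetical, not part of the test suite) that lists the VM's non-loopback IPv4 addresses and probes port 8443 on each could show whether the internal 10.x.x.x address or only the external one refuses connections:

```go
// probe8443.go — hypothetical diagnostic, not part of the test suite.
// Lists the host's non-loopback IPv4 addresses and tries a TCP connection
// to port 8443 on each, to see which address actually accepts connections.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		fmt.Println("listing interface addresses:", err)
		return
	}
	for _, a := range addrs {
		ipNet, ok := a.(*net.IPNet)
		if !ok || ipNet.IP.IsLoopback() || ipNet.IP.To4() == nil {
			continue
		}
		target := net.JoinHostPort(ipNet.IP.String(), "8443")
		conn, err := net.DialTimeout("tcp", target, 2*time.Second)
		if err != nil {
			fmt.Printf("%s: connect failed: %v\n", target, err)
			continue
		}
		conn.Close()
		fmt.Printf("%s: connection accepted\n", target)
	}
}
```

Running something like that on the VM while OpenShift is (supposedly) listening would at least tell us which address is the one refusing connections.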
@mtrmac QQ: Is the purpose of this to verify skopeo can talk to the OpenShift registry? Is there a (testing) reason why skopeo needs to remain compatible with an unsupported OpenShift? The reason I ask is... might it be easier to get a newer/supported registry to test against rather than fight/fix the old one?
Yes,
Somewhat. We need the registry for testing two things:
In an ideal world, which I don’t expect to happen any time soon, we would teach github.com/distribution/distribution the X-R-S-S API, and test against that — a simple registry server is much smaller and easier to run. But nobody has worked on such a server, to my knowledge; and even in that case we would probably want to ensure interoperability with the OpenShift implementation of X-R-S-S. (… and in a truly ideal world, we would have testing coverage of most of the major registry products, but that’s another problem space entirely.)
Okay, thanks for the insights. Yeah, mocking it would be a major undertaking and additional maintenance; much better to use the real deal. Too bad it's such a behemoth. There are also now registries on both GitHub and GitLab, in addition to Quay and Google Cloud. I assume testing those wouldn't add much value, but please correct me if I'm wrong.
Okay, I set up a POC container build over in the c/automation_images repo. Next I will work to incorporate its use into both CI and the Makefile.
Force-pushed from 75729a2 to 363e1fa
@mtrmac I've taught … Meanwhile, let me see about fixing some of these docker-rate-limit problems; they're starting to drive me crazy 😠
We're running as root, so I doubt that simple case is the problem, but could it be an SELinux issue maybe? I'm hitting all kinds of problems in podman land with these same VM images, due to policy updates: containers/podman#10522 Specifically, any volume mounts with …

I'm having a really (REALLY) hard time parsing all the openshift/registry output mixed in with the test output. Is there any way we could send the openshift stuff into a file instead of printing it all? It's really easy for me to tell Cirrus to attach a file after the task runs (pass or fail).
confirmed: …
Close enough of a guess, it seems.
That’s not quite what I see. The home directory, per …

See #1403, which includes various debugging printouts, and where the integration tests pass when run with …
I’m probably thinking about this incorrectly — I think those log streams have identifiable prefixes, so it seems possible to filter them out after-the-fact, using tools if necessary.

So far it seemed convenient enough to me to have everything in the same log stream, usually in the natural order (“starting registry” … registry start-up logs … “waiting for registry start-up finished”); separating some of those logs out into external files would almost every time require looking up one or more of those files (presumably the main log would contain paths?), which seems to be more work to me.

Is it the visual noise on each individual line? Too many useless log lines? The fact that everything is mixed in a single stream, regardless of what the individual lines look like? Something entirely different? Some combination of factors?

Admittedly: …
That's fine too; it's just hard on my eyes. But if it makes sense to you and you can make sense of it, I'm fine just being the lever-puller.
From the perspective of someone who doesn't see it much / ever, it's hard to visually distinguish between test harness output and subject (skopeo) output and tooling (openshift, registry, etc.) output.
The horrendous number of redundant messages from openshift/etcd ("blah blah tls cert invalid blah blah") is also really confusing. But yes, this can be a problem for another day and PR. I too am concerned, as I don't remember seeing them before; maybe they were simply not shown before (somehow)?
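On the filter-by-prefix idea above: if those streams really do have identifiable prefixes, a trivial after-the-fact filter would already cut the noise. A rough sketch (the `openshift:` / `etcd:` prefixes here are made up; the real ones in the CI log may differ):

```go
// logfilter.go — rough sketch of after-the-fact filtering by line prefix.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Hypothetical noisy prefixes; the real prefixes in the CI log may differ.
	drop := []string{"openshift:", "etcd:"}

	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long log lines
	for scanner.Scan() {
		line := scanner.Text()
		noisy := false
		for _, p := range drop {
			if strings.HasPrefix(line, p) {
				noisy = true
				break
			}
		}
		if !noisy {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
}
```

Usage would be something like `go run logfilter.go < task.log`, piping in the captured task output.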
Oh wait, this just struck me. When the tests run, it's not a login shell, but in a …
I think there are basically two cases of the asynchronous output: … Some other synchronous commands log output. Skopeo itself typically doesn’t log output, unless the test fails and the output is included in an assertion message. Probably none of that is 100% consistent, as the code grew over time.

I can’t really remember any more, and of course past problems are not that much of an indication of future problems, but it seems reasonable that those noisy …

I don’t feel strongly about the output format… I guess PRs are welcome?
My first guess is that this has something to do with recent Go only looking for host names in subjectAltName instead of CN. But that’s really just a guess with no supporting evidence at all.
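For context on that guess: since Go 1.15, hostname verification ignores the certificate's CommonName and only consults subjectAltName entries, so a test certificate generated without SANs will be rejected. A minimal sketch of generating a self-signed certificate that recent Go would accept (the hostname and IP are placeholders, not what the CI setup actually uses):

```go
// gencert.go — minimal sketch of generating a self-signed certificate that
// carries subjectAltName entries, which is what recent Go checks during
// hostname verification (CommonName alone is no longer enough).
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"net"
	"os"
	"time"
)

func main() {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	template := x509.Certificate{
		SerialNumber: big.NewInt(1),
		// CommonName is ignored for hostname verification in recent Go.
		Subject:     pkix.Name{CommonName: "registry.example.test"},
		NotBefore:   time.Now(),
		NotAfter:    time.Now().Add(24 * time.Hour),
		KeyUsage:    x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		IsCA:                  true,
		// These subjectAltName entries are what actually gets checked;
		// the hostname and IP are placeholders, not the CI setup's values.
		DNSNames:    []string{"registry.example.test"},
		IPAddresses: []net.IP{net.ParseIP("127.0.0.1")},
	}
	der, err := x509.CreateCertificate(rand.Reader, &template, &template, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der})
}
```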
Huh... well, it seems (somehow) the tests passed. I didn't notice anything funny in the output, but I'm a bit skeptical. Let me push this small …
Force-pushed from f68c889 to dec382f
Force-pushed from 69552c5 to 53a3d99
Well, okay then, it passed again. Removing the draft status; let's get on with the final review before something breaks 😁
Thanks for all the work!
I’m really only worried about the `/etc/os-release` check question; the rest are non-blocking nits.
@mtrmac thanks for all your help on this too; couldn't have done it w/o you. What a doozy of a PR! 😅
LGTM. Thanks again!
FWIW https://cirrus-ci.com/build/6216764259303424 lists some “Issues found while parsing configuration” — most look like stylistic preferences rather than something that needs fixing. I have no idea about the severity (or it might have been a well-considered long-standing decision); I just happened to randomly notice that field (well, the whole page, actually) for the first time today.
Ugh, I know it does that everywhere now. For some reason they really don't like how we use …
In order to meet achievable deadlines converting from Travis to Cirrus CI, one significant artifact was carried forward (instead of fixing): depending on a `--privileged` container to execute all/most automated checks/tests.

Prior attempts to remove this aspect resulted in several test failures. Fixing the problems was viewed as more time-consuming than simply preserving this runtime environment.

Time has passed, and the code has since moved on. This commit removes the legacy need to execute CI operations in a `--privileged` container, instead running them directly on the host. At the same time, the necessary test binaries are obtained from the same container used for development/local testing purposes. This ensures the two experiences are virtually always identical.

Signed-off-by: Chris Evich <cevich@redhat.com>