investigate reliability of connector syncs on m1 k8s #10122

Closed
jrhizor opened this issue Feb 5, 2022 · 1 comment
Labels
area/platform (issues related to the platform) · needs-triage · priority/medium (Medium priority) · team/platform-move · type/bug (Something isn't working)

Comments

@jrhizor
Contributor

jrhizor commented Feb 5, 2022

I'm running into something strange on M1 using the published PokeAPI + JSON Destination images on a locally built platform.

If I run on docker-compose, I can run it as many times as I want and see successful runs (even 180+ in a row). However, if I run on Kubernetes, the syncs sometimes get stuck. Specifically, the source completes and closes, but the destination never completes: the destination's remote-stdin container closes, but the pod just hangs. This doesn't really look like a platform pod-orchestration problem (?), because the destination's main loop just sits there indefinitely and the platform waits as expected. This happens fairly reliably (I almost always hit it within 10 runs of this sync).
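
For reference, a quick way to see which containers in a stuck destination pod have already exited and which are still running (the pod name below is from one of my stuck runs, so adjust it; container names may differ slightly):

❯ kubectl get pod destination-local-json-sync-6-0-zqark \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state}{"\n"}{end}'
❯ kubectl describe pod destination-local-json-sync-6-0-zqark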

This transient behavior of getting stuck happens on both the non-container orchestrator and container orchestrator approaches with similar frequencies and in the same place.

I haven't attempted to reproduce this with other connectors (doing so would better point to where the issue lies). I think the issue must be either specific to Docker Desktop on M1 or a race condition in the entrypoint. The only reason I think it may be M1-specific is that we haven't seen this precise issue in builds (linux amd64) (worth confirming) or in cloud (linux amd64).

The remote-stdin container only logs the following before closing:

❯ kubectl logs -f destination-local-json-sync-6-0-zqark remote-stdin
2022/02/04 23:43:02 socat[1] W ioctl(5, IOCTL_VM_SOCKETS_GET_LOCAL_CID, ...): Function not implemented

but that line is also logged on successful runs.

I think the first steps would be to run the syncs 10-20x (see the loop sketch after this list) on:

  1. M1 docker-for-desktop with non-JSON destinations, to see if this is a destination-specific problem (seems unlikely but worth ruling out)
  2. Kubernetes on the ec2 build runners
  3. docker-for-desktop Kubernetes on non-M1 hardware
  4. minikube or any other non-docker-desktop/kind Kubernetes on M1
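
For the repetition itself, a simple loop against the local API should do. This is only a sketch: it assumes the default airbyte-server port and the standard connections/sync endpoint, and <connection-id> is a placeholder for the test connection's id:

  # kick off the same connection sync 20 times (placeholder connection id)
  for i in $(seq 1 20); do
    echo "starting sync attempt $i"
    curl -s -X POST http://localhost:8001/api/v1/connections/sync \
      -H "Content-Type: application/json" \
      -d '{"connectionId": "<connection-id>"}'
    sleep 120   # crude wait between attempts; polling the job status would be better
  done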

No matter which environment we narrow it down to, we'll probably need to add diagnostic logging to the entrypoint to pinpoint exactly where termination gets stuck.
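
Concretely I'm imagining timestamped trace lines plus an exit trap so we can see exactly how far termination gets. Rough sketch only, assuming the entrypoint is a bash script; the pipe path and destination command below are placeholders, not the real ones:

  #!/usr/bin/env bash
  # illustrative trace logging only; paths and commands are placeholders
  log() { echo "$(date -u +%FT%TZ) [entrypoint-debug] $*"; }

  trap 'log "exit trap fired with status $?"' EXIT

  log "waiting on data from remote-stdin"
  destination-command < /pipes/stdin   # placeholder for the real destination invocation
  log "destination command returned $?"

  log "starting termination/cleanup"
  # ...existing termination steps, each bracketed with a log call...
  log "termination complete"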

I'm putting this at high priority since it's a non-trivial barrier to trusting the results of platform code changes locally when developing on M1.

@jrhizor added the type/bug, priority/medium, area/platform, and needs-triage labels on Feb 5, 2022
@jrhizor changed the title from "investigate reliability of connector syncs on m1" to "investigate reliability of connector syncs on m1 k8s" on Feb 5, 2022
@davinchia
Contributor

duplicate of #2017
