Figure out what's causing Python 3.11 instability #446

Open
zorbathut opened this issue Dec 17, 2022 · 2 comments
Labels
bug (Something isn't working), complicated (Extra attention is needed), P1 critical

Comments

@zorbathut
Contributor

Python 3.11 is causing infrequent crashes on the live server. Some stack trace screenshots:

[two stack trace screenshots; images not captured]

Things tried:

  • Changing the base image to ubuntu:22.04 works fine on Python 3.10. Updating that image to Python 3.11 is difficult because Poetry isn't compatible with the Deadsnakes 3.11 build (see python-poetry/poetry#6925, "Poetry is incompatible with unstable-tagged Python builds (Invalid PEP 440 version: '3.Y.Z+')"). This should be tested further.
  • poetry:3.10 works fine; poetry:3.11 doesn't work.
  • I haven't gotten this to happen in the test framework, although the test framework doesn't do real HTTP requests. Maybe we should try making real HTTP requests?
  • This happens both on the live site and the dev site.
  • This has not been observed locally, however.
  • I've tried recompiling sqlalchemy without its C module. This made no difference.
  • I've tried disabling gevent, greenlet, and the messy signal stuff we're doing. This made no difference.
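
On the "real HTTP requests" idea above, here is a minimal sketch of a load script that could be pointed at a dev instance. It is only an illustration: `fetch_status` and `hammer` are hypothetical names, the target URL is whatever dev endpoint you choose, and nothing here is known to trigger the crash.

```python
import collections
import concurrent.futures
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status of one GET, or the exception class name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the server finishes the request
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still carry a status code
    except Exception as exc:  # connection reset, timeout, refused, ...
        return type(exc).__name__


def hammer(url, total=200, workers=8):
    """Send `total` GET requests using `workers` threads and tally the outcomes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return collections.Counter(pool.map(fetch_status, [url] * total))
```

Calling `hammer(...)` with a dev URL returns a `Counter` keyed by status code or exception name. A sudden run of connection errors in the tally would suggest the server process died mid-run.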

If you've got ideas on how to reproduce this, let me know, I'm happy to try stuff out.

@zorbathut added the P1 critical, P2 priority, bug, and complicated labels, then removed the P2 priority label, Dec 17, 2022
@justcool393
Collaborator

this appears to be some sort of memory corruption. given the random nature of where the crashes are occurring, the fact that they show up in code that has no business causing them, and some other stack traces where the top frame was "Garbage collecting...", i'm relatively convinced of this.

python 3.11 iirc is notable for performance improvements, and my guess (no evidence for this) is that some code expects something to be where it was in python 3.10, it's no longer there in python 3.11, and this is causing memory corruption.

given this i'm inclined to believe our culprit is one of (in descending order of likelihood)

  1. c extensions. there are a bunch of things we use C for indirectly, and one of them that isn't playing nice (e.g. not using the correct memory allocation functions) might be causing this.
  2. python 3.11 itself. i find this unlikely, but i do think py3.11 is a factor.
  3. some random freak of nature that hates us specifically. this is prolly it tbh.
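
to narrow down item 1, one option is to list which compiled extension modules are actually loaded in the running process. a minimal sketch (the helper name `loaded_c_extensions` is made up for illustration; it only sees modules that have already been imported and that live in their own shared-library file):

```python
import importlib.machinery
import sys


def loaded_c_extensions():
    """Map module name -> file path for every imported compiled extension module."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = {}
    # Copy the items first: sys.modules can change while we iterate.
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if path and path.endswith(suffixes):
            found[name] = path
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, path in loaded_c_extensions().items():
        print(f"{name}: {path}")
```

running this after the app has fully started, once under 3.10 and once under 3.11, gives two lists to diff. separately, starting the interpreter with `PYTHONMALLOC=debug` set turns on CPython's allocator debug hooks, which can flag some buffer overruns and use-after-free bugs in memory allocated through the Python allocators.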

@justcool393
Collaborator

justcool393 commented Mar 1, 2023

oddly enough, one of the strange things with this is that it's hard for me to reproduce. i'm curious: do prod + dev have any major differences in deployment from the docker version?

we probably could try and point valgrind at it to see what's blowing up but we'd need a coredump for that
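
a cheaper first step than a coredump might be the stdlib `faulthandler` module, which dumps the Python-level traceback of every thread when the process receives a fatal signal such as SIGSEGV. a minimal sketch, assuming it gets called early in app startup (the helper name `enable_crash_tracebacks` and the log path are hypothetical):

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Dump tracebacks of all threads on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL.

    Returns the stream being written to. The caller must keep it open,
    because faulthandler writes to its file descriptor at crash time.
    """
    stream = open(log_path, "a") if log_path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream
```

the same handler can be turned on without code changes by setting `PYTHONFAULTHANDLER=1` or passing `-X faulthandler`. this only shows where Python was when the crash happened, not which native code corrupted memory, so it complements rather than replaces a native debugger.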
