Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

gh-128045: Mark unknown opcodes as deopting to themselves #128044

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DinoV
Copy link
Contributor

@DinoV DinoV commented Dec 17, 2024

When accessing co_code on a code object we'll first run through to do the de-opt https://github.com/python/cpython/blob/main/Objects/codeobject.c#L1706

This removes any unknown opcodes. Instead the deopt table should just recognize unknown opcodes as deopting to themselves, allowing the extensible interpreter loop to consume unknown opcodes.

@DinoV DinoV changed the title Mark unknown opcodes as deopting to themselves gh-128045: Mark unknown opcodes as deopting to themselves Dec 17, 2024
@DinoV DinoV marked this pull request as ready for review December 17, 2024 20:12
@DinoV DinoV requested a review from markshannon as a code owner December 17, 2024 20:12
Comment on lines 251 to +256
for name, deopt in sorted(deopts):
out.emit(f"[{name}] = {deopt},\n")
defined = set(analysis.opmap.values())
for i in range(256):
if i not in defined:
out.emit(f"[{i}] = {i},\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're not testing this at all in cpython, I'd suggest we at least add a couple of assertions to make sure this is correctly covering the range of opcodes:

Suggested change
for name, deopt in sorted(deopts):
out.emit(f"[{name}] = {deopt},\n")
defined = set(analysis.opmap.values())
for i in range(256):
if i not in defined:
out.emit(f"[{i}] = {i},\n")
defined = set(analysis.opmap.values())
for i in range(256):
if i not in defined:
deopts.append((f'{i}', f'{i}'))
assert len(deopts) == 256
assert len(set(x[0] for x in deopts)) == 256
for name, deopt in sorted(deopts):
out.emit(f"[{name}] = {deopt},\n")

facebook-github-bot pushed a commit to facebookincubator/cinder that referenced this pull request Dec 18, 2024
Summary:
When CPython hands out the bytecode it will first do a de-opt on it:

https://www.internalfb.com/code/fbsource/[241e0d0760d4c107a3c0ac2b4914524120b0c909]/third-party/python/3.12/Objects/codeobject.c?lines=1540&reveal=1540

This uses the de-opt table which only defines the known opcodes, meaning unknown opcodes get turned into 0's. We need CPython to at least define unknown opcodes to at least de-opt to themselves.

Upstream PR: python/cpython#128044

Reviewed By: jbower-fb

Differential Revision: D67350914

fbshipit-source-id: 0073efab52da1be775272e7dd9ae5a46468ccb10
@markshannon
Copy link
Member

I think we regard undefined instructions an error.
If you want to pass custom bytecodes to your custom interpreter, a more robust approach might be needed.
We can do the deopt thing for now, but it seems fragile.

For example, one thing I have considered is storing bytecodes in a compact format on disk: Instructions without an oparg would take 1 byte, those with an oparg would take 2. We would then combine the unmarshalling and quickening steps to create the full in-memory form in a single pass. Custom instructions would not survive this process.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants