-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Ensure UTF-8 encoding when communicating with a subprocess #8563
Ensure UTF-8 encoding when communicating with a subprocess #8563
Conversation
For my understanding: This fix is only relevant if |
I'm not entirely sure that's true: it seems to me this should also be relevant when dealing with output produced by the child process. IIUC: by not specifing |
I'm not sure either about any unexpected regressions due to this change, but if you guys are really concerned about it, setting |
There are examples in My first guess is that But I'm not sure I yet understand what is going on to make this necessary: why don't the parent and child pythons already agree on the encoding, regardless of whether it is utf8? |
OK. I just guessed that tweaking
I'm not an expert of python or character code, but one thing I know is the encoding of source code is a different thing in Python, because Python assumes UTF-8 regardless of the locale:
Let's see how the following script works on a windows instance with the Japanese locale. You can reproduce it on GitHub Actions: https://github.com/privet-kitty/ci-encoding/actions/runs/6599347808. import subprocess
import sys
import locale
from typing import Optional
def run(
no: int,
encoding: Optional[str],
input: str,
options: list[str] = [],
) -> None:
print(no)
cmd = [sys.executable, "-I"] + options + ["-"]
try:
proc = subprocess.run(
cmd,
stdout=subprocess.PIPE,
input=input,
check=True,
text=True,
encoding=encoding,
)
print("stdout:")
print(proc.stdout)
except Exception:
print("exc_info:")
print(sys.exc_info()[1])
print(f"{locale.getencoding()=}")
run(1, None, 'print("a")') # OK
run(2, None, 'print("中")') # NG
run(3, "utf-8", 'print("中")') # NG
run(4, "utf-8", 'print("中")', options=["-X", "utf8"]) # OK In my environment, the output is as follows. (This is much the same as that of the CI above, but the output order is a bit different, probably because an exception is raised in another thread.)
My interpretation: case 2: case 3: case 4: |
so perhaps a simpler / less invasive fix is to put a PEP-263-style explicit encoding at the top of then both parent and child python can use the same locale encoding, which might or might not be utf8 but it doesn't matter because they will again agree. so far as I can see other python scripts executed by poetry only use ascii characters - this one is the odd one out because it includes the value of |
Possibly. Yet another thing I'm a little worried about is the |
I wouldn't worry about that. Anyone who is relying on such environment variables to influence the behaviour of poetry can simply be told not to. |
OK. I added the encoding declaration and tried the same non-ascii working directory test but in vain. Again the I can observe this phenomenon locally. Python's PEP-263 behavior seems to be weird, when
So far I've failed to find any document that can explain this behavior. |
I reckon running python scripts as |
4b69108
to
2dd04cf
Compare
Is this superseded by #8565 and can be closed? |
@radoering yes, thanks. |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Pull Request Check List
Resolves: #8550
Manual tests
test_run_python_script_non_ascii_input
failed on windows instances before fixingbase_env
: https://github.com/privet-kitty/poetry/actions/runs/6591942649sys.stdin
is not utf-8 but cp1252 on a Windows instance (when piped) . So I didn't need to prepare a special non-utf-8 environment for testing. If it is better to set up the test environment explicitly, the only way I know is invoking a PowerShell commandSet-WinSystemLocale <some locale>
before testing in workflows.test_run_python_script_non_ascii_input
.) As a one-time testing, I modified the master branch to use a non-ascii working directory in CI, saw some test cases failed, and it succeeded after merging this change.