Move vendored modules into a vendor directory #129222
Comments
Android doesn't do anything special with these libraries, so it shouldn't be affected. |
Gentoo used to remove these libraries entirely, but we don't do that anymore. Which reminds me that I was supposed to fix the issue that prevented us from doing so. Also, I was supposed to see if we can remove them. Days are too short. |
s/nedbat/ned-deily/ Such a change should have no impact on macOS builds, macOS installer builds, or iOS builds. |
Oops, sorry for mistagging you and for the miswording (forgive my lack of macOS knowledge). |
Er... this is funny but... who should I ask for Ubuntu actually? |
Bikeshedding: call it For example, pip and setuptools have |
Mimalloc has substantial modifications from upstream. Daan Leijen was working on changes to mimalloc to support the features needed by free-threaded CPython upstream, but it's probably going to take a while. Even then, it will require CPython specific build configuration -- it's never going to be something where you can just drop in a shared library provided by the OS. |
@doko42 had/has been the Debian/Ubuntu contact |
I think the point of |
Thank you, I wasn't aware of it! I've removed from the list of possible vendored modules.
I don't mind. At least, vendoring modules could be included last (or first) and be well-separated from other includes thanks to their name starting with |
It may still make sense to include it in the vendored modules directory. I don't have a strong opinion one way or the other. It depends on what you are primarily trying to communicate with the vendored directory. For example, it's third-party code in terms of license, attribution, and code style. But Linux distributions should probably treat it more like internal CPython code. For example, they should not substitute their own mimalloc package. |
Thanks for the ping! I think there would be some minor pain downstream because we have a weird custom build system for the extension modules, but it's fine. What about changes like #127932? |
For me
I'm not sure I understand your question but I'll try to reply with what I understood. If the upstream has issues, we can't really change the vendored copy even if we want to because we have some SBOM checks. So we can't just "hotfix" HACL* when we want to (we need to wait for upstream updates). So whether we move it to |
Historically, the cpython copies of some of the long-time "vendored" modules have been patched and modified in the cpython repo without regard to upstream. That has led, of course, to maintenance issues when new upstream versions are released. I think you need to clarify here whether your intent is to move to a model of strictly vanilla vendored versions with separate necessary patch files in the repo so that SBOMs for the vendored versions could be used. That's potentially a much bigger change, i.e. reviewing all the cpython copies and extracting our changes into separate patches, updating the devguide to explain the process, etc., and one that needs to be carefully done. But it would likely be of long-term benefit to do so. |
On Windows (and it's open to other platforms) we already debundle sources into https://github.com/python/cpython-source-deps and pre-built binaries into https://github.com/python/cpython-bin-deps. They are referenced by build script, rather than submodules, but at least it means our only build-time network access is to the same service that holds the rest of our code (i.e. if it's down, our build is failing already). We do also patch these from time to time, but usually in individual commits to preserve the history. Are you proposing to move them there? Or just to another directory in the current source tree? |
My point is that we already did apply a patch to HACL* — is the idea here that we would revert that change? I think the subsequent comment from @ned-deily (at #129222 (comment)) captures the broader concern. |
That's my main intent. Namely, try to separate what's being vendored (and can be updated if the vendor's upstream has changed) from what's "our" code. I guess we can also allow separate patches to be applied on our vendored copies if the need arises (e.g., if the vendored copy has issues and the upstream takes a long time to make the change or does not want to make the changes).
I'm just proposing to move it to another directory.
Err, no? We did apply a patch to HACL* using the upstream HACL*, or am I wrong? I mean, what we did is just update the bundled HACL* library once the necessary patch had been written out there, or am I missing something? |
Cloning the cpython git repo already requires an internet connection. Doing a recursive clone doesn't require more of one. The release tarballs should absolutely include the contents of a submodule as well -- in fact you have to since you can't later clone the submodules, Internet connection or not, since there is no submodule metadata. The advantage of using submodules is that it becomes very very very clear what is vendored versus what is patched.
The Meson build system does something similar using "wrap files" built into the core build system, that contain a URL download location for the upstream release tarballs and a checksum, which works even without git metadata. This also means that Meson can distribute your choice of "fat" release tarballs with vendored dependency fallbacks, or "thin" release tarballs with just the wrap definition, and at build time it will download and checksum the vendored dependency only if you don't have a system copy, or conditional on configure options such as "force vendor dependencies for standalone builds" or "forbid vendor dependencies for distro policy".
The code to handle this robustly is not trivial, which is why we built it into the core so that it only has to be maintained in a single place. You probably want to stick with submodules. |
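For readers unfamiliar with the wrap-file idea described above, here is a minimal Python sketch of the two pieces involved: checking a downloaded tarball against a pinned checksum, and deciding between a system copy and the vendored fallback. This is not Meson's implementation; `VendorPolicy`, `checksum_matches` and `choose_source` are hypothetical names invented for illustration, and the three policies only mirror the configure options quoted in the comment.

```python
import hashlib
from enum import Enum


class VendorPolicy(Enum):
    """Hypothetical configure-time policies, named for illustration only."""
    AUTO = "auto"        # prefer a system copy, fall back to the vendored one
    FORCE = "force"      # standalone builds: always use the vendored copy
    FORBID = "forbid"    # distro policy: never use the vendored copy


def checksum_matches(payload: bytes, expected_sha256: str) -> bool:
    """Return True if the SHA-256 of payload equals the pinned hex digest."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256.lower()


def choose_source(has_system_copy: bool, policy: VendorPolicy) -> str:
    """Return "system" or "vendored", or raise if the policy cannot be met."""
    if policy is VendorPolicy.FORCE:
        return "vendored"
    if has_system_copy:
        return "system"
    if policy is VendorPolicy.FORBID:
        raise RuntimeError("no system copy found and vendoring is forbidden")
    return "vendored"
```

In this sketch the vendored tarball would only be downloaded and passed through `checksum_matches` when `choose_source` returns "vendored", which matches the "download only if you don't have a system copy" behaviour described above.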
Moving the vendored projects to an explicit directory would not impact Fedora/RHEL. We would need to change the paths for those that we rm, but that should be it. |
If you're bikeshedding the name, GitHub code search already excludes directories named
Ideal: We'd make a top level Doing that would be good for project health IMNSHO rather than scattering stuff around as we do today. Re-packagers should appreciate it as well, as they each have their own ideas of what they do and don't want to include vs pull from external sources, and it makes it more obvious what we've got that could be considered for that.
Non-goal 1: Vendored things each need their own management in terms of how we get them, update them, and patch them if necessary for our repo. Don't try for a single approach there, it won't fit everything. There may be some commonalities in how that gets done, but that kind of thing can be worked out on a case-by-case basis later.
Non-goal 2: Vendored things in our tree do NOT need to match upstream. Applying patches is part of vendoring. The important thing is to track those patches and automate recreating such changes when updates are needed, rather than expecting the next person doing it to fully understand. |
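As one possible reading of "track those patches and automate recreating such changes", here is a small Python sketch that records the local patches for a vendored module and generates the commands to re-apply them after an upstream refresh. Everything here is an assumption made for illustration: the `VendoredModule` record, the `third-party` directory name, the version string and the patch file name are hypothetical, and this is not an existing CPython tool. The only real interface used is `git apply --directory=<root>`, which prepends the given root to the file names in the patch.

```python
from dataclasses import dataclass, field


@dataclass
class VendoredModule:
    """Hypothetical record of one vendored module and its local patches."""
    name: str
    upstream_version: str
    patches: list[str] = field(default_factory=list)  # applied in this order


def reapply_commands(module: VendoredModule,
                     vendor_root: str = "third-party") -> list[list[str]]:
    """Build one `git apply` command per recorded patch, in recorded order."""
    target = f"{vendor_root}/{module.name}"
    return [["git", "apply", f"--directory={target}", patch]
            for patch in module.patches]


# Example with made-up values: one patch recorded for a vendored expat.
expat = VendoredModule("expat", "0.0.0", ["patches/expat/0001-example.patch"])
commands = reapply_commands(expat)
```

Each module can still keep its own way of fetching upstream sources; the sketch only covers the part that is likely to be common, namely replaying the recorded patches.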
BTW, the reason |
... or |
Oh... then maybe we can use that name :') If it's supported by GitHub, I would vote for 'third-party' because I don't need to press shift for
Yes, that's what I think as well. I don't think we can have a unified approach.
I'm ok with this as well. I don't mind having each vendored module with its own patching system (I mean, we're already doing it somehow). Now, AFAICT, there don't seem to be many issues with just moving files, which was my primary concern here. I'm no expert in redistribution, but since I raised the issue on Discord, I thought it was also good to first ask whether this is something that would be interesting for CPython in the long term. And I'd like to thank everyone involved in this (constructive) discussion. |
By the way, Meson explicitly mandates that all "subprojects" (thirdparty code with their own build system that can be autoconfigured) must exist in a single top-level directory (default: |
If the question is just whether the vendored libraries would be concentrated in the one directory called
If the question is how happy I am with the continuing tendency to vendor many libraries and, instead of collaborating with the canonical author of the library, to use a modified version of the third-party library, then I am absolutely against it, and I consider such action (except for very temporary important security fixes or such) always wrong. We then have situations like #92875, where only SUSE and Red Hat complain about a bug, and because of local modifications it doesn’t show in CI. |
I don't see how that ticket has anything to do with using modified versions of a library. |
Modified libraries were mentioned, and generally this ticket has a lot to do with a bigger acceptance of the practice I strongly oppose. |
Sorry but I don't see how you can possibly compare:
If it were a modified version of the third-party library, then the modifications would make it impossible to have a system copy at all. That's an unrelated topic, and is the case for things like mimalloc -- regardless of what directory it is stored in, you and I don't have the option to What is your objection to expat being moved to a different directory? |
On Mon Jan 27, 2025 at 11:50 PM CET, Eli Schwartz wrote:
What is your objection to expat being moved to a different directory?
As I said above, absolutely none, I think the name of the directory is literally bikeshedding. And if --with-system-expat still works, then it is even better. And yes, I understand that sometimes upstream projects die, and then it is better just to include them or something like that, but I wanted to make sure that there is a voice registered protesting the general (perhaps just outside CPython, but I see it everywhere) tendency for vendoring (should I name Rust and Go?).
Also, why don’t we have --with-system-mimalloc?
|
That was my main intent. I knew how it could affect downstream redistributors, so I wanted to first know if just moving stuff around is already a problem or not. By centralising what is vendored (possibly patched, possibly amended for our needs, but essentially something that can be thought of as an external dependency that does not need additional installation steps) and separating it from "pure" CPython code (namely something written from scratch), it would be easier to manage those dependencies (whatever "manage" means here; it can be "maintained", "removed", "patched" or even "added"!)
I am personally not in favor of vendoring stuff, as it adds unnecessary files that could be found elsewhere as dependencies, but deciding whether we continue vendoring stuff or not should be a separate question IMO.
Maybe a bit more context: I was actually fixing UBSan failures and reviewing files in the project tree, hunting for possible UAFs, and I was like "oh but those folders are just unnecessary for me, those are vendored stuff" and I wished I didn't need my IDE to render them (though they should be indexed for autocompletion to work if needed). If everything were in a single folder, it would be easier to just exclude them from code search and to avoid modifying them by mistake. It also makes the project more structured (for instance, I thought that mimalloc was entirely vendored but that wasn't the case, so it's also a good thing to know what's ours and what's not). |
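To illustrate why a single folder makes exclusion easy, here is a short Python sketch that walks a source tree and skips everything under one vendored directory. The directory name `third-party` is only the candidate name floated in this thread, not a decided layout, and `iter_first_party_files` is a hypothetical helper written for this example (it relies on `Path.is_relative_to`, available since Python 3.9).

```python
from pathlib import Path
from typing import Iterator


def iter_first_party_files(root: Path,
                           vendored_dir: str = "third-party") -> Iterator[Path]:
    """Yield files under root, skipping everything below root/vendored_dir."""
    excluded = root / vendored_dir
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.is_relative_to(excluded):
            # Vendored code: one prefix check is enough to skip it entirely.
            continue
        yield path
```

With vendored code scattered across several directories, the same filter would need a hand-maintained list of prefixes instead of a single one.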
Ironically enough, both Rust and Go do NOT use source vendoring at all. They require rebuilding statically linked dependencies in local build tree scope, which is binary vendoring, but fetch it from the canonical source using a language-specific package manager each time... it is again unlike using a
Because cpython did not simply vendor mimalloc, cpython forked and patched mimalloc and thus it is impossible to build against a system copy even if you try. #113141 is a tracking issue discussing those patches, which proposes to get cpython to align with upstream mimalloc sufficiently to use a system mimalloc. Please address your comments there instead. :) |
Thank you, I will. |
Feature or enhancement
Proposal:
In CPython, we have some vendored libraries, namely libmpdec, hacl, and expat. Those libraries are meant to be clones of their upstream (think of them as git submodules), and I have many times excluded them from code search, as they mostly contain non-CPython code.
We have a plan to remove the vendored libmpdec (#115119) and we're almost there. In the long term, we could also try to unvendor expat (probably not hacl, as it's used to implement hash function fallbacks when OpenSSL is not present).
Affected modules:
libmpdec
expat
_hacl
Some advantages:
hacl).
Python/Programs split
Some inconveniences:
We can start with some modules that should be kept untouched, such as the HACL* sources, and progressively move the others to reduce the work and conflicts. It doesn't need to happen in one go (for instance, we may well ignore the libmpdec case if we manage to unvendor it before then). I don't think we have many open PRs touching expat (by the way, we could have a refresh script for expat to ease maintenance, like #126623).
Now, the question is how this could affect downstream redistributors. I'm asking first on GitHub since I don't know whether they are active on Discourse or not. If everyone says "it's fine", then I'll ask on Discourse to see if there are more redistributors that could be concerned.
cc
I don't know how moving mimalloc related stuff would affect the free-threaded build in particular, so I'm also going to ask @kumaraditya303 and @colesbury about it. EDIT: Turns out it's a no go for mimalloc as there is some CPython dedicated stuff, so we can put it out of the list (see #129222 (comment)).
For the hacl includes, I can take care of it.