#wheelnext
Hello all! See the above GH link for most of the details we have written down atm. A bunch of folks are collaborating on evolution of the wheel spec, variant support, and other stuff.
This is very much a community-driven fully open initiative to evolve the wheel spec, produce PEPs, reference implementations, etc.
Something off the top of my head: top_level.txt should be standardized. importlib.metadata.packages_distributions() relies on it. See also: https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494
Is there a better place to bring that up for wheel-next?
I had another instance where I wished I could map a project name to the modules and packages it provides. I also suspect people come to PyPI looking for a project by its import name and aren't always successful (look at sklearn · PyPI as a redirect example for this), so having a reverse lookup at pypi.org might be useful. Would...
Reading through the thread as context, I think this is a great idea. I think I agree with Thomas, Ofek, and others that re-using Provides seems the most appealing.
Oh and as for where it should go, we should probably make an ideas repo under the wheel-next Github org a la faster CPython https://github.com/faster-cpython/ideas
Above?
For anyone else like me who so rarely looks at Discord channel headers that I barely even recognised the screenshot that mirrored the current contents of my Discord window: https://github.com/wheel-next
Ah it's not above on mobile
Ah, looks like on iOS at least, you have to first click on the > next to #wheel-next
Sorry for the delay, but I've finally had a minute to set up https://github.com/wheel-next/ideas, so please feel free to open an issue with ideas for wheel enhancements or changes otherwise!
I wish that foo-version.dist-info folders could be shuffled to be named dist-info/foo before the next standard
sounds kinda reasonable
If that happens, it will mean that old installers will not be able to locate the WHEEL file and thus determine the version.
Yeah, I think for that reason the format of "dist-info folder in a zip file with metadata" was one of the invariants I've listed to keep.
Depends on the compatibility story, though. If it's "emit wheel 1.x wheels if they work for you, otherwise emit wheel 2.x wheels and accept some clients won't be able to install them", then it may not matter too much how the 2.x wheels fail on older clients, so long as they do fail.
Current resolvers will still pick a 2.0 wheel even though they can't install it, so that is something that needs to be handled before any 2.0 wheels start existing on pypi.org. Otherwise people will see CI breakages and downstream get error messages saying "this wheel is unsupported" with no actionable information. I don't want a Python 2->3 for wheels, and I expect most users do not want a breaking change in wheels with no clear migration path.
Is that just an issue with current resolvers not knowing how to handle 2.0 wheels? Does this need more metadata exposed by index, or is this something that can just be solved by contributing what's missing to various resolvers?
don't know about other options, but installer is hardcoded to fail installation if wheel version is not 1.x. And IIRC indices don't expose wheel info other than file itself and file hash, sometimes METADATA
All installers MUST fail on installation of a wheel of a major version they do not support per the wheel 1.0 spec. Currently @short epoch is right that the version is not exposed at all unless the resolver can get the WHEEL file either through a range request or downloading the wheel. But no resolvers as far as I am aware check for wheel compatibility when resolving wheels.
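A minimal sketch of the check described above, assuming an installer that implements Wheel-Version 1.0: locate {distribution}-{version}.dist-info/WHEEL from the wheel filename, read Wheel-Version, fail on a newer major version and warn on a newer minor version. Note that it needs the WHEEL file itself, which is exactly why a resolver only learns this after a range request or a full download. The helper names are hypothetical.

```python
import io
import warnings
import zipfile
from email.parser import HeaderParser

SUPPORTED = (1, 0)  # Wheel-Version this hypothetical installer implements


def dist_info_dir(wheel_filename: str) -> str:
    # {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    distribution, version = wheel_filename.split("-")[:2]
    return f"{distribution}-{version}.dist-info"


def wheel_version(wheel_filename: str, data: bytes) -> tuple[int, int]:
    """Read Wheel-Version from {distribution}-{version}.dist-info/WHEEL."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        text = zf.read(f"{dist_info_dir(wheel_filename)}/WHEEL").decode("utf-8")
    # WHEEL uses the same "Key: value" header format as METADATA.
    major, minor = HeaderParser().parsestr(text)["Wheel-Version"].split(".")[:2]
    return int(major), int(minor)


def is_installable(wheel_filename: str, data: bytes) -> bool:
    major, minor = wheel_version(wheel_filename, data)
    if major > SUPPORTED[0]:
        return False  # newer major version: installation must fail
    if (major, minor) > SUPPORTED:
        warnings.warn(f"Wheel-Version {major}.{minor} is newer than {SUPPORTED}")
    return True
```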
In case of a compression algorithm change, it wouldn't be able to read the version at all since I think it's only included in the archive, right? Filename doesn't expose it.
true
Yeah, this is why I think it is important to maintain the outer "zip file of metadata". the actual package data (which will take up most of the size anyway) could be something completely different
Is there anything stopping new filenames for future-proof wheels?
This could be fixed by adding a mandatory assumption that if you encounter a compression algorithm you don't support, you must treat the wheel as unsupported.
no
I mean old installers won't recognize it, so there are issues with doing so
so, do we just need installers to actually fail predictably for things outside of what they support?
because that seems like something they already should have been doing
It's slightly preferable that at least the resolvers keep trying until they find an installable one (and warn that the newer one was not installable)
But if they haven't already been doing that, then people would need to update their resolver anyhow, right?
I think defining semantics around what to do with unsupported files is really hard and starts to restrict what you can do in the future. If someone does pip install somerandomfile.extension. I don't want pip assuming that .extension files are potentially some future wheel variant, but at the same time, I don't want to have older installers break if I change the extension. Somewhat of a catch-22
*break in confusing ways
I've written change-resilient binary protocols before; the easiest thing to do here is: if anything isn't as expected, you assume you aren't capable of handling it.
so there are currently multiple layers of that: if you get something that doesn't decompress in an expected way, it's not a wheel you support.
If you get a wheel file with unexpected content, you don't support it
If you get to the point where you can parse a wheel file, and it's a version you don't support, you don't support it
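The layered "if it isn't as expected, you don't support it" rule above can be sketched as a single classifier. This is an illustration under assumptions, not any installer's actual logic: the supported compression set, the single-WHEEL check and the returned strings are all hypothetical choices.

```python
import io
import zipfile
from email.parser import HeaderParser

# Compression methods this hypothetical installer can decode.
SUPPORTED_COMPRESSION = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
SUPPORTED_MAJOR = 1


def classify(data: bytes) -> str:
    """Return "ok" or the reason the file is treated as an unsupported wheel."""
    # Layer 1: it must be a readable zip archive at all.
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "unsupported: not a zip archive"
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        # Layer 2: every entry must use a compression method we can decode.
        if any(info.compress_type not in SUPPORTED_COMPRESSION for info in infos):
            return "unsupported: unknown compression method"
        # Layer 3: the contents must look like a wheel (one top-level .dist-info/WHEEL).
        wheel_files = [
            info.filename
            for info in infos
            if info.filename.endswith(".dist-info/WHEEL") and info.filename.count("/") == 1
        ]
        if len(wheel_files) != 1:
            return "unsupported: no WHEEL file"
        headers = HeaderParser().parsestr(zf.read(wheel_files[0]).decode("utf-8"))
    # Layer 4: the declared Wheel-Version major must be one we support.
    version = headers.get("Wheel-Version", "")
    major = version.split(".")[0]
    if not major.isdigit() or int(major) > SUPPORTED_MAJOR:
        return "unsupported: Wheel-Version " + version
    return "ok"
```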
Oh absolutely, but there is a more fundamental issue of what to do if you don't know if a file is a wheel or not
That is an issue if you want wheels to fail reliably on future wheel versions if the extension changes
so I think as a practical matter we should not change the extension
anyway, I have a lot written up for a wheel 2.0 PEP that goes over issues like this, which I hope to publish in a few weeks
agreed on not changing the extension, but I do somewhat wonder if the choice to base the initial spec on a zip file is going to create any problems. It would have been easier here if the format had its own semantics and a version field at a fixed offset from the start of the file, even if the inner content was "just an embedded zip file" at first.
As far as I'm aware, nothing forbids ensuring the first few entries in a file are the metadata; then one wouldn't even need the index to do an opportunistic fetch of the metadata
Personally I found custom nested formats more painful, whl is super easy to create and debug with preexisting tools
It's literally just a zip of what would be installed
How about we define the list of compression formats to try in order?
Please don't make it complicated, just keep the zip file
the zip file, and the metadata being inside it with compression, means that the metadata can't be reliably gotten in a forward compatible manner
it doesn't have to be complex to change that, but there's no way to benefit from standardizing on a different, better compression if a requirement is that resolvers give friendly error messages involving the required wheel version, other than forcing the index to provide that info.
I'd be fine with the error message quality being sacrificed here tbh, I'm fine with the error message just being "hey, we don't know how to support this file, the assumption is that it is a newer wheel version, but it could also be corrupted"
but with a zip file, there's no forward compatible option here. what if 2 years from now we're all using foo_compress because of some miracle savings over zstd? tools today don't even know about foo_compress, it's a mythical future. We can't tell them they have to try every compression method, that version will still fail to extract the info needed for the metadata version.
(for the record, I'm fine with keeping it a zip file, I'm just pointing out where the choice of a zip file means it can't reliably get the metadata in the future unless we force the index to handle it for supported wheel versions)
I think we'd probably want to do something similar to Debian packages, where in wheels the metadata is kept uncompressed in the zip file, and the actual data can be in a separate archive format (e.g. tar.zst)
This allows keeping the old metadata but adopting new compression formats
I vaguely remember that zip supports choosing the compression format per file
it does, but that still requires downloading a lot more to determine the version.
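To illustrate per-entry compression: each zip entry records its own method, so the metadata can stay deflated (readable by today's tools) while payload entries use something stronger. ZIP_LZMA stands in for a future method here because the standard library's zipfile can write it; the layout and helper names are hypothetical.

```python
import io
import zipfile


def build_mixed_wheel() -> bytes:
    """Hypothetical layout: metadata deflated, payload entries LZMA-compressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo-1.0.dist-info/WHEEL", "Wheel-Version: 1.0\n",
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("demo/__init__.py", "x = 1\n" * 1000,
                    compress_type=zipfile.ZIP_LZMA)
    return buf.getvalue()


def compression_by_entry(data: bytes) -> dict[str, int]:
    # ZipInfo.compress_type is stored per entry in the central directory.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: info.compress_type for info in zf.infolist()}
```

An installer that only handles stored and deflated entries could still read WHEEL from this archive, but it still has to fetch the central directory and that entry before it knows the version.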
i.e. inside the zip file, you'd have a .dist-info folder, then next to it package_data.tar.zst (straw man name, please don't bikeshed) for the package data.
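A minimal sketch of that two-layer layout with only the standard library: the .dist-info files sit in the outer zip as stored (uncompressed) entries, and the importable files live in a nested tarball. xz stands in for zstd because lzma ships with Python; package_data.tar.xz is the straw-man name with the suffix changed, and the helper names are hypothetical.

```python
import io
import tarfile
import zipfile

PAYLOAD_NAME = "package_data.tar.xz"  # straw-man name, xz standing in for zstd


def build_nested_wheel(files: dict[str, bytes], dist_info: str, wheel_text: str) -> bytes:
    # 1. Pack the importable files into an xz-compressed tarball.
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w:xz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    # 2. Store the metadata uncompressed next to the (already compressed) payload.
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(f"{dist_info}/WHEEL", wheel_text)
        zf.writestr(PAYLOAD_NAME, tar_buf.getvalue())
    return zip_buf.getvalue()


def read_wheel_metadata(data: bytes, dist_info: str) -> str:
    # Plain zipfile is enough; the payload is never decompressed.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(f"{dist_info}/WHEEL").decode("utf-8")


def read_payload(data: bytes) -> dict[str, bytes]:
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        payload = zf.read(PAYLOAD_NAME)
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:xz") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}
```

The trade-off is that zipimport can no longer import modules directly from such an archive, since they live inside the tarball.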
Ideally indices should be serving .metadata, and support http range requests as well
How about standardizing on the metadata being the first entries, and providing the range details in the index data?
Then fetching metadata would just be a partial download
If we hadn't started with "just make it a zip file", the wheel version could have been encoded in the file header, at a fixed offset, making even something like that less costly to get on demand/cached or at upload.
I'm not suggesting we change to that now, because anything we do here that changes it from being a zip file is much more breaking in terms of ability to handle it.
Wouldn't it be better to just serve metadata per PEP 658?
ah yeah, agreed. But also making it a zip file makes it very easy to work with, and that is something I don't want to seriously sacrifice
yeah, ship sailed on that. I do think it could have been done in a way that remained easy to work with if we all agreed that the data section of the file needed to remain easy to work with, and that the standard library would have a way to invoke python -m some_mod some.whl to get at that easy to work with inner file
especially cause that could have just been "header with metadata, append zip file"
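The "header with metadata, append zip file" idea can be sketched because zip is rooted at the end of the file: Python's zipfile locates the end-of-central-directory record and adjusts for any bytes prepended before the archive (the same property that lets zipapps carry a shebang line). The magic bytes and header layout below are purely hypothetical; nobody in the thread is proposing this format now.

```python
import io
import struct
import zipfile

MAGIC = b"WHL\x00"                 # hypothetical magic bytes, not part of any spec
HEADER = struct.Struct("<4sHH")    # magic, major, minor at fixed offset 0 (8 bytes)


def build(major: int, minor: int, files: dict[str, str]) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    # Prepend the fixed-size header; the zip part stays readable.
    return HEADER.pack(MAGIC, major, minor) + buf.getvalue()


def read_version(prefix: bytes) -> tuple[int, int]:
    # Only the first HEADER.size bytes are needed, e.g. from a small range request.
    magic, major, minor = HEADER.unpack_from(prefix)
    if magic != MAGIC:
        raise ValueError("no version header")
    return major, minor
```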
sure
so I guess the questions of how to get to wheel 2.0 are
how should resolvers fail wheels the installer can't handle?
do resolvers need any metadata that isn't already in pep658?
which resolvers or installers in use need changes?
I imagine your draft covers these, but if you have an answer for the third one of those, I may be able to put some of my time into helping address that.
Yep! These are exactly along the lines I've been thinking. The third question seems like it shouldn't be covered in the PEP to me (perhaps discussed in the compatibility section). But I expect the answer to be "all of them"
yeah, I think the 3rd is more a consequence of the first two + the current state. probably something that needs addressing for the most popular resolvers before pypi can accept 2.0 wheels even if the pep doesn't end up mandating that.
oh, interesting - https://packaging.python.org/en/latest/specifications/binary-distribution-format/#recommended-archiver-features recommends putting the metadata at the end
i suppose the idea being to grab the last "chunk" of an archive to get both the zip index and the metadata
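A sketch of what a client can do with that last chunk (for example, the bytes returned by an HTTP suffix range request such as Range: bytes=-N): find the end-of-central-directory record, then walk the central directory to list the members. Assumptions, labeled as such: no zip64, no archive comment containing the EOCD signature, and ASCII/UTF-8 member names.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4s4H2LH")       # end of central directory record (22 bytes)
CDH = struct.Struct("<4s4B4HL2L5H2L")  # central directory file header (46 bytes)


def names_from_tail(tail: bytes, file_size: int) -> list[str]:
    """List archive members using only the last len(tail) bytes of a zip file."""
    eocd_pos = tail.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("EOCD not found; request a larger tail")
    _, _, _, _, total_entries, _cd_size, cd_offset, _ = EOCD.unpack_from(tail, eocd_pos)
    tail_start = file_size - len(tail)
    pos = cd_offset - tail_start
    if pos < 0:
        raise ValueError(f"central directory starts before the tail; fetch from byte {cd_offset}")
    names = []
    for _ in range(total_entries):
        fields = CDH.unpack_from(tail, pos)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("bad central directory entry")
        name_len, extra_len, comment_len = fields[12], fields[13], fields[14]
        start = pos + CDH.size
        names.append(tail[start:start + name_len].decode("utf-8"))
        pos = start + name_len + extra_len + comment_len
    return names
```

Each central directory entry also carries the entry's local header offset (the last field), which is what a client would use for a second range request to fetch the metadata bytes themselves.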
yep, I expect we will need implementations before anything gets accepted, and pip and warehouse seem to be the most obvious choice
On the file naming front: if the wheel major format goes in the extension (i.e. *.whl, *.whl2, *.whl3, etc), then older clients won't even see the wheel versions they don't support (since they won't be looking for that extension). With build tools emitting the oldest wheel version that provides all the features the project they're building requires, this frees up major iterations of the spec that add significant chunks of functionality to also clean up cosmetic details when doing so is deemed worthwhile.
The difference between this and the py2 -> py3 transition is that installers will be able to happily support both whl and whl2 installs. Projects would also be able to choose whether to fall back directly to sdist from the whl2 format, or whether it makes sense to offer a whl when whl2 isn't supported (for example, if whl2 has better dynamic linking support for extension modules, a statically linked whl may be a valid fallback option).
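A sketch of why a new extension keeps older clients out of trouble: a client only ever considers extensions it knows, so a hypothetical .whl2 file is simply invisible to it, while a newer client can prefer the newer format and still fall back to .whl. The extension tables are illustrative assumptions, not a proposal from the thread.

```python
from pathlib import PurePosixPath

# Hypothetical extension -> wheel major version tables; ".whl2" is illustrative only.
OLD_CLIENT = {".whl": 1}
NEW_CLIENT = {".whl": 1, ".whl2": 2}


def wheel_candidates(filenames: list[str], known: dict[str, int]) -> list[str]:
    """Keep only wheel files whose extension this client understands,
    newest major format first. Unknown extensions are never considered."""
    candidates = [f for f in filenames if PurePosixPath(f).suffix in known]
    return sorted(candidates, key=lambda f: known[PurePosixPath(f).suffix], reverse=True)
```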
If it's whl whel wheel wheeel etc I'm on board
Do note that dropping the zip file would also mean you can't import directly from a wheel anymore (and that's ignoring importlib.metadata no longer working if the metadata is compressed separately), so any changes here also trickle down into the stdlib
@toxic wraith I was wondering over the weekend whether version pinning could partially solve the "you started releasing 2.0 wheels but I don't understand them" problem? Meaning, I would expect a package to start releasing 2.0 wheels after something like a major version bump (yeah, I know, can't count on proper semvering), but in any case, a project can't release both 1.0 wheels and 2.0 wheels at the same time. So if you have a dependency that uses 2.0 wheels, and your install starts to fail, you can just ceiling the package version, and then that old installer will never try to grab the 2.0 wheel. Not super convenient but workable, I think!
I don't think 2.0 wheels should require a major version bump in semver of the package, that would probably hurt adoption (people don't like to do major version bumps) and seems leaky abstraction-wise.
Pinning does solve the issue of incompatible wheels, but only after running into the problem. A lot of my design thinking about this is how to make the UX of the transition as painless as possible. Part of that is to avoid a 2.0 wheel failing to install because of an old installer, pretty much ever. I do think that @hard escarp may be right that the correct path is to use an entirely different file extension so there is no chance of conflicts. I need to think about the implications of that. Another important constraint is I don't want to have to have a new file extension for each new feature, that would get very confusing quickly.
It's possible that's the way out, but agreed you don't want a new extension for every new feature. I'm also concerned about what happens for wheel 3.0 (if that is ever needed). Do we just keep bumping the extension?
It's true that you can only pin after the failure, but it's a failure of your installer's compatibility so the other solution is just to update your installer, but that does leave a window between the time 2.0 wheels are uploaded and your installer of choice gains support for the new format.
Yep, I think the most important part of defining the transition process between wheel 1.0 and 2.0 is the gap where most users' installers don't support 2.0
Let's add support to the installers before the package managers then!
The way I've been thinking about the major version bump: if a project can already ship wheels, then they would probably prefer a minor version bump where an installer ignoring a new wheel feature doesn't break the install. That suggests the primary intended audience for a new major wheel format version would be projects that only ship sdists because wheels don't work for them.
Otherwise there would need to be some huge improvement to counterbalance the "lower levels of installer support" downside that would exist for some time, and I must admit I struggle to picture what could be that beneficial.
Another possibility that occurred to me is that if the major benefit was an opt-in mechanically reversible trade-off (e.g. internal xz compression at the expense of giving up direct import support), and parallel publication was supported, tools could emit and upload both formats during the transition period before eventually dropping the legacy 1.x wheel.
There are a few conflicting considerations I see:
- if wheel 2.0 can be published in parallel to 1.0 wheels, projects suddenly take up twice the space on pypi, which is already an issue
- if we don't allow this, either
a) 2.0 wheels are selected by old installers and raise an error which likely will be unactionable (older installers don't know what to recommend if they encounter a new wheel, except to try to update which may or may not help) or
b) they aren't selected and users don't see updates if their installer is older. It would also be unfortunate if this happened silently
I think not selecting them but emitting a warning that such a choice happened is the most reasonable choice to make. I also think whether or not to allow side-by-side wheels is a question of whether or not the pypi admins think it is sustainable
Variation on b) for projects that publish sdists: --only-binary and --prefer-binary on older clients don't see the updates, source-allowed installs start building from source instead of using the no-longer-recognised wheels.
As far as the allowing-side-by-side-distribution question goes, the one carrot I could see making that worthwhile for PyPI in the long run is if the eventual promise is wheels getting much smaller on average (which implies the two-layout solution: the importable layout needed for bootstrapping use cases like ensurepip, and a more aggressively compressed one which allows for everything other than the wheel metadata to be shipped inside a nested xz compressed tarball).
True! And that's not a great experience either :/
The other way this could play out is that 2.0 wheels are outright banned from pypi.org for a period of time so that installers get updated and adopted until a large enough proportion of users support the new format. Private/non-pypi indices could be allowed to use wheel 2.0 to start to test out implementations
Hah, I'd genuinely forgotten about that option. I think it came up way back when Daniel Holth first raised the prospect of wheel 2.0.
#installer is ready to implement the moment the standard is defined
hmm, the main index for packages not allowing new standard? I can already hear the screams and see the pitchforks and torches
We've done it before. Linux wheels were defined for private use long before they were accepted on PyPI (and for comparable reasons: in private use cases, you could make sure they were only installed in the environments they were designed for, even though the pre-manylinux Linux wheel tags were hopelessly ambiguous about the system ABI they needed)
It's certainly not a great option, but it's an option worth considering.
I'd rather get that than the pitchforks from breaking nearly everyone's CI pipelines
If installers communicated to PyPI the version of wheels they supported, PyPI could potentially adjust the Simple API view they present to such installers. It could possibly leverage some of the multiview approaches being proposed by the variants work.
Other than that, what installer support would be considered "enough"? Is it enough that say the top 3 installers in use support the new standard, or that the top X% of installers talking to PyPI support it?
I would probably view it as proportion of all downloads over a time period use installers that support it, or something like that
A multiview would still double the size of projects pypi.org hosts, and I'm not sure if that's a reasonable ask.
On the other hand, while it would double the storage, it would show immediate decreases in bandwidth comparatively if 2.0 wheels could use xz or zstd
Yeah. It might be a wash. The storage doubling problem might only effectively be a problem for packages that are already bumping up against project size limits (i.e. doubtful pure-python packages would have much of an impact).
In any case, I think installers generally should start communicating wheel version compatibility to PyPI now, just in case it's super handy later (if they don't already).
someone somewhere suggested tying the transition phase to python version. this has some nice properties... you expect to have to update things for new python anyway, no one's existing setup will be broken, you don't double storage, most of the projects that would benefit from a wheel2 will probably be uploading new wheels for new python anyway. especially if there are stdlib changes
Do you mean that wheel 2.0 wheels would only be installable on newer Python versions? I don't particularly like that, as I think it would lengthen the adoption window, and be confusing for users
Some people will want to adopt the new wheel format right away, and by tying it to a Python release it would mean they'd have to wait however long until the next release is
as a starting place, i meant that build tools default to wheel1 for wheels that target old python and wheel2 for wheels that only target new python versions
but i think e.g. if we use zstd and include zstd in stdlib, wheel2 being only installable on newer py versions might well make sense, otherwise to my naive mind bootstrapping pure python installers might be slow or fiddly
i also think it's good for the adoption window to be legible. most folks know what pythons they intend to support, but i don't think many folks have a sense of what installers their users use
So part of my goals with wheel 2.0 is to not need everything changed all at once. I'd like zstd support to be a follow up PEP that won't require a new major version
It me. I've been advocating for that.
Yeah, I have been mulling that over more. I want to think through it a bit more and weigh the pros and cons.
I have an email thread somewhere with @glossy cradle and @cedar iris about exactly this... that I meant to respond to... but then it got awkwardly delayed... and now it'd be weird to respond there.
Just spotted this thread (thanks for the ping @icy trench!) I find tying this to Python version very weird, and I've not been able to think through how it would work cleanly. I feel like it might create more problems than it solves.
Correct, that's the intended use. And it's what pip does when the index doesn't support separate metadata downloads.
I do wish we'd put the metadata at the front of the file though.
It's a saner request to be like "gimme the first 1000 bytes" or so, rather than "the last 1000 bytes", but I also don't know how the zip format works.
The wheel spec currently recommends putting .dist-info at the back, since zip is designed to make mutation at the end easier
(basically you can add a new file/directory record and re-write the listing and be done)
mutating the metadata of a wheel seems like a rarish thing to do, so I feel uneasy making this recommendation going forward, but perhaps someone has more context on why it is this way
Also @icy trench the way zip works, you actually cannot just read the first 1000 bytes, because technically (though very unlikely) a zip file could list an entry as deleted, so unless you check the central directory record, you just don't know if what you're reading is right or not.
zip is a very odd file format
That's good to know!
The fact that the zip format is rooted at the end rather than the beginning is what makes it possible to append an executable at the start (a zipapp script wrapper, or a self-extractor, or whatever). The format is weird, but in a useful way.
Oh absolutely!
Is using JSON for metadata on the table? (Spent quite a while fighting EmailMessage in pyproject-metadata).
I think it is unlikely because I would like the metadata format to be backwards compatible
There is a lot of tooling that assumes EmailMessage and I don't really want to boil the ocean on that
The current metadata format is soo bad, though. Things like multilines and unicode are a pain to have to hack around for each tool. You'll notice most backends can't even use EmailMessage to write it, only to read it.
Also, is negative extras also being discussed? There was a recent issue on packaging-problems, but don't think it got into the wheel-next repo.
Currently wheel-next is, as far as I am aware, focused on the wheel format and the metadata changes needed to better support various improvements to it. I think negative extras could be its own standalone PEP regardless of the wheel format, right?
If you want to follow the EmailMessage metadata fight:
- https://github.com/pypa/pyproject-metadata/pull/142
- https://github.com/pypa/pyproject-metadata/pull/150
- https://github.com/pypa/pyproject-metadata/pull/152
pyproject-metadata now uses an EmailMessage subclass with a custom as_bytes and a custom policy, but at least it does use it. But it would be nice if that weren't needed in the future, and we could just write and read JSON.
I think negative extras require new or changed metadata, which seems like it might make sense for wheel-next?
I mean changes to the metadata are somewhat orthogonal to the wheel format aren't they? They also touch sdists
(sorry early send)
The wheel spec only says "{distribution}-{version}.dist-info/METADATA is Metadata version 1.1 or greater format metadata."
Ah, true.
I could see an argument that a new file format makes it easier to make the transition to a new metadata format, but I also don't really want to sign up to write another PEP at this time; I already have 4 or so stewing in various states.
By the way, one of the perfect use cases for .dist-info at the back would be modifying the tags, but that was too tricky for wheel tags, which just copies it: https://github.com/pypa/wheel/blob/7bb46d7727e6e89fe56b3c78297b3af2672bbbe2/src/wheel/cli/tags.py#L122-L124
You could probably have a multiyear transition period from RFC 2822-ish format to JSON where both were included in the wheel. And/or automatic-ish conversion between the formats. But yeah, that's also not a PEP I'm going to write!
Metadata 2.1 defined a canonical translation from the key:value header format to JSON (after metadata 2.0's failed attempt at going JSON-only): https://peps.python.org/pep-0566/#json-compatible-metadata
So a new wheel PEP could just reference back to that and use JSON internally for the metadata files. However, any new metadata.json file would need to be shipped alongside the existing METADATA key:value file, otherwise you would lose the "just unpack the .dist-info folder" compatibility with the installed package database spec.
Some PEP is going to have to be the first that says "Ship both files, make sure they say the same thing" if we're ever going to be able to ditch the legacy key:value format.
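A minimal sketch of the metadata 2.1 JSON-compatible translation (keys lowercased with hyphens replaced by underscores, multiple-use fields collected into lists, Keywords turned into a list, the message body stored as description), plus the kind of "make sure they say the same thing" check being described. Assumptions: the multiple-use list is partial, keywords are split on commas, and the metadata.json name and the check function are hypothetical.

```python
import json
from email.parser import HeaderParser

# Partial list of multiple-use core metadata fields, as JSON keys (not exhaustive).
MULTIPLE_USE = {
    "dynamic", "platform", "supported_platform", "classifier", "requires_dist",
    "requires_external", "project_url", "provides_extra", "provides_dist",
    "obsoletes_dist", "license_file",
}


def metadata_to_json(text: str) -> dict:
    """Convert key: value METADATA into its JSON-compatible form."""
    msg = HeaderParser().parsestr(text)
    result = {}
    for key in dict.fromkeys(msg.keys()):  # unique header names, in order
        json_key = key.lower().replace("-", "_")
        values = msg.get_all(key)
        if json_key in MULTIPLE_USE:
            result[json_key] = values
        elif json_key == "keywords":
            # Assumption: keywords are comma-separated.
            result[json_key] = [k.strip() for k in values[0].split(",") if k.strip()]
        else:
            result[json_key] = values[0]
    body = msg.get_payload()
    if body:
        result["description"] = body  # the message body becomes "description"
    return result


def json_matches_metadata(metadata_text: str, json_text: str) -> bool:
    """Hypothetical check that a metadata.json shipped alongside METADATA agrees with it."""
    return json.loads(json_text) == metadata_to_json(metadata_text)
```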
The wheel 2.0 PEP could also recommend that installers generate metadata.json from METADATA for the wheel 1.x format.
Then some future wheel 3.0 PEP could ditch METADATA entirely.
Hmm, so would it make sense for #pyproject-metadata to produce both formats? (that is, at least provide the tools to do so easily?)
I see there doesn't seem to be a standard name for it, just a way to convert it
any reason to ditch METADATA in favour of json? not sure what the gains would be
@short epoch #wheelnext message
I think the biggest advantage is having explicitly terminated strings.
Plus it would be better for cross language tooling. If a build tool is in another language, it may not have an email parser, but it almost certainly will have a json parser.
And the email parser module in Python is a mess. Riddled with TODOs, strange inheritance, etc. And the parts we use don't officially support unicode, that has to be hacked in.
The format is basically "RFC 822, but with unicode and indented multiline strings". When you modify a standard, it's no longer a standard.
Adding JSON support in https://github.com/pypa/pyproject-metadata/pull/168. Not that anyone can do anything with it until a file name is specified, though.
This is really interesting, thanks for the link. For the PEP I am working on (semi-unpublished but coming soon), I'm defining a short key:value RFC 2822 file for the new data, but I should probably point to this and define it to be JSON only.
Welcome to the world of email. For one of the oldest internet standards, and at-first-glance-simplicity of it, there are so many bolted on additions and variations that it's actually really hard to parse those formats correctly.
Just take a break real quick and go create email-next
Once we've got that wrapped up we won't have any more issues here!
oh that should take no time at all, I have the IETF on speed dial!
You already have the JSON metadata spec as a base. Just need to extend it to email messages.
Not it! I've already paid my email karmic debt back many times over!
Yeah, for a new file, definitely go with JSON (installation from a direct URL reference is already recorded that way, for example)
Getting out of the hole will be painful, but we can avoid digging it any deeper.
They can return it from HTTP APIs! Standardising a JSON metadata interface for PyPI was the immediate motivation for that bit of metadata 2.1, rather than anything involving the file formats.
project URLs should be a dict. I think that's the main thing I dislike
I spent most of today re-writing a draft of a wheel evolution PEP, based on the idea of using a different file extension for wheel 2.0 (and onwards). Now that I am done and went through the rejected ideas I think I am starting to be of the opinion it would be better to use *.whl.
My main qualm is that unexpectedly getting an sdist is probably quite bad
True, we added a special case for the comma-separated list in Keywords, but missed the comma-separated key,value pair in Project-URL. Maybe we could add an optional parallel project-urls field to the JSON, and then eventually deprecate the oddly formatted project-url list? A similar trick could be used to migrate from maintainer-email and author-email JSON strings to maintainer-emails and author-emails JSON lists (those are also comma-separated lists, but they didn't get special cased the way keywords did).
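A sketch of deriving the suggested parallel JSON fields from the existing oddly formatted values: Project-URL entries are "label, URL" strings, and the email fields are comma-separated address lists, which email.utils.getaddresses can split. The output shapes (a label-to-URL dict, a list of name/email objects) are assumptions for illustration, not anything standardized.

```python
from email.utils import getaddresses


def project_urls_to_dict(values: list[str]) -> dict[str, str]:
    """Hypothetical "project_urls" JSON field built from Project-URL values."""
    urls = {}
    for value in values:
        label, _, url = value.partition(",")  # split on the first comma only
        urls[label.strip()] = url.strip()
    return urls


def emails_to_list(value: str) -> list[dict[str, str]]:
    """Hypothetical "maintainer_emails"/"author_emails" list from a comma-separated header."""
    return [{"name": name, "email": addr} for name, addr in getaddresses([value])]
```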
I like the general move to JSON for these kinds of largely inter-machine formats. TOML is great for humans, JSON terrible for humans, but JSON is fine for machine-to-machine.
my thoughts exactly
Yeah, it's a messy file format based on my experience w/ packaging.metadata. I think we could start on a transition now if we make the JSON file optional, but require it must match what's in METADATA. And then we slowly raise the importance of the JSON format until we flip it and METADATA is optional and then eventually drop it.
(while we're at it, we could consider moving all* meta files in wheels to JSON)
i've run into multiple issues with METADATA parsing, so something saner would be a nice to have. off the top of my head https://github.com/pypa/setuptools/issues/3808 but there have been others
there are some confidence-inspiring words in the spec:
However, email formats have been revised several times, and exactly which email RFC applies to packaging metadata is not specified. In the absence of a precise definition, the practical standard is set by what the standard library email.parser module can parse using the compat32 policy.
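Two small illustrations of what "whatever email.parser can parse with compat32" means in practice, as a sketch rather than a description of any particular tool: indented continuation lines come back verbatim (newline and indentation included), so every consumer picks its own de-indenting rule, and a truly empty line inside a multiline value ends the header block, so the rest of the value silently becomes the body. The sample METADATA strings are made up.

```python
from email.parser import Parser
from email.policy import compat32


def parse_metadata(text: str):
    # compat32 is the policy the spec names as the practical standard.
    return Parser(policy=compat32).parsestr(text)


# Continuation lines are returned as-is; de-indenting is left to each tool.
INDENTED = "Metadata-Version: 2.1\nName: demo\nLicense: Line one\n        Line two\n"

# An empty line ends the headers: "Line three" ends up in the body, not in License.
BLANK = "Metadata-Version: 2.1\nName: demo\nLicense: Line one\n\nLine three\n"

if __name__ == "__main__":
    print(repr(parse_metadata(INDENTED)["License"]))
    msg = parse_metadata(BLANK)
    print(repr(msg["License"]), repr(msg.get_payload()))
```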
Added a test for a long version in https://github.com/pypa/pyproject-metadata/pull/174
from trying to figure out the correct format of METADATA files, i'd strongly favor switching from the email-ish format to json in a next wheel iteration
RECORD is a well-defined CSV format, and INSTALLER is a single line of text, so they wouldn't gain much from a migration to JSON.
Similarly, while entry_points.txt and EXTERNALLY-MANAGED aren't as well defined as a JSON document would be, ini-files aren't as dire a mess as email header parsing.
That leaves METADATA as really the only file where the status quo is so bad that a migration effort might actually be worth it. (We're already trying to contain the file format problem to these 5 files by making newly standardised files JSON)
from my perspective, we'd ideally have an archive of a single json file with all metadata and a directory with the module that is being installed.
both from the perspective of a wheel writer and a wheel installer, we only need the information in METADATA plus maybe a wheel version
I don't think we'd want a single json file; the different files serve different purposes, and having them split out is useful, I think
INSTALLER for instance doesn't come from the wheel, it comes from the installer, so a single json would mean that the installer has to mutate the file that comes from the archive
I agree about RECORD, but entry points we could consider, I think
files that are added after unpacking are a separate story, i only meant this for the wheel itself
Technically, RECORD can't properly describe a file whose path contains a mix of commas and " characters, I think...
But if you have a file like that, a lot of things are going to break
I'm pretty sure that's possible. You just put the entire thing in quotes and double every inner quote.
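A quick check of that with Python's csv module: the default dialect quotes a field that contains a comma or a quote character and doubles the inner quotes, and reading the line back recovers the original path. The path and the digest below are placeholders, not real RECORD data, and this only shows what the stdlib csv module does, not what every installer's RECORD reader does:

```python
import csv
import io


def record_row(path: str, digest: str, size: int) -> str:
    """Serialize one RECORD-style line with the csv module's default dialect."""
    buf = io.StringIO()
    # QUOTE_MINIMAL (the default) wraps a field in quotes when it contains the
    # delimiter, the quote character, or a line break, and doubles inner quotes.
    csv.writer(buf, lineterminator="\n").writerow([path, digest, size])
    return buf.getvalue()


# Hypothetical path containing both a comma and a double quote.
path = 'pkg/weird,"name".txt'
line = record_row(path, "sha256=placeholder", 123)
print(line, end="")  # "pkg/weird,""name"".txt",sha256=placeholder,123

# Reading it back recovers the original fields (the size comes back as a string).
fields = next(csv.reader(io.StringIO(line)))
print(fields)
```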
Oh interesting! Maybe I don't understand the CSV parser implementation well enough, I didn't realize it supported quote character escaping.
The multiple files are also there so checks for optional metadata can be done with a single stat call.
That said, entry points are potentially a decent candidate for merging into the main metadata file.
Agreed, that may be something I add to the wheel 2 format PEP
PEP 759, External Wheel Hosting: https://discuss.python.org/t/pep-759-external-wheel-hosting/66458
This PEP proposes what I think is a unique approach to safely hosting external wheels, while keeping PyPI as the source of truth for project and release metadata. The PEP proposes a new uploadable file format, easily derived from .whl files, which includes metadata to point to an external wheel hosting service, tied to organizational accounts. ...
Another reason to dislike the email format: https://github.com/python/cpython/issues/119650 - I just found I can't use importlib.metadata.metadata("iminuit") because it has a custom license text, and the license won't parse correctly.
You'll be happy to know that I hope to publish the wheel evolution planning PEP later this week
it seemed that each python tool i looked at had their slightly own rules for multiline values in METADATA, feel free to ping me if uv publish is clashing with parsing in Python somewhere
Do you have some examples? I could modify the format from pyproject-metadata to make it more compliant if there's a tool that produces better metadata. (PS: are there plans for uv publish to support attestations?)
there seem to be different indent lengths and i haven't found a reference for which one is correct; i saw the two versions in https://github.com/pypa/pyproject-metadata/pull/150/files#diff-7d938dbc255a08c2cfab1b4f1f8d1f6519c9312dd0a39d7793fa778474f1fbd1L135-R141 and another tool (hatchling or poetry i think?) had a third style; i went with pyproject-metadata out of pragmatism but i'd prefer something specified with common library support (e.g. a serde plugin); serde or toml would be the default choices here
i haven't looked into attestations at all yet - do you want to file a feature request with some context on motivations and the current state of pypi support?
that is PEP 740, right?
In the current 0.9.0 betas for pyproject-metadata, it indents to the width of the field name, based on setuptools. I think any indent is fine for the parser, and the parser keeps the indentation, so the consumer has to look for the common indentation and remove it. But I'm now wondering if there's some escaping mechanism that I'm missing (hatchling seems to miss it too, if so).
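To see what "the parser keeps the indentation" means in practice, here is a minimal sketch using the standard library's email.parser with the compat32 policy (the policy the core metadata spec names as the practical standard). The METADATA text and its 8-space continuation indent are made up for illustration, and `unfold` is a hypothetical helper, not how packaging.metadata or importlib.metadata handle the field:

```python
import email.parser
import email.policy
import textwrap

# Made-up METADATA with a folded, multi-line License field. The 8-space
# continuation indent is arbitrary; tools differ in the indent they write.
METADATA = (
    "Metadata-Version: 2.1\n"
    "Name: example\n"
    "Version: 1.0\n"
    "License: First line of the license\n"
    "        second line\n"
    "        third line\n"
    "\n"
    "Long description body.\n"
)


def parse_metadata(text: str):
    # compat32 keeps continuation lines verbatim, including their indent.
    return email.parser.Parser(policy=email.policy.compat32).parsestr(text)


def unfold(value: str) -> str:
    """Hypothetical helper: keep the first line as-is and strip the common
    indentation that the parser leaves on the continuation lines."""
    first, _, rest = value.partition("\n")
    return first if not rest else first + "\n" + textwrap.dedent(rest)


msg = parse_metadata(METADATA)
print(repr(msg["License"]))  # continuation lines still carry their indent
print(unfold(msg["License"]))
```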
You can look at how packaging.metadata tries to parse the license field since that would be what PyPI uses to validate
I wrote a pyproject-metadata -> packaging.metadata round-trip test this morning, actually. I was trying to get at the failure with License containing links but haven't reproduced it outside of importlib.metadata.metadata yet. I've got more things to try tonight; I have to teach soon.
Ahh, good point about PyPI - yes, that happily parses it and adds a ... after some point. So I guess I should expect that test to pass. Great, that means I need to use the actual importlib.metadata.metadata in testing.
PEP 777: How to Re-invent the Wheel
https://discuss.python.org/t/pep-777-how-to-re-invent-the-wheel/67484
While talking with people about a wheel 2.0 design, it became very clear that before we could talk about what a wheel 2.0 could look like, we needed to talk about how to get there (beyond just incrementing the wheel major version number!). This PEP defines a path to making wheel evolution easier, so that future PEPs can focus on the changes to ...
@toxic wraith just want to say thanks for putting up the proposal. I know the conversation is a bit derailed and frustrating, but I'm appreciative of the work you've put in.
Thank you! I really appreciate the kind words.
And a reminder you don't have to feel obligated to reply to everyone
True
AFAIK there's nothing preventing one from making a perfectly working zip file that also has a file header.
Zip files are weird: they have a footer instead. That means you can prepend whatever you want.
I'm not sure if you need to adapt offsets, but yeah.
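On the offsets question, a minimal sketch with Python's own zipfile module: it writes an ordinary zip, prepends a made-up header without rewriting any offsets, and reads a member back. This works because zipfile finds the end-of-central-directory record from the end of the data and compensates for bytes in front of the archive; other zip readers may behave differently, so this says nothing about compatibility in general. The header bytes are hypothetical; no such header exists in the wheel spec:

```python
import io
import zipfile

# Hypothetical magic header; nothing like this exists in the wheel spec today.
HEADER = b"EXAMPLE-WHEEL-HEADER 2.0\n"


def make_zip_with_header(header: bytes) -> bytes:
    """Write an ordinary zip in memory, then prepend arbitrary bytes to it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo-1.0.dist-info/WHEEL", "Wheel-Version: 1.0\n")
    # Offsets stored inside the zip are NOT rewritten after prepending.
    return header + buf.getvalue()


def read_member(data: bytes, name: str) -> str:
    # zipfile locates the central directory from the end of the data and
    # shifts the stored offsets by however many bytes precede the archive.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name).decode()


data = make_zip_with_header(HEADER)
print(data[: len(HEADER)])
print(read_member(data, "demo-1.0.dist-info/WHEEL"))
```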
This has been discussed in the PEP 777 thread. For compatibility purposes people don't like the idea of adding a header
Reading a single file from a zip is quite cheap: you usually only need two read calls (one for the directory at the end, one for the actual file). Downloading the entire zip file is not necessary if the web server supports Range requests. I wrote a wheel inspector once that would extract metadata from wheels on PyPI without fully downloading them. If there is version info embedded as a file-tag (an empty file with a special name), then that would be just one read call.
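The first of those reads can be sketched as follows: fetch the tail of the archive (for example with an HTTP suffix range such as `Range: bytes=-65557`, i.e. the 22-byte end record plus the maximum 65535-byte comment), then decode the end-of-central-directory record to learn where the central directory lives. This is a simplified sketch, not the inspector mentioned above: it skips Zip64 archives, and the in-memory archive stands in for a downloaded one:

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# End of central directory record: signature, 4 x uint16, 2 x uint32,
# uint16 comment length = 22 bytes, little-endian.
EOCD = struct.Struct("<4s4H2LH")


def locate_central_directory(tail: bytes) -> tuple[int, int, int]:
    """Return (cd_offset, cd_size, entry_count) from the last bytes of a zip.

    `tail` is assumed to be a suffix of the archive. Zip64 is not handled.
    """
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or len(tail) - pos < EOCD.size:
        raise ValueError("end of central directory record not found")
    (_sig, _disk, _cd_disk, _disk_entries, total_entries,
     cd_size, cd_offset, _comment_len) = EOCD.unpack_from(tail, pos)
    return cd_offset, cd_size, total_entries


# Usage: build a small archive in memory and treat its last bytes as the
# response to a suffix Range request.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo-1.0.dist-info/METADATA", "Name: demo\nVersion: 1.0\n")
data = buf.getvalue()
cd_offset, cd_size, entries = locate_central_directory(data[-65557:])
# A follow-up request (if the tail didn't already cover it) would fetch
# bytes cd_offset .. cd_offset + cd_size - 1.
print(cd_offset, cd_size, entries)
```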
Yeah, both pip and uv (and I'm pretty sure poetry too) use range requests if the metadata isn't available via PEP 658
yeah, Poetry does range requests
Is there a discussion thread for PEP 771? I was just going to ask if an implementation in pyproject-metadata would be useful, I could probably put that together pretty quickly.
not yet, the author wanted to add a few more things before opening the discussion:
https://github.com/python/peps/pull/4198#issuecomment-2607862272
There is a DPO thread here https://discuss.python.org/t/pre-publish-pep-711-default-extras-for-python-software-packages/77892
[BIG UPDATE] The PEP PR is live: PEP 771: Default Extras for Python Software Packages by astrofrog · Pull Request #4198 · python/peps · GitHub. We have been hard at work with @trobitaille. We published and opened the PR on GitHub; we invite you to review it and comment on it. As part of the Wheel-Next open-source initiative, there is a ...
@restive vessel - this is the Debian/Ubuntu spec for multiarch: https://wiki.ubuntu.com/MultiarchSpec
see specifically the "Binary package control fields" and "Extended semantics of per-architecture package relationships" sections. The problem space they have is a little more complicated than what wheelnext is currently pondering, in that they need to support simultaneously installing both (e.g.) i386 and x86-64 libraries. (It may be the case that this is worth pondering for Python, actually.) So there are two axes, whether a package is designed to not conflict with other architectures of itself (no overlapping files) and they can be co-installed, and whether a package can satisfy a dependency from another package of a different architecture. That second axis can be expressed as "this is always true" or "this is true if the depending package opts in to it being okay".
They only have one dimension of variant, though.
(and you might find https://wiki.debian.org/Multiarch/TheCaseForMultiarch interestingly familiar)
Very interesting !
hey, the link https://github.com/wheel-next returns 404. was it removed or is the access restricted?
it was renamed to wheelnext (all one word)
It looks like the channel topic needs to be updated to https://github.com/wheelnext
done!
Thank you!
@pseudo galleon here, here
Hello everyone, I just joined this channel by introduction from Donghee!
Since I'm delivering Python-based products to many air-gapped on-premise enterprise setups, the topics covered by the wheelnext community intrigued me!
Welcome! We'd love to hear more about your pain points and which current topics are of interest
there are many; i will summarize them later!
BTW, I'm currently in San Jose to attend NVIDIA GTC, and it is also possible for me to attend the summit this Friday morning, as announced on the website. It would be nice to learn how things are going and to share my interests, if that's OK.
Could somebody let me know how to attend the summit? (possibly with one of my colleagues)
@sturdy imp I'm contacting you in private.
I can't guarantee it but I'm gonna do my best
Long time listener, first time caller. Great sessions today, thanks to the organizers
@thick basalt it was a pleasure having you !
Please all make sure you join https://contribute.wheelnext.dev to stay updated
Howdy everyone! It's Eli from the wheelnext summit last week!
Hi Eli!
is there a plan to produce a report of the summit?
Hey everyone! It's Vyas from the summit. Nice to see everyone here.
Hi
Sorry, I forgot to check Discord @midnight marsh
Yes it's in process
I didn't realize there was a summit
was it a one time event, or is it a regular meeting?
Hello everyone! I just noticed this page https://wheelnext.dev/who_are_we/
I wasn't able to attend the summit unfortunately, but I'm quite interested in the effort nonetheless, so you can add Hatchling to that list; I plan to support whatever we do (not sure how that page gets updated)
@white jacinth that will be with pleasure. Can you confirm you received the meeting invite for next week ?
Super interesting to get your feedback on the work we are doing
Got it, I'll be there (virtually)!
Fantastic !
(on behalf of @restive vessel as their message got zapped by our filters)
@formal inlet @midnight marsh @thick basalt @sturdy imp @frail portal @short epoch @stone lodge @fathom lintel @frank hound @lost nest @sick nest @tardy crown @prime marsh @glossy cradle @hard escarp @nova knoll @icy trench @scenic pulsar @junior tundra @quasi robin @river linden @tawdry raven @glossy cradle
~ Sorry for the mass tagging - only once I promise ! ~
Did you all receive the monthly meeting invite by email ?
If not:
- make sure you're subscribed to: https://mail.python.org/mailman3/lists/wheelnext-announce.python.org/
- send me (@restive vessel) your email in private I'll forward you the meeting.
I did and I plan to attend!
thanks for the ping! subscribed now and got the invite from the archive
I can see it in the Archives. Now subscribed. Where's the date/time?
I just registered and subscribed to the mailman list!
I just downloaded the .ics and added it to my calendar
In the archive message, you could find the ics attachment file.
Ahh, I see it, thanks!
probably easier that way since we don't have to mess with timezones
I have a 50% collision with another meeting if it's monthly on this day and time (including the one next week) but I might be able to make it sometimes and for special cases (like first meeting)
@lucid pike you were the one asking for that day & time specifically
I have a biweekly Wednesday meeting 10:30-12, this is at 11:00 for me, FYI.
We could move it to your "offweek" like one week forward or backward if that helps
Like "first Wednesday of every month"
First Wednesday of every month would collide 50% of the time, approximately.
I haven't received an email (I'm not on the announce list) but I don't think I really followed what's going on with wheelnext (or contributed to the discussion) for there to be value in me attending. Mostly just posting to let you know I got the ping :)
Well, my US/Pacific mornings are terrible almost any way you slice it, but I'm trying to keep things to only double-booked. There's probably no good time for everyone. I guess you could run a Doodle poll or something to find the least bad time for the largest number of people.
It's a great place to get an update on what's going on. You don't necessarily need to participate; it's up to you.
~ Adding some quick reference for WheelNext ~
https://wheelnext.dev/ - The reference - Our work & Summaries of meetings/proposals
https://mail.python.org/mailman3/lists/wheelnext-announce.python.org/ - Subscribe to our updates.
https://github.com/wheelnext - Your contributing journey to WheelNext starts here
FYI work is no longer having me work on general packaging outside of my 20% time, so I don't have the bandwidth to participate deeply. I unsubscribed from the announce mailing list; anything I need to do to not get an invite in the future?
We should send out the meeting invites through the mailing list, so no more work should be needed on your end. It's a loss to the community to not have you working on packaging as much :(
agreed, but Brett's new work on AI tooling/MCP for VS Code sounds equally as awesome
Get me enough money to retire and then we can talk about me coming back (and I'm not exactly disappearing, I'm just not getting 50% of my work time for packaging; there are still a couple of PEPs coming)
For anyone wondering what Ofek is referring to because I haven't talked publicly about it yet, my work time has shifted to 70% tools for AI agents to use (i.e. like what all the new-fangled MCP servers do), 10% WASI, and 20% whatever open source I want (which will be motivated by the Python Launcher for Unix so that when my kid is old enough to use Python they don't come home from school one day and say, "Dad, why is Python code so hard to get running?")
I won't make the call this time: a mix of migraine and having the kids.
I am running a bit late
@restive vessel hi, I'm waiting to be let in
Is it this one? https://mail.python.org/archives/list/wheelnext-announce@python.org/thread/KOI5BN7J73V5I7TDBAT6JVIXIQQ773C7/
Meeting ID: 250 773 415 763 2
Passcode: jE7Gw3so
For anyone interested in continuing the shared library loading discussion from the meeting, please comment here! I also note that @formal inlet already has a #dynamic-library channel, so it may eventually make sense to coalesce there (but for now, starting here since this is where everyone coming from our meeting will see)
Is the meeting over? Shall I stop waiting to join?
they're doing Q&A I think
I had to drop off but I wanted to know how we'll handle the overlap between variants and compatibility tags. Are we planning to have build backends create variants for things like python versions or abi3, etc?
Oh! Maybe try joining again?
It is not this meeting
Absolutely! Sorry about the confusion over which meeting it is!
np! I read "The original event is the correct one" on the cancellation so tried the first email
I also got confused and joined the wrong meeting
@restive vessel β Is it useful for me to update our implementation in uv to match the latest changes?
Per my question on the call, I think it does make sense to use a variants.json per version
Otherwise, wouldn't we need to query every provider used in any version, in perpetuity?
The downside is that we need an additional registry query for every package-version that we inspect
FWIW, I don't think the dynamic-library implementation is suitable as a proper library loading mechanism
it works on the majority of systems, but it isn't inherently portable, so it's not something I'd consider adopting in a PEP
implementing it properly would require changes to the Python upstream, and even then, it's quite tricky
either that or modifying package binaries on installation
or sacrificing per-package dynamic dependencies, in favor of global dependencies
none of which is great
@wind cosmos as a highlevel summary of the workflow to integrate variants in an installer:
- Detect if variants.json exists on the index; if so, download it and parse it into a dict.
- Pass the dictionary to this function: https://github.com/wheelnext/variantlib/blob/main/variantlib/api.py#L42 so that "the variants get sorted for you"
(the optional arguments are here for you to control/overwrite any default behavior - for now just don't specify them)
In return you get an ordered list of hashes (str) and you can do whatever you want with it and install the wheel that you want (respecting or not the order that variantlib provided you).
Not every hash may be available on platform X (for example, there's no CUDA variant on macOS), so you might have to go a few entries down the hash priority list to find a compatible variant that actually exists on the index.
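A minimal sketch of that last step, assuming the priority-ordered list of hashes has already come back from the variantlib function linked above and that the installer knows which variant hashes exist on the index for this release. `first_available_variant` is a hypothetical name, and the hashes in the usage lines are made up:

```python
from typing import Iterable, Optional


def first_available_variant(
    ordered_hashes: Iterable[str], available_on_index: set[str]
) -> Optional[str]:
    """Walk the priority-ordered hashes and return the first one that has a
    wheel on the index, or None if no compatible variant was published."""
    for variant_hash in ordered_hashes:
        # e.g. a CUDA variant may rank high but simply not exist for macOS.
        if variant_hash in available_on_index:
            return variant_hash
    return None


print(first_available_variant(["aaaa1111", "bbbb2222", "cccc3333"], {"bbbb2222", "cccc3333"}))
```

Returning None leaves the fallback decision (for example, installing a non-variant wheel) to the caller.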
Makes sense, thanks
(Just confirming that we'll likely re-implement all the variantlib logic in Rust)
@wind cosmos until the design stabilizes, I recommend not doing that though. The ground will be moving under your feet for quite a while still.
We purposefully designed things so that it's easy for you to do that. There is no "forced inheritance" on variantlib even on the plugin side. However plugins might actually depend on it because it's practical
We need to figure out how the plugin system could work for you; we rely heavily on entry points. Can you read them from Rust?
Is the plugin system coupled to entrypoints?
Yes, that's how we do "auto detection"; we didn't find any other way
I thought the intent was to move closer to something like PEP 517
"Auto detection" of what? Installed providers?
Yes
Why is that necessary, though?
Aren't the variant providers declared in variants.json?
We might actually be able to relax this - but we still need auto detection to build variants
I think auto-detecting state from an existing environment is a design anti-pattern, personally
Can you explain why it's necessary for building?
$ variantlib make_variant \
--file xgboost-3.1.0-py3-none-win_amd64.whl \
--property "nvidia :: driver :: 12" \
--property "nvidia :: arch :: sm90" \
--property "x86_64 :: version :: 4" \
--output-directory .
Variant Created: xgboost-3.1.0-py3-none-win_amd64-a0e2749e.whl
We don't specify plugins (we could): variantlib looks on the system for the plugins that declare the nvidia or x86_64 namespace, auto-validates the properties, and injects the dependency on the plugin into variants.json/METADATA
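The `--property` strings above follow a `namespace :: feature :: value` shape, and the namespace is what a plugin lookup (or an explicit `--plugin` flag, as suggested) would key on. A minimal parsing sketch, with the format inferred from these CLI examples rather than taken from variantlib's own parser; `VariantProperty` and both helpers are hypothetical names:

```python
from typing import Iterable, NamedTuple


class VariantProperty(NamedTuple):
    namespace: str
    feature: str
    value: str


def parse_property(text: str) -> VariantProperty:
    """Parse 'namespace :: feature :: value'; whitespace around '::' is ignored."""
    parts = [part.strip() for part in text.split("::")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'namespace :: feature :: value', got {text!r}")
    return VariantProperty(*parts)


def required_namespaces(properties: Iterable[VariantProperty]) -> list[str]:
    """Distinct namespaces, i.e. which providers the build would need."""
    return sorted({prop.namespace for prop in properties})


props = [
    parse_property(s)
    for s in ("nvidia :: driver :: 12", "nvidia :: arch :: sm90", "x86_64 :: version :: 4")
]
print(required_namespaces(props))
```

Making the namespace set explicit is one way to let the caller, rather than the environment, decide which providers to consult.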
we could add --plugin <package_name>
But then similarly
$ variantlib analyze-platform
variantlib.loader - INFO - Discovering Wheel Variant plugins...
variantlib.loader - INFO - Loading plugin from entry point: x86_64
variantlib.loader - INFO - Loading plugin from entry point: nvidia_variant_provider
#################### Provider Config: `nvidia` ####################
- Variant Config [001]: driver :: ['12.8', '12.7', ... '12.0', '12']
###################################################################
#################### Provider Config: `x86_64` ####################
- Variant Config [001]: level :: ['v4', 'v3', 'v2', 'v1']
###################################################################
or
$ variantlib get-supported-configs
variantlib.loader - INFO - Discovering Wheel Variant plugins...
variantlib.loader - INFO - Loading plugin from entry point: x86_64
variantlib.loader - INFO - Loading plugin from entry point: nvidia_variant_provider
nvidia :: driver :: 12.8
...
nvidia :: driver :: 12.0
nvidia :: driver :: 12
x86_64 :: level :: v4
x86_64 :: level :: v3
x86_64 :: level :: v2
x86_64 :: level :: v1
Would cease to work (not really critical; they are more like helpers)
From an installer perspective, I'll just repeat that I really think the design should be declarative: the user or the package declares the providers it needs, and tools handle the rest. Inferring from the existing environment makes everything stateful and also couples the provider environment to the target environment unnecessarily (unlike PEP 517, where the build backend and build dependencies can be installed in an isolated environment).
Honestly I'd suggest the same for (e.g.) variantlib make_variant. I think I'd expect the caller to specify the providers.
We might actually be able to do that. In the original design (up to 3 weeks ago), it wasn't possible.
I believe we might have the tools to actually make it happen
And - as a bonus - it would probably fix our namespace clash issue as we move to default-auto-variant mode
Let's touch base on this. There's a very good chance that could be doable
Ideally (IMO), in that invocation, the user specifies the providers, and variantlib creates an ephemeral environment, installs them, and queries them. Then the whole flow is stateless and the user doesn't have to think about creating the env, installing things, etc.
I quite agree with you
How about caching?
It's quite expensive to install X package at every install
Even beyond that - some plugins might take 1-2 seconds to execute - so you really want to cache their output
Now caching is "sensitive", because when do you invalidate it? Maybe when you restart your computer? We can assume you're unlikely to hot-swap a CPU or GPU, and most driver changes require a restart. And it would need to be an on-disk cache, because it could be reused across multiple uv sync commands.
Give me a few weeks (maybe a few weeks after PyCon) to play with this. It might actually be easier than we originally thought.
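One possible shape for "cache until restart": store each provider's output on disk together with an identifier of the current boot and rerun the provider when that identifier changes. The sketch below assumes Linux's per-boot ID at /proc/sys/kernel/random/boot_id (other platforms would need a different signal) and simply skips caching when no ID is available. The cache layout and function names are hypothetical, not variantlib or uv behavior:

```python
import json
from pathlib import Path
from typing import Callable, Optional


def read_boot_id() -> Optional[str]:
    """Return Linux's per-boot random ID, or None where it isn't available."""
    try:
        return Path("/proc/sys/kernel/random/boot_id").read_text().strip()
    except OSError:
        return None


def cached_provider_output(
    cache_dir: Path,
    provider: str,
    run_provider: Callable[[], list],
    boot_id: Optional[str],
) -> list:
    """Return a provider's supported configs, reusing an on-disk result as
    long as it was recorded during the current boot."""
    cache_file = cache_dir / f"{provider}.json"
    if boot_id is not None and cache_file.exists():
        entry = json.loads(cache_file.read_text())
        if entry.get("boot_id") == boot_id:
            return entry["configs"]
    configs = run_provider()  # the slow part: plugins may take seconds
    if boot_id is not None:
        cache_dir.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps({"boot_id": boot_id, "configs": configs}))
    return configs
```

A CLI option to invalidate the cache could simply delete the cache directory.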
I think we took very good steps in this direction
Yeah I'd probably cache the variant provider outputs and we'd support invalidating it on the uv CLI
(Separately, I also think that following the PEP 517 design is likely a good path for optimizing PEP acceptance)
(It's well-proven)
Indeed there seems to be some commonalities in terms of requirements
So the other piece I would need to implement here (implied) is: look at the variant providers and install + invoke them to understand the supported variants on the machine.
For now we assume that variants are already installed on the machine.
Variantlib loads them for you and invokes them
The function I pointed above is really the main entrypoint to the logic
For a first implementation you really "just need" to pass the variants.json you downloaded as a dict to the function and it returns you everything sorted by priority.
Then you check on the index which ones actually exist and stop at the first one available
We are trying as we speak to change some assumptions to allow auto installation.
Thanks for the idea of following the PEP 517 design
That's why I was saying don't reimplement variantlib just yet. The design is far from stable.
The only parts I would consider stable are:
- data model
- resolver
@wind cosmos do you mind glancing at: https://github.com/wheelnext/pep_xxx_wheel_variants/issues/35
I think this new design will allow us to do everything you mentioned and talked about
Took us quite a bit of time to find a "functional recipe" - we believe that this design would check absolutely all the boxes.
And from the pyproject.toml we stayed as close as possible to PEP 517 - it was a very good advice
Will review! Hopefully today
On first glance this looks pretty good
I think this is a major step forward in the design
Your comments were very useful actually. Thanks for that
FYI https://github.com/wheelnext doesn't have a direct link to the website (had to go to the website repo to find the URL)
Perhaps not clear enough that it is a link but this leads to the website
I've developed "badge blindness" over the years from some projects overdoing them, so I didn't even think to look at the badges for the URL
Understandable
Can someone put it in the org settings, like at https://github.com/python?
@restive vessel -- If I want to update the uv prototype based on the latest changes, what's the best resource to look at?
Do you want to reimplement variantlib? Or for now you're okay to use it?
I honestly think it may be easier to reimplement it, but for sake of explanation, let's assume I'll use it for now!
https://github.com/wheelnext/variantlib/blob/main/variantlib/api.py#L42
this function is virtually the "only entrypoint" you need.
- you resolve the version you would normally install for package X
- you check if {package}-{version}-variants.json exists on the index
if no => package not variant enabled proceed as usual
if yes => download it and pass the entire content as a dict to this function.
The function will do everything and return you a "sorted list" of compatible hashes.
You go one by one to see if you can find it on the index. You install the first you find. If you can't find any, install the non variant
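The steps above can be sketched as plain functions. This is not variantlib's API: `sort_variants` stands in for the linked variantlib function (dict in, priority-ordered hashes out), `fetch_json` for the download, and `available_hashes` for whatever the installer learns from the index listing. Whether `{package}` is the normalized project name isn't stated in the thread, so the sketch uses it verbatim:

```python
from typing import Callable, Optional


def variants_json_name(package: str, version: str) -> str:
    """File name pattern from the steps above: {package}-{version}-variants.json."""
    return f"{package}-{version}-variants.json"


def choose_variant(
    package: str,
    version: str,
    index_files: set[str],
    available_hashes: set[str],
    fetch_json: Callable[[str], dict],
    sort_variants: Callable[[dict], list[str]],
) -> Optional[str]:
    """Return the variant hash to install, or None meaning
    'install the non-variant wheel'."""
    name = variants_json_name(package, version)
    if name not in index_files:
        return None  # not variant-enabled: proceed as usual
    # Stand-in for the variantlib call: pass the whole file as a dict.
    ordered = sort_variants(fetch_json(name))
    # First hash that actually has a wheel on the index; None if none do.
    return next((h for h in ordered if h in available_hashes), None)
```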
I would recommend relying on variantlib until the design and the proposal stabilize.
It will really reduce the amount of re-engineering you have to do.
It's not exactly the most fun thing to engineer over and over.
However we think that interface is reasonably stable now.
Got it, thanks!
And where can I find an example variants.json file?
(That's effectively post-processed wheel metadata from across the release, right?)
Use this index: https://mockhouse.wheelnext.dev/pep-xxx-variants/
You should be able to do uv pip install dummy-package and it should install
Downloading https://mockhouse.wheelnext.dev/pep-xxx-variants/dummy-project/dummy_project-1.0.0-py2.py3-none-any-36028aca.whl (1.3 kB)
Would install dummy-project-1.0.0-36028aca
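The example filenames in this thread (xgboost-3.1.0-py3-none-win_amd64-a0e2749e.whl earlier, dummy_project-1.0.0-py2.py3-none-any-36028aca.whl here) suggest that a variant wheel appends an 8-character hex label after the platform tag. Assuming that convention, which is inferred from the examples rather than quoted from the proposal, an installer could split the label off like this. `split_variant_hash` is a hypothetical helper:

```python
import re
from typing import Optional

# Assumed convention, inferred from example filenames only: an 8-character
# lowercase hex label directly before ".whl" marks a variant wheel.
_VARIANT_SUFFIX = re.compile(r"-([0-9a-f]{8})\.whl$")


def split_variant_hash(filename: str) -> tuple[str, Optional[str]]:
    """Return (non-variant filename, variant hash or None)."""
    match = _VARIANT_SUFFIX.search(filename)
    if match is None:
        return filename, None
    return filename[: match.start()] + ".whl", match.group(1)


print(split_variant_hash("dummy_project-1.0.0-py2.py3-none-any-36028aca.whl"))
```

Collecting the second element over an index listing gives the set of variant hashes that actually exist for a release.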
Note: variantlib can call either uv or pip in the background to install the plugins in an independent virtualenv. However, given that we are using a special index, I suspect I'm going to need to make some adjustments to read --index and --default-index from uv.
Variantlib installs plugins? I assumed that was the responsibility of the installer
Is that a temporary thing for the prototype or part of the design?
We are not entirely sure about this part. If you want, I can include a flag to deactivate it. I think ultimately it will be complicated to keep that in variantlib
Though, assuming you install them yourself in a separate venv, how will variantlib know where to load them from? Will you provide a path to the venv?
I'll need to look at the design more closely, but I think it's unlikely that you'll want variantlib to be responsible for installing and managing an environment
I was imagining that variantlib was like packaging: a well-isolated library, small enough to be vendored, that implements the standard
I don't think it should rely on an installer. It's also really hard to respect user settings, etc., if you're calling uv (for example) from within variantlib.
I guess I was assuming that the installer would interact with the plugins, and the variantlib API would be simple enough that you're just passing in data and getting data out without having to interact with any external systems?
So the installer would install the plugin in an isolated environment, ask it for the enabled variants, then pass those to variantlib, etc.
I'll comment on the PR
I'll need to look at the design more closely, but I think it's unlikely that you'll want variantlib to be responsible for installing and managing an environment
I actually agree with you - it was just easier for now.
Good thing is that I actually managed to implement cross-environment loading. So I should be able to allow you to install on your side - and me to load them.
These types of things are not part of the "PEP" per se.
In the future we can think about what would be the best integration for installers.
I'm also completely open to the idea of having multiple entry points depending on "how much do you want variantlib to do for you"
I think all your assumptions are reasonable - so far we are very much "playing with interfaces" and discovering - as we implement it - what we need and from where.
Sounds good!
@wind cosmos I have completed an interface that lets you install & control the plugins yourself (without us doing it).
However we don't have yet a good idea on how to load from an isolated venv - so only "current environment" supported [for now] we'll think after pycon how we can best do isolation (Installation is easy, the difficulty is cross environment execution)
I hope that helps
maybe we could try something like pex (https://github.com/pex-tool/pex) as I've tried in https://github.com/achimnol/splitbrain/blob/main/src/splitbrain/bootstrap.py ... though it increases reliance on yet another 3rd party
you could think pex as a simplified interface of managing a dependency tree + hermetic but volatile virtualenvs
the problem here is also cross-environment (and cross-interpreter) execution, too...
we should rely on stringified code snippets passed to the subinterpreter API
for variantlib, we would need another mechanism if it needs to support pre-3.13 CPython versions where subinterpreters are not available, though
this is an extremely sketchy idea, but I think we could consider "packaging an entire dependency tree and loading it in an isolated environment" for plugin use cases as a new wheelnext topic...?
this is what I've shared with Chris Gottbrath in the summit on March 21st
Isn't that a very similar problem to build time dependencies in general? Maybe we should use the same solution
PEX in particular doesn't work on Windows (last time I checked)
maybe we could try something like pex (https://github.com/pex-tool/pex) as I've tried in https://github.com/achimnol/splitbrain/blob/main/src/splitbrain/bootstrap.py ... though it increases reliance on yet another 3rd party
In principle it's not a bad idea - except that you can't specify a standard and say "use pex" without pex being a standard itself.
And really we don't need that much complexity
Isn't that a very similar problem to build time dependencies in general? Maybe we should use the same solution
Yes, and pretty much solved with subprocess.run, which I think is the reasonable idea to use
ah, yes, we could take the same approach as build backends
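A minimal sketch of the subprocess approach, similar in spirit to how PEP 517 frontends call build-backend hooks in a separate process: run a small query in another interpreter (for example, the Python of an isolated plugin environment) and exchange data as JSON over stdout. The query protocol here is hypothetical; the snippet just prints a fixed list so the sketch is runnable, where a real one would import the provider and ask it for its configs:

```python
import json
import subprocess
import sys

# Hypothetical query run inside the plugin environment's interpreter.
# A real one would import the provider; this one returns fixed data.
QUERY_SNIPPET = """
import json, sys
print(json.dumps({"provider": sys.argv[1], "configs": ["level :: v3", "level :: v2"]}))
"""


def query_provider(python_executable: str, provider: str, timeout: float = 60.0) -> dict:
    """Run the query in a separate interpreter and parse its JSON stdout."""
    completed = subprocess.run(
        [python_executable, "-c", QUERY_SNIPPET, provider],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the plugin process fails
        timeout=timeout,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # sys.executable stands in for the isolated environment's interpreter.
    print(query_provider(sys.executable, "x86_64"))
```

Because only JSON crosses the process boundary, the plugin environment can use a different interpreter than the installer, which sidesteps the cross-environment loading problem.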
regardless whatever venv isolation method we use, i think we also need to take care of managing (cleaning up) cached venvs from the user side or automating it
So far I have ~ virtually ~ re-implemented / borrowed from PyPA/Build
It's actually working but realllly dirty
Sooo ... Let's just pretend it's not
@sturdy imp will you be coming to PyCon ? Would be great to catch up !
Yes, I'm coming! @restive vessel
Hi everyone! I'd love to contribute to wheelnext and wasn't sure whether this channel is open to newcomers or not. I totally understand if it's not the case and I'll just lurk and read the discussions (that was my primary goal, get close to where innovation happens)
I see that the recent discussions are about variantlib. I didn't get the chance to read the code in-depth and play with it yet, but from what I've gathered it's a toolkit to pick the most appropriate binary wheel for a given machine, taking into account the GPU backend, provider, CPU specific builds etc. right? It does resonate with me as someone who struggles continuously with GPU-aware packages (working as a machine learning engineer).
I was wondering where an extra pair of hands would be most useful right now for the project, if it even needs that, and I totally get that the project is still in its early days (I think it's not even mentioned in the wheelnext.dev proposals) so maybe you don't want a lot of people wandering around it xD
Welcome, we're definitely open to newcomers! The variants proposal https://wheelnext.dev/proposals/pepxxx_wheel_variant_support/ is what we are currently focusing on. Most of the discussion/work is going on in the issues for the proposal https://github.com/wheelnext/pep_xxx_wheel_variants/issues
Welcome @patent vapor absolutely delighted to have you onboard. Make sure you check the How to Contribute page and signup to the mailing list: https://wheelnext.dev/how_to_participate/
May I ask you to elaborate a little on your recent struggles around "It does resonate with me as someone who struggles continuously with GPU-aware packages (working as a machine learning engineer)"? This is very useful for us to understand the problems people are facing.
Also do you wish to contribute on your free time or as part of your role in a company ? We are happy to provide credit for the work & efforts on the page: https://wheelnext.dev/who_are_we/
Will you be at PyCon ? If so let's catch up ! Happy to grab a coffee
I read the document and the issue #38. I'll go through the rest as well to understand the different perspectives and the current state. I'm unfortunately having trouble reading the DPO discussions, though I'm sure they're a trove of knowledge as well. I saw that you have tutorials as well! I'll try them out and I saw that variantlib is a dependency of the project (I understand now why it's not mentioned in the wheelnext.dev proposal!).
Thank you! I have registered to the mailing list.
My struggles are similar to the user story "A user wants to install a version of PyTorch that is specialized for their GPU architecture."
I work in a consulting company and in general we deploy for clients using their preferred stack on their preferred infrastructure. Sometimes it's not us, the developers of the package or the solution, who deploy it, but other teams. And the issue here is how to ensure they use the appropriate versions of our dependencies. The client might be using different infrastructure from the one we initially developed for. If it's only PyTorch, then there is a way, a bit clunky in my opinion, to communicate this information through the pyproject.toml. And, I think, this is thanks to PyTorch having an index for their wheels. It's not always the case though, and that's when issues arise because it's harder to communicate that information and enforce it.
The other issue, less common though, is when researching / testing different packages from ML papers. In many cases these packages are not mature and were developed just as a way to showcase the results of the paper or for reproducibility. So we have to manually go through the dependencies and look for which ones are GPU-aware (generally through our own knowledge of the field but that's error prone) and try replacing them with the corresponding versions for whatever we're testing on. Sometimes it works, sometimes not ^^'
I wish to contribute in my free time, but it's amazing to see such big players, both from open source and from companies, contribute to this.
And thank you both for the welcome, I really appreciate it ^^
DPO is probably not a good place to start, but also has a lot of past discussions on the topic which are useful to know. I don't think you need to rush to read all of the DPO threads.
I'll keep that in mind, thanks! I'll check in regularly on the DPO threads to follow along, so I don't have to read 100+ long threads in the future haha
Will you be at PyCon?
As Emma said I would not get too deep in DPO. It's a rabbit hole of information.
The single most useful thing, and not too time-consuming, is to participate in public channels/discussions and test prototypes. Try them yourself on your own projects. Have others test them and provide feedback.
Many of these projects are far from trivial, and clear, actionable feedback is absolutely critical to forge the best proposal we can, especially from people who understand and are confronted with the difficulties these proposals try to address.
Oh sorry I forgot to reply to that. I won't be at PyCon unfortunately :/ Hopefully next year! Hope everyone has a great time!
Thank you both for your advice and for the welcome. I'll test on as many different projects as possible and if I encounter any issues or if I have any quality feedback I'll share it on this channel or on GitHub issues.
hey hey everyone,
Looking forward to PyCon - we have published the most up-to-date tutorials on Wheel Variants.
https://github.com/wheelnext/pep_xxx_wheel_variants/blob/main/README.md
Your help is really helpful - thanks
@restive vessel Just landed in Pittsburgh today. Looking forward to seeing you at PyCon
@languid bolt can you ask a few core devs of XGBoost to try the tutorial?
Yes, I will ask Jiaming to try it
I changed a few things. Please try again
Yes, I will try it again
Also, I remember you saying you'd organize an engineering sprint at PyCon. Is it going to happen? Let me know the schedule
Sunday: talk
Monday/Tuesday Sprint
i'm trying to follow https://github.com/wheelnext/pep_xxx_wheel_variants/blob/main/docs/tutorials/meson-python.md
it seems to report a misleading error message, saying "You can not access plugins outside of a python context", when it fails to import the plugin entry point because of a misspelling (NVIDIAPlugin should be NvidiaVariantPlugin, as in the subsequent successful command)
another strange behavior:
maybe "armv8-a" is not recognized as "aarch64 :: version :: 8a"
(it's a Linux VM run by OrbStack on a M4 pro laptop)
anyway this looks like a plugin-specific issue
@sturdy imp thanks for the bug report - I'm aware of the "You can not access plugins outside of a python context" error - still need to identify the issue.
The rest, I will look into.
Thanks a lot
on a development node with RTX5080 and AMD Ryzen CPU, the plugin's property detection seems to work as expected
@sturdy imp
anyway this looks like a plugin-specific issue
Yes it is - there are a few "quirks" with archspec - we are figuring them out. It's orthogonal to variants but we need to address it. archspec is really lacking on macOS - inside Docker it completely fails to report anything
"You can not access plugins outside of a python context"
Re-open the tutorial for this one - I rebuilt variantlib, fixed the issue, and updated the instructions.
Any success / issue with the install tutorial? Should be pretty straightforward
Hi everyone!
I think there is a very minor issue with the pip submodule of pep_xxx_wheel_variants.
E.g., at this line: https://github.com/wheelnext/pip/blob/27bbc0d370aaf80cc8d0a64c300d8971481bddf8/src/pip/_internal/utils/variant.py#L39
I don't think it's possible to reuse the quote character that delimits an f-string inside the f-string itself on Python versions <= 3.11 (and pep_xxx_wheel_variants requires >= 3.9), since that was only introduced in 3.12 (PEP 701)
At least with Python 3.11 I get SyntaxError: f-string: unmatched '('
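A minimal sketch of the difference, assuming only the standard library: it compiles two versions of the same f-string, one reusing the outer quote inside the braces (the PEP 701 style) and one switching quote characters, which is the portable fix for Python 3.9 to 3.11.

```python
import sys

# Reusing the delimiting quote inside a replacement field: valid only on
# Python >= 3.12 (PEP 701). On 3.11 and older this is a SyntaxError.
SAME_QUOTES = 'f"{", ".join(["a", "b"])}"'

# Portable spelling for Python 3.9+: use the other quote character
# inside the braces.
PORTABLE = "f\"{', '.join(['a', 'b'])}\""


def compiles(source: str) -> bool:
    """Return True if `source` compiles as an expression on this interpreter."""
    try:
        compile(source, "<fstring-check>", "eval")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("portable:", compiles(PORTABLE), eval(PORTABLE))
    print("same quotes:", compiles(SAME_QUOTES), "on", sys.version.split()[0])
```

The portable form evaluates to `a, b` on every supported version, so it is the safer choice for code that still declares `>= 3.9`.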
I also have a question if you don't mind. I was trying to understand issue #38 as an entry point to understanding variantlib. What I am not able to fully comprehend is the build isolation issue, or at least whether the issue still holds.
From what I could gather from the issue, the preferred solution was to use something similar to PEP 517. I thought I could try my hand at this, just as an exercise, and tried to figure out the parts of variantlib responsible for this. And it seems to me like everything is fully implemented.
Running the tutorials with build isolation (e.g., isolated=True in AutoPythonEnv) while removing the raise NotImplementedError from IsolatedPythonEnvMixin and IsolatedPythonInstallerEnv seems to be working correctly, at least on Linux x86_64 with pip as env backend.
I was wondering what I was missing in my understanding, or maybe the current code for build isolation was implemented right after the issue?
Thank you in advance!
PS: By tutorials, I mean the numpy, pytorch and xgboost tutorials.
I'm now struggling to learn meson because I have to migrate the old-fashioned setup.py to use it .... haha
Don't worry, you don't need to
You're building a GPU variant right?
Build your wheel the normal way. Add this to the pyproject.toml
Once done, transform the wheel into a variant with this command
This is literally how we do it for torch @river lynx
Changing the build backend is a complicated endeavor. That's why we created this tool. It changes nothing about how you build your wheel. You just "make it a variant afterwards"
ahha!
but i've already almost done it...
and I also wanted to retire setup.py and setup.cfg anyway
btw, it will make the job a lot easier!
They're both separate steps effectively. You can do either change, in either order that you want.
almost there, but somehow the plugin namespace is not recognized... hmm
anyway, it's time for the packaging summit!
looking through the variant demos, no hatchling support yet? I might be able to hack something together though, it doesn't look hard
though, now that I'm looking through the repos, I'm way more interested in the native loader stuff -- I've been doing a variation of that since 2020 across a dozen or so wheels
it's just a variant of what FFY00 is doing over in #dynamic-library
I don't quite understand why it's using RTLD_LOCAL though -- I had run into issues with that on macOS so I use RTLD_GLOBAL on that platform, but maybe it's just something with the libraries I'm using
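A minimal sketch of the per-platform choice described above, using the standard `ctypes` module. The policy function and its name are hypothetical, not the native loader's actual code; it just encodes "RTLD_GLOBAL on macOS, RTLD_LOCAL elsewhere".

```python
import ctypes
import sys


def dlopen_mode(platform: str = sys.platform) -> int:
    """Hypothetical policy: RTLD_GLOBAL on macOS, RTLD_LOCAL elsewhere.

    RTLD_LOCAL keeps a library's symbols out of the global namespace, so
    two wheels vendoring different builds of the same library are less
    likely to clash. RTLD_GLOBAL makes the symbols available to libraries
    loaded later, which some library stacks rely on.
    """
    if platform == "darwin":
        return ctypes.RTLD_GLOBAL
    return ctypes.RTLD_LOCAL


def load_shared_library(path: str) -> ctypes.CDLL:
    """Load the shared library at `path` with the platform-specific mode."""
    return ctypes.CDLL(path, mode=dlopen_mode())
```

The trade-off is isolation versus symbol visibility: RTLD_LOCAL is the safer default when many wheels ship their own copies of a library, while RTLD_GLOBAL may be needed when a later-loaded library expects to resolve symbols from an earlier one.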
if I understand the way that you patched meson-python, the idea is that you specify the variant that you're building using a config setting? Unfortunately, hatchling doesn't support config settings yet (https://github.com/pypa/hatch/issues/1072) so it would have to be implemented by a build hook instead -- but I don't think it would support adding those extra metadata keys you need. Hm. I guess the wheel modification command is good enough for now.
i found the reason. variantlib make-variant requires pyproject.toml to specify [variant.providers.*] and [variant.default-priorities] while variantlib plugins scans all entry points in the current venv.
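A small pre-flight check for the mismatch described above, written as a sketch: it only verifies that the two tables named in the message exist in an already-parsed pyproject.toml mapping and deliberately does not check their contents. The function name and the `nvidia` provider key in the usage note are hypothetical.

```python
from typing import Any, Dict, List


def missing_variant_tables(pyproject: Dict[str, Any]) -> List[str]:
    """Return the variant tables that `variantlib make-variant` needs but
    that are absent from a parsed pyproject.toml mapping.

    Only the table names come from the discussion; what goes inside them
    is not validated here.
    """
    variant = pyproject.get("variant", {})
    missing = []
    if not variant.get("providers"):
        missing.append("[variant.providers.*]")
    if "default-priorities" not in variant:
        missing.append("[variant.default-priorities]")
    return missing
```

On Python 3.11+ the mapping can come from `tomllib.load()` on the project's pyproject.toml opened in binary mode; an empty result means both tables are at least present, e.g. for `{"variant": {"providers": {"nvidia": {}}, "default-priorities": {}}}`.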
Now I succeeded to build the first variant-enabled wheel!
I also discovered that variantlib make-variant's -P option is going to be renamed to -p by https://github.com/wheelnext/variantlib/commit/ef78f42d2d681c00b393bb55ef2189c3a33d8f7b, so I just changed my build script to use --property for forward-compatibility
In sprint: the variantlib design issue about how we are going to support nvidia driver version (in addition to the nvidia cuda toolkit version). e.g., cuda versions could be enumerated but nvidia versions... could be MANY, and we would need a rich version constraint expression support here.
This issue affects XGBoost FYI
Bradley suggested that I use the "paired" cuda toolkit versions instead of the nvidia driver versions
but this may not work in some cases -- where the application (like Backend.AI) heavily depends on NVML, etc.
also in the scenario of containerized CUDA apps, the host may not have the CUDA toolkit at all but only NVIDIA drivers
if the variant-aware wheels are installed in the host side (as part of a workload hosting platform like Backend.AI), the variant matching should run against the NVIDIA driver version number.
for simplicity and manageability, we could split or fork the official nvidia-variant-provider plugin for our purpose, but still the nvidia driver versions are too complex to embrace in the current variantlib design.
my only wish for nvidia-variant-provider is to best-fit the cuda driver to a cuda variant, i.e. a wheel built with 12.6 is technically compatible with a 12.4 driver (afaik), so for torch 2.7.0 it should install 12.6 if you have 12.4 installed locally (like on my test machine)
wait, is that actually true? I would assume that a wheel built against CUDA 12.4 is compatible with a prod system with installed CUDA 12.6 but not vice versa? (also this seems like something the driver should know for sure or not, so yes, if that is true, then I agree the driver should reflect that)
(or is the thing that building against 12.6 is fine if you don't use any new symbols since 12.4 which rarely happens, and so it turns into the same shape of problem as auditwheel/manylinux? I think in this case it's still the driver's responsibility to look at the wheel and determine that it is 12.4-compatible and label it as 12.4 instead of 12.6)
I've always been under the impression that they should be compatible within one major cuda version
(unless it's a newly added architecture)
if that's the case, then IMO the variants should just advertise the CUDA major version (so 11/12/etc. alone), and if necessary have compute capability as another variant axis
it's definitely possible to have binaries compiled against an older CUDA version to work with GPUs that weren't even announced when that CUDA version came out, though they won't take full advantage of the new GPU's power
The answers to the above questions about compatibility depend on a few factors. If you build against CUDA 12.4, then yes you are pretty much always safe to run on CUDA 12.6. The converse is conditionally true (I really wish I could give you a simpler answer on this part). If you only use the runtime APIs (functions starting with cuda* like cudaMalloc) and not driver APIs (functions starting with cu* like cuMemGetInfo), then you are almost always guaranteed that things will work (the exceptional case is if you are using a brand new API that specifically requires a newer driver, e.g. if you started using cudaMallocAsync on the minor version of CUDA that introduced it). If you use the driver API, then you cannot usually build with a newer driver and run on an older driver (yes, the term driver is unfortunately very overloaded here).
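The rule of thumb above can be sketched as a small predicate. This is a simplification under stated assumptions: the function names are hypothetical, "installed" means the CUDA version available where the wheel runs, it only covers a single CUDA major version, and real compatibility can still depend on details the flags below do not capture.

```python
from typing import Tuple


def parse_cuda(version: str) -> Tuple[int, int]:
    """'12.6' -> (12, 6); any patch component is ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def likely_compatible(
    built_with: str,
    installed: str,
    uses_driver_api: bool,
    uses_new_runtime_api: bool = False,
) -> bool:
    """Heuristic from the discussion, for one CUDA major version only."""
    built, inst = parse_cuda(built_with), parse_cuda(installed)
    if built[0] != inst[0]:
        raise ValueError("cross-major compatibility is not modelled here")
    if built <= inst:
        # Built against an older (or the same) minor: almost always safe.
        return True
    # Built against a newer minor than installed: expected to work only when
    # sticking to the runtime API (cuda*) and avoiding APIs that are new in
    # that minor; driver API (cu*) usage generally rules this out.
    return not (uses_driver_api or uses_new_runtime_api)
```

For example, `likely_compatible("12.6", "12.4", uses_driver_api=False)` is True, while the same call with `uses_driver_api=True` is False.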
Yes, it is also possible to have binaries compiled with an older CUDA version work with newer GPUs. Typically this works by embedding PTX code in your binary. PTX is effectively an IR that can be JIT-compiled for any compute architecture, so at runtime the CUDA driver will detect that you have PTX and compile that for you even if you don't have precompiled machine code specific to your architecture.
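As an illustration of embedding PTX, here is a hypothetical helper that builds nvcc `-gencode` flags: one SASS (machine code) target per listed compute capability plus a PTX target that the driver can JIT-compile on newer GPUs. The `-gencode=arch=compute_XX,code=...` flag form is nvcc's; the helper and the example capabilities are only illustrative.

```python
from typing import Iterable, List


def gencode_flags(sass_archs: Iterable[int], ptx_arch: int) -> List[str]:
    """Build nvcc -gencode flags: SASS for each listed compute capability,
    plus PTX for `ptx_arch` so newer GPUs can JIT-compile it at load time.

    Compute capabilities are written without the dot, e.g. 80 for 8.0.
    """
    flags = [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in sass_archs]
    # code=compute_XX (rather than sm_XX) embeds PTX instead of machine code.
    flags.append(f"-gencode=arch=compute_{ptx_arch},code=compute_{ptx_arch}")
    return flags
```

For example, `gencode_flags([80, 86], 90)` yields SASS for 8.0 and 8.6 plus PTX for 9.0, so a GPU newer than 9.0 can still run the binary through JIT compilation, at the cost of startup time and possibly some performance.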
(also sometimes new hardware is the same compute capability major, I remember being surprised at things showing up in 8.x. I believe that means you don't even need PTX/JIT, the same machine code will work, but using e.g. SASS built for capability 8.0 on an 8.9 GPU is presumably suboptimal in some way)
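The compatibility rules from these two messages can be sketched as follows, assuming the usual CUDA rules: SASS built for capability X.y runs on X.z when z >= y (same major only), and PTX for a given capability can be JIT-compiled for that capability or any later one. Architecture-specific targets such as sm_90a are ignored, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

CC = Tuple[int, int]  # compute capability, e.g. (8, 6)


def sass_runs_on(built: CC, device: CC) -> bool:
    """SASS for X.y runs on X.z with z >= y, within the same major only."""
    return built[0] == device[0] and device[1] >= built[1]


def binary_can_run(sass: Iterable[CC], ptx: Iterable[CC], device: CC) -> bool:
    """True if some embedded SASS matches the device, or some embedded PTX
    targets a capability at or below the device's and can be JIT-compiled."""
    if any(sass_runs_on(cc, device) for cc in sass):
        return True
    return any(device >= cc for cc in ptx)
```

This matches the observation about 8.x: SASS for 8.0 runs on an 8.9 GPU without PTX, whereas a 9.0 GPU needs embedded PTX to run the same binary.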
@sturdy imp if your project is open source, please feel free to ping me for review/questions for your meson-python migration (GitHub username rgommers). I'm a meson-python maintainer and would like to see wheel variants succeed, so happy to help here:)
thanks, I already finished this part! I have a question, though: is it possible to parallelize building cython modules when using py.extension_module()?
where is the current draft spec (or closest thing to a spec) for the thing where wheels declare in their metadata which providers they need? I don't see it in either wheelnext/pep_xxx_wheel_variants/ENGINEERING.md nor wheelnext/wheelnext/docs/proposals/pepxxx_wheel_variant_support.md
I don't think it is written somewhere yet, but if you inspect a variant wheel you can see it in METADATA, I believe (at the time of writing this I was commuting and didn't read your message well sorry!)
@sturdy imp multiple extensions will already be built in parallel. For a single .pyx -> .c -> extension-module there is no way to parallelize, since they're single file compiles.
I'd like to parallelize a foreach loop that serially calls py.extension_module(), not the individual calls of course. I was wondering if there is a commonly used pattern when each module can be compiled independently.
@sturdy imp that is automatically done in parallel. meson generates a build.ninja file in the build directory, and ninja will by default build with n_cpu + 2 jobs in parallel (2 or 3 jobs on machines with one or two CPUs). You can control build parallelism with its -j flag. You can pass that through meson-python like so when building a wheel: python -m build -Ccompile-args="-j6".
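A sketch of the two pieces involved. To the best of my knowledge, ninja's default when no `-j` is given is the CPU count plus 2 (with 2 and 3 jobs on one- and two-CPU machines); the first function mirrors that guess. The second builds the meson-python invocation from the message as an argument list. Both function names are hypothetical.

```python
import os
import sys
from typing import List, Optional


def ninja_default_jobs(n_cpu: Optional[int] = None) -> int:
    """Mirror ninja's default parallelism guess when no -j flag is given."""
    if n_cpu is None:
        n_cpu = os.cpu_count() or 1
    if n_cpu <= 1:
        return 2
    if n_cpu == 2:
        return 3
    return n_cpu + 2


def build_wheel_command(jobs: int) -> List[str]:
    """Argument list for building a wheel via meson-python with a fixed -j."""
    return [sys.executable, "-m", "build", f"-Ccompile-args=-j{jobs}"]
```

Passing `build_wheel_command(6)` to `subprocess.run(..., check=True)` is equivalent to the shell command in the message; the shell quotes around `-j6` are not part of the argument itself.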
This is probably best continued in https://github.com/mesonbuild/meson-python/discussions/ or #meson-python.
I think that's probably true if your scope is to support all flavors of Python on all possible operating systems. It would be great to figure out what reasonable scopes might be. I'm going to continue discussions on #dynamic-library, hopefully we can pull in some more people there
what are the exact semantics around variants? my understanding is
- a variant property is a triple {namespace, feature, value}
- a wheel has a list of variant properties (with at most one property with the same namespace and feature, maybe?)
- each provider outputs variant properties
- if there exists at least one variant property that was outputted by a provider, and is in the list of the wheel, then the wheel is eligible
?
so if I declare a wheel that says nvidia :: cuda :: 12.8 and amd :: rocm :: something, then it will install on machines with CUDA 12, machines with ROCm, and machines with both, but not machines with neither?
and if I declare a wheel that says nvidia :: cuda :: 12.8 and x86_64 :: level :: v4, then it will install on machines with CUDA 12 or machines with AVX-512 (or both)?
or am I misunderstanding
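The two readings in the question can be made concrete with sets of (namespace, feature, value) triples. This is a sketch, not variantlib's code: real providers decide which values a machine supports, whereas here properties match only by exact equality, and the names are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class VariantProperty(NamedTuple):
    """One (namespace, feature, value) triple, e.g. nvidia :: cuda :: 12.8."""
    namespace: str
    feature: str
    value: str


Props = FrozenSet[VariantProperty]


def matches_any(wheel: Props, supported: Props) -> bool:
    """'OR' reading: eligible if ANY of the wheel's properties is supported."""
    return bool(wheel & supported)


def matches_all(wheel: Props, supported: Props) -> bool:
    """'AND' reading: eligible only if ALL of the wheel's properties are supported."""
    return wheel <= supported


CUDA = VariantProperty("nvidia", "cuda", "12.8")
AVX512 = VariantProperty("x86_64", "level", "v4")

wheel = frozenset({CUDA, AVX512})
gpu_only_machine = frozenset({CUDA})

print(matches_any(wheel, gpu_only_machine))  # True
print(matches_all(wheel, gpu_only_machine))  # False
```

Under the OR reading, the nvidia + x86_64-v4 wheel installs on a CUDA machine without AVX-512; under the AND reading it does not.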
From my understanding, when you build a distribution, you can build different wheels for it, using different provider plugins. So like before you get the windows wheel, the Linux wheel etc., but now there's support for specific environments encoded by the hash.
So you can't build one specific wheel for both Nvidia and amd but you'll build two wheels.
Both of these wheels will have a METADATA file that contains all the previous spec metadata, now including which provider plugins it requires (all of them, even if the wheel is built for one specific platform).
But I don't think that's what most of us should care about, the index will have a ...-variants.json like on the mockhouse from the tutorials. I think that file will allow tools like pip to fetch the correct wheel for your environment. It'll look at what plugins it requires, it can install them (in an isolated environment I believe?), get the value for your environment, the hash, and then fetch the proper wheel and install it. Note: the provider plugins give a relative preference for different values that are compatible with a given environment as well.
In an environment that satisfies no provider plugin, the default will be the non-variant wheel.
So, say PyTorch requires both the NVIDIA and AMD plugins: it'll build different wheels for each. If you're on CPU, you'll get the CPU wheel by default.
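The selection flow described above (filter by what the plugins report, rank by their preference, otherwise fall back to the non-variant wheel) can be sketched as below. The dictionary shape stands in for whatever the index-level variants JSON actually contains, the labels are hypothetical (real variants are identified by hashes), and the sketch uses the "AND" reading where every property of a variant must be supported.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Prop = Tuple[str, str, str]  # (namespace, feature, value)


def select_variant(
    variants: Dict[str, List[Prop]],
    supported: Sequence[Prop],
) -> Optional[str]:
    """Return the best variant label for this machine, or None to fall back
    to the non-variant wheel.

    variants:  label -> properties of that variant wheel (hypothetical shape)
    supported: properties reported by the provider plugins, most preferred first
    """
    rank = {prop: i for i, prop in enumerate(supported)}
    best: Optional[Tuple[List[int], str]] = None
    for label, props in variants.items():
        if not all(p in rank for p in props):
            continue  # needs at least one property this machine lacks
        # Lower ranks are more preferred; compare the sorted ranks lexicographically.
        key = sorted(rank[p] for p in props)
        if best is None or key < best[0]:
            best = (key, label)
    return None if best is None else best[1]
```

With variants for CUDA 12.6 and 12.9 and a plugin reporting 12.9 before 12.6, the 12.9 variant wins; on a CPU-only machine the plugin reports nothing and the function returns None, i.e. the default non-variant wheel.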
My reading of the current implementation is that it will pull the json file and match a wheel that has at least one variant that the local plugins are advertising (and if there are multiple wheels the precedence stuff goes into effect, yeah, but at the moment I want to know whether a wheel matches at all), meaning that I can in fact build a single wheel with both nvidia and amd support, but I cannot meaningfully build an nvidia+x86_64-v4 wheel. Specifically the patched pip calls variantlib.api.check_variant_supported() which does bool(list(filter_variants(...))), i.e., if at least one variant survives the filtering, then it returns True.
I'm curious if I'm reading it right, and if so, if this is intentional
Oh sorry, I think I misunderstood your question. I thought you meant requiring different plugins for different environments. If you require two (or more) different provider plugins for the same wheel, then the environment must satisfy all conditions, because in the code you link to, you have allowed_properties=supported_vprops, and those are built from all plugins (I might be wrong on this one; I didn't look at which code calls this function to do a proper analysis). I might be missing something though. But just conceptually, I think it's better to enforce an "AND" relationship rather than an "OR". "OR" introduces some kind of uncertainty, I guess. You'd be asking questions like "does it work in this environment, or in that environment, or in both?" (in that case just do an AND). It wouldn't make sense to have a wheel that works in one environment but not in the other; better to just publish two wheels separately. I guess
Hey there
Thanks for the PyConUS talk, I had a few questions that were clarified by watching it
Most of the problems described are things I'm suffering from while packaging PySide and Shiboken (the Qt bindings from the Qt Company), so I'm looking forward to giving it a try
Also, I help out a bit with PyPI support by handling size limit requests, and new projects requesting >1GB wheel sizes or >100 GB project sizes are an everyday issue, so hopefully this can be addressed with wheelnext
(I hope I never need a wheel like that...)
I hope that one day I'll play a jam session with Jimi Hendrix. Until then I focus on what I can do and brush up my skills
Jokes aside - happy that we are able - bit by bit - to move the needle.
@cosmic crystal do you mind detailing what was unclear before & what the talk answered? It's helpful to know how we can improve the communication.
Also what are the specific bits you are struggling with - happy to get you involved in the design phases if you're open to it (or even have you contribute to the proposals!)
I'm definitely thinking about driving to Long Beach next week with a backline. PyCon jam session!
Wow PyConUS really sneaks up on you!
@restive vessel spammers got to us? Or was someone hacked?
Oh gosh... Well, I'm immediately blocking anyone from messaging the list without pre-approval.
Sorry about this
oh noooes, I did a long write-up and might have never hit enter :C I was at some conferences... so let me be brief:
I didn't know there were many companies behind it
I thought it was "only for having new tags"
when you mentioned solving the symbolic links during the talk, you got me. This IS one of the major issues we currently face.
With my PyPI hat on: tackling the problem with other architectures, like GPUs, is what we need. A very large number of projects decide to 'ship everything' just because we don't have a better way to satisfy some GPU-specific shared libraries, ending up with bloated wheels.
Wheel variants is definitely the most involved proposal and in my memory showed up around the time of the WheelNext branding (or at least I heard about them both around the same time), so that's understandable! Discussions about symbolic links, making puns about "how to reinvent the wheel," etc. happened as far back as PyCon US 2024 (maybe earlier) but I don't think either variants or the WheelNext name was around then.
We kinda had this idea in the back of our mind for a while. We decided to unify everything under a single banner to clarify communication and make it easier for people to get involved.
@cosmic crystal really glad the talk resonated with you. Let us know if there's anything you'd like to get directly involved in. Otherwise, getting your feedback on designs & proposals will always be immensely appreciated
Thanks for sharing more info, I for sure would like to look into the project, because I will probably move PySide6 to use it the moment I can allocate some time to work on it.
Another question, maybe orthogonal to the project, but I still wanted to ask: something I didn't see in the docs/talk is the Stable ABI. We are one of the few 'large' projects using it, and for some time we had a problem getting the cp3X-abi3 tag using pyproject.toml only: we were required to create a placeholder setup.py in order to fake an Extension and get the abi3 tag. This can now be done in pyproject.toml with the experimental options under [tool.setuptools].
( Here you can see the old fake setup.py Extension: https://codereview.qt-project.org/c/pyside/pyside-setup/+/649992/1 )
So I was wondering what the relationship of the project is with the development of setuptools and the options we have within pyproject.toml. Are you planning to contribute to setuptools, or maybe try to add a new backend?
Hi all, I've just joined, looking forward to participating in discussions!
The design of variants currently uses a pyproject.toml configuration, and I think we'd like to keep it that way. A static list of variants for each project can be dispatched to build backends to do dynamic things per-variant. We definitely want to work with packaging projects across the ecosystem to ensure support for variants where people will want them.
welcome!
Hi all, just FYI an updated draft of PEP 771 has been published, and I've opened a new DPO thread with information about the updates: https://discuss.python.org/t/pep-771-default-extras-for-python-software-packages-round-2/94905
Following the discussion on the first published version of PEP 771 in February, we are happy to share with you the updated version of PEP 771: Default Extras for Python Software Packages: Let's discuss this updated version here, and no longer use the previous thread. Thank you to everyone who participated in the previous discussion! The ma...
Hi everyone, I was working on building variant wheels for XGBoost and found out that pip wheel no longer works with latest version of pep_xxx_wheel_variants. I submitted a small patch to fix it: https://github.com/wheelnext/pip/pull/9
Thanks a lot @languid bolt - unfortunately, until we vendor variantlib into pip, there will be broken APIs due to how pip works
Though your PR is merged
reading https://github.com/wheelnext/pep_xxx_wheel_variants/issues/60, i was wondering if it makes sense to aim for generic plugins that are always enabled?
for example for the cpu microarchitecture/SIMD features, maybe the authors only publish for x86_64 (windows and linux) and aarch64 (mac), but the variant provider could support x86_64, aarch64 and other architectures, so that someone building locally gets (unless they chose otherwise) a SIMD-enabled build optimized for their CPU
this would reduce the configuration surface a bit as variant providers are always there, just don't do anything on platforms they don't know
Hi everyone, I just posted https://discuss.python.org/t/native-lib-loader-documentation-and-best-practices-on-using-native-libraries-in-python-wheels/98111 to continue the discussions around best practices for loading shared libraries. If you know anyone who would be interested please link them to that, as well as to the #dynamic-library channel here. Thanks all! Looking forward to continuing that discussion
Hi everyone, as part of the wheelnext effort I'd like to share Sharing libraries between wheels documentation with you. It's inspired by @rgommers's pypackaging-native with a more narrow and specific focus on the mechanics of how the underlying system libraries (namely the dynamic loader) handle loading libraries, as well as discussing the...
Hi everyone, my name is Travis, and I'm a core maintainer for conda, and I'm looking to contribute to this effort however possible
Hi Travis, it would be a pleasure to have you on board.
Is there a specific domain or problem you would like to contribute to? We should have a wide array of endeavors to keep you busy
The team has been hard at work! And we just posted a major update on DPO about Wheel Variants: https://discuss.python.org/t/wheelnext-wheel-variants-an-update-and-a-request-for-feedback/102383
This update is enriched by 4 blog posts published simultaneously:
Astral: A variant-enabled build of uv: https://astral.sh/blog/wheel-variants/
NVIDIA: Streamline CUDA-Accelerated Python Install and Packaging Workflows with Wheel Variants: https://developer.nvidia.com/blog/streamline-cuda-accelerated-python-install-and-packaging-workflows-with-wheel-variants/
PyTorch Foundation: PyTorch Wheel Variants, the Frontier of Python Packaging: https://pytorch.org/blog/pytorch-wheel-variants/
Quansight: Python Wheels: from Tags to Variants: https://labs.quansight.org/blog/python-wheels-from-tags-to-variants/
Eager to hear your feedback!
We, the WheelNext group and contributors, are writing to share an update on our open-source effort to address some long-standing challenges within the Python packaging ecosystem, particularly the distribution of packages with hardware-specific or platform-dependent builds. For years, the community has navigated the complexities of installing sc...
In collaboration with PyTorch, NVIDIA, and Quansight, we've released an experimental build of uv with support for wheel variants.
If you've ever installed an NVIDIA GPU-accelerated Python package, you've likely encountered a familiar dance: navigating to pytorch.org, jax.dev, rapids.ai, or a similar site to find the artifact…
Thanks for the warm welcome. There's nothing in particular I was looking to contribute to, so any available task would be fine. Maybe there's documentation work that could be done?
@coarse mantle are you looking to contribute in your free time or with Anaconda? Janis seemed to have great ambitions for WheelNext, maybe have a chat with him. Otherwise, happy to give you ideas where help would be much appreciated
I had been vaguely following along with the wheel variant work (since CUDA stacks are definitely a challenge where venvstacks is concerned), but I hadn't realised you were that close to running a full scale concept demonstration. Very cool!
Thanks Alyssa. We have all major build backends implemented (except scikit-build and maturin, which are WIP)
We didn't believe we could write a solid and functional PEP that fundamentally changes so many pieces of the ecosystem without actually putting it to the test.
So this is the last step before we can finally finish the PEP draft and submit it officially to the community.
I think we first needed to convince ourselves that it can work, and work well, with some of the most complex scenarios.
Thanks for helping clarify some of the points in the discussion!
I have been following the discussion on discuss and we had some very similar discussions with Henry (cibuildwheel) and NVIDIA folks, including Andy Terrel, at SciPy this year. I also have serious concerns about an arbitrary plugin model that could be installed without user knowledge, as this creates software supply chain security risks. I think the discussion in the post-experiment phase will need to focus on this blind installation of variant selectors that cannot be inspected for provenance. My two cents: for the use cases I represent at Amazon, requiring explicit declaration of the variant selector plugins is the model I see as a feasible solution, even more so for any use case where builds and installs occur in network-isolated build environments.
With all that said, I am very excited about WheelNext and think it does solve a big problem for the community!
Thanks a lot @modern spruce. There are a lot of ways to make that "plugin system" work, and there are mitigations that can be applied (like using a static file as the "analysis of the system" instead of "dynamic analysis", i.e. a program running the analysis).
I'm not sure it's mentioned anywhere, but we plan to enforce that plugins must NOT have any dependencies, period. If you need something, it must be vendored. That greatly reduces the exposure to supply chain attacks.
Some people and environments will probably prefer different things.
If we want to go there, it's possible that tools adopt a special security model for said plugins (like needing to be on a green list, for example). To be on the green list, your package would need to be published through trusted publishers, with attestations, and get reviewed at each release. I don't know, I'm just throwing out ideas.
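A minimal sketch of how an installer might enforce the two ideas above (no dependencies, plus an allowlist), assuming the standard `importlib.metadata` API. The policy function, the allowlist, and the example plugin name are hypothetical; attestation and review checks are out of scope, and names are compared as-is without normalization.

```python
from importlib import metadata
from typing import Iterable, Optional, Sequence


def plugin_allowed(
    name: str,
    requires: Optional[Sequence[str]],
    allowlist: Iterable[str],
) -> bool:
    """Hypothetical policy: the plugin is on the allowlist ("green list")
    and declares no Requires-Dist entries at all, not even optional extras."""
    return name in set(allowlist) and not requires


def installed_plugin_allowed(dist_name: str, allowlist: Iterable[str]) -> bool:
    """Apply the policy to an installed distribution's metadata.

    metadata.requires() returns None when there is no Requires-Dist and
    raises PackageNotFoundError if the distribution is not installed.
    """
    return plugin_allowed(dist_name, metadata.requires(dist_name), allowlist)
```

Counting every Requires-Dist entry, including ones gated behind extras, is the strictest reading of "no dependencies, period"; a looser policy could ignore extras-only requirements.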
If you're interested in helping us to build ideas to reinforce the security model of "plugins". I'd love to have you on board or contribute ideas. Anything you can think of.
I am going to spend some time this weekend taking a look at all the work in the github repos and start to digest where things are at so I can start to reason about additional ideas to help ensure the security model of plugins.
Really suggest reading the 4 blogs - I think they're probably the most curated "content" we have
They each have a slightly different angle.
https://astral.sh/blog/wheel-variants
https://developer.nvidia.com/blog/streamline-cuda-accelerated-python-install-and-packaging-workflows-with-wheel-variants/
https://pytorch.org/blog/pytorch-wheel-variants/
https://labs.quansight.org/blog/python-wheels-from-tags-to-variants
If you want to get more on "rough documentation": https://github.com/wheelnext/pep_xxx_wheel_variants/tree/main/docs
And obviously if you have questions / doubts / anything, you can ask here or open a Github issue on the repo above
On the ensuring-variant-names-are-not-valid-existing-wheel-names front, did the idea of using a field-internal separator with a currently disallowed character come up? Specifically thinking of name@label or platform@label. I know @ already indicates a direct URL reference in a full dependency declaration, and we generally prefer to avoid symbols that require escaping in file names and/or URLs, but neither of those seem like deal breakers. We permitted ! and + in the less common epoch and local version identifier fields after all.
(Posting here rather than DPO because that thread is long enough and I don't think the question is high priority at all)
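To make the separator idea concrete, here is a sketch assuming a simplified legacy filename grammar in which each field uses only ASCII letters, digits, "_" and "." (plus "!" and "+" in the version). Under that assumption, any name containing "@" cannot parse as an existing wheel name, so old tools would skip it rather than misread it. The regex and the platform@label syntax are illustrative only; they are not the normative wheel grammar or the PEP's format.

```python
import re
from typing import Optional

# Simplified wheel 1.0 filename grammar (an approximation, not the normative spec).
_FIELD = r"[A-Za-z0-9_.]+"
LEGACY_WHEEL_RE = re.compile(
    rf"^(?P<name>{_FIELD})-(?P<version>[A-Za-z0-9_.!+]+)"
    rf"(?:-(?P<build>\d[A-Za-z0-9_.]*))?"
    rf"-(?P<python>{_FIELD})-(?P<abi>{_FIELD})-(?P<platform>{_FIELD})\.whl$"
)


def is_legacy_wheel_name(filename: str) -> bool:
    """True if a pre-variant tool using this grammar could parse the filename."""
    return LEGACY_WHEEL_RE.match(filename) is not None


def split_platform_label(platform_field: str) -> tuple[str, Optional[str]]:
    """Split a hypothetical 'platform@label' field into (platform, label)."""
    platform, sep, label = platform_field.partition("@")
    return platform, (label if sep else None)
```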
yes, the discussion about that is happening in https://github.com/wheelnext/pep_xxx_wheel_variants/issues/5
Help me please understand something about wheelnext and UV at the same time
Currently I have 2 requirements files, one for AMD/intel gpus and one for nvidia
I do a bit of guessing: if nvidia-smi is available, I just install the NVIDIA requirements.txt; if not, I install the one for AMD
Of course, with the new PyTorch 2.8.0 and cu129, they dropped support for Pascal GPUs like the 1000 series, from what I am gathering
In this case my approach, which installs torch 2.8.0 cu129 by default, completely bricks my software for users who own a Pascal GPU (or it just falls back to CPU)
Would UV pip install torch automagically install cu126 that still has support for pascal?
This is what I have gathered from here, but I just wanted a 2nd opinion
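The detection step described above might look roughly like this; the requirements file names are hypothetical. Note that it only tells you whether an NVIDIA driver responds, not which GPU generation is present, so it cannot tell a Pascal card from a newer one. That missing information is what the NVIDIA variant plugin is meant to supply.

```python
import shutil
import subprocess


def pick_requirements_file() -> str:
    """Heuristic: NVIDIA requirements if nvidia-smi runs successfully, else AMD/Intel."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return "requirements-amd.txt"
    try:
        # `nvidia-smi -L` lists GPUs; a failure usually means no usable driver or GPU.
        subprocess.run([smi, "-L"], check=True, capture_output=True, timeout=10)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return "requirements-amd.txt"
    return "requirements-nvidia.txt"
```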
Hi, I'm the lead for the Spack project. I'm here to try to keep up with wheelnext, mostly wheel variants. I am not sure how easy that will be. I put a couple of comments on Discourse. As I think people know:
- we have variants in Spack;
- they can have single or multiple values;
- they're known to the solver; and
- you can use them in dependency specifiers, requirements, conflicts, etc.
I can at least talk about pitfalls, combinatorial issues, package evolution issues, etc. that we've encountered with variants the way we do them. Also, we're working on ABI compatibility constraints in Spack. That might also be of interest to folks here.
I would eventually like Spack to rely much more on PyPI metadata for python packages than it does, as I do not think we can scale to support all the packages users want. I am not sure we will ever want to use PyPI binaries, but it would be nice to be able to consume wheel variant info and use it to update spack packages for things that require native compilation.
i'm really curious about how spack solves this, are there docs describing the variants?
See, e.g.:
- https://spack.readthedocs.io/en/latest/spec_syntax.html#variants
for the syntax and what's allowed
Here's an example of a package with variants: https://spack.readthedocs.io/en/latest/package_fundamentals.html#spack-info
variants can exist or not exist depending on other variants (e.g., cuda_arch only exists when cuda is enabled), so spack info there is showing what variants there are for mpich and when they exist.
source for the package where this stuff is defined:
and the base package for cuda with common constraints: https://github.com/spack/spack-packages/blob/develop/repos/spack_repo/builtin/build_systems/cuda.py
I'm not sure what else to point to but can try to explain more... variants are a fundamental part of specs (package descriptors) and you can ask whether an installation of foo satisfies foo+cuda or foo cuda_arch=90. We also have a notion of microarchitectures... you can ask if a spec satisfies, say, foo target=x86_64_v4: which will get you anything with avx512 (using the libc generic arch names there but you could also refer to cascadelake or zen4 specifically).
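A toy model of the "target=x86_64_v4:" query: treat microarchitectures as a DAG where each entry lists its parents, and say a target satisfies an open-ended range "X:" when it is X or descends from X. The table is a small hand-written subset for illustration, not archspec's real data (which lives in microarchitectures.json), and real Spack ranges also support upper bounds, which this sketch ignores.

```python
from functools import lru_cache

# Hand-written toy subset of a microarchitecture DAG: name -> direct parents.
PARENTS: dict[str, tuple[str, ...]] = {
    "x86_64": (),
    "x86_64_v2": ("x86_64",),
    "x86_64_v3": ("x86_64_v2",),
    "x86_64_v4": ("x86_64_v3",),
    "haswell": ("x86_64_v3",),
    "cascadelake": ("x86_64_v4",),
    "zen3": ("x86_64_v3",),
    "zen4": ("zen3", "x86_64_v4"),  # multiple parents are allowed
}


@lru_cache(maxsize=None)
def ancestors(target: str) -> frozenset[str]:
    """All microarchitectures that `target` is compatible with (excluding itself)."""
    result: set[str] = set()
    for parent in PARENTS[target]:
        result.add(parent)
        result |= ancestors(parent)
    return frozenset(result)


def satisfies(target: str, constraint: str) -> bool:
    """Check a target against 'name' (exact) or 'name:' (that name or anything newer)."""
    if constraint.endswith(":"):
        lower = constraint[:-1]
        return target == lower or lower in ancestors(target)
    return target == constraint
```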
Packages can define preferred values for variants, and the solver will prefer those in the absence of other constraints if it's building, or it'll prefer what you have installed by default.
Is that what youβre looking for? I could talk about how we model variants or how this is done in the solver too.
You can see variants used in the torch package as well, for things like conflicts and for dependencies on particular features: https://github.com/spack/spack-packages/blob/develop/repos/spack_repo/builtin/packages/py_torch/package.py#L133
sorry if that was TMI
If the user runs uv pip install on their machine, yes, the correct cuda126 variant will be selected.
Complexity arises if you are creating a Docker container and you are packaging torch on the user's behalf: uv will choose the variant according to the GPU of the builder machine, which can be potentially newer than the user's. You will need to explicitly instruct the nvidia plugin to choose a lower CUDA variant.
@restive vessel Can you chime in?
Yes, absolutely. Wheel variant installs detect SM support and install the right version of PyTorch for your machine. That covers everything from the driver version to the SM (GPU) version.
@terse lintel interesting name btw
@fringe birch I'd love to chat with you! When are you available? I'm so impressed with what you guys did with archspec and we are trying to make the scientific python community build on top of archspec for CPU / SIMD.
I'm sure you came to realize we took heavy design inspiration from you guys. The Spack project was hitting a lot of nails right on the head!
It would be super interesting to have a chat
Maybe I can invite you to our bi-weekly meeting - every 2 weeks - I'm sure many would appreciate your perspectives
I made a pull request that never got merged and I never got mentioned for it
So I am nothing but a Fake
But thanks for the help you bunch
no this is great, thank you! i only have to find the time to read and process those
@restive vessel @prime marsh Hi there. I tried the uv wheel-variant example from the post on my MacBook Air M2. I had expected that it would install numpy along with torch. That is the behavior that I would see with venv / pip install.
Is it expected that it will default to 3.12?
P.S. No rush in responding. I'm not feeling 100% so am logging off for the day.
I'm not sure. A short example would be helpful.
@willow sparrow we didn't ship a macOS build of PyTorch (it wasn't very interesting given that they don't have ARM optimizations for M1 through M4)
I believe pytorch does not have explicit dependencies on numpy.
I believe we have a variant build of numpy somewhere that is optimized for M processors - let me see if I can upload it for you
Actually there are many gaps in archspec around Apple M processors - I hope to be able to work with @fringe birch to address them
I would like to do this in my free time so I can learn more about what's going on outside of Anaconda and not work specifically on conda stuff.
At the moment we are deep into the PEP writing process, and probably will be for a while... And it's not exactly the most exciting bit of the process (though it's just the beginning).
Off the top of my head, I see 3 potential code contributions that would be appreciated and interesting:
- scikit-build-core needs to be integrated with the variant logic => @frank hound is the creator and maintainer, and also part of WheelNext
- maturin needs to be integrated with the variant logic => @bronze bear is the creator of the project and part of WheelNext
- archspec needs improved ARM support (namely Apple Silicon) => maybe @midnight marsh would be interested in giving you a hand on this one (he's also part of WheelNext)
  - We use this in our x86 & ARM variant plugins
  - conda also depends on it for the same reason
Any of them are of more interest to you ?
Maybe the ARM support for archspec would be a good one for me to work on. I own an Apple Silicon laptop.
Can you send me your github handle? I'll add you to the github org and try to describe the gap
@restive vessel , I took a look at archspec yesterday. It looks like adding support for M4 is more-or-less just adding a new entry in this file? https://github.com/archspec/archspec-json/blob/master/cpu/microarchitectures.json
Looks probably correct. Just check with the project leads. I'm honestly not an expert. I tagged you in a Github issue
Will do. Thanks!
I'd like to say, I'm not a big fan of the proposed changes to wheel filename schema to accommodate variant tags. Specifically I don't like the idea of having to parse the components of the name in order to try to decide which kind of component they are. It seems brittle and likely to cause problems in the longer term.
My idea is to have "optional" components of the name (aside from the build number for legacy reasons, but this could presumably be deprecated on a long enough timeline) be explicitly tagged, for example like foo-1.0-py3-abi-platform-variant=.... Alternately, leave optional components blank (with consecutive hyphens).
In the latter case one could even imagine "defaults" of -none-any- being abbreviated to ---; that would fail on older tools, but only for new packages.
(By my reasoning, the handling of build numbers was already not good, but at least it can be isolated...)
Basically what I'm concerned about is a future where someone has another good idea for something to add to the wheel filename metadata, and packages want to use that but not a variant, etc.
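A sketch of the explicitly-tagged idea, assuming components after the platform tag are written as key=value and positional components keep today's meaning (with the legacy build number as an optional third component). The variant=cu126 value is a made-up example; nothing here reflects the PEP's actual filename format.

```python
POSITIONAL = ("name", "version", "python", "abi", "platform")


def parse_tagged_wheel_name(filename: str) -> dict[str, str]:
    """Parse 'name-version[-build]-python-abi-platform[-key=value...].whl'."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Tagged components must all come after the positional ones.
    first_tagged = next((i for i, p in enumerate(parts) if "=" in p), len(parts))
    positional, tagged = parts[:first_tagged], parts[first_tagged:]
    if any("=" not in p for p in tagged):
        raise ValueError("untagged component after a tagged one")
    fields: dict[str, str] = {}
    if len(positional) == 6:  # legacy build number sits in the third slot
        fields["build"] = positional.pop(2)
    if len(positional) != len(POSITIONAL):
        raise ValueError(f"expected 5 or 6 positional components, got {len(positional)}")
    fields.update(zip(POSITIONAL, positional))
    for component in tagged:
        key, _, value = component.partition("=")
        if key in fields:
            raise ValueError(f"duplicate component {key!r}")
        fields[key] = value
    return fields
```

Because every optional component names itself, a future field can be added without the parser having to guess which kind of component it is looking at.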
Why use the file name at all? Yes, current tools do so. Conda stores metadata inside the package, and loads it to build an index. The file name is only for ensuring that files do not clobber each other. PyPI has some capability here with the JSON API.
Relying on only the file name for all metadata causes needless conflict and terrible hacks. And no, you don't have to download the whole package to get at this metadata in conda packages.
Presumably, because the filename is visible before downloading starts?
While it is possible to fetch metadata w/o downloading the whole distribution, you still need to download a whole file for each distribution under consideration. Whether that would be problematic in practice with variants, I'm not sure, but it doesn't scale well. It is quite fast to be able to fetch a package index page and filter out the vast majority of the distributions by simply their wheel names.
We could extend the simple API though to include metadata fields, though then that becomes a balancing act of including enough but not too much metadata in the simple responses.
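To illustrate why filename-only filtering is cheap: given just the filenames from an index page and the set of tags the interpreter supports, most candidates can be discarded without fetching anything. This is a stdlib-only sketch with a hand-picked supported-tag set; real installers use packaging.utils.parse_wheel_filename and packaging.tags.sys_tags rather than hand-parsing.

```python
Tag = tuple[str, str, str]


def wheel_tags(filename: str) -> set[Tag]:
    """Expand the (possibly compressed) tag triple in a wheel filename."""
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # an optional build number makes six components
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python, abi, platform = parts[-3:]
    # Compressed tag sets such as "py2.py3" are dot-separated alternatives.
    return {
        (p, a, pl)
        for p in python.split(".")
        for a in abi.split(".")
        for pl in platform.split(".")
    }


def compatible_wheels(filenames: list[str], supported: set[Tag]) -> list[str]:
    """Keep only wheels with at least one tag the environment supports."""
    return [f for f in filenames if f.endswith(".whl") and wheel_tags(f) & supported]
```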
another consideration: how likely is it that the variant-selection process causes rejection of a candidate?
because if you basically already know that you want some variant of the current wheel...
I think that you are assuming a PEP 503/691 simple index. PEP 503 is very much based around filenames, and I understand your complaints in that light. PEP 691 has more room for flexibility, with potentially new metadata in each file's dictionary.
What I'm talking about is much more like PEP 691. The main index file is "repodata.json": https://docs.conda.io/projects/conda-build/en/stable/concepts/generating-index.html#repodata-json.
Conda filenaming is described here: https://docs.conda.io/projects/conda-build/en/stable/concepts/package-naming-conv.html#package-naming-conventions. You may notice that the filename is a key in the repodata.json, whereas PEP 691 has an array of equivalent dicts. It isn't used for anything in conda. Making it a key just enforces uniqueness.
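The structural difference fits in a few lines: a PEP 691 project response lists files as an array of dicts, and re-keying it by filename (as repodata.json does) turns uniqueness into a checked invariant. The response below is a minimal hand-written example with placeholder URLs and hashes, not real PyPI data.

```python
from typing import Any


def files_by_filename(project: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Re-key a PEP 691 project response's 'files' array by filename."""
    by_name: dict[str, dict[str, Any]] = {}
    for entry in project["files"]:
        filename = entry["filename"]
        if filename in by_name:
            # repodata.json gets this for free because filenames are dict keys.
            raise ValueError(f"duplicate filename in index: {filename!r}")
        by_name[filename] = entry
    return by_name


example_response = {
    "meta": {"api-version": "1.0"},
    "name": "foo",
    "files": [
        {"filename": "foo-1.0-py3-none-any.whl",
         "url": "https://example.invalid/foo-1.0-py3-none-any.whl",
         "hashes": {"sha256": "0" * 64}},
        {"filename": "foo-1.0.tar.gz",
         "url": "https://example.invalid/foo-1.0.tar.gz",
         "hashes": {"sha256": "1" * 64}},
    ],
}
```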
Regarding Richard's point, scale is exceedingly important. That was a millstone for conda for a long time, because the repodata grew unbounded and never fell off. The same is true for PyPI and other simple indexes, barring other software behavior (artifactory?), but PyPI handled it much better by serving the file index for one package at a time. Conda used the entire package collection. This has since been improved (https://prefix.dev/blog/sharded_repodata), so I'd expect conda and PEP 691 indexes to have metadata that's pretty comparable in scale. Conda might have more data, but it'll be a factor of 2 or 3 difference in metadata download, rather than the extraordinarily bloated whole-system repodata.
Filenames have served really well for a long time, but I think they are at their limit. Paul Moore said as much a while back: https://discuss.python.org/t/selecting-variant-wheels-according-to-a-semi-static-specification/53446/98
I drank the conda kool-aid a long time ago, but I'm not trying to say "just use conda" or "conda did it right, do it that way". I'm saying that I think filenames as the bearer of metadata are too limited, and we need something better. Conda has an example of something else, but there's bound to be other good ways, too.
Variants don't really use filenames to convey much information. It's basically just an arbitrary identifier. Filenames probably have to be unique between variants anyway, otherwise static hosting gets more annoying / harder
btw - would it be possible to do something like shipping a pure Python version of a lib and the mypyc/Cython-based speedups for that wheel?
Shipping with and without accelerator libraries doesn't need variants, as the existing tags cover that (ship a none-any wheel in addition to the ones with compiled code)
what i wanted to avoid is having to have n wheels that ship dupes of the source
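The none-any-plus-compiled-wheels approach usually pairs with a module like this: pure-Python code that tries to import a compiled twin and falls back to itself. The module name mylib._speedups and the checksum function are hypothetical. Each platform wheel still ships the .py fallback next to the compiled module, so this pattern by itself does not avoid duplicating the source across wheels.

```python
def _checksum_py(data: bytes) -> int:
    """Pure-Python reference implementation (always available)."""
    return sum(data) % 65521


try:
    # Compiled (e.g. mypyc or Cython) build of the same function; only present
    # in the platform-specific wheels, absent from the py3-none-any wheel.
    from mylib._speedups import checksum  # hypothetical module
    USING_SPEEDUPS = True
except ImportError:
    checksum = _checksum_py
    USING_SPEEDUPS = False
```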
by the way, if people are interested in the talk @restive vessel and I gave at PyTorch Conference about WheelNext, it just got posted to YouTube!
Lightning Talk: Hardware-Aware Python Packages ~ PyTorch and WheelNext Grab the Wheel! - Jonathan Dekhtiar, NVIDIA & Eli Uriegas, Meta
i just tried uv-wheelnext (0.9.9) to install torch on DGX Spark following https://astral.sh/blog/wheel-variants, and got this failure:
TRACE Resolver derivation tree before reduction
term root==0a0.dev0
root==0a0.dev0 depends on intel-variant-provider>=0.0.2, <1.0.0
intel-variant-provider not found in the package registry
TRACE Resolver derivation tree after reduction
term root==0a0.dev0
root==0a0.dev0 depends on intel-variant-provider>=0.0.2, <1.0.0
intel-variant-provider not found in the package registry
TRACE Error trace: Failed to resolve requirements from `variant.providers.requires`
Caused by:
0: No solution found when resolving: `intel-variant-provider>=0.0.2, <1.0.0`
1: Because intel-variant-provider was not found in the package registry and you require intel-variant-provider>=0.0.2,<1.0.0, we can conclude that your requirements are unsatisfiable.
error: Failed to resolve requirements from `variant.providers.requires`
Caused by: No solution found when resolving: `intel-variant-provider>=0.0.2, <1.0.0`
Caused by: Because intel-variant-provider was not found in the package registry and you require intel-variant-provider>=0.0.2,<1.0.0, we can conclude that your requirements are unsatisfiable.
Is this a configuration error (e.g., trying to fetch intel-variant-provider from PyPI), or is the torch package just missing support for aarch64 (Grace CPU)?
@restive vessel @wind cosmos
https://pypi.org/project/intel-variant-provider/ is non-existent
in the debug log, it also says:
TRACE Attempting unauthenticated request for https://download.pytorch.org/whl/variant/nvidia-variant-provider/
DEBUG Traceback (most recent call last):
DEBUG File "<string>", line 6, in <module>
DEBUG import priority as backend
DEBUG ModuleNotFoundError: No module named 'priority'
TRACE Request for https://download.pytorch.org/whl/variant/intel-variant-provider/ failed with 403 Forbidden, checking for credentials
Maybe this is the cause?
Is it a goal of wheelnext to handle arbitrary types of variants besides hardware and driver-based? As a simple example, would it be possible to ship both debug and O3 builds as variants?
The Wheel Variant PEP is Live: https://github.com/python/peps/pull/4740
Now in the copy-editing phase, and soon on DPO (probably January-ish)
Yes, absolutely! Totally doable.
Hi everyone, I'm a developer from Huawei's Ascend Ecosystem team. We are reaching out to discuss adding Huawei Ascend NPU support as a new variant provider in wheelnext.
Do you know how to contribute a provider plugin to the WheelNext community?
Hi @thorn spade! Absolutely.
Welcome in!
- Open a PR to this file adding Huawei: https://github.com/wheelnext/wheelnext/blob/main/docs/who_are_we.md (make sure you have proper approval internally)
- Send me your GitHub handle, and I'll add you to the WheelNext GitHub Organization
- Create a repo inside the WheelNext org for your plugin; you can use the following as examples: https://github.com/wheelnext/pep_817_wheel_variants/tree/main/plugins
- Do you have a package in mind that you'd like to test it with?
@restive vessel Thanks for your help!
This PR adds Huawei to who we are: https://github.com/wheelnext/wheelnext/pull/115
We have two members in charge of this.
GitHub handles: zhihangdeng and wjunLu.
We are pleased to join the WheelNext community!
We only made an experimental package for testing (simple, but it covers all the features we desired)
Amazing! I'll review the PR tomorrow!
Welcome aboard
Hi @restive vessel, we have members working on vLLM Ascend, a hardware plugin for running vLLM seamlessly on the Ascend NPU. We made some wheelnext variants of this package. In order to run some tests, we want to find an index to host these variants. Do you know what we should do?
Amazing work on the published PEP 817. Two errant details caught my eye so far (and hitting the second one prompted me to post them before I got distracted):
- the up front statement of affected specifications is incomplete (only binary archives are listed, and those certainly see the biggest changes, but source trees, source archives, environment markers, pyproject.toml, pylock.toml, and the simple index API are all also affected)
- the statement on how to handle variant environment markers is not correct, specifically this part:
If a non-variant wheel was selected or built, all variant markers evaluate to False.
That implies even variant_label == "" would be false, contradicting the preceding example. It needs to say the label marker is the empty string and the set markers are all empty (so "in" checks will be false and "not in" checks will be true)
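A sketch of the corrected rule, modelling the marker environment as a dict. It assumes one string marker (variant_label) and one set-valued marker, here called variant_namespaces, derived from properties written as "namespace :: feature :: value". Treat the set marker's name and the property format as assumptions to check against the PEP text.

```python
from typing import Optional


def variant_marker_env(label: Optional[str], properties: list[str]) -> dict[str, object]:
    """Build variant marker values; label=None means a non-variant wheel."""
    if label is None:
        # Non-variant wheel: the label is the empty string and the sets are empty,
        # rather than every variant marker evaluating to False.
        return {"variant_label": "", "variant_namespaces": frozenset()}
    namespaces = frozenset(prop.split("::")[0].strip() for prop in properties)
    return {"variant_label": label, "variant_namespaces": namespaces}
```

With this environment, for a non-variant wheel variant_label == "" is true, "nvidia" in variant_namespaces is false, and "nvidia" not in variant_namespaces is true, which is the behaviour the corrected wording describes.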
And after finishing the whole thing, my only further comment is that we may want to be stricter on avoiding the use of the new variant markers in non-variant wheels, to the point of having PyPI disallow them outright (a null variant would still be free to include them).
Congratulations to all involved in putting that PEP together, it's an impressive piece of work.
Thanks Alyssa. You did a lot of the trailblazing on this, and your comment accordingly means a lot.
This case is actually even trickier than I thought, as I hadn't accounted for the index server metadata APIs (which are generally build independent). Expressing variant specific dependencies without confusing old clients is going to be tricky unless we do something like allowing variant-{label} as implicit extra names (I prefer the syntax in the PEP as the long term answer, so anything along those lines would just be a transitional mechanism).
I don't think you have to care about old clients TBH. As long as the mere existence of a feature (rather than a project using it) doesn't cause problems, then it's up to each individual project when they are willing to break compatibility with old clients by using a new feature.
That's how basically every new, non-optional, feature works in software. If your library wants to use a new Python 3.N feature, you can't use it until you're happy breaking all your Python 3.N-1 users.
or you can make wheels that are version-specific despite having only python code
for visibility, we're adding two updates to the PEP https://github.com/wheelnext/peps/pull/45:
- Make value lists in variants sorted, since the values are actually sets and this makes it easy to use == without having to convert everything back into sets. This matches the current behavior of variantlib.
- Change the merging rule to say "result is the same irrespective of the order in which the wheels are processed". I think it's essentially a better way of wording the same goal (lack of ambiguity) than my original thought.
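One way to read both changes, as a sketch: normalize each wheel's variants mapping so every value list is sorted and de-duplicated, then merge by collapsing identical entries and rejecting conflicting ones. Either outcome (the merged mapping or the error) is the same whatever order the wheels are visited in. The nested label -> namespace -> feature -> values shape and all names below are assumptions for illustration, not variantlib's API.

```python
Variants = dict[str, dict[str, dict[str, list[str]]]]


def normalize(variants: Variants) -> Variants:
    """Sort and de-duplicate every value list so plain == comparisons work."""
    return {
        label: {
            namespace: {feature: sorted(set(values)) for feature, values in features.items()}
            for namespace, features in namespaces.items()
        }
        for label, namespaces in variants.items()
    }


def merge(*per_wheel: Variants) -> Variants:
    """Merge variant maps from several wheels; conflicting definitions are an error."""
    merged: Variants = {}
    for variants in per_wheel:
        for label, definition in normalize(variants).items():
            if label in merged and merged[label] != definition:
                raise ValueError(f"conflicting definitions for variant {label!r}")
            merged[label] = definition
    return merged
```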
I've got a PR with variant support, gated by our experimental flag, in https://github.com/scikit-build/scikit-build-core/pull/1284 - it would be good if someone could try it out in the larger stack.
I will give it a try. Thanks!
Wheel 2.0 discussion is on topic here, right? Or should we make another channel so this one can focus on wheel variants?
Yes, absolutely
@stone lodge I updated the 'wheel greater compression' repository to support an inner .tar.zst. This is an LLM-written streaming extract in Rust. With this formulation, a client might unpack the inner tarball while streaming the download; store the rest of the ZIP to a sparse file or use other ZIP tricks so that the inner .tar is not saved to disk; then deal with the much smaller metadata portion. If the ZIP header does not look like .data.tar.zst, then it should do whatever it was doing before. "Normal" users can still figure it out with the unzip command. My set of 63 big/popular wheels, as determined from PyPI BigQuery data (shown in the repository), shrinks from 646M to 453M.
Does this work better than just allowing the wheel to use .zst directly? Regardless, that's good news (and slightly better than the results I was seeing with very rough tests using LZMA)
The point of putting an archive inside a ZIP is that the compression algorithm works across all the files in the inner archive, instead of each file individually. The advantage of ZIP's per-file compression is that you can read the files individually. Zstd is great because it compresses quickly and decompresses extremely quickly, so the total compress + send over network + decompress time is worthwhile. (Compare to a compression algorithm that is slower than downloading the uncompressed version of the file... we want to save time and space.) So if the metadata goes in the ZIP portion but the files go in the inner archive, we get good compression, quick metadata access, and follow most of the rules of wheel 1.
When unpacking, the archive layer should feed the installer an iterable over (name, file object), so that the part of the installer that decides where files are placed on disk is ignorant of whether an inner archive is used or not.
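A sketch of that archive layer, assuming the inner payload is a top-level ZIP member whose name ends in .data.tar (optionally compressed); those suffixes are hypothetical, not taken from any spec. tarfile's stream mode reads the inner tarball sequentially straight out of the ZIP member, and every other member falls through to the ordinary per-file path. Whether tarfile can open zstd-compressed tarballs depends on the Python version (the stdlib gained zstd support in 3.14); older interpreters would need a third-party decompressor.

```python
import tarfile
import zipfile
from collections.abc import Iterator
from typing import IO, Union

# Hypothetical names for an inner payload archive at the top level of the wheel.
INNER_SUFFIXES = (".data.tar", ".data.tar.gz", ".data.tar.zst")


def iter_wheel_files(wheel: Union[str, IO[bytes]]) -> Iterator[tuple[str, IO[bytes]]]:
    """Yield (name, file object) for every file, stored directly or in an inner tar.

    Each file object is only valid until the next item is requested.
    """
    with zipfile.ZipFile(wheel) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if "/" not in info.filename and info.filename.endswith(INNER_SUFFIXES):
                # Stream mode ("r|*") reads sequentially and detects the compression.
                with zf.open(info) as raw, tarfile.open(fileobj=raw, mode="r|*") as tar:
                    for member in tar:
                        if member.isfile():
                            extracted = tar.extractfile(member)
                            if extracted is not None:
                                yield member.name, extracted
            else:
                with zf.open(info) as member_file:
                    yield info.filename, member_file
```

The placement logic can then consume iter_wheel_files(...) the same way regardless of which layout a given wheel uses.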