#packaging

silk pagoda
#

tap tap Is this thing on?

#

Yes, it is.

dawn parcel
#

could you turn it up a bit?

inner yarrow
#

<enter> how do you turn this on <enter>

remote minnow
#

hello

<small>oops too loud</small>

hushed dome
silk pagoda
#

Closed -- thanks @hushed dome for the help there!

teal pivot
#

I also locked the issue

opal notch
#

interesting case we've just hit in uv: packaging sorts specifiers before emitting them, so when reading them from pyproject.toml directly they may be different than when they come from a wheel

#

e.g. if i say dependencies = ["anyio>=4,<5"], the wheel METADATA will say Requires-Dist: anyio<5,>=4

#

the core metadata spec says "If a field is not marked as Dynamic, then the value of the field in any wheel built from the sdist MUST match the value in the sdist.", but it never specifies what "match" means, is it semantic or string equivalence?

wide herald
#

I'd say it's semantic

opal notch
#

this matters for us since we serialize specifier to the lockfile

inner yarrow
#

If you serialise, sorting (and normalisation in general) is not a bad behaviour IMO? It makes the output stable.
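
To make the "semantic vs. string" distinction concrete, here is a minimal stdlib-only sketch. It is an illustration, not how packaging or uv implement it: `normalize_specifiers` and `same_specifiers` are hypothetical helpers, and they only reorder clauses. They make no attempt at PEP 440 version normalization, so `>=4` and `>=4.0` would still compare as different.

```python
def normalize_specifiers(spec: str) -> str:
    """Return the comma-separated clauses of ``spec`` in sorted order."""
    clauses = sorted(part.strip() for part in spec.split(",") if part.strip())
    return ",".join(clauses)


def same_specifiers(left: str, right: str) -> bool:
    """Compare two specifier strings while ignoring clause order and spacing."""
    return normalize_specifiers(left) == normalize_specifiers(right)


from_pyproject = ">=4,<5"  # as written in pyproject.toml
from_wheel = "<5,>=4"      # as emitted into the wheel METADATA

print(from_pyproject == from_wheel)                 # False: the strings differ
print(same_specifiers(from_pyproject, from_wheel))  # True: same set of clauses
```

Serialising the normalized form into a lockfile would give the same string regardless of which source the specifier was read from.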

teal pivot
#

The intent of "match" in the PEP was semantics

teal pivot
#

But if you want an official answer I would open a topic on discuss.python.org so we can update the spec to be more specific

opal notch
dense yacht
#

The py code is all C - it's a bit hard to follow 🙂

inner yarrow
#

It’s not easy to follow even by C (and specifically Win32 C) standards. Mostly because it’s essentially trying to cram two executables into one. (The thing also serves as the python.exe proxy you get when you create a venv.)

dense yacht
slow pagoda
#

It is invalid metadata. The name field is strictly required. I really dislike that, there are tons of reasons you might not want that, but it's required by the PEP.

#

pypa/build doesn't need this or validate this.

amber gust
#

Out of curiosity, what would be those reasons?

hushed dome
#

I don't know about a "ton" of reasons, but being able to omit name to mean "this file is here for development environment dependency management, not to define and publish a Python package" would actually be clearer than the status quo (where you sometimes have to make up a name to make some of the tooling happy). (I guess that becomes "tons of reasons" if you start trying to enumerate everything Python gets used for that doesn't involve publishing your own distribution packages)

#

After reading the linked issue, apparently "to let the backend define the name dynamically" is another use case.

teal pivot
dense yacht
#

What's the deal with .pypirc? Was it just made a standard by the PyPA or...?

soft mesa
teal pivot
soft mesa
#

yes

#

otherwise consumers have to maintain state themselves (probably also with a subclass) when working with editable installations of dependencies

slow pagoda
teal pivot
keen abyss
#

Hello guys do you have example how to easily package application, from my laptop, with specific platforms wheels?
Basically for my own needs I need to package golang inside a pypi package.

I was doing stuff manually with a setup.py, but found out it is now deprecated.
But by looking at pypa/build, I didn't find any way to specify the platform tag before building my wheel 😦

#

Thanks !!

keen abyss
teal pivot
amber gust
#

It probably doesn't help the confusion that the URL for that says setup-py-deprecated. 😛

dawn parcel
#

easy fix, just officially deprecate it firLick

ruby narwhal
slow pagoda
# keen abyss Hello guys do you have example how to easily package application, from my laptop...

setup.py isn't deprecated, though there are other interesting ways to do things. There's not a native go-based builder yet, though if I was starting from scratch I'd probably try a hatchling plugin or implementing the backend entirely locally over setuptools, which wasn't really made to be extended this way.

Then to build it, you should use cibuildwheel. That will ensure you are in the proper environment and run things like wheel repair tools.

summer crater
#

Hi everyone,

I'm currently working on packaging my Python project using PDM, and I'm facing some confusion regarding the best way to structure it. Here’s a quick overview of the situation:

My project includes three main sub-packages (monte_carlo, retrieve_vol, and pricer), which I intend to distribute as part of the Python package. However, I also have other directories (ib-gateway-docker and postgre) that contain Docker-related configurations for PostgreSQL and IB Gateway. These are used for infrastructure purposes and should not be included in the final Python package.

To keep things clean, I’m thinking of moving only the code that should be distributed into a src/ directory, while keeping the Docker-related files and infrastructure configurations outside of this folder. My current plan is to:

Move the code for monte_carlo, retrieve_vol, and pricer to src/tws_project/.
Keep ib-gateway-docker/ and postgre/ outside of src/ and exclude them from the package using a .pdmignore file.
Ensure that PDM builds the package correctly from the src directory and excludes everything not meant for distribution.
Here’s what I’m unsure about:

Is moving the code into a src/ directory the best practice for this kind of project?
Is there a better way to handle the separation between code and infrastructure (Docker files)?
How should I properly configure PDM to ensure everything is cleanly packaged and that nothing unnecessary (like Docker files) is included in the distribution ?

summer crater
#

Yes I think I would need help as I really don't know how to organize my subpackages

inner yarrow
#

Moving source into src is almost always a best practice regardless of your other setup

summer crater
#

I deleted the setup.py which was a module from another try with setuptools

inner yarrow
summer crater
#

And @inner yarrow do you know if I can host my own Artifactory on my Raspberry Pi?

inner yarrow
#

I know you can but I am not familiar with how

summer crater
#
[tool.pdm.dev-dependencies]
editable = [
    { path = "src/tws_project/monte_carlo", develop = true },
    { path = "src/tws_project/pricer", develop = true },
    { path = "src/tws_project/retrieve_vol", develop = true }
]

This does not seem to work

finite temple
#

Is there any ordering recommended for compressed platform tag sets?

#

I realize to the spec they are unordered but are there any other practical reasons to order them? I'm generating wheels with both new and old manylinux tag ordering, and I'm curious if there is a common choice for ordering those

silk pagoda
#

I've definitely seen projects that generate things in both orders.

finite temple
solar wind
slow pagoda
#

I'd like it for PEP 639 too. 😉

teal pivot
silk pagoda
#

I'm visiting family right now, but I should be able to cut a release some time in the coming days.

brazen gust
#

I'm not sure if this is a packaging or pip question. What I would like is if someone could point me in the right direction. I've built Python for an unusual arch, arm32 Windows, and there is no version of MSVC available for this platform. I would like to be able to configure pip to build packages with llvm-mingw instead of MSVC. I believe this is something to do with setuptools/distutils, but I'm not sure. I believe this was possible with older versions of Python (<=3.4)? Anyone got any ideas how I would be able to configure pip so that when a package is built it would use a compiler other than MSVC on Windows?

wide herald
#

that would be something in setuptools or any other build backend that the package uses

silk pagoda
brazen gust
#

thankyou both for the timely response

vast sky
#

Currently it only defaults to mingw if you use a mingw built cpython

#

Afaik llvm-mingw contains a mingw python

dense yacht
#

I'd be happy to help out, if there's a need for more maintainers.

teal pivot
mental locust
ancient lake
#

Hey friends, I want to check whether a wheel is architecture-specific or not - given a filename of the wheel, and for that I am using parse_wheel_filename . I am a bit surprised that it returns tags as a frozenset which indicates there could be more than one.

Would

architecturespecific="any" != set(tags).pop().platform

be sufficient, or is it possible that more than one tag is returned, some of them architecture-specific and some not? Which would mean I need to iterate over all of them.

I had a look at PEP 427/491, but after reading them, I am still not sure how there could be more than one tag entity.

Thank you!
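
For context on why a set comes back: wheel filenames may use compressed tag sets, where each of the three tag fields can hold several values joined by `.` (for example `py2.py3-none-any`), and the filename then stands for every combination. The sketch below is a simplified stdlib-only illustration of that expansion, not the implementation behind parse_wheel_filename. `expand_wheel_tags` and `is_platform_specific` are hypothetical helper names, and the filenames are made up.

```python
from itertools import product


def expand_wheel_tags(filename: str) -> set[tuple[str, str, str]]:
    """Expand a wheel filename into (python, abi, platform) tag triples."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag after version
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return set(
        product(python_tag.split("."), abi_tag.split("."), platform_tag.split("."))
    )


def is_platform_specific(filename: str) -> bool:
    """True if any expanded tag has a platform other than 'any'."""
    return any(platform != "any" for _, _, platform in expand_wheel_tags(filename))


print(is_platform_specific("example-1.0-py2.py3-none-any.whl"))
print(is_platform_specific(
    "example-1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
))
```

Iterating over every tag with `any(...)` avoids depending on which element `set.pop()` happens to return.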

north karma
ancient lake
teal pivot
finite temple
#

The specification for the sdist format says that the top-level directory needs to follow {name}-{version}. Additionally, it specifies that the name must match the metadata. But since the separator is a dash shouldn't the name be lowline normalized and correspond to, rather than match, the metadata entry?

amber gust
#

I imagine it's "match" in a loose sense. As in, they must normalize to the same thing.

wide herald
#

I think there was a PEP that said something about normalized sdist names

finite temple
#

It might be good to specifically call out it should be normalized in the same way as binary distribution names

sinful raft
finite temple
#

Hm, that's unfortunate

sinful raft
finite temple
#

Maybe with wheel 2.0

slow pagoda
#

As long as the metadata name is not normalized, so users can customize the display on PyPI, etc. I'd be happy to follow recommendations if there was a clear set of recommendations, and I think other backend authors would too.

#

name in wheel is already normalized, and most backends normalize it for sdist too (scikit-build-core does if 0.5+ is used / set)

#

I would be fine to update scikit-build-core to normalize it in sdist top-level and dist-info names if that's a good idea. Version numbers are a bit more disruptive - 1.0.0 turning into 1 is what you are referring to, I'm assuming. I'd probably want it in a few more backends before applying it to scikit-build-core, or an official recommendation somewhere.

finite temple
#

No actually the scenario that occurred is I had a package name with dashes but the delimiter is a dash

slow pagoda
#

That's why I said "name in wheel is already normalized", you can't use dashes there so all the build backends normalize it to _. (Not sure about . and caps)

sinful raft
#

hm, names in wheels still aren't normalized on case.

#

but it's better than sdists 🙃

opal notch
#

We had a lot of trouble with all the different name variants across wheels and source dists

ocean grove
#

wheel names are normalized where it counts, which is the separators. You can split them on - and rely on the part order and number being what you expect.
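
A small sketch of the normalization being discussed, as I understand the current name normalization and binary distribution format specs: runs of `-`, `_` and `.` collapse to a single `-` and the result is lowercased, then `-` becomes `_` for use in a filename. The helper names are hypothetical, and as noted above, files in the wild do not always follow the lowercase part.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name: collapse runs of -, _ and . and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_component(name: str) -> str:
    """Escape a normalized name for use as the first field of a wheel filename."""
    return normalize_name(name).replace("-", "_")


def split_wheel_filename(filename: str) -> list[str]:
    """Split a wheel filename on '-', relying on the name field having no dashes."""
    return filename.removesuffix(".whl").split("-")


print(normalize_name("My.Package_Name"))        # my-package-name
print(filename_component("scikit-build-core"))  # scikit_build_core
print(split_wheel_filename("scikit_build_core-0.5.0-py3-none-any.whl"))
```

Two names "match" in the loose sense when `normalize_name` returns the same string for both.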

opal notch
#

Is it possible to assume a 1-to-1 mapping from sys_platform to platform_system?

#

Basically, this table with platform_system, sys_platform (reference):
Android, android (PEP 738)
iOS, ios (PEP 730)
Windows, win32
Darwin, darwin
Linux, linux
Java, java
AIX, aix
FreeBSD, freebsd
NetBSD, netbsd
OpenBSD, openbsd
SunOS, sunos

glad acorn
#

regarding sys.platform and platform.system, here's a recent discussion on the distinction:
https://discuss.python.org/t/clarify-usage-of-platform-system/70900/

hoary pecan
opal notch
#

Would you expect this to be "fixed" with the adoption of PEP 730?

hoary pecan
#

no, I think their semantics are different, sys.platform is targeted mainly for compilation support, while platform.system is more granular

#

I gave the example of iPad, but looking at the docs, some Unix OSes will have the major version number appended in sys.platform

#

in fact, that should be the case for FreeBSD, NetBSD, OpenBSD, and SunOS in your example

opal notch
#

interesting, thanks!

odd steppe
#

But maybe those sys.platform checks are just wrong

#

(My guess is those checks are just wrong, and you're right)

hoary pecan
#

On Unix systems not listed in the table, the value is the lowercased OS name as returned by uname -s, with the first part of the version as returned by uname -r appended, e.g. 'sunos5' or 'freebsd8', at the time when Python was built. Unless you want to test for a specific system version, it is therefore recommended to use the following idiom:

if sys.platform.startswith('freebsd'):
    # FreeBSD-specific code here...
#

I'm trying to install FreeBSD in a VM to confirm though

hoary pecan
#
ffy00@freebsd ~> python
Python 3.11.10 (main, Oct 31 2024, 01:10:40) [Clang 18.1.5 (https://github.com/llvm/llvm-project.git llvmorg-18.1.5-0-g617a15 on freebsd14
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.platform
'freebsd14'
>>> import platform
>>> platform.system()
'FreeBSD'
#

so, yeah, those checks seem to be wrong
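
A rough sketch of what a prefix-based check could look like, following the `startswith` idiom from the docs quoted above. `platform_family` is a hypothetical helper, and the prefix list only covers the systems named in this thread, so treat it as an assumption rather than a complete table.

```python
import sys

# Systems mentioned above whose sys.platform may carry a version suffix.
VERSIONED_PREFIXES = ("freebsd", "netbsd", "openbsd", "sunos")


def platform_family(sys_platform: str = sys.platform) -> str:
    """Strip a trailing version from sys.platform values like 'freebsd14'."""
    for prefix in VERSIONED_PREFIXES:
        if sys_platform.startswith(prefix):
            return prefix
    return sys_platform


print(platform_family("freebsd14"))  # freebsd
print(platform_family("linux"))      # linux
```

An exact comparison such as `sys_platform == 'freebsd'` would not match the `'freebsd14'` value shown in the session above, which is why the prefix check is used here.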

hoary pecan
#

just to confirm, this is still the case on main

#
ffy00@freebsd ~/g/cpython (main)> ./python
Python 3.14.0a3+ experimental free-threading build (heads/main:be8ae086874, Dec 17 2024, 19:27:56) [Clang 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2 on freebsd14
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.platform
'freebsd14'
odd steppe
#

Nice, thank you!

opal notch
#

Is there any place to look up the PEP 508 values of different platforms, e.g. for BSDs and non-{amd64,arm64}-arches?

#

I'm mostly looking for fields to pick the right compiler target triple (not things like platform_version)

#

Or put the other way round: Given a python interpreter, how do i find out my target triple?

hushed dome
#

Oh, joy, I've never tried to look up the official definition of gcc target triples before. And here I thought wheel tag parts were ill-defined... (well, they are, it could just be worse).

#

And with that, I'm (belatedly) going to bed. Good luck with the search for compiler clarity!

#

Actually, it occurred to me that for non-Windows builds, the SOABI config var might be what you're looking for:

$ python3 -c "import sysconfig; print(sysconfig.get_config_var('SOABI'))"
cpython-312-x86_64-linux-gnu
#

Since that includes the compiler target triple that was used to build the runtime itself (at least for CPython).

hoary pecan
#

though you might want to construct it manually depending on which compilers you have available

#

since there's some ambiguity, like the vendor field being missing, set to unknown, pc, or none, all of which mean the same thing

#

and you might want to fallback to a toolchain with a vendor (like buildroot) if others aren't available
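
As a starting point, here is a sketch that pulls the triple-looking suffix out of an SOABI value. It assumes the `implementation-version-triple` layout seen in the example output above, which is not guaranteed on every platform or implementation. `triple_from_soabi` is a hypothetical helper and returns None when the value does not have that shape.

```python
import sysconfig
from typing import Optional


def triple_from_soabi(soabi: Optional[str]) -> Optional[str]:
    """Return the part of SOABI after 'impl-version-', if it looks like a triple."""
    if not soabi:
        return None
    parts = soabi.split("-", 2)
    if len(parts) < 3:
        return None
    remainder = parts[2]
    # A bare OS name with no dashes is not a usable triple on its own.
    if "-" not in remainder:
        return None
    return remainder


print(triple_from_soabi("cpython-312-x86_64-linux-gnu"))  # x86_64-linux-gnu
print(triple_from_soabi(sysconfig.get_config_var("SOABI")))
```

Because of the vendor-field ambiguity mentioned above, the result may still need mapping onto whatever spelling the available toolchain expects.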

opal notch
#

Thank you!

random depot
#

is it legal for a package's metadata to contain:
```text
Requires-Dist: pip @ https://github.com/pypa/pip/archive/1.3.1.zip
```

<https://packaging.python.org/en/latest/specifications/core-metadata/#requires-dist-multiple-use> would seem to imply that URL references are not allowed, except that it then links to [PEP 508](https://peps.python.org/pep-0508/) "for full details of the allowed format", and PEP 508 allows URL references
teal pivot
random depot
#

Hm. I'm surprised that's legal. It seems odd for a Requires-Dist to specify a particular URL when package frontends don't record the URL from which a package was installed. It makes sense to me that I should be able to specify where to install a package from when I install it, but less sense to me that one of my dependencies should be allowed to say where I install its dependencies from...

random depot
hushed dome
inner yarrow
random depot
random depot
#

oh - well, I stand corrected, then. I've never noticed that file before

hushed dome
#

sigh I was about to say that file was relatively new. I checked, and Pradyun accepted the PEP nearly 5 years ago, and the pip implementation landed not long after 😛

#

What even is time?

inner yarrow
#

I’d admit 2020 does not sound a long time ago

glad acorn
amber gust
#

I thought it was still September of 1993. 😛

wide herald
#

people born in 2007 are adult this year

wicked hare
dense yacht
#

Just thought this was interesting

#

Maybe worth calling out that any shouldn't be used there

teal pivot
# dense yacht

I think it depends on your definition of "fitting". What does a wheel have that is specific to CPython 3.13 and its non-portable ABI but is OS and CPU agnostic? I.e. this is on purpose because when I wrote packaging.tags there were hardly any tag combinations like that and no one came up with a reason for that to exist.

#

I think there's an issue in the 'packaging' repo about it and I believe my final answer was, "someone can write a PR if they want"

finite temple
#

I think technically this could be a valid requirement to express but practically would never happen. If someone used FFI to access the non-portable ABI (via e.g. ctypes) it would be tied to the CPython ABI, but be platform independent

teal pivot
teal pivot
finite temple
#

Ah, they take the approach of just bundling all the libraries in one wheel. Not really "any" but it is somewhat of an approximation, so I understand where they are coming from.

teal pivot
sinful eagle
#

is my read of PEP 425 correct that the "python" tag and the "ABI" tag in a wheel filename are essentially free-form? Currently PyPI and packaging do no validation of these tags and users can put whatever they want in the filename for them.

#

(aside from general rules about what characters can be where in a wheel filename, of course)

finite temple
#

@sinful eagle The limitations from my reading of the PEP are:

  1. There are set prefixes for the major implementations, other implementation should use sys.implementation.name, which is freeform. I think you can do stricter checking if the tag is one of the set prefixes
  2. The PEP implies that there needs to be a version component in the Python tag (e.g. you can't just have pp-pp-any wheels), but you can have just a Python version (e.g. 2, 3). The implication from the PEP is that the version should be a sequence of [0-9_]
  3. the implementation prefix of the tags must match between the Python and ABI tags (e.g. you can't have py3-cp310-manylinux...). I wouldn't be surprised if this is violated somewhere though
#

Oh and from the sys module docs (emphasis mine):

sys.implementation

An object containing information about the implementation of the currently running Python interpreter. The following attributes are required to exist in all Python implementations.

name is the implementation’s identifier, e.g. 'cpython'. The actual string is defined by the Python implementation, but it is **guaranteed to be lower case**

So it looks like you can assume that the implementation prefix is [a-z]
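
Points 1 and 2 above could be turned into a loose shape check along these lines. This is only a sketch of that reading of the PEP, not something PyPI or packaging enforce, and `looks_like_python_tag` is a hypothetical name. It accepts a lowercase implementation prefix followed by a version made of digits and underscores.

```python
import re

# Lowercase implementation prefix, then a version of digits and underscores.
PYTHON_TAG = re.compile(r"(?P<impl>[a-z]+)(?P<version>[0-9_]+)")


def looks_like_python_tag(tag: str) -> bool:
    """Check that a Python tag has an implementation prefix and a version part."""
    return PYTHON_TAG.fullmatch(tag) is not None


for candidate in ("cp312", "py3", "pp", "blah"):
    print(candidate, looks_like_python_tag(candidate))
```

A filename like foo-1.0.0-blah-blah-any.whl would fail this check because `blah` has no version component.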

sinful eagle
#

yeah, there's quite a bit implied in the PEP but no strict guarantees 🙂

#

fwiw I don't think there's any harm in allowing something like foo-1.0.0-blah-blah-any.whl to get uploaded, and it seems pretty unlikely something invalid would be generated by build tooling unless users are DIY

finite temple
sinful eagle
#

but I think any sane installer would just ignore the presence of that file on an index, right? or are you saying if the user explicitly tries to install that file

teal pivot
#

Basically, as @finite temple said, the only standard on things is the short names are reserved. Otherwise it's currently convention on how they are formatted.

hazy vector
#

I see not much has changed in packaging since the previous version, but if possible I'd like there to be a release soon to enable Android support in pip 25.1.

#

Let me know if there's anything I can do to help progress this.

teal pivot
#

We just need time. Feel free to open an issue asking for a release.

hazy vector
#

Thanks, will do

#

but the PR needs to be merged first

hazy vector
#

@teal pivot You reviewed the similar PR for iOS recently; are you able to take a look at this one?

teal pivot
#

Don't know; all depends on time. I'm ramping up on a new project at work, kid just turned 1 and so is hitting milestones which makes parts of life a little challenging, etc. I will say it is in the queue, though.

hazy vector
#

Understood

teal pivot
hazy vector
teal pivot
hazy vector
#

Thanks!

iron light
#

...Didn't the wheels for packaging previously include "stub" executables for Windows installers? Am I misremembering something? (I know they're in installer, but I thought they were in some other PyPA projects too.)

mental locust
#

distlib does, which pip vendors, I don't think packaging does

teal pivot
iron light
#

Ah, I think it must have been distlib I was thinking of. Thanks

tranquil ocean
#

My requirement is basically to host my own PyPI server and it should have search functionality too. So I thought of going ahead with Warehouse

mental locust
tranquil ocean
#

my bad, apologies, newly joined. Thanks for pointing it out. Much appreciated. Have a good day ahead.

valid knoll
#

Hi there! We're trying to pip-install a bunch of packages on a Windows Server 2022 and one line in our requirements.txt file reads

pyobjc-framework-accessibility==10.2 ; platform_release >= '20.0' and sys_platform == 'darwin' # via pyobjc

The platform_release (actually, platform.release()) happens to be 2022Server, which raises an exception (https://github.com/pypa/pip/blob/b73fc049e47e7c422015d08323d413adceb8435e/src/pip/_vendor/packaging/version.py#L202). This strikes us as a bug, but I can't find anyone else having reported the problem, which has us stumped. Has anyone seen this before?

#

To clarify, the actual platform release string is not what the packaging regex expects, so it raises an exception. I don't know if "2022Server" is the correct string to describe the platform.

#

The requirements file was generated by uv pip compile --universal if it matters.

wicked hare
solar heron
wicked hare
#

Do you recall, if the expressions in markers short-circuit?

valid knoll
#

Ha! It's the same pyobjc thing 🙂 Thanks 🙂

wicked hare
#

But I donno if it short-circuits

solar heron
#

It's been so long since I investigated this issue. All I remember is that it was dependent on the operand ordering.

#

That's not really related to short-circuiting though.

wicked hare
#

The spec doesn't seem to define specific evaluation order so I guess it wouldn't be wise to depend on it anyway

mental locust
valid knoll
#

We worked around the issue by using uv to install the requirements 😬

solar heron
#

Honestly a fair workaround 😅

mental locust
#

I would be interested in becoming a packaging maintainer, as a pip maintainer who focuses on resolution issues I am significantly invested in PEP 440 compliance of packaging (well the living version of the spec anyway), and would like to be able to review and merge PRs that push packaging towards greater compliance.

I don't know the state of active packaging maintainers, so am not sure who is best to contact.

sage marsh
#

I think @teal pivot and @silk pagoda do most of the work now?

silk pagoda
#

Yea, that

teal pivot
iron light
#

I noticed that markers.py explicitly exports EvaluateContext, the Literal type used for the context passed to Marker.evaluate. In the documentation, there's no mention of EvaluateContext, and the description of evaluate spells out in ordinary prose that context should just be one of whatever constant strings.

... I can sort of understand why people would want to apply the type annotation to their own code. But do annotations actually work that way, such that you can just import a type name? Does the type-checker somehow simulate an import at type-check time to get the name?

iron light
#

(Also, Environment is not exported this way...)

teal pivot
#

And not documenting it is an oversight

#

Please open an issue and/or submit a PR

iron light
#

I'm working on my own fork but I can definitely do that too

#

and I did eventually find an example of such an import in the code :)
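
On the "can you just import a type name" question: a Literal alias is an ordinary module-level object, so it can be imported like any other name, and a type checker resolves the import statically by reading the source or stubs rather than running it. A minimal sketch follows. `Context` and `evaluate` are hypothetical stand-ins, and the literal values are assumptions chosen for illustration, not a statement of what EvaluateContext contains.

```python
from typing import Literal, get_args

# Hypothetical stand-in for a Literal alias exported by a library module.
Context = Literal["metadata", "lock_file", "requirement"]


def evaluate(context: Context = "metadata") -> str:
    """Accept only the constant strings named in the alias."""
    if context not in get_args(Context):
        raise ValueError(f"unknown context: {context!r}")
    return f"evaluating in {context!r} context"


print(evaluate("lock_file"))
print(get_args(Context))  # the allowed values are also available at runtime
```

Downstream code that only needs the name for annotations can import it under `if typing.TYPE_CHECKING:` to avoid a runtime import.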

slow pagoda
#

I'm looking at https://github.com/pypa/packaging/pull/939 and it's looking pretty good. Currently every marker is being turned into a SpecifierSet (which can be costly!), and legacy version support is being relied on to keep things like extras working. But if an extra is parsable as a Version, things behave very strangely. AFAICT 939 looks like the right fix following the spec. (CC @mental locust ) (Looks like markers have quite a few issues, this looks like a pretty straight forward one, though)

GitHub

According to the environment markers, most markers are strings, with only a small subset being use to handle versions. As such, this PR changes the behaviour to use version comparison only on those...

mental locust
slow pagoda
# mental locust I've not looked at markers yet, there seemed to be competing ideas on if the spe...

I think this is pretty safe, the spec says that only three of these have Version types, and I can't see why you'd want the others, which are clearly not Python versions, to have Version types in opposition to the spec. Maybe you could take advantage of it for platform_release, but that doesn't follow PEP 440 and only might be parsable on some specific platforms - it's basically an arbitrary string. All the others aren't even remotely versions.

mental locust
#

I see, sounds good then, but I'll have a quick look at it before we merge it

slow pagoda
#

Needs to be rebased anyway, I can do that and then leave it up to you if you'd like

#

To avoid any user-facing changes, we could leave platform_release as Version too (with a note), since it looks like some people are trying to use it as such (#889). This might be why people are interested in lazy evaluation, need to check

mental locust
#

Yeah, that's my main concern, I don't want it to be a breaking change for something commonly used

slow pagoda
#

(For now, that is, longer term this probably should be addressed)
Yes, that's also the reasoning behind lazy evaluation in 877. I think we could include platform_version as a Version for now to avoid any behavior changes. It doesn't have the right behavior - it's basically unusable with > due to lack of lazy evaluation, but this would make it impossible to use with > (which does match the spec as it stands currently)

#

I think the spec needs changing to make platform_release work, but leaving it alone, fixing all the others seems safe, would be faster, and would help resolve some issues

#

Comparisons in marker expressions are typed by the comparison operator and the type of the marker value.
Oh, this is the tricky part, I think. I think the correct form then is this should depend on the expression used. So the fix should only be applied when == is used.

#

No, == is in version cmp? So according to the standard typing-extensions; extra == 'v0' should compare as a version and not a string, even though it is typed as a string?

#

Okay, I think a spec change is required, sadly

#

Not sure what the point of that table with string/Version, etc is then.

mental locust
#

Parts of the spec are self contradictory, I've not done a careful reading of markers yet

slow pagoda
#

I proposed an edit that would make it possible to move forward in https://discuss.python.org/t/100287

#

But I do think the spec needs changes first.

#

Oh, interesting, this situation used to raise an InvalidVersion error:

("platform_release >= '6'", {"platform_release": "6.1-foobar"}, True),

But now it is False (in contrast to the spec, which claims it should fall back to a string comparison, which is True)
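
To illustrate the "try a version comparison, otherwise fall back to a string comparison" reading of the spec, here is a stdlib-only sketch. It is not packaging's code: `parse_simple_version` is a deliberately tiny stand-in that accepts only dotted integers, which is an assumption made for brevity and is far narrower than PEP 440.

```python
import re
from typing import Optional, Tuple

_SIMPLE_VERSION = re.compile(r"\d+(?:\.\d+)*")


def parse_simple_version(text: str) -> Optional[Tuple[int, ...]]:
    """Parse dotted integers such as '6.1'; return None for anything else."""
    if _SIMPLE_VERSION.fullmatch(text) is None:
        return None
    return tuple(int(part) for part in text.split("."))


def _pad(version: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Pad with trailing zeros so that 6 and 6.0 compare equal."""
    return version + (0,) * (width - len(version))


def marker_ge(value: str, literal: str) -> bool:
    """Evaluate `value >= literal`, falling back to plain string comparison."""
    left = parse_simple_version(value)
    right = parse_simple_version(literal)
    if left is None or right is None:
        return value >= literal
    width = max(len(left), len(right))
    return _pad(left, width) >= _pad(right, width)


print(marker_ge("6.1-foobar", "6"))  # True, via the string fallback
print(marker_ge("10.0", "9"))        # True, via the version comparison
```

The second call shows why the choice matters: as plain strings, `"10.0" >= "9"` is False.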

distant void
#

Does the spec specify Python string comparisons for markers? (I'm about to go read it) This feels really similar to the bit of wording that we just clarified for arbitrary equality.

#

Oh, it actually looks like it is, in this case!

The <marker_op> operators that are not in <version_cmp> perform the same as they do for strings or sets in Python based on whether the marker value is a string or set itself.
#

That surprised me.

slow pagoda
#

The spec says any version comparison operator (which includes == and !=) makes the comparison try Version rules first. Before a recent change, that meant it might throw InvalidVersion, for example, in packaging. Now == and != always work, since they are defined for legacy versions, but < and similar have now changed to always return False instead of throwing an error, which is just as non-compliant, but not as obvious as an error.

distant void
#

Oh, maybe I'm reading the wrong part.

slow pagoda
#

version_cmp includes == and !=

mental locust
#

We can restore the behavior by calling Version on the version string before passing it into SpecifierSet, this will punt the spec question down the road

slow pagoda
#

Could we just do it on platform_release? 🙂

slow pagoda
#

I'll make a PR, but I'm honestly rather okay with always returning False too, I don't like this part of the spec and throwing an error isn't following it either. 🙂

#

Does == work for legacy versions? I know === would.

#

I guess it must or tests would fail

mental locust
#

Yeah, down sides on any choice, I'm too tired to think about it right now but I'll take a look.

Pretty sure == doesn't work on legacy versions since LegacyVersion got removed.

slow pagoda
#

Oh, interesting, it appears that older versions of packaging (with LegacyVersion) did always evaluate this to False too (from what I'm reading)

mental locust
#

Yeah, I think the transition would have been a lot smoother when LegacyVersion was removed if the logic of handling non PEP 440 versions had been put in Specifier/Set contains and filter

iron light
slow pagoda
mental locust
#

Thanks, I won't be able to reply today, but I'll add my thoughts tomorrow. I suspect Paul is going to want a PEP though.

slow pagoda
#

I'd be fine with just the first two points, which I think wouldn't require a PEP, and then making packaging follow the spec, then followup with a PEP later.

#

I think this should be solved before a packaging release

iron light
#

Trying to understand what's going on at https://github.com/pypa/packaging/blob/main/src/packaging/_manylinux.py#L185

It seems like this try to import _manylinux dates back to https://github.com/pypa/pip/pull/3497, and got migrated to Setuptools and then to packaging... and I also found https://github.com/pypa/pip/issues/3689 which suggests that it might have been added as a deliberate user hook?

But that seems like a really strange way to implement it, and I can't find documentation for the idea nor discussion of the design.

Was there a plan for something else to provide a _manylinux module in some circumstances?

iron light
#

so the mechanism really was standardized like that ahead of time. x.x
... It seems like the packaging test suite doesn't consider the possibility that the dev's machine might have such overrides in place.

#

(or rather, assumes that it doesn't)

slow pagoda
#

Would anything be better?

#

(That is, is there a way to improve handling when it does in the test suite?)

iron light
#

well, if people are able to test anyway I guess that's fine
but maybe it's worth mocking a stdlib _manylinux?

teal pivot
iron light
#

Because by my read of it, the "generate known list of tags" code is deliberately skipping tags deemed incompatible by an inserted _manylinux.py.
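
For reference, this is roughly what such an override module could look like, as I read PEP 600: an importable `_manylinux` module may define `manylinux_compatible(tag_major, tag_minor, tag_arch)`, returning True or False to decide a tag, or None to defer to the default detection. The policy below (capping x86_64 at glibc 2.28) is entirely made up for illustration.

```python
# Contents of a hypothetical _manylinux.py placed on the import path.
from typing import Optional


def manylinux_compatible(
    tag_major: int, tag_minor: int, tag_arch: str
) -> Optional[bool]:
    """Decide whether manylinux_{major}_{minor}_{arch} wheels are acceptable."""
    if tag_arch != "x86_64":
        return None  # no opinion: let the installer use its default logic
    # Made-up policy: refuse wheels built against anything newer than glibc 2.28.
    return (tag_major, tag_minor) <= (2, 28)


print(manylinux_compatible(2, 17, "x86_64"))   # True
print(manylinux_compatible(2, 35, "x86_64"))   # False
print(manylinux_compatible(2, 17, "aarch64"))  # None
```

A test suite that wants deterministic tag lists could inject a stub module like this (or an empty one) into `sys.modules` before generating tags.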

teal pivot
distant void
#

I've been tinkering with a tool for introspecting package metadata for a while, and started to flesh out support for various fields. I've found extras to be... surprising. I'm not sure if I've missed something, so here's my situation and question:

I'm trying to provide an interface to read optional-dependencies which defaults to project.optional-dependencies and falls back to build.util.project_wheel_metadata if it's dynamic. I'm okay with the output not being perfectly uniform (I might play more with that; the whole project is partly a toy for me to poke around). But is there a reliable way to take a Marker and check if it's tied to an extra? My initial try at this is simply "check if it contains extra == "foo"" but that will potentially capture weird stuff people might do with the marker language. Is there some more elegant way to ask this question?

teal pivot
#

You can evaluate the marker with extra set

distant void
#

Does that not evaluate the other parts of the marker expression? I ruled that out because (on a Linux box) this was what happened when I tried it:

>>> m = Marker('platform_system == "Windows" and extra == "cli"')
>>> m.evaluate({"extra": "cli"})
False

But if there's a way to do that correctly for this case, that would be great

teal pivot
#

None that I can think of as you're asking for partial evaluation

distant void
#

Okay, cool. As long as I'm not looking straight past something, I'll go back to having fun hacking around with it.

slow pagoda
terse stream
#

also, thank you for the shoutout!

#

this image doesn't render for me

slow pagoda
#

The image will only show up in the post. I probably should just delete the "per version" part, never bothered to get the logic right, never used it.

#

Dropped "Per version", fixed typo, thanks!

teal pivot
iron light
#

Should I expect these speedups in 26.0?

teal pivot
slow pagoda
#

I'm adding some links now

#

I might wait to publish it till the first RC is out

#

@mental locust Version used to be created 3.5 million times in pip's suite, do you have a more recent number than that?

mental locust
#

Looks great to me! Really hoping we get this out in time for pip 26.0 (late January) but even if not when it does land it should be a big improvement.

Once it lands I think I need to go through how pip uses packaging, I think there are some parts where pip duplicates work or has workarounds for old incorrect packaging behavior.

slow pagoda
#

Last I see is upper 500K's?

mental locust
slow pagoda
#

Damian also implemented a series of speedups related to reducing unnecessary
object creation, such as [making some computation lazy][pr-989], [caching
related versions][pr-985], and using the [cache in more places][pr-1005]. These
aren't less important than mine, it's just that I'm writing the blog post and I
have more to say about mine. 🙂 Also, since his work focused on making pip's
resolver faster, some of the speedups are related to comparisons and containment
checks, which won't show up on my simple profiling. For his resolver benchmark,
pip was originally creating Version's around 3.5 million times, and combined
with changes he is also making to pip, it's now around 300 thousand.

mental locust
slow pagoda
#

I'll probably wait till the first RC, so the call to action is better. I can remove them for now, just in case.

mental locust
#

When I get a chance I'll take a look at using the statistical profiler on this benchmark also, I've not played with it yet, only used the cProfiler

#

Which is great for call counts, but can lead you astray in getting an idea of real time taken

iron light
#

To measure this, I started by just asking ChatGPT for some versions valid in
Python, it gave me 10 or so, then I multiplied that by a large number and that
gave me something I could run

Aren't there already a bunch in the test code?

#

(a few things don't seem to be rendering properly for me here; I'm assuming these are GitLab-specific Markdown extensions powered by JavaScript... oh, weird. Apparently there are separate gitlab.com and gitlab.net domains serving scripts.)

mental locust
iron light
#

mm

#

... regarding the stripping of trailing zeros, it's hard for me to imagine that multiple trailing zeros, or even a single zero, are the common case. So I don't think I buy the argument about throwaway lists; I would expect the performance benefit has more to do with avoiding all the reversed and itertools.dropwhile overhead for cases where it doesn't do anything

#

... oh, it would be generating one small list from the original tuple, every time. That's bad, yeah.

#

I think I would also try

while release and release[-1] == 0:
    release = release[:-1]

since that also avoids constructing a range and I would expect more than one loop iteration to be rare
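A minimal sketch of the two approaches being compared here, assuming `release` is a tuple of ints like the release segment of a version. The function names are made up for illustration, and the `dropwhile` variant is only a reading of the older style described above, not a copy of packaging's code.

```python
import itertools


def strip_zeros_dropwhile(release):
    # Reverse, skip the leading (originally trailing) zeros, reverse back.
    # This builds intermediate iterators and a list on every call, even
    # when there is nothing to strip.
    kept = list(itertools.dropwhile(lambda part: part == 0, reversed(release)))
    return tuple(reversed(kept))


def strip_zeros_loop(release):
    # Slice off one trailing zero per iteration; the loop body never runs
    # in the case of no trailing zeros.
    while release and release[-1] == 0:
        release = release[:-1]
    return release


# Both variants should agree on every input.
for sample in [(1, 2, 3), (1, 0), (2, 0, 0), (0,), (1, 0, 5, 0)]:
    assert strip_zeros_dropwhile(sample) == strip_zeros_loop(sample)
```

Which one is faster on real version data is a question for the benchmark; the sketch only shows that they are interchangeable.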

#

... oh, actually I did see the SpecifierSet change in the main branch

#

In terms of editing, the only thing that really stands out to me is the use of https://en.wikipedia.org/wiki/Apostrophe#Greengrocers'_apostrophes for domain concepts; e.g. rather than "Version's" I would write "Versions"

#

The bit about the cost of the _Version NamedTuple is surprising for me, and makes me want to review some other things I've written...

#

but I guess that's not so much the type itself as the extra amount of name lookup... ?

unique flint
#

whoa, awesome to see all this perf work

#

i completely unrelatedly also happen to be making something in packaging run a little faster

#

coming here because i made https://github.com/pypa/packaging/pull/1019 , but i'm not sure about the docs error. doesn't seem specific to my PR but last commit to master was green with the same sphinx version; seeking advice 🙂

mental locust
#

Thanks, merged the fix

slow pagoda
#

Great, I've rebased the PR. @terse stream finding re.compile inside a function might be something easy to automatically catch.
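One way such a check could look, as a rough sketch using only the standard `ast` module. The function name and the sample source are hypothetical, and it only recognizes the literal spelling `re.compile(...)`, not aliased imports.

```python
import ast


def find_compile_in_functions(source):
    """Return (function name, line number) for re.compile calls inside functions."""
    hits = []
    seen = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            is_re_compile = (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "compile"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "re"
            )
            position = (getattr(node, "lineno", None), getattr(node, "col_offset", None))
            # Nested functions are walked more than once; report each call once.
            if is_re_compile and position not in seen:
                seen.add(position)
                hits.append((func.name, node.lineno))
    return hits


SAMPLE = '''
import re
TOP_LEVEL = re.compile("a")
def match_b(text):
    return re.compile("b").match(text)
'''

print(find_compile_in_functions(SAMPLE))
```

The module-level `re.compile` is deliberately not reported, since that is the pattern the PR moves toward.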

slow pagoda
#

@teal pivot , @mental locust , and @silk pagoda Since pip 26 is coming out next month, and it would be really nice to get a recent packaging in it, and we are blocked on the packaging.python.org update, I'd like to propose the following plan:

  • Go ahead and merge #939 - only parse versions on certain keys
  • I'll review #893 - Publish to PyPI via GitHub CI (I've not used that specific method of dispatch for deployment before)
  • We make a 26.0rc1 release
  • We try to get the packaging.python.org change merged before a final release.

That gives people a chance to try out this and our other changes before a final release, and helps us be ready to release as soon as that change is finalized, hopefully enabling pip to pick up the new version. I'd like an RC out for at least a week, so if we wait for that to get finalized, we might not make it out in time for pip. We also have pylock and a bunch of other things - I think this is already the longest packaging has gone without a release.

mental locust
#

I'm happy with that, but it's also not the end of the world if packaging misses the pip release, we're all volunteers here, schedules are all best effort

teal pivot
slow pagoda
#

We can commit to not making a final release till at least then, that's a week from today. 🙂

mental locust
#

Yeah, I'm travelling until Jan 4th, so while I can approve some smaller PRs I can't do any big reviews or make any PRs myself

robust hollow
#

Love seeing the cool convos around performance improvements - I suspect that while it'll take a while for these changes to ripple through the ecosystem, it'll prove net positive over time.

slow pagoda
#

Tricky part would be setting up a benchmark suite

mental locust
#

And it being consistent enough between runs to rely on, which I've heard is basically impossible on free GitHub runners

solar heron
#

You can do differential benchmarking on GHA runners, but you can't keep track of absolute performance over time using GHA infra.
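A small sketch of what "differential" means here, assuming the old and new implementations can both be run in the same process on the same machine. `old_impl` and `new_impl` are hypothetical stand-ins; only the ratio is meaningful, not the absolute times.

```python
import timeit


def differential_ratio(old, new, arg, repeat=5, number=1000):
    # Alternate old/new runs so that drift in machine load hits both sides
    # roughly equally, then compare the best time of each.
    old_times, new_times = [], []
    for _ in range(repeat):
        old_times.append(timeit.timeit(lambda: old(arg), number=number))
        new_times.append(timeit.timeit(lambda: new(arg), number=number))
    return min(new_times) / min(old_times)


def old_impl(parts):
    # Stand-in for the "before" code: does redundant work.
    return tuple(reversed(list(reversed(parts))))


def new_impl(parts):
    # Stand-in for the "after" code.
    return tuple(parts)


ratio = differential_ratio(old_impl, new_impl, (1, 2, 3))
print(f"new/old = {ratio:.2f}")
```

A ratio below 1 suggests the new code is faster on this runner; a different runner may give different absolute numbers, which is why tracking them over time does not work.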

mental locust
#

I've heard even that can be very noisy if you're doing anything other than sub 1 second micro benchmarks

robust hollow
mental locust
#

I think most of these PRs have been optimizing with particular goals in mind, each one of them would probably require a different benchmark. And while there have been a bunch of PRs this last couple of months I'd be surprised if it continued.

finite temple
mint urchin
#

CPython core has a benchmark runner; Thomas Wouters is managing it right now I think?

opal notch
#

we're using codspeed for continuous benchmarking

#

they support both instruction counting (independent of the underlying runner) and walltime benchmarks

mental locust
#

Both CPython and Astral's benchmarking infrastructure were set up by full time employees for projects that receive many PRs per day, and both required analysis and fixing issues. Not saying it couldn't be done for packaging but I'm skeptical of ability and need.

slow pagoda
#

I don't think packaging changes that rapidly in core areas - while it's in "nice to have", I don't think it's worth much effort. I'd rather see a downstream test added, so we can see the effect of changes on major downstream projects (pip, not sure what else uses us really heavily?)

#

It would be nice for the work we just did, but most of the time, Version, etc. doesn't change much, mostly we add stuff (like pylock).

mental locust
#

pip could definitely do with some benchmark to see the effect of changes, in fact there are a couple of repos that tackle that from different approaches (including one of mine) but it's a lot of effort to set up and get working

slow pagoda
#

If there was a pip benchmark we could run with our main branch (or PRs), that would be very interesting 🙂

mental locust
#

It would take a lot of effort to get that set up in a way that reproduced consistent numbers (even on a single machine that wasn't shared and within the same run), it's doable, it'd just need someone to dedicate a lot of time to doing it, and probably a few PRs in pip to improve consistency

slow pagoda
#

How hard would a "downstream pip" check be, do you think? Clone pip, swap vendored packaging for the current checkout, then run (a subset?) of pip's test suite.

#

(No perf testing, just setting up a test)

wide herald
#

It's not that hard to do, we use similar workflow for testing between poetry and poetry-core, it's not even a 100 lines of YAML

slow pagoda
#

I've set up similar tests before, I was asking more about how hard setting up the pip part of it would be. 😉

#

I've messed with the vendoring tool in pip a little, so I think not too hard

#

Do we have anyone that can set up the PyPI side of the trusted publishing PR with Brett out? @sage marsh or @silk pagoda ? (Or Paul, who I don't think is on here)

#

Should we enable release immutability in GitHub's settings?

mental locust
dense yacht
slow pagoda
#

I've been playing around with ASV. Here's a few benchmarks I tried, covering the lifetime of the project (by mistake, didn't really mean to run everything).

#

And a zoomed-in look at the spec constructor one.

#

The first big slowdown was Canonicalize version before comparing specifiers (#283) in 2020, and Refactor canonicalize_version (#793) in 2024 was the second one.

#

henryiii/chore/bench in my fork, for the curious

#

Ideas for more micro benchmarks welcome
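For anyone wanting to suggest one, an ASV micro benchmark is roughly a class with a `setup` method and one or more `time_*` methods. The sketch below uses a hypothetical `parse_release` stand-in so it runs without packaging installed; a real benchmark would call into packaging instead.

```python
def parse_release(text):
    # Hypothetical stand-in for the code under test.
    return tuple(int(part) for part in text.split("."))


class TimeReleaseSuite:
    # asv runs setup() before timing, and times every method whose name
    # starts with "time_".
    def setup(self):
        self.inputs = ["1.0", "2.10.3", "2024.1.0.0", "0.0.1"] * 250

    def time_parse_release(self):
        for text in self.inputs:
            parse_release(text)


# Outside asv, the suite can still be exercised by hand.
suite = TimeReleaseSuite()
suite.setup()
suite.time_parse_release()
```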

terse stream
mental locust
slow pagoda
#

Should I include the Version construction time there?

#

I'll go with not including it for now.

mental locust
#

I think both would be interesting

#

Short resolutions would almost always have to construct them, longer resolutions would have them cached

slow pagoda
#

Here's the full run (the added one does not include one with the initial Version time, I had started it already)

slow pagoda
#

(ignore the version __str__ one, I mistakenly am measuring str(str) this time after a refactor, last time was correct)

slow pagoda
#

What do other maintainers want me to do with my asv based benchmarking? I could make a PR adding it to the repo (it's currently designed that way, in a branch), or I could make a new repo that just holds the benchmarks. There are benefits to both.

slow pagoda
#

I've applied all the perf/fix's I have open: (1022, 1024, 1028, 1029, 1030, 997), this is what it looks like (starting from 25.0).

iron light
#

Nice! I take it "ASV" is the tool generating the graphs / running benchmarks / ... ?

robust hollow
#

Nice work, @slow pagoda ! Very cool to see it work so well in a short time.

slow pagoda
#

Finally found a way to get __str__ noticeably faster (10%), I inlined base_version in it.

slow pagoda
#

#997 (__str__) and #1028 (renormalization) are still awaiting review. I've mentioned our plan for #939 in the packaging.python.org PR.

iron light
#

... is that actually called frequently for versions during resolution?

mental locust
#

It used to, because specifier classes used to do a lot of Version -> str -> Version as part of comparison, but we just got rid of that and we now have a super fast Version -> Version path

#

So, now I dunno how much str is called

#

Pip also used to do Version -> str -> Version to support Versions from distlib pkg_resources backend into packaging Versions, but I also just got rid of that when we're using the packaging backend

solar heron
#

pkg_resources rather*

mental locust
#

Oh yeah, that, sorry

iron light
#

Aha, of course, that was also discussed recently

#

26.0 seems like it'll be a big improvement

slow pagoda
#

It's not huge due to all our other changes, and pip is being adapted too, but it might help other packages that still use the string conversion instead of __replace__. And it's still used a bit here and there, a 10% savings shows up as a 3-4% in a couple of the other benchmarks I think.

distant void
#

Version.__str__ definitely gets used a fair amount inside of pip-tools. It's relatively small in magnitude because the only things we're handling that way are the inputs and any existing requirements.txt data, but I see no reason not to celebrate a little more perf given that pip-tools gets used by dependabot and other automations.

slow pagoda
#

So (primarily @teal pivot ), does merging 939 (only parse versions on certain keys) and the release process one (893) and making a RC release sound good? There's some comments about === on Discord about the packaging.python.org change, so that's still going to take time

slow pagoda
#

Whole life of project, with the fixed __str__ benchmark.

#

And zoomed in version of __str__ so you can see where version releases are.

slow pagoda
mental locust
mental locust
# slow pagoda Did you get a chance to look at this? I'm just finishing an update to the post w...

Okay, so the benchmark I was running I didn't set up in a reproducible way, and the number of Versions is non-linear to the number of packages being checked. Tonight I did a similar benchmark and using cProfile on pip main I see ~4.8 million calls to Version.__init__, and switching to packaging main I see ~400k calls to Version.__init__. I ran multiple times to confirm these numbers were stable on tonight's benchmark.

These numbers will vary wildly for specific resolutions, but suffice to say in long resolutions users can realistically expect significantly less time spent instantiating Version objects.
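For reference, counting constructor calls with cProfile can be sketched like this. `FakeVersion` and `workload` are hypothetical stand-ins (the real measurement profiled pip against packaging's `Version`); only the `cProfile`/`pstats` usage is the point.

```python
import cProfile
import pstats


class FakeVersion:
    # Hypothetical stand-in for a Version-like class.
    def __init__(self, text):
        self.release = tuple(int(part) for part in text.split("."))


def workload(count):
    return [FakeVersion("1.2.3") for _ in range(count)]


def count_calls(func, target, *args):
    # Profile target(*args) and return how many times func was called.
    profiler = cProfile.Profile()
    profiler.runcall(target, *args)
    stats = pstats.Stats(profiler)
    code = func.__code__
    # pstats keys Python functions by (filename, first line, name).
    key = (code.co_filename, code.co_firstlineno, code.co_name)
    # Each entry is (primitive calls, total calls, tottime, cumtime, callers).
    entry = stats.stats.get(key)
    return entry[1] if entry else 0


init_calls = count_calls(FakeVersion.__init__, workload, 1000)
print(init_calls)
```

Call counts like this are deterministic, which is what makes them a stable before/after figure even when wall-clock time is noisy.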

mental locust
#

@slow pagoda FYI I just ran all pip tests against packaging main and I only needed to fix the new spacing around @ in a requirement string, actually a little surprised there wasn't some implicit dependency on non-spec behavior of packaging that got fixed since 25.0

slow pagoda
#

Is it okay with you if I include those figures in my post?

mental locust
#

Yeah sure

slow pagoda
#

I think all the behavior that changed is very much in edge cases, which means it will show up in the real world, but probably not test suites

mental locust
#

I guess so

slow pagoda
#

#1035 tightens up security on the CI, which would be good to get in before trying an RC release. For the RC, someone needs to run the nox -s release locally and upload the tag and make the GitHub Release, the environment only requires sign-off for deployment (which is manually triggered after those two steps). Do you want me to do those first steps, or does someone else want to? One of the PyPI admins will need to approve the deployment regardless.

slow pagoda
#

Session release aborted: Invalid arguments: non-integer segments.
Hmm, there's that. Slightly ironic that it can't handle version numbers. 🙂

#

Also, I think I'll change the git push command to a printout telling the user what to do, I don't like a nox command pushing, and the remote name is assumed, too.

#

Also, personally, I'd prefer the release be a PR, rather than a push to main. Splitting it up like this would allow that, too. (Edit: it does several things in a row, at least it would be easier)

slow pagoda
slow pagoda
glad acorn
mental locust
#

Who needs to approve a deployment? On pip side we don't really distinguish between contributors and admins, so all contributors can do everything, so I'm not too familiar with the permissions structure of GitHub

slow pagoda
#

When creating the environment on GH, I can select up to a certain number (5, I think) (edit: it's 6) that can approve. Since there are four people that have ownership on PyPI, I just copied the same list to the environment on GitHub.

#

I think the release is not part of the deployment, actually; the instructions say to make it first, but we could wait until it's deployed to make the release, I think. That's also when the tag gets locked and immutable.

#

(I'm much more used to releases triggering deployments, but that's not how this is set up)

#

Here's the environment settings.

glad acorn
#

you can also add a team, so if needed can be more than 6

slow pagoda
#

FWIW, I rather like this setup so far, it's nice to be able to change the CI after the tag.

teal pivot
glad acorn
teal pivot
#

Not the most intuitive UI to get to it from the deployment list

#

Approved!

glad acorn
#

and it's uploaded!

slow pagoda
teal pivot
slow pagoda
#

Because that's where I triggered the CI from

teal pivot
#

On PyPI?

slow pagoda
#

It thinks that's what it deployed, but it's actually deploying a tag

#

Oh, wait, that's weird

#

url: https://pypi.org/project/packaging/${{ github.ref_name }}

#

That would be right if we were not deploying based on a sha

mental locust
#

@slow pagoda Are you going to make a DPO announcement about the release candidate? I assume you want users to give it a try, I was going to post in a couple of public places

slow pagoda
#

I can, I'd like to make a quick adjustment to my post (and publish it) first. I'd like to say the final release is planned for 1 week (Jan 16) if no blockers are found, is that fine?

mental locust
#

Fine by me

slow pagoda
#

Where were you thinking of posting it? I could post in the packaging section on discuss.python.org, I think

teal pivot
slow pagoda
#

Probably missed adding that link, thanks, will fix! Also forgot to link to the release notes.

#

I don't think that's the right number, actually, maybe that's why

#

986 (actually 985 is better) is what it was supposed to be

mental locust
mint urchin
#

I just shared it on lobste.rs, hope you don’t mind

slow pagoda
#

Sure, feel free to share. 🙂

iron light
mint urchin
#

Heh, todsacerdoti runs a bot that reposts from lobste.rs to HN.

slow pagoda
#

Nice. 🙂

hidden condor
north karma
#

i know i'm usually not very active in this server, but wanted to pop in and say the perf work is awesome!

#

also thanks for putting out the metadata 2.5 changes in the new beta; that allows me to begin testing them on warehouse 🙂

lucid crag
#

The perf writeup is really good

slow pagoda
#

Thanks for testing! I'm hoping if there are any issues (especially with the edge case handling, though anything else is good too) they get caught before 26.0 is out on Friday. 🙂

robust hollow
#

Not exactly a packaging issue, but: seeing some incompatibility with packaging_legacy's LegacyVersion (PyPI needs to handle older versions)

../lib/python3.14/site-packages/packaging_legacy/version.py:30: in __init__
    self._key: LegacyCmpKey = _legacy_cmpkey(self._version)
    ^^^^^^^^^
E   AttributeError: property '_key' of 'LegacyVersion' object has no setter

I know, I know packaging_legacy imports _BaseVersion - is there a better, stable thing to import?
If there's a better way to handle this, I'd love to know!

#

I think it might be as simple as removing the _key property from the subclass, but I'm not certain

mental locust
#

I'll try and take a look later, the packaging internal mechanics were significantly reworked in this release, anything based on them instead of the public API is likely to break

robust hollow
mental locust
#

The ideal solution would be to use public only APIs, I've never used this library, so I'll take a look later.

Also packaging 26.0 does better support legacy versions, when you pass them into filter or contains of specifiersets it no longer throws an error, it filters them appropriately, so maybe some use cases of packaging legacy won't be needed. I'll have a look at the warehouse code also to see if there are improvements that could be made with 26.0+

slow pagoda
#

We now assume _key is a property, while packaging_legacy just tries to set it.

#

We could either drop that assumption (which is there mostly for typing), or packaging_legacy could also make it a property
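A stripped-down illustration of the clash and of the second option (the subclass also using a property). The class names are hypothetical and this is not the actual code of packaging or packaging_legacy, just the same shape.

```python
class BaseVersion:
    @property
    def _key(self):
        # Read-only: a property without a setter.
        return ("base",)


class AssigningLegacy(BaseVersion):
    def __init__(self, version):
        # Plain assignment hits the inherited property, which has no setter.
        self._key = ("legacy", version)


class PropertyLegacy(BaseVersion):
    def __init__(self, version):
        self._version = version

    @property
    def _key(self):
        # Overriding with another property keeps both classes consistent.
        return ("legacy", self._version)


try:
    AssigningLegacy("1.0-old")
except AttributeError as exc:
    error = str(exc)
else:
    error = None

print(error)
print(PropertyLegacy("1.0-old")._key)
```

The first option, dropping the property from the base class, would make `AssigningLegacy` work unchanged, at the cost of the typing help the property was added for.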

mental locust
slow pagoda
#

Ahh, yes, without typos it does

#

Second attempt. 🙂

mental locust
#

I'm good with this approach, it's an internal property and it's only in the base class for our type checking

slow pagoda
#

I’m also interested in re-running the publishing infrastructure to make sure it works, so I think we could do another RC

mental locust
#

Sounds good to me

slow pagoda
#

I've tagged 26.0rc2, but the publish workflow has an error. I think #1051 fixes it.

mental locust
#

Well, good idea to do release candidates and iron these issues out

slow pagoda
#

@teal pivot (or others): the RC is awaiting review. I'll publish the release after this is approved (or you can, it's a draft release in https://github.com/pypa/packaging/releases so you should just be able to click publish once it hits PyPI)

robust hollow
glad acorn
#

in CPython we have a three-pronged attack against GH workflows...

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: check-yaml

  - repo: https://github.com/python-jsonschema/check-jsonschema
    rev: 0.36.0
    hooks:
      - id: check-github-workflows

  - repo: https://github.com/rhysd/actionlint
    rev: v1.7.9
    hooks:
      - id: actionlint
slow pagoda
#

I often use check-jsonschema, but that wouldn't have caught this, I think actionlint might have, from what I can tell

teal pivot
terse stream
slow pagoda
#

Might do another RC with the Python 3.11.0-3.11.4 regression fix later today, if there are no objections

mental locust
#

No objections from me, very glad this was caught before release

slow pagoda
#

Me too. 🙂

slow pagoda
#

Can someone ( @bleak stratus, @sage marsh , @sinful eagle, or @silk pagoda ) add me to the readthedocs (henryiii) so I can retrigger builds? And/or I'd like https://app.readthedocs.org/projects/packaging/builds/31005603/ retriggered, failed to clone. Edit: pushed another commit simplifying things, and that built fine, so no rebuild needed now.

iron light
#

glad to hear it enabled an improvement blobflushed

#

(is the inlining also significant?)

slow pagoda
#

I think that's the main improvement (I went back to using an integer to keep the diff small, I think that's faster than recreating tuples). I was relying on the return, your version didn't need a return.

#

FYI, I just checked, and if you ask ChatGPT how to speed up the original expression (the one in 25.0), it gives exactly the new version in that PR, which is rather embarrassing.

mental locust
#

Maybe it already read the PoC 🙃, also my experience is just because ChatGPT says something is faster doesn't correlate well with if it is faster

slow pagoda
#

It does do while i, I did while i > 0, otherwise identical.

mental locust
#

I do wonder if there's an elegant way not to do a string copy when no slicing is needed, and if that makes any kind of difference

slow pagoda
#

I wouldn't be surprised, while doing advent of code in typescript I asked it for a library to do something specific and it pulled up a library less than 24 hours old that was written for the problem I was working on. 🙂

#

Oh, I see that. It's a tuple, but yes, that's a tuple copy.
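A sketch of the integer-index variant with an explicit early return, so the original tuple is handed back when nothing needs stripping. The helper name is hypothetical, and whether the early return is measurably faster is exactly what a benchmark would have to show.

```python
def strip_zeros_index(release):
    # Walk an index back from the end instead of rebuilding the tuple on
    # every step.
    i = len(release)
    while i > 0 and release[i - 1] == 0:
        i -= 1
    # Hand back the original object untouched when there were no trailing zeros.
    if i == len(release):
        return release
    return release[:i]


unchanged = (1, 2, 3)
print(strip_zeros_index(unchanged) is unchanged)
print(strip_zeros_index((1, 0, 0)))
```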

mint urchin
#

you used tachyon for the performance optimization work, right? How do you like it? I’ve been meaning to play with it but I always reach for samply.

slow pagoda
#

Yes, I did. I liked it a lot, having it built-in was really handy, and it has nice output.

#

I talk about it a little in my writeup

slow pagoda
# mental locust I do wonder if there's an elegant way not to do a string copy when no slicing is...

It's small, but I think it's measurable

| Change | Before [944430f1] <henryiii/pref/simplerzeros~1> | After [b699ea06] <henryiii/pref/simplerzeros> | Ratio | Benchmark (Parameter) |
|--------|--------------------------------------------------|-----------------------------------------------|-------|-----------------------|
|        | 950±5μs                                          | 938±5μs                                       | 0.99  | version.TimeVersionSuite.time_hash |

silk pagoda
slow pagoda
#

I think it would still be useful for the future, but not pressing. I ended up pushing more commits.

silk pagoda
#

I was able to click the buttons for it on my phone to make you a maintainer. :)

You should have an invite now.

#

Idk how to feel about the fact that I mostly only have time to look at OSS stuff during my commute nowadays. 🙈

slow pagoda
worldly acorn
slow pagoda
#

Yes, the release job in nox changes it

#

(Not in the PR, though)

#

Not sure if I like it, but that’s the way it was set up

worldly acorn
#

Oh I see, makes sense. Thanks for making the new RC, it'll help my coworkers who are on 3.11.2 for Reasons 😄

slow pagoda
#

Tag pushed, deployment awaiting approval, draft release saved.

#

First time we didn’t have to fiddle with the release job. 🙂

slow pagoda
teal pivot
slow pagoda
#

Thanks!

#

Should we do a final release tomorrow around this time?

mental locust
#

The ._version compatibility missed rc3 but would be in final, right?

slow pagoda
#

Correct. I don't think we need an RC for that, unless someone wants to test with it. I manually tested hatch.

mental locust
#

Sounds good, I'm working on a PR for hatch right now

slow pagoda
#

I'd rather people testing see the missing error vs. an ignorable warning. 🙂

mental locust
#

Makes sense

slow pagoda
#

I ran wntrblm/nox against RC 2, it's fine.

worldly acorn
slow pagoda
#

If we wait till Monday, does that still give enough time to pip?

worldly acorn
#

oh sorry I am not aware of constraints with pip

#

just feels like a short time interval after an RC

mental locust
#

Yes, that's enough time for pip, I'm planning to do pip release on 30th

slow pagoda
#

Okay, let's target Monday

mental locust
#

There's a few things that need to be done after vendoring packaging, so it would be good to have at least one week

teal pivot
iron light
#

Those are remarkably small wheels for a project that actually includes C code. Nice to see that small .so files can be made easily enough (with plain old Setuptools, apparently!) when there is only a small amount of code.

#

(I feel like I should have just tried this for myself, years ago...)

slow pagoda
#

C (also C++, though less so) code does produce small wheels. It's dependencies that make it large.

iron light
#

Makes sense. (Maybe in the long run, PEP 725 can help with that by enabling more dynamic linking to the existing system... ?)

slow pagoda
#

I was playing around with lazy imports, and at least for the two files I was looking at, there's not much we could gain, though I was able to keep our utility file from being imported when getting a Version. Barely measurable, though. The biggest savings would be if we could avoid a typing import, but due to TYPE_CHECKING and type aliases, that's somewhat involved, and would require third party code to also do it. And we couldn't avoid re, which is the main thing that typing is importing that's a bit slow, so not too much could be gained there. Relative imports aren't supported with __lazy_modules__, need to check to see if that's intentional; that ruins it for packaging since we are often vendored.

#

Also, I profiled for all Python versions, and the canonicalize_name speedup is oddly version dependent. It's faster on 3.8-3.11, a bit slower on 3.12 and 3.13, and much faster on 3.14. It seems there was a speed regression for str.translate for those two versions, perhaps? 3.14 is the fastest version for the new canonicalize_name, normally 3.11 is the fastest version. I think I'm using uv's Python everywhere, but it's possible asv isn't picking the right Pythons. But the rest of the benchmarks look fine, and this one looks fine before the move to translate changed the order.

#

No new issues have come up, so I'll prepare the changelog for 26.0.

#

The fastest version does depend on the benchmark, actually, 3.13 is fastest in quite a few of them. The colors change between plots, I didn't notice that before.

#

(Here's an updated __str__ benchmark, which now includes Version construction time, for example.)

#

And Version construction

distant void
#

Since [this short discussion](#general message) got a positive reaction from Brett, and nobody piped up to object, I started working on moving dependency-groups into packaging.dependency_groups in the last hour or so. I figure that for this, starting with a PR with no prior issue is okay? I'm sticking almost entirely to a lift-and-shift but I might make some minor improvements like having more specific error classes.

slow pagoda
#

Yes, no issue required.

distant void
#

Back on the fun benchmarking topic,

It seems there was a speed regression for str.translate for those two versions, perhaps?
It certainly looks like it; I assume you checked this before mentioning. I just tried a quick timeit benchmark and it looks like 3.14 is 5x faster than 3.13 and 3.12, but only 2x faster than 3.11.

#

Given that str.replace is much faster than str.translate, and str.lower seems to be quite fast, is the translate usage the fastest option here? I think it's more readable, but if we really want to micro-optimize, might name.lower().replace('_', '-').replace('.', '-') actually win out?
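One detail worth spelling out: the normalization in PEP 503 collapses runs of `-`, `_` and `.` into a single `-`, so the chained `replace` on its own is not quite equivalent. A sketch of how the two could be made to agree (function names are hypothetical, and this is not claimed to be what packaging ships):

```python
import re

_SEPARATORS = re.compile(r"[-_.]+")


def canonicalize_regex(name):
    # The normalization given in PEP 503.
    return _SEPARATORS.sub("-", name).lower()


def canonicalize_replace(name):
    # Map "_" and "." to "-" with plain str methods ...
    value = name.lower().replace("_", "-").replace(".", "-")
    # ... then collapse any runs of "-" left behind.
    while "--" in value:
        value = value.replace("--", "-")
    return value


for sample in ["Foo.Bar", "foo__bar", "A-._-b", "plain", "zope.interface"]:
    assert canonicalize_regex(sample) == canonicalize_replace(sample)
```

For names without repeated separators, which is presumably most of them, the `while` loop costs a single failed substring check.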

slow pagoda
#

I can check

distant void
#

I was testing with a big string, where that chain of methods is faster, but it may look different with a more realistic set of strings for the input

slow pagoda
#

| Change | Before [d5398b8b] <main> | After [712338f5] <henryiii/perf/doublerepl> | Ratio | Benchmark (Parameter) |
|--------|--------------------------|---------------------------------------------|-------|-----------------------|
| - | 5.98±0.02μs | 3.09±0.01μs | 0.52 | utils.TimeUtils.time_canonicalize_name [PU-H2WF61JRQ6NY/virtualenv-py3.10] |
| - | 5.06±0.01μs | 2.36±0.02μs | 0.47 | utils.TimeUtils.time_canonicalize_name [PU-H2WF61JRQ6NY/virtualenv-py3.11] |
| - | 9.51±0.04μs | 2.42±0.04μs | 0.25 | utils.TimeUtils.time_canonicalize_name [PU-H2WF61JRQ6NY/virtualenv-py3.12] |
| - | 9.64±0.01μs | 2.37±0.01μs | 0.25 | utils.TimeUtils.time_canonicalize_name [PU-H2WF61JRQ6NY/virtualenv-py3.13] |
| - | 3.96±0.04μs | 3.09±0.05μs | 0.78 | utils.TimeUtils.time_canonicalize_name [PU-H2WF61JRQ6NY/virtualenv-py3.14] |
| - | 5.91±0.01μs | 3.11±0μs | 0.53 | utils.TimeUtils.time_canonicalize_name [PU-H2WF61JRQ6NY/virtualenv-py3.8] |
| - | 5.88±0.02μs | 3.16±0.03μs | 0.54 | utils.TimeUtils.time_canonicalize_name [PU-H2WF61JRQ6NY/virtualenv-py3.9] |

distant void
#

Discord doesn't like your markdown 😂
But that looks like an improvement to me!

slow pagoda
#

It does look like it, especially on 3.12-3.13. Not so much on 3.14, but still faster.

distant void
#

On the flipside, maketrans + translate is very tidy to read.

slow pagoda
#

Interesting, since this makes 3 strings where translate only (in theory) would need to make one.

terse stream
#

str.translate is very nice indeed

distant void
#

Yeah, my intuition is that this should be much slower, but translate felt subjectively pretty slow to me when I timed it, which gave me the idea of testing lower() against it. And that was a full order of magnitude faster than translate, which starts to make you consider it.

slow pagoda
#

28-item translate is faster than two-item translate + lower, but I guess str.replace is much more optimized (it does get used more, I guess)

terse stream
#

maketrans just builds the mapping which translate uses
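
For reference, a small sketch of what that mapping looks like. The two-character table here is only an illustration, not the table used in packaging.

```python
# str.maketrans returns a plain dict keyed by Unicode code point.
table = str.maketrans("_.", "--")
print(table)  # {95: 45, 46: 45}

# str.translate looks up each character's code point in that mapping.
print("my_pkg.name".translate(table))  # my-pkg-name
```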

slow pagoda
#

translate only supports one-item replacements, so I'd assume it's different code than replace, but I think it should be faster

distant void
#

This is the implementation of replace, I think? (I don't poke at CPython internals much)

slow pagoda
#

4x slower than chained replacements is really bad in 3.12/3, and 2x slower before that, but it's still 30% slower than chained replacements in 3.14.

terse stream
distant void
#

I'm sort of guessing that lower() and replace() can be intelligent about the fact that the characters in question are single-byte ASCII text, but that translate has to do a slower walk to handle encoded text

terse stream
#

but yeah for small stuff i guess it doesn't matter too much

slow pagoda
#

It looked like there's a one character replace optimized impl?

distant void
#

Oh, yeah, I see that now. unicode_fast_translate. How is it so much slower than the chained methods then?

terse stream
#

method chaining probably involves extra lookups in python land

distant void
#

Yes, but we've demonstrated that the method chaining approach is actually faster. So all of my expectations about what should be faster here are being subverted. That's why I'm now staring at this. 😂

terse stream
#

there is supposed to be a perf hit between those 2 versions which is made up in 3.14 with the JIT work, not sure if that's what's at play

slow pagoda
#

The JIT is not activated by default in 3.14. But regardless of version, the replace chaining is always faster, 2x, 4x, then like 1.3x. In theory, the translate version should be faster, as it's one method, one new string instead of three, and guaranteed to be 1-1 character mapping.

terse stream
#

wait doesn't str.replace use memchr under the hood?

slow pagoda
#

STRINGLIB_FAST_MEMCHR it looks like

distant void
#

Is there a possibility that CPython optimizes such that lower() and replace() do their work in-place, rather than allocating new strings, when the original is being discarded? Looking at the replace implementation, I notice that it's defined as an in-place operation, whereas translate always allocates a new string.

#

No, I just checked the simplest way, by doing a benchmark which preserves both strings. That's not what it is.

slow pagoda
#

It could be that replace makes a copy then modifies it, which takes advantage of vectorized copies, while translate is building it one byte at a time.

#

And doing a hash table lookup every time, vs. just checking for one character.

#

Having a dedicated one char replace fast path is a large part of the performance, I think.

#

(You can see the new datapoint in that PR)

terse stream
#

drive by approving

slow pagoda
#

@teal pivot or @mental locust, if I can get one more approval, I'd be fine to drop this in for 26.0, as it fixes a performance regression found when looking at the full range of Python versions, and keeps from adding packaging.utils._canonicalize_table.

mental locust
#

Approved

slow pagoda
#

Thanks!

glad acorn
distant void
#

I'm not sure how much people are enjoying poking at this, but I find it interesting.
I don't think the speedup is solely around single-char behavior. replace() seems really fast in conditions that I think disprove that (py3.14.0):

>>> import timeit
>>> timeit.timeit('"abbc".replace("bb", "xx")')
0.04331376205664128
>>> timeit.timeit('"abc".replace("b", "xx")')
0.04566288506612182
>>> timeit.timeit('"abc".translate(bxx)', setup='bxx = str.maketrans({"b": "xx"})')
0.09225080406758934

(I'm not as scientific as Henry. 😜 )

glad acorn
#

here's benchmarking str.translate on macOS with PBS releases

distant void
#

Oh, 🤦 I did that on 3.13. Let me fix it with numbers from 3.14.0. It makes it less dramatic.

glad acorn
#

and here's a comparison of benchmarking the normalise function with str.translate vs name.lower.replace.replace

glad acorn
#

and with the original re.sub version

iron light
#

I did much less formal testing, but I can't reproduce str.translate being slower in isolation than two .replace calls
unless perhaps you aren't able to cache the str.maketrans for some reason?

#

if you did it with an inline dict I suppose that wouldn't be a constant in the bytecode, so...

#

oh, I can with sirosen's test...

#

it seems like str.translate really does not like changing the string length:

$ python -m timeit --setup 'bx=str.maketrans({"b":"x"})' '("abc"*1000).translate(bx)' # this is 3.12
100000 loops, best of 5: 2.81 usec per loop
$ python -m timeit --setup 'bx=str.maketrans({"b":""})' '("abc"*1000).translate(bx)'
1000 loops, best of 5: 273 usec per loop
$ python -m timeit --setup 'bx=str.maketrans({"b":"xx"})' '("abc"*1000).translate(bx)'
1000 loops, best of 5: 306 usec per loop
#

which reads to me like an algorithmic issue

#

(okay, it's still linear time, but something is definitely weird)

slow pagoda
glad acorn
# glad acorn and with the original `re.sub` version

btw I also ran this on Windows and Linux* and got the same pattern, so the str.lower.replace.replace should be good across the board
(* on GHA, which I know isn't best for benchmarking, but the benchmark is slow enough and the results were very clear)

slow pagoda
teal pivot
slow pagoda
#

So far the release has seemed pretty successful. PDM's version bumping broke, it was setting ._key, but @dry terrace already shipped a fix (using __replace__). I think pdm-backend was safe. Does look like a filter(..., key=) might get added.

mental locust
slow pagoda
#

Yes, I like the look of that API a lot.

teal pivot
#

I'm also good with it

mental locust
#

I have a bit of a weird SpecifierSet.filter optimization, which probably makes it 0-3% slower overall, but in the most common case significantly speeds up the latency on the first result, I'm going to stash it for now and check down the line if it will speed up situations like boto3, especially when we are using key and doing a one-pass with filter on candidates

dry terrace
# mental locust I have a bit of a weird `SpecifierSet.filter` optimization, which probably makes...

I would prefer the .filter() method to be completely lazy, but now it still iterates over elements in certain situations.
I have a data structure for this for reference: https://github.com/frostming/unearth/blob/6af9a63c27cbfdbe6936fd2aa7a41713920769cd/src/unearth/utils.py#L187
It consumes the iterable while caching the items in memory
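
A minimal sketch of that kind of structure, assuming nothing about the linked unearth code: the class name and its details are hypothetical. It pulls items from the source only when asked and remembers them so that later iterations replay the cache first.

```python
from collections.abc import Iterable, Iterator
from typing import Generic, TypeVar

T = TypeVar("T")


class CachedIterable(Generic[T]):
    """Hypothetical wrapper: pulls from the source lazily and remembers items."""

    def __init__(self, source: Iterable[T]) -> None:
        self._source: Iterator[T] = iter(source)
        self._cache: list[T] = []

    def __iter__(self) -> Iterator[T]:
        index = 0
        while True:
            if index < len(self._cache):
                # Already fetched by this or an earlier iteration.
                yield self._cache[index]
            else:
                try:
                    item = next(self._source)
                except StopIteration:
                    return
                self._cache.append(item)
                yield item
            index += 1
```

The trade-off is memory: every item ever yielded stays alive for as long as the wrapper does.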

mental locust
iron light
#

that reminds me, one of my long-on-the-backburner ideas was a generic caching data structure of that type. one of the neat proof-of-concept things I wanted to showcase with it is (an analog to)itertools.product on unbounded sequences.

mental locust
#

Hmm, packaging isn't really sans-IO, it at least calls a subprocess and reads its stdout

teal pivot
mental locust
#

Yeah, I suppose it's also platform specific code, so isn't going to trigger in places that don't support it

mental locust
iron light
#

(personally I'm biased somewhat against "fancy data structures")

terse stream
#

just slap lru_cache and be done with it /jk

iron light
#

honestly that sometimes is all you need

mental locust
#

Dependency resolution algorithms often need to keep getting the same results over and over again, things like lru cache are your friend
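
As a sketch of why that helps, assuming a made-up `candidates_for` lookup and a fake in-memory index (neither comes from pip or packaging): repeated questions are answered from the cache, and the expensive body runs once per distinct argument.

```python
import functools

CALLS = 0  # counts how often the expensive body actually runs


@functools.lru_cache(maxsize=None)
def candidates_for(project: str) -> tuple:
    """Hypothetical expensive lookup; a real resolver would query an index here."""
    global CALLS
    CALLS += 1
    fake_index = {"anyio": ("4.0.0", "4.1.0"), "idna": ("3.6", "3.7")}
    return fake_index.get(project, ())


# A backtracking resolver asks the same question many times.
for _ in range(500):
    candidates_for("anyio")
    candidates_for("idna")

print(CALLS)  # 2
```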

mental locust
#

It's really annoying when you think you have a good idea for performance improvement and profiling it proves you're completely wrong

slow pagoda
#

Yes; I tried writing custom comparisons and getting rid of the comparison tuple that we compute and cache a while back, it was ~5x slower. 🙂

iron light
#

(just spitballing, what if it were the other way around? a subclass of some namedtuple, with properties for things that aren't in the tuple. then comparison doesn't have to retrieve a cached value; it already is the value)

#

(how often will the user do performance-critical things with a version that aren't comparison?)
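
A rough sketch of that idea, under heavy assumptions: the class name is hypothetical, only the epoch and release segments are handled, and the original spelling of the version is not preserved. The point is just that the instance is its own comparison key, so ordering falls through to plain tuple comparison.

```python
from typing import NamedTuple


class _Key(NamedTuple):
    epoch: int
    release: tuple


class TupleVersion(_Key):
    """Hypothetical: the instance *is* its own comparison key."""

    __slots__ = ()

    @classmethod
    def parse(cls, text: str) -> "TupleVersion":
        epoch_text, _, release_text = text.rpartition("!")
        release = [int(part) for part in release_text.split(".")]
        # Drop trailing zeros so that 1.0 and 1.0.0 compare equal.
        while len(release) > 1 and release[-1] == 0:
            release.pop()
        return cls(int(epoch_text or 0), tuple(release))

    @property
    def major(self) -> int:
        # Derived data lives in properties, not in the tuple itself.
        return self.release[0]
```

Whether this beats a cached key in practice would need measuring; pre-releases, post-releases, and local versions would all have to be encoded into the tuple as well.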

mental locust
teal pivot
mental locust
glad acorn
wide herald
mental locust
#

I don't see setuptools specified, are we inheriting it from the docker image?

#

actions/setup-python is pinned and last I checked they had an old version of setuptools installed and weren't planning to update it (because it causes more breakages than they can deal with)

glad acorn
#

I don't actually see anywhere tasks/check.py is run from?

wide herald
#

maybe it's some sort of legacy script?

glad acorn
mental locust
#

Its purpose seems to be to check that all PyPI versions are PEP 440 compliant, I think we should just delete it

sage marsh
#

Is that something I wrote

glad acorn
#

yep, old invoke stuff

sage marsh
#

That might have been when I was testing PEP440 rules vs pkg_resources to figure out what PEP440 should say to try and minimize things changing in interpretation between “old” and “new” (at the time)

#

If so, it’s probably fine to delete

#

Might be useful as a historical document? But even then I don’t think it was very interesting

#

Yea looking at it now. That’s exactly what it is

#

So it’s not even a useful script unless you’re using an old enough pkg_resources that wasn’t just itself using packaging, and even then, pretty much only useful in the specific context of “what if we changed pep 440 to accept X, how would that change how compatible we are to pkg_resources”

I doubt anyone besides me has ever even invoked it

#

Don’t ask me why I thought that was important enough to commit it 😅

ocean grove
glad acorn
#

5 years with increasing warnings: 2 years in docs -> 2 years of DeprecationWarning -> 1 year of UserWarning https://mastodon.social/@hugovk/116039913462961125

Setuptools deprecated pkg_resources in docs for ~two years, then with a DeprecationWarning for ~two years, then a UserWarning for ~one year.

2021-04: Deprecate in docs (v56.0.0)
github.com/pypa/setuptools/com…

2023-03: Officially deprecate with DeprecationWarning (v67.5.0)
github.com/pypa/setuptools/pul…

2025-05: Promote to UserWarning with earliest removal deadline of 2025-11-30 (v80.9.0)
github.com/pypa/setuptools/pul…

2026-02: Remove (v82.0.0)
github.com/pypa/setuptools/pul…

#Python #setuptools #pkg_resources

ocean grove
iron light
teal pivot
mental locust
slow pagoda
#

FWIW, I noticed that the repr of Version is <Version("1.2.3")>. The extra <> is odd, and keeps the repr from being able to reproduce the object again, which it otherwise could. The class name "Version" is also hardcoded. The <> is probably not worth changing, but it's a little odd (adding some docs and doctests made me notice that). Most classes that have manual reprs do that, except Infinity and the errors ("most" being about 5 classes: Version, Specifier{,Set}, Node, Marker). Usually <> indicates you can't reproduce the object by pasting the repr in (like if a memory location is present, or it's not complete). Tag uses <> the way I'd normally expect it to be used (though I'm not sure why it prints its memory address).

sage marsh
#

That feels like something I would have done

#

I don’t think I would have had a real reason for it

#

I might have just decided I didn’t like repr reproducing Version for some reason

#

Or I cargo cult’d it from somewhere

slow pagoda
#

All the examples in the stdlib I can think of print themselves without the <> if they can be reproduced (like collections.Counter(["a", "b"]) -> Counter({'a': 1, 'b': 1}), and <> only if they can't be, like object() -> <object object at 0x102fa8880>.
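
A short sketch of that convention. The `Pin` class is a made-up example, not anything in packaging; it also uses `type(self).__name__` so the class name is not hardcoded.

```python
from collections import Counter

counts = Counter(["a", "b"])
print(repr(counts))  # Counter({'a': 1, 'b': 1})
# The repr is valid Python, so evaluating it rebuilds an equal object.
assert eval(repr(counts)) == counts

# object() has no such spelling, so its repr is wrapped in angle brackets.
print(repr(object()))


class Pin:
    """Hypothetical value class with a round-trippable repr (no angle brackets)."""

    def __init__(self, version: str) -> None:
        self.version = version

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.version!r})"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Pin) and self.version == other.version


print(repr(Pin("1.2.3")))  # Pin('1.2.3')
```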

distant void
#

Is it not worth changing to drop <>? Even as the kind of person who might have projects which would be impacted, I'm just like "yeah, repr isn't something we should expect to be 100% stable across versions"

#

I would maybe check if pip's testsuite passes without it and call it a day if you want to remove it

slow pagoda
#

We can do that with our nox downstream job. 🙂

#

Which is in a PR, actually, I think

sage marsh
#

There’s a big chance I just did repr on a random object I had in a repl and copied it lol. Or I was mad about repr for some reason

#

Both pretty likely

distant void
#

I don't actually have much of a stake in this, FWIW. But I always consider repr to be an implementation detail and not part of public interfaces (other than it must not error and must produce a string). So to my mind you should be free to change it if you like.

slow pagoda
#

Is Node one of the first classes? ast is a place where they tend to show up, since generally all children are not included, so the repr can't reproduce correctly.

sage marsh
#

(This all to say that for whatever it’s worth, I can’t think of a reason not to change it besides if you care about compat)

slow pagoda
#

(You are bringing back vivid memories of how Python 2.7 broke the entire 5+ million line HEP stack many years ago because the repr of floats changed the number of digits displayed....)

#

I might take a stab at it after working on docs, see if others like it.

sage marsh
teal pivot
teal pivot
ocean grove
#

People shouldn’t rely on being able to eval reprs, but the convention is to use <…> when you don’t try to create valid-ish python code inside, e.g. <FooBar object with id=3>

So for consistency you should remove the <>.

slow pagoda
#

I'm wondering how many people show version objects in their doctests and how mad they will be if we break that. 🙂

sage marsh
#

one way to find out!

slow pagoda
#

I'll try it after I get the downstream job going. pip reports: FAILED tests/unit/test_vcs.py::test_ensure_svn_available , that's annoying in the unit tests. I think I can just skip it.

#

I wish I could get #hatch working in downstream tests, we have tended toward breaking hatch before 🙂

mental locust
slow pagoda
#

I see a check for CI, but I'm running in CI 🙂

#

It would be nice if there was a marker to use

#

I manually skipped that one test, looks like it's passing otherwise

#

setuptools takes 4 minutes, otherwise it looks pretty good. I'm just running pip's unit tests and running it in parallel, so it's 33 seconds which is nice.

#

Hopefully the unit tests include enough use of packaging 🙂

mental locust
#

When I have some time I'll look at what unit tests to make sure we are capturing all the places pip uses packaging

slow pagoda
#

1049 (ci: add downstream testing) is ready for review.

slow pagoda
#

Downstream tests pass with the repr change, btw. 😎 If there are doctests, those could fail, but they'd have to be outputting packaging reprs, which I'd assume isn't that common.

slow pagoda
#

@distant void do you have a new version of the dependency-groups PR using errors.py?

distant void
#

Not yet. I'm going to rebase and work on it this week. What kind of deadline should I think of for getting it in time for v26.1?

slow pagoda
#

I don't think we have a specific timeline. Before pip 26.1, which is ~2 months away. But could be less.

slow pagoda
ocean grove
distant void
#

I ran into the fact that pytest.raises(...).group_contains(...) expects that on py3.10 and lower, an exception group inherits from exceptiongroup.BaseExceptionGroup. I was about to start hacking up my tests to work around this, but I'm thinking more... Would it make sense to have packaging.errors.ExceptionGroup inherit from exceptiongroup.BaseExceptionGroup when that's available?

mental locust
#

I'd rather not add dependencies if we don't have to, but that's mostly me being annoyed at having to vendor them into pip 🙃

distant void
#

I tried it quickly and it doesn't quite work -- exceptiongroup.BaseExceptionGroup raises attribute errors when we try to assign message...

#

I was just thinking that this is potentially going to be annoying in different ways when downstream consumers start getting a packaging.errors.ExceptionGroup but it doesn't get treated (by pytest, at least) like a real exception group on certain pythons

#

I could take it up with pytest -- that they should allow their group helpers to be used on shim objects like the packaging one -- but if I were a maintainer over there I'd politely refuse that feature request.

#

I guess just no nice group helpers from pytest for people handling packaging exception groups, until we drop py3.10
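
To illustrate the situation, here is a minimal sketch of a version-conditional stand-in. The name `CompatExceptionGroup` and its body are assumptions made for this example, not the contents of `packaging.errors`.

```python
import sys

if sys.version_info >= (3, 11):
    # The builtin exists, so a library can simply re-export it.
    CompatExceptionGroup = ExceptionGroup
else:

    class CompatExceptionGroup(Exception):
        """Hypothetical stand-in exposing the same two public attributes."""

        def __init__(self, message, exceptions):
            super().__init__(message)
            self.message = message
            self.exceptions = tuple(exceptions)


group = CompatExceptionGroup("bad input", [ValueError("a"), TypeError("b")])
print(group.message, len(group.exceptions))  # bad input 2
```

On 3.10 and lower the stand-in is an ordinary `Exception` subclass, so any tool that checks `isinstance` against the `exceptiongroup` backport's classes will not treat it as a group.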

mental locust
#

Most uses of packaging are going to drop Python 3.10 pretty soon anyway

#

But I don't know what packaging's timeline is

distant void
#

Probably largely responsive to pip, no? pip still has 3.9 right now, IIRC

mental locust
#

Yeah, but packaging supports 3.8, I thought it was more of a "let's keep supporting until the CI breaks" policy

#

I'm pushing for pip to drop Python 3.9, I'll actually redo some statistics on that shortly

distant void
#

Given the way that requires-python works, I feel like dropping 3.9 is not so harmful to those users. I know you know this. Just feels like the burden of proof, once you've done one release after EOL, should be on anyone who wants pip to keep support, to show why it's important/beneficial.

mental locust
distant void
#

So all I need is a botnet to download 3.14 a bunch of times? Cool! 🫠

wide herald
#

Personally I am more and more a fan of SPEC0... Most people don't upgrade their pip anyway

distant void
#

SPEC0 doesn't fit everyone though. It's okay to be doing something different. e.g., At work we have some projects which support old Pythons because we're meeting our users where they're at, rather than where we'd like them to be.

glad acorn
mental locust
#

Yeah, I think pip should support at least all versions of Python the PSF supports, more restrictive than that doesn't really meet the goals of pip as a project

distant void
#

TIL that pip CI doesn't check pypy.

#

I was waiting for it to simply fade out, but maybe I should remove it from more places.

mental locust
#

Pip's functional tests are so slow to run it's a bit of a fight to add more distributions, I do want to add older bugfix versions of CPython though

distant void
#

You could do more versions but not on Windows. I see that pip CI is already there.

distant void
#

I just marked the dependency-groups PR ready for review. I think it's pretty okay. I'm happy to take feedback not only on implementation details but also on the API. Once it's in packaging it's going to be hard to adjust it.

#

(also the Windows pypy3.8 build failed, which feels weirdly topical)

mental locust
#

The pypy3.8 runner is flaky, I'll rerun it

mental locust
dawn parcel
dense yacht
#

I got really confused haha

dawn parcel
#

-# probably because I’ve looked it up before

mental locust
#

Yeah, it was my top result on Google also, but probably because Google hyper targets me and knows what I eat for breakfast

dense yacht
#

No "SPECO" vs "SPEC0"

mental locust
#

SPECO is what a technical writer names their DOGO

mint urchin
#

leggo my SPECO

#

I’m possibly dating myself

slow pagoda
#

I personally dislike SPEC 0. I really don't like that the scientific community is separating itself from the general Python community with a different support set. It came about (as NEP 29 originally) because the Python 2 official support timeline (10 years!) was too punishing for projects like numpy to support. As far as dropping Python versions, SPEC 0 is fine if you have a stable project (like numpy), but it can cause issues. The only thing that causes minimal issues is if everyone drops at the exact same time.

#

(An example of an issue is that a default install of matplotlib is now issuing warnings (which is an issue if you have warnings as errors in your tests, as you should) due to pyparsing releasing a 3.9+ release and matplotlib having dropped 3.9 and 3.10 already, maybe 3.11)

wide herald
#

I like SPEC 0 cause often people want to support old pythons for way too long and it blocks usage of new features and forces backport libs etc

#

And it is pretty much the only spec in that field that is widely used

void hare
mint urchin
#

I’m looking at packaging to add experimental support for the abi3.abi3t abi tag and I noticed that packaging doesn’t run nox tests on a free-threaded interpreter. Maybe it should?

wide herald
#

well, free-threaded interpreter wasn't stable until recently, so I guess it could run on 3.14t

mint urchin
#

there’s also a deprecation warning in pygments on 3.15 🙂

slow pagoda
#

Yes, that would be fine. 3.13t would be fine too, I think it was more that it wasn't likely to affect anything so it never came up as something to add.

slow pagoda
ocean grove
mint urchin
#

Is there a convention to use for branches in support of PEPs? I have some minor patches to the tag logic to support abi3.abi3t. It will only make sense to upstream if PEP 803 gets accepted. Should I open a draft PR or something?

ocean grove
slow pagoda
#

Draft PRs are fine. I think that's pretty common in support of PEPs.

polar nimbus
#

The initial draft of PEP 825 (foundations of Wheel Variants) has been merged. We'd appreciate your feedback here or on DPO (the thread is kinda silent so far, hopefully a good sign).

opal notch
#

@heady prism fyi we changed it to having the variants information inline in lockfile; we currently have to assume mutability of variants.json, which gets updated as new files are uploaded, plus this avoids the extra network request

heady prism
opal notch
#

I will take a look but that was one of

mint urchin
iron light
#

btw, are there performance tests in the repo now (especially for the Version stuff), or what exactly are the key test cases?

slow pagoda
#

FYI, @teal pivot (or maybe @mental locust if you have OSS time), #409, #1102 (replaces #470), #795, #944, and #1065 are ready. #1049, #1059 could use a review too. Since #944 isn't mine and I didn't push to it, I can merge that eventually, but since it's new API, would be nice to get one more set of eyes on it.

iron light
#

... Is sorting/comparison the only place where performance matters for Versions?

...Also, I'm sure I'm missing tons of history, but does there really need to be a _BaseVersion? Are clients meant to subclass that, despite the leading underscore?

slow pagoda
#

That subclass is used by packaging_legacy, I did look at removing it once. Pretty sure I ran into something else when trying to remove it, don't remember what though. Maybe it's just that there's a test that tests it directly. 🙂

#

I'd say the most important place performance matters is in Version creation, since you might create a lot of them.

mental locust
#

Yeah, there are a lot of places when doing installation where you might just construct a version and then never do anything with it, like in the simple API or wheel filename

iron light
#

Ah... does it need to be validated up front?

#

(although, deferring it would probably be easier in the calling code...)

#

Also, it seems the only place _TrimmedRelease now gets used is to implement utils.canonicalize_version... ?

#

I... have some ideas, but it would be a pretty major re-architecture

mental locust
iron light
#

ok, that's simpler for my plan then actually

mental locust
#

Ever since looking at that benchmark where complex specifiers were very slow I've been thinking about it, I think I've found a way to improve the speed by N times, where N is the number of specifiers

iron light
#

oh?

mental locust
#

Yeah, SpecifierSet.filter creates an iterable of Specifier.filter results, themselves iterables, with no awareness of the work they are each doing. The loop can be unrolled and a negligible amount of pre-work can be done to save doing work in the specifiers.
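
A simplified sketch of the two shapes being contrasted, assuming each specifier can be reduced to a plain predicate on a version tuple. The function names are hypothetical, this is not packaging's code, and the "pre-work" mentioned above is not modeled.

```python
def filter_chained(predicates, candidates):
    """One lazy filter per predicate, each feeding the next."""
    stream = iter(candidates)
    for predicate in predicates:
        stream = filter(predicate, stream)
    return stream


def filter_single_pass(predicates, candidates):
    """One loop that checks every predicate against each candidate."""
    predicates = tuple(predicates)
    for candidate in candidates:
        if all(predicate(candidate) for predicate in predicates):
            yield candidate


# Stand-ins for ">=1.0" and "<2.0" acting on release tuples.
checks = [lambda v: v >= (1, 0), lambda v: v < (2, 0)]
versions = [(0, 9), (1, 0), (1, 5), (2, 0)]
print(list(filter_chained(checks, versions)))      # [(1, 0), (1, 5)]
print(list(filter_single_pass(checks, versions)))  # [(1, 0), (1, 5)]
```

Both are lazy and give the same result; the single-pass form simply avoids stacking one iterator per specifier.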

#

I might have to split across a few PRs, but shaping up to be another big improvement in performance for 26.1

distant void
#

Is the regex matching in specifiers costly, or not worth optimizing? I'm looking at it (I wanted to see if I could find the perf gain; I could not) and have the idea of splitting it into two branches, depending on whether or not arbitrary equality is present. Substring matching is fast enough that it ought to be negligible to check if "===" in string. I might try this out to see if I can get those ASV benchmarks running locally

mental locust
#

Yeah, there's almost certainly some significant performance gain to be had there, be it reasoned improvements or fast pathing. The only problem with fast pathing is the check itself has to be really fast, or the overhead of checking eats up all the gains.

distant void
#

I'll see if I can get a result with if "===" not in thing: regex = the_good_one . It seems fun.
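
A minimal sketch of that branching idea. Both regexes are hypothetical and far looser than the real specifier grammar; the only point is the cheap substring check that selects the pattern without the `===` alternative in the common case.

```python
import re

_OPS = r"~=|==|!=|<=|>=|<|>"
# Hypothetical patterns; the real specifier grammar is considerably stricter.
_FULL = re.compile(rf"\s*(?P<op>===|{_OPS})\s*(?P<version>\S+)\s*")
_COMMON = re.compile(rf"\s*(?P<op>{_OPS})\s*(?P<version>[0-9A-Za-z.*+!_-]+)\s*")


def split_specifier(text: str):
    """Return (operator, version), picking the cheaper pattern when possible."""
    pattern = _COMMON if "===" not in text else _FULL
    match = pattern.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid specifier: {text!r}")
    return match.group("op"), match.group("version")


print(split_specifier(">=1.0"))           # ('>=', '1.0')
print(split_specifier("===weird!build"))  # ('===', 'weird!build')
```

Whether this wins anything depends on how much the extra alternative costs the regex engine, which is exactly what the benchmarks would have to show.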

mental locust
slow pagoda
#

Isn't === extremely rare? It's only a compat thing for classic versions?

sage marsh
#

It’s possible nobody has ever used it tbh lol

distant void
#

Yeah, that's why I think that it could be worthwhile to run a regex without it?

sage marsh
#

It existed entirely because we couldn’t represent some versions that existed

#

I think one of the TZ packages was the main offender

distant void
#

Is there an easy way to run asv compare (from the branch with the benchmarks) to compare HEAD vs HEAD~1? Last time I experimented with asv I found, as I am finding today, that the CLI confuses me a bit

#

(Got it. I hadn't done asv run correctly...)

#

My change is either noise or worse than baseline. 🥲
On the bright side, I'm now set up to be able to look for improvements.

mental locust
#

Agreed asv is confusing, I keep calling it wrong

distant void
#

Of course it's pytorch related. I think pytorch exists solely for the purpose of trolling pypa people 😆

mental locust
#

Well, they're not using epochs yet at least

sage marsh
#

Not sure I understand what that guy is saying, or my memory is wrong

#

Aren’t local versions supposed to match ==

#

Oh they want the non local version

mental locust
#

I didn't read it in detail, I assumed the issue was ==1.0 matches a local version but ===1.0 doesn't

sage marsh
#

I was reading it backwards

distant void
#

I think that was a user with a custom index, where they have built versions of pytorch? And they wanted to use === to select the right one?

sage marsh
#

It seemed like a good idea at the time 🙃

mental locust
#

I might make a change to packaging that might break their workflow. Currently in packaging === normalizes valid versions even though the spec is pretty clear it shouldn't, so ===1.0 matches 1.0.0. I'm not sure whether to fix that and make it spec compliant and risk breaking users with unexpected uses of ===

iron light
#

it'd be a lot easier to figure out what breaks (or fixes) those users if we had any idea who they were 🙃

mental locust
#

I've found the cause for the huge variance in SpecifierSet filtering when running benchmarks

sage marsh
#

It's neat watching y'all pick apart the version and specifier stuff, interesting to see some of the stuff I wrote years ago get made better 😄 (I mean, I presume it's changed at least somewhat since I wrote it, but I recognize the general shape still 😛 )

distant void
#

The amount of picking apart I've done -- which is not much -- has been very fun! But not very productive. 😅

mental locust
# sage marsh It's neat watching y'all pick apart the version and specifier stuff, interesting...

Now that packaging is widely used and users expect complex dependencies to "just work", you can really see how small tweaks or API changes make big performance impacts. I was running some pip benchmarking last night on a ~300 transitive dependency install and even with the significant improvements in both packaging 26.0 and pip 26.0 to stop needlessly making new Version objects and to make Version objects much more efficient, Version is still constructed ~225K times, causing ~1.2M memory allocations.

#

In pip 25.3 I believe that same resolution would have constructed Version ~2-3M times

distant void
#

It's not that often that you get to do optimization work where improvements of 1 or 2 percent are still high impact. I'm going to keep sniffing around for stuff I might find, but probably not this weekend

mental locust
#

Agreed, it scratches a particular itch to be able to think about this kind of performance problem in pure Python

distant void
#

The fact that "write a dedicated parser in C or Rust" is not an option has definitely been on my mind

mental locust
#

Also the library boundary layer adds an interesting difficulty: were packaging code all in pip, the simplest solution to the Version construction cost problems would be to add a global Version cache, but a global cache doesn't make much sense on the packaging side, and a lot of the Version construction places are inaccessible to pip, they happen inside methods or functions that packaging owns.

distant void
#

Could we add a new construct, e.g. CachedVersionParser? That would be fun to design if suitable.

mental locust
#

I don't think it needs a new construct, the issue is packaging needs some global flag that tells it whether to cache versions or not, I don't know what that would look like

distant void
#

When I'm not on my phone, I'll look at how the CPython multiprocessing context works. That's a global setting with a nice API

mental locust
slow pagoda
#

If CPython had a couple of utility functions written in C for us, we could get big performance gains. 😉 "split by character and apply int" would be really handy.

#

Yes, some way to toggle using a global cache would make a lot of sense. Might be better single threaded than in multithreaded, though. I've thought about moving def parse over to cache, since otherwise it's literally just a function wrapping Version. But it wouldn't help all the other ways versions get created via specifiers and such.

wide herald
mental locust
#

Maybe mypyc is mature enough for a library like packaging now? Won't help pip, but would help pdm, hatch, etc.

slow pagoda
#

Some users embed packaging, distribution is harder, etc. I’d use Rust I think, especially if you could use stable ABI

wide herald
#

I mean, it could be a separate package, offering the utils needed

#

so embedding wouldn't be impacted

#

btw, anyone knows if there was ever a plan or effort to merge distlib and packaging? they seem to overlap in many places

opal notch
#

i wish python was more ambitious about performance instead of pushing people into other languages or DSLs with very limited features when they need performance

hidden condor
mental locust
# opal notch i wish python was more ambitious about performance instead of pushing people int...

The resolver loop benchmark in the packaging CI we just added becomes about twice as fast between Python 3.9 and Python 3.13, that's not nothing. I think the Faster CPython project showed that while there's some low hanging fruit it's a non-trivial problem and needs dedicated and skilled people working on it full time. And even when improvements do come the benefits are muted because the ecosystem largely outsources performance critical code.

sage marsh
mental locust
#

I think currently pip just relies on entry point mechanics from distlib, but as packaging is currently intended to be sans-IO that stuff can't move into packaging

#

So, I don't see how the libraries can merge

#

Perhaps someone could build a packaging-io that depends on packaging and implements all the IO parts of the standards, but it doesn't seem like there's a lot of enthusiasm to write these kinds of package shuffling mechanics, it's a lot of headache and not much reward

finite temple
#

Is it stated somewhere that packaging is meant to be sans-io? It does call subprocess for musllinux detection

#

(not saying that's incorrect but curious if there is a policy I could look at)

mental locust
#

More an aspiration by current core maintainers than something documented I think

iron light
iron light
#

ah you addressed the caching

#

but I guess if pip expects that creating version objects is expensive enough to care, it could do the caching on its end?

mental locust
#

I believe on that resolution there are way over 100k distribution URLs that need to be processed, each with a version in the filename, and if selected a version in distribution metadata

iron light
#

Oof.

#

The ecosystem continues to exceed my expectations.

mental locust
#

I seem to remember that caching Versions in packaging reduced peak memory by ~10 MBs, so from about 180 MBs to 170 MBs after the memory leak fix

distant void
#

I took a few minutes at my desk to try to do a better job of measuring my regex tweaks. I think it's actually showing an improvement on 3.13 and 3.14 (for the relevant benchmark), but it's below the ASV cutoff of 10% -- which strikes me as pretty conservative. If anyone has the time to take a look, I'd be very curious to know if my results are reproducible.

#

I also wonder if that asv continuous job should have a threshold lower than 10%? My instinct would be 5% because then I can pretend I'm a scientist and say I'm "p-hacking"!

mental locust
#

I replied to the PR

distant void
#

Oh, is the 10% default just for the compare command? I like that this tool has so many knobs and whistles, but I do get mighty confused (as I was saying yesterday)

#

Wait... If you set it to 5% , wouldn't the run on 3.13 which came in at 0.93 show up?
Maybe I should just wait for Henry to tell me what I did wrong 😅

mental locust
#

I've not looked into how asv counts threshold, it could be 5% both ways?

#

I just picked it because the docs said it was the default and the previous config was too noisy locally

mint urchin
#

yeah. it’s 5% by default

#

asv has lots of room for improvement btw, it’s kind of a mess of a codebase IMO

#

i sort of want someone to write a new thing that combines a benchmark runner that runs a single benchmark with an orchestrator that can do comparisons across versions and generate graphs

#

asv conflates those two purposes and the internals are more complicated than they need to be because of it

mental locust
#

I believe it, it's missing at least a couple of things I'd want out of a benchmarker, and steps are pretty confusing about what exactly they do, even after reading the docs

finite temple
# slow pagoda If CPython had a couple of utility functions written in C for us, we could get b...

I think things like this might be a hard sell, but one alternative which I am curious what people here think would be could we put some very stable parts of the packaging spec into CPython? For example, the version specification has been extremely stable since PEP 440 was accepted, other than a few errata. It seems like having a native implementation of version parsing in CPython would make a lot of sense to me if it is a significant bottleneck for pip. Similarly, metadata parsing is already done in importlib.metadata, but having a standalone module based on packaging.metadata that parses email metadata would also make sense.

#

@teal pivot I'd be curious what you think

slow pagoda
#

email metadata might get replaced with JSON metadata (https://peps.python.org/pep-0819). But I think a importlib.version (with some C helper functions that we could also use!) might make sense. Version parsing happens to also be really common for third-party code, like if you need to use the results of importlib.metadata.version.

#

The core idea in asv is great, but yes, it's missing a lot from a software engineering perspective for modern Python packaging. Would be great to see someone work on it if they are interested in improvements.

mental locust
#

While Version has historically been very stable (to the point where 3rd parties were happily relying on its internal implementation details) we have recently been innovating the internals and external API, I do wonder how that would work in the standard library

slow pagoda
#

I don't really trust its thresholds that much, that's why I turn off the "only show significant" values feature and always look at the whole output.

#

We haven't done much with the external API, other than deprecating accessing some internals.

#

Yes, the "benchmarker" and "runner" should ideally be separate. There's also pytest-benchmark and something that's API similar (used by one of the online services), but that doesn't have all the runner features of ASV.

#

If we got the chance to re-write Version into a new place, I'd make things like .base_version return a version. 🙂

#

I wonder if this could be something for the "optional Rust components" that has been proposed in a discussion / at the language summit. String parsing and regex in Rust is good.

finite temple
mental locust
#

Explaining that Python package wheels use RFC-5322 email headers for their metadata always gets a good laugh at tech talks anyway

finite temple
#

Well, I'd be remiss if I didn't point out there is not a specific RFC that Python metadata uses

#

(per the spec)

#

So it's actually worse than using RFC-5322 headers

finite temple
# mental locust While `Version` has historically been very stable (to the point of where 3rd par...

I think it is OK for the internal implementation to change, if the API and functionality is stable, that's what's important. I could imagine an implementation where a Python implementation of version parsing opportunistically uses stdlib implementations for performance sensitive parts but can fall back to a pure-Python implementation. Then the Python code in the stdlib could be either backported in packaging or packaging could be synced into the stdlib like is done with importlib.resources

#

hopefully that makes sense?

mental locust
#

Yeah it makes sense, and I see the advantages, I've just never really touched the standard library and had to deal with how it works

finite temple
#

Totally fair, if there's consensus it would be a good idea, I'm happy to work with folks to propose this to the standard library

slow pagoda
#

What I would do is have a very similar Version but with legacy bits changed/removed. It would have a parse_version_parts which would produce all of the parts from a string. That function could have a compiled implementation, and we could use that function if it exists for our Version.
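
A rough sketch of that shape, for illustration only. `_version_accel` is a hypothetical compiled module name (nothing by that name is assumed to exist), and the pure-Python fallback here handles only an optional epoch plus a dotted release, not full PEP 440:

```python
import re

try:
    # Hypothetical compiled helper; the module name is an assumption.
    from _version_accel import parse_version_parts
except ImportError:
    _SIMPLE = re.compile(r"^(?:(\d+)!)?(\d+(?:\.\d+)*)$")

    def parse_version_parts(text):
        """Pure-Python fallback: return (epoch, release) for simple versions."""
        match = _SIMPLE.match(text.strip())
        if match is None:
            raise ValueError(f"unsupported version in this sketch: {text!r}")
        epoch = int(match.group(1) or 0)
        release = tuple(int(part) for part in match.group(2).split("."))
        return epoch, release
```

Callers only ever see `parse_version_parts`, so whether the compiled or the fallback version is in use stays an import-time detail.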

finite temple
#

Yeah I think that could make sense

robust hollow
#

And please include legacy version parsing for those of us who still have to parse, compare, and serve old version strings 🧓

mental locust
#

I think the prior here is that modern versions follow a spec that can be agreed upon and validated

mental locust
#

Implementing this cost-based ordering for specifier filtering made me realize just how expensive wildcard operators are compared to all other operators, so now I have an optimization for them as well, will have to wait till at least tomorrow before I make the PR

finite temple
#

Maybe if folks will be at PyConUS we can discuss a path for integrating parts of packaging into CPython there?

glad acorn
#

I reckon a tightly scoped PEP, showing clear benefit to key tools, plus the ecosystem more broadly, could even make 3.15 (feature freeze in 2 months)

mental locust
#

I'll be around at PyConUS and happy to talk

hoary pecan
finite temple
#

And that just parses the fields into strings

hoary pecan
#

a lot of them, yes

#

it parses METADATA, PKG-INFO, entry-points.txt, RECORD, SOURCES.txt, requires.txt, and probably some other stuff I am not remembering

finite temple
#

Ah yeah I see that now, interesting. It doesn't parse things like version strings as best I can tell though, so those would need to be added

iron light
#

(using the actual email module for parsing that metadata still rubs me the wrong way)

sage marsh
#

Is there anything besides version parsing that’s currently a bottleneck that would be useful?

#

The annoying part being finding a place to put it 🙁

mental locust
#

Specifier filtering is fiendishly complex and slow. I'm currently working on a massive overhaul of the internals

#

But it definitely looks like it's going to be a trade off, when calling SpecifierSet.contains for a simple specifier on a single version only once it will be a bit slower, everything else will range from a bit faster to significantly faster

slow pagoda
#

I don't think specifiers are stable enough yet to want to put them in, but versions probably are (plus there's a good argument for why those would be useful to have in std). It would be interesting to make a little demo "accelerator" library and measure how much having some compiled helpers would affect performance.

finite temple
#

Yeah I might look at that later this week

hoary pecan
mental locust
#

Wrt a C implementation of Version, it's actually a very complex object so I do wonder if it would actually speed up things if naively implemented. uv has a very clever implementation where the most common versions forms are stored as 64 bit int, and so equality and ordering are super fast because they are just integer equality and ordering. And then there's a fallback big form that more widely supports PEP 440 edge cases.

#

I have sometimes wondered if this approach could speed up a Pure Python implementation, seems unlikely but I might play around with it now we have a benchmark tool.

iron light
#

FWIW, microbenchmarked on 3.12:

$ python -m timeit --setup 'x = (1<<16) + (2<<8) + 3; y = (1<<16) + (2<<8) + 4' 'x < y'
20000000 loops, best of 5: 17.3 nsec per loop
$ python -m timeit --setup 'x = (1, 2, 3); y = (1, 2, 4)' 'x < y'
5000000 loops, best of 5: 45.6 nsec per loop

but you'd need to check if both versions are compatible with the scheme...

#
$ python -m timeit --setup 'x = (True, ((1<<16) + (2<<8) + 3), (1,2,3)); y = (True, ((1<<16) + (2<<8) + 4), (1,2,4))' '(x[1] < y[1]) if (x[0] and y[0]) else (x[2] < y[2])'
5000000 loops, best of 5: 45.2 nsec per loop

It still seems to have potential, given that the non-optimistic-case comparison will be harder than that
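
A minimal sketch of the int fast path with a tuple fallback, under stated assumptions: it covers release segments only (no epoch, pre, post, or dev parts), the 16-bit by 4-segment layout is an arbitrary choice for illustration, and it is not uv's actual encoding. `pack_release` and `release_less_than` are hypothetical names:

```python
def pack_release(release, bits=16, width=4):
    """Pack up to `width` release segments into one int, or return None."""
    if len(release) > width or any(seg < 0 or seg >= (1 << bits) for seg in release):
        return None  # does not fit the compact scheme
    packed = 0
    # Missing trailing segments are treated as 0, so 1.0 and 1.0.0 pack equally.
    for seg in tuple(release) + (0,) * (width - len(release)):
        packed = (packed << bits) | seg
    return packed


def release_less_than(a, b):
    """Compare two release tuples, using the packed ints when both fit."""
    packed_a, packed_b = pack_release(a), pack_release(b)
    if packed_a is not None and packed_b is not None:
        return packed_a < packed_b
    # Fallback: zero-pad to equal length, then compare as tuples.
    size = max(len(a), len(b))
    return tuple(a) + (0,) * (size - len(a)) < tuple(b) + (0,) * (size - len(b))
```

In a real class the packed value would be computed once at construction, so the comparison itself would be the single int compare timed above.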

#

(as I think about possible implementations... ha! maybe BaseVersion is important after all!)

mental locust
#

Did a bunch of analysis, there are some small savings that can be made but most of the time in version comparison is dominated by ver.__lt__ dispatch

slow pagoda
#

I was thinking it would still be a Python object, but there would be a couple of functions compiled for specific operations.

slow pagoda
#

From playing around a bit with Rust with the stable ABI, using AI, I think I can get around 5-7x faster with a complex version. That doesn't build the version object, though, so it might be overestimating. For comparisons, I can do better (2x-5x or so) than computing the key every time, but it ranges from 12% slower to 80% faster than comparing once the key is made. With more hand-crafting, maybe that can be made better. Using this, you never need to make the cached key, though you'd want to store the produced tuple, so there might be an access cost.

#

You also might be able to make a new base object with a minimal set of compiled methods, and build off of that, rather than a tuple, this design is probably easier to make optional, though.

amber gust
#

It seems counterintuitive to put stuff like canonicalize_name into a module called filenames, since it isn't directly related to filenames.

teal pivot
# finite temple @teal pivot I'd be curious what you think

It's a possibility, but I have not put much thought into it. The tricky bit is when is something stable enough to go into the stdlib and will it upset anyone (e.g. conda). Plus, I know the question, "why can't packaging ship its own accelerator code?" will come up (and I know the reason, but you have to convince the SC it's legitimate)

finite temple
slow pagoda
mental locust
#

I have three PRs coming to speed up the performance for hashing Versions, which is actually a big bottleneck for pip

#

That said, SpecifierSet.filter(..., key=...) will remove one of those bottlenecks, so I don't know what the real world impact will end up looking like

mental locust
#

I think I'm going to want to keep adding more benchmarks over time, but they're already significantly the longest CI step.

Would it make sense to run each Python version in its own job and then get a bot to post the concatenated results to the PR? With some kind of summary/detail layout and maybe a cool down to not spam the thread

slow pagoda
#

I also want it to remain reasonable to rerun all history and get a graph

#

Yes, that's possible, though.

#

I think summaries get merged if they are part of the same workflow

slow pagoda
dense yacht
#

You could use our version crate if helpful, it's quite stable

#

(and very optimized)

finite temple
#

Yeah, if we had Rust in CPython today I'd say we could use uv's implementation which has already been battle tested

#

Unfortunately, I don't think the current plan would allow such a module until 3.17

#

And it would still require a C implementation

iron light
#

Re #1116, I presume that the changes to the test cases are incidental to the actual fix? (Also something like this did occur to me but for whatever reason I didn't mention it...)

I have something more radical in mind for the Version type that I might work out over the weekend, when I can get myself in a mental state to work on someone else's code and run their tests

#

(In the long run it'll be nice to have access to Rust in CPython, but for something like this I think porting it to C is more realistic for getting it in faster. But something like this also has to be vendored into pip and I'm not sure how I feel about platform-specific wheels for pip)

#

(not that my opinion particularly matters there anyway, but)

solar heron
#

We have no plans to begin shipping platform-specific wheels for pip.

slow pagoda
#

I’ve been meaning to test that version crate, I could just wrap it in a Python interface to match. Not sure it’s as useful for a proposal but might be a good upper bound on perf.

mental locust
#

I almost have this re-implementation of Specifier logic using intervals done, I don't have a public API yet and I don't plan to design one yet.

To make both the internal machinery performant and the public API consistent and somewhat understandable, they basically can't be the same objects, we'll have to have an API boundary layer that converts internal component to public objects, and if there's ever a need, public objects to internal components

slow pagoda
#

I've tried asking AI to optimize the branch (copilot in VSCode auto model); it keeps making all the other paths faster (complex is down to 0.06x); it was able to get the cost down to 1.16 cold/1.23 warm after quite a few iterations and guidance (it doesn't like that you have to commit the changes to run the comparison benchmarks). Compared to the current status of the branch, it's about 0.7x less time. Have not looked at the code really, as I was doing other things, but if I can check it a bit I'll drop it as a diff on the PR and you can see if there's any good ideas that can be pulled from there. I've also asked it to summarize the changes.

#

Maybe some of those ideas will be useful. It would be nice if we could get this close enough to previous performance to avoid anything more complex, like switching implementations based on complexity, but simple specifiers are quite common so even 1.2 is likely too much.

mental locust
#

The branch isn't intended to be performant, I know I can get the cost down to that level, I already had it a lot faster and then simplified it and took it in the direction of a public API

#

Once the mechanics land for is_unsatisfiable then the diff will be a lot smaller and I planned to start working on performance

#

For very simple specifiers we can avoid building intervals if they're not already built from some other method, and then performance will be 1x

slow pagoda
#

I thought that it would be easier if it's a single implementation, but if it doesn't need to be, then there's not a worry (other than judging when to switch impls)

mental locust
#

I don't think that will be possible if we want to avoid any performance loss

slow pagoda
#

That's why I was seeing how much faster I could easily make it 🙂

mental locust
#

Building an interval is just more expensive than executing an operator on a single version

#

Well, a simple operator anyway, some operators are actually more expensive than building an interval

#

(Sorry if I'm coming across a little blunt tonight, not feeling well, and not sure I'm modulating my tone correctly)

slow pagoda
#

No, you are not. 🙂

#

Hope you feel better soon!

#

(not coming across blunt, that is, guess I need to specify which part of that I'm responding to)

mental locust
#

I strongly recommend you get the free Claude OSS subscription if you can, if you throw Claude Code Opus 4.6 against it, you can easily explain to it how to use asv, it should give you much better and real results than copilot does

slow pagoda
#

I haven't had a chance to properly look at it. I believe is_unsatisfiable would allow the questions I'd like to ask of SpecifierSets to be possible, by taking python_requires and joining it with something like ==3.9.* or >=3.14, etc, then asking if it is satisfiable.

#

I've got access to Claude Opus 4.6 in copilot. I left it on auto, and it picked GPT 5.4 this time, which seemed fine. It was running asv just fine, it just needs me to commit it before it runs (I could probably ask it to commit, but I'd rather control that)

mental locust
#

I have copilot also, it's just not as efficient as running it via Claude Code and the results are not as good, and the tooling and explanations just seem to work better

slow pagoda
#

I'll have to try it sometime then! I've heard quite a bit about it, and one of the RSEs I work with has the max subscription (actually two, one for work and one for personal, the personal one is the highest level and the work one is the level below it). I'm still in baby steps compared to that. I wouldn't know how to keep an agent(s) busy enough for that!

mental locust
#

I definitely don't think I've ever gone over 15% limit ¯\_(ツ)_/¯

slow pagoda
#

I'll probably merge direct_url and dependency_groups soonish if there are no last minute reviews.

#

Claude OSS: 5K stars or 1M+ NPM downloads. Little disappointed it's only NPM not PyPI. 🙂

mental locust
#

I'm sure you can put in a comment about packaging's PyPI downloads, it took a couple of weeks to go through btw

slow pagoda
#

Packaging only has 716 stars. 🙂 (pybind11 is 17k, so I'm okay)

#

Star counts are a really weird measure of popularity. It’s basically a measure of how flashy it is, not how many people use it.

sage marsh
#

the curse of being a library that important things use but less random people would have a need for 😛

slow pagoda
#

I personally often use stars as a way to remember something that I want to look at later, I wouldn’t have any trouble remembering something like “packaging” 🙂

sage marsh
#

I assume they're just trying to prevent someone from throwing up a simple printer of lists with a BSD license and asking for their free claude which is... a hard problem to do generically

#

maybe they could ask AI to do it for them

#

(the simple printer of lists reference might be dating me badly)

slow pagoda
#

I never think to ask students to star my teaching repos, I saw somebody teaching doing that and they had I think 3000 or so. That was a couple years ago, so they might be over 5000 now. That would have a new use now. 🙂

#

Haha, found it, 6.8k now 🙂 (slightly more years ago than I thought, 2019)

glad acorn
#

“Don't quite fit the criteria? If you maintain something the ecosystem quietly depends on, apply anyway and tell us about it.”

wide herald
distant void
#

So I've been thinking about the conversation from last week (? week before last?) on and off, about how to expose some caching of Version objects, and maybe other things, as part of packaging. At the time, I mentioned the multiprocessing context as a nice API. I've looked at that API, and I have some vague ideas about how it could be applied, but I'd need to play with it a lot. multiprocessing re-exposes the entire surface area of that module as surface area on the context object. All of the module level methods are just shims for def do_thing(): return default_context.do_thing().

Would this be interesting to pursue for packaging? And if so, at what level: is it a context API per packaging module -- not all modules -- or one giant top-level context API?

#

In principle, I love the idea of allowing users to do something like...

version_context = packaging.version.CachingContext(default_max_size=16_000)
parsed = version_context.parse("4.0.0dev2")
mental locust
#

My idea was that there was some context object that allows you to set a cache for different objects, and then in the __new__ method for the objects they would check whether to look up the cache, so the API in general wouldn't change, it would just be set context and forget

distant void
#

So it would feel more like a decimal context? Set it and you've "set state for the process"?

mental locust
#

I guess, I've never spent a lot of time with either of these. My priorities would be to keep it simple and backwards compatible, you want this to work with code you don't control (e.g. dependencies), and it doesn't have its own noticeable performance overhead.

distant void
#

Yeah, I'm onboard with those goals. I wasn't sure if transparently changing the behaviors of packaging as used in a dependency would be considered good (applications get control) or bad (action at a distance).

#

I hadn't thought of intercepting things at __new__. I may give it a shot.

slow pagoda
#

It also can't interfere with multithreading in free-threading performance, or break on pyodide, etc. 🙂

distant void
#

I was kind of hoping that I could skate by with it being managed in contextvars. I'm an optimist at heart. 😂

#

I'm prioritizing performance in a single thread, since I think that will have the highest impact for pip.

mental locust
#

Yeah, when I said it shouldn't have a noticeable overhead, I meant when it's not enabled. It might be unavoidable that an extra lookup will cost some extra time when it misses; that's a choice the user of the library will have to make

distant void
#

I played around with a custom __new__, using a dict cache set in a contextvar. First quick run timing under 3.14 is very good with the cache set:

no cache | cache   | ratio | benchmark
1.80±0ms | 621±2μs | 0.35  | version.TimeVersionSuite.time_constructor

So this works! But I'm not sure if you'll like the API I've put together here. I'm going to toss it up in a draft PR for discussion.

mental locust
#

Can but discuss !

distant void
#

Hrm. Locally I only saw a 10-20% penalty with the cache disabled, which was "high but maybe acceptable", but in CI the benchmark shows as much as 50%+ , which is, by my reading, just way too big. It's late here; I'm putting it down for now. But the idea needs to be reworked to be viable.

mental locust
#

We need to spend a bit of time looking at and figuring out scenarios

#

Maybe this is or isn't the right approach, we can figure it out

iron light
# distant void I played around with a custom `__new__`, using a dict cache set in a contextvar....

ah, my general idea was to use custom __new__ to choose either an int-based subclass (where possible) or a tuple-based one
and have the base data fields be directly comparable within the classes (and give the int one a conversion back to tuple), and rework the @propertys to restore the original interface. you don't need to cache a sort key computed in terms of other stuff, if you instead compute the other stuff in terms of the sort key, and have the object already be the sort key.

distant void
#

What I wanted to do was relatively unsophisticated -- just support caching in a MutableMapping[str, Version] (typically dict). You can see what I tried over here. Most of CI is mad because I didn't do any new tests yet (so coverage < 100%), but the benchmark job shows that just checking the contextvar (to see if a cache is available) dominates.

#

I'm sort of tempted to store a separate Version._cache_enabled bool, to see if eliminating the call to ContextVar.get() brings this back to a reasonable place. I still like the contextvar but it's just too costly used this way.

iron light
#

I think what I describe is relatively unsophisticated, but probably a lot more disruptive / more potential to mess something up

distant void
#

I got part-way through adding a bool and realized that it runs into trouble for the exact reason that I wanted to do it in a contextvar. Thread safety. 🤦
If the cache is per-thread then you need to track it being enabled/disabled per thread.

mental locust
#

I'm not very familiar with threading code, can't we have a global cache rather than a per thread, then the bool can be global, and we can have a fast path reading the global cache, and a slow path writing to it with a thread lock?

distant void
#

Yeah, a global cache works. We can even do it without locking overhead if we prefer the possibility that you get two matching Version objects constructed in parallel (and "last one wins" for who goes into the cache)
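
Roughly what that lock-free variant could look like, as a sketch only. `set_cache` and `cached` are hypothetical helpers, and it leans on single-key dict reads and writes being safe to race on, which is an assumption worth re-checking for free-threaded builds:

```python
_cache = None  # module-level; None means caching is off


def set_cache(mapping):
    """Install a dict-like cache, or None to disable caching."""
    global _cache
    _cache = mapping


def cached(text, factory):
    """Return factory(text), reusing a cached object when caching is on."""
    cache = _cache  # read the global once so a concurrent toggle cannot split the logic
    if cache is None:
        return factory(text)
    value = cache.get(text)
    if value is None:
        value = factory(text)
        # Two threads may both reach this line for the same text;
        # the last write wins and the objects are equal anyway.
        cache[text] = value
    return value
```

The disabled path costs one global read and one `is None` check, which is the part that has to stay cheap.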

mental locust
#

Oh yeah, that's true

distant void
#

I'm just all pouty because my clever solution doesn't work. 😆

mental locust
#

Ahaha

distant void
#

I made another attempt, but I'm not really confident that it's satisfactory...

slow pagoda
#

Should we require full SHA action pinning in settings? Also, do we want discussions and projects? Those are currently empty tabs.

#

Is there a way to validate pre-commit SHA #frozen tag in a pre-commit file? Similar to zizmor for GHA? (Validate that the SHA is also the tag in that repo)

mental locust
#

pre-commit config is very underdeveloped, I assume prek does, or will, allow us to specify a requirements file with hashes or a pylock.toml file, and then we use dependabot or uv-pre-commit to update that file over time. But it would almost certainly mean we can no longer use pre-commit and must use prek.

slow pagoda
#

They both support --freeze, I'm just wondering if we can validate those. An attack vector is to fork a repo, then make a PR with your hash and a fake tag listed in the human readable comment. Github can resolve hashes across repos.

#

(For the dependencies within the environment, yes, those are unpinned)

#

pre-commit/prek support many languages, which is why pre-commit won't add python specific locking support. Prek probably will.

mental locust
#

I don't quite follow, I'd need to sit down and think about it

glad acorn
#
  - repo: local
    hooks:
      - id: check-pre-commit-revs
        name: Check pre-commit revs are SHAs
        language: pygrep
        entry: '^\s+rev:\s(?![0-9a-f]{40}\b)'
        files: ^\.pre-commit-config\.yaml$
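
To see what that `entry:` pattern flags, here is a small sketch that applies the same regex to one line at a time, which is assumed to approximate how a pygrep hook reports matches. `flagged` is a hypothetical helper:

```python
import re

# Same pattern as the `entry:` above: an indented `rev:` whose value
# is NOT exactly a 40-character lowercase hex SHA.
NOT_A_SHA = re.compile(r"^\s+rev:\s(?![0-9a-f]{40}\b)")


def flagged(line):
    """Return True when the line would be reported by the pattern."""
    return NOT_A_SHA.search(line) is not None
```

A tag such as `rev: v6.0.0` is flagged, while a full SHA followed by a `# frozen:` comment passes. The pattern says nothing about whether the SHA really is the commit behind that tag.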
slow pagoda
#

I meant checking this:

 - repo: https://github.com/pre-commit/pre-commit-hooks
   rev: 3e8a8703264a2f4a69428a0aa4dcb512790b2c8c  # frozen: v6.0.0

The check should verify that 3e8a8703264a2f4a69428a0aa4dcb512790b2c8c does in fact match v6.0.0 in the specified repo.

(Though the above is also useful!)

#

As I could fork pre-commit/pre-commit-hooks, make a new commit, then pretend that's v6.0.0, and GitHub will resolve my commit. I'm not sure what the rules for the resolution are (maybe I need to make a PR), but @quick glade showed this was possible.

#

I guess (A)I could write a little script to check it
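
One possible shape for such a script, as a sketch under assumptions: it compares each `rev: <sha>  # frozen: <tag>` pair against the output of `git ls-remote --tags <repo>` and trusts whatever tags the named repo advertises. All function names are hypothetical, and `fetch_tags` needs git and network access:

```python
import re
import subprocess

_FROZEN = re.compile(r"rev:\s*([0-9a-f]{40})\s+#\s*frozen:\s*(\S+)")


def frozen_revs(config_text):
    """Return (sha, tag) pairs found in 'rev: <sha>  # frozen: <tag>' lines."""
    return _FROZEN.findall(config_text)


def tag_commits(ls_remote_output):
    """Map tag name -> set of SHAs from `git ls-remote --tags` output."""
    tags = {}
    for line in ls_remote_output.splitlines():
        sha, _, ref = line.partition("\t")
        if not ref.startswith("refs/tags/"):
            continue
        name = ref[len("refs/tags/"):]
        # Annotated tags get an extra "^{}" entry holding the peeled commit SHA.
        if name.endswith("^{}"):
            name = name[:-3]
        tags.setdefault(name, set()).add(sha)
    return tags


def rev_matches_tag(sha, tag, ls_remote_output):
    """True when the repo advertises `tag` as pointing at `sha`."""
    return sha in tag_commits(ls_remote_output).get(tag, set())


def fetch_tags(repo_url):
    """Run `git ls-remote --tags` against the repo named in the config."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", repo_url],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Because the tag list comes from the repo named in the config rather than from resolving the SHA, a commit that exists only in a fork would not appear under any tag and the check would fail.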

#

Was hoping it might exist, but didn't see it in prek/pre-commit. prek validate-config doesn't do it.

glad acorn
#

but the way GH works, commits on forks also appear as commits on their upstream

#

there's a warning in the UI:

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
but can you programmatically check that? not sure

solar heron
#

this is cursed

slow pagoda
#

Yes, that's what I'm trying to protect against

#

GitHub Actions were susceptible to this, pre-commit/prek might not be if they just clone the repo, actually.

#

Nope, it is:

prek -a
...
cat /Users/henryfs/.cache/prek/repos/a063007234921320/README.md
# i like spam
sage marsh
#

can you push a branch named the same as a commit hash ThinkFoil

slow pagoda
#

Branches are not resolved from someone else's repo, just like tags

#

@north karma zizmor is almost doing this already, is it something that could be in scope?

north karma
#

@slow pagoda sorry, which part? zizmor already checks for impostor commits so that part should be covered, and IIRC github rejects branches and tags that would overlap with the object namespace (i.e. it won't let you push a branch or tag that looks like a SHA1 hash)

slow pagoda
#

I think it currently only processes actions, workflows, and dependabot; this would be a .pre-commit-hook.yaml with rev: 3e8a8703264a2f4a69428a0aa4dcb512790b2c8c # frozen: v6.0.0

north karma
#

ah, gotcha

#

hmm, i'm open to it, but i think you're right that pre-commit/prek shouldn't be vulnerable to this since a clone won't include those impostor commits

slow pagoda
#

No, it is, I just checked both

north karma
#

uh oh

slow pagoda
#
repos:
  - repo: https://github.com/henryiii/cibuildwheel
    rev: 58a0b274ea29c1e7899d45ab324b4ccdfc78d17d
    hooks:
      - id: mine

(It fails due to no hook file, but the repo it checks out very much does have the fake README Hugo wrote)

north karma
#

well, that's terrible, thanks github

slow pagoda
#

Space saving 🙂

north karma
#

i'm not super familiar with pre-commit, this would affect both pre-commit-config and pre-commit-hooks, right?

slow pagoda
#

This would affect .pre-commit-config.yaml, since that's where you pull hooks from. It would also affect prek's new prek.toml, but that's not in heavy use for now (and dependabot just started supporting the yaml one). The hooks file can't pull from other hooks, so it's fine.

#

It's repos: [repo: <url>, rev: <sha> # frozen: <tag>]

north karma
#

OK, gotcha. i'll need to think a bit more about whether this is a good fit for zizmor (since it'd be the first non-GH specific input + TOML isn't something we do at the moment), but another option here would be me factoring zizmor's impostor commit logic out into a more general library that both zizmor and $newtool could use (but also i hate tool proliferation...)

slow pagoda
#

You don't have to worry about TOML for now, though it's something that might become interesting later if you go this direction, if prek wins out and people move to prek-only config.