#pip


finite perch
#

Thanks for the work! I knew the many file path tests were messy for 3.14.

hidden flame
#

Hahaha, oops, the new spinners from nested builds are leaking out on my inprocess-build-deps branch.

#

--use-feature=inprocess-build-deps is also quite a bit faster. 20s -> 12s. That's good to see!

finite perch
#

Yeah, doesn't hatchling have some insane nested build bootstrapping process? I imagine that would be significantly faster.

timid stag
#

(are there any known specific cases where a separate process would be needed for correctness?)

#

(or clear, concrete theoretical reasons?)

hidden flame
#

For example, pip has quite a few in-memory caches. The resolver is a good example. If the resolver was shared across the main install and the build environment install, there would likely be buggy behaviour/"interesting" solves.

timid stag
#

ah, I imagine it isn't designed to instantiate separate caches...

hidden flame
#

Well, for basically all of pip's existence, it was fair to assume that the install logic was linear and wouldn't be called more than once (not literally, but in one shot for the same environment).

#

In general, the codebase is structured well enough to support concurrent installs (I mean, keeping some separation is necessary for unit testing), but you do have to be careful around caches (which means generally not sharing them at all).
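
A minimal sketch of the "don't share caches" point, assuming a hypothetical `InstallContext` class and a stand-in `parse_requirement` function (neither is pip's actual structure): each install owns its cache, so a nested build-environment install starts empty instead of seeing entries from the outer install.

```python
from dataclasses import dataclass, field


def parse_requirement(text: str) -> tuple[str, str]:
    """Stand-in for an expensive parse; splits 'name==version' naively."""
    name, _, version = text.partition("==")
    return name.strip().lower(), version.strip()


@dataclass
class InstallContext:
    """Hypothetical per-install state; every instance owns its cache."""

    label: str
    _cache: dict[str, tuple[str, str]] = field(default_factory=dict)

    def requirement(self, text: str) -> tuple[str, str]:
        # Look up by the raw string first; only parse on a miss.
        if text not in self._cache:
            self._cache[text] = parse_requirement(text)
        return self._cache[text]


main_install = InstallContext("main")
build_env_install = InstallContext("build-env")

parsed = main_install.requirement("Requests==2.32.0")
# The nested install has its own dict, so nothing leaks across contexts.
leaked = "Requests==2.32.0" in build_env_install._cache
```

The trade-off is that the nested install re-parses anything the outer one already saw, which is the price of keeping the two solves independent.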

cosmic pebble
#

could make the caches thread-local I guess, no idea if that has any performance impact

#

also no idea if that will ever be a goal for you all

shy echo
#

I expect it would have some penalties since the caches are in the hot loops (for things like requirements and tags, which definitely take longer to parse and validate than a dict lookup on a str to get an existing object).
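
For illustration only, a thread-local cache could look roughly like this. `parse_tag` and the tag string are hypothetical stand-ins, not pip code; the extra attribute lookup on `threading.local()` per call is the kind of overhead that would need measuring in a hot loop.

```python
import threading

_local = threading.local()


def _cache() -> dict[str, tuple[str, ...]]:
    # Each thread lazily creates its own dict on first use.
    try:
        return _local.cache
    except AttributeError:
        _local.cache = {}
        return _local.cache


def parse_tag(tag: str) -> tuple[str, ...]:
    """Hypothetical cached parser: split 'python-abi-platform' once per thread."""
    cache = _cache()
    if tag not in cache:
        cache[tag] = tuple(tag.split("-"))
    return cache[tag]


def worker(results: dict[str, int]) -> None:
    parse_tag("cp312-cp312-manylinux2014_x86_64")
    # Record how many entries this thread's private cache holds.
    results[threading.current_thread().name] = len(_cache())


results: dict[str, int] = {}
threads = [
    threading.Thread(target=worker, args=(results,), name=f"t{i}") for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread ends up with exactly one entry in its own cache, so the same string is parsed once per thread rather than once per process.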

hidden flame
#

We don't use multithreading directly. The only threading pip does is through Rich which uses threads to manage live updates.

cosmic pebble
hidden flame
#

Yea, gonna say that's probably not going to happen any time soon.

shy echo
#

The main thing that would be interesting for that is actually the resolver and... yea, I don't think we have anyone interested in doing parallelization of that. 😅

hidden flame
#

I did publish a PR to parallelize .pyc compilation but that got bogged down in a discussion about whether it even makes sense to continue compiling .pyc files in the first place 😅

#

(which fair enough, no hard feelings there!)

shy echo
#

Last time I'd benchmarked it, pip was mostly I/O bound except for when handling wheel unpacking + all the metadata parsing. It's been a while tho, and I know more about software benchmarks/profiling now than I did back then lol.

cosmic pebble
#

remembers the time someone broke a parallel filesystem on a supercomputer via the import system. Lots and lots of nodes simultaneously trying to read and write pyc files…

hidden flame
hidden flame
shy echo
#

I was focused on the resolve side, hence why I missed all the pyc thingies. 😅

hidden flame
#

ah, sorry

timid stag
#

does wheel unpacking benefit from multiprocessing somehow? (does the standard library make that feasible?)

hidden flame
#

see, I refuse to touch the resolver with a 10ft pole.

#

that thing is a magical black box that I still haven't learned about

shy echo
#

Thou ist smarter than I.

hidden flame
shy echo
timid stag
#

FT?

shy echo
#

(I haven't benchmarked anything)

shy echo
timid stag
#

ah

hidden flame
#

anyway, my current priority (after managing the 25.2 release) is to get inprocess-build-deps landed.

#

I still have a few more hours of work left, mostly related to writing more tests, but it is close to being ready for review.

timid stag
#

well if you're grabbing multiple things and parallelizing at the level of entire wheels, I can certainly imagine it

#

also I would have thought that if multiple wheels are providing the same file you are going to have a bad time no matter what
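
A rough sketch of "parallelizing at the level of entire wheels", assuming the archives provide disjoint files. It uses threads rather than processes only to keep the example short, and the tiny zip files are stand-ins for real wheels; `make_fake_wheel` and `unpack` are hypothetical helpers.

```python
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_fake_wheel(path: Path, module: str) -> Path:
    """Write a tiny zip archive standing in for a wheel."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr(f"{module}/__init__.py", f"NAME = {module!r}\n")
    return path


def unpack(wheel: Path, dest: Path) -> list[str]:
    """Extract one archive; each call opens its own ZipFile object."""
    with zipfile.ZipFile(wheel) as zf:
        zf.extractall(dest)
        return zf.namelist()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    wheels = [
        make_fake_wheel(root / f"{name}.whl", name)
        for name in ("alpha", "beta", "gamma")
    ]
    site = root / "site-packages"
    site.mkdir()
    # One task per wheel; no two archives write the same path.
    with ThreadPoolExecutor(max_workers=3) as pool:
        extracted = list(pool.map(lambda w: unpack(w, site), wheels))
    installed = sorted(p.name for p in site.iterdir())
```

If two wheels did ship the same file, the result would depend on which task wrote last, which is the "bad time" mentioned above.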

hidden flame
timid stag
#

from my testing, the email parser just does a lot of stuff that's irrelevant to how the packaging formats use it

shy echo
#

Huh, I didn't think of that.

shy echo
hidden flame
#

(maybe switching to JSON would make everything faster since CPython has a C implementation of json IIRC?)

timid stag
#

for one thing, it sets up a "streaming" interface with a small page size and adds complexity around chunking the file, when most metadata files are fairly short anyway

ripe shoal
timid stag
#

and then after it's parsed all the fields, it does a bunch of validation that's based on expecting certain fields to have specific meaning in the context of an actual email

dapper laurel
timid stag
#

where "pretty big" is still tiny compared to the rest of the wheel, though?

dapper laurel
shy echo
#

I mean, there's also a world where someone contributes faster email parsing in the stdlib. 😅

timid stag
#

I mean, I would call it a parser for the specific format used by python packaging core metadata, which simply happens to be based on old email standards

#

I did imagine that I might do something like that as part of PAPER, though.

shy echo
#

(looks at his calendar to see if he can slot an entire "write an email parser in C" project in there)

timid stag
#

honestly, I wonder how many people are actually using the email standard library outside of packaging

dapper laurel
#

why do I feel like I now know what I will be doing this weekend lol

cosmic pebble
ripe shoal
dapper laurel
#

(I mean, outside of making the installer release that was due 6 months ago 😳 )

hidden flame
dapper laurel
hidden flame
#

been like a year since I wrote any C

shy echo
shy echo
dapper laurel
shy echo
#

pip would be a very different project if we could have compiled code in it/its dependencies. 😂

timid stag
#

but also just as a quick check, the total source code in the email package is something like 8 times as much as configparser.py, and I don't think that what packaging does with those files is that much more complex than an ordinary config file

ripe shoal
#

I would expect email to be way more code than configparser, the email message format has a lot of hidden complexity compared to ini, and a lot more considerations around encodings and such

dapper laurel
#

yeah, I guess if we were to "just" take stuff that is needed for METADATA parsing, that would be pretty small and (relatively) fast package
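
To make the size difference concrete, here is a sketch of a header-only parser next to the stdlib `email` call it would replace. `parse_headers` is hypothetical and deliberately ignores encodings, RFC 822 edge cases, and the description body; the sample metadata is made up.

```python
from email import message_from_string

METADATA = """\
Metadata-Version: 2.1
Name: example
Version: 1.0
Requires-Dist: requests>=2
Requires-Dist: rich

Long description body.
"""


def parse_headers(text: str) -> dict[str, list[str]]:
    """Hypothetical minimal parser: 'Key: value' lines until the first blank line."""
    fields: dict[str, list[str]] = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block; the body follows
        if line[0] in " \t" and last_key is not None:
            # Continuation line: fold it into the previous value.
            fields[last_key][-1] += "\n" + line.strip()
            continue
        key, _, value = line.partition(":")
        last_key = key.strip()
        fields.setdefault(last_key, []).append(value.strip())
    return fields


simple = parse_headers(METADATA)
stdlib = message_from_string(METADATA)
```

On this input both approaches return the same `Requires-Dist` values; whether the simpler one is actually faster on real METADATA files would need benchmarking.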

timid stag
cosmic pebble
#

david hewitt gave a talk at the language summit about adding rust code to CPython and sadly it's a nonstarter it seems like

timid stag
#

the email message format does have that hidden complexity. Core metadata, however, isn't expected to and IMO shouldn't exploit that.

hidden flame
#

we could just support email/json based metadata forever ... 😅 .. 🙃

cosmic pebble
#

at least until rust is better supported on obscure targets

shy echo
timid stag
#

yes, it would.

dapper laurel
shy echo
hidden flame
#

Honestly though. As long as Python keeps its email package and the old email-based metadata doesn't get updated after the json version is the default, keeping a legacy email based metadata backend wouldn't be that bad.

timid stag
#

but /me gestures wildly at all the packages that are still not using pyproject.toml for anything sensible, or are using setup.py for pure Python projects without a real build step

ripe shoal
timid stag
#

my experience has been that people often don't use new things unless they're forced to, and grumble about that force.

shy echo
timid stag
#

but yes, there's a clear enough way to do it and the opt-in shouldn't cause real friction for anyone. I'm nevertheless pessimistic about the process of getting a PEP through

#

in fact I, too, assumed pretty much that was what you had in mind

hidden flame
#

There will be enough transition with the new packaging governance :P

timid stag
#

but for example, people are going to argue about what should happen if both files are present and they disagree.

dapper laurel
#

if JSON file present use JSON, simple

ripe shoal
#

^ I would also say the PEP should require the contents between them to be equivalent

#

even if that isn't practically enforceable

timid stag
#

well, an installer could verify it by just parsing both files and raising an error if they differ. But now the experience is slower and less reliable for everyone. 😉
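
A sketch of that "parse both and compare" check. Everything here is an assumption for illustration: the JSON layout, the key normalization (lowercase, hyphens to underscores), and the `MULTI` set of multiple-use fields are not taken from any accepted specification.

```python
import json
from email import message_from_string

# Assumed multiple-use fields for this sketch only.
MULTI = {"requires_dist", "classifier"}

EMAIL_METADATA = "Name: example\nVersion: 1.0\nRequires-Dist: rich\n"
JSON_METADATA = '{"name": "example", "version": "1.0", "requires_dist": ["rich"]}'


def normalize_email(text: str) -> dict:
    """Turn email-style headers into a dict shaped like the assumed JSON form."""
    msg = message_from_string(text)
    out: dict = {}
    for key in msg.keys():
        norm = key.lower().replace("-", "_")
        values = msg.get_all(key)
        out[norm] = values if norm in MULTI else values[0]
    return out


def check_equivalent(email_text: str, json_text: str) -> None:
    """Raise ValueError naming the fields on which the two files disagree."""
    from_email = normalize_email(email_text)
    from_json = json.loads(json_text)
    if from_email != from_json:
        differing = sorted(
            key
            for key in from_email.keys() | from_json.keys()
            if from_email.get(key) != from_json.get(key)
        )
        raise ValueError(f"metadata files disagree on: {', '.join(differing)}")


check_equivalent(EMAIL_METADATA, JSON_METADATA)  # passes silently
```

The cost noted above is visible here: the installer has to run both parsers on every install just to confirm they agree.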

ripe shoal
#

But I don't want to turn this into a PEP discussion channel 😆
I really should move this task up my TODO list enough to pop it off the stack...

timid stag
#

good luck, and thank you for considering the issue at all

finite perch
finite perch
timid stag
#

(is there a good reason, in uv, that the compiled bytecode can't also be cached and hard-linked like the source files? All I can think of is that it'd have the paths to the cache versions of the files, used in tracebacks, but I'm not really seeing that as a problem)

finite perch
timid stag
#

Congrats on the 25.2 release by the way, even if you don't think it looks like anything groundbreaking compared to the previous one

hidden flame
#

Currently scheduled for 25.3 includes removing the setup.py bdist_wheel and setup.py develop mechanisms, plus two smaller deprecations.

hidden flame
timid stag
#

(actually, I had been wondering if it makes sense to expect alternate small and big releases going forward...)

#

I do recall the notes about 25.3 plans on the 25.1 release.

finite perch
#

I don't think so, I think big deprecation releases are more about whether some particularly invested maintainer had some free time 6 to 9 months ago

#

And similar for features, I have two to three mid size features I would like to land in pip, but I've had no time these past few months

small cove
finite perch
small cove
#

btw if y'all have questions about parallelism in uv, hmu

hidden flame
#

the AI world has arrived to pip 🙃

hidden flame
#

@finite perch hmm, I don't like the benchmark killerdog proposed:

class MockDistribution:
    ...

    def iter_dependencies(
        self, extras: list[str] | None = None
    ) -> Iterator[PackagingRequirement]:
        """Simulate expensive dependency parsing operation."""
        # Simulate some processing time for parsing dependencies
        time.sleep(0.001)  # 1ms per call to simulate parsing overhead
        return iter(self._dependencies)

    def iter_provided_extras(self) -> Iterator[str]:
        """Simulate expensive extras parsing operation."""
        # Simulate some processing time for parsing extras
        time.sleep(0.0005)  # 0.5ms per call to simulate parsing overhead
        return iter(self._extras)

They're using time.sleep instead of using actual distributions with actual dependencies.

#

I wouldn't be surprised if dependency parsing did indeed take a non trivial amount of time, but I would like to see actual scenarios. You're welcome to review it, but I'm ambivalent about spending my time on the PR since it seems to be heavily AI-assisted. The work is probably fine, but there is a distinct lack of "wait, is this change even warranted?" IMO

finite perch
#

I agree that it is likely AI assisted, but it does look about correct, and I know some real world scenarios to see if it really does help

hidden flame
#

Yeah, it's really a matter of patience. I do not have the patience to review work with this level of AI assistance because I don't want to discuss/argue (the technical kind, not the social kind) with an AI, even indirectly.

hidden flame
timid stag
#

yeah a priori someone who submits a benchmark of a simulation seems not in control of the PR

if the goal is to demonstrate something about how often cachable work is repeated, it should instead just... count the times that code is hit
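
Counting hits instead of simulating cost could be as small as this. `count_calls` and `parse_dependency` are hypothetical names for the sketch; in a real measurement the decorator would wrap the actual parsing function during a real resolve.

```python
from collections import Counter
from functools import wraps

call_counts: Counter = Counter()


def count_calls(func):
    """Record how often each distinct argument tuple reaches `func`."""

    @wraps(func)
    def wrapper(*args):
        call_counts[(func.__name__, args)] += 1
        return func(*args)

    return wrapper


@count_calls
def parse_dependency(text: str) -> str:
    # Stand-in for the real parsing work being considered for caching.
    return text.strip().lower()


for requirement in ["Requests", "rich", "Requests", "Requests"]:
    parse_dependency(requirement)

# Only inputs seen more than once would benefit from a cache.
repeated = {key: n for key, n in call_counts.items() if n > 1}
```

The `repeated` mapping answers the question directly: how much of the work is redone on identical input.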

finite perch
finite perch
#

It has empathy, evidence, and reasoning for what the issues are.

I'm hopeful some people will read that and learn from it, even if we never directly see the fruits of that learning.

I was dealing with an OSS maintainer just recently who was not willing to engage at all in discussion of a feature. I don't blame them but they have to accept that's why they get a high % of frustrated comments on their repo.

timid stag
#

(I'm a little scared that the AI can generate valid prior issue numbers, even if they're nonsense in the current context)

#

(also, the reply you got is textbook AI, right down to the exact tone of contrition)

#

in the future I'd recommend cutting this sort of thing off sooner. @shy echo absolutely made the right call there. AI users who are willing to switch away (or think they have a specific, reasonable justification for using it) will change their tone (very literally, since their natural writing style might be anything, but is almost certainly radically different) in the first reply. This one doubled down, and IMO became more obvious in using AI as the thread progressed.

hidden flame
#

I'm likely going to reuse my long reply as a saved reply. I'll tweak it to be more compact so it can be used whenever I need to close issues/PRs for being useless AI garbage.

hidden flame
#

Lesson learned, I suppose. It's a shame that is the takeaway from this, but my (previously unreasoned) hatred for AI was right. ¯\_(ツ)_/¯

hidden flame
#

This is not how anyone talks.

hidden flame
#

FYI, I'm going to take a break from pip (or at least typical maintainer duties) for a little bit. I'll be around to manage the 25.2 release cycle, but I do think I'm getting a bit close to burning out on pip, so I'll step away now to avoid hitting that valley.

#

(I know that I can do this at any time, but saying it out loud makes it easier to actually follow through on taking a break, haha)

finite perch
#

Makes sense, take a break, no expectations!

timid stag
hidden flame
#

error: unsupported-wheel

× Wheel black-25.1.0-cp312-cp312-manylinux2014_arm64.whl is unsupported on this platform
╰─> You're on Linux x86_64, but wheel requires Linux arm64

I'm using my break from maintainership to work on error messages (which is always good fun). Currently working on improving unsupported wheel errors.

#

The tricky part is handling wheels that support several tags, which is quite common with mac/linux wheels.

#

I'm pretending that they don't exist for the time being, but it would be good to do better.
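
For context on why multi-tag wheels are awkward to report on: a wheel filename can carry dot-separated "compressed" tag sets, and each combination is a separate supported tag. A sketch using only string splitting follows; `expand_wheel_tags` is a hypothetical helper and the filename is illustrative.

```python
from itertools import product


def expand_wheel_tags(filename: str) -> list[tuple[str, str, str]]:
    """Hypothetical helper: list every (python, abi, platform) tag a wheel name declares."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # an optional build tag adds a sixth part
        raise ValueError(f"unexpected number of fields in: {filename}")
    # The last three fields may each hold several dot-separated values.
    pythons, abis, platforms = (part.split(".") for part in parts[-3:])
    return list(product(pythons, abis, platforms))


tags = expand_wheel_tags(
    "black-25.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
```

An error message then has to decide whether to list every expanded tag or summarize them, which is the part being deferred above.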

naive fractal
#

Trying to figure out a fix for a pip-tools bug (ref), I come to a question about pip.
How is pip handling -r and -c when the input is some URI-looking thing? It seems like http/https might be the only supported schemes, and everything else gets treated as a filesystem path?

#

Oh, I went down a bad path while spelunking! Just doubled back and tried tracing it out again, found it in _internal/req/req_file.py

SCHEME_RE = re.compile(r"^(http|https|file):", re.I)

tells me exactly what I wanted to know.
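
A quick way to see what that regex accepts on its own. `classify` is a hypothetical helper for the demonstration; pip's surrounding code does more than this single check, and the example URLs are placeholders.

```python
import re

# Copied from the line quoted above.
SCHEME_RE = re.compile(r"^(http|https|file):", re.I)


def classify(target: str) -> str:
    """Hypothetical helper: 'url' if the regex matches, otherwise 'path'."""
    return "url" if SCHEME_RE.search(target) else "path"


examples = [
    "https://example.com/reqs.txt",
    "FILE:///tmp/reqs.txt",  # re.I makes the scheme match case-insensitive
    "ftp://example.com/reqs.txt",
    "requirements.txt",
]
results = {target: classify(target) for target in examples}
```

Under this regex alone, anything with another scheme (such as `ftp:`) falls through to the same branch as a plain filename.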

naive fractal
#

The fact that it's being checked in two different contexts in two different ways sort of feels like maybe there's some datastructure which could keep track of this info. But it's not obvious that there's anything worth adding. And I've now got confidence to go make pip-tools ape internal pip details more. 😅

hidden flame
#

The answer is almost certainly: pip is a 15+ year old project, the codebase is a mess 😅

naive fractal
#

I don't believe that critical global infrastructure maintained by a team of unpaid volunteers could be a mess! You can't fool me! There's some deep logic behind using a regex and url splitting for this, I know it. 🙃

#

Although it feels instinctively messy, there's no easy win I see here. Maybe some abstraction which gets attached to PipSession like session.path_analysis[filename] could work, but I'd have to actually try it to see if it just makes things too opaque

finite perch
hidden flame
#

What's the problem with that?

finite perch
#

I mean, I guess it's fine, the branch will be deleted once the pull request is closed

hidden flame
#

mhmm

finite perch
#

Just was unintentional

hidden flame
#

Dependabot creates new branches on our repository and then deletes 'em. All is fine.

#

It's only if external users start to depend on our branches that we'd have problems.

finite perch
#

@hidden flame I've been thinking about installing build dependencies in-process & build/constraints. And I think I've got a good way forward. I will independently create a new "use-feature" called "build-constraint" that works in the new way: constraint only affects runtime dependencies, build-constraint will affect all build processes.

Then inprocess-build-deps will match this same behaviour (which I think will be much simpler for inprocess-build-deps than trying to match the existing behaviour), so in effect inprocess-build-deps will imply build-constraint

This will allow build-constraint to be accepted on its merits and not be smuggled in via inprocess-build-deps, and will allow for an appropriate deprecation period for the existing behaviour.

hidden flame
#

I'm speaking my mind 100% here, but didn't we reject such a design for --resume-retries? I proposed that we use --use-feature to roll it out which was rejected for being too complicated. Looking back, I agree with that decision

finite perch
#

resume-retries didn't break existing workflows, changing the behaviour of constraints will

#

I've written out a backwards compatible way to implement --build-constraint here but it's non trivial: https://github.com/pypa/pip/issues/13300#issuecomment-2787887526

And I think it would be tricky to implement the existing behaviour of PIP_CONSTRAINT \ constraint with inprocess-build-deps? But if you plan to preserve compatibility for all uses of constraints with inprocess-build-deps then I can go back to that table.

hidden flame
#

It would be easy to get inprocess build deps to support PIP_CONSTRAINT, but it would mean --constraint is also supported.

#

I could probably hack the CLI code to disable the forwarding if given as a flag, but I would 100% want that to be temporary as that's an awful hack.

finite perch
#

That breaks the existing behaviour that you pass runtime constraints with --constraint and build time constraints with PIP_CONSTRAINT

hidden flame
#

Yup.

#

Our options are:

  • Add inprocess-build-deps w/o constraint support at all and wait until --build-constraint is added
  • Add inprocess-build-deps w/ backwards incompatible constraint support (and then I guess go through a deprecation cycle once --build-constraint is available)
  • Add inprocess-build-deps w/ equivalent constraint support so it doesn't affect the constraints deprecation cycle at all

#

Lemme see how easy it would be to hack in support for detecting where constraints come from.

#

It's probably not too bad. Envvars are supported by setting their values as optparse defaults. I could track the value of PIP_CONSTRAINT somewhere and then see whether the final constraints value matches during command initialization. It wouldn't be perfect, as you could pass the same constraints file via the flag and envvar at the same time and get a false positive, but that seems fine?
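
A simplified sketch of that detection idea, not pip's implementation. It assumes a single-valued option (unlike a repeatable one) and a hypothetical `constraint_source` function; the environment is passed in as a plain dict to keep the example self-contained.

```python
import optparse

ENV_VAR = "PIP_CONSTRAINT"


def constraint_source(argv: list[str], environ: dict[str, str]) -> "str | None":
    """Guess whether the final constraint value came from the env var or the flag."""
    env_value = environ.get(ENV_VAR)
    parser = optparse.OptionParser()
    # The env var value is installed as the option's default, so a flag overrides it.
    parser.add_option("-c", "--constraint", dest="constraint", default=env_value)
    options, _args = parser.parse_args(list(argv))
    if options.constraint is None:
        return None
    # Equal values are attributed to the env var, even if the flag was also given.
    if options.constraint == env_value:
        return "env"
    return "flag"


from_env = constraint_source([], {ENV_VAR: "build.txt"})
from_flag = constraint_source(["-c", "runtime.txt"], {ENV_VAR: "build.txt"})
false_positive = constraint_source(["-c", "build.txt"], {ENV_VAR: "build.txt"})
```

The third call shows the imperfection mentioned above: the same file given by both flag and env var is indistinguishable from the env var alone.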

finite perch
#

I'd rather you not have to do any of that and just use a new --build-constraint but I don't know if the timing will work

hidden flame
#

We'll see what needs to be done once my PR leaves draft status.

finite perch
#

Okay, I have a local branch that implements --build-constraint as I described above, I need to do a bit of clean up but if I get some time tomorrow I should have a PR out

tribal mica
#

patch_check_externally_managed() in test_pep668.py doesn't seem to be working as expected when using 3.12 (the externally managed error is not being raised). I think I've got a fix for it; just waiting for the tests to run through.

tribal mica
#

Maybe I was missing some nuance here, as I sunk more time into it than I was expecting to. I'm fairly new to pytest, as work just uses unittest. I'll take a better look later; I ended up calling it for the moment earlier.

finite perch
hidden flame
#

I'll be back from vacation starting tomorrow, although I will have limited availability.

tribal mica
#

update @finite perch was actually related to an issue you raised back in 2023, (https://github.com/pypa/pip/issues/12329)

it appears I am blind and missed the note pointing this out on the docs here -> https://pip.pypa.io/en/latest/development/getting-started/#running-tests


fleet widget
#

how to force pip to use UTF-8?

fleet widget
#

thx

finite perch
timid stag
#

"use" in what sense?

#

the data formats all mandate utf-8 in all the places where a text encoding would be relevant as far as I can think; and Pip's own output is plain ASCII except where it interpolates package-dependent stuff as far as I can think. So that leaves... file names?

ripe shoal
#

setup.py is not required to be UTF-8 on all platforms IIRC

#

(or generally any python code executed as part of the build backend execution)

timid stag
#

ah, true

#

but idk why you'd want to force the encoding there; if it doesn't declare one then Python makes it utf-8 in 3.x and if it does then overriding that would probably break it
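
The default-versus-declared behaviour can be checked with the stdlib tokenizer. `source_encoding` is a hypothetical wrapper and the byte strings are made-up stand-ins for a setup.py.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode this source file."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# No declaration: Python 3 falls back to UTF-8.
undeclared = source_encoding(b"from setuptools import setup\nsetup()\n")
# An explicit coding cookie wins over the default.
declared = source_encoding(b"# -*- coding: latin-1 -*-\nfrom setuptools import setup\n")
```

Forcing UTF-8 onto the second file would mis-decode any non-ASCII bytes it contains, which is the breakage described above.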

hidden flame
#

@finite perch are you using 👀 as an indication that you've read my review comment?

#

I'm a little confused with all of the 👀 you're leaving

finite perch
#

Is that not what that emoji generally means?

#

FYI, I'm not going to be near a computer till at least Monday (I meant to grab my laptop so I could do a little work this weekend while I ride trains, and I completely forgot), so I was using this emoji as a placeholder for myself, meaning "probably correct, but I need to look at it when I have access to a computer and code again"

hidden flame
#

Interest is similar to looking at, but there's an implication that the thing of interest is notable.

hidden flame
finite perch
#

I thought emojis were unambiguous though 🙃

#

(heavy sarcasm)

hidden flame
#

🙃 I definitely don't hate you at all 🙃

#

Possibly the worst emoji for having one universal meaning.

#

FWIW, I have no issue with it. If it helps you, I don't mind it.

ashen geyser
#

issue is that GitHub only has a handful of emoji, so people overload them

hidden flame
#

@finite perch if you have any PRs you very much want me to look at, let me know

#

I have limited availability on an ongoing basis, but I have some time tomorrow if you have anything that's high priority.

#

I would look through my github notifications, but it's already a mess. I'm fine with being told what I need to look at :)

finite perch
#

@hidden flame thanks for letting me know but don't worry about it, no pressure from me, you get on with whatever life stuff you need to get on with.

My build-constraint and uploaded-prior-to PRs are ready, but I wasn't expecting them to be approved soon. I have more PRs down the pipeline, but nothing that would be ready by tomorrow.

finite perch
finite perch
#

And there may be a few other small PRs, if you get to them, but otherwise I will try and make some time before 25.3

rapid blaze
#

Technical details question: I just noticed that pip writes RECORD files with CRLF line endings on Linux, rather than with LF line endings. Is there any particular reason for that, or just an accident of history?

fallen scroll
#

It's likely due to how the csv module works.

lunar gyro
#

I don't see how it would though, pip specifies the arguments as suggested by cpython doc afaict

fallen scroll
#

It's pretty easy to reproduce:

import csv

# newline='' follows the csv docs; the writer itself still emits "\r\n"
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['a', 'b'])

finite perch
#

This is because csv defaults to the "excel" dialect and this models Excel which always writes CRLF

#

I think it would be fine to specify the line terminator as \n in the writer constructor
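
A small check of both behaviours, written against an in-memory buffer. `render_record` and the sample rows (including the placeholder hash) are made up for the sketch; `lineterminator` is the real `csv.writer` formatting parameter.

```python
import csv
import io


def render_record(rows, lineterminator=None) -> str:
    """Write rows with csv.writer and return the exact text produced."""
    buffer = io.StringIO(newline="")  # no newline translation on write
    kwargs = {} if lineterminator is None else {"lineterminator": lineterminator}
    csv.writer(buffer, **kwargs).writerows(rows)
    return buffer.getvalue()


rows = [
    ("pkg/__init__.py", "sha256=abc", "12"),  # placeholder hash and size
    ("pkg-1.0.dist-info/RECORD", "", ""),
]
default_output = render_record(rows)  # "excel" dialect: rows end in "\r\n"
lf_output = render_record(rows, lineterminator="\n")
```

Readers that open the file with universal newlines should accept either form, so the change would mainly affect byte-for-byte comparisons of RECORD files.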

ashen geyser
#

Regular reminder that almost every language existed before CSV was “standardized”, so nobody can rely on anything and using CSV will destroy data.

For RECORD, it's not that bad because at this point we have a lot of data and tests, but going forward I strongly recommend not using CSV for anything.

#

I've seen so many PhD students despair because they found out months after they started using a dataset that they (or someone before them) irrevocably destroyed some of their data, just because of CSV.

timid stag
#

it honestly somehow hadn't occurred to me that the RECORD format is CSV o_O

#

you know, just a bunch of lines with values that are separated by commas, nothing that merits a special name or anything clearly 🙃

rapid blaze
#

The exact format doesn't come up very often, but I don't recall Tarek getting much pushback when proposing CSV, since it's a pretty natural fit for recording the size and hash of a whole pile of installed files (the PEP specifies the field delimiter, the quote character, and the use of universal newlines when reading, so the most egregious causes of CSV incompatibility were addressed up front).

#

It's awful when the field inclusions are more variable though, hence the use of other formats everywhere else.

hoary mist
#

I think if we were designing RECORD today we probably wouldn't use csv, but not for any reason other than it's kinda nice to minimize the number of different file formats you have to deal with

lunar gyro
#

And now I'm reading it, PEP 376 actually said RECORD should use os.linesep so technically pip is not implementing it correctly on non-Windows platforms.

ripe shoal
#

IIRC I also noticed that several tools did not support paths with backslashes, even though absolute paths using "the local platform separator" are listed as valid in PEP 376

timid stag
ripe shoal
#

It would mean packaging tools would need to handle both cases regardless of platform

fallen scroll
#

It's irrelevant anyway, as the current spec does not say this.

polar nova
spice owl
lofty cipherBOT
#

The format of the file is TOML.

Tools SHOULD write their lock files in a consistent way to minimize noise in diff output. Keys in tables – including the top-level table – SHOULD be recorded in a consistent order (if inspiration is desired, this PEP has tried to write down keys in a logical order). As well, tools SHOULD sort arrays in consistent order. Usage of inline tables SHOULD also be kept consistent.

polar nova
spice owl
#

After all, it's a file that will be reviewed by humans in PRs

stuck girder
#

it needs to be human readable:
https://peps.python.org/pep-0751/#rationale

The file format proposed by this PEP is designed to be human-readable. This is so that the contents of the file can be audited by a human to make sure no undesired dependencies end up being included in the lock file.

polar nova
#

Ahhhh

#

That makes more sense

polar nova
finite perch
polar nova
#

Ah, thank you

timid stag
#

other installers would have quite different needs. Many of pip's dependencies don't really relate to the task of actually obtaining or installing a package; e.g. rich used for pretty terminal output

lofty cipherBOT
polar nova
#

I ended up finding that someone published a library anyhow

timid stag
#

ah, I think I misunderstood what you were hoping to accomplish, then.

dapper laurel
polar nova
#

Doesn't pip rely on build/installer now?

dapper laurel
#

nope

#

there are some vague plans, but that is nowhere near happening right now

polar nova
#

Ah no, it uses resolvelib, dependency-groups, pyproject-hooks, and distlib though

#

All libraries I've looked into using myself for my own tool

dapper laurel
#

especially since pip often has some special cases (for historical reasons) that are non-standard now

dapper laurel
#

TIL there is a dependency-groups lib

polar nova
#

Heck I didn't even know about that until recently

#

It's interesting to see pip rely on both distlib and packaging

dapper laurel
#

*distlib

#

distlib offers some things that packaging doesn't IIRC

finite perch
#

Yeah, distlib does the whole entry point script stuff, packaging doesn't do that sort of thing

polar nova
#

And what does packaging offer that distlib doesn't?

polar nova
finite perch
timid stag
dapper laurel
lunar gyro
#

distlib is more like a rewrite that (mostly) never caught on

#

Everything newly introduced into pip after a certain point was done in a new lib and vendored into pip, but things that existed in pip prior were kept there and some of them have been duplicated into new standalone libs

pale epoch
#

i have just spent the past month rewriting package finding in several phases and produced a rewrite of several parts of the packaging library which is surprisingly inefficient (and i haven't yet addressed marker evaluation which is performed in this nested loop each time even though like the rest it should really be parsing from a string once and then sticking to it)

#

was just about to turn to resolution so very good to hear this about distlib. resolution is funny because unlike package finding you can give different yet not incorrect answers

#

people also will state that package resolution is NP complete but i've never seen a proof for that being the minimal upper bound or what refinements of the problem would be required to enforce that. which means SMT or ASP solvers would be less justified and an approach with greater explainability like resolvelib's could be weighed more easily against potential performance concerns

finite perch
pale epoch
#

spack (which designs its own dependency specification language) uses an ASP solver which is a very fascinating technique that i suspect to be more appropriate for highly sparse problems with mostly wrong answers like packaging tends to be due to the initial grounding phase. but i didn't have time to work on that when i worked on spack

small cove
pale epoch
#

i'm not sure that i agree with the boolean parametrization being the most general as versions have an ordering to them which is the kind of structure that boolean satisfiability elides by assuming the booleans are unrelated.

#

pubgrub is exactly what we were looking at for spack

small cove
#

i would go even further and say there's an ordering not just in the variables, but also in the solutions

pale epoch
#

we do something very interesting with spack by encoding error facts into an optimization solve

#

very very impressed at that page from astral

#

Modern package managers often use logic solvers (SAT, ASP, SMT, CDCL, etc) for dependency resolution. Logic solvers are highly efficient at solving NP-complete problems, but often give very little information when a solve is impossible. This talk explains the solver methods used in Spack to introduce legible error messages for users, including g...

#

i was actually considering whether to develop a separate rust tool specifically for package finding since i'm now finding parsing to be a significant contributor to runtime and i think the caching assumptions you can make for the process might be useful to standardize.

#

remarkably good page on resolution from astral, better than any i've possibly ever seen. the particular focus on runtime as a matter of i/o is really important but it also treats the reader as someone who might be interested in the semantics. i doubt it would be useful to standardize resolver behavior but this has me thinking it could be possible and useful

pale epoch
pale epoch
#

i was very dissatisfied with PEP 751's lack of mentioning the years of work that went into install --report while introducing new syntax to marker evaluation but explicitly unspecified semantics which any subsequent specification would then necessarily be breaking so am hoping to propose a specification for that. but will be looking to expand its output to cover the cases of multiple interpreters and environments since that is also where i'm hoping to make use of it.

rapid blaze
# lunar gyro distlib is more like a rewrite that (mostly) never caught on

Sort of. distlib is of the same vintage as distribute and distutils2 (the former defunct after it merged back into setuptools itself, the latter defunct after the rewrite failed to land in Python 3.3), but it represented Vinay's (successful) effort to decouple some nice setuptools and distutils2 features from wholesale adoption of those libraries (since neither of them worked very well as "toolkit" libraries). It was one of the earliest examples of a "not pip or setuptools" consumer of the metadata interoperability standards.

finite perch
#

I'm guessing Anaconda has made an update recently where their base environment includes a package with an invalid version, we've got a few reports all within a few days

#

When I have time this week I'm going to investigate, shame I no longer have a paid support line through my employer

timid stag
hoary mist
#

virtualenv was ianb I think

#

venv was vinay IIRC

#

but my memory is terrible

rapid blaze
timid stag
#

ah

#

I must have misread something

rapid blaze
#

Pretty sure Vinay was one of the early venv maintainers, though (I don't know if he contributed to virtualenv, but it wouldn't surprise me given he did enough packaging related work to want to create distlib)

timid stag
#

yeah, I probably just saw the name in source code or something

cosmic pebble
#

oh wow virtualenv is older than I thought

#

i looked at the commit history - always fun finding 00’s vintage commits in Python projects

native obsidian
#

I'm guessing Anaconda has made an update

native obsidian
polar nova
rapid blaze
polar nova
#

I'm probably going to use it for a tool I'm working on (once I have the time again)

uncut sky
finite perch
uncut sky
#

If you have pip, why manually add these wheels for every release.

finite perch
#

Pip is supposed to install your package, it doesn't have to build it if it's already built. If a package has a million users, it makes sense to build it once for each platform rather than having each user build it individually, a million times.

#

If you want to build all your own packages you can pass pip --no-binary ":all:" in your install commands, it will be a lot slower and you will need to do a lot more prep work in your environments.

shy echo
#
uncut sky
#

I remember it made one when I use twine build dist (was it sdist or bdist).

shy echo
#

The only binary distribution in the Python ecosystem is wheels at this point.

#

You (ideally) generated both a source distribution and a wheel, which is the default behaviour for most of the build tools as well.

uncut sky
#

So, can we supply multiple wheels to pip?

#

Or do you HAVE to manually add it to releases and force people to download it manually to use it (not using pip).

shy echo
uncut sky
#

Example of what I'm asking about.

#

You see how they have wheels there as opposed to just a .zip and a .tar.gz.

shy echo
#

They're all wheels for different platforms -- you'll install only the one relevant to your OS + Python version.

#

(OS + Python version ≈ platform, for this purpose)

uncut sky
#

You have to manually generate them yourself one by one?

dapper laurel
past pagoda
uncut sky
#

Is there a way to automate the generation of these and uploading them to pip in a workflow?

uncut sky
past pagoda
uncut sky
#

Thank you so much Doggo!

finite perch
#

I finally have fixes for https://github.com/pypa/pip/issues/13568, but I have to do some real gymnastics to get the pep668 working on Ubuntu 24.04+, does anyone know who I should ping when I raise the PR to fix these tests?

uncut sky
#

Oh nvrmind, you have it too.

finite perch
past pagoda
uncut sky
timid stag
# uncut sky I thought we could supply a single wheel to pip.

if you have only Python code (it's okay if your dependencies have non-Python code), then you can. Multiple wheels happen when you need to compile code for multiple platforms/ABIs, or when you really need to make different releases that are specific to a Python version (maybe because you're doing something advanced with Python bytecode)

#

We don't have .zip distributions any more.

uncut sky
#

Is there any harm in including them?

#

Or any advantages?

timid stag
#

Your build tool will normally produce a single wheel, with "tags" that indicate that every system can use it (subject to the python version restrictions that you put in metadata).
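
A minimal sketch of how those tags show up in a wheel's filename, assuming the usual `name-version[-build]-python-abi-platform.whl` layout. The helper names and the `example_pkg` filenames are made up for illustration; real tools should use a proper parser rather than this string split.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def split_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: 5 fields, or 6 with a build tag.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    name, version = parts[0], parts[1]
    python, abi, platform = parts[-3:]
    return WheelTags(name, version, python, abi, platform)


def is_universal(tags: WheelTags) -> bool:
    """True when the tags say "no ABI requirement, any platform"."""
    return tags.abi == "none" and tags.platform == "any"


# Hypothetical filenames: one pure-Python wheel, one platform-specific wheel.
pure = split_wheel_filename("example_pkg-1.0-py3-none-any.whl")
compiled = split_wheel_filename(
    "example_pkg-1.0-cp312-cp312-manylinux_2_17_x86_64.whl"
)
print(is_universal(pure), is_universal(compiled))
```

A pure-Python project typically ends up with only the first kind of file, which is why a single wheel is enough there.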

past pagoda
#

I wonder
Will cibuildwheel produce a single wheel for a pure-Python codebase?

timid stag
shy echo
timid stag
#

agreed, can the above get moved perhaps, sorry for distraction I just like to answer stuff

past pagoda
unreal jungle
#

Hi everyone, I’ve been contributing to pip since mid-July (a handful of PRs and some issues so far), and I’d love to get more involved by helping with issue triage. I’m not sure if now is the right time to ask, or if I should keep contributing a bit longer first, so I’d really appreciate any guidance from the maintainers.
Tagging @finite perch and @hidden flame since you’ve kindly reviewed my PRs from the start, hope you don't mind the ping 😅.

For reference, my github is https://github.com/sepehr-rs

hidden flame
#

@unreal jungle just a heads up, the pip core team has very limited availability. I'll bring this up with the team and I'll let you know what we think!

finite perch
#

Oh yeah, I meant to say the same, I've been ill this weekend

unreal jungle
#

Thanks a lot @hidden flame and @finite perch, appreciate you both taking the time. No rush at all, I’ll keep contributing in the meantime. And hope you’re feeling better soon, @finite perch!

hidden flame
#

I'm now quite ill, too 😔

pale epoch
#

currently working on i think a sqlite db to store the result of finding python packages in a queryable way and it aligns a lot with some of the design ideas here

#

the incremental changelog part is super interesting. it was rejected for cargo in 2020 bc it didn't help cargo performance much. but i would be very curious about its effect on bandwidth usage

#

tensorflow's index page is 1.1M, setuptools is 766k. using http caching headers we can avoid refetching unless there's something new, but if so we still have to fetch the whole megabyte

#

the json API puts new entries at the bottom, so it could be a range request but that's so hacky

hoary mist
#

are you considering compression in that

pale epoch
#

that's plaintext json. you're totally right

hoary mist
#

I've been poking at adding zstd, and maybe zstd with a shared dictionary to the compression too

#

I don't think those would help pip though until it can upgrade to a newer requests

#

and the latter would require work in urllib3 to add support for it

shy echo
#

IIRC the JSON compresses fairly well precisely because it's very repetitive/compressible.

pale epoch
#

ok the gzipped tensorflow index is 198K (compared to original 1.1M)
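
A rough way to see why repetitive index JSON compresses so well, using only the standard library. The index contents here are synthetic (the `example-pkg` project and `files.example.invalid` host are invented), so the ratio it prints is illustrative and is not a measurement of PyPI.

```python
import gzip
import json


def build_fake_index(n_files: int) -> bytes:
    """Build a synthetic, highly repetitive index page as JSON bytes."""
    files = [
        {
            "filename": f"example_pkg-1.{i}.0-py3-none-any.whl",
            "url": (
                "https://files.example.invalid/packages/"
                f"example_pkg-1.{i}.0-py3-none-any.whl"
            ),
            "requires-python": ">=3.9",
            "yanked": False,
        }
        for i in range(n_files)
    ]
    return json.dumps({"name": "example-pkg", "files": files}).encode("utf-8")


raw = build_fake_index(2000)
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")
```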

hoary mist
#

unless it's changed pip does cache those too (though it uses a max-age of 0 so it does a conditional get each time rather than just blindly using the cache)
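
For anyone following along, a conditional GET can be sketched with just `urllib`. This is not pip's cache adapter; the function names are hypothetical, and it assumes the server sends an `ETag` and honours `If-None-Match`.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def build_conditional_request(
    url: str, etag: Optional[str]
) -> urllib.request.Request:
    """Build a GET that asks the server to skip the body if nothing changed."""
    request = urllib.request.Request(url)
    # Media type for the JSON form of the simple index.
    request.add_header("Accept", "application/vnd.pypi.simple.v1+json")
    if etag is not None:
        request.add_header("If-None-Match", etag)
    return request


def fetch_index(
    url: str, cached_etag: Optional[str], cached_body: Optional[bytes]
) -> Tuple[Optional[str], bytes]:
    """Return (etag, body), reusing the cached body on 304 Not Modified."""
    request = build_conditional_request(url, cached_etag)
    try:
        with urllib.request.urlopen(request) as response:
            return response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as error:
        # urllib reports 304 as an HTTPError; treat it as a cache hit.
        if error.code == 304 and cached_body is not None:
            return cached_etag, cached_body
        raise
```

The 304 path saves the megabyte of body, but any change to the page still means downloading the whole thing again, which is the limitation being discussed.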

pale epoch
#

my pip fork has a separate adapter class for index pages

#

the builtin one has semantics which i think are wrong

#

it overwrites caching headers for you and that's fine for wheel downloads but not when we're doing fancy footwork

#

i also rewrote package finding. i think it's really good and makes sense but it's a big change so i'm not trying to get it merged atm

#

i do however have some PRs to make to the packaging lib. it's very slow

#

my current idea learning from that work is to separate package finding from resolving entirely. so i'm making a rust tool for that which manages the index fetching and caching and exposes an IPC protocol

#

only problem is that if i want it to be a database of package metadata, it also needs to be able to download wheels etc. so that's a scope increase but i'm still hopeful

#

i noticed that pypi provides upload-time for each version and it would be so cute if we could just tell pypi to only send me uploads newer than a given timestamp. seems too easy
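
On the client side, the same "only newer than a timestamp" filter is easy to sketch over a local SQLite cache. The schema, table name, and rows below are all hypothetical, and it assumes upload times are stored as uniformly formatted ISO 8601 UTC strings so that string comparison matches time order.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE files (project TEXT, filename TEXT, upload_time TEXT)"
)
connection.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("example-pkg", "example_pkg-1.0-py3-none-any.whl", "2025-01-10T12:00:00Z"),
        ("example-pkg", "example_pkg-1.1-py3-none-any.whl", "2025-06-02T08:30:00Z"),
        ("example-pkg", "example_pkg-1.2-py3-none-any.whl", "2025-10-20T17:45:00Z"),
    ],
)


def uploads_since(conn: sqlite3.Connection, project: str, timestamp: str) -> list:
    """Return filenames for a project uploaded strictly after the timestamp."""
    rows = conn.execute(
        "SELECT filename FROM files "
        "WHERE project = ? AND upload_time > ? "
        "ORDER BY upload_time",
        (project, timestamp),
    )
    return [filename for (filename,) in rows]


print(uploads_since(connection, "example-pkg", "2025-06-01T00:00:00Z"))
```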

hoary mist
#

the problem with things like that is it requires dynamic computation

#

right now we rely heavily on CDN caching to scale pypi

shy echo
#

The last time I looked, the PyPI backend was handling ~1000 req/min, while the CDN was multiple orders of magnitude more.

hidden flame
#

FWIW, I'm willing to make time to cut a security bugfix pip release tomorrow. I'd literally just cherry-pick the PR on top of the 25.2 tag and cut that.

cosmic pebble
hoary mist
#

yes

ember shuttle
#

And that's only for the index, not for files

shy echo
#

I guess that's the exponential growth everyone keeps talking about with PyPI? 😅

finite perch
#

grumble grumble CVE grumble grumble

finite perch
rapid blaze
hoary mist
timid stag
#

(sorry, what are the bugfixes that would motivate a security release?)

hidden flame
#

I'm of the opinion that the current issue is worth cutting a security bugfix. The impact is not huge, but it's also annoying when the literal package installer has a CVE and is breaking your builds.

#

The main deciding factor though is that the pip project has extremely limited maintainer resources. We can't commit to always cutting security releases.

#

At that point, it also becomes a balancing act of whether the maintainer time spent on a security release is worth it or would have been better used on development or maintenance.

stuck girder
#

3.14.0 final is due on Tuesday, will CVE scanners complain about that too?

hidden flame
#

AFAIU, the CVE is linked to pip only. Even if you're using a modern enough CPython version that has a hardened zipfile tarfile implementation that isn't vulnerable, CVE scanning will still flag and fail the build.

#

There isn't a way to declare that the CVE is applicable to "pip but only on these python versions"

hidden flame
#

I still think the CVE is mostly meaningless though. Yes, it is a valid attack vector, but if you're using source distributions, there are so many other better and easier ways to get compromised. They can include a setup.py that deletes everything if someone wanted to.

finite perch
# hidden flame I'm of the opinion that the current issue is worth cutting a security bugfix. Th...

Every CVE detector is very clear that it's not a perfect system and users will need to review with nuance. That's just the nature of using a CVE detector.

I agree it's annoying for some users, though we've actually not got that much feedback all things considered, but that's what you sign up for by blocking your CI pipeline with CVE detection.

It's really a question of how much effort it is for you, because so far you're the only one who has volunteered to do a release for it.

finite perch
hidden flame
#

I dunno. I guess I've just read too many threads in the JS ecosystem where people are freaking out over CVEs

past pagoda
finite perch
#

The JS ecosystem has different expectations and preconceptions. JS code at runtime is in a sandbox, so not much can go wrong; build time is where the pain is because it's often not sandboxed.

Python code is rarely sandboxed anywhere, so any malicious code can affect every stage of the lifecycle.

past pagoda
#

(Do containers count as sandboxes?)

finite perch
#

No, lol

past pagoda
hidden flame
#

They help, but they shouldn't be your only defense.

timid stag
#

...is it actually valid by the standard to have symlinks in an sdist in the first place?

#

(and what would happen on Windows without the symlink-creation permission bit?)

dapper laurel
#

IIRC no

#

but don't quote me on that. I just remember symlinks being a problem in packaging

timid stag
#

mm

ripe shoal
#

Symlinks work in sdists, but are not encodable in wheels.

fallen scroll
dapper laurel
#

ah, that might be it

timid stag
#

ah, .tar represents them because it's based on ancient unix standards, but .zip wouldn't... that much makes sense I suppose

timid stag
#

but that still entails giving access to it that may not be desirable

#

not sure that actually escalates from what setup.py can already do... ?

hidden flame
#

It doesn't. If you're using a source distribution, pip can and will run arbitrary code by design in many cases.

timid stag
#

right, but my point is also that "arbitrary code" run as a user can also access the same files that the symlink extraction process (run as the same user) can. Which I guess is obvious, but.

finite perch
#

It would be some really convoluted scenario to be a security risk, like you've done a static analysis of all the package, and the build backend, but you didn't consider it could symlink to outside the archive, and that somehow started a chain of unexpected behaviors that is exploited

hidden flame
#

There may be organisations that have their own known-good wheelhouse of build backends (none of which include ACE) so they can safely use source distributions, but I consider that unlikely given it'd be easier to build/distribute their own internal wheels and just ban sdists outright.

#

Anyway, that would be a weird place to draw your security boundary.

finite perch
#

Perhaps an organization builds an sdist in a sandbox VM, they only allow the Python process to access a fixed number of directories, but the link that points outside the archive sets off a chain that allows the build process to escape the sandbox VM?

#

Like it points to a vulnerable implementation of sudo or some system util that implicitly sudos (because stuff like that happens in OSes for some reason, looking at you ping), and the process is able to escalate itself, etc. etc.

hidden flame
#

I have a little bit of free time for a few days. I will try my best to review as much as I can to unblock the release.

#

I've reviewed the build constraints PR again tonight. It looks pretty good, but there are some issues with the tests.

finite perch
#

I honestly struggled making tests for that one initially

unreal jungle
#

Hi @hidden flame and @finite perch, just wanted to follow up on my earlier message about helping with issue triage 🙂
No rush at all if things are still busy, I’ve continued contributing and have informally triaged a few issues and PRs. I just wanted to see if there’s anything else I should do or keep in mind for when the team’s ready to discuss it.

finite perch
unreal jungle
#

Thanks a lot to both of you, @finite perch and @hidden flame :), really appreciate your support.

hidden flame
#

@Daylily I messaged this screenshot because I was blown away by the amazing traceback and was told you're the one to thank for it. Incredible work! Definitely going to make debugging easier for me.
@shy echo someone likes your --debug flag that you added.

#

Not sure how I got the compliments, but consider this my 307 response to redirect to you :)

ripe shoal
#

Reading over https://github.com/pypa/pip/issues/12712, I gather that zip extraction is somewhat of a bottleneck in installs given faster bytecode compilation. I recently landed a change to 3.15 that should make decompression 10+% faster for files >1MB (https://github.com/python/cpython/pull/139976). I wanted to offer I'm happy to work with folks on changes in core Python that would help with pip performance

finite perch
ripe shoal
timid stag
#

I hope the person behind that PR did get in touch with CPython.

finite perch
#

Though I will say, it is a common pattern to see that people give up after receiving their first rejection, even when the rejection is "you should discuss this over on Y, or raise a PR on Z instead". I sympathize, but I don't know what else I can do about it.

timid stag
#

Probably nothing, realistically.

ripe shoal
#

So I have sketched a refactor of zipfile (under the hood) which should make it easier to reduce locking and minimize the seeking done during decompression https://github.com/python/cpython/issues/136741#issuecomment-3413333521

A couple of questions I had while designing this, assuming the stdlib allowed overriding the default decompression methods (i.e. libraries could register their own code to run for ZIP_DEFLATED):

  1. Would you consider using such a mechanism? I could imagine using zlib-ng or zlib-rs would provide a significant speedup to install speed vs normal zlib. Allowing an optional add-on speed up pip would be interesting
  2. On the flip side, would pip need to protect against 3rd parties modifying the default deflate decompression for security purposes?
finite perch
ripe shoal
#

Gotcha. I was imagining a third party could replace the default deflate decompressor with a zlib-ng decompressor which would immediately speed up pip. This could be done transparently to pip

timid stag
#

meaning, the system library that zipfile interfaces to? 😕 or else how would it be transparent to pip

#

oh, you mean, declare a dependency, but no code change?

ripe shoal
#

Right so basically users would be able to install e.g. zlib-ng and zlib-ng would be able to do something like zipfile.register_decompressor(ZIP_DEFLATED,...) and zipfile would use zlib-ng from the zlib-ng package going forward.
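
To make the shape of that idea concrete, here is a standalone registry sketch. Everything in it is hypothetical: `register_decompressor` and `decompress_member` are local functions defined here, not `zipfile` APIs, and plain `zlib` stands in for a faster backend such as zlib-ng.

```python
import zipfile
import zlib
from typing import Callable, Dict, Protocol


class Decompressor(Protocol):
    def decompress(self, data: bytes) -> bytes: ...
    def flush(self) -> bytes: ...


def zlib_raw_deflate() -> Decompressor:
    """Default backend: raw deflate stream, as stored in zip members."""
    return zlib.decompressobj(-15)


# Hypothetical registry keyed by the zip compression method constant.
_DECOMPRESSORS: Dict[int, Callable[[], Decompressor]] = {
    zipfile.ZIP_DEFLATED: zlib_raw_deflate,
}


def register_decompressor(method: int, factory: Callable[[], Decompressor]) -> None:
    """Replace the backend for a method (what a third-party package would call)."""
    _DECOMPRESSORS[method] = factory


def decompress_member(method: int, data: bytes) -> bytes:
    """Decompress one member's bytes using whichever backend is registered."""
    decompressor = _DECOMPRESSORS[method]()
    return decompressor.decompress(data) + decompressor.flush()


# Round-trip a payload through raw deflate to exercise the registry.
compressor = zlib.compressobj(wbits=-15)
payload = b"pip install example " * 100
raw_deflate = compressor.compress(payload) + compressor.flush()
print(decompress_member(zipfile.ZIP_DEFLATED, raw_deflate) == payload)
```

The caller never names the backend, which is what would make a swap transparent to pip. It is also why the security question matters: anything that can call the register function changes what every extraction runs.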

timid stag
#

ooh

fallen scroll
#

Why couldn't Python be compiled against zlib-ng in the first place?

ripe shoal
#

It can! (and is on Windows and I believe Fedora)

#

But that isn't true across all distros

spice owl
finite perch
#

Windows is not happy with time zone datetime objects from the year 3000 onward, and this broke a few of my tests on Windows. Glad to see Microsoft is already planning to have Y3K support contracts

timid stag
#

Just saw the notifications re #13482 and #13534. Congrats! These were important issues to solve and I've been "following" the corresponding bug reports the whole time

#

Definitely has me excited for 25.3 (along with the other stuff mentioned in the 25.1 release)

#

(also, why expand year fields to four decimal digits to solve y2k when clearly you only need three ;) )

finite perch
# timid stag Just saw the notifications re #13482 and #13534. Congrats! These were important ...

Yeah, I'm glad I finally decided that breaking backwards compatibility with PIP_CONSTRAINT being used to define build constraints was the way forward; it makes the UX a lot simpler, and allows us to improve stuff behind the scenes with a lot fewer headaches.

I almost implemented it in a backwards compatible way and I'm not sure anyone other than me would have understood how every combination of constraints and build constraints would have worked.

hidden flame
#

Sadly almost nothing I'd been working on will make it for 25.3 or possibly 26.0 either.

#

I really do want to get in-process build dependencies in at some point, though.

timid stag
#

:(

finite perch
#

Richard says that but if it wasn't for him we probably wouldn't have got build constraints over the line for this release, reviewing is as important as authoring

hidden flame
#

@finite perch congratulations on the release! (this time in the right channel)

#

I'm sad that I can't write a post for this release since it contains some noteworthy changes, but alas, I'm so busy 🙃

finite perch
#

Thanks! Pretty harrowing there for a moment as I messed up the commit order and accidentally published 26.0dev0 🫠

hidden flame
#

I suppose we'll find out who is monitoring pip releases like a hawk sooner or later, hah.

#

Stuff happens. You got it fixed promptly. That's what is most important in the end.

finite perch
#

Yeah, fortunately I'm used to high stress deployments, I've done a lot of manually copying files in place and running a series of 40 steps for multi-billion dollar trading systems before, ahaha

hidden flame
#

I hope those 40 steps were documented at the very least.

finite perch
timid stag
#

well, there's no 26.0dev0 on the PyPI index (even yanked) so it certainly could have been worse

#

anyway congrats 🎉 I recall from the 25.1 post that this one was expected to have quite a bit of interesting stuff

finite perch
#

Yeah, I directly deleted it, I think less than 3 mins after it was uploaded, I made the decision that was the better choice than yanking if I did it quick enough.

#

In most cases I would prefer yanking, but because this had a higher release number than the intended one (even if it's a pre-release) and because I was catching it so early, quicker than hopefully PyPI mirrors would notice, I went that route

shy echo
#

Plus, it's unlikely folks would complain that they can't use a dev release. And, "don't do that" is a reasonable answer for us to give IMO.

finite perch
timid stag
#

Use a temporary directory in the wheel cache to build wheels, so the built wheel is always on the same filesystem as the wheel cache, and can be atomically moved into the cache.

Is this just a performance concern, or... ?

finite perch
#

Although performance should be improved in those cases also

timid stag
#

... shutil.copy2 doesn't lock the file it's writing to? o_O

#

... anyway I notice there's new debug output, too
old:

× Encountered error while generating package metadata.
╰─> See above for output.

new:

× Encountered error while generating package metadata.
╰─> from file:///path/to/<name>-<version>.tar.gz

it's a little strange to see a file URI (complete with percent-encoding), though; would it actually be possible that the source package isn't stored locally?

hidden flame
#

There are other places where you'll see a file URI.

#

Like this error message.

#

And yes, pip can fetch source distributions from a remote URL. I'm not sure whether pip would present the URI of the downloaded copy or the remote URL, but there's nothing requiring a local source distribution.

finite perch
# timid stag ... shutil.copy2 doesn't lock the file it's writing to? o_O

I don't remember the details, but back in the day when I used to do a lot of copying across Windows and/or network drives I had some utility functions, and they always ensured that first the file was written to a temporary directory somewhere on the same drive, with the same filename and permissions it was intended to have in the final location, and then atomically moved after that. This approach had the least number of edge cases in my experience.

timid stag
#

interesting
I guess the easiest way to be sure of "on the same drive" is to make a temp folder inside the destination folder, then move up a level...
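
That "temp folder inside the destination folder" approach can be sketched like this. It is not pip's wheel cache code; `atomic_write` is a hypothetical helper, and it relies on `os.replace` being an atomic rename when source and destination are on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(destination: Path, data: bytes) -> None:
    """Write data next to the destination, then move it into place in one step."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Staging inside the destination's parent keeps both paths on one filesystem.
    with tempfile.TemporaryDirectory(dir=destination.parent) as staging:
        staged = Path(staging) / destination.name
        staged.write_bytes(data)
        # Readers see either the old file or the complete new one, never a partial.
        os.replace(staged, destination)


with tempfile.TemporaryDirectory() as cache_root:
    target = Path(cache_root) / "wheels" / "example_pkg-1.0-py3-none-any.whl"
    atomic_write(target, b"not a real wheel")
    print(target.read_bytes())
```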

#

anyway I guess that won't cause a problem for my own project but I'll keep it in mind

finite perch
#

I am a little worried about edge cases from that change, but I don't have a concrete example to draw from so I decided not to block it.

#

If there are, we'll likely see them early next week when people's CIs start running all sorts of weird configurations

hidden flame
#

We're mostly done with the deprecations, so this doesn't matter anymore, but the standard deprecation message is possibly a bit too crude.

finite perch
#

I've seen that a few times, language quality could definitely be improved

hidden flame
#

I wonder if we could implement a lazy importer for pip's own use. Regular lazy imports are unsafe, but if we use a custom importlib finder/loader that scans and reads all of pip's source code on startup, but defers the actual module execution until use, that should be safe?
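
For reference, the standard library already has a building block in this direction: `importlib.util.LazyLoader`, which locates the module up front but defers executing it until the first attribute access. The sketch below follows the recipe from the `importlib` documentation. It is not the custom finder/loader described above, and it does not settle whether this would be safe for pip.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Find a module now, but delay running its code until it is first used."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arms the module; real execution happens later.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The module body runs here, on first attribute access.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```

A missing module still fails at the `lazy_import` call rather than at first use, which is roughly the "scan everything on startup" half of the idea.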

finite perch
#

The cost should then only be incurred right before an install

hidden flame
#

I'm honestly pessimistic about the lazy import proposal.

#

I guess match case did get approved, so anything is possible, but I'm not planning on it.

finite perch
#

Yeah, I'm not doing any work yet, I'm not too pessimistic, there's a lot of pressure, there are multiple forks of CPython now because of no good support here

hidden flame
#

I was going to say that I'd like to also maintain lazy imports even during install, but that would be unsafe with our vendored dependencies. Even with a custom finder/loader as described above, our vendored dependencies could load/exec other Python code w/o import on module exec.

#

That's bad practice, but I also don't feel like assuming that our vendored dependencies are doing the right thing.

#

I'll experiment with a custom loader when I get some time. I'm curious to see if lazy imports are even helpful for our codebase. It's hard to optimise startup time nowadays since a huge portion comes from our dependencies, especially rich + packaging.

finite perch
#

My assumption is that things like pip --help could be optimized and everything else would basically have negligible impact on overall experience

#

I'm surprised that no one is complaining about pip 25.3 yet, I did see a few projects have pinned to 25.2, but none of them seemed to mind and understood that one of their dependencies was very legacy

hidden flame
#

I'm surprised too. It's a Monday and no one has raised an issue complaining their $very-important-work-ci is blowing up.

timid stag
#

not specifically for pip, mind, but pip with its current architecture seems like a pretty good test case

polar nova
finite perch
timid stag
#

I wonder what it would actually take to change that.

#

(for that matter, there are probably all kinds of places where pip is repeatedly running to re-install the same stuff into new containers, without a cache, rather than having those packages built into the container image....)

finite perch
timid stag
#

it's my best working theory, too.

#

and that seems all kinds of wasteful, and unfair to Fastly

#

and I don't know of a good way to track down major culprits, either.

hazy glen
#

is there any syntax for specifying extras with a plain URI in a requirements.txt file?

#

or does it require the standard foo[extra] @ https://something/something format?

shy echo
#

The standard one, yea.

hazy glen
#

I checked and with a filesystem path it stops parsing the path at [ and starts parsing an extras list

#

context: I'm rewriting the requirements.txt parser for PyCharm

#

and it's not clear what things the pip parser actually supports

dapper laurel
#

I think packaging/requirements is pretty much a reference implementation (pip has its own for legacy reasons)

hazy glen
#

I've been looking at that but it doesn't really help me with this particular question

dapper laurel
#

as Pradyun said, pip supports the standard syntax

hazy glen
#

yes, but it also supports other things like just giving a path or a URI

#

and I found that support was a bit inconsistent as plain URIs cannot have extras, but plain filesystem paths can

dapper laurel
#

I'd say going with the standard specified in the link I said would be the best path

hazy glen
#

how will that help me with writing a requirements.txt parser?

#

I need to handle everything pip supports

dapper laurel
#

if pip is doing anything special that is not in there, I am sure sooner or later it will be removed in favour of the standard spec

dapper laurel
#

and that is what your original question was about

hazy glen
#

and aside from pep 508 specifiers it also supports plenty of other things

#

like pip options, plain filesystem paths and plain URIs

#

my question related to whether these last two oddities support things like extras (and environment markers)

hidden flame
#

Let me look.

hazy glen
#

from my testing it looks like plain URIs do not support extras but they do support environment markers

#

so if the parser encounters a semicolon it stops parsing the URI

hidden flame
hazy glen
#

yeah which is why I need to support them properly

#

I just need to figure out what exactly is supported by pip

hidden flame
#

The answer is that it changes. pip 25.3 actually just changed the parser again to support Direct URL editables, but that is a standard syntax.

#

Sorry, I am a bit busy atm so it may take a bit to dig through the code.

hazy glen
#

yeah I'm fine with the standard syntax, that's pretty well documented

hidden flame
#

@hazy glen Actually, do you mind if I get back to you in an hour or so?

hazy glen
#

sure, np

shy echo
#

@hazy glen I'm guessing you've seen https://pip.pypa.io/en/stable/reference/requirements-file-format/ by now? I'm erring on the side of overcommunicating -- that was written a little while back, when I was trying to decouple the documentation -- part of which ended up being trying to make it easier to figure out what exactly this format was.

hazy glen
hidden flame
hazy glen
#

the tests might be more helpful to me

#

I just need to know where to look

hidden flame
#

There are three entrypoints (generally):

  • install_req_from_line which is used in most cases which allows for pip's custom syntax
  • install_req_from_req_string used for where standard compliance is expected/required (it delegates to packaging's parser)
  • install_req_from_editable for editables. There are two flows in this function, one for standard syntax and one for pip's extensions.
#

I'll take a look at the tests for those.

hidden flame
hazy glen
#

okay, so the extras can also be defined in #egg=...

hidden flame
#

Ctrl-F for the function names I mentioned above can narrow down your search for examples (you can also try parse_editable and parse_req)

hidden flame
#

I will note that the plan is to phase out egg fragments. I'm not sure when we'll get around to doing that, but the egg fragment can be wholly replaced by the standard Direct URL syntax nowadays.

#

Editable VCS requirements used to need the egg fragment syntax to request extras, but that was fixed in pip 25.3.

#
    if is_url(name):
        link = Link(name)
    else:
        p, extras_as_string = _strip_extras(path)
        url = _get_url_from_path(p, name)
        if url is not None:
            link = Link(url)

This seems to imply that pip will accept extras for a bare URI, although I can't find a test case for that.

#

Correction: bare filesystem path, URIs don't actually accept extras due to that is_url check (well directly, anyway, egg fragments are a thing).
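
Putting the observations from this thread together, the behaviour can be modelled roughly as below. This is a toy model for a third-party parser, not pip's code: the names are hypothetical, the URL check is a simplified regex, and the semicolon handling just follows what was reported above (parsing of the target stops at the first `;`).

```python
import re
from typing import NamedTuple, Optional

# Simplified stand-in for a URL check: anything that starts with "scheme://".
_URL_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")
# A trailing "[...]" group on a path is treated as an extras list.
_TRAILING_EXTRAS = re.compile(r"^(?P<path>.+)(?P<extras>\[[^\]]+\])$")


class BareRequirement(NamedTuple):
    target: str
    extras: Optional[str]
    marker: Optional[str]


def split_bare_requirement(line: str) -> BareRequirement:
    """Split a bare path/URL requirement into target, extras, and marker."""
    target, sep, marker = line.partition(";")
    target = target.strip()
    marker_text = marker.strip() if sep else None
    if _URL_SCHEME.match(target):
        # Bare URLs: markers are recognised, a trailing [extras] is not.
        return BareRequirement(target, None, marker_text)
    match = _TRAILING_EXTRAS.match(target)
    if match:
        return BareRequirement(match["path"], match["extras"], marker_text)
    return BareRequirement(target, None, marker_text)


print(split_bare_requirement("./pkgs/example[dev]; python_version >= '3.9'"))
print(split_bare_requirement("https://example.invalid/example-1.0.tar.gz[dev]"))
```

The pip test suite is still the thing to check this against, since the parser changes between releases.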

#

I'm going to stop there. Let me know if you need anything else @hazy glen.

hazy glen
#

I already tested and bare filesystem paths do support extras

misty vault
timid stag
#

although I didn't have this much architecture in mind, nor Rust.

misty vault
#

My bad, somehow didn't see that message. It's a neat idea.

hidden flame
#

Also not going to fly with our redistributors.

#

It may actually be more reliable to grab a page from the video game industry and create an atlas and use a custom finder/importer that pulls from said atlas. (Unless that is the idea?)

misty vault
ripe shoal
#

I think this is roughly the same idea: store all of the data in a single file and pull the data out piecemeal

#

in games an atlas has all of the e.g. texture assets in one giant PNG (or whatever file type) and you load the whole PNG then actually use it by saying "this item has texture in this area of the atlas"

misty vault
#

Ah. Yeah, that’s pretty close.

timid stag
#

... who are major redistributors of pip, aside from ensurepip?

#

Just the linux distros? and they presumably would just reject such a binary blob

#

but what if they ran the (FOSS of course) packing tool themselves?

hidden flame
#

Our autocompletion involves calling a Python script which can be quite slow.

finite perch
#

Oh yeah, that's true

hidden flame
#

Although I haven't looked into whether our autocompletion could be easily optimized. My laptop has always been fast enough so it's really not a concern for me.

finite perch
#

Do you use autocompletion? I've been unable to get any of them working on my machine

hidden flame
#

I do!

#

It helps with the insane amount of flags we have.

finite perch
#

The only thing I wanted to do with auto completion is move the scripts into their own file and run linting on them, because I'm often not even sure if the syntax is valid for the versions of the shells we want to support, but given the recent stalling of multiple auto completion PRs I've held off

hidden flame
#

progress! 🎉

timid stag
#

(I'm still waiting for the first PR against one of my own projects... 🥹 )

#

(please don't artificially resolve that.)

past pagoda
#

What if I resolve it sincerely?

dapper laurel
stuck girder
timid stag
#

there are so many of these analysis sites that I only ever know about because someone links them...

hidden flame
#

I use my own :P

#

My goal is to hit 90 open PRs by the end of the year. I doubt that'll happen since I'll be busy for a few weeks soon, but hey, a maintainer can dream.

finite perch
#

I've just found a subtle bug in how binary/no-binary are passed to the build requirement installer. I'm just going to pretend I didn't see it and keep my fingers crossed that @hidden flame eventually has the time to do an in-process installer that will remove the bug

hidden flame
#

uhhhhhh

#

I think in-process build deps will remove the bug since I am reusing the package finder.

finite perch
#

Yeah, exactly, the way the arguments are reconstructed to pass to pip isn't right, because the arguments are actually order dependent
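
To illustrate why order matters, here is a simplified model of two mutually exclusive flags where `:all:` resets both sides. This is a sketch under that assumption, not pip's actual option handling; `apply_format_flags` is a hypothetical helper.

```python
def apply_format_flags(flags):
    """Apply (flag, value) pairs in order; later flags can undo earlier ones."""
    no_binary, only_binary = set(), set()
    for flag, value in flags:
        if flag == "--no-binary":
            target, other = no_binary, only_binary
        else:
            target, other = only_binary, no_binary
        if value == ":all:":
            # ":all:" wipes whatever was configured before it on both sides.
            other.clear()
            target.clear()
            target.add(":all:")
        elif value == ":none:":
            target.clear()
        else:
            other.discard(value)
            target.add(value)
    return no_binary, only_binary


print(apply_format_flags([("--no-binary", ":all:"), ("--only-binary", "numpy")]))
print(apply_format_flags([("--only-binary", "numpy"), ("--no-binary", ":all:")]))
```

In this model the first ordering keeps the `numpy` exception while the second discards it, so rebuilding the command line by grouping flags per option rather than replaying them in their original order can change the result.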

hidden flame
#

Oh, amazing.

#

TIL.

#

Timeframe-wise, I'll probably aim to get it ready for review during the holidays.

#

It's been long enough since I've last worked on it that I don't remember what needs to be done, but I think they are:

  • Tweaking the error reporting
  • Ensuring nested builds aren't horribly broken
  • Writing moar tests
#

The logic itself should be solid. Although I do need to fix it post the removal of legacy setup.py builds.

finite perch
#

Yeah, no rush on my end, but I am about to open a new PR that passes more arguments to the subprocess build installer, which is why I noticed this

hidden flame
#

Lovely.

#

I'd like to replace our janky build isolation mechanism with venv (which is something @shy echo was working on ages ago) after I get in-process deps landed, but this may be a case of me biting off more than I can chew.

#

In theory, it shouldn't be that bad since venv is battle tested, but I'm sure there are horrifying fun ways it can go wrong or break someone.

finite perch
#

Yeah, every time I look at venv logic I am totally confused on what is going on, I'll stick to the simple world of package selection and dependency resolvers

hidden flame
#

"simple"

#

I gave a private talk about pip not too long ago and I immediately said "the one thing I am not qualified to talk to y'all about is the dependency resolver in pip."

finite perch
#

Step 1: Take Requirements
Step 2: Resolve (???)
Step 3: Profit

hidden flame
#

Step 2.5: get lost in the math

#

I basically summed the process up as solving a bunch of systems of equations that only get progressively and exponentially more complicated/interwoven.

#

I mean, SAT solvers are a thing and I'm pretty sure used in ecosystems where dependency metadata is available all at once.

#

I guess it's really more boolean logic, but ¯\_(ツ)_/¯

hidden flame
#

(can you tell I know nothing about dependency resolution? 😀)

finite perch
#

At a super dooper high level, pip's dependency resolver (i.e. pip + resolvelib) is quite basic:

  1. Collect user requirements into "all known requirements"
  2. Iterate through requirements in a DFS pattern
  3. For each requirement, iterate through each possible candidate and check that it satisfies "all known requirements" and that its requirements don't break a previously found candidate
    a. If good, "pin" the candidate and add its requirements to all known requirements
      i. If there are unsatisfied requirements, repeat from 2
      ii. If all requirements are satisfied, SUCCESS
    b. If bad, move on to the next candidate for that requirement
      i. If all candidates for that requirement are exhausted, "backtrack" and find a different candidate to fulfil the parent requirement
        • If all parent requirements have been tried, FAIL
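
The steps above can be sketched as a toy backtracking search. This is not resolvelib's code: `INDEX`, the integer versions, and the `(name, op, bound)` requirement tuples are all made up for illustration.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq}

# Hypothetical index: name -> {version: [requirements of that version]}.
INDEX = {
    "a": {2: [("foo", ">", 3)], 1: [("foo", ">", 1)]},
    "b": {1: [("foo", "<", 3)]},
    "foo": {4: [], 2: [], 1: []},
}


def satisfies(version, requirement):
    _, op, bound = requirement
    return OPS[op](version, bound)


def resolve(requirements, pins=None):
    """Return {name: version}, or None if every candidate has been exhausted."""
    pins = dict(pins or {})
    # Step 2: pick the first requirement whose package is not pinned yet.
    for name, _, _ in requirements:
        if name not in pins:
            break
    else:
        return pins  # step 3.a.ii: everything is pinned, SUCCESS
    relevant = [r for r in requirements if r[0] == name]
    for version in sorted(INDEX[name], reverse=True):  # newest first
        # The candidate must satisfy all known requirements on this package.
        if not all(satisfies(version, r) for r in relevant):
            continue
        new_reqs = INDEX[name][version]
        # Its own requirements must not break a previously pinned candidate.
        if any(r[0] in pins and not satisfies(pins[r[0]], r) for r in new_reqs):
            continue
        # Step 3.a: pin it and go depth-first into its requirements.
        result = resolve(new_reqs + requirements, {**pins, name: version})
        if result is not None:
            return result
    return None  # step 3.b.i: exhausted, the caller backtracks


print(resolve([("a", ">=", 1), ("b", ">=", 1)]))
```

Here the search first pins `a` 2 and `foo` 4, finds that `b` cannot accept that `foo`, backtracks through the remaining `foo` candidates, and only then retries with `a` 1.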
#

Pip and resolvelib aren't doing smart SAT solving things, for example, at step 3, if A is already pinned with "foo>3", and B has a requirement "foo<3", then the resolver will happily pin B and not hit a problem until it tries to pin foo, which might be MUCH later, causing catastrophic backtracking

#

I would like to implement a more intelligent SAT solver, but one of the problems with that is it's not actually always clear whether two requirements are contradictory or not, because the Python version specifier spec is non-trivial (to say the least). I am trying to lay the groundwork for this in packaging by adding an unsatisfiability check (https://github.com/pypa/packaging/issues/940). I have most of the code ready and will try to raise a PR before the end of the year, but in implementing it I found a bunch of edge cases that I've been fixing in packaging first.
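
For a sense of what such a check does in the easy case, here is a naive sketch that only understands `>`, `>=`, `<`, `<=` on tuple versions. `is_unsatisfiable` is a hypothetical helper, not the packaging API, and it ignores everything that makes the real problem hard (pre-releases, `!=`, `~=`, wildcards, local versions).

```python
def is_unsatisfiable(specs):
    """specs: iterable of (op, version_tuple); True if no version can match."""
    lower, lower_inclusive = None, True
    upper, upper_inclusive = None, True
    for op, version in specs:
        if op in (">", ">="):
            inclusive = op == ">="
            # Keep the tightest lower bound seen so far.
            if lower is None or version > lower or (version == lower and not inclusive):
                lower, lower_inclusive = version, inclusive
        elif op in ("<", "<="):
            inclusive = op == "<="
            # Keep the tightest upper bound seen so far.
            if upper is None or version < upper or (version == upper and not inclusive):
                upper, upper_inclusive = version, inclusive
        else:
            raise ValueError(f"unsupported operator: {op}")
    if lower is None or upper is None:
        return False
    if lower > upper:
        return True
    return lower == upper and not (lower_inclusive and upper_inclusive)


print(is_unsatisfiable([(">", (3,)), ("<", (3,))]))   # foo>3 together with foo<3
print(is_unsatisfiable([(">=", (1,)), ("<", (3,))]))
```

With a check like this available, a resolver could reject the `foo>3` / `foo<3` pair as soon as both requirements are known instead of discovering the conflict when it finally tries to pin `foo`.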

shy echo
#

FWIW, incremental SAT solvers that don't need all of the information up front exist, and you can usually hack around non-incremental solvers to use them for incremental solves as well.

#

One approach is basically have variables representing the unknown things, and if the solver needs to set that to true, you fetch that and do another round with the unknown variable replaced with these known quantities, and the solver suggested to start with the assumptions+clauses from the earlier resolve step.

finite perch
hidden flame
#

I can only review a certain number of 700+ LOC PRs 🙃

#

--uploaded-prior-to is next on my review queue, but the prerelease/final PR will take some time.

hidden flame
#

I didn't realize that we had a PR for colours in pip help until recently. This is cool. I'd like to clean it up and get it merged.

stuck girder
#

(if pip was using argparse you'd get this for free (in 3.14+))

finite perch
dapper laurel
#

btw I know that optparse was "undeprecated", but is it now accepting PRs etc?

shy echo
#

There's an issue about moving off of optparse that probably has all the context.

stuck girder
shy echo
#

Geez, 8 years!? I... I'm one of the old people now.

stuck girder
#

From a quick scroll, the blocker was dropping Python 3.6. And of course someone doing all the work...

shy echo
#

One of those happened... two years ago.

stuck girder
dapper laurel
#

but now that optparse was "undeprecated", is there really a need to switch?

stuck girder
#

Even when optparse was deprecated, there were no plans to remove it from the stdlib. And I get the feeling from the thread that the former deprecated status wasn't the main reason for a change.

dapper laurel
#

tbh I don't see what the original intention was

hidden flame
#

Out of everything we could work on, the value add seems to be minimal if not zero

finite perch
#

Due to argparse design issues it could actually end up being negative, depending how well a PR was designed and the author understood all the reasons argparse stalled out

#

My main issue with the current design is the options object is statically a black box, I would want any replacement to improve on that. And I'm not sure there are any good options right now, for the scale of options that pip has

cosmic pebble
#

have you played with click? IMO it’s pretty nice

hidden flame
#

I use click myself, but for pip, there isn't much value add from switching CLI libraries. If anything, things may break due to subtly different parsing behaviours.

dapper laurel
timid stag
#

...isn't click built on top of argparse?

dapper laurel
timid stag
#

Huh. I should look into it more closely.

willow flicker
#

Click breaks stuff too often for me. It's worse if you use Typer, but even plain click breaks stuff in non-major releases. Pip would vendor it, I suppose, so that might not be an issue. I personally like argparse pretty well - it's not elegant but you can provide your own namespace for it to fill. I usually only use click if I want its chaining context feature.
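
A minimal sketch of the "provide your own namespace" point: `parse_args` accepts a `namespace=` object, so a typed dataclass can be filled in place of the opaque default `Namespace`. The `demo-install` parser and the `InstallOptions` class are hypothetical and only borrow a couple of familiar flag names.

```python
import argparse
from dataclasses import dataclass, field


@dataclass
class InstallOptions:
    """Typed container that static tools can inspect."""
    requirements: list[str] = field(default_factory=list)
    no_deps: bool = False
    index_url: str = "https://pypi.org/simple"


parser = argparse.ArgumentParser(prog="demo-install")
parser.add_argument("requirements", nargs="*")
parser.add_argument("--no-deps", action="store_true")
parser.add_argument("--index-url")

# argparse sets attributes on the object it is given; attributes that already
# exist keep their values unless the option appears on the command line.
opts = parser.parse_args(["--no-deps", "numpy"], namespace=InstallOptions())
print(opts)
```

Because `opts` is an `InstallOptions` instance, type checkers and editors can see its fields, which addresses part of the "statically a black box" complaint, though the parser definition and the dataclass still have to be kept in sync by hand.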

naive fractal
#

☝️ even as a huge fan of click, I agree that it's too breaky for pip. Plus it does all kinds of stuff with std streams which I don't think it should be doing. I use click preferentially if I want to just have fun, but argparse for stability and having fewer deps.

finite perch
#

Back to looking at call graphs to find performance improvements: https://github.com/pypa/pip/pull/13656

Was hoping to find a new heuristic improvement with recent grpc issues but nothing obvious, so just seeing where I can find redundant work in pip.

finite perch
#

Had an idea this morning that if it pans out could significantly improve long resolving times, probably won't have time to test it out today 😔

shy echo
#

What's the idea?

finite perch
#

It's a caching strategy for candidate lookups. Once you get into long resolutions, most of the time is currently spent on checking what candidates are available for requirements that were already checked previously

shy echo
#

That's a set that'll keep changing, no?

#

Unless it's "list of distributions available" rather than "list of viable options left" that you're referring to here. 😅

finite perch
#

If "numpy>100" returns no candidates it won't suddenly start to return candidates later, so we don't need to keep checking over and over again if each numpy version matches that requirement or not. But that's what happens because in a pathological resolution it keeps getting stuck on the same requirements over and over again.
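
A sketch of that observation, assuming the set of available versions is fixed for the duration of one resolution. `AVAILABLE`, the integer versions, and `matching_candidates` are hypothetical; the bounded `lru_cache` is just one way to cap memory, not necessarily the approach being considered here.

```python
import operator
from functools import lru_cache

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}

# Hypothetical fixed view of what the index offers.
AVAILABLE = {"numpy": (1, 2, 3)}


@lru_cache(maxsize=1024)
def matching_candidates(name, op, bound):
    """Filter the version list once per distinct requirement."""
    return tuple(v for v in AVAILABLE.get(name, ()) if OPS[op](v, bound))


# A pathological resolution asks the same question again and again.
first = matching_candidates("numpy", ">", 100)
second = matching_candidates("numpy", ">", 100)
print(first, second, matching_candidates.cache_info())
```

The second lookup is answered from the cache without re-checking each numpy version, which is the redundant work described above.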

dapper laurel
#

cc @thorny crypt (Poetry's solver master)

finite perch
#

I'm confident in the idea because I know pip's bottlenecks. I just won't know until I start working on it whether it would require a large refactor, or whether my idea for not exploding memory consumption is viable

thorny crypt
#

This will surely speed up some cases significantly. I think such a cache has been introduced in Poetry as part of https://github.com/python-poetry/poetry/pull/5335 (I do not assume that this will help you due to too different code bases but anyway.)

hasty wave
#

So this is a bit of an odd question, but I am hoping that some people in here might have ideas. Our internal build system where I work has generally encouraged our packages to only ever have a single version available. We have started a move towards more idiomatic tooling, i.e. hatch with pip or uv. One of the areas we are concerned about is resolving conflicting version constraints, where I as a consumer depend on 2 of these single-version-available packages that declare PyPI dependencies that cannot be resolved together. An example would be Package Foo declaring a requirement on Numpy > 2 and Package Bar declaring a dependency on Numpy < 2. I need both Foo and Bar, but I don't have the option to select different versions where the constraints would line up. Are there any solutions to this problem that do not require changing the Single Version Available approach of first-party packages?

cosmic pebble
#

nope, all your dependencies need to have a consistent resolution; python isn’t like node where different dependencies can have independent chains of dependencies

#

as a person who is partially responsible for numpy 2, sorry!

finite perch
timid stag
hasty wave
hasty wave
finite perch
#

It has been an intentional choice in pip not to support this beyond installing Foo and Bar one at a time, or using --no-deps.

I would be open to the idea of a global override but it's not high enough on my list to implement it myself

timid stag
#

(it does seem like Numpy gets used as an example for this problem disproportionately often. Maybe just because it's easily accessible/understandable as an example, regardless of how often people are actually bit by it)

hasty wave
#

(I do also support a lot of scientists so the chances of them getting bit by something like numpy in this type of case does increase)

finite perch
#

So pip's .pre-commit-config.yaml isn't actually a valid yaml file

#

I was trying to see if we could easily more strictly lint it, or auto format it to be valid without it being ugly, but I dunno, I have a fairly minimal yamlfmt pre-commit working

finite perch
#

Experimented a little with performance improvements tonight, really getting the number of Version objects being created down quite a bit, need to raise a fairly simple PR with packaging for quite a big win.

One of the big issues though is because packaging is a vendored independent library it makes a lot of design choices that are not well suited for pip. We could make the API simpler and much faster if it only served pip, while keeping standards compliance.

I get why it's better for it to be independent, but it really sways me toward the idea that for a truly good tool you want to reduce the number of external components.

small cove
#

i had to fork, patch and re-publish some scientific packages for a project i worked

finite perch
timid stag
#

(I'm interested in your benchmark, btw.)

finite perch
#

It's the first 2500 rounds of Apache Airflow 3.1.3 with the extra all. The full resolution doesn't currently solve on my computer, I left it for over 24 hours and it was still going.

I wasn't able to find a magic fix so I'm starting with a small number of rounds and seeing where the hot spots are. Next I'll increase the number of rounds and see what starts to dominate, and see if I can find new fixes.
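
One generic way to look for hot spots in a bounded workload is the stdlib `cProfile` module. The sketch below is an assumption about tooling, not a description of the setup used for this benchmark; `hot_spots` and `busy_round` are hypothetical stand-ins for the profiled resolver rounds.

```python
import cProfile
import io
import pstats


def hot_spots(func, *args, limit=5):
    """Profile one call and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


def busy_round(n):
    # Stand-in workload; a real run would drive a fixed number of rounds.
    return sum(i * i for i in range(n))


report = hot_spots(busy_round, 10_000)
print(report)
```

Raising the workload size between runs and comparing the reports shows which functions start to dominate as the run gets longer.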

hidden flame
#

I finally have no more unread notifications in my github inbox 🎉

finite perch
finite perch
#

I think I need to give looking at call graphs a break for now, the four PRs I've raised are all quite simple but I spent a long time trying different things and wondering why they weren't working to get to those four, it's a time consuming process of trying something, running a 5 min benchmark, seeing what happened, thinking on it for awhile, trying again. That said, I still have some more hunches for non-trivial improvements, hopefully I can work on them before 26.0.

finite perch
#

Try and get some PRs reviewed next

timid stag
#

what do you use to get call graph diagrams, btw?

finite perch
inland creek
#

i wish i had noticed these optimization efforts for packaging earlier

@willow flicker applied an optimization which the thing i am building, codeflash, could've found automatically

#

this is the optimization codeflash came up with, which looks very similar to what henry came up with

#

i will now run codeflash on the entire packaging repo with the latest main and see what it can come up with

willow flicker
#

If it helps, you can run a statistical profiler (3.15a2 has one) and then give it the functions that are the slowest

willow flicker
#

Also, I bet it won't find the atomic/possessive re speedup since models aren't likely to know those were added in 3.11.

inland creek
inland creek
timid stag
#

it's neat that tools are at this level of sophistication now but I still hate the writing style they use in describing the effort

pale epoch
#

i was unable to get to optimality without breaking the API to explicitly parse Version and Specifier instead of automatically coercing. if performance is a goal, i think that will be necessary. it will also likely help to avoid accidentally using string comparison vs version/specifier's comparison operator methods
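
A tiny illustration of the string-versus-version pitfall. `parse_release` is a hypothetical stand-in that only handles plain dotted releases; it is not `packaging.version.Version`.

```python
def parse_release(text):
    """Stand-in parser: '1.10' -> (1, 10). Real versions need far more care."""
    return tuple(int(part) for part in text.split("."))


# Comparing raw strings goes character by character, so "1.10" sorts before "1.9".
print("1.10" < "1.9")

# Parsing explicitly at the boundary gives the ordering people expect.
print(parse_release("1.10") < parse_release("1.9"))
```

Requiring callers to parse once up front avoids both the repeated parsing cost and the risk of a silent string comparison like the first line.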

#

a corresponding optimization is for url strings in the pip codebase itself

finite perch
pale epoch
#

that one is less helpful for performance but much more so for correctness. i was also able to improve url % quoting very slightly but i believe the stdlib needs a change to do that better which i've been working on

pale epoch
#

will keep an eye out for further changes

#

i also did a much bigger set of changes which performs package finding as a distinct phase before resolution. think this is a great architecture and potentially worth some sort of PEP to represent the normalized result of package finding so that resolvers can be swapped out etc. mostly because package finding is the one place where a compiled language becomes legitimately useful vs python alone to parse the simple repository API

#

do wish it were possible to get signoff on changes that have undergone multiple rounds of review over years and have become standard practice in other python tools like the fast-deps impl. pip was the first to support zip file metadata fetching after my prototype but after a version with poor performance was merged all my attempts to contribute one with the perf i demonstrated have been rebuffed. astral now claims to have invented the technique.

#

glad to see more perf work is happening and i would contribute to it if i expected it to ever be accepted

inland creek
#

@finite perch do you have any sort of end to end benchmark i can use as a baseline for my optimization efforts?

azure heron
#

it has warm / cold resolves and installs

azure heron
finite perch
# inland creek @finite perch do you have any sort of end to end benchmark i can use as ...

I don't have anything formal for the stuff you're looking at, I suspect. All my recent stuff has been by hand

I have this but it's very specifically measuring properties of the resolver that aren't related to wall clock time: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks. Like how many packages it had to visit and how many rounds it took, it's only useful for when checking the behavior of the resolver itself.

hidden flame
#

We used to have people who were experts in those segments of the codebase and could review large changes targeting those segments, but now there is essentially zero review capacity.

#

Also, additional complexity for performance wins is not as clear-cut a choice as it may seem. I had my own idea of parallelizing the pyc compilation process fail because it wasn't clear whether the complexity was worth the performance uplift.

#

I recognise that your PRs are (I think?) moreso refactorings, but then again, the problem of how do we review such large changes comes up again.

#

It's a difficult problem. I realize that as a maintainer, I'm in a privileged position to accept/deny changes and also know what is likely to be merged or not.

naive fractal
#

☝️ something which is only in my head so far, but which I hope turns out to be feasible and helps pip maintainers with the review load, is that I'd like to evolve pip-tools to one day not cross the private API boundary. I worry that if that doesn't happen, sooner or later it will become unmaintainable. But to make that change, I need to build and get buy-in on proposed new component libraries to vendor, as replacements for things that are currently inside of pip. I don't even know enough at this stage to know if it's feasible, so I'm probably getting ahead of myself even mentioning it aloud. So far I just have toy projects to start messing around with this idea. But I think if there were some kind of generic "dependency finder", for example, with two tools consuming it, that would make some things easier. (It also adds challenges, of course.)

dapper laurel
timid stag
#

(wait, what's unearth?)

dapper laurel
timid stag
#

ooh

dapper laurel
#

Readme says it all

timid stag
#

also my plan is to buckle down and get the first unit of work out for PAPER in December (basically, full polish on the actual "install single wheel" part, and the self-install zipapp, and the associated API)

#

and I'm hoping I'll be able to provide a base that does accommodate more significant changes, the sort of thing that works when the field is greener

dapper laurel
#

full polish on the actual "install single wheel" part
installer code as a prose? (is CaaP even a thing? 🤣 )

timid stag
#

I lost several months this year to my own struggles :/

#

I'm not actually using installer because it no longer seemed helpful when I considered the other things I wanted to fit into it (but also just due to my architectural taste)

#

I might come to regret that, idk

naive fractal
# dapper laurel `unearth`, `build`, `resolvelib` and `installer` are basically all you need to b...

Yeah, build and resolvelib are exactly the sorts of things I had in mind. I wasn't aware of unearth (I don't think pip uses/vendors that?). The goal I have in mind is to get pip-tools onto a more sustainable architecture. But I also think that there's some pathway to "do it right" that also delivers benefit to pip. TBH I don't think I have enough knowledge yet to work on it -- I still have way too many blind spots within pip when I go source diving.

finite perch
#

No pip doesn't vendor unearth, it'd probably be a big project to migrate, if so it would require a maintainer to spearhead it

#

As Richard was saying, with the amount of review capacity pip has anything other than a maintainer PR needs to be relatively small and/or simple. Even for maintainers it would be difficult to have capacity to do a large refactor.

finite perch
#

That's not to say non maintainers couldn't effect a large change in pip, I just think the most likely way to do it would be to start small without large refactoring and keep building on it and perhaps become a maintainer in the process

hidden flame
#

Yes, pip maintainers stop being directly responsible for it, but now we're at the whim of the maintainers of those dependencies. This has typically worked out fine in practice since there is overlap between the maintainers of pip and such projects, but we can't really have pip maintainers take on more sub-projects.

hidden flame
naive fractal
# hidden flame Yes, pip maintainers stop being directly responsible for it, but now we're at th...

This is exactly why I've been harboring these thoughts mostly in secret: my worry is that it's a bad idea to ask pip maintainers to help split things up into more sub-projects. (Especially since I haven't come back to try to help in months, despite my best intentions.) At the same time, build feels like a great success in reducing complexity for pip and making it possible to share out the maintenance workload. ... Maybe this is all just wishful thinking. 😅

hidden flame
#

We don't even use build.

#

We use pyproject-hooks to manage our PEP 517/660 interfaces.

finite perch
#

Fun fact, build uses pip

naive fractal
#

Oh, of course. Because you need pip to install the build backend 🤦
Now I feel silly. Thanks for educating me though!

#

Actually, now I'm really curious how the installation path for a build backend works out inside of pip. I think I ought to read this a bit on my own; it will be good for me.

timid stag
#

(you can choose to have build use uv now. But that selection is hard-coded. If you want to use some other custom installer it seems you'll have to setup the environment yourself first and use -n. At which point maybe you only need pyproject-hooks anyway.)

timid stag
#

@hidden flame re the #general discussion I appreciate the thought but to the extent that pip is involved in the problem (in turn, to the extent that I understood it!) I suspect the problems are too fundamental to pip's design, and to its default status (the ensurepip bootstrap etc.)

#

because ultimately, the distro is providing pip with a site-packages folder (assuming sysconfig even works properly) and as I see it, the only sensible way for a package manager to work around that is to make venvs itself

#

I do kinda regret saying anything in there, though. I hoped I could contribute something useful but it's just getting too heated now

timid stag
#

and re the pip zipapp, just for illustration

$ time ./pip.pyz --version > /dev/null 

real    0m0.705s
user    0m0.668s
sys    0m0.037s
$ time pip --version > /dev/null 

real    0m0.200s
user    0m0.184s
sys    0m0.016s
hidden flame
#

It's tempting to just close all AI issues/PRs outright.

ashen geyser
#

What's stopping you?

hidden flame
#

Some of them seem to be worth something although Damian is the only one willing to put up with them.

#

I'm not :)

hidden flame
#

Hmm, my inprocess build deps branch is now erroring with the latest main. It seems like the removal of legacy builds broke the preparer.

#

Nevermind, the Avoid pip install --dry-run downloading full wheels commit changed the contract with preparation.

timid stag
#

it was previously pointing out something about how --dry-run will build sdists, yes? I saw that and thought "oh, another duplicate"

#

that's been an issue for basically the entire history of pip afaict

hidden flame
#

Yes.

timid stag
#

(Ah I was thinking of pip download actually... as in issues 7995, 1884 etc)

finite perch
hidden flame
#

Yea, I agree that we've all been there. 👍 On this front, you're more reasonable than me.

naive fractal
#

It also depends on how junky the issue is, in my experience. Someone showed up on one of my projects, opened a PR, which failed linting, and then opened a slop issue to say "linting rejects this code". 😂

finite perch
#

Yeah, it's on a case by case basis, but that is hilarious

naive fractal
#

Does pip have a stated policy? I keep putting it off but I need to write one for pip-tools

finite perch
naive fractal
#

I like how succinct that is. I probably won't be able to keep myself that brief on it, but that's great. Thanks for sharing the pointer

finite perch
#

Well, as with everything in pip there was a bit of back and forth on exact wording, but really thanks to Richard for sitting down and writing the final PR wording

hidden flame
#

sigh, I wish requirement files did not support CLI flags

finite perch
#

I was shocked when I found that out

hidden flame
#

or at least, I very much hope that constraint files do not support CLI flags because I'm not supporting that

finite perch
#

But then slowly I found there are entire ecosystems built on it

#

Largely hidden from public consumption

hidden flame
#

And inprocess build dependencies and build constraints are now working together 🎉

ripe shoal
hidden flame
#

I guess. We'll find out if it's a problem at some point I suppose.

hidden flame
finite perch
#

We can't remove CLI flags from constraint files, they're too widely used

#

People don't use --hash in constraints files because it's broken and doesn't work

hidden flame
finite perch
#

Yeah

finite perch
hidden flame
#

Hell yeah, I like how these tests are looking with some of the new test helpers.

timid stag
#

ooh, sophisticated.

hidden flame
#

I spent a bunch of time on rewriting tests to stop depending on the network. For my personal sanity, it needs to be easier to write tests that "do the right thing" (tm) so we don't regress.

#

@shy echo while you're here, do you have any opinions/notes on switching pip to use venv for its build isolation? I'm looking to pick up your old PR after in-process build dependencies (hopefully) land.

I did at-mention you on the PR, but I understand your GH notifications are a dumpsterfire

shy echo
#

It was 4am when I got that mention. 😅

#

It's disabled by default, no?

#

It might actually make sense to enable it on 3.15+ so that we at least have a 5 year deadline on that front and to make sure that it's a non breaking change for people.

Idk if we should change the default on older Python versions - that'll depend on what portion of package builds would be broken by the change and I don't have a sense of what that looks like nowadays.

#

A good check might be to see if anything in the top N packages breaks with that change?

#

Back when I was looking into this, we didn't have uv or even build fully stabilised -- the odds of things being broken in some weird way are definitely lower now.

#

goes back to preparing for family staying over the holidays

azure heron
#

I don't think anything in the top N packages should break with changing to venvs

#

We use real virtual environments in uv and have never had any problems

#

We've seen more weird problems with pip's approach 😄

finite perch
#

I think the main problem for pip is going to be the performance cost

shy echo
#

Wait, perf cost? How so?

finite perch
#

venv is slow

#

Like, really slow in some situations

shy echo
#

Huh, I thought all it did was dump a couple of files on disk. Is the stdlib venv doing more stuff still?

#

I'm very much out of the loop TBH, so I'd absolutely not be surprised if what I'm remembering is completely wrong/outdated.

finite perch
#

No idea, never investigated it, I just know python -m venv .venv can be noticeably slow, like sometimes a couple of seconds on Windows

azure heron
#

You could use our rust impl 😉

finite perch
#

Not unless it was integrated into the standard library, I'm quite supportive of the pure Python only stance

dapper laurel
#

Making venv without pip (and Setuptools in older versions) is pretty fast
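
A quick way to check that claim locally is to time the stdlib `venv.create` call with seeding disabled. `time_venv_creation` is a hypothetical helper and the numbers will vary a lot by platform; passing `with_pip=True` in its place also runs ensurepip and needs it to be available.

```python
import tempfile
import time
import venv
from pathlib import Path


def time_venv_creation(with_pip=False):
    """Create a throwaway venv and return how long creation took in seconds."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "env"
        start = time.perf_counter()
        venv.create(target, with_pip=with_pip)
        elapsed = time.perf_counter() - start
        # pyvenv.cfg is the marker file that makes the directory a venv.
        assert (target / "pyvenv.cfg").exists()
    return elapsed


print(f"without pip: {time_venv_creation(with_pip=False):.2f}s")
```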

finite perch
#

Oh I see, well, we wouldn't need either of those

dapper laurel
azure heron
#

We could, but I'm not sure to what end

hidden flame
#

Wait, removing PYTHONDONTWRITEBYTECODE=1 from our test isolation code is actually resulting in faster test times. Are we not precompiling pip's bytecode before running tests?

finite perch
#

I believe it

hidden flame
#

Oh wow, letting bytecode be written shaved a full minute from a full test suite run locally (3:45 -> 2:45) with only two failing tests.

#

We spawn a lot of pip subprocesses in our test suite, if Python has to generate bytecode every time, that would explain why our pip startup times are abysmal in CI.
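
A minimal check of how that environment variable reaches child interpreters, which is the mechanism at play when a test harness sets it for every subprocess. `child_writes_bytecode` is a hypothetical helper, not part of pip's test suite.

```python
import os
import subprocess
import sys


def child_writes_bytecode(dont_write=None):
    """Spawn a child interpreter and report whether it would write .pyc files."""
    env = dict(os.environ)
    env.pop("PYTHONDONTWRITEBYTECODE", None)  # start from a clean slate
    if dont_write is not None:
        env["PYTHONDONTWRITEBYTECODE"] = dont_write
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(not sys.dont_write_bytecode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


print(child_writes_bytecode())      # variable unset
print(child_writes_bytecode("1"))   # variable set, as in the isolation code
```

When the variable is set, every freshly spawned interpreter that finds no cached `.pyc` files has to recompile the modules it imports, which is consistent with the slow startup described above.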

hidden flame
#

Also PYTHONDONTWRITEBYTECODE=1 is set in the pip script runner, so this only takes effect after test setup (I think?)

hidden flame
#

Hmm, actually, having recorded filesystem events. I suspect that we aren't compiling the common wheels we're linking into every test environment. This is bad because coverage and pytest_subket run on Python startup.

#

It may also be beneficial to also not inject coverage unless we're actually collecting coverage.

finite perch
#

We are not collecting coverage, it's still configured to use setuptools as our build backend. I've mused about fixing it, but I don't have the energy to increase test coverage, so didn't seem worthwhile

timid stag
#

"real virtual environment" is an interesting turn of phrase

timid stag
#

(and I do mention it briefly in my pipx tricks post, but the fun part is why the seeding takes so long)
(and actually I can probably give more detail than I originally had if I think about how to collect more evidence)

pale epoch
# timid stag "real virtual environment" is an interesting turn of phrase

the pex project has been naming and solving these problems for years. when pants runs tests it generates a pex with all the deps and then a pex with all the sources and composes them via PEX_PATH. this retains eager bytecode compilation in the cached subtasks and is much more friendly to the OS filesystem cache

#

not at all picking on you; i am trying to say that i do very much want to see your blog post on this topic

#

one of the reasons i started contributing to pip so much was because i very specifically wanted to try to move more of our work outside of pants itself in order to benefit the general python community

timid stag
#

ah well that's part of why I deferred it

#

there is a lot of material about venv creation being slow, but the original article was about why pip is bootstrapped into venvs in the first place

#

so I also ended up reaching into topics like the shortcomings of --python

#

ultimately it felt like too many things to explain in one place; I want to have separate detailed pieces and then something that's a summary with references

pale epoch
#

and the specific goal i had in mind while hammering away at pip for so many years (especially through install --report) was to enable pex to provide extremely powerful and truly instantaneous venv creation

hidden flame
#

... well hopefully the shortcoming of --python will stop being a thing at some point

#

maybe in 2030 🙃

pale epoch
timid stag
#

heh

pale epoch
timid stag
#

but yeah these things are why I didn't throw in to contribute to pip, and sorry if I come across as trashing the project and not being helpful but the ideas I have just make more sense in a greenfield project

hidden flame
#

I have some plans to make some tangible progress on big ticket items next year but there's nothing firm and I can't promise anything.

timid stag
#

I do greatly appreciate that effort, especially since I recognize the legacy commitments impeding you

timid stag
hidden flame
#

It is so much easier to start from scratch and build new tooling than write and maintain for years.

#

pip is old.

pale epoch
timid stag
#

yes and I had different limitations of --python in mind

pale epoch
timid stag
#

I'm not actually very familiar with pants but the pex concept is pretty neat

pale epoch
#

pip is actually really good about PEP conformance!

pale epoch
#

but pex is extremely powerful

#

the maintainer john sirois is great to work with

#

one issue i have with uv is that unlike pip or especially pex its little venvs aren't intended to be something other tools can then consume

pale epoch
#

also they don't use the normal rust zip crate for some reason

hidden flame
pale epoch
# hidden flame I'm not surprised!

it took me like three weeks of intense effort but the result was immense. i think the result of package finding as retrieved from the simple repo API should actually be its own PEP file format

hidden flame
#

Uhh, I find it hard to immediately see how such a thing would be standardized?

timid stag
pale epoch
#

well my initial thinking was not pip but cargo, and particularly establishing a protocol like pants v1 had for build tasks, where the results of all kinds of build tasks can be audited and transferred across systems. we would frequently be able to provide the .pants.d/ directory as a shared volume to docker builds

#

we have a specific format like this for what pypi provides over HTTP, but that's not at all the same thing as how to transfer the result of parsing that json or html to another python packaging tool

#

#12257 and #12258 (linked in https://github.com/pypa/pip/issues/12921) demonstrate exactly how you could make a standardized protocol for that. it's a huge amount of repeated work

#

i also want pypi to let me limit entries to specific time ranges

pale epoch
#

it's just not engineered to serve that purpose is all

timid stag
#

because the tool is oriented towards managing everything itself? or because of something specific to what it puts in the environment

pale epoch
pale epoch
#

using a venv was proposed multiple times throughout the development of this feature https://github.com/pantsbuild/pants/pull/8793 but a venv is very specifically about the act of making an environment for a python interpreter imo

GitHub

Problem
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of differen...

#

it's a tool-to-user protocol rather than tool-to-tool

timid stag
#

(and also it works better for the initial state I want to set up, including a seeded cache)

pale epoch
#

this PR is unfortunately my first attempt at describing package finding, then filtering, then resolving as distinct phases of the pip invocation https://github.com/pypa/pip/pull/12258 my current pip fork does more than just tricks to achieve caching of pypi simple api responses like here, but completely separates the implementation of package finding from anything specific to the current invocation

#

cc @hidden flame this is how i devised a long-term fix to the scourge of the --python

timid stag
#

... is your package finding separate to the extent that you could make a separate library wheel for it, because I could probably use that

pale epoch
#

yes!!

#

i also made a separate fork of the packaging crate because it has absolutely no distinction between parsed and unparsed objects like Version or SpecifierSet at the type level so you have to check for errors that were already validated earlier and in particular checking if a Version satisfies a SpecifierSet is fucking cubic due to pathological (not like an edge case, it's just that the impl is pathological) round trip parsing performed upon every single inner comparison

pale epoch
timid stag
#

... simple API responses are large? even for a specific package?

pale epoch
#

the simple API is super overloaded and offers absolutely no form of filtering

#

the unavoidable act of json parsing itself is a bottleneck especially until free threading is complete

#

obv you can multiprocess but at this point the ability to tightly control bits and bytes retrieved from a streaming network response makes a dedicated tool in rust or c++ more viable

#

rust has bootstrapping issues which i'm hoping to work with a rustc team member to radically improve in the next several months

#

it turns out parsing is a bottleneck in some other places too. my pip fork achieves most parsing speedup by making it a hard runtime error whenever a string is received when a parsed representation was expected
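
A minimal sketch of the "hard runtime error on unparsed input" idea, using only the standard library. `ParsedVersion` and `satisfies_minimum` are hypothetical names for illustration; they are not taken from pip, from the fork mentioned above, or from the `packaging` library, and the version parsing is deliberately simplistic.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ParsedVersion:
    """Hypothetical stand-in for an already-validated version."""

    release: tuple[int, ...]

    @classmethod
    def parse(cls, text: str) -> "ParsedVersion":
        # Parsing and validation happen here, exactly once.
        return cls(tuple(int(part) for part in text.split(".")))


def satisfies_minimum(version: ParsedVersion, minimum: ParsedVersion) -> bool:
    # Refuse raw strings instead of silently re-parsing them on every call.
    for value in (version, minimum):
        if not isinstance(value, ParsedVersion):
            raise TypeError(f"expected ParsedVersion, got {type(value).__name__}")
    return version >= minimum
```

Callers parse at the boundary (`ParsedVersion.parse("1.10")`) and pass the object around; handing in a plain `"1.10"` raises `TypeError` rather than triggering a hidden re-parse in a hot loop.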

#

but url quoting and unquoting in cpython relies upon a pretty grotesque uncommented hack that works because it triggers a memchr() call

#

i have a cpython dev branch that tests for sse/avx, then for avx2 support on the host, and would provide very basic methods for searching a single literal byte sequence or finding match positions for a set of bytes across a string or a stream

timid stag
#

... hold on, we're talking about optimizing things like str.__contains__ behind the scenes now? o_O

pale epoch
#

it turns out btw that cpython reimplements this pattern (find specific byte(s) to quote or remove within a buffer) several times across the repo for encoding and decoding tasks
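
The "find any byte from a set, then quote it" pattern described above can be sketched in pure Python. This is only an illustration of the shape of the operation, not CPython's implementation and not the dev branch mentioned earlier; `find_any` and `percent_quote` are hypothetical helper names.

```python
def find_any(buf: bytes, needles: bytes, start: int = 0) -> int:
    """Return the index of the first byte of `buf` that occurs in `needles`, or -1."""
    best = -1
    for needle in needles:
        # Once a hit is known, only look for earlier hits before it.
        hit = buf.find(needle, start) if best == -1 else buf.find(needle, start, best)
        if hit != -1:
            best = hit
    return best


def percent_quote(buf: bytes, unsafe: bytes) -> bytes:
    """Percent-encode every byte of `buf` that appears in `unsafe`."""
    out = bytearray()
    pos = 0
    while True:
        hit = find_any(buf, unsafe, pos)
        if hit == -1:
            out += buf[pos:]
            return bytes(out)
        out += buf[pos:hit]          # copy the clean run unchanged
        out += b"%%%02X" % buf[hit]  # escape the single offending byte
        pos = hit + 1
```

Each `bytes.find` call is a single-byte search, which is the kind of primitive a vectorized "match any of these bytes" routine would replace.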

timid stag
#

(oh, str.replace then?)

pale epoch
pale epoch
#

the cpython re interface for the _sre C string search impl is very effective at being robust over decades of python development. in this way it is deeply similar to the emacs regex engine

timid stag
#

interesting

pale epoch
#

but it absolutely does not attempt to optimize too hard

timid stag
#

(I've long wanted a python-level way to manipulate the underlying DFAs or whatever beyond regex syntax)

pale epoch
#

emacs has rx to do that with sexps. i think it's outrageous that regex engines force me to serialize the AST by hand before processing it

#

this is who i rly rly wanna do a phd under https://jamiejennings.com/posts/2021-09-23-dont-look-back-2/

A \10 in a regex is sometimes a back reference to capture group 10, but not necessarily. It could be an octal literal, depending on if there are 10 or more capture groups.

#

the result of this work would be less about the SIMD instructions and more about an interface for composable search/match/replace operations instead of needing to create bespoke C impls for every new type of string parsing

#

but for package finding itself, i think there is a really strong case for protocolizing that in a json format

#

https://docs.astral.sh/uv/reference/internals/resolver/

astral has this page (their only "internals" doc lol):

finding a set of version to install from a given set of requirements, is equivalent to the SAT problem and thereby NP-complete

this is extremely widely believed in the industry but is actually false. you can transform it to an instance of a SAT problem, just like every other problem in NP, but the converse reduction is extremely tricky. there's also a closed-world assumption being made that is not actually a feature of the problem itself but of the aforementioned transformation into boolean SAT
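
The forward direction (turning a resolve into a boolean SAT instance) can be sketched with a toy index and a brute-force solver. Everything here is made up for illustration: the `INDEX` contents, the clause representation, and the function names. Note how a dependency on a version that is not in the index simply becomes unsatisfiable, which is the closed-world assumption the encoding brings with it.

```python
from itertools import product

# Hypothetical index: package -> version -> {dependency: allowed versions}
INDEX = {
    "app": {1: {"lib": {1, 2}}},
    "lib": {1: {}, 2: {"base": {2}}},
    "base": {1: {}},
}


def clauses(index, root):
    """Encode the resolve as CNF; a literal is ((package, version), polarity)."""
    cnf = [[((root, v), True) for v in index[root]]]  # root must be installed
    for pkg, versions in index.items():
        ordered = sorted(versions)
        # At most one version of each package may be selected.
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                cnf.append([((pkg, a), False), ((pkg, b), False)])
        # Selecting pkg==v implies selecting some allowed, known version of each dep.
        for v, deps in versions.items():
            for dep, allowed in deps.items():
                known = [d for d in index.get(dep, {}) if d in allowed]
                cnf.append([((pkg, v), False)] + [((dep, d), True) for d in known])
    return cnf


def solve(index, root):
    """Brute-force every assignment; return the first satisfying selection."""
    variables = [(p, v) for p, vs in index.items() for v in vs]
    cnf = clauses(index, root)
    for bits in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, bits))
        if all(any(model[var] == pol for var, pol in clause) for clause in cnf):
            return {p: v for (p, v), on in model.items() if on}
    return None
```

With this index, `lib==2` needs `base==2`, which the index does not list, so the only solution picks `lib==1`.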

#

this part is good to hear someone else say though

The slowest part of resolution in uv is loading package and version metadata, even if it's cached.

(someday i will shame astral into crediting me for all their caching techniques which charlie marsh shamelessly copy/pasted from the very detailed explanations and implementations i provided to pip over the course of several years, after not hiring me then ghosting me when i tried to keep in touch)

For most resolutions, the resolver doesn't need to backtrack, picking versions iteratively is sufficient. If there are version preferences from a previous resolution, barely any work needs to be done.

making use of cached results like this means you risk choosing different versions nondeterministically depending upon the state of the local cache, which is radioactively dangerous in every way

that's why protocolizing intermediate phases of resolution is important, so you can achieve strong guarantees (e.g. that no new versions of any relevant package have been published since the last resolve) which enables very powerful inferences like this
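
One way to picture the guarantee: only reuse previous pins when the candidate set they were computed from is provably unchanged. This is a rough sketch under assumed data shapes; the `previous` record layout and both function names are hypothetical and not part of pip or uv.

```python
import hashlib
import json


def snapshot_digest(candidates: dict[str, list[str]]) -> str:
    """Digest of every known version of every relevant package."""
    canonical = json.dumps(
        {name: sorted(versions) for name, versions in candidates.items()},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def reuse_previous(previous: dict, candidates: dict[str, list[str]]):
    """Return the earlier pins only if the inputs match; otherwise None."""
    if previous.get("digest") == snapshot_digest(candidates):
        return previous["pins"]
    return None  # something was published or removed: resolve from scratch
```

The point of the digest is that reuse depends on the inputs to the resolve, not on whatever happens to be in a local cache.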

#

some reasons we currently run into issues with simple repository API performance:
(1) the API https://packaging.python.org/en/latest/specifications/simple-repository-api goes into great depth on Accept header negotiation, but there is currently no requirement to respect any form of Cache-Control. fastly currently does this really cute thing where they won't return a simple 304 if it was in fact cached but had to bounce across more than one server for reasons which employ load-bearing internal jargon

#

(2) no form of filtering is supported at all. pip's package finding (except in my fork) does client-side filtering by matching python while streaming the results because there are quite a lot of them. this is a significant bandwidth cost that could be improved in particular if we could support time range queries (or even just one-sided ranges like "everything since this utc timestamp"). pypi currently unofficially supports Changed-Since with Cache-Control, but that only produces the boolean 304 (short) vs 200 (long) cases
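
For comparison, here is what a one-sided time filter looks like when done client-side, assuming a JSON simple API response whose file entries carry the `upload-time` field (an ISO 8601 UTC timestamp). The function name is hypothetical and the whole response still has to be downloaded first, which is exactly the bandwidth cost a server-side range query would avoid.

```python
from datetime import datetime, timezone


def files_since(response: dict, since: datetime) -> list[str]:
    """Return filenames uploaded at or after `since` (client-side filter)."""
    kept = []
    for entry in response["files"]:
        # Normalize the trailing "Z" so older Pythons can parse the timestamp.
        uploaded = datetime.fromisoformat(entry["upload-time"].replace("Z", "+00:00"))
        if uploaded >= since:
            kept.append(entry["filename"])
    return kept


# Made-up sample data, not a real index response.
SAMPLE = {
    "name": "example",
    "files": [
        {"filename": "example-1.0.tar.gz", "upload-time": "2024-01-15T12:00:00.000000Z"},
        {"filename": "example-1.1.tar.gz", "upload-time": "2024-09-01T08:30:00.000000Z"},
    ],
}
```

`files_since(SAMPLE, datetime(2024, 6, 1, tzinfo=timezone.utc))` keeps only the second file.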

#

i suspect this is something astral supports in their proprietary repository offering

pale epoch
#

i don't think filtering on python version or impl is anywhere near as much of a slam dunk, especially because
(a) it would be much harder for pypi to implement, and they already took almost two years to support PEP 658 after it was accepted. that PEP was specifically intended to address FUD raised about the http range request technique i invented, which relies upon HTTP standards that pypi also suddenly dropped support for and explicitly refused to engage with me about.
(b) it's not a simple database query that corresponds to a canonical totally-ordered timeline so much more work would be necessary to agree on how to serialize a general query over the space of python compatibility.

in general, i want pypi to declare the specific set of HTTP standards it implements; there is NO!!! specification for what pypi has to support or what it requires for HTTP requests of e.g. specific wheels and that's just very unserious.

pip implements a workaround for the lack of a separate metadata API => pypi breaks negative byte range requests and everyone i ping says they can't comment on that and furthermore indicates that pypi can't be expected to support any particular http standard
a pip maintainer takes this at face value and makes a PEP exposing a separate endpoint for metadata => pypi drags their feet and again no one can comment and i am told this is because they have no engineering headcount for things like implementing accepted PEPs we specifically created to circumvent pypi's refusal to entertain any discussion about supporting standard HTTP features
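
For readers unfamiliar with the negative byte range technique: a suffix range such as `Range: bytes=-N` asks the server for only the last N bytes of a file, which for a zip is where the end of central directory record lives. A minimal sketch follows, with an in-memory archive standing in for a remote wheel; the helper names are hypothetical, no network request is made, and zip64 and archive comments are ignored.

```python
import io
import struct
import zipfile


def range_header_for_tail(n: int) -> dict[str, str]:
    """HTTP header requesting only the last `n` bytes of a resource."""
    return {"Range": f"bytes=-{n}"}


def parse_eocd(tail: bytes) -> tuple[int, int, int]:
    """Return (entry_count, central_directory_size, central_directory_offset)."""
    pos = tail.rfind(b"PK\x05\x06")  # end of central directory signature
    if pos == -1:
        raise ValueError("end of central directory record not found in tail")
    fields = struct.unpack_from("<4s4H2LH", tail, pos)
    _sig, _disk, _cd_disk, _entries_here, total, cd_size, cd_offset, _comment = fields
    return total, cd_size, cd_offset


def demo_archive() -> bytes:
    """Build a tiny archive in memory to stand in for a remote wheel."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pkg/__init__.py", "")
        zf.writestr("pkg-1.0.dist-info/METADATA", "Name: pkg\nVersion: 1.0\n")
    return buf.getvalue()
```

Given the offset and size from `parse_eocd`, a second ranged request can fetch just the central directory, and a third just the metadata file, without downloading the whole archive.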

pale epoch
#

but pypi does have a fully funded engineering position posting pdfs like this https://alpha-omega.dev/wp-content/uploads/sites/22/2025/10/ao_wp_102725a.pdf which specifically describes the existence of nonconformant implementations of the zip spec as if astral intentionally choosing to disregard the zip spec is anyone else's problem but astral's

ZIP even supports deleting files within an archive by rewriting the Central Directory to remove the reference to a Local File.

no it fucking doesn't!!! it absolutely fucking doesn't!!! the actual spec https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

4.3.2 Each file placed into a ZIP file MUST be preceded by a "local file header" record for that file. Each "local file header" MUST be accompanied by a corresponding "central directory header" record within the central directory section of the ZIP file.

the CVE assigned to uv was specifically regarding uv failing to correctly implement the zip file specification. the wheel format is explicitly specified to require conformance to the zip file specification. this would have been immediately caught by fuzzing, but uv uses a mysterious rust crate that doesn't fuzz anything at all, and not the rust crate named zip, which i have contributed a great degree of performance and security improvements to and is by far the most trustworthy implementation.

The implementation with the fewest differentials was “ronomon/zip,” which is implemented explicitly to reject many sources of differentials at the expense of compatibility with all ZIP archives.

the zip crate also rejects "differentials" which are explicitly forbidden by the ZIP spec. it does not lose compatibility with any ZIP archives, because these differentials arise from individual implementations that don't employ industry-standard protections like compiler-enforced memory safety and several forms of randomized testing i.e. fuzzing.

pale epoch
#

Packaging ecosystems like Python's often have many tools with different organizational structures and roadmaps, and therefore cannot easily coordinate on ecosystem-wide challenges like archive format features.

translation:

  • pypi stonewalls any attempt from pip developers to establish a point of contact to discuss features we need
  • pypi will suddenly drop support for http standards like negative byte ranges which are actively used in pip to achieve its own performance goals, and continues to stonewall an increasingly public sequence of attempts to establish a point of contact regarding pip's expectations about archive formats
  • a PEP which was explicitly designed to circumvent reliance upon specifics of any archive format mysteriously takes years to go live, with no corresponding announcement or indeed any explicit guarantee of support for the accepted PEP

This is especially true when the issue arises from an ambiguity in a packaging standard, which can take months to fix and often requires public correspondence, meaning users are left to fend for themselves from exploits while standards are fixed.

there is absolutely zero ambiguity in the packaging standards. there was also zero ambiguity in PEP 658, but pypi did indeed force users to fend for themselves from exploits during that time.

Applying protections at the public package repository level would mean that would-be attackers can't exploit public information about a vulnerability at scale;

hence PEP 658, which circumvents this issue entirely yet pypi still doesn't actually guarantee support for and actively ignores the existence of

pale epoch
#

However, applying protections like rejecting archives from a large public package repository like PyPI is not without challenge.

rejecting uploads is the only solution pypi will accept, and they demonstrate active contempt for the entire PEP process.

This was done after noticing that, despite the wheel package specification requiring installers to check the contents of the archive with the filenames listed in “RECORD”, no popular installer was taking this step.

that's because this is actively false in two separate ways: https://packaging.python.org/en/latest/specifications/binary-distribution-format/

Although a specialized installer is recommended, a wheel file may be installed by simply unpacking into site-packages with the standard ‘unzip’ tool

so this has now escalated beyond simply ignoring the existence of accepted PEPs like 658, but to unambiguously fabricate "requirements" and accuse installers of nonconformance. but there's more:

Update distribution-1.0.dist-info/RECORD with the installed paths.

so apparently, correctly installing a wheel file actually requires the installer to completely ignore the contents of RECORD and accept whatever the zip file contains

#

seth larson produced a document with multiple outright falsehoods regarding spec conformance and ignored the PEP which expressly avoids this issue that pypi still refuses to guarantee support for. i don't understand what the point of the PEP process is or why pypi ignores anything pip needs but will shift into crisis mode to uncritically repeat anything william woodruff says while ignoring the code i implemented in pip and rust zip to solve these precise problems.

pale epoch
#

anyway, after seeing pypi stonewall any attempt to codify support for http standards, suddenly break support for negative range requests that pip used to avoid any archive confusion, then slow-walk PEP 658, then claim that nonconforming zip implementations which don't even do fuzz testing make zip impossible to secure, then repeatedly state unambiguous falsehoods about the requirements for spec conformance in an official security update, asking pypi to support something like a time range seems like a waste of my time

#

the other interesting thing about supporting time interval queries is that it allows for checksum-based verification (e.g. via merkle tree) that a simple repository API hasn't retroactively modified its response at all. this is also something that a protocol for parsed package versions from a simple API response would achieve, enabling users to perform complex queries with entirely local information. this is what we would need in order to do the approach astral's resolver documentation describes, which reuses a previous output
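
A simplified illustration of the verification idea, using a plain hash chain rather than a full merkle tree: a client records a digest over the entries it has seen, and later checks that the index's current listing still begins with exactly those entries. The entry shape `(filename, upload_time)` and the function names are assumptions made for this sketch.

```python
import hashlib


def chain_digest(entries: list[tuple[str, str]]) -> str:
    """Fold (filename, upload_time) entries into one running SHA-256 digest."""
    digest = hashlib.sha256(b"").digest()
    for filename, upload_time in entries:
        record = f"{filename}\n{upload_time}".encode()
        digest = hashlib.sha256(digest + record).digest()
    return digest.hex()


def is_append_only(recorded_digest: str, recorded_len: int,
                   current: list[tuple[str, str]]) -> bool:
    """True if `current` starts with exactly the entries that were recorded."""
    if len(current) < recorded_len:
        return False  # entries were removed
    return chain_digest(current[:recorded_len]) == recorded_digest
```

If uploads form an append-only log, a client holding `(digest, length)` from a previous resolve can confirm nothing was rewritten and then only needs to look at the entries after `length`.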

pale epoch
#

i was really disappointed about the response to the telemetry too. making telemetry a json format and then making it a single unambiguous override in pip (which also makes opt-out a special case of an empty override) is clearly better for data quality and unifies that data across alternate clients such as uv. the open-source user-agent parsing code is described as poorly maintained and has a very complex optimization system. i cannot identify any engineering-based rationale against standardizing this information, and particularly in the lack of any opt-out.

in https://discuss.python.org/t/pre-pep-user-agent-schema-for-http-requests-against-remote-package-indices/104006 i described a very specific scenario i have performed and which occurs in every corporate package index: downloading a copy of some package in order to move it across a security boundary into our trusted environment. this is where i saw a full opt-out being irreplaceable. this was simply ignored

pale epoch
#

i have wasted multiple years of my life repeatedly getting to final-stage interviews and then getting rejected. during and after my unfortunately brief tenure at LLNL, i realized that everything i'd worked on at twitter could be generalized to make python packaging awesome. i wasted so much time assuming good faith, only to find people actively misrepresent my ideas without my consent:

  • fast-deps assigned to a GSoC student and shipped to prod without consulting me, then smeared as "broken", while my implementation is simply never accepted
  • astral claims to have invented http range requests on zip files after rejecting me. no one corrects them or cares that charlie marsh's VC investment is predicated upon a misrepresentation
  • pep 751 just ignores pip install --report

it's been such an actively contemptuous waste of my time and energy to contribute to a community who will concern troll me, stonewall any attempt to escalate, then just take credit for what i invented

#

i respect william woodruff's pride in his work and his patience with the PEP process. seth larson just writes easily-disproved falsehoods in official documents and says out loud that having public discussions is problematic. that's worthy of a monty python skit

#

i hope my experience may be useful to others. thanks tzu-ping chung and daniel holth for your work and for your reviews and i hope you can achieve more here but i think pip and pypi and the PEP process are deeply compromised.

#

the credit stealing hurt the most

finite perch
# pale epoch i was really disappointed about the response to the telemetry too. making teleme...

I can only speak for myself, but personally I found this proposal to be tackling too much at once.

I would love to discuss a dedicated proposal for a telemetry spec, but how a tool overrides telemetry data I think should be left up to tools and not specified in that specification.

I think this makes there be way more points to discuss and cross discussing them multiplies the number of things that can be discussed, as now not only do we have to agree on a good spec proposal but also all tools have to agree to the prescribed behavior.

Maybe I get too overwhelmed too quickly, but I largely sat out that discussions because of this.

azure heron
#

The personal attacks here seem quite inappropriate.

cosmic pebble
hidden flame
#

I've locked this channel for the time being since the topic of conversation has veered heavily into off-topic category.

finite perch
#

Threads?

hidden flame
#

I've pre-written most of the 26.0 release post. This way, I have a post ready to go for pip 26.0 even though I am going to have basically no free time near the release date.

hidden flame
#

Hmm, 6% of our total test runtime still consists of one massively parametrized keyring test. There is some caching/environment reusing that could be added, but I haven't been able to get it working yet.

finite perch
#

I'm flying back home now, eager to do a few bigger reviews and get some PRs out the door later this week, assuming my jet lag gives me more time to work on stuff, not less 🙃

hidden flame
#

Hi, FYI, I did unlock this channel a few days ago. Feel free to go back to spending too much of your free time on managing digital packages 🙃

willow flicker
#

Roughly when would packaging 26.0 need to be released to be included in pip 26.0? I forget when in January pip is aiming for.

finite perch
#

If I'm doing the release I will aim to release on the 30th Jan, but please don't rush to get things out for pip, we'll just get it early in the next release cycle

willow flicker
#

I'd like to get an RC out in a day or two, and full release in a week or so, but it could always end up delayed by standardization questions (or if people hit issues with the RC, hoping it will get some exposure)

You've been testing pip with current main, I think?

finite perch
willow flicker
#

Nice, thanks!

finite perch
#

I've removed all "good first issue" labels, there's been a recent pattern of users posting almost identical questions, I think they are doing it as part of some course?

I have no problem guiding students how to submit a PR, but all the actual issues with that label have some nuanced questions that need answering before submitting a PR.

hidden flame
#

Agreed.

hidden flame
finite perch
pale epoch
#

would be so cool if pypi supported a timespan (or rolling hash) range for querying package links. that's the only part i can't optimize away. no clue how likely it would be to get accepted since it took several years for PEP 658 support and nobody could be reached about it. package uploads should be architected as an explicitly append-only log and it would be super cool to work with someone on that

finite perch
#

You'd need to discuss that in #pypi or on the warehouse repo. I believe there are many ways to query their data outside Python packaging standards. Certainly lots of corporations just mirror the whole thing and query it locally. But I've never spent any time looking at PyPI implementation details.

pale epoch
#

thanks!

#

i'm sorry for the tone in the previous message. will work on that

hidden flame
#

I'mma try to get some reviews done either tonight or tomorrow 👍

finite perch
#

I think I've found a bunch of subtle bugs to do with yanking, that may require me to make a DPO post and ask for a spec clarification 😭

hidden flame
#

The inconsistencies and gaps are limitless!

finite perch
#

I think I've pinned down all inconsistencies to do with versions and specifiers.

The big source of inconsistencies now are to do with versions/releases vs. distributions.

Yanking is defined on the distribution level, but it has behavior defined on the version/release level.

I need to write some tests, but I'm pretty sure no package tool fully respects the ability to yank a single distribution from a release, but leave others unyanked. And PyPI has no functionality to do this anyway.

#

But I need to spend some time looking at pip's implementation, it might be possible to make it spec compliant without too much complexity.

#

I'm going to keep a close eye on this distribution definition vs. release behaviour for future packaging PEPs

ember shuttle
finite perch
#

Actually, I'm just reading PEP 592 for the first time in a long time, and I have to say I don't think this went through the same level of scrutiny as a packaging PEP would receive today. This is the first paragraph under the "installers" section, and I do not understand the intent of this paragraph:

The desirable experience for users is that once a file is yanked, when a human being is currently trying to directly install a yanked file, that it fails as if that file had been deleted. However, when a human did that a while ago, and now a computer is just continuing to mechanically follow the original order to install the now yanked file, then it acts as if it had not been yanked.

#

As an installer, like pip, how am I supposed to know if a "human being" or a "computer is just continuing to mechanically follow the original order" is calling me?

#

Annoyingly the PEP repeatedly refers to the "yanked release" everywhere except the specification

#

This is definitely going to need a DPO thread

ember shuttle
#

Is there not a presupposition of a lock file? I think that's the intention: if you've "locked" to a yanked release, you'll still get it, but if you're trying to resolve a set of constraints, a yanked release shouldn't be used in that equation

finite perch
#

That makes sense

ember shuttle
finite perch
#

Not from a specification point of view, but good to know there are pypi docs on it

azure heron
#

I don't think any index supports that?

finite perch
# azure heron I don't think we should support yanking at the distribution level

I agree, but in the specification part of the PEP it just says that the file can be yanked

And while the PEP refers to yanked releases, it never makes an attempt to define how that relates to a file being yanked.

So the thread would be to confirm that the expectation is releases are yanked, and the specification is how that's represented in the simple API, so if an installer sees any file yanked it can and should assume the whole release is yanked.

Otherwise the whole section on what installers should do doesn't make any sense.

azure heron
#

I'm not even sure what would happen if something was partially yanked in uv

thorny crypt
#

In Poetry, partial yanks are supported and a release is considered as yanked only if all files of it are yanked. (That's how I interpreted the spec.)

finite perch
#

Yes, I would say it's interpretable that way, but I don't think that was the intent

finite perch
#

Okay, I've re-re-read the PEP, I think what Poetry has implemented is correct. Specifically because of the lines:

In Warehouse, the user experience will be implemented in terms of yanking or unyanking an entire release, rather than as an operation on individual files, which will then be exposed via the API as individual files being yanked.

Other repository implementations may choose to expose this capability in a different way, or not expose it at all.
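
The two readings discussed above differ only in how per-file flags roll up to a release. A small sketch, assuming file entries shaped like the JSON simple API where `yanked` is either a boolean or a non-empty reason string; `release_is_yanked` and its `policy` parameter are hypothetical and not taken from pip, Poetry, or uv.

```python
def release_is_yanked(files: list[dict], policy: str = "all") -> bool:
    """Roll per-file yank flags up to the release level.

    policy="all": yanked only if every file is yanked (the Poetry reading).
    policy="any": yanked as soon as one file is yanked.
    """
    # A missing key means "not yanked"; a reason string counts as yanked.
    flags = [bool(entry.get("yanked", False)) for entry in files]
    if policy == "all":
        return bool(flags) and all(flags)
    if policy == "any":
        return any(flags)
    raise ValueError(f"unknown policy: {policy!r}")
```

For a release with one yanked wheel and one untouched sdist, `policy="all"` reports the release as available while `policy="any"` reports it as yanked, which is exactly where tools can disagree.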

hidden flame
#

Huh!

finite perch
#

Going to make a pip issue to track supporting yanking files vs. yanking releases

finite perch
hidden flame
#

I will try to squeeze in a review of the --uploaded-prior-to PR this week, but most likely it will have to wait until this weekend.

finite perch
finite perch
#

I've learnt a lot about CI best practises in the past week or so for packaging improvements, once pip 26.0 is out the door I'm going to look to make some improvements

chrome epoch
finite perch
#

I'm going to start curtailing discussions on PRs that aren't about the implementation, having lengthy discussions about if the goal of the PR is valid makes it difficult to review the code, I'm going to start pushing those discussions into issues (or even DPO threads where that's appropriate). I'm very guilty of this myself, so I'm not throwing any blame or shade to particular commenters, but I want to reduce the barriers to PR review to a minimum.

fallen scroll
#

Is pip going to eventually drop support for uninstalling eggs?

jovial jasper
#

Why do you ask?

fallen scroll
#

Mostly curiosity. I was examining the uninstall code and was surprised to see how complex it was.

finite perch
#

I don't think uv supports any of this non standard stuff, and no one appears to be demanding they do, so that's been a good indicator that pip can probably deprecate and remove it without too much trouble

cosmic pebble
#

I haven't seen any load-bearing eggs in projects in quite a while

dapper laurel
#

I think only old projects use eggs, and pypi stopped accepting the uploads some time ago

ripe shoal
#

I think the last time I've seen eggs is on projects stuck on a very old setuptools doing development builds. But that probably doesn't work with modern pip

azure heron
finite perch
#

Interesting

azure heron
#

We've had some problems with it though

jovial jasper
#

Legacy editables (via easy_install.pth and .egg-link files) have only been removed from pip in the last release. So we may want to keep supporting uninstalling those for a little while.

hidden flame
#

In general, yes, I would be open to removing support for the ancient distutils egg installation format, but it's also not really a huge amount of tech debt.

#

So, at some point, someone may decide to put in the time to clean things up, but it's not exactly a pressing concern.

hidden flame
#

@finite perch in terms of what's left from me for pip 26.0, I would like to get my egg fragment removal PR in and possibly a small follow up PR for inprocess build deps. If I get around to the latter, I'll likely merge it without review since the feature is experimental anyway.

#

My week is pretty busy so I'm not sure when I will have time, but I'll slot it in somewhere (my to-do list is a living creature and evolves day to day).

finite perch
#

@hidden flame noted, there's a couple of comments on the egg removal that need addressing, please don't merge anything immediately, I'll probably have time to quickly look over a PR

finite perch
#

CI is failing, taking a look