#pip
Sorry that sort of sounds like I'm trolling, but we do call into Python
But the workers are all managed in Rust
Yeah, pip has zero parallelization at the moment.
The workers run on threads
Lucky that Rust doesn't have a GIL.
I'm curious, how do you deal with backend calls? They're pretty slow, but there's nothing you can really do. You have to call into Python and pay the startup penalty for every hook.
It's slow!
Do you skip any hooks? For example, pip will call the prepare_metadata hook even if it's 100% going to build a wheel for installation anyway.
(don't get me started on the get_requires_for_build_X hooks, which are of limited use for most packages IMO)
Part of me wonders if we could skip the get_requires_for_build_wheel hook, but that'd require an extension to the PEP 517 interface so packages/backends could declare that they don't have any dynamic backend dependencies.
I did try this https://github.com/astral-sh/uv/pull/895
And I'm not writing a PEP, lol.
The goal here is to improve performance by avoiding the 10ms overhead of invoking a Python process on each hook call.
Man, and pip itself takes like 150ms to start (on my fast SSD-equipped machine) and pip calls itself in a subprocess to install build dependencies.
(it's on our radar to stop doing that, but that will require a lot of refactoring.)
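For a rough sense of that per-call cost on a given machine, here is a sketch that times a fresh interpreter running a trivial snippet. The helper name is hypothetical, `json` is only a stand-in for a heavier import such as pip, and the numbers vary a lot with hardware, Python build and disk cache.

```python
import statistics
import subprocess
import sys
import time


def median_startup_ms(code: str = "pass", runs: int = 5) -> float:
    """Median wall-clock time, in ms, to start a fresh interpreter and run `code`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Every run pays full interpreter startup, much like one backend hook call.
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


if __name__ == "__main__":
    print(f"bare interpreter:   {median_startup_ms():.1f} ms")
    # An import on top of startup adds its own cost (json is just a stand-in).
    print(f"with 'import json': {median_startup_ms('import json'):.1f} ms")
```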
we had the same thought, i'm interested in adding something for that
Ping on vendoring resolvelib 1.1.0: https://github.com/pypa/pip/pull/13001 I would like to get merged asap to work on other resolution issues.
Thinking about it more, I have no idea how we'd pass this state from the backend to the frontend. It'd be most beneficial for source-tree builds, but we basically mandate nothing for those trees (only that they contain a pyproject.toml).
Making the packages themselves declare that they don't have any additional dynamic dependencies in pyproject.toml would sidestep this problem, but it would be of limited benefit as it'd be opt-in.
@finite perch it does seem like the multiprocessing is using too many jobs which hurts smaller packages. I technically have 16 CPUs, but in practice, I have never seen a meaningful performance improvement over 6 jobs in most applications.
Once we include all of pip's code, i.e. add the vendored libraries, then the performance degradation isn't so severe, but it's clear that using os.cpu_count() probably isn't a good idea.
Interestingly, using multiprocessing.Pool results in less overhead (over concurrent.futures.ProcessPoolExecutor) ...not sure what's up with that
that does not work because the backend dependencies aren't controlled by the project
Yeah, it'd need an amendment to the interface, which obviously renders this idea moot.
yeah, I don't see how you can get any overhead out from the current interface though
this is essentially constrained by the design
which itself makes sense, and I think needed to have this two-step dependency discovery approach
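For readers unfamiliar with the two-step discovery: a frontend first installs the static `build-system.requires` list, then asks the backend for any dynamic extras through the optional `get_requires_for_build_wheel` hook (PEP 517 treats a missing hook as returning `[]`). A minimal in-process sketch, assuming a backend object that has already been imported; real frontends such as pip run each hook in a fresh subprocess, which is where the per-hook startup cost comes from. The function name is hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


def discover_build_requires(
    static_requires: List[str],
    backend: Any,
    config_settings: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Steps 1 and 2 of PEP 517 build-dependency discovery (in-process sketch)."""
    requires = list(static_requires)  # Step 1: [build-system] requires
    hook = getattr(backend, "get_requires_for_build_wheel", None)
    # Step 2: the hook is optional; a missing hook behaves as if it returned [].
    # It can only run once the step-1 requirements are importable, which is
    # why the steps cannot be merged under the current interface.
    if hook is not None:
        requires.extend(hook(config_settings))
    return requires


if __name__ == "__main__":
    # A fake backend without the optional hook, and one that asks for "wheel".
    print(discover_build_requires(["setuptools>=61"], SimpleNamespace()))
    dynamic = SimpleNamespace(get_requires_for_build_wheel=lambda cs: ["wheel"])
    print(discover_build_requires(["setuptools>=61"], dynamic))
```

Skipping step 2 safely would require the backend to promise up front that it adds nothing, which the current hook interface has no way to express.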
@finite perch I have a prototype for my parallelized bytecode compilation proposal. Would be curious how it performs on your original example: https://github.com/pypa/pip/issues/12712#issuecomment-2657552309
Thanks, I'll give it a test this week
It's still very far from being production ready, but it'd be good to know whether the idea is sound. If so, I'll tweak/polish it.
does that take into consideration the possibility of using an interpreter different from sys.executable? Or is that not what the --python flag is about?
It's a prototype
Also, IIRC, that flag works by zipping up pip and provisioning it in the new python temporarily?
oh, ok
https://github.com/pypa/pip/blob/main/src%2Fpip%2F_internal%2Fcli%2Fmain_parser.py#L81-L86 okay, it doesn't seem to use a zip app, but it does re-invoke pip
tbh I was asking cause we have the case of "compile for another interpreter" issue in #installer and was hoping someone has figured out how to do it properly (especially from the testing side)
yeah, maybe, but easier is not always the right answer
(Compilation for a foreign interpreter, that is, not installing to a foreign interpreter. That's probably useful, and still hard)
Ugh, the main problem with the parallelized compilation is that multiprocessing.spawn is expensive as hell, which means small installs pay like 200ms flat when serial compilation would take much less than that.
There isn't really anything I can do about that though unless I refactor a bunch of installation code to measure how much code is being compiled and switch between serial/parallel compilation based off that (which is going to be a best-effort kind of thing for sure).
*well I guess I could add a pre-install step to inspect the zip metadata to do the necessary measurements, but I'm not sure if that's feasible or practical.
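The pre-install measurement looks feasible with the standard library alone: a wheel's central directory records each member's uncompressed size, so nothing has to be decompressed to total up the `.py` code. A minimal sketch (the helper name is hypothetical, and this is not what pip does today):

```python
import io
import zipfile
from typing import IO, Union


def total_py_bytes(wheel: Union[str, IO[bytes]]) -> int:
    """Sum the uncompressed sizes of the .py members of a wheel (a zip file).

    Only the central directory is read; no member is decompressed.
    """
    with zipfile.ZipFile(wheel) as zf:
        return sum(
            info.file_size
            for info in zf.infolist()
            if not info.is_dir() and info.filename.endswith(".py")
        )


if __name__ == "__main__":
    # Build a tiny in-memory "wheel" to demonstrate.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("demo/__init__.py", "x = 1\n" * 100)
        zf.writestr("demo-1.0.dist-info/METADATA", "Name: demo\n")
    buf.seek(0)
    print(total_py_bytes(buf))  # 600: only the .py member counts
```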
I came to a similar conclusion. I created simple scripts to test compiling n medium-sized Python files, and the result depended on the number of files, the size of the files, and the platform; it didn't seem likely that we could gather all that information ahead of time without a significant PR
Yeah, I'm going to have to do that. Fortunately, I do have a range of hardware available to me, so I can deduce some reasonable cutoffs, but this is going to be a non-trivial patch :(
@willow flicker I'm confused with https://github.com/scikit-build/cmake-python-distributions/issues/586. Are Python-level tools in the parent environment meant to be visible to the build environment?
I guess I can test this out locally, but I always thought that wouldn't be allowed.
If you use an absolute path to them, yes? Generally, using a full path to a script is supposed to work.
And it normally works, but it seems to just be in isolated builds that this fails.
Where is the absolute path involved?
```python
import os
import shutil
from pathlib import Path

from setuptools import Extension


class CMakeExtension(Extension):
    def __init__(self, name, source_dir=".", target=None, **kwargs):
        super().__init__(name, sources=[], **kwargs)
        self.source_dir = Path(source_dir).absolute()
        self.target = target if target is not None else name.rpartition(".")[-1]

    @classmethod
    def cmake_executable(cls):
        cmake = os.getenv("CMAKE_EXECUTABLE", "")
        if not cmake:
            cmake = shutil.which("cmake")
        return cmake
```
This seems like they're trying to find the cmake executable during the build in the isolated env. Are they setting CMAKE_EXECUTABLE?
A path in general. If you run thing, and thing was installed in its own venv (like from pipx), it should not see the environment you are actually in.
shutil.which returns the absolute path.
ah right, the user/process level PATH
The shebang line installed by pip is supposed to ensure the imports inside the file work.
And it usually does, but it seems isolated builds are weird. I've seen this a few times, but haven't worked out why it doesn't work correctly.
@LecrisUT I think was researching a bug that might be related from another angle a while ago, let me see if I can find it
It's due to the sys.path shenanigans to remove system site-packages. https://github.com/pypa/pip/blob/93d43c9f5cdd40388253be757419b9312da1d24c/src/pip/_internal/build_env.py#L96-L99
Apparently in a virtual environment, system site-packages actually means the virtual environment itself
Python 3.12.4 (main, Jun 21 2024, 18:39:32) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pip._internal.build_env import _get_system_sitepackages
>>> _get_system_sitepackages()
{'/home/ichard26/dev/oss/pip/venv/lib/python3.12/site-packages'}
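That result follows from how `site` computes its paths: inside a virtual environment `sys.prefix` is the venv, so `site.getsitepackages()` returns the venv's own site-packages, and the base installation is only reachable via `sys.base_prefix`. A small diagnostic sketch for checking this in a given environment (the function name is hypothetical; that pip's `_get_system_sitepackages` is built on `site.getsitepackages()` is my assumption, not verified here):

```python
import site
import sys
import sysconfig


def describe_environment() -> dict:
    """Report where the running interpreter thinks its site-packages live."""
    return {
        # True inside a venv: sys.prefix points at the venv, not the base install.
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Inside a venv, these are the venv's directories.
        "site_packages": site.getsitepackages(),
        "purelib": sysconfig.get_path("purelib"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:>14}: {value}")
```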
Yes, it was something related to this that was breaking scikit-build-core. We have an ugly workaround for it that @LecrisUT didn't like.
TBH, I don't really know how build isolation is meant to function specifically. This just seemed wrong, given the whole point of build isolation.
Build isolation is supposed to isolate the Python environment. System scripts are valid; for example, gcc needs to work.
It shouldn't matter if the system scripts are written in Python, stuff in the PATH should work.
I do not remember where the issue is, though I do know where the workaround is. The issue post-dates the workaround.
Ahh, it's in the comments here: https://github.com/scikit-build/scikit-build-core/pull/880#discussion_r1747646118
Actually that seems to be different.
That's nearly an inverse, I guess - the isolated environment is reporting sysconfig.get_path("purelib") to be the virtual environment it was called from, and not the isolated environment.
That's what I was remembering, though.
(I'd like to totally redo how pip provisions build dependencies in the isolated environment. Not the path/environment shenanigans, but replacing nested pip calls with in-process logic, so I am interested in learning more, but unfortunately I have no one to learn from)
It seems like this should work the same way a normal virtual environment works. I think that's what we do in build, I don't remember too many shenanigans there.
FWIW, we don't create a virtual environment. We play our own tricks to emulate a virtual environment as it's faster (for one, at least. I wasn't around when build isolation was first added so I don't know the rest).
Ah, yes, but it's actually a lot faster if you avoid seeding pip in the environment, which used to be required, but hasn't been for quite a while. You don't have virtualenv, so tricks still might be faster, but for venv, that helps a lot. Also avoiding pre-compiling byte codes, etc.
¯\_(ツ)_/¯
I wasn't around when this was first added, and I'm pretty unfamiliar with this part of the codebase.
I can tell you that we play additional shenanigans (can't exactly remember what method it is) to add the current pip to the build environment. Also, yeah, we stopped compiling bytecode (it's a recent change).
Support for targeting environments was added well after isolated environments, so not surprised
Well --python is quite slow, the implementation works by calling pip in a subprocess with the target python executable.
Sounds like it works out for y'all at build, but not sure about us.
It's a lot slower to install pip than it is to call a subprocess, but subprocess is likely slower than hacks. I guess that's actually two subprocess calls, since we have to call pip --python in a subprocess in the first place.
Ignore me, I lied :)
Ah, it's probably that we set PYTHONPATH https://github.com/pypa/pip/blob/93d43c9f5cdd40388253be757419b9312da1d24c/src/pip/_internal/build_env.py#L149-L155 which is then picked up by the cmake subprocess
Yeah, if I reset the envvars by replacing self.spawn with subprocess.run([...], env={}), the build works.
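Passing `env={}` also discards everything else (PATH, HOME, compiler variables) that the tool may need, so a narrower variant of that experiment is to copy the environment and drop only PYTHONPATH. A sketch of the idea for a build script's own subprocess calls; it is a workaround on the build-script side, not a fix in pip, and the helper names are hypothetical:

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional


def clean_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment without PYTHONPATH, keeping PATH, HOME, etc."""
    env = dict(os.environ if base is None else base)
    # PYTHONPATH is how the isolated build environment's sys.path leaks into
    # any Python-based tool the build script launches.
    env.pop("PYTHONPATH", None)
    return env


def run_tool(cmd: List[str]) -> "subprocess.CompletedProcess[str]":
    """Run an external tool (e.g. cmake) without the build's PYTHONPATH."""
    return subprocess.run(
        cmd, env=clean_env(), check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    probe = [sys.executable, "-c", "import os; print(os.environ.get('PYTHONPATH'))"]
    print(run_tool(probe).stdout.strip())  # "None", whatever the parent has set
```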
Setting PYTHONPATH is always a recipe for disaster
Well, I'm not in a position to rework pip's build isolation, but there we go. We found the problem.
The entire build isolation feature is severely underspecified. The code itself isn't the easiest to follow.
AFAIU we don't really codify the relationship between the parent and isolated Python environments, which predictably leads us here.
PYTHONPATH has been used to implement some sort of isolated environment for ages. Here's a PR from 2016: https://github.com/pypa/pip/pull/4144
The shebang line is actually doing nothing here. The pip virtual environment Python is the sole Python being invoked here, by pip or the build process itself. The calls are identical for all we can tell.
@finite perch I'm totally guessing (which is why I'm messaging you here, I don't want to spread even more potential misinformation) but I suspect part of it is that Python 2 doesn't have venv when the build isolation logic was initially added in 2016.
There is https://github.com/pypa/pip/pull/11619 but that's been dead for a long time.
Although that's still setting PYTHONPATH so maybe that alone wouldn't fix this.
That makes sense
Hey Richard, you're the RM for 25.1, right?
Is PEP 735 support (https://github.com/pypa/pip/pull/13065#issuecomment-2611010799) still on track to make it in?
Do y'all have specific dates for releases?
I'm not in any rush, I'm just thinking through overhauling my build tooling and wondering what options I'll have available.
no one has approved or objected to me being RM so the answer is no I am not the RM
I'll approve you
PEP 735 will probably make it as long as there is consensus among the core team about the UI
I'm not terribly fond of adding new flags and commands to pip because pip is already complicated enough. I've been meaning to take a look at some point, but it's overwhelming.
This actually caused problems for us too... https://github.com/astral-sh/ruff/issues/13321
See reproducible at https://github.com/gaborbernat/ruff-find-bin-during-pip-build/actions/runs/10802198412/job/29963805989 With code gaborbernat/ruff-find-bin-during-pip-build@fd532e8 In this case,...
oh fun, there is an overhead difference between pip and python -m pip for multiprocessing because it imports the main module while spawning new subprocesses
I wonder if I could temporarily override the main module to work around this.
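```python
import multiprocessing
import sys

# Excerpt from a method: `self` and `workers` come from the enclosing code.
original_main = sys.modules["__main__"]
try:
    import pip.__main__
    sys.modules["__main__"] = sys.modules["pip.__main__"]
    ctx = multiprocessing.get_context("spawn")
    self.pool = ctx.Pool(workers)
finally:
    sys.modules["__main__"] = original_main
```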
Of course this results in a nontrivial reduction in pool initialization time. I love Python...
**This only works because I purposefully designed the src/pip/_internal/utils/compile.py module to be completely independent from the rest of pip, knowing that import time would be critical.
This is still a TON of imports, but it's much better than the 350 before the __main__ hack.
~80 are required during Python startup, and the rest are used by compile.py or multiprocessing itself
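Counts like these can be reproduced roughly by asking a fresh interpreter how many modules it has loaded; `python -X importtime` gives a per-module breakdown when more detail is needed. A sketch (the helper name is hypothetical; counts vary by Python version, platform and `site` configuration):

```python
import subprocess
import sys
from typing import List


def modules_loaded_after(imports: List[str]) -> int:
    """Number of entries in sys.modules after a fresh interpreter imports `imports`."""
    code = "".join(f"import {name}\n" for name in imports)
    code += "import sys\nprint(len(sys.modules))\n"
    result = subprocess.run(
        [sys.executable, "-c", code], check=True, capture_output=True, text=True
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    baseline = modules_loaded_after([])
    with_mp = modules_loaded_after(["multiprocessing", "compileall"])
    print(f"startup: {baseline} modules; +multiprocessing/compileall: {with_mp}")
```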
What a non-trivial feature ... so much work already, and there's still more to do
part of me is feeling crazy enough to embark on a project to finally overhaul pip's keyring and "HTTP authentication" story
I 100% should not and will not do that as I already have enough things to do, but it is such a major sticking point for our users.
It'd be a good project to pay someone to review the feedback pip has received, do additional research on the needs (given it's dominated by corporate interests where we have poor visibility), and propose a proper solution. Of course, that is also reliant on having enough maintainer bandwidth to manage such a project...
tbh, with what I have learned about keyring interaction we have in Poetry, it's pretty much an impossible topic. There is no way that makes everyone happy
great
Another idea, pip plugins for HTTP authentication/keyring support. That's also an insanely large project, but it could happen. (There is an unrelated plugin proposal that seems nice, although I haven't thought about it too much.)
That would open another can of worms called "plugin api" and breaking the decades-long motto of "the only API is CLI"
I'm mostly dreaming. Realistically nothing is going to change in the foreseeable future.
The main improvement I would want to see for keyring in pip is that all the logic is vendored out to another library and then we can point users there!
keyring was supposed to be that, but it turns out a lot of people don't like it.
The sticking point, other than limitations in keyring itself, is how pip uses keyring. That's why I'd want to delegate all of the responsibility for custom HTTP authentication to a plugin.
That way, the corporate users can devise their own solutions. Of course, those solutions will probably end up requiring a fair bit of support from pip's end as said corporate users come up with more and more esoteric requirements (that demand more APIs).
"I want to hook into the PEP 517 build subprocesses so I can inject my authentication headers into the requests.get calls that my Jenga tower of a setup.py build script issues to fetch supplementary resources."
There's already a solution for corporate users to do all of this, point to a local proxy and write all that logic in the proxy
"That's too much work. Can you please just add a --do-magic CLI option?"
Ahem. Anyways. That's enough complaining about corporate users from me.
I just think any HTTP plugin solution would need to justify itself compared to a proxy, because the nice thing about a proxy is it will work with any HTTP library and any language, it leaves us very decoupled from supporting specific APIs and works with more than just pip for the user
Oh yeah. It circles back to the idea of "having too many users." At a certain point, they're going to want a million things. Maintenance considerations (and generally codebase sanity) mean not doing that.
It is a real need, and fixing it would be a good thing, but it's not worth the maintenance costs.
Yea, as far as I can tell, the problem is that no one cares enough to pay for it.
This is by far the worst thing I have ownership of in uv
oh no
It needs to be overhauled
Us too
We do this https://github.com/pypa/pip/pull/12496
Or something like it
It's the source of all sorts of problems
that's not great...
glad that we didn't merge it, although realistically, we've been in the same place effectively regardless
I'm having trouble finding an answer for this, so wanted to ask here before opening an issue. If I have a package foo with an optional dependency bar>2.0 that I install with baz which requires bar<2.0, I don't get an error (or even a warning). Is there any tooling right now to highlight this conflict or is it always going to be a runtime issue?
This is currently working by design
OK, my parallel bytecode install branch is in a workable state. I still need to do some experimenting to decide whether/what cutoffs to add and maybe some documentation. https://github.com/pypa/pip/compare/main...ichard26:pip:perf/parallel-compile
I renamed --workers to --install-jobs where 1 disables the parallelization. The option name is misleading as it's only bytecode compilation that's parallelized, but an option called --bytecode-jobs seems obscure.
Bikeshedding: --install-workers
Fair enough
It'll go through another round of bikeshedding once I file a PR, but that's still a ~week away.
There's some discussion in https://github.com/pypa/pip/issues/7122, I have some strong opinions on how pip should handle this, and I do plan to write up a proposal, but I haven't had any time, and might not have any time for a while
Ah, that's really buried there in the issue. Thanks for pointing it out! I'll give it a read tonight.
that is a lot of labels...
Just needs a "good first issue" to round it out.
needs eyes
The other solution to the problem with --install-jobs being inaccurate is to add more parallelization to pip install. I could also look into parallelizing the zip extraction, although this would be more involved than the bytecode change.
Especially since zip extraction is also a very file I/O heavy task, so it may be important to also use multithreading at the same time. I'll have to do some experiments.
Thoughts on « https://github.com/pypa/pip/issues/12712#issuecomment-2672876961 » are welcome
@finite perch FWIW, with the parallelization of bytecode compilation and potentially zip extraction, I'd mostly consider https://github.com/pypa/pip/issues/12742 as moot. While there would be smaller performance wins to be had if we parallelize the entirety of the install logic, bytecode compilation and zip extraction are AFAIK the largest two contributors to install time anyway. The next contributor would be file I/O, but I'm not really sure if we can readily optimise that (I may try to roll that into a PR parallelizing zip extraction, however).
I believe the approach of long-lived worker pools for handling specific slow operations during installation should be more maintainable/less risky and performant enough to be sufficient.
I agree; with all the work done (mostly by you), I think it's time to close that specific issue
Not yet. I'll close it when my parallel PRs are landed :P
zip extraction can absolutely be parallelized i have been working on exactly that recently and would love to consult or review or contribute to a PR doing that
i have an implementation of this in the rust zip crate https://github.com/zip-rs/zip2/pull/236 but as @hidden flame said it's not trivial to optimize. we wouldn't need to engage the full complexity of that solution for pip to get some real benefit. the main performance blocker for zip extraction is the way naive decompression intersperses cpu-bound work with io requests. there are ways to pipeline the reads to avoid this.
secondly, multiple files can be extracted at once to take advantage of multicore CPUs. this requires some work up front and also may require the pread() posix syscall to read in parallel from a source file.
it would require some effort to translate this into something pip can do in python code but i absolutely think this can be done.
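A rough sketch of the "several files at once" half of that idea in Python, assuming a POSIX platform (`os.pread` is unavailable on Windows) and an unencrypted zip: each worker thread reads its member's compressed bytes with `os.pread` from one shared file descriptor, so there is no shared seek position, and zlib releases the GIL while inflating larger buffers, so threads can overlap. It does not do the read/decompress pipelining described for the zip2 PR, and all names are hypothetical.

```python
import os
import struct
import zipfile
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List

# Fixed 30-byte part of a zip local file header: the signature, 22 bytes we skip
# (version, flags, method, time, date, CRC, sizes), then the name/extra lengths.
_LOCAL_HEADER = struct.Struct("<4s22xHH")


def _pread_exact(fd: int, size: int, offset: int) -> bytes:
    """os.pread may return short reads, so loop until `size` bytes are read."""
    chunks = []
    while size > 0:
        chunk = os.pread(fd, size, offset)
        if not chunk:
            raise zipfile.BadZipFile("unexpected end of file")
        chunks.append(chunk)
        size -= len(chunk)
        offset += len(chunk)
    return b"".join(chunks)


def _extract_member(fd: int, info: zipfile.ZipInfo, dest: Path) -> Path:
    if info.flag_bits & 0x1:
        raise NotImplementedError(f"{info.filename} is encrypted")
    # pread never moves a shared file offset, so threads can share one fd.
    header = _pread_exact(fd, _LOCAL_HEADER.size, info.header_offset)
    signature, name_len, extra_len = _LOCAL_HEADER.unpack(header)
    if signature != b"PK\x03\x04":
        raise zipfile.BadZipFile(f"bad local header for {info.filename}")
    data_start = info.header_offset + _LOCAL_HEADER.size + name_len + extra_len
    raw = _pread_exact(fd, info.compress_size, data_start)
    if info.compress_type == zipfile.ZIP_STORED:
        data = raw
    elif info.compress_type == zipfile.ZIP_DEFLATED:
        # Negative wbits: a raw deflate stream with no zlib header/trailer.
        data = zlib.decompress(raw, -zlib.MAX_WBITS)
    else:
        raise NotImplementedError(f"compression method {info.compress_type}")
    if zlib.crc32(data) != info.CRC:
        raise zipfile.BadZipFile(f"CRC mismatch for {info.filename}")
    target = dest / info.filename  # NOTE: no path-traversal checks in this sketch
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


def extract_parallel(zip_path: str, dest: str, workers: int = 4) -> List[Path]:
    """Extract every file member of `zip_path` into `dest` using a thread pool."""
    with zipfile.ZipFile(zip_path) as zf:  # reads the central directory
        members = [info for info in zf.infolist() if not info.is_dir()]
    dest_path = Path(dest)
    fd = os.open(zip_path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(lambda i: _extract_member(fd, i, dest_path), members))
    finally:
        os.close(fd)
```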
haven't been active recently due to grad school apps but would love to get back into working on pip
Have you been following my parallelization work in https://github.com/pypa/pip/issues/12712? There is a stream of consciousness on that issue of my thoughts on parallelizing pip install.
Technically there's a dedicated issue for parallelizing pip install, but I forgot about that issue.
I was only interested in seeing whether naive parallelization would help but no, it doesn't. We would indeed need to pipeline the I/O and CPU work to maximize efficiency. That is far outside of my scope.
I'd much prefer getting your existing PRs in at some point first before working on any more π
I am finally catching up on some PR review (none of your PRs, sadly). I plan on reviewing the resumable download and pip cache remove pattern matching PRs next.
no i'm onboarding now
TL;DR I have a branch to parallelize bytecode compilation only. It is the largest contributor to the install step time and is probably the easiest thing to parallelize. Although the branch is already at +430/50 changes, and I still want to add one improvement: https://github.com/pypa/pip/compare/main...ichard26:pip:perf/parallel-compile
this is incredibly exciting
The complexity is limitless 
I've taken this simple function extracted from the monolithic operations.install.wheel code:
```python
import compileall
import importlib.util
import sys
import warnings
from contextlib import redirect_stdout
from pathlib import Path
from typing import Union
from pip._internal.utils.misc import StreamWrapper

def _compile_single(py_path: Union[str, Path]) -> CompileResult:
    stdout = StreamWrapper.from_stream(sys.stdout)
    # TODO: is catching warnings necessary?
    with warnings.catch_warnings(), redirect_stdout(stdout):
        warnings.filterwarnings("ignore")
        is_success = compileall.compile_file(py_path, force=True, quiet=True)
    pyc_path = importlib.util.cache_from_source(py_path)
    # XXX: compile_file() should return a bool (typeshed bug?)
    return CompileResult(py_path, pyc_path, bool(is_success), stdout.getvalue())
```
And somehow turned it into a 500+ LOC change.
Roughly half of the additions are unit tests, and then there's all of the support code to determine whether to use parallelization, how many cores, plumbing that configuration from the CLI, etc.
my heart. a connoisseur of unit testing
It's also just really well commented. This is possibly the most well commented module I've ever written: https://github.com/ichard26/pip/blob/perf/parallel-compile/src/pip/_internal/utils/pyc_compile.py
i counter with my lazy wheel PR https://github.com/pypa/pip/blob/b06d73bfc8f2a54149b95d38499d72c328588546/src/pip/_internal/network/lazy_wheel.py
this rocks
```python
import sys
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def _patch_main_module_hack() -> Iterator[None]:
    """Temporarily replace __main__ to reduce the subprocess startup overhead.

    multiprocessing imports the main module while initializing subprocesses
    so the global state is retained in the subprocesses. Unfortunately, when pip
    is run from a console script wrapper, the wrapper unconditionally imports
    pip._internal.cli.main and everything else it requires. This is *slow*.

    This module is wholly independent(*) from the rest of the codebase, so we can
    avoid the costly re-import of pip by replacing sys.modules["__main__"] with
    any random module that does functionally nothing (e.g., pip.__init__).

    (*) This module's entrypoint does import from pip. This is fine as it's only
    called in the main process where the imports have already executed.
    """
    original_main = sys.modules["__main__"]
    sys.modules["__main__"] = sys.modules["pip"]
    try:
        yield
    finally:
        sys.modules["__main__"] = original_main
```
...
this is really really cool
And so hacky and awful, haha
In other words, multiprocessing has significantly more overhead if you run pip install vs python -m pip install
pex does a lot of hacks like this in its bootstrap script for startup time since it invokes pip as a subprocess
i think they will love this
as someone who has to maintain this code that will be running on millions of machines... hard to feel the same way :P
I really don't want to break the world hence the extra effort to submit a very polished PR.
sorry i'm trying to be supportive here but was failing to understand your point that this code is also a little scary. understood
hahaah, I appreciate the excitement! It's just that I've been stuck in polish-limbo for a little bit
I need to benchmark the process startup time on Windows/macOS so I can decide on a cutoff value on total .py size to always use serial compilation.
I am very excited for the change as well. It will be a major performance win for any remotely nontrivial install
haha, of course
i have responded to the review comments on https://github.com/pypa/pip/pull/12208 and would appreciate help getting this merged
Can someone approve these unit test additions please, looking to merge it and add a couple of follow up PRs very soon https://github.com/pypa/pip/pull/13230
suppose I want a hook to make sure that a particular branch of a remote Git repo is always in sync, would the following work?
pip install "proj@git+https://github.com/org/proj.git@main"
or would subsequent runs not fetch and I must manually maintain a local Git repo and instead point pip to that?
I've been doing that and assuming it works. I can't say I've verified it, but I would be surprised if I had somehow missed that it didn't update my package
I also haven't tested it, but I would actually be surprised if it did what we expect, because the JSON file that is saved when you install from a direct reference is supposed to encode just the branch name if you request the branch name, so, using naïve logic, requesting it again would in theory be a no-op
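For reference, PEP 610's `direct_url.json` has fields for both the revision the user asked for (`requested_revision`, e.g. a branch name) and the commit that was actually installed (`commit_id`), so a tool could in principle compare `commit_id` against the branch's current tip. A sketch for reading it back; the helper names are hypothetical and the sample data is made up:

```python
import json
from importlib import metadata
from typing import Optional


def parse_vcs_info(direct_url_json: str) -> Optional[dict]:
    """Extract the PEP 610 vcs_info object (None for non-VCS direct URLs)."""
    return json.loads(direct_url_json).get("vcs_info")


def installed_vcs_info(dist_name: str) -> Optional[dict]:
    """vcs_info for an installed distribution, if it was installed from a VCS URL."""
    try:
        text = metadata.distribution(dist_name).read_text("direct_url.json")
    except metadata.PackageNotFoundError:
        return None
    return parse_vcs_info(text) if text else None


if __name__ == "__main__":
    # Made-up example of what installing proj@git+...@main might record.
    sample = json.dumps({
        "url": "https://github.com/org/proj.git",
        "vcs_info": {"vcs": "git", "requested_revision": "main", "commit_id": "0" * 40},
    })
    info = parse_vcs_info(sample)
    print(info["requested_revision"], info["commit_id"][:7])
```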
It will fetch again on subsequent calls. If not please file an issue and ping me.
just FYI, I also asked uv about their behavior and by default they do not refresh, so you must pass another flag to the install command
Yeah, uv tries to avoid doing any work
But avoiding reinstalling a source tree because you think it will result in the same package is unsound. uv has heuristics, and multiple ways for you to tell uv that the package has changed; it's a hard design problem. I doubt there will ever be the motivation on pip's side to not reinstall a source tree when the user gives a command to install it
deciding when to reinstall direct URLs is hard indeed. https://github.com/pypa/pip/pull/10564
pip does cache the built wheel if the git ref is a commit sha, though
but many improvements could be done in that area, such as caching wheels off git branches with the cache entry keyed on the commit sha
also, caching prepared metadata, etc
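One way to key such a cache, sketched under the assumption that resolving the branch with `git ls-remote` is acceptable (a network round trip, but far cheaper than a clone): resolve the branch to a commit SHA, then derive the cache key from the URL and SHA rather than from the branch name. The key layout and helper names are hypothetical and are not pip's wheel-cache scheme.

```python
import hashlib
import subprocess
from typing import Optional


def parse_ls_remote(output: str, branch: str) -> Optional[str]:
    """Pick the commit SHA for refs/heads/<branch> out of `git ls-remote` output."""
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref == wanted:
            return sha
    return None


def resolve_branch(url: str, branch: str) -> Optional[str]:
    # `git ls-remote <url> <ref>` prints "<sha>\t<ref>" lines without cloning.
    out = subprocess.run(
        ["git", "ls-remote", url, f"refs/heads/{branch}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out, branch)


def wheel_cache_key(url: str, commit_sha: str) -> str:
    """Cache key that changes whenever the branch moves to a new commit."""
    return hashlib.sha256(f"{url}@{commit_sha}".encode()).hexdigest()
```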
PTAL, I rewrote the flaky test to use a local git repo instead of Launchpad bzr: https://github.com/pypa/pip/pull/13243
The frustration was reaching pretty seriously high levels 
have several PRs for this
I can never win: https://github.com/pypa/pip/pull/13243#issuecomment-2683203789
me: fixes flaky test
[~12 hours later]
the world: how about a new setuptools release that breaks 30 tests?
me: 
Honestly, part of me wonders if they'll revert this change. It seems rather disruptive.
is this name normalization?
I mean, it does seem like a step forward to emit the correct names
yup
sorry it broke the tests π¦
it's fine, but I'd like to wait until the dust settles before fixing all of the tests
but it's been part of the wheel spec for a loooong time so I think reverting should be unlikely
Ah I didn't realize the specification was updated.
yeah, one of the downsides with the "specifications live at packaging.python.org and not in PEPs" I think
was updated in https://github.com/pypa/packaging.python.org/pull/1032 four years ago, I missed it too
While we're here, how common are wheels with unnormalized .dist-info directories? I wrote an optimization to use the name from the .dist-info directory name (instead of looking at the METADATA file), but I'm wondering whether that's broken
@hidden flame wheels built with setuptools suffer(ed) from the same bug, so it's fairly common, but usually (always?) they match the project name in the filename exactly
https://github.com/pypi/warehouse/issues/17377 has some stats on how common that is generally
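Two rules matter for comparing those names: PEP 503 project-name normalization (runs of `-`, `_`, `.` collapse to a single `-`, lowercased) and the wheel spec's escaping for the name part of the `.dist-info` directory (the same runs collapse to `_`, lowercased). A sketch of a check that would only fall back to reading METADATA when the directory name is not already in escaped form; the helper names are hypothetical, and splitting on the last `-` assumes the version part contains no hyphen.

```python
import re
from typing import Tuple

_SEPARATORS = re.compile(r"[-_.]+")


def canonicalize(name: str) -> str:
    """PEP 503 normalization, used to compare project names."""
    return _SEPARATORS.sub("-", name).lower()


def escaped_dist_name(name: str) -> str:
    """Escaping used for the name part of a wheel's .dist-info directory."""
    return _SEPARATORS.sub("_", name).lower()


def split_dist_info(dirname: str) -> Tuple[str, str]:
    """Split '<name>-<version>.dist-info' into (name, version)."""
    stem = dirname[: -len(".dist-info")] if dirname.endswith(".dist-info") else dirname
    name, _, version = stem.rpartition("-")
    return name, version


def dist_info_name_is_normalized(dirname: str) -> bool:
    name, _ = split_dist_info(dirname)
    return name == escaped_dist_name(name)
```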
@hidden flame got a present for you: https://github.com/pypa/pip/pull/13246
Testing the compilation code with progressively larger amounts of code until the serial and parallel implementations take roughly the same amount of time on my (decently fast) Ryzen 5800H Ubuntu box with the spawn multiprocessing start method suggests that 1 MB of code is probably fine as a cutoff. With very large files, the break-even point is higher, but at that point, the technically avoidable overhead is rather acceptable. I still need to think through this logic more, but this is progress.
Fork (the default on Linux before Python 3.14) has noticeably less overhead, as expected
I am worried once I start testing other systems though, I'd imagine Windows/macOS will see much worse worker creation overhead
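The cutoff logic described above might boil down to something like this: below roughly 1 MB of source, stay serial; above it, use up to the requested number of workers. The function name, defaults and cap are illustrative assumptions taken from the numbers in this thread (a 1 MB cutoff, little gain beyond about 6 jobs), not the branch's actual code; `--install-jobs=1` corresponds to `requested=1`.

```python
import os
from typing import Optional

# Illustrative values from the discussion, not measured constants.
SERIAL_CUTOFF_BYTES = 1_000_000
DEFAULT_MAX_JOBS = 6


def choose_compile_jobs(
    total_py_bytes: int,
    requested: Optional[int] = None,
    cpu_count: Optional[int] = None,
) -> int:
    """Number of bytecode-compilation workers; 1 means compile serially."""
    if requested == 1 or total_py_bytes < SERIAL_CUTOFF_BYTES:
        # Small installs: starting workers would cost more than it saves.
        return 1
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    limit = requested if requested is not None else DEFAULT_MAX_JOBS
    return max(1, min(limit, cpus))
```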
hahaha, thank you so much! 
The goal is to avoid wasting 100ms on creating a pool of workers when someone is simply trying to install a few small packages that'd compile serially in <50ms. It honestly doesn't need to be that complicated. I realize that I've made this branch very complex. It should (hopefully) not be too bad to review, though.
~100ms of process creation overhead on Windows; that's not as bad as I thought. I do think my laptop is unreasonably fast for this, though.
install step durations:
- setuptools: 2.7s -> 0.8s
- tests/requirements.txt: 9.5s -> 3.4s
the gains are even bigger on windows
Can you do some happy-eyeballs thing where, if there are no free processes, you spawn a new process and schedule the new task, and then either an existing process finishes its task and picks up the new one, or the new process picks it up?
The concurrent.futures executors spin up new workers (up to the max) on an "as-needed" basis, although it's definitely not as fancy as a happy eyeballs algorithm
I'm just trying to avoid "you install six (or $other-small-package) and wait ~100ms for a worker to be spun up that isn't going to be helpful"
I don't really want to think through manually scaling the parallelism. That's a lot of work for relatively minimal gain.
This feels good. ✨
Somehow writing this commit message took nearly an hour... which doesn't feel as good.
FAILED tests/functional/test_install.py::test_double_install_fail - AssertionError: Script returned code: 2
= 1 failed, 394 passed, 29 skipped, 16 xfailed, 42 warnings in 480.66s (0:08:00) =
Lazy imports and reinstalling pip really is the source of all hell.
Other than that, the test suite is passing.
diff --git a/src/pip/_internal/commands/install.py b/src/pip/_internal/commands/install.py
index db61d7b7e..9f5f48123 100644
--- a/src/pip/_internal/commands/install.py
+++ b/src/pip/_internal/commands/install.py
@@ -418,6 +418,10 @@ class InstallCommand(RequirementCommand):
# we're not modifying it.
modifying_pip = pip_req.satisfied_by is None
protect_pip_from_modification_on_windows(modifying_pip=modifying_pip)
+ if modifying_pip:
+ # Parallelization will re-import pip when starting new workers
+ # during installation which is unsafe if pip is being modified.
+ options.install_jobs = 1
fun fun fun
There might be scores of deps other than pip here. Shouldn't be, but can be. Would it make sense to install everything in parallel, then pip last in serial?
An explicit goal was to avoid touching the installation order.
Well, you can't guarantee the order if it's parallel, right?
You haven't read the PR :P
Ahhh, you parallelise inside one package install.
A pool of bytecode compiler workers is created right before installation starts and reused for all packages. Python files are submitted to the pool to be compiled en masse.
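A minimal sketch of that reuse pattern, assuming only the standard library (`py_compile`, `concurrent.futures`); the names `_compile_one`, `compile_package_files` and `compile_all` are hypothetical and this is not the code from the pip PR. The caller creates the pool once and passes it in, so every package's files go to the same workers:

```python
import py_compile
from concurrent.futures import Executor, ProcessPoolExecutor
from pathlib import Path
from typing import List, Optional, Tuple

Result = Tuple[str, Optional[str]]  # (source path, error message or None)


def _compile_one(source: str) -> Result:
    """Runs in a worker: compile one file to bytecode."""
    try:
        py_compile.compile(source, doraise=True)
    except py_compile.PyCompileError as exc:
        # Hand back plain strings instead of raising, so the result always
        # pickles cleanly on its way back to the parent process.
        return source, exc.msg
    return source, None


def compile_package_files(pool: Executor, sources: List[Path]) -> List[Result]:
    """Compile one package's .py files on a shared, long-lived pool."""
    # Executor.map submits every file up front and yields results in input order.
    return list(pool.map(_compile_one, [str(path) for path in sources]))


def compile_all(packages: List[List[Path]], jobs: int) -> List[Result]:
    """Hypothetical driver: one pool for the whole install, reused per package."""
    results: List[Result] = []
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        for sources in packages:
            results.extend(compile_package_files(pool, sources))
    return results
```

Because the helper accepts any `Executor`, a `ThreadPoolExecutor` works as a stand-in for quick tests, though the speed-up for CPU-bound compilation comes from processes. With the spawn and forkserver start methods the workers import the main module, so a script calling `compile_all` should do it under an `if __name__ == "__main__":` guard.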
The other solution is to patch __main__ to point to a completely unrelated module, but I don't really want to create an entirely new module outside of the pip namespace. It has to be outside of the pip root package because importing anything in pip will, at the bare minimum, result in pip.__init__ being imported as well (which is why CI is failing).
I'm pretty sure some folks would get pretty mad if they saw a new __pip_compile_empty.py top-level module in the pip distributions.
Yeah, I'd not complicate the PR by doing (*before_deps_parallel, pip_serial, *after_deps_serial), especially since 90% of cases are pip/setuptools.
For context, multiprocessing will import __main__ to restore any global state in the newly created processes. It's annoying, but it makes sense.
I know. It is a pain.
I'm already patching __main__ with sys.modules["pip"] to avoid re-importing pip's actual internals which is god awfully slow, but it's the automatic import of pip.__init__ that's blowing up the test suite.
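For context on the `__main__` patching mentioned above, here is a minimal sketch of temporarily pointing `sys.modules["__main__"]` at another module; `patched_main` is a hypothetical name and the real pip change may differ. With the spawn start method, multiprocessing reads the main module when each worker process is launched (for forkserver, when the server process is launched), so a patch like this would have to stay active for as long as new workers can be created, not just while the pool object is constructed:

```python
import sys
import types
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def patched_main(module: types.ModuleType) -> Iterator[None]:
    """Temporarily make `module` look like __main__ to multiprocessing.

    Hypothetical helper: workers launched inside the block build their
    __mp_main__ from `module` instead of from the real entry-point script.
    """
    original = sys.modules["__main__"]
    sys.modules["__main__"] = module
    try:
        yield
    finally:
        # Always restore, even if starting the workers failed.
        sys.modules["__main__"] = original
```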
I don't think setuptools will crash pip...?
Makes sense if pip is updating pip.
I meant 90% of cases where a package is installed alongside pip.
No effect on the PR.
ah, yeah. It sucks, but I don't think I have much better alternatives (without a fair bit of extra complexity).
Fortunately, IMO people are probably installing setuptools with pip less often now, since setuptools isn't needed to install things: pip will install setuptools in an isolated build environment anyway.
You'd be surprised.
well
it is pip (and Python packaging, generally), everything is acceptable
I am scared for the fallout when this is merged and released.
Lazy imports is generally "a bad idea" when you're installing/replacing packages. pip is in the awkward position of doing the latter as its primary job.
Lazy imports have already broken reinstalling pip before.
I understand the pickle yeah. Pip changes pip and the new pip is picked up by multiprocessing workers.
Other than this bug and a few other silly bugs, this has been surprisingly kind to the test suite... although the devil is in the details... or the edge cases, rather.
Funnily enough, I couldn't reproduce the crash locally. This is because on Linux, multiprocessing defaults to the fork start method, which (for reasons I don't know) forces all of the workers to be created immediately during pool initialization. ProcessPoolExecutor will only create workers on demand when the forkserver or spawn (default on Windows/macOS) methods are in use.
*although this will no longer be true on Python 3.14 where the default start method on Linux has been changed to forkserver.
Found a comment in pip's source code "# Old, pre-Optik 1.5 behaviour." and that caused a brief history lesson detour
Also, I'm looking for someone to accept https://github.com/pypa/pip/pull/13244 that fixes direct preferences in resolving.
The change itself is quite simple, there is an MRE of a real issue from a user that can be used to validate it, and I've added a unit test to prevent regressions, though unfortunately setting up the unit test is far more complex than the change.
I've got more of these fixes coming, so the quicker we can merge, the more merge conflicts I can avoid. I'm happy to answer questions on the PR or here on what exactly get_preference is doing and why this is a fix.
I'm pretty pessimistic about the parallel PR. While it should be doable to address the security concerns, I'm not really willing to fight for this if the sentiment is that it's simply better to flip --no-compile as the default.
It's really my fault for not ensuring there was consensus on this change before working on this in earnest.
I'm going to make a reply to that PR later today that tries to decouple the discussion; IMO the current direction is unproductive.
I've not reviewed the PR itself though, so I can't comment on the complexity by itself yet.
I took a cursory look at the PR, and I have one small note: you've left a lot of PR comments on specific lines, but at least some of them would be more valuable as code comments, e.g. "This import is done here so I can catch any import errors and fall back to serial compilation." I'd rather you put that as a code comment so someone looking at it later doesn't have to find the PR to get that information.
Done, it wasn't as visible to me that it was still locked
I felt that I already included too many code comments, but sure I can convert some of my PR comments to code comments.
Well, that might be true, but IMO if it's important enough to write a comment on a specific line in a PR, it's important enough to write that line in the code itself.
Eh, I'm not so sure I agree with that in practice. Some review comments are used to simply help put the changes in context. Once it's landed, it's not so relevant. I agree the comment you mentioned should be added to the code (and the incremental requirement inspection comment).
Well, maybe the heuristic is overly simplistic. I do agree that review comments can be helpful; it's just that, to me, once a comment is adding context to specific lines of code, that context is probably just as useful in the code itself. I could probably be proven wrong with counterexamples, though.
I am trying to document more, having read Woodruff's blog post on one's personal OSS blast radius:
Ludicrous amounts of documentation. I'm one of those people who thinks that nearly every code surface should be documented, not just with structured machine-parseable specifications but also with the mental state of the engineer who wrote it. Mental states give future maintainers (including the ones who might be fixing my vulnerabilities!) insight into the why and not just the what of the code they're grappling with.
But at a certain point, it's overwhelming.
I'd not read that before; it's an interesting read. I don't think any pip maintainer has a blast radius by the blog's definition just because they maintain pip, but pip is somewhat on the edge of meeting the definition.
Pip in the grand scheme of things isn't doing that bad.
We have like 6 core maintainers, although a few are of course mostly inactive.
If some CVE with real world impact comes out about the pip resolver tomorrow I'm not sure if my flight or fight instinct will kick in
I fortunately haven't had to deal with a CVE yet but I was around when a click release broke black, that was not fun.
does pip no longer provide an isolated build environment for packages that only have a setup.py?
It should still. are you facing a bug?
hmm wait, there may be an exception where if setuptools/wheel are installed in the environment then pip won't provision an isolated environment for legacy builds
error:
so after you fixed your environment you actually cannot reproduce this?
oh, I don't know if this is relevant, but I just realized this is not a virtual environment but rather a Conda environment
One big difference between virtual environments and Conda environments is that Conda environments will pick up and prefer user-installed packages over environment-installed packages.
It's interesting that it's specifically "Can not execute setup.py since setuptools is not available in the build environment." that is being raised. That's pip's own error. So pip obviously found a setuptools and wheel install locally, but then the build process didn't.
I would be tempted to try replacing pip install with python -I -m pip install, or some other isolation level.
hmm, not sure how build isolation interacts with Python flags, bet that's fun
Also looking at that script I would be concerned that conda has already installed setuptools, and installed it in a way that pip fails to uninstall it, leaving two versions of setuptools installed
force uninstalling and reinstalling worked... thanks!
FYI I'm going to be relatively inactive for a little bit.
@finite perch what's the status on packaging.utils.parse_wheel_filename? I want to try adjusting https://github.com/pypa/pip/pull/13094 to use it instead of handrolling its own normalization.
https://github.com/pypa/pip/pull/13094#pullrequestreview-2669292777 what a review
The resume PR is next on my review list. Given how long ^ took, this may take a few hours...
The problem is pip currently accepts wheels that parse_wheel_filename would error out on, until pip 25.3
@finite perch fwiw, I'm also planning to take a look at the resume PR, the more reviews the merrier though, of course :)
*at least
setuptools only changed behaviour like, last month.
We likely want to be a lot more cautious about the rollout around rejecting non-normalised names.
setuptools changed its behaviour so that it only produces fully normalized wheel filenames, but that follows a change in the spec that happened ~4 years ago. Before that, non-normalized wheel filenames were accepted, and that older spec is what parse_wheel_filename parses against, so wheels produced by old setuptools versions still work with parse_wheel_filename.
That is parse_wheel_filename parses against the spec as it was ~4 years ago, not today's spec
there was a proposal to make it reject non-normalized names, no?
IIUC that was rejected due to impact though
Yes, and it was agreed not to do that
The problem is that pip accepts wheels that are not spec compliant against even the old spec
And because pip accepts it, there are real users doing it
Oh wow, there's a . in there.
Ignore me, I'm just completely off base here.
Pretty sure the spec says that installers are expected to deal with the legacy names tho.
Tools consuming wheels must be prepared to accept . (FULL STOP) and uppercase letters, however, as these were allowed by an earlier version of this specification.
Doesn't seem like tools need to deal with all forms of legacy names.
Yeah, we'll see what the damage is once the deprecation warning is in a released version of pip. And I certainly think pip should accept wheel filenames if they can be normalized to the spec.
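To make the distinction concrete: the spec text quoted above only asks consumers to tolerate `.` and uppercase letters in the name component, which normalize cleanly, whereas a `-` inside the name changes how the filename splits. A minimal stdlib-only sketch, assuming PEP 503-style name normalization; `canonicalize` and `wheel_project_name` are hypothetical helpers, not `packaging.utils.parse_wheel_filename`, which also validates the version and tags:

```python
import re


def canonicalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_project_name(filename: str) -> str:
    """Return the normalized project name from a wheel filename.

    Wheel filenames are {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so a spec-compliant name never contains '-'. Legacy '.' and uppercase
    letters are tolerated because they normalize to the same result.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"wrong number of components: {filename!r}")
    return canonicalize(parts[0])
```

A name containing `-` is either rejected for having too many components or, worse, misread: `foo-bar-1.0-py3-none-any.whl` yields `foo`, with `bar` landing in the version slot. A full parser should reject that case because `bar` is not a valid version.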
As per the discussion in https://github.com/pypa/packaging/issues/873, if we get an API which has a strict and non-strict mode I will migrate pip to always consume wheels in the non-strict mode
I fell ill (or rather, ill again) this week :( I'll try to get the resumable PR reviewed (at least partially reviewed) this week, but I'm not as sure as I thought.
get well soon! your health comes first, the PR can wait
would it be very challenging to improve the errors regarding failed resolution in the case of Python version requirements? the package in question has never supported anything but 3.12.0 or higher https://github.com/DataDog/datadog-agent/pull/35056
I think there are several open issues on this; I've been meaning to consolidate and clarify them.
I've not looked at the code, but my high level understanding is the issue stems from the separation of concerns in the design. The resolver determines the needed requirements and if they are satisfied, and the collector collects candidates that match that requirement. The collector doesn't know if a requirement is necessary to complete a resolution. And the resolver doesn't know the collector filtered out candidates that don't match the Python version.
But having not looked at the code, maybe there is some neat trick that can improve the error message
Also, I'm pretty sure this got worse by a recent regression: https://github.com/pypa/pip/issues/13260
yes that matches what I remember from my deep dive in Q4, probably blocked on large changes which require deliberate design choices like this https://github.com/pypa/pip/issues/13111
I took a brief look at the resume PR, I have some questions for the code. I haven't actually tested it out locally though. I shall do that now.
This may be one of the situations where we need to ask for broader feedback, I'm not sold on the current UX where retrying is opt-in and it does not persist across pip installs.
I get it's much easier to implement resume if it's only in-process, but imagine a situation where I download half of a $very-large-wheel and then the download is dropped. I get an incomplete-download-error explaining what happened. Good! What's not so good is that pip will have to download all of said $very-large-wheel again.
I'd rather start with it being in-process; it is much simpler and solves the main use case. Cache control is already so complex and a source of problems that I don't want to see even more complex problems replicated somewhere else.
I understand your criticism on the UX, but it seems that would mostly be solved by having a non-zero resume download default, say 2 or 3.
I'd be pretty frustrated if I had to opt in to resumption and ended up redownloading all of $my-large-wheel
Especially if I'm on a slow or metered connection.
Well, it's still better than the current experience of "good luck if it ever downloads"
I definitely agree this is an improvement over the status quo (I thought I'd sent a message saying that, but I guess not) but I'm 90% sure we'd immediately get complaints/requests to smooth the rough edges.
The PR isn't really that bad to review, all things considered. I haven't looked at the tests, but a large part of the changes is simply refactoring to accommodate the resumption logic. And this is actually something that's in my wheelhouse.
I agree. I'll suggest that we make resumption opt-out in my initial review.
Caching is already complex enough, indeed.
I love the new install progress bar
awesome, thank you! i'm glad to hear it!
I still do want to make installs faster (whether that's by parallelizing or disabling bytecode compilation by default). I was reading https://pythonspeed.com/articles/faster-pip-installs/ which highlights that the majority of the performance advantage of uv comes from disabling compilation and being parallelized. Beyond that, uv isn't really that much faster.
That's nothing surprising when you think about it, of course. It just demonstrates that in theory, it shouldn't take that much to make pip install more in line with uv. Reality makes that harder, though, with backwards compatibility, general compatibility constraints, etc. etc.
Yeah, you can only unzip a wheel so fast
Unless we make unzipping faster
Someone tried replacing zlib in CPython with zlib-ng to improve the extraction of pip install. One test improved the time from 14s to 9s: https://github.com/python/cpython/issues/91349#issuecomment-2722244407
I may be in the minority, but I don't generally like manual linting PRs. If they're well-reasoned, then sure, but in general, they aren't.
It's actually more work than letting the infrastructure maintain itself.
Same FWIW.
I think they can be an okay way to figure out how to contribute to a project, but yeah, pip could probably do with limiting its linting rules to those that are "almost certainly an error" or "overwhelmingly accepted style". I wouldn't mind doing a review and updating the linting section to include information for future contributors.
I think they can be an okay way to figure out how to contribute to a project
Yeah, if they're done right. Honestly though, I've grown rather cynical about the ability for OSS to attract and retain new contributors. External contributors are essential, of course. They often contribute improvements and fixes for areas that the core team isn't interested in or where there is a lack of subject expertise. However, the casual contributor is not going to stick around 98% of the time. They'll leave before their contributions are a net positive (in terms of time/effort invested).
That's admittedly a cold way of looking at it, but I am painfully aware that I can burn out, so I try to avoid overextending myself. I'll do my best, but I do expect other people to put in effort as well.
FYI, we may see some reports of setuptools 78.0.0 causing packages failing to build: https://github.com/pypa/setuptools/issues/4910
uv is getting a lot of users complaining, but so far it's quiet on the pip side. It affects pip installs the same as uv installs: until setuptools yanks 78.0.0 or releases a new version, the only solution is to put setuptools<78 in a text file and point PIP_CONSTRAINT at that file.
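A minimal sketch of that workaround from Python, e.g. for a CI script; `env_with_constraints` is a hypothetical helper. The constraint goes through the `PIP_CONSTRAINT` environment variable rather than `-c` on the command line, presumably because environment variables are inherited by the separate pip process that installs build dependencies into the isolated build environment:

```python
import os
from pathlib import Path
from typing import Dict


def env_with_constraints(constraints_file: Path, *constraints: str) -> Dict[str, str]:
    """Write `constraints` to a file and return a copy of os.environ pointing pip at it."""
    constraints_file.write_text("".join(line + "\n" for line in constraints))
    env = dict(os.environ)  # copy, so the current process is left untouched
    # Child processes inherit this, including pip's build-dependency subprocess.
    env["PIP_CONSTRAINT"] = str(constraints_file)
    return env
```

Usage would be along the lines of `subprocess.run([sys.executable, "-m", "pip", "install", "some-sdist"], env=env_with_constraints(Path("constraints.txt"), "setuptools<78"), check=True)`, where `some-sdist` is a placeholder.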
how great
We haven't upgraded our vendored copy of setuptools in a while. There isn't really a compelling reason to do so, as it is only required for the legacy pkg_resources metadata backend. And yeah, given the recent string of setuptools breakages, I wouldn't want to upgrade setuptools unless we need to.
Probably only if a new version of Python breaks it. I do worry if we'll ever be able to drop our legacy paths, it's not like old pure Python packages are going away
Setuptools should be feasible to drop in a few years. The pkg_resources backend is the default on Python 3.10 and 3.9 only.
We still use it for all versions to discover legacy and ancient .egg distributions, but those are on their way out.
Maybe in 2027 or whatever.
Exciting
@finite perch feel free to merge https://github.com/pypa/pip/pull/12873 if you'd like to get it in. I'm not really enthusiastic for the PR, but it is technically a positive change.
The examples given by the author are not particularly convincing (saving 6ms, how nice) but I assume you've seen scenarios where it'd be more applicable.
Yeah, I'll merge in a few days. Having followed along with uv's resolver performance gains, outside of changes to the algorithm, the answer was often caching more things and figuring out how to make objects smaller.
Smaller is probably not something we'd benefit from in Python land. Python objects are huge so we're going to hit the allocator a ton regardless.
Less expensive objects to create would be a win, however.
The performance impact is real, but only in really massive examples. When I set the cache to 2048, I had already measured that some scenarios I knew of did not fit into it; I just picked 2048 as "probably good enough that no one would ever complain", but someone did.
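Whether a bounded cache is too small for a workload can be checked directly, assuming it is a `functools.lru_cache` (an assumption; the log doesn't say how the resolver's cache is implemented). A toy sketch with a hypothetical `expensive` function and deliberately tiny sizes, replaying an access pattern and reporting hits and misses:

```python
from functools import lru_cache
from typing import Iterable, Optional, Tuple


def expensive(key: int) -> int:
    """Stand-in for a costly computation (hypothetical)."""
    return key * key


def replay(keys: Iterable[int], maxsize: Optional[int]) -> Tuple[int, int]:
    """Replay an access pattern against a fresh LRU cache; return (hits, misses)."""
    cached = lru_cache(maxsize=maxsize)(expensive)
    for key in keys:
        cached(key)
    info = cached.cache_info()
    return info.hits, info.misses
```

`replay([1, 2, 3] * 2, maxsize=2)` gives `(0, 6)` while `maxsize=3` gives `(3, 3)`: once a cyclic working set exceeds the bound, LRU eviction misses on every lookup rather than degrading gracefully, which fits the observation that only really massive examples are affected.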
fair enough, I'm happy to defer to your judgement :)
"until someone complains" is a reasonable way to approach this
this is so noisy :(
This is probably the less common scenario for resumption though as I'm disconnecting the Internet entirely to trigger the resumption logic. In practice, I'd imagine the download fails, but the request/connection can be easily restored.
haha, amazing the download was dropped without me causing it while testing this out
I have had inconsistent connections to indices where this message gets sent but I still succeed in installing
Which message?
Unless you get a hard error, not all hope is lost. requests/urllib3 will retry the request (not the download of the response contents)
oh the connection failure message
it's a warning
a specific attempt to establish a connection failed, but urllib3 will retry
yep, I was just responding to you saying it is less common for resumption
The warning it issues when it does that is horrible, though. I want to improve it, but it's not trivial.
(the giant warning is raised from urllib3, not pip!)
While you're here, any thoughts on full download retries being enabled with download resumption? For context, I'm reviewing a PR that enables pip to resume/retry incomplete downloads. It will use HTTP range requests to resume the download if possible, but will fall back to restarting the download if needed.
My gut feeling is that users won't want pip to restart a download from scratch multiple times, only resume.
hmm let me think about that
I suppose for small files, it doesn't really matter (or could be a benefit), but for large files, it seems unproductive to restart the download from zero multiple times.
I think I definitely agree it depends on file size, but I also think being too clever would confuse people
Yeah, that's the million $ question, which is more reasonable to always do?
For additional context, I'd like to eventually make automatic resumes the default in pip 25.3. This PR should be available in pip 25.2, but will require --resume-retries to be enabled. We want a release cycle to stabilize and make any future tweaks before turning it on by default.
If resumes are the default behaviour, I believe that the downsides of large downloads being restarted outweighs the benefits for small downloads.
I would say that connections that are unstable are more likely to be slow as well. In that case I would say it would be fine to retry because the total amount downloaded wouldn't be as large for large files
(yes, I know the terminology is confusing. It's the resume PR, but restarting is also included)
I would also argue that small downloads are less likely to fail and thus benefit from the restarting.
I agree with that
Realistically, I can't make this switch unilaterally. I'll include this in my review and we'll iterate.
Part of me wants to keep this functionality as simple as possible (i.e. remove restarting) because this is already complicated 
There's the connection retrying from urllib3, the connection timeout/read timeout, and now also download resuming and restarting (aka "retrying" together).
Having --timeout, --retries, and --resume-retries is also not great.
Sorry I'm confused
To be clear, what I am saying is trying to resume large file downloads seems fine to me
Resume or restart?
Resume. I might argue that restart is ok as well
Resume, 100%. Restart, that's what I was asking you about
The scenario where it is restarted I presume the remote doesn't support resumption or something?
I want resume to be the default, but I'm not sure including automatic restart is also a good idea.
Yes.
Imagine how I feel reviewing this
heh
*specifically, it doesn't support HTTP range requests or the file was changed on the server since the last download (by checking ETag/Last-modified)
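To make the resume/restart distinction concrete, here is a minimal sketch of one download loop with a single shared attempt budget, assuming standard HTTP semantics (RFC 9110): a `Range` request guarded by `If-Range` gets `206 Partial Content` if the server supports ranges and the file is unchanged, and a full `200` otherwise. Everything here (`Response`, `fetch`, `IncompleteDownload`, `resume_headers`, `download`) is hypothetical; it is not the code in the pip PR, and a real client would also validate `Content-Range`:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Response:
    status: int
    body: bytes
    complete: bool  # False if the connection dropped before the body finished
    headers: Dict[str, str] = field(default_factory=dict)


class IncompleteDownload(Exception):
    pass


def resume_headers(received: int, etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Ask for the remaining bytes, but only if the file is unchanged on the server."""
    # RFC 9110 forbids weak ETags in If-Range. Without a usable validator we
    # can't detect a changed file, so request the whole file (a restart).
    validator = etag if etag and not etag.startswith("W/") else last_modified
    if not validator:
        return {}
    return {"Range": f"bytes={received}-", "If-Range": validator}


def download(fetch: Callable[[Dict[str, str]], Response], resume_retries: int) -> bytes:
    response = fetch({})
    data = bytearray(response.body)
    validators = (response.headers.get("ETag"), response.headers.get("Last-Modified"))
    attempts = 0
    while not response.complete:
        # One budget shared by resumes and restarts.
        if attempts >= resume_retries:
            raise IncompleteDownload(f"{len(data)} bytes received after {attempts} retries")
        attempts += 1
        response = fetch(resume_headers(len(data), *validators))
        if response.status == 206:
            data += response.body  # resumed: append the missing tail
        elif response.status == 200:
            # Restarted: no range support, no validator, or the file changed.
            data = bytearray(response.body)
            validators = (response.headers.get("ETag"), response.headers.get("Last-Modified"))
        else:
            raise IncompleteDownload(f"unexpected HTTP status {response.status}")
    return bytes(data)
```

With `resume_retries=0` nothing is retried at all, which is the "no resumes or restarts" case that has to stay supported.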
I will also re-review everything to make sure the terminology/messaging is clear. The difference between resuming/restarting is subtle.
I think people can Ctrl+C if it is restarting more than they want
You can't force pip/urllib3/etc to restart as a user
I think "will restarting work" is a very context dependent question
so my gut tells me you should leave it up to the user to decide based on context
They don't have a choice. There's only --resume-retries which configures how many resume/restart attempts are allowed (it's a shared counter).
If they want to allow resumes, they also allow restarts.
I think the implicit choice is that if you do restarts by default, they can Ctrl+C
I guess so. I'll leave it as an open question.
As an opt-in feature, I agree it's fine, but I'm still unconvinced it's good as a default.
I do want to get this in for pip 25.2, so we can leave this for later once we get feedback.
yeah makes sense!
It's fortunately something we can change without backwards compatibility concerns (unlike many things in pip)
Yeah, makes total sense. If restarts weren't default, I presume --resume-retries would stay a flag?
It's going to have to exist forever (which I don't like, but pip having too many flags is a ship that sailed a long time ago).
I feel that, mypy has so many I can't keep them all in my head
I strongly suspect there are situations where people don't want any resumes or restarts, so --resume-retries 0 must be supported forever.
yes, I expect so as well
I can't imagine what those situations would be, but probably rather horrible
"my server adds a client to a blocklist if it's too aggressive in resuming downloads" (some corporate user with a weird proxy/server)
Actually, that seems likely to be rather disruptive generally. Something like "my server doesn't understand http range requests, but freaks out/misbehaves when one is received" seems more likely.
yep
though that type of server is bound to run into issues, as I doubt it serves .metadata files, so pip will try to use range requests anyway
That's not true...?
You need to pass --use-feature=fast-deps for that (and there's no plans to enable it as a fallback by default)
Oh my mistake! I thought pip tried to use range requests to read METADATA by default as a fallback if .metadata isn't served
That's what fast-deps does IIRC. It's currently in this bad in-between state where it's strictly worse than .metadata files and it's also actually not that much faster in a lot of situations because it makes pip issue way more HTTP requests (incurring the request overhead)
Interesting, that's really good to know!
There is a PR to make fast-deps more efficient but because .metadata is so much better (and supported by PyPI) there is little motivation to actually review and merge that.
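For context on why `.metadata` files make fast-deps largely redundant: under PEP 658, an index can serve a wheel's `METADATA` file separately, at the wheel's URL with `.metadata` appended, so an installer can read dependencies without touching the wheel at all. A minimal sketch assuming that URL convention and the email-header format of core metadata; `metadata_url` and `requires_dist` are hypothetical helpers, and whether metadata is actually available for a given file has to be checked on the index page, not assumed:

```python
from email.parser import HeaderParser
from typing import List
from urllib.parse import urldefrag


def metadata_url(wheel_url: str) -> str:
    """PEP 658: METADATA is served at the wheel's URL with '.metadata' appended."""
    # Drop any '#sha256=...' hash fragment from the index link first.
    return urldefrag(wheel_url).url + ".metadata"


def requires_dist(metadata_text: str) -> List[str]:
    """Extract Requires-Dist entries from a core metadata file."""
    # Core metadata uses RFC 822-style headers, so the email parser can read it;
    # headers-only parsing skips the long description in the body.
    message = HeaderParser().parsestr(metadata_text)
    return message.get_all("Requires-Dist") or []
```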
Yeah, I wish it was mandated that .metadata was served from simple API servers 🥲
Hi everyone,
can I get some help on why I am getting this error when I try to install my package?
Looking in indexes: https://test.pypi.org/simple/
Collecting walacor-python-sdk
Using cached https://test-files.pythonhosted.org/packages/29/bd//walacor_python_sdk-0.0.2rc1-py3-none-any.whl.metadata (3.5 kB)
INFO: pip is looking at multiple versions of walacor-python-sdk to determine which version is compatible with other requirements. This could take a while.
Using cached https://test-files.pythonhosted.org/packages/2d/30//walacor_python_sdk-0.0.1rc1-py3-none-any.whl.metadata (3.5 kB)
ERROR: Cannot install walacor-python-sdk==0.0.1rc1 and walacor-python-sdk==0.0.2rc1 because these package versions have conflicting dependencies.
The conflict is caused by:
walacor-python-sdk 0.0.2rc1 depends on requests>=2.20
walacor-python-sdk 0.0.1rc1 depends on requests>=2.20
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
What command are you running? What index configuration do you have?
this is the package https://test.pypi.org/project/walacor-python-sdk/
just copy-pasting this: pip install -i https://test.pypi.org/simple/ walacor-python-sdk
Is requests on that index?
and getting this
pip install -i https://test.pypi.org/simple/ walacor-python-sdk
Looking in indexes: https://test.pypi.org/simple/
Collecting walacor-python-sdk
Using cached https://test-files.pythonhosted.org/packages/29/bd/ba5168b8960a5eb99d00a6d6c8f14cd693f7b9b7e9d0ed3e78f179e1134a/walacor_python_sdk-0.0.2rc1-py3-none-any.whl.metadata (3.5 kB)
INFO: pip is looking at multiple versions of walacor-python-sdk to determine which version is compatible with other requirements. This could take a while.
Using cached https://test-files.pythonhosted.org/packages/2d/30/81c20ac4d4fa4779120263c39a58cbf9030f0b023da2fda64dffad973409/walacor_python_sdk-0.0.1rc1-py3-none-any.whl.metadata (3.5 kB)
ERROR: Cannot install walacor-python-sdk==0.0.1rc1 and walacor-python-sdk==0.0.2rc1 because these package versions have conflicting dependencies.
The conflict is caused by:
walacor-python-sdk 0.0.2rc1 depends on requests>=2.20
walacor-python-sdk 0.0.1rc1 depends on requests>=2.20
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
not really,
I see I need to install requests first,
but one more thing: why don't I get the latest version?!
Pip tries the latest version and checks if it satisfies the dependencies, then it'll keep trying older and older versions looking for one whose dependencies it can satisfy. But yeah, on that index requests exists, but with weird version numbers: https://test.pypi.org/project/requests/#history
You can do two things: pass --no-deps to not install the dependencies, or pass in the test index with --extra-index-url instead of --index-url (though check the docs, I'm writing this from memory right now so I might have made a typo)
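A toy model of the "newest first, then older" behaviour described above, assuming versions are plain integer tuples (pre-release markers ignored) and each candidate has only minimum-version dependencies; pip's real resolver (resolvelib) backtracks across many packages, so treat `pick_version` as a hypothetical illustration only. It shows why both release candidates were tried and then rejected: the error suggests no `requests` release on TestPyPI satisfied `>=2.20`, while PyPI has plenty:

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]


def pick_version(
    candidates: Dict[Version, Dict[str, Version]],
    index: Dict[str, List[Version]],
) -> Optional[Version]:
    """Return the newest candidate whose dependencies can all be met from `index`.

    `candidates` maps each version to {dependency name: minimum version}.
    """
    for version in sorted(candidates, reverse=True):  # newest first
        requirements = candidates[version]
        if all(
            any(available >= minimum for available in index.get(name, []))
            for name, minimum in requirements.items()
        ):
            return version
    return None  # every candidate was rejected, i.e. "ResolutionImpossible"
```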
thanks, so the problem is in my .toml, I suppose?
I don't think so, just your dependencies aren't available on the test index, that isn't unusual
They will be available on the regular PyPI index
ohh, I haven't published on PyPI yet!!
why does pip take >10s to start collecting distributions on Windows?!
If you find out, I'd be very interested in improving that.
surely pip startup isn't that bad on Windows
my god
It's taking one whole second to import _hashlib. This seems rather hopeless...
From a surface look, there doesn't seem to be any obvious way to improve the situation. The initial import is just god-awfully slow on Windows GHA (probably due to the .pyc writing).
Set PYTHONCACHEPREFIX to the D drive? Or set PYTHONDONTWRITEBYTECODE?
I just tried the former, didn't help.
The latter may be helping a little bit? Although it doesn't seem conclusive.
¯\_(ツ)_/¯ is usually the conclusion I come to looking at Windows performance issues
FWIW, I have found that importing stuff contributes quite a bit to Pip runtime on simple tasks even on a normally installed local copy on Linux. Part of the problem surely is the sheer number of modules involved, and that may in part be down to how dependencies like rich conduct themselves
For example, installing Pip from Pip in another environment (via --python) takes something like 13% longer than having Pip update itself, purely because of the time taken to import as far as the main function the first time and then do the subprocess logic (and that, in turn, is more than 80% importing).
(When Pip installs itself from a wheel via ensurepip, this overhead time goes up dramatically while the actual time spent on copying files and compiling .pycs hardly changes)
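One way to see where startup time goes, as in the `_hashlib` and pip-installing-pip measurements above, is CPython's `-X importtime` option, which writes a per-module timing table to stderr. A small sketch that runs it in a fresh interpreter and returns the slowest imports by cumulative time; `slowest_imports` is a hypothetical helper, and the parsing assumes the `import time: self [us] | cumulative | imported package` layout CPython prints:

```python
import subprocess
import sys
from typing import List, Tuple


def slowest_imports(statement: str, top: int = 5) -> List[Tuple[int, str]]:
    """Run `statement` in a fresh interpreter with -X importtime.

    Returns (cumulative microseconds, module name) pairs, slowest first.
    """
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue  # skip the header row
        # Nested imports are indented in the last column, so strip it.
        rows.append((int(fields[1]), fields[2].strip()))
    rows.sort(reverse=True)
    return rows[:top]
```

For example, `slowest_imports("import pip._internal.cli.main")` run in an environment with pip installed would show which of pip's imports dominate.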
Can someone point me to the pip issues where people are asking for arbitrary header overrides for auth tokens or whatever?
I can't find the magic search terms
lol Google > GitHub
Hehehe
Feedback is wanted on my proposal to rework the UI for automatic resuming/restarting of incomplete downloads: https://github.com/pypa/pip/pull/12991#issuecomment-2770771387
(note: if this lands in pip 25.1 as hoped, this will be opt-in for now, requiring the --use-feature=retry-downloads flag).
UI is hard, so I'd prefer deferring the UI work that isn't necessary for experimentation until it's turned on by default.
Is someone on uv asking for the same thing?
A while ago https://github.com/astral-sh/uv/issues/1369
We're building out some support internally because we have a use-case https://github.com/astral-sh/uv/pull/12610
Y'all are building index software as well?
Or, is this for interacting with some existing index?
(please feel welcome to say that you don't wanna discuss that)
We're prototyping some things!
There's a larger push to improve the authentication experience though
The authentication story is pretty complicated right now, i.e., basic auth via dynamic passwords from the keyring python package
I think there are several other refs, but I was struggling to find them as well.
People have asked for Bearer token-based authentication in a few contexts
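For readers following the authentication thread: with keyring, what ends up on the wire is HTTP Basic auth, i.e. `username:password` base64-encoded into an `Authorization` header (RFC 7617), whereas the Bearer requests are about sending an opaque token as `Authorization: Bearer <token>` (RFC 6750). A minimal sketch of the two header shapes; the helper names are hypothetical and this says nothing about how pip or uv would expose either:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """RFC 7617 Basic auth: base64 of 'username:password' (encoding, not encryption)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def bearer_auth_header(token: str) -> str:
    """RFC 6750 Bearer auth: the token is sent as-is."""
    return "Bearer " + token
```

Since Basic credentials are trivially decodable, either scheme is only safe over HTTPS.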
Yeah, we're pretty aware of this. Unfortunately, there isn't an easy solution, short of "refactoring" pip by consolidating modules ... and asking our dependencies to do the same.
That's not going to happen, for obvious reasons.
I'm so busy with life, but I will get the resumable PR merged soon. I just want to look over it one more time, actually write a decent follow-up, and approve it.
@finite perch I believe we have some funds with the PSF linked to pip development and it would be reasonable to cover the costs of PyCon US registrations with that (for pip maintainers, anyway).
If the conference registration costs are the prohibitive piece, that'd be something that we could ask PSF accounting folks to help with. Lemme know and I'll reach out to them to figure out the details.
Thanks, but no, I don't want any special funding. It's not prohibitive for me per se; it just goes from a cheap one-night getaway without a PyCon ticket to something pricey that I cannot impulse-book with a PyCon ticket.
Have there been any plans to switch pip to using flit as the build system (to shorten the bootstrap path)?
No plans, I would be supportive of moving to flit for simplicity reasons, not sure what other maintainers think
I'd like to think that it would probably be a good thing (setuptools is so unstable lately), but the problem is that pip is at the root of the packaging ecosystem. I'm fairly certain downstreams would break if we switched backends.
Assuming flit is also easy to bootstrap, it's probably tenable, but it'd require someone to reach out to our major redistributors and make sure we don't break a bunch of builds all at once.
flit has been around for over 10 years now; I assume most redistributors will have encountered or vendored it?
Flit is easier to bootstrap than setuptools, since it includes a self-install script (IIRC).
There's a solid chance pip has been grandfathered/is treated specially due to its unique role in the ecosystem. I agree that redistributors should be able to migrate without too much fuss, but we'd want to go about it in a friendly way.
I wouldn't want to antagonize our redistributors.
I'm pretty sure there was a discussion on bootstrapping the root of the Python ecosystem on DPO
https://discuss.python.org/t/pep-517-backend-bootstrapping/789/16 this is for backends, and it's 250+ posts long.
(I probably read this at some point, I don't remember anything).
This is something that we should discuss on the issue tracker, since we can have redistributors chime in if necessary.
- it ends up being a public discussion that does not vanish with time.
Fair! I'm trying to find pre-existing discussion first. 100% this needs an issue on the tracker if the proposal is serious.
@fallen scroll feel free to open an issue on the tracker if you'd like.
We're rapidly approaching a release, there is no way we're going to make this change anytime soon :P
Is there a reason we're not using the cache action in our CI?
I told myself that I'm done with micro-optimizing pip's performance, but I got nerd-sniped while looking at the performance improvements by dropping pkg_resources: https://github.com/pypa/pip/pull/13323
Fun!
Hmm... I dunno how much I wanna go down the rabbit hole of optimising packaging.
Fair enough! I won't in that case :)
Damian, while I have your attention, I am waiting for people to say "okay, sounds good" to https://github.com/pypa/pip/pull/12991. I realize this PR essentially became my PR, but I don't want to merge this unless people know what's going on and don't have any objections at the very least.
To cache what? pip installs?
Sans IO, packaging is the primary cause of how long any specific long resolution takes; from the perspective of trying to optimize resolution, it is kind of painful not to be able to change anything about it
Yeah, installing nox and the environment with test dependencies that nox creates
Probably because no one has tried? It's unlikely to bring a speed improvement, at least on Linux/macOS where pip installs are fast.
I'm sure the test suite can be made faster by eliminating redundant work/aggressively using local or cached data to avoid hitting the network as much as possible, but that's not trivial.
(We've had these discussions before. it's really a matter of someone putting in the work/time to take a look at the test suite. I want to at some point, but it's fairly low on my list of priorities.)
Yeah, makes sense, same here, generally I would rather work on actual pip performance or UX improvements, not test improvements
I'd really like to pick back up my network diagnostic errors and parallel compilation PRs (although it seems more likely that we'll flip --no-compile to be the default, instead).
I'm on the fence regarding --no-compile, mostly I just wish we could make users more aware of it
I suspect though a majority of users are unaware that Python "compiles"
I'll try to simplify my PR, but there isn't much room to simplify it unless we degrade the experience in certain situations.
There's also the problem of lazy imports being a security risk. Not that pip install is secure, but this is the sort of stuff we've received security reports over, so it's probably best to avoid adding more holes.
Yeah, the parallel stuff is tricky, I've not spent much time looking over the PR, but I did follow the conversation, I do think some of the issues would need to be addressed or documented, the lazy import issues, but also the non-reproducible compilation
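For comparison, the standard library's compileall already has a process-based parallel mode via its workers parameter. A minimal sketch of using it, assuming a plain directory of .py files; compile_tree and demo are hypothetical helpers, not code from the pip PR:

```python
import compileall
import tempfile
from pathlib import Path
from typing import List


def compile_tree(root: str, workers: int = 0) -> bool:
    """Byte-compile every .py file under `root` into __pycache__/.

    workers=0 lets compileall start one process per CPU (os.cpu_count());
    workers=1 keeps the traditional serial behaviour.
    Returns True only if every file compiled successfully.
    """
    return bool(compileall.compile_dir(root, quiet=1, workers=workers))


def demo(workers: int = 1) -> List[str]:
    """Compile a throwaway two-file tree and return the .pyc file names produced."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.py").write_text("A = 1\n")
        Path(tmp, "b.py").write_text("B = 2\n")
        if not compile_tree(tmp, workers=workers):
            raise RuntimeError("compilation failed")
        return sorted(p.name for p in Path(tmp, "__pycache__").glob("*.pyc"))
```

With workers other than 1, compileall hands files to a process pool, so on platforms using the spawn or forkserver start method the call belongs under an if __name__ == "__main__": guard. Note that this alone doesn't address the lazy-import and reproducibility concerns raised above.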
@finite perch I'm looking forward to hearing what people think of the install progress bar. So simple, but it makes for such a meaningful improvement in the pip install QoL.
For long installs like that one it makes a huge improvement on the UX
We finally got our first issue related to setuptools+wheel. https://github.com/pypa/pip/issues/13327
(I'd be interested to know what's in the requirements.txt mentioned in the screencap, as a performance test...)
... why would that setuptools+wheel issue occur? Didn't they get installed into the same (build) environment? (I see OP solved the problem by updating to a Setuptools version that incorporates the wheel logic. Might be worth making it easier to learn the minimum Setuptools version for that?)
(That seems to be 70.1.0, but I had to find that out by searching through the changelog...)
... wow. I've never seen anything like it
but winpython is an entire suite of tools for working with the Windows API, or something like that, right?
I think it's a distribution like Anaconda, basically pre installed all the utility packages you are likely to need
Here's an even bigger real world requirements file: https://github.com/home-assistant/core/blob/dev/requirements_all.txt
I have concerns
i've seen worse, only cuz i wrote it
I'm curious how long pip takes (and how much faster uv is)
Say what now?
Have you seen Home Assistant? They have AAAAALL the integrations as part of the base package. Fortunately, if you're using it as a regular user, it dynamically downloads deps only for the integrations that you use.
Yea, they had a multi-hour resolve IIRC and the switch to uv sped things up a lot for them.
My understanding is the multi-hour resolves were for an older version of pip and only for some very low-power VMs (QEMU or something?)
As of a few months ago I think the performance difference between the latest version of pip and uv was closer to 1 hour vs 10 mins.
We found, and fixed, a lot of bottlenecks against that scenario, but the issue author never came back and confirmed if it improved what they saw. And we weren't able to reproduce exactly what they were seeing.
Yea, that's true!
pip 25.1 is shaping up to be a large release. It's going to be a struggle to find time to write my next pip release blog post
should we be excited?
My favorite thing is the install progress bar.
I got a lot of resolver fixes in, but most people will never come across those.
Other major changes:
- Support for pip install --group (group)(:path/to/pyproject.toml)?
- Support for resuming/restarting incomplete or interrupted downloads
- Python 3.8 support was removed
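Resuming an interrupted download generally relies on HTTP range requests: ask for the bytes after what's already on disk, and only append if the server answers 206 Partial Content at the right offset. A minimal stdlib sketch, assuming a plain partial file on disk; build_resume_request and can_append are hypothetical helpers, and pip's real implementation (pypa/pip#12991) goes through its own network session and may differ:

```python
import os
import urllib.request
from typing import Optional


def build_resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """GET request for `url` that asks only for bytes missing from `partial_path`."""
    already = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    headers = {"Range": f"bytes={already}-"} if already else {}
    return urllib.request.Request(url, headers=headers)


def can_append(status: int, content_range: Optional[str], already: int) -> bool:
    """Decide whether a response body continues the partial file.

    206 plus a Content-Range starting at our offset means "append";
    anything else (e.g. 200 because the server ignored Range) means
    restart from byte 0.
    """
    return (
        status == 206
        and content_range is not None
        and content_range.startswith(f"bytes {already}-")
    )
```

Falling back to a full restart on a 200 response is the conservative choice, since some servers and proxies simply ignore Range.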
Did the group PR merge yet?
Mhm, a while ago!
Oh, I missed that. It is a big release
I wanted to look at it, but it already went through an extensive round of discussion, so ¯\_(ツ)_/¯
The lock file export PR might merge before 25.1 also
This is so long...
We should fix the changelog entry for PEP 735 and change the PEP reference to a link.
I think I did that right 
@shy echo what's your sentiment on https://github.com/pypa/pip/pull/13168? I'm looking to either reject or merge it at some point. If you don't like the idea of special-casing our man pages, I'm fine with rejecting the PR. Otherwise, it is a fairly well-contained change that would make our redistributors' lives easier.
This is meant to be a yes/no question. I don't need a detailed review. A simple "+1" or "-1" would suffice.
Thanks for taking a look!
news/13247.feature.rst | 4 +
src/pip/_internal/cli/cmdoptions.py | 34 ++++++
src/pip/_internal/commands/install.py | 3 +
src/pip/_internal/operations/install/wheel.py | 38 ++-----
src/pip/_internal/req/__init__.py | 31 ++++-
src/pip/_internal/req/req_install.py | 5 +-
src/pip/_internal/utils/pyc_compile.py | 142 +++++++++++++++++++++++
tests/unit/test_utils_pyc_compile.py | 157 ++++++++++++++++++++++++++
8 files changed, 384 insertions(+), 30 deletions(-)
I got the parallel compile PR to be a bit smaller (down from +439/-31). I'll need to add some more comments, but otherwise, this is pretty much the most I can do without sacrificing QoL features.
(oh, the #off-topic discussion is related to this, then. and it's about building multiple wheels at a time? sorry for the confusion)
(in that case, I'd be a little worried about what happens when the build systems decide they can do their own multiprocessing. but probably that's just a performance issue if it ends up trying to use more processes than there are available cores...?)
parallel pyc compilation
That has the overhead problems I was discussing earlier. Also, I'd have to refactor installation to do all of the .pyc compilation at the end (for performance reasons), instead of at the end of each package.
I don't want to mess with the installation order more than I need to.
ahh.
Regardless, I'm glad you have something working. 25.1 seems like it's going to be a pretty big deal.
This isn't merged. I'm trying to simplify the PR so it could be maybe merged: https://github.com/pypa/pip/pull/13247
Although I'm still not optimistic.
mm.
Does the installation order still matter nowadays?
I dunno, but I'm sure someone is relying on it.
... wait, it still incurs all that import overhead even though compileall is using concurrent.futures.ProcessPoolExecutor rather than multiprocessing.Pool?
ProcessPoolExecutor is a smarter wrapper over multiprocessing.
oh, I guess it handles the assignment of jobs to the processes, but it still has to spawn those processes.
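To make the point concrete: ProcessPoolExecutor schedules the jobs, but each worker is still a separate interpreter that has to start up, which is why a serial path for small batches can be worth keeping. A minimal sketch, assuming per-file results are collected in a frozen dataclass; CompileResult here is a hypothetical stand-in that only echoes the name from the discussion, not the class in the PR:

```python
import py_compile
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CompileResult:
    """Hypothetical per-file outcome record."""

    source: str
    pyc: Optional[str]
    error: Optional[str]


def _compile_one(source: str) -> CompileResult:
    # Runs inside a worker process, so it must be a module-level (picklable) function.
    try:
        pyc = py_compile.compile(source, doraise=True)
    except (py_compile.PyCompileError, OSError) as exc:
        return CompileResult(source, None, str(exc))
    return CompileResult(source, pyc, None)


def compile_files(sources: Iterable[str], max_workers: Optional[int] = None) -> List[CompileResult]:
    """Compile files, in parallel unless max_workers == 1 or there is only one file."""
    paths = list(sources)
    if max_workers == 1 or len(paths) < 2:
        # Serial path: avoids paying process start-up costs for tiny batches.
        return [_compile_one(p) for p in paths]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever order the workers finish in.
        return list(pool.map(_compile_one, paths))
```

Returning a result object per file, rather than raising from a worker, keeps one bad file from hiding the outcome of the rest, and it is easy to assert on in unit tests.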
mm. I take it the CompileResults are future-proofing? Or just to make it more testable?
If two packages are installed into the same location it will certainly matter
Compilation order matters as well in terms of build reproducibility. I don't think it's possible to compile in parallel and guarantee byte reproducibility of the pyc files without doing something like multiple passes, but that's an edge case that's going to have to be communicated
You mean the same file in different packages? Do we support that?
Yes, and it depends on what you mean by support: https://github.com/pypa/pip/issues/4625. I'm sure packages are relying on this behaviour
Feel free to merge https://github.com/pypa/pip/pull/13251 when you deem it ready. I have no more comments.
but if compilation is delayed until after all the .py files are in place it would be okay, yes?
(I'm not sure what kind of reproducible build you have in mind where the .pyc files are part of the result... ?)
Docker images
I wasn't talking about the order of installs and the order of compilation separately; I wasn't thinking about the order dependency between them
My line in the sand is in the order in which packages are installed (and as an extension, the order of the files of a given package, which is presumably based on the zip entry ordering). pip generates the .pyc files so I'm not that worried about their reproducibility.
Well, it's been an issue for uv https://github.com/astral-sh/uv/issues/10619
I've seen that issue (I asked the uv folks if they had run into any issues with parallelization); I'm still not convinced it's a huge deal.
Unless this horribly breaks some Docker caching?
I feel like, as long as .pyc creation works the way it does (with marshalling and all that), people who need .pyc pre-compilation and need byte-for-byte reproducible .pyc results (not just logically equivalent bytecode) might have to be stuck with serial precompilation.
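One way to check whether two builds really differ byte-for-byte is to compile with the hash-based invalidation mode (so the .pyc header records a source hash instead of an mtime) and compare digests. A minimal sketch with hypothetical helper names; it only removes timestamp noise from the header and cannot make the marshalled body deterministic:

```python
import hashlib
import py_compile
import tempfile
from pathlib import Path
from typing import Tuple


def pyc_digest(source: Path, cfile: Path) -> str:
    """Compile `source` to `cfile` and return the SHA-256 hex digest of the .pyc bytes."""
    py_compile.compile(
        str(source),
        cfile=str(cfile),
        doraise=True,
        # CHECKED_HASH stores a hash of the source in the header instead of its
        # mtime, so a differing timestamp alone can't make two builds differ.
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    return hashlib.sha256(cfile.read_bytes()).hexdigest()


def two_build_digests(code: str) -> Tuple[str, str]:
    """Compile the same source twice, to separate .pyc files, and return both digests."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "mod.py")
        src.write_text(code)
        first = pyc_digest(src, Path(tmp, "first", "mod.pyc"))
        second = pyc_digest(src, Path(tmp, "second", "mod.pyc"))
        return first, second
```

If the digests still differ under CHECKED_HASH, the difference is in the marshalled code object itself, which is the part pip has no control over.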
Yes, it would be up to CPython to rearchitect the way compiling works; there's nothing pip can do about it
Yeah. That would also help uv.
It doesn't break simple cases of docker caching, because the layer isn't rerun, but there are more nuanced issues it can cause, such as rebuilding an image on a host without cache, and then layers that would otherwise be seen as equivalent are seen as different and many more layers need to be pushed or pulled
fortunately(?) pip install is already reportedly not deterministic, so (for now) that's not a major concern?
I don't use docker much.
news/13247.feature.rst | 4 +
src/pip/_internal/cli/cmdoptions.py | 34 ++++++
src/pip/_internal/commands/install.py | 3 +
src/pip/_internal/operations/install/wheel.py | 38 ++-----
src/pip/_internal/req/__init__.py | 30 ++++-
src/pip/_internal/req/req_install.py | 5 +-
src/pip/_internal/utils/parallel.py | 17 +++
src/pip/_internal/utils/pyc_compile.py | 103 +++++++++++++++++
tests/unit/test_utils_pyc_compile.py | 153 ++++++++++++++++++++++++++
9 files changed, 357 insertions(+), 30 deletions(-)
Anyway, I made some more progress on simplifying the patch. This does rely on the distlib PR I filed earlier today.
In my experience most docker images are not well optimized to take advantage of layer caching anyway, it's only really dedicated users who would notice, and they are often willing to put several workarounds in to get layer caching optimized
Right. That makes sense.
I do want to get the compilation question decided before the end of the year. Either we flip the default to --no-compile or parallelize .pyc compilation (or both?)
This is, by a large margin, the biggest contributor to pip installs being slow (ignoring complex resolves).
I thought the question had been answered; that's why I closed the issue. It seems to me it's unlikely to be defaulted to off, just due to the inertia of pip's defaults
I'll restart the discussion after the 25.1 release.
FWIW, I also chose to parallelize .pyc compilation to test the waters with introducing parallelization to pip. It'd be nice to add more parallelization to pip, and .pyc compilation is definitely one of the lower-risk applications.
I'm seeing how much further I can trim our vendored dependencies. If I remove all if __name__ == "__main__" blocks from rich, then I get 45 files changed, 1422 deletions(-)
I'm also fairly certain we don't use rich's emoji support, thus we could trim the giant 3600 LOC emoji mapping file.
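Before deleting anything, the __main__ blocks can be measured with the ast module. A minimal sketch; is_main_guard and main_guard_lines are hypothetical helpers, and they only recognise the common spelling if __name__ == "__main__": (not the reversed comparison):

```python
import ast


def is_main_guard(node: ast.stmt) -> bool:
    """True for a statement spelled exactly `if __name__ == "__main__":`."""
    if not isinstance(node, ast.If):
        return False
    test = node.test
    return (
        isinstance(test, ast.Compare)
        and isinstance(test.left, ast.Name)
        and test.left.id == "__name__"
        and len(test.ops) == 1
        and isinstance(test.ops[0], ast.Eq)
        and isinstance(test.comparators[0], ast.Constant)
        and test.comparators[0].value == "__main__"
    )


def main_guard_lines(source: str) -> int:
    """Number of lines taken up by top-level __main__ blocks (else-branches included)."""
    tree = ast.parse(source)
    return sum(
        node.end_lineno - node.lineno + 1
        for node in tree.body
        if is_main_guard(node)
    )
```

Summing main_guard_lines over the vendored rich sources gives the line count without touching any files; the import-time cost is a separate question that needs an actual measurement.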
I don't think it's worth a lot to be tree trimming our dependencies?
No it's not in most cases. It's only rich that I'm looking at.
I have no plans to turn these into PRs, mostly just doing this out of idle curiosity.
Especially since I'm pretty sure we don't pay much cost for something like that (if __name__ == "__main__"). I won't say no to measuring it, but if it's not even 1 ms, then I kinda don't think it matters much.
The main impact would probably be distribution size, install (pyc compile) time, and import time, but _vendor totals around 95,000 LOC, so a 1,000 LOC reduction is not meaningful.
I do want to get rid of typing_extensions.py though. We depend on it (at runtime) via rich but they don't really need it either.
Totally ignorant question, just asking out of curiosity: what would it take to stop vendoring the dependencies entirely?
Probably won't happen until somehow all the reasons outlined in the following are resolved: https://pip.pypa.io/en/stable/development/vendoring-policy/#rationale
oh there's docs
cool, thanks
Isn't setuptools trying to unvendor their dependencies? AFAIK, there has been some pain.
setuptools can do that because it's no longer installed by default in newer virtual environments.
Ah right.
I do want to measure this now, though
I should do more productive things for the release before getting nerd-sniped again.
AFAIK, the pain has been older Python versions and redistributors.
Yeah, it doesn't seem like setuptools could devendor itself until it stops being included in ensurepip.
(it could devendor on 3.14 and above, but people don't like tying packaging tooling behaviours to the language version, even though I really like that mechanism for rolling out changes with a well defined time horizon)
Aside from Rich, I'd suspect Requests is a relative hotspot
(I think I'd prefer the world where some/most of its ideas had made their way into the standard library instead; but.)
You would be wrong, it only makes up 4% of the code footprint. urllib3, the library under it is larger at 9%.
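Footprint percentages like these can be reproduced by counting .py lines per top-level entry of the vendor directory. A minimal sketch; loc_share is a hypothetical helper that counts physical lines (comments and blanks included), so its numbers may not match whatever tool produced the 4% and 9% figures:

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def loc_share(vendor_dir: Path) -> Dict[str, float]:
    """Percentage of all .py lines under vendor_dir contributed by each top-level entry.

    Packages are keyed by directory name (e.g. "urllib3"),
    single-module dependencies by file name (e.g. "typing_extensions.py").
    """
    counts: Counter = Counter()
    for path in vendor_dir.rglob("*.py"):
        top = path.relative_to(vendor_dir).parts[0]
        with path.open("rb") as fh:
            counts[top] += sum(1 for _ in fh)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * lines / total for name, lines in counts.most_common()}
```

Pointed at src/pip/_vendor in a pip checkout, this lists the heaviest dependencies first, since most_common() sorts by line count.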
I can say that 25.0 takes something like 20% less time to bootstrap via ensurepip than 24.0 does, in my testing, which seems to be almost entirely due to smaller size (I think there had been something devendored which was only left in because it hadn't been fully replaced yet?)
ah, I tend to think of requests as including its dependencies, but that's not really fair
If I apply all of the trimming I've experimented with, I can reduce the total code footprint of pip by ~10% or 600 KB.
(is that Baobab? I hadn't actually thought of using it this way, neat. I tend to use ncdu at the command line for this kind of analysis)
Baobab, mhmm. I <3 the tool.
I'll consider submitting a PR to trim distlib, but I'll do that during the 25.2 development cycle.
(switching to Linux saved me probably about 65% of installation time. But I hear there's a "dev drive" feature on Windows now that mitigates a lot of overhead. I'm guessing it's mostly the fault of Windows Defender.)
We only use distlib for console scripts, but it's a full-fledged distribution metadata and discovery library. We re-implement or use other dependencies for those pieces of functionality so those are simply dead-weight.
Trimming distlib to the bare minimum resulted in a 6,000 LOC decrease (~200 KB).
yeah I've also noticed that those stub Windows executables for installers can be found in a few different packaging-related packages
The actual heavy part of distlib are all of the Windows stub executables, but those have to stay.
as long as they aren't duplicated anywhere (I don't think they are currently, but I've thought about repackaging them separately)
We dropped 5 vendored libraries since pip 24.0 IIRC. I filed PRs to remove four of them.
I'm planning to kill typing-extensions. After that, in a few years we can remove pkg_resources and tomli.
I forgot about this tool, honestly.
ncdu? I think I picked up the hint from the Mint forums. Absolute lifesaver.
I either use baobab or plain du.
mm. I just got so frustrated paging through du results, though
Is the bootstrap time significant? If so, that's reason to consider trimming our heavy dependencies further.
on my machine, just now for reference (I also tested upgrading with a pre-downloaded wheel to avoid network latency):
$ time python -m venv with-pip
real 0m3.378s
user 0m3.127s
sys 0m0.206s
$ time with-pip/bin/pip install --upgrade pip-25.0.1-py3-none-any.whl
Processing ./pip-25.0.1-py3-none-any.whl
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 24.0
Uninstalling pip-24.0:
Successfully uninstalled pip-24.0
Successfully installed pip-25.0.1
real 0m1.518s
user 0m1.330s
sys 0m0.161s
$ time python -m venv --without-pip without-pip
real 0m0.054s
user 0m0.044s
sys 0m0.011s
(I knew roughly what these numbers look like, but decided to bake some fresh for you)
cross-environment installation with --python is slightly slower than this --upgrade, but you knew that
I'm not sure if those numbers are as comparable as you think, but yeah, that does seem a bit faster.
I'm not sure exactly what you mean by "bootstrap time". ensurepip doesn't seem to add significant overhead, but it does have to import tons of Pip modules via zipimport
(indirectly)
In other news, I'm on the dependency groups feature for my pip 25.1 blog post. Part of me doesn't want to write it because I haven't kept up with PEP 735 or the pip PR at all, but I should probably get up to speed before we start getting issues about it.
that one seems pretty straightforward? Parse a list of dependencies out of pyproject.toml, feed it to the resolver as usual?
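It mostly is; the one wrinkle in PEP 735 is that a group can pull in another group with an {include-group = "..."} table, and group names are compared after package-name normalization. A minimal sketch, assuming the [dependency-groups] table has already been parsed (e.g. with tomllib on Python 3.11+); resolve_group is a hypothetical helper, not pip's implementation, and it does not validate the requirement strings themselves:

```python
import re
from typing import Dict, List, Tuple, Union

# One [dependency-groups] entry: requirement strings and {include-group = "..."} tables.
GroupItem = Union[str, Dict[str, str]]


def normalize(name: str) -> str:
    """PEP 503-style normalization, which PEP 735 uses to compare group names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def resolve_group(groups: Dict[str, List[GroupItem]], name: str) -> List[str]:
    """Expand one dependency group into a flat list of requirement strings."""
    table = {normalize(k): v for k, v in groups.items()}
    if len(table) != len(groups):
        raise ValueError("two group names collide after normalization")

    def expand(group: str, stack: Tuple[str, ...]) -> List[str]:
        key = normalize(group)
        if key in stack:
            raise ValueError("include-group cycle: " + " -> ".join(stack + (key,)))
        if key not in table:
            raise LookupError(f"dependency group {group!r} not found")
        requirements: List[str] = []
        for item in table[key]:
            if isinstance(item, str):
                requirements.append(item)
            elif isinstance(item, dict) and list(item) == ["include-group"]:
                requirements.extend(expand(item["include-group"], stack + (key,)))
            else:
                raise ValueError(f"invalid entry in group {group!r}: {item!r}")
        return requirements

    return expand(name, ())
```

The flattened list is then what gets handed to the resolver, the same as requirements given on the command line.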
Well, my style is that I like to dig a bit into the context/history where I can. The pip release blog posts are a mix of strictly "a more detailed changelog" and "here's something about Python packaging you didn't know"
I may forego the latter here. I haven't dug that deep so far in this post, partially because I don't feel like it but also it's not as valuable for the features I'm discussing.
ah, I appreciate that. I aim for that kind of depth on my blog, but I feel like I end up with a disorganized mess
I included a lot of extra context in my entry for pip 24.2: https://ichard26.github.io/blog/2024/08/whats-new-in-pip-24.2/
In version 24.2, pip learns to use system certificates by default, receives a handful of optimizations, and deprecates legacy (setup.py develop) editable installations.
That was fun to write, even if it took a while.
the interesting part of PEP 735 is the motivation and rationale, probably. there's not a lot to say about the specification, except to people who actually have to implement it
yeah, I remember that title.
(Honestly I've never been big on editable installs at all, but.)
all of the footnotes :D
I really need to develop a habit of pulling parentheticals out into footnotes where it makes sense.
which for my style is probably almost always. :/
I'm trying to remind myself when writing anything for pip users that I am literally one of the most deeply involved individuals out there. 99.9% of developers do not care for the technical aspect of the tool they simply use to get their job done.
They want to know what's going on and what they need to do (or not do). Some may care about why, but it's important not to forget about the former group.
indeed
@azure heron does uv look in parent directories to resolve a dependency group if the current directory does not contain a pyproject.toml?
I have this call out in my post, but I wanted to confirm it's reasonable advice:
pip will NOT look in parent directories for a pyproject.toml file to install dependency groups from. pip does not have the concept of a project workflow. If you'd like auto-discovery of parent dependency groups, you should use uv.
^wording is subject to change. It's a bit awkward, I'm aware.
I'm honestly not sure off the top of my head. We discussed it, but I don't know what landed.
We definitely do outside the uv pip interface
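The difference between the two behaviours comes down to whether discovery stops at the current directory or walks upward. A minimal sketch; find_pyproject and its search_parents flag are hypothetical, and uv's actual discovery rules may be more involved than a plain upward walk:

```python
from pathlib import Path
from typing import Optional


def find_pyproject(start: Path, search_parents: bool = False) -> Optional[Path]:
    """Return the pyproject.toml to read dependency groups from, or None.

    search_parents=False: look only in `start` (the pip behaviour described above).
    search_parents=True: also walk up through the parent directories
    (roughly what a project-aware tool might do; exact rules vary by tool).
    """
    start = start.resolve()
    directories = [start, *start.parents] if search_parents else [start]
    for directory in directories:
        candidate = directory / "pyproject.toml"
        if candidate.is_file():
            return candidate
    return None
```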
If you want a review of the blog post, you could ping me. I was fairly involved in PEP 735.
dunno how much of an appetite there is for a logo, but this came in to the PyPA catch all email
yeah, dev drive is amazing
windows actually has a decent dev experience, provided you use the features meant for development
anyone aware of a timeline of pip's install-from-source behaviour? i recall that a few years back copying subdirs to /tmp was removed, but i'm not sure what to search for in the changelogs
it seems https://pip.pypa.io/en/stable/news/#v21-3 made in-tree builds the default
Yep, and out-of-tree build was removed in 22.1
I'm happy that this line is (finally) going down. I'd quite like to get it down to 110, although that will likely take some time.
Thanks for closing those PRs @finite perch. I'm pretty conservative when closing PRs, but realistically, I agree those had no chance of being merged.
I would also love the open PR count to significantly drop, but for most PRs I take a look at, I quickly realize I don't want to wade into the discussion, lol
Mhmm. The discussion makes it difficult to do anything but nothing.
I did take an idea I first used with Black and that's the up for grabs label. It's for PRs that are a good idea, but have stalled out for some reason (author is unavailable/conflicts/etc.) and need a new champion. In practice, I expect that I'll champion these PRs, but it's helpful to keep track of what PRs to focus on next: https://github.com/pypa/pip/pulls?q=is%3Aopen+is%3Apr+label%3A"state%3A+up+for+grabs+(PR)"
OK, the draft of my pip 25.1 post is ready: https://hugo-draft.floralily.dev/blog/2025/04/whats-new-in-pip-25.1/. If anyone wants to take a look and offer suggestions, I'm all ears. Otherwise, it should be good to go for tomorrow.
pip 25.1 introduces support for Dependency Groups (PEP 735), resumable downloads, and an installation progress bar. Dependency resolution has also received a raft of bugfixes and improvements.
Now I need to update/rework our deprecation issue descriptions. I'll do that after dinner, though.
A temporary progress bar has been added
I'm a bit confused by this. Does this mean that the progress bar will be removed in the future or that it shows up temporarily or something else?
very exciting feature list though!
transient is the word I'm looking for, I'm just trying to get across that it won't persist after the install is done. I can probably simply axe the word, however.
Yeah, I think it may be more confusing than helpful, I think most progress bars are transient. At least in my experience.
if it's that important, it might be better to just explicitly say that it gets erased from the terminal after installation
(but I don't think it is that important)
What's new in pip 25.1 | Richard Si
When Pip creates those INSTALLER files in an installed package's .dist-info that just say pip, is that something that importlib.metadata expects to see? Or else what exactly is the purpose?
(I know installer has specific support for writing that kind of thing)
I'm honestly not sure who else reads that file, but pip uses it so it knows whether it should or should not uninstall something in some cases.
mm. If it isn't needed by importlib then I want to take a different approach in my own project
I can't imagine importlib would care about it.
I guess I have to research that, but I also doubt it
... although now that I think about it: how is an installer supposed to upgrade something that was installed by a different installer, given it doesn't directly know everything that was installed?
That's what RECORD is for.
oh, that's supposed to get updated too, it's not just what the wheel says. Right
Yeah, the RECORD file needs to be kept in sync with the actual installation once it's "installed".
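Keeping RECORD in sync means regenerating its rows whenever an installer adds or changes files (e.g. .pyc files or generated scripts). Each row is path, hash, size, where the hash is SHA-256 encoded as URL-safe base64 with the "=" padding stripped, and RECORD lists itself with empty hash and size fields. A minimal sketch with hypothetical helper names:

```python
import base64
import csv
import hashlib
import io
from typing import Iterable, Tuple


def record_hash(data: bytes) -> str:
    """RECORD-style hash: sha256, URL-safe base64, trailing '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def render_record(files: Iterable[Tuple[str, bytes]], record_path: str) -> str:
    """Render RECORD contents for (relative path, file contents) pairs."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for path, data in files:
        writer.writerow([path, record_hash(data), len(data)])
    # RECORD cannot contain its own hash, so its hash and size fields stay empty.
    writer.writerow([record_path, "", ""])
    return buf.getvalue()
```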
.dist-info is both distribution and installation metadata.
In the general case, it can't. INSTALLER was added to keep pip (and other Python-only tools) from breaking packages installed by conda/dnf/apt/etc, where there's extra non-Python-specific installation metadata that gets confused if pip goes in and starts making changes.
Ah, I was thinking other Python-level tools like uv, poetry install, etc.
I thought the purpose of INSTALLER was so when there's no RECORD, pip doesn't just fail to uninstall a package, but can direct the user to the original installation tool.
(Presumably if there is a RECORD, pip can still uninstall the package even if it didn't originally install it.)
That's exactly how pip uses it.
$ pip uninstall six
Found existing installation: six 1.17.0-elementaryOS-1
error: uninstall-no-record-file
× Cannot uninstall six 1.17.0-elementaryOS-1
╰─> The package's contents are unknown: no RECORD file was found for six.
hint: The package was installed by myjankyinstaller. You should check if it can uninstall the package.
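The decision behind that error can be sketched with importlib.metadata, which (as far as I know) doesn't interpret INSTALLER itself: read_text just returns a metadata file's contents, or None if the file is absent. A minimal sketch of the RECORD-then-INSTALLER check; uninstall_hint is a hypothetical helper, not pip's code:

```python
from importlib.metadata import PackageNotFoundError, distribution
from typing import Optional


def uninstall_hint(name: str) -> Optional[str]:
    """Explain why an uninstall can't proceed, or return None if RECORD exists."""
    try:
        dist = distribution(name)
    except PackageNotFoundError:
        return f"{name} is not installed"
    if dist.read_text("RECORD") is not None:
        return None  # installed files are known, so a Python-level uninstall is possible
    # No RECORD: fall back to naming the tool that installed it, if it left a note.
    installer = (dist.read_text("INSTALLER") or "").strip()
    if installer:
        return f"no RECORD file for {name}; it was installed by {installer}, check that tool"
    return f"no RECORD file for {name}; its contents are unknown"
```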
ah, so it's a convention that Conda etc. agree with?
I probably should write it, then.
I dunno what conda does ¯\_(ツ)_/¯
(Pip shouldn't have any trouble uninstalling things I install, but still. Maybe Conda would want to know.)
Conda is effectively a meta packager, they don't make Python packages, they wrap Python (or other) packages. They should basically follow best practices of the underlying package system they're wrapping.
I realize that it's still a weekend, but I'm surprised we haven't received any pip 25.1 regression reports (yet).
Perhaps, they'll all come on Monday when everyone's back in the office.
Thanks @finite perch for the reddit post! It's making the rounds.
if you want quick feedback, what you need to do is quiet-ish-ly remove support for something that was deprecated approximately 4 years ago. ;)
any opinion on having pip print a warning when people use --no-binary=:all: as not being supported due to bootstrapping constraints for build backend and/or allowing binaries for build env construction
I would say it should fail hard, because in many environments (like CI) people don't read warnings, will miss it and assume that all their dependencies are installed
I don't understand the scenario you're describing, no-binary isn't passed to the isolated build environment from the CLI
... then how do we get into the bootstrapping issues?
no-binary can be passed to the isolated build environment via env variables or config; or you could use no build isolation, or you could set up an index that has no wheels, but I'm not sure what is meant here
@hidden flame good feedback about your blog post and the pip release here: https://www.youtube.com/watch?v=BGhDge-iUTw
Join us to be part of the live recording and have your comments and questions featured on air.
Thanks, I needed that. It's been an awful past few days so it's good to have some good news
I tried to write a simple PR to improve setuptools' get_requires_for_build_wheel|sdist hooks, now I'm stuck with crashes I don't understand. https://github.com/pypa/setuptools/pull/4973
I am glad that I don't work on setuptools. This is hell.
yeah, that's a big part of my motivation for bbbb
(although Hatchling is probably more than good enough already)
specifically I mean the layers of workarounds for the core setup.py-based design. (I still use arbitrary code, but for specific narrow roles within the process, designed with the hindsight of PEP 517 already existing.)
Does anyone know how to update or where pypi is getting information for pip's maintainers list: https://pypi.org/project/pip/ ?
That should be people with PyPI perms for that project
For org it shows "Owner" instead
Thanks, I'll ping Paul on a separate communication channel
For pip 25.2, I'm likely going to focus on faster bytecode compilation and installing build dependencies in-process. I've been looking at the latter on and off for a while now. I need to do a deep dive into a large part of the pip codebase to determine just how much logic I can reuse/trim, but it shouldn't be too bad...? The other big part is going to be writing tests to enforce proper inheritance of options to the build deps.
I'm supportive, but dubious about doing isolated builds in-process, but I haven't dared look at all the things it would involve
Oh, in-process builds aren't going to happen. For a large variety of reasons, that is not feasible.
My goal is to simply stop pip from calling itself in a subprocess to install build dependencies. The actual PEP 517 hook invocations have to happen in a subprocess (AFAIK this is mandated by the standard).
Oh, I don't think I have a good enough understanding on the flow then to answer any of your questions, certainly your comment on 9801 just makes me very confused
Honestly, I'm not really sure why pip shells out to itself to install build dependencies in the first place.
It was likely a mix of "the codebase was not designed for that" (the resolver?) and "it's easier to shell out"
As opposed to what? Using target and pointing to the isolated build environment?
Yup.
I have a POC that does that and it seemed to work fine (haven't tried a test suite run, though).
Ahhhh, I think the answer to your question is going to be, when resolvelib was adopted that shelling out just seemed simpler, to whoever authored the PR
There shouldn't be any major design issues with installing build deps in-process. Care just has to be taken to ensure caches/other state doesn't interfere.
We control the isolated environment already (and use --target to shell out), bringing that in-process should be feasible.
I have no concern, other than I was planning to get my --build-constraint PR out for 25.2, so there may be some mild overlap about choosing what flags are passed
Installing in-process may make migrating to virtualenv/venv based isolation harder (as then we aren't fully in control of the environment) but we can deal with that when we cross that bridge.
(I have no interest in doing that, and I'm the only person who realistically could be interested in such a change right now).
My plan is to implement the spec. As I wrote on the issue, I'll defer any additional options that need more discussion. This will include --constraints (and --require-hashes, too).
Also, we should perhaps add --constraints as an alias for --constraint; it's a VERY common typo
Would be consistent with uv too
The former.
It happened way before resolvelib IIRC.
Specifically the resolver or the whole codebase?
Basically, everything.
fair enough :P
We still have the old RequirementSet object in the codebase I think, which is the only piece that knew how to build distributions as well as was tightly coupled with the "resolver".
Most of the actual logic has moved around, but I still think InstallRequirement needs to die in exchange for a better data model.
I think as long as I take care to construct new instances of most things (resolver, preparer, etc.), installing in-process shouldn't be too bad.
I'm sure there are numerous global caches that will mess that up. Likely those caches will need to be moved to whatever <class> they logically belong to.
Gonna preemptively say that I have no appetite to refactor InstallRequirement
i have made progress on this
recently did not get into grad school and unclear what my availability will be but hoping to follow up still
Wow, that is quite verbose.
why...
Frozen keyword-only dataclass but we have to support Python 3.9
ok, but why use object.__setattr__? Why can't a standard self.attr = value, or at least setattr(self, 'name', 'value'), be used?
frozen
frozen dataclasses won't let you perform any normal attribute assignment even within the __init__ (which I consider bad design, but *shrug*)
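For readers following along: dataclass(kw_only=True) only arrived in Python 3.10, so on 3.9 a frozen, keyword-only dataclass needs a hand-written __init__, and because frozen=True blocks ordinary assignment even inside __init__, the fields have to be set with object.__setattr__. A minimal sketch of the pattern, using a hypothetical Pin class (not the code from the PR being discussed):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True, init=False)
class Pin:
    """Hypothetical immutable record; field names are illustrative only."""

    name: str
    version: str

    # Python 3.9 has no dataclass(kw_only=True), so the keyword-only __init__
    # is written by hand. frozen=True makes the generated __setattr__ raise
    # FrozenInstanceError even here, hence object.__setattr__.
    def __init__(self, *, name: str, version: str) -> None:
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "version", version)


pin = Pin(name="cup", version="3.22.0")
print(pin)  # Pin(name='cup', version='3.22.0')
try:
    pin.version = "1.0"  # any later assignment is rejected
except FrozenInstanceError:
    print("still frozen")
```

The dataclass machinery still generates __repr__, __eq__ and (because of frozen=True) __hash__; only the keyword-only __init__ is hand-rolled, which is where the verbosity comes from.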
might be just my taste, but the moment you have to write such code to go over some obstacles, I would prefer to just use plain old classes... but whatever
but then they wouldn't be frozen
unless you sprinkle your own magic on top, reinventing dataclasses
ah... tbf, with the amount of changes coming to dataclasses only in newer versions of Python, I tend to limit my use of them to simple struct-like containers
but is the frozen part important enough to sacrifice code readability?
I donno but the immutability is a pretty nice perk
"immutability", sadly
Part of the thing is that @jovial jasper is looking to upstream this code to packaging.
The fact that packaging.requirement.Requirement is mutable is a bad historical design wart IMO.
alternative is to make everything a property but that has a higher cost on each use rather than just on initialization
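For comparison, the property-based alternative might look like this sketch (hypothetical PinWithProperties class). Every attribute read goes through a property call, which is the per-use cost being described, and immutability is only by convention because the underscored attributes can still be reassigned.

```python
class PinWithProperties:
    """Hypothetical read-only record built from a plain class and properties."""

    __slots__ = ("_name", "_version")

    def __init__(self, *, name: str, version: str) -> None:
        # Ordinary assignment works here; nothing is frozen.
        self._name = name
        self._version = version

    @property
    def name(self) -> str:
        return self._name

    @property
    def version(self) -> str:
        return self._version


pin = PinWithProperties(name="cup", version="3.22.0")
try:
    pin.name = "spoon"  # no setter defined, so this raises AttributeError
except AttributeError:
    print("read-only")
```

Unlike a dataclass, this gets no generated __eq__, __hash__ or __repr__, so those would have to be written by hand too.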
I don't have any say in the matter, but if I encountered such code, I would have a huge "WTF" moment
I'm not going to have time to review PRs for the next few days
so it's not really my problem yet :P
it's the problem of future Richard π
Tbh, it very much is a wtf because you're trying to do something that dataclasses were not (yet) designed to do and they're not really extensible in that regard
I wouldn't want anything like that long-term in my code
There's also this massive PR: https://github.com/pypa/pip/pull/13371 although it's on the other side of the spectrum, trying to make our code more readable.
although usually it would be more of a quirk for like one or two attribs that you're trying to do something special with rather than ALL attributes because you're reimplementing keyword-only __init__ manually
yeah, it's mostly automated ruff changes though
I'll still take a (passing) look at the changes. The tools aren't infallible. For example, pyupgrade does actually generate bad code when I last ran it on pip.
I think ruff is more robust at this point, it gets a lot of battle testing and feedback, there are obviously edge cases though
Definitely let us know if there are things that are wrong
Well I would love ruff formatting to fix respecting the semantics of # type: ignore, and be able to accept # fmt: off in any part of an expression, and then pip could look at adopting ruff format, but I know they're particularly tricky problems
Yeah those are so hard haha
Yeah just for a few months until we drop 3.9 support. What matters is a clean API.
@lunar gyro I have a question for the resolvelib resolver, what's the point of the candidate cache? To avoid preparing the same candidates over and over again? I'm looking at installing build dependencies in-process. I'll need to use a new resolver instance for each isolated environment, but I'm curious as to what consequences there are with bypassing the cache.
Ah, https://github.com/pypa/pip/pull/9265 implies the cache is to avoid spammy output.
OK, scratch all of that. The already-installed cache is simply to avoid spammy output, but the other candidate caches are necessary to avoid re-preparing the same candidate multiple times.
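A toy model of why those candidate caches matter (hypothetical ToyResolver and prepare names, not pip's actual resolvelib provider): preparation is memoised per resolver instance, so repeated lookups within one resolution are free, but a fresh instance for each isolated environment starts with an empty cache and pays the preparation cost again.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (project name, version)


class ToyResolver:
    """Hypothetical stand-in for a resolver that caches prepared candidates."""

    def __init__(self, prepare: Callable[[str, str], str]) -> None:
        self._prepare = prepare
        # Per-instance cache: each new resolver starts cold.
        self._candidate_cache: Dict[Key, str] = {}

    def get_candidate(self, name: str, version: str) -> str:
        key = (name, version)
        if key not in self._candidate_cache:
            # The expensive step (download, metadata build, ...) runs once per key.
            self._candidate_cache[key] = self._prepare(name, version)
        return self._candidate_cache[key]


calls: List[Key] = []


def fake_prepare(name: str, version: str) -> str:
    calls.append((name, version))
    return f"{name}-{version}.whl"


first = ToyResolver(fake_prepare)
first.get_candidate("cup", "3.22.0")
first.get_candidate("cup", "3.22.0")   # served from the cache
second = ToyResolver(fake_prepare)     # e.g. a new isolated build env
second.get_candidate("cup", "3.22.0")  # prepared again
print(len(calls))  # 2
```

Sharing one cache object between instances would avoid the repeat, at the price of letting state leak between environments.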
Coolio. I'm not too worried with optimizing in-process installation as much as possible. Simply avoiding the startup penalty and being able to reuse in-memory finder/index cache would be a major improvement.
(it took a few tries to get the right set of pip arguments to prove/disprove my theories about the resolver caches)
whelp, so much for writing a refactoring PR today
I'll deal with the merge conflict (and writing some more tests) later
Sowwy.
Would https://github.com/pypa/pip/issues/10636 be the right issue to follow for potential future pip install -r pylock.toml (or however that ends up being spelled) support?
<@&857998631491600404> I've got an email thread started for a "pls keep confidential" thing. It's got a deadline for tomorrow, so pinging here as well.
If it's something we (the Astral team) can help with, feel free to reach out!
No, it's a communication addressing the pip maintainers about project stuff, nothing exciting, just addressed in confidentiality
I just noticed that CPython's C API deprecations don't show up in the output when building with pip install, only in pip install --verbose:
src/_imagingft.c:145:13: warning: 'Py_FileSystemDefaultEncoding' is deprecated [-Wdeprecated-declarations]
145 | Py_FileSystemDefaultEncoding,
this is unfortunate, as it means maintainers may miss deprecations and be surprised when they're finally removed
perhaps pip could show these deprecations by default? or is there an option we can recommend (other than --verbose) that shows them?
(and uv pip install doesn't show them either, --verbose or not)
Not today, no. @stuck girder Could you add a comment on https://github.com/pypa/pyproject-hooks/issues/157 ?
will do, thanks!
... isn't that up to the actual compiler process that ends up getting selected by the build system? We're talking about when pip builds an sdist that happens to include C code using the CPython API?
IMO, as a maintainer you should be using build, which doesn't hide build output.
I'm absolutely on board with that in general. but there are a few different use cases here
- build one or more wheels for distribution
- verify what the build-from-source experience will be like for end users (because some end users explicitly want to build from source regardless!)
- just publish an sdist (perhaps out of ignorance of wheels); maintainers might only ever build a wheel implicitly because they install in a dev environment (and they might only ever build an editable wheel)
There's always going to be a gap between what we'd hope every maintainer does, and what many of them actually do.
I invested quite some time into researching this issue: https://github.com/pypa/pip/issues/13389
It’s now “locked as spam”. I hope that just means that some spam bot(s) posted there and that this doesn’t hurt things getting resolved, but I wanted to ask here if that’s correct. (cc @hidden flame)
Description pip install adds the target interpreter’s path to script shebangs both for scripts defined via entry points and for scripts in (data)/scripts (i.e. the wheel path pkg_name-1.2.2.data/sc...
yes, that's correct.
GH has deleted the spam accounts, should be fine to unlock now
Unlocked, thanks @stuck girder !
Sorry for the confusion @ashen geyser !
Considering adding --config-json to build, might be of interest: https://github.com/pypa/build/issues/900
(doesn't pip use the pyproject-hooks directly?)
Yes
Looking at my calendar, I'm unlikely to have any substantial time for OSS until July. pip 25.2 is going to be quite the small release!
@rapid blaze would there be interest if I revived https://github.com/python/cpython/pull/8536? We just got a "I tried to use pip install in the REPL and it didn't work" issue on pip: https://github.com/pypa/pip/issues/13409#event-17858096956
Huh, I had completely forgotten about that. Yes, reviving that would be good. Reading through my comments, it seems like it shouldn't need much tidying up to avoid the side effects that were bothering me.
Coolio. When I get time, I'll submit that as my first CPython PR :)
An annoyance: if my Internet connection fails, I can get an error message like
WARNING: Connection timed out while downloading.
error: incomplete-download
× Download failed because not enough bytes were received ...
note: This is an issue with network connectivity, not pip.
hint: Consider using --resume-retries to enable download resumption.
But if I do that, the bytes that I did receive so far are apparently lost. I'd prefer not to have needed to know in advance to use --resume-retries. (Also, it would be nice if the hint clearly communicated that this option expects a numeric argument: the number of retries.)
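For context on what resumption involves in general (a generic sketch of HTTP range requests, not pip's implementation): the client keeps the bytes it already has, asks for the rest with a Range: bytes=N- header, and appends only if the server answers 206 Partial Content; a plain 200 means the server ignored the range and the download starts over. The resume_headers and merge_chunk helpers are hypothetical.

```python
from typing import Dict


def resume_headers(received: bytes) -> Dict[str, str]:
    """Headers for a follow-up request that asks only for the missing tail."""
    if not received:
        return {}
    # Byte range from offset len(received) to the end of the resource.
    return {"Range": f"bytes={len(received)}-"}


def merge_chunk(received: bytes, status: int, body: bytes) -> bytes:
    """Combine already-received bytes with the body of the retry response."""
    if status == 206:
        # Partial Content: the server honoured the Range header, so append.
        return received + body
    if status == 200:
        # The server ignored Range and sent the whole file; start over.
        return body
    raise ValueError(f"unexpected status {status}")


partial = b"PK\x03\x04abc"      # bytes obtained before the timeout
print(resume_headers(partial))  # {'Range': 'bytes=7-'}
print(merge_chunk(partial, 206, b"def"))
```

A real implementation would also validate the Content-Range response header (and ideally send If-Range with an ETag) so it never splices together two different versions of a file.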
I think I also saw one recently on PDO. FWIW, there's a SO question you can also refer people to: https://stackoverflow.com/questions/8548030/
We're going to default it to on in a future release
There was concern that the logic would cause some unknown issue, so we wanted to wait and see if we got bug reports
Ah.
... can pip be told to uninstall (from a system environment) only if the package is in the user install directory? Does it require --break-system-packages to uninstall user-level packages like it does for installing them?
(I'm trying to guide someone else through cleaning up an existing mess. But it might be better in this case to start by archiving the user site-packages entirely and starting fresh...)
I'm not sure, but in the past I've had to instruct users to go into their user site directory and just delete anything in there
I probably will take a similar approach in this case. (Though I do want to know what's installed first, to make good decisions in replacing it all. I'm not really sure how much of this is due to the user doing Python development vs running Python applications, in particular)
A note about the documentation: at
https://pip.pypa.io/en/stable/topics/dependency-resolution/#backtracking
the example describes:
Now, pip notices that the version of cup it has chosen is not compatible with the version of spoon it has chosen.
This seems inaccurate, and is counterintuitive if there are cases where it's accurate. It's spoon which is supposedly declaring what versions of cup are compatible with it; if the latest version is "incompatible" despite meeting the restriction cup>=1.6.0 then that needs explaining.
I can imagine:
- cup==3.22.0 in turn declares a mutual dependency on spoon, or on tea
- (probably more common?) cup==3.22.0 requires a more recent Python version than the one in the current environment
It would be helpful to describe these and other possibilities in the documentation. Presumably the user reports from 13281 would help uncover more... ?
but also, when Pip caches a .metadata file nowadays, is PyPI directly providing those? Or are we still dependent on the range-request trick?
PyPI provides them, pip doesn't do any range-request trick, uv does though
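As background: PEP 658 lets an index serve a distribution's core metadata at the file's URL with .metadata appended, advertised per file, and PEP 714 renamed the JSON key from dist-info-metadata to core-metadata. The value is either a boolean or a dict of hash name to hex digest. A minimal sketch, assuming a file entry shaped like a PEP 691 JSON "files" item, of how a client could decide whether to fetch just the metadata instead of the whole wheel (metadata_url is a hypothetical helper):

```python
from typing import Any, Mapping, Optional
from urllib.parse import urldefrag


def metadata_url(file_entry: Mapping[str, Any]) -> Optional[str]:
    """Return the PEP 658 metadata URL for a PEP 691 file entry, if advertised."""
    # Prefer the PEP 714 key name, falling back to the original PEP 691 one.
    flag = file_entry.get("core-metadata", file_entry.get("dist-info-metadata", False))
    # Either True or a dict of hashes means "available"; False/absent means not.
    if flag is False or flag is None:
        return None
    # Drop any "#sha256=..." fragment before appending the suffix.
    url, _fragment = urldefrag(file_entry["url"])
    return url + ".metadata"


entry = {
    "filename": "cup-3.22.0-py3-none-any.whl",
    "url": "https://files.example.org/cup-3.22.0-py3-none-any.whl",
    "hashes": {"sha256": "0" * 64},
    "core-metadata": {"sha256": "0" * 64},
}
print(metadata_url(entry))
```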
I think you've misread the example, spoon hasn't declared a dependency on cup, tea declared a dependency on cup.
But I would happily review a PR that made this example easier to read.
last I checked the range-request code is still in Pip, though....
I did misread the example. But it's still worth highlighting reasons why the two dependencies might be mutually incompatible.
Yes, it just says this:
pip notices that the version of cup it has chosen is not compatible with the version of spoon
It could certainly be clearer.
I think the range-request code is hidden behind some flag? I've not spent any time looking at that feature, so I could be wrong
aha
... actually
suppose spoon==2.27.0 demanded cup<3; would Pip be able to skip the 3.x versions that tea is okay with but spoon isn't?
I don't follow the question, but if spoon 2.27.0 is pinned and has a requirement on cup < 3, and cup is currently unpinned, then when pip tries to collect cup it will include the restriction cup < 3 and not download cup 3.22, etc.
but is it temporarily "pinning" during the resolution process?
It pins a package version until it has proved that package version cannot be satisfied; only then will it unpin it
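To make the pin-then-backtrack behaviour concrete, here is a deliberately tiny depth-first resolver over a made-up index loosely based on the docs' tea/spoon/cup example. It is a toy under its own assumptions (versions as integer tuples, constraints as predicates, no extras or markers), not resolvelib's algorithm: it pins the newest version satisfying the requirement at hand and only unpins when a later requirement conflicts. Unlike pip, which folds every known restriction (e.g. cup < 3) in before choosing a cup version, this toy visibly tries cup 3.22 first and then backtracks.

```python
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, ...]
Spec = Callable[[Version], bool]
Requirement = Tuple[str, Spec]

# Made-up index: name -> version -> that version's own requirements.
INDEX: Dict[str, Dict[Version, List[Requirement]]] = {
    "tea": {(1, 0): [("spoon", lambda v: v >= (2, 27)), ("cup", lambda v: v >= (1, 6))]},
    "spoon": {(2, 27): [("cup", lambda v: v < (3,))]},
    "cup": {(3, 22): [], (2, 0): [], (1, 6): []},
}
trace: List[str] = []


def fmt(version: Version) -> str:
    return ".".join(map(str, version))


def resolve(todo: List[Requirement], pins: Dict[str, Version]) -> Optional[Dict[str, Version]]:
    """Depth-first: pin the newest candidate, backtrack when a later requirement fails."""
    if not todo:
        return pins
    (name, spec), rest = todo[0], todo[1:]
    if name in pins:
        # Already pinned: the existing pin must also satisfy this requirement.
        return resolve(rest, pins) if spec(pins[name]) else None
    for version in sorted(INDEX[name], reverse=True):  # newest first
        if not spec(version):
            continue
        trace.append(f"pin {name} {fmt(version)}")
        result = resolve(rest + INDEX[name][version], {**pins, name: version})
        if result is not None:
            return result
        # This pin led to a conflict further down: unpin and try an older one.
        trace.append(f"unpin {name} {fmt(version)}")
    return None


result = resolve([("tea", lambda v: True)], {})
print(trace)   # ['pin tea 1.0', 'pin spoon 2.27', 'pin cup 3.22', 'unpin cup 3.22', 'pin cup 2.0']
print(result)  # {'tea': (1, 0), 'spoon': (2, 27), 'cup': (2, 0)}
```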
(incidentally, the name tea is used on PyPI; it doesn't appear to have any dependencies)
It is locked behind --use-feature=fast-deps. It's effectively a dead feature.
Did it die because metadata files became exposed?
(I suppose the name makes sense given it was chosen before the metadata files became exposed :) )
I'm working on handling output in a nicer way for in-process build dependencies... it is way more difficult than I initially expected.
The spinner is highly coupled to the subprocess code. Rich has a nice spinner, but it behaves differently from our custom spinner. It seems like I need to write a custom spinner based on rich. Great.
This is where I am currently at (w/ vanilla rich spinners).
Looks great to me
I don't even want to think about error handling. I'm going to need to break out the root installation error handling into its own function and then make sure the error is handled and propagated upwards correctly.
Output/error handling is something that you simply get "for free" when using subprocesses. Get rid of those and you face the wrath of complexity that it usually entails
That wasn't too bad, actually.
Barring the debug output I added, there isn't any difference to the eye.
Hmm, I could potentially massively simplify the additional error handling by using nested diagnostic errors. I have no idea how to do that yet though!
I've made some more progress. I need to do something with diagnostic pip errors, but it shouldn't be too bad (and this is the sort of thing that can be tweaked later on).
Ideally, the diagnostic error would be printed nested under and after the "failed-build-dependency-install" error but getting that to work is surprisingly tricky.
The nested subprocess error format is good enough for now.
Might have to lower the maximum number of rounds for backtracking, the new optimistic backtracker can end up doing a lot more work per round, I'll put some performance numbers together
Alrighty. PR for installing build dependencies in-process is now up: https://github.com/pypa/pip/pull/13450. Let's see if we even want to consider this. (I am pretty pessimistic at this point given how complex this turned out to be.)
I'll probably break the PR into smaller independent PRs if people approve of the idea so it's easier to review.
@finite perch I'll note that it'd be more complicated to pass build constraints to the inprocess installer because the CLI argument parsing is bypassed entirely.
Although given that constraints are implemented as a special kind of install requirements, you can likely feed them to the inprocess installer and have it inject 'em into the install requirement list. That shouldn't be that bad.
@hidden flame relooking at your PR I mixed up the Subprocess Build Environment Installer with the in process one. Does this change the behavior of the constraints in the build environment? If so I have concerns
I think it will nullify the PIP_CONSTRAINTS envvar for build dependencies.
Oh dear
When I get a chance I will review the behavior here and understand how it can be controlled
I'm fine with adding support for the PIP_CONSTRAINTS envvar although that would be hacky.
There is a reason why I want to merge this as an opt-in feature. Behavioural differences are expected.
In this case are the differences intended or just a consequence of the new architecture?
I think we need to be careful here, if we're changing behavior it should be to a design we agree on, not just a consequence of the architectural change.
Both. The intention is to make it easier to manage the configuration of the build dependency installation. The inprocess installer inherits the network and cache flags for example. The main behavioural difference due to architecture is envvars.
Hmm, I see, in that case the build constraint flag is going to overlap quite a bit
yup
The fact that build deps inherit the envvar but not all of the flags is a design wart that I wish didn't exist, but alas, I'll have to deal with that when I stabilize the PR.
Thanks for reminding me about that!
Okay, I'm going to correct my comment on the PR. I think we'll have to make a choice about constraint vs build constraint before adding this feature.
Oh hmm, you want to special case recursive builds? Yeah, alright, that's going to be messy.
@hidden flame I've written a proposal on the PR, basically if there's going to be a breaking change I propose we clean up the behavior.
I think my proposal makes things simpler? No special env handling for backwards compatibility, still have the ability for separate regular and build constraints.
yup
We may need an extended migration period for the build dependency installer though (or just really good warnings).
Yeah, I think add a warning when constraint usage is detected in a build environment
I had some fairly tortured designs to keep backwards compatibility and introduce build constraints, so I will be happy to go down this route, if no one objects
Everything is a hack it feels like.
I would be thrilled to simplify things if we are able to
I never replied to this. Sounds good. I'll keep an eye on my inbox.
My instinct is that we should pick up the constraints and pass those down, with a deprecation of the behaviour and a dedicated build constraints env var.
I'm not a fan of this because there isn't an easy solution to "I want dependency constraints but not build constraints"
Well, that gets solved at the end of the deprecation period, no?
Not if we're passing constraints down? I guess I don't understand your message
Lemme try again...
Now: PIP_CONSTRAINTS affects build environments.
Next: PIP_CONSTRAINTS affects build environments, unless [some other option] is specified. If [some other option] is not specified, a deprecation warning is printed.
Later: PIP_CONSTRAINTS only affects things outside the build env, build env is affected by [some other option].
Separately, a more conservative option is to leave PIP_CONSTRAINTS as-is and to add two new options, one that affects only outside build env, another that only affects inside build env.
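To pin down the three-phase proposal, here's a sketch of which constraints would reach an isolated build environment in each phase. The phase names and the build_constraints parameter are hypothetical stand-ins (the real option is left unnamed as "[some other option]"), and constraints are modelled as plain lists of constraint-file paths.

```python
import warnings
from typing import List, Optional


def build_env_constraints(
    phase: str,
    constraints: List[str],
    build_constraints: Optional[List[str]] = None,
) -> List[str]:
    """Constraints applied inside an isolated build env (hypothetical model)."""
    if phase == "now":
        # Today: regular constraints also apply to build environments.
        return constraints
    if phase == "next":
        # Transition: the new option wins; otherwise keep old behaviour and warn.
        if build_constraints is not None:
            return build_constraints
        warnings.warn(
            "constraints will stop applying to build environments",
            DeprecationWarning,
            stacklevel=2,
        )
        return constraints
    if phase == "later":
        # End state: only the dedicated option affects build environments.
        return build_constraints or []
    raise ValueError(f"unknown phase: {phase!r}")
```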
Oh I see, I didn't realize you meant three phases
Yea, that deprecation comes with a change in behaviour.
It might not be the best option because we know people rely on this as a workaround today.
My idea was that if --use-feature inprocess-build-deps starts as opt-in and is backwards incompatible anyway, then --build-constraint can be only for --use-feature inprocess-build-deps, and there's no need for a migration process
FWIW, if a new option for build constraints is added, it ought to be able to specify them per-package.
I've written a table here for the new behaviour I propose: https://github.com/pypa/pip/pull/13450#issuecomment-3014429824
But I wrote a table here to keep backwards compatibility: https://github.com/pypa/pip/issues/13300#issuecomment-2787887526
That would be a separate proposal and not something I would have time to implement
Annnnnd I'm back from ⏱️. I'll catch up on my GH inbox over the next few days.
That wasn't that bad :)
Well the thing is that I'd want inprocess-build-deps to be the default and sole option at some point. Getting to that point may take a while, but at some point, there will be a hard cut-off.
I see no issue with that
Hello maintainers! I'm wondering if it would be feasible in your mind to use the JSON API during resolution for certain future features when users have opted in.
The PyPI specific JSON API or the simple index JSON API?
the latter i.e. https://peps.python.org/pep-0752/#repository-metadata
We already use the simple index JSON API based on the http content type header responses
I believe it is prioritized
Implemented per the spec: https://peps.python.org/pep-0691/#content-types
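Roughly, PEP 691 negotiation works like this: the client lists the content types it understands in Accept, with quality values expressing preference, and the index reports the one it chose in Content-Type. A minimal sketch along the lines of the PEP's recommended header (the helper names are hypothetical, and this is not pip's code):

```python
from typing import Dict

# Prefer JSON, then the versioned HTML type, then legacy text/html.
ACCEPT = ", ".join(
    [
        "application/vnd.pypi.simple.v1+json",
        "application/vnd.pypi.simple.v1+html;q=0.2",
        "text/html;q=0.01",
    ]
)


def request_headers() -> Dict[str, str]:
    return {"Accept": ACCEPT}


def response_format(content_type: str) -> str:
    """Classify a Simple API response by its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/vnd.pypi.simple.v1+json":
        return "json"
    if media_type in ("application/vnd.pypi.simple.v1+html", "text/html"):
        return "html"
    raise ValueError(f"unsupported Simple API content type: {content_type!r}")


print(response_format("text/html; charset=utf-8"))  # html
```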
I will note the fact that features only exist in the JSON API has caused these features to not be adopted by internal indexes that only serve static pages
And the JSON decoding causes pip to use significantly more memory than the HTML API, for some CPython reason I've never been able to figure out
But they're both side rants
For context, I'm addressing the final feedback from PyPI maintainers about PEP 752 (namespaces), and there was a question about why I didn't support the Simple API and only changed the JSON API. My rationale is that the Simple API is really only for artifact metadata rather than project-level metadata. Based on feedback, release-level namespace metadata has been explicitly rejected since the first public review because it would cause confusion. The potential suggestion in this new document is that it could be added as an attribute on the base /simple/ project page or applied to every single artifact.
Personally, I'm okay with less adoption by internal indices if it means that I don't have to pigeonhole project-level metadata into every artifact.
So I'm trying to gauge feedback on what you want and also from the UV folks.
Hmmm, that is an interesting question
My gut reaction is it's unlikely pip will ever read this metadata
A potential security related feature I document is:
Installers could support enabling a security policy that would only allow packages that match a specific set of namespaces and whose owner has an active grant for the namespace.
If pip ever does read it I think we only currently read distribution level metadata, not project level metadata
But I don't think it would be hard to read project level metadata so, ¯\_(ツ)_/¯
Something I'm concerned about is bandwidth suddenly ballooning for no reason if we attach namespace metadata to every artifact. Also, I know a bunch of implementations of private indices that serve static pages which would require namespace changes to cause a mass update and that would be supremely undesirable.
If it's just status information on the API like yanking it should be fine, if it's in the distribution metadata that would be a problem
If private indexes are set up to clone PyPI and are properly configured, they treat files (such as sdists, wheels, and distribution metadata) as static, and index data (such as the distributions available for a given version, and yanking) as having some TTL cache that they should check in on
If the idea is the namespace should be on the distribution metadata then it should only be included in new uploads
But I've not read the PEP, I'll try and make time and read it later
Part of me wants to try prototyping installation from lockfiles. That is very much a "I am taking on way too many projects" thing. Probably best to do that after the release. That's the main priority :P
it can't be that hard to translate a lockfile into a set of install requirements -- a naive richard says to himself
Anyway, over the next week, I'm going to review and get as many PRs across the finish line as I can (as a solo reviewer). I've also been more aggressive in pruning PRs that have no hope of being merged.
Thanks for the hard work!
pip has so many type hint errors that were being hidden away by how it vendors requests, fun times
I'm fixing up how mypy is called and 90% of the work is on figuring out requests
I'm also having fun times with ✨ windows ✨
I figured it out in the end. Only took a few days and lots of confusion.
Do you want a second pair of eyes? I run Windows locally, I'll be available to do some more work Sunday evening
```python
import ast
import subprocess
import sys

from pip._internal.configuration import Kind
from pip._internal.configuration import get_configuration_files as _get_config_files
from pip._internal.utils.compat import WINDOWS


def get_configuration_files() -> dict[Kind, list[str]]:
    """Wrapper over pip._internal.configuration.get_configuration_files()."""
    if WINDOWS:
        # The user configuration directory is updated in the isolate fixture using the
        # APPDATA environment variable. This will only take effect in new subprocesses,
        # however. To ensure that get_configuration_files() never returns stale data,
        # call it in a subprocess on Windows.
        code = (
            "from pip._internal.configuration import get_configuration_files; "
            "print(get_configuration_files())"
        )
        proc = subprocess.run(
            [sys.executable, "-c", code], capture_output=True, encoding="utf-8"
        )
        return ast.literal_eval(proc.stdout)
    else:
        return _get_config_files()
```
My personal style is to swap the branches and lose the else, but yeah, looks fun
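The subprocess trick used above, in a self-contained form: run a snippet in a fresh interpreter, so it computes everything from the current environment variables rather than anything cached in the parent process, and read back a Python literal from its stdout. eval_in_subprocess is a hypothetical helper, not part of pip's test suite; unlike the snippet above it passes check=True so a crash surfaces as an error instead of an unparseable empty string.

```python
import ast
import os
import subprocess
import sys
from typing import Any, Dict, Optional


def eval_in_subprocess(code: str, env: Optional[Dict[str, str]] = None) -> Any:
    """Run `code` in a new interpreter and literal_eval whatever it prints."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        encoding="utf-8",
        env=env,     # None means: inherit the current environment
        check=True,  # raise CalledProcessError instead of parsing empty output
    )
    return ast.literal_eval(proc.stdout)


# Hypothetical usage: the child sees an APPDATA value set only for it.
env = {**os.environ, "APPDATA": "/tmp/fake-appdata"}
print(eval_in_subprocess("import os; print(repr(os.environ['APPDATA']))", env=env))
```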
```diff
- wheel_file = data.packages.joinpath("simple.dist-0.1-py2.py3-none-any.whl")
- script.pip("install", "--no-index", wheel_file)
+ script.pip_install_local(data("simple.dist-0.1-py2.py3-none-any.whl"))
```
I'm experimenting with a simpler, more declarative way of running pip install --no-index $test-data.
Drive-by comment as I'm making dinner: I'd make that a data.something(...) call, because doing it directly on data feels a bit magical.
I didn't do that as I couldn't think of a good name. data.find()?
```diff
- wheel_file = data.packages.joinpath("simple.dist-0.1-py2.py3-none-any.whl")
- script.pip("install", "--no-index", wheel_file)
+ script.pip_install_local(data.find("simple.dist-0.1-py2.py3-none-any.whl"))
```
This seems good. Funnily enough, my original idea was to resolve the resource path in pip_install_local but that felt even more magical (and then I realized that wasn't going to fit the API very well anyway)
The main goal here though is to make writing tests that exclusively use the local test data easier because it's currently too easy to write a test that accesses PyPI (etc.) and then end up with a slower, more flaky test suite :(
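A minimal sketch of the idea, with hypothetical LocalData.find() and local_install_args() helpers (not pip's actual test utilities): resolve the file eagerly so a typo fails at lookup time rather than as a confusing install error, and always pass --no-index so the install can never silently fall back to PyPI.

```python
from pathlib import Path
from typing import List


class LocalData:
    """Hypothetical stand-in for the test suite's `data` fixture."""

    def __init__(self, root: Path) -> None:
        self.packages = root / "packages"

    def find(self, filename: str) -> Path:
        """Return a file from the local packages dir, failing loudly if absent."""
        path = self.packages / filename
        if not path.is_file():
            raise FileNotFoundError(f"no test data named {filename!r} in {self.packages}")
        return path


def local_install_args(*paths: Path) -> List[str]:
    """pip arguments that install only the given files, never touching an index."""
    return ["install", "--no-index", *(str(p) for p in paths)]
```

If the wheel has dependencies, they would also need to be reachable locally, e.g. via --find-links pointing at the same directory.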
the only way I know of to forbid pip from trying to access an index server is to give paths to actual build artifact files. and even then I'm not sure what it will do about dependency resolution
... wait, have I just been overlooking --no-index?
I think so, --no-index will prevent pip from checking indexes. This is not the same as an offline mode, but it covers a lot of real-world needs.
so it will just resolve based on the packages it can find in the --find-links, then?
(actually, what does it do with its cache in this scenario?)
Yes, it only uses find-links, the http cache for the index will be ignored, because it is just that, http cache for the index. The wheel cache will be used.
... well, that does explain why things only end up in the wheel cache when they're built locally, I suppose.
This is one big PR: https://github.com/pypa/pip/pull/13476
Gonna punt reviewing it for later since I have some other user-facing changes I'd like to land for 25.2.
I can't reasonably break it up though, because I was finding that if I only partially fix type hints I need to write wrong type hints to make mypy happy
It shouldn't be hard to review. I imagine it consists mostly of mechanical changes. It's just a matter of finding the time to sit down and review the changes
Don't worry about it!
Yeah, the majority of the line changes are changing where requests is imported
I came to ask about whether there would be somebody willing to review the logical change in https://github.com/pypa/pip/pull/13482 which removes the "preparation" part of resolve to avoid always downloading distributions... I may be barking completely up the wrong tree. I totally understand the "review load" challenge of the project - the intention is not to add any pressure here, just curiosity whether this is a viable change to the resolve stage really...
Since I was passing by, I hope you don't mind but I did a drive-by review of the type-checking PR.
It's on my queue to review if no one else looks at it, but realistically I might not have a chance till September or later
@finite perch thank you for leaving an initial reply on every new PR! It's such a small thing, but I'm sure it does wonders for setting expectations and avoiding frustration for PR authors. It's something I used to try to do, but I've never been very consistent about it.
I was chatting to Jarek (from the airflow project) about this at PyCon and it inspired me to try and at least set expectations.
It's very disheartening to take time to write some code and hear nothing.
In jobs I'm paid to do, I always try to confirm to anyone who contacts me or my team that we've received their communication. But yeah, we don't have the same resources for pip, so instead I try to do it for PRs at least.
Sorry, I've been at work all today (worked a double shift) so I haven't been able to respond promptly.
I'm aware that I broke CI
I'm not sure how to feel about third-party reviews, on the one hand, it's nice to get an outside perspective, especially in domains where we lack expertise, but OTOH, there is a lot of implied/passive context that they simply won't have/know. Eh, I don't have a very defensible take on this. I'm just tired.
I think it's important as maintainers that we set a clear expectation that while third party reviews won't be dismissed out of hand that the weight they will be taken with will be proportional to their simplicity and the implicit trust of the user (e.g. how much they've already contributed to the pip project)
FWIW, I think it is a great thing to do
FYI, I am on vacation for a week. I wish I had more time to spend on this path-to-URL/URL-to-path issue, but I just won't have any time.
Enjoy your vacation!
Getting the test suite to function on 3.14 has been a pain, urgh.