#pypi
Done
Thanks!
Is there a twine channel? I just got the "please use a token email" and I'm wondering if it can support a web workflow instead
Like gh auth login https://cli.github.com/manual/gh_auth_login it prints a code then opens a browser and then you paste the code into the browser and authorize the upload
Are you thinking of it remembering the token after you login once or more so being able to login each time you upload without twine remembering your credentials as you can do until August?
I was personally wondering about the latter case, because generating a token each time in the web UI, copying it to twine, and then deleting it after upload seems more annoying than simply logging in with 2FA each time without having to do the create-and-delete token dance
All I did was wonder about it, though, since I personally use a single-repo token in CI and have the whole workflow automated myself, but it did seem interesting to me that this use case is going to be hurt in August
Ideally the web viewer would show a list of files to release and I'd just click "go"
so authorization each time then
Yeah but I'd already be logged in in the browser
Ye, that's similar to what I was thinking about
I guess if you upload often that's going to be true 😄
So like atomic releases and web auth
since the cookie would still be valid. I don't go to PyPI that often which is why I mentioned having to login
Anyhow, I don't have my stake in it, was just curious if it's similar to what I was once thinking about 😄
How long is the cookie valid for?
1 day
Ah fair
Well I'd rather do 2fa auth each time than save a __token__ on my keyring
If I used twine locally, honestly same
I tend to just upload stuff by the web UI then kidnap the next person who PRs it and make them setup automatic releases
"you broke it you bought it"
I mostly enjoy playing with CI stuff lol
GH has some cool things like environments
I like the feeling when I just click the button and all the things starts happening one by one as I watch it
when it works
I thought it might be interesting to support a token that requires having re-authenticated to 2FA within the last N amount of time
Something like this? https://github.com/pypi/warehouse/issues/11112
conceptually a similar idea, but that's strict for the UI
Oh, you're talking about the token itself for API
yea
I'm not sure if it actually makes sense to be clear, but in theory you could add what's called a "third party caveat", which is basically just "for this token to be valid, you have to get a second token from X service", then Warehouse could just have a 2FA endpoint.
That's a little different, because with that, anyone could attenuate a token, before passing it on to some(one|thing) else, to require 2FA, but might not be worthwhile over just making a web auth flow that will mint a short lived token that does the full username/pw + 2fa login.
Hmmm. I'm thinking that it sounds a little bit like oauth2's access tokens vs refresh tokens, and when needing to use a refresh token, that's when the 2fa is invoked. Haven't seen it in practice yet, but something to think about.
Still needs some thinking for use cases like CI, where a user doesn't really input a 2fa there. I've been wanting to read more about the newer GitHub to AWS delegation via IAM roles, since that's an interesting auth flow which feels more sustainable
(for CI, of course)
yea you just wouldn't add that caveat for CI
but for at least some CI providers, the answer is basically the OIDC stuff that is being worked on
that's probably the AWS thing you're thinking of too
Aside, I often see OIDC and my brain reads "Oh, I don't care"
I wish it stayed that way
OIDC itself is very ugh
but GH is using it to enable a bunch of stuff so
OIDC is the demon marriage of OAuth 2.0 and a cryptographic token standard called JWT.
https://fly.io/blog/api-tokens-a-tedious-survey/ is pretty interesting
anyways, the really interesting thing about third party caveats for pypi's token, is that you can implement them without warehouse having any knowledge of them what so ever
(the client of course has to be aware of them)
But since our API tokens are Macaroons, and Macaroons allow the holder of a macaroon to add arbitrary additional restrictions to the token, with "third party caveats", you can do something like:
1. Stand up a service called say "GithubUserInSpecificOrg" which, when a user logs in, will check if they're a member of a specific organization.
2. Create an API token on Warehouse, then locally you add a third party caveat that says "to validate this token, you also need a second macaroon that validates for X secret key"
3. Give that attenuated token out to people.
4. When people go to use that token, their client would have to take their PyPI token from (3), sign into "GithubUserInSpecificOrg", get the token from that, then submit both to Warehouse
This works basically because you can store encrypted data in a macaroon, and a macaroon's trust is rooted in some secret key. So you store an encrypted caveat that basically says "to validate this caveat, the person needs to send a second macaroon that is valid for X secret key". The bearer can't read that secret key because it's encrypted, but the person who made the token in step (2) or Warehouse itself can.
The person who made the token in step (2) gives that secret key to the GithubUserInSpecificOrg service, and whenever the person logs in, it just makes a short lived macaroon from that secret key.
I'm not sure anyone would ever actually use this capability! but it wouldn't be hard to support in Warehouse, we basically just need to accept the additional macaroons instead of just the 1
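To make the "holder can attenuate a token" idea concrete, here is a minimal sketch of the HMAC chaining that macaroons are built on. It is not Warehouse's implementation and does not use a real macaroon library: `mint`, `attenuate` and `verify` are hypothetical names, caveats are plain strings, and third party caveats (which additionally carry an encrypted key) are left out.

```python
import hashlib
import hmac


def _chain(key: bytes, message: str) -> bytes:
    """One HMAC-SHA256 step; the output becomes the key for the next step."""
    return hmac.new(key, message.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str):
    """Issuer side: create a token as (identifier, caveats, signature)."""
    return identifier, [], _chain(root_key, identifier)


def attenuate(token, caveat: str):
    """Holder side: add a restriction using only the current signature."""
    identifier, caveats, signature = token
    return identifier, caveats + [caveat], _chain(signature, caveat)


def verify(root_key: bytes, token, caveat_holds) -> bool:
    """Issuer side: replay the chain from the root key and check each caveat."""
    identifier, caveats, signature = token
    expected = _chain(root_key, identifier)
    for caveat in caveats:
        if not caveat_holds(caveat):
            return False
        expected = _chain(expected, caveat)
    return hmac.compare_digest(expected, signature)


# Hypothetical values, for illustration only.
root_key = b"known only to the issuer"
token = attenuate(mint(root_key, "token-1"), "project = example")
```

The holder never needs `root_key` to add a caveat, yet the issuer can still verify the result, and removing a caveat afterwards breaks the signature.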
I'd use this the moment it landed
I just want to add a GitHub user or org to a PyPI project and let any authentic CI run upload, à la coverage tools
Thanks for this, it's a really nice dive into a lot of the complexity. I appreciate the author's perspective on 'random tokens are underrated' which is probably true for most services that have a relatively simple go/no-go authorization policy.
It seems to me that most of the added functionality for Macaroons and Biscuits is for when you want to delegate some of the authorization policies to another party, hence the term "third party caveat". I think I get it now
I wonder if there are examples of Macaroon caveats that are used outside of a tech-centric ecosystem (i.e. platforms like PyPI or GitHub), in a more consumer-focused integration universe. 🤔
FYI: Looks like PyPI error rate is up: https://status.python.org/ Not sure if anyone here can take a look.
Likely due to https://github.com/pypi/warehouse/pull/12040, should be resolved shortly
Thanks @merry valve and @unreal jewel ❤️
@unreal jewel if I were to try again for the MathJax implementation, is there something else I'd need to consider for the cache busting to not break that page load? (I'm also unclear why the page details needs the JavaScript block cache busted...)
any static files hosted by warehouse have to be included in the static manifest
that's the thing that turns a path like "foo.png" into "foo.cachebust.png"
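A rough illustration of what such a manifest does, assuming a content-hash naming scheme. This is a sketch, not Warehouse's actual code: `cachebusted_name`, `build_manifest`, the 8-character digest and the sample files are all made up for the example.

```python
import hashlib
from pathlib import PurePosixPath


def cachebusted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def build_manifest(files: dict) -> dict:
    """Map each original path to its cache-busted name."""
    return {path: cachebusted_name(path, content) for path, content in files.items()}


# Hypothetical static files, keyed by path with their raw bytes as values.
manifest = build_manifest(
    {"foo.png": b"fake image bytes", "vendor/lib.js": b"console.log(1)"}
)
```

A file that never makes it into the mapping has no hashed name to resolve to, which is why anything served as a static file has to be picked up by the manifest.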
@tribal sedge if my memory is correct, you need to make sure that the files will be picked up by manifest
er
you might be able to just add some ** in there, or just add the other vendor folders?
Set https://github.com/pypi/warehouse/blob/main/warehouse/config.py#L604 to True and you'll get any errors in development too
Thanks for the pointers, I'll give it a shot!
Curious - how much should I balance putting into the vendored approach vs the CDN approach?
I think CSP3 supports putting the resource integrity hashes for external resources in the script-src instead of putting the domain
I'm not sure what browser support looks like for that
but if browser support is there, I think all the arguments against cdn approach go away?
This was my first foray at that approach: https://github.com/pypi/warehouse/pull/12028/commits/98e1915a76847d0d406bb5897164d9ed7b1aa7a7
I think you don't include the url if you do that?
in the CSP policy
because I think just the URL allows anything from that domain
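For reference, a small sketch of how the hash that goes into `script-src` could be computed. It assumes the `sha384-<base64>` form used for integrity values; the helper names and the sample script bytes are hypothetical, and browser support for matching external scripts by hash is not verified here.

```python
import base64
import hashlib


def integrity_value(script: bytes, algorithm: str = "sha384") -> str:
    """Return a value like 'sha384-<base64 digest>' for the given bytes."""
    digest = hashlib.new(algorithm, script).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_src(hashes) -> str:
    """Build a script-src directive that lists hashes instead of a host."""
    sources = " ".join(f"'{h}'" for h in hashes)
    return f"script-src 'self' {sources}"


# Stand-in for the bytes of the externally hosted script.
value = integrity_value(b"console.log('hello');")
directive = script_src([value])
```

With this approach the directive names specific file contents rather than a whole domain, so the CDN host itself would not need to appear in `script-src`.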
it all worked out - I might try it again without the URL and only the hashes - but the connect-src needs to be there since the js downloads other files
and I don't want to allow anything from the domain, only from the mathjax paths
well, I'll keep going with the vendored approach for another hour or so before flipping back to the CDN approach
oh yeah, this won't work - unless the internal references in the mathjax code get updated to point to the cachebust hashed resources.
I'll go back to CDN-style and try my best to trim as much as possible from the CSP.
I submitted a new PR, this one using CDN.
https://twitter.com/AdamChainz/status/1562372544535175168?t=WS73yzBhcmhLwolhBNkbCA&s=19 heads up about this
what exactly is last_serial in the json api? I can't find good documentation on what it represents and how to mimic it. Additionally, when trying to find the source for xmlrpc.client I couldn't find that either 😅
but where is the source of xmlrpc.client
it uses that as the example, but i cannot find the module
ah
Wait what? I don't think Black is its own organisation on PyPI?
it has started to be rolled out
and lukasz is involved with the PSF
so perhaps it has
Huh, I'll ask him about it then
nah, it's happening on any project that has a sole owner
Would've expected an email if they converted projects to organisations
are you sure on that? I thought there's at least one more round of testing
I think they are gathering the initial feedback IIRC
before it gets like, deployed to PyPI
humokay
all the test rounds until at least 3 were happening outside of regular PyPI
it seems something at least partially got deployed because of this message though 🤣
yeah, very confusing messaging for sure
"your organisation" ... but what organisation lol
Black technically isn't a sole owner project though, no?
well okay, I don't know if it isn't perhaps deployed to PyPI but like, disabled through env var or sth
I'm an owner of the project on PyPI alongside Łukasz
yeah, nevermind on the "sole owner" part, it's like that on all projects
I guess this is the UI for transferring projects from individual account to an org
but the messaging claims there's a pre-existing org though hahaha
if it's env var locked then someone forgot to put it behind that lock I guess
I'll shut up :)
yeah, I'm not saying it's not a valid bug report lol
I agree this is probably the UI for managing the organisational ownership of a package project.
.. and anyway the second box goes counter the first one signifying that the shadow organisation the first says exists actually does not lol 😆
Seems like they only considered how it will look if it's an org-owned project
so two problems:
a) "Cannot remove project from an organization" should only show if a project is in an organization.
b) Both of these should not be shown when AdminFlags.DISABLE_ORGANIZATIONS is set
I tried fixing this here: https://github.com/pypi/warehouse/pull/12140
ah yes, warehouse's infamous check for generated translation files has failed, time to run the docker thing for that
oh no, it doesn't support podman because it uses a docker lib that connects to the socket
or maybe it just doesn't like being ran sudoless
yeah...
that's kinda weird for podman tbh, maybe the docker compatibility layer does that
Is there a public list of prohibited / reserved names on PyPI somewhere? Presumably not, but I can't find the issue where its visibility was clarified :/
There is not.
Understood, thanks!
Can I ask for a review on an hopefully-easy-to-review PR?
Based on the idea from pypa/pip#5216 (comment).
any update on this? https://github.com/pypi/warehouse/pull/11380
for folks that develop on warehouse, is there any appetite to include pre-commit that can run some actions in isolated (non-container) environments for fast-catch things like black? I find myself running either the tests or lint checkers and sometimes missing one or the other, only to have the CI catch my error.
I can imagine adding other checks like editorconfig to help prevent annoyances from creeping in
I personally very much dislike pre-commit, and I think its existence in a repo is a footgun for new contributors
Why so? It makes a lot of things simple
pre-commit wants you to install it as a pre-commit hook, which makes actually contributing to the repository awful (git commit becomes super slow, randomly doesn't work if you have a linting problem, etc).
Everyone I know who works on a project that uses pre-commit says "oh I never install the hooks, that would be awful, I just manually run it".
Except that requires knowing that you can do that. If you follow the setup instructions on the pre-commit page, they treat installing the hooks as a mandatory step, and manually running command as some optional thing. Which means that someone who isn't familiar with the tool, just following along the setup instructions is almost practically guaranteed a frustrating experience.
I'm very much not a fan of tools where the "golden path" has problems like that, and seemingly nobody actually uses it that way. It just ends up being a trap that ensnares unsuspecting people
Everyone I know who works on a project that uses pre-commit says "oh I never install the hooks, that would be awful, I just manually run it".
I usually install the hooks, the first run is slow, but subsequent ones should be quick: it's important not to put very slow things in there
(Actually the first time I ever encountered pre-commit I just quickly ran through the setup instructions and didn't pay attention to what it was doing, so I didn't notice it was installing git hooks, as soon as I noticed that the project I was trying to contribute to's dev setup broke my git commit using that, I got mad enough I just deleted all my local work and moved on instead of contributing the fix I had for a bug)
pre-commit wants you to install it as a pre-commit hook, which makes actually contributing to the repository awful (git commit becomes super slow, randomly doesn't work if you have a linting problem, etc).
if you don't want to install it locally, it's useful as a pinning mechanism for the CI, the pinning should help avoid random linting problems
we already pin our linting
okay, that's good!
one nice thing about pre-commit is it has a command for auto-updating the pins
all our dependencies are generally pinned, so it seems more useful to have tooling that works generally across all of them. Lint dependencies don't really feel special to me
FWIW, even if you remove the speed aspect of it, I also just generally think linting as part of git commit is fundamentally the wrong thing to do
it frustrates me for the same reason go erroring out on unused variables frustrates me
it assumes that the only state code can be in is a final state
and there are no interim states
personally I don't mind the "fail early" aspect of it, but totally agree it's not for everyone.
and even if using pre-commit, if you don't like that aspect, you don't need to install it locally and can let the CI deal with it
another thing I personally like: if also using pre-commit.ci, it can send updates to PRs with lint fixes
for example, a contributor comes along and they didn't run Black. pre-commit.ci runs Black and updates the PR for them
@unreal jewel thanks for sharing! I had no idea it was that polarizing - I've had an entirely different experience
I've enjoyed using pre-commit so much, and noticed that some other repos in pypa-land use it, so I thought I'd ask opinions
Yea it was one of those that I happened to trash my entire local repo out of frustration
I also am not a huge fan of having ci automatically change peoples PRs
I’ve seen that before and it ends up causing merge conflicts when I’ve added more changes locally
Which isn’t the end of the world, but frustrating
yeah, to me the ci automatically changing PRs is likely to cause confusion to users who aren't experienced with git
like, it's not great to expect the user to know what to do if they made additional changes locally after some automatic tool pushed new change to their PR
oh wow nice, I thought I was the only one that strongly disliked pre-commit
git pull --rebase is (probably?) the best option in most cases but I'm guessing it might not be the first option a new user may find when trying to solve this
and it can introduce conflicts anyway
I dislike CI autofixers, I like pre-commit 😄
But (I think) I make it clear in contributing guidelines that it's optional to install
and also tell people how to run it separately
though looking at it, the current phrasing I have is:
(optional but recommended) Install pre-commit hook which automatically ensures that you meet our style guide when you make a commit:
which does still say it's recommended
My big thing is the more stuff you cram into pre commit the less optional it is
but at least I describe later on how to check:
If you've done the optional step of installing a pre-commit hook in the "4.1 Setting up your development environment" section,
you actually don't have to worry about anything as all of these style checks are ran automatically whenever you make a commit. However, if you chose not to, you can:
- run all hooks on currently staged (git added) files with: pre-commit
- or run all hooks on all files with: pre-commit run --all-files
It also tends to break editor integration from what I can tell unless you’re very careful to never configure your linters via pre commit
It is entirely optional though, you can run the checks without installing it as a hook. The tool just combines it into a single command the same as tox/nox do
I guess I might nuke that step after reading this discussion though
since I already describe how to run it separately
Sure but what’s the value add over nox or tox? Neither of those have the problem where unsuspecting users might accidentally install it as a hook
I personally like having the commit hook
so it allows me to do that
and tox adds a lot of overhead
so it's unsuitable as a hook
how so?
speed-wise
it's not going to run in less than a second
while ensuring consistent environment
🤔 in both cases a virtual env is used
it's not designed for running as a pre-commit hook
pre-commit runs on staged files
Hatch (and I've heard tox4) is as fast as you want
tox 4 is not released :)
But yeah, it's not that it wouldn't be possible to do this with enough effort. But it's not going to be as simple as me running pre-commit install after cloning a repo
like, do you even have functionality to run only on staged files
I’m glad that workflow works for you
or is it another thing I would have to implement myself
oh ye, sorry, this is #pypi channel 
It’s completely opposed to how I want tools to work tbh
But cool that it does work for you
Yeah, to me it's just that pre-commit install is an optional step
I don’t think I’ve ever had a positive experience trying to use a pre commit hook in git
No idea why those two experiences are so different
I hate anything that either slows down commit or prevents me from making less than perfect commits
So you can just do pre-commit run --all-files, same as you would tox lint or hatch run lint or something else, while also giving an additional benefit to those who do want to have pre-commit hooks
I often make wip commits for instance
Sure but that requires me to think about whether my commit is perfect or not
Or I git commit
It fails
Then I get mad
And i do the just commit the damn thing flag
I may not do it enough for this to be a problem but when I do it, I'm fine with --no-verify. But it does sort of depend on how many things you put in pre-commit configuration. Formatting with black, isort-ing, spurious whitespace are the sort of things I would want to do before committing anyway.
But I recognise it's not for everyone which is why I list pre-commit run --all-files
I also often times comment out large blocks of random code and that tends to make linters really mad
Anyhow, I actually came here to say I dislike CI autofixing as well, I kind of went on to discussing this which wasn't really my intention but ehh, it happened
I mean it’s cool tbh it’s interesting to see other perspectives
I don’t care if other people use it
I’d just prefer warehouse doesn’t because then I’ll end up having to use it 🙂
I consider it a replacement for running each auto-formatter separately with the pre-commit hook being a nice (optional) bonus
Thanks for the conversation! It was in the back of my head, not remembering that someone had already started this effort, and I’d been reviewing it all along. 🤦
Here’s the context of what the proposed implementation looks like so far https://github.com/pypi/warehouse/pull/11309
What I want (and may be coming in tox4? Or, may be possible now & I've missed it?) are tox meta-environments. Declare [env:lint] meta = flake8, mypy, interrogate and then tox -e lint runs those three environments.
Yes this isn't a thing just yet
Signal boosting: https://twitter.com/samuel_colvin/status/1575853817903869953
@carltongibson @pypi The owner of the pydantic-settings package has kindly agreed to transfer it to pydantic, but he needs to reset his security key.
He says he emailed the @pypi team on 6th of September, but hasn't got a response. So can't log in to transfer the project.
Thanks for the boost. We have a big backlog of account recovery requests and are quite aware of delays here. We're working on some things to improve this but don't have additional details right now.
Hey @merry valve I'm from Sourcegraph and we currently index ~4600 Python packages. We would like to increase that to 404k in the near future.
With that said, we want to be able to work together so we don't hammer your origin servers.
I attached our current config and was wondering if you could help provide a URL/endpoint that would be able to handle the large amount of requests we would be making.
Thanks
why not have a mirror?
Yeah seems like you'd want a bandersnatch mirror you run yourself
It's pretty hard to hammer /simple/ hard enough that we notice, that page is cached really aggressively and has limited variants, but in general limiting concurrency is a good thing.
the /pypi/*/json endpoints are OK, but you need to limit concurrency or your overall rate, especially if you're hitting version specific URLs for all of PyPI or something like that.
The XMLRPC APIs are really bad and you should do whatever you can to not use them, but if you have to use them, you should do so really slowly
thanks all! @unreal jewel this is great info, I'll keep you posted.
What Donald said! Also, please put something identifiable in your user agent so we can know who to contact if there's issues
will do @merry valve
Yea, if we can't find a way to contact you from the requests you're making, and you're causing problems we will just block your IP addresses and wait for you to contact us 😄
Shoot, I just remembered that Poetry uses the default requests User-Agent 😆
Should probably fix that sometime
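As a small illustration of the "identifiable user agent" advice, here is a sketch that builds (but does not send) a JSON API request with a custom `User-Agent`. The agent string and the contact URL are placeholders, and `json_api_request` is a hypothetical helper name.

```python
import urllib.request

# Hypothetical identifier; use your own tool name and a way to contact you.
USER_AGENT = "example-indexer/0.1 (+https://example.com/contact)"


def json_api_request(project: str) -> urllib.request.Request:
    """Build (but do not send) a request for a project's JSON API document."""
    url = f"https://pypi.org/pypi/{project}/json"
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


request = json_api_request("twisted")
```

Sending such requests one at a time, or through a small worker pool, is the simplest way to also follow the concurrency advice above.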
from Twisted mailing list: why is there foreign language? https://pypi.org/project/Twisted/22.10.0rc1/
Hey everybody! Currently doing a release for the PyTorch project and running into some issues uploading binaries. Turns out our binaries are a bit too big and we are in need of a size increase, is there anyone who can help expedite this to unblock our release?
Would be much appreciated!
I'm not sure where this occurs, but where exactly in the build/upload process does a Project-url with a home-page label become the homepage (without a hyphen) on pypi?
is that label special cased in build tools, or does pypi change it from home-page to homepage?
Home-page is a core metadata field
So PyPI transforms it into Homepage keys in the urls dict
so where would it be provided in a pep 621 configuration, and as what label?
I think it would be named Homepage in the [project.urls] toml dict
those will be deprecated iirc like https://packaging.python.org/en/latest/specifications/core-metadata/#home-page
Hatch's setuptools migration script https://github.com/pypa/hatch/blob/c879e33400fd25ce259d7bfde00fbf7c1c2ee8bb/src/hatch/cli/new/migrate.py#L241-L249
yeah but i mean
one sec
is there any benefit to continuing to provide Homepage in [project.urls]?
yes PyPI uses that table
yeah
pyproject.toml/pep 621 only support urls with that table
but any benefit to have the url with the label Homepage?
the only reason i can think of having it is for when packages want to look at the metadata of a package and get the "main" url
it has a neat icon https://pypi.org/project/hatch/ totally up to you
ohhh that's what this is
they special case discord??? https://github.com/pypi/warehouse/blob/28623ce26e3d5adb3be614c37245156031fdf476/warehouse/templates/packaging/detail.html#L38-L39 til
thank you, ofek!
While we're talking about it: i started specifying URLs like this, because why should they have their own section?
[project]
name = '...'
urls.Homepage = '...'
urls.Source = '...'
From TOML side, it’s the same thing
Of course, that's why it's an option.
Hello 👋 Wondering if there's an up-to-date index (JSON, CSV, or otherwise) of all packages in PyPI?
I currently can’t accept a project invitation, the error is simply “something went wrong, please check the PyPI status page” - any info on that? Status says everything is operational
Is there a reason that yank permissions require an owner? A maintainer can make a bad release and can’t yank it if it’s bad. I understand delete permissions being a bit stronger maybe, but yanking is pretty safe?
This happened on ninja today. It’s busted and we likely can’t get an owner to yank till Monday.
While I am a maintainer and the person who made the release is a maintainer too
Is it normal for a PyPI release to not have any files attached to it?
https://pypi.org/project/google-oauth/1.0.1/
https://pypi.org/pypi/google-oauth/1.0.1/json
(When we look at the 1.0.0 release, it has a 1.0.1 sdist: https://pypi.org/project/google-oauth/1.0.0/#files ... weird)
It used to be the norm before PyPI merged the release creation and file upload API
Before the change you used to need to first create an empty release and then upload files for it
Thanks @serene fern, I see: the devs might have forgotten to upload the file.
PyPI used to point to externally hosted files, this could have been one of them
https://peps.python.org/pep-0470/
this is good history of PyPI aka the Cheeseshop:
https://youtu.be/AQsZsgJ30AE?t=804
I still lament the loss of the Cheeseshop name.
please 👀 / ✅ https://github.com/pypi/warehouse/pull/11380
Hi folks, is there a place where I can find some upload stats? Such as the number of new packages submitted per hour?
@lament needle You can use the BigQuery public dataset for this: https://warehouse.pypa.io/api-reference/bigquery-datasets.html#project-metadata-table
For example, here's uploads per day for the last 30 days:
-- group by calendar date rather than day-of-month, so days from different months don't merge
SELECT
  DATE(upload_time) AS d,
  COUNT(*) AS c
FROM
  `bigquery-public-data.pypi.distribution_metadata`
WHERE
  DATE(upload_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  AND CURRENT_DATE()
GROUP BY d
ORDER BY d
@merry valve, thank you! I will play with it.
if i wanted to download as many metadata files (json is also good) as possible, how would i best go about that?
additionally, if i also wanted to download the metadata files for the n most popular packages (where n is maybe 10000), what would be the easiest way to do that
Depends on what you're doing. Do you just need the metadata, or do you need the actual files?
just the metadata
How do you measure 'popularity'?
We have a public BigQuery dataset with all metadata from every distribution on PyPI: https://warehouse.pypa.io/api-reference/bigquery-datasets.html
the exact metrics isn't important, "downloads last month" probably
We also have a dataset that collects download metrics, some examples of analyzing that are here: https://packaging.python.org/en/latest/guides/analyzing-pypi-package-downloads/
i want to test code i wrote and also collect some statistics about what edge cases are actually used in the wild
i'm aware of the bigquery sets, but can i actually use them to download large batches of metadata?
What exactly do you mean by 'metadata'? Usually this refers to things like the project name, version, classifiers, etc.
(i've previously used bigquery for version numbers, it's really neat that this exists!)
PEP 508 metadata
All of which you can get from a BigQuery query
so you want the Requires-Dist field from the BigQuery dataset then: https://packaging.python.org/en/latest/specifications/core-metadata/#requires-dist-multiple-use
is it expected that mirroring name, version, filename and requires_dist from the the-psf.pypi.distribution_metadata dataset takes ~1h?
i'm just using the default client.list_rows from the bigquery python client:
from pathlib import Path
from tqdm import tqdm
import json
from google.cloud import bigquery

client = bigquery.Client(project="jupyter-local-project")
table_id = "the-psf.pypi.distribution_metadata"
table = client.get_table(table_id)
selected_fields = [
    field
    for field in table.schema
    if field.name in ["name", "version", "filename", "requires_dist"]
]
with Path("pipy_requires_dist.ndjson").open("w") as fp:
    rows_iter = client.list_rows(table_id, selected_fields=selected_fields)
    for row in tqdm(rows_iter):
        fp.write(json.dumps(dict(row)))
        fp.write("\n")
(i want to parse the marker fields, so i can't do it in sql)
the script spends most of its time idling, io/network are not a bottleneck
total table size is 8,165,801 fwiw
I'm not a BigQuery expert but I'm not sure that list_rows is the best choice here, you're making a lot of round trips. Maybe try writing the result of the query out to storage and then downloading it as one file instead? https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.extract_table
i kinda wanted to avoid setting up cloud storage/job handling/etc just to fetch those rows
what's confusing me is that i'm not even making a lot of round trips, i always spend many seconds until the api send the next batch, which is then quickly downloaded and written
drive by comment: the layout shift of pypi.org because of the banner on each page is a bit annoying. I always move my mouse into the wrong place initially because things shift away
Would it be feasible for pypi to indicate whether the metadata of all wheels is consistent? E.g. if i check https://pypi.org/pypi/numpy/1.2.0/json, it would be nice to have a wheel_info_consistent key that is true if all wheels (and with metadata 2.2 also the sdist) have the same metadata as the info key
Feasible yes, but first PyPI needs to learn how to extract metadata from a wheel (which is in progress but stalled a bit due to security considerations, see attached link).
https://github.com/pypi/warehouse/pull/9972
And if you want that functionality in the API response you’d want to write a PEP
i see, thanks!
(there is interest in that, FWIW)
I think we'd probably want to pursue doing this 'the right way' instead: https://github.com/pypi/warehouse/issues/8090
(RE: metadata consistency)
Why is this a prerequisite? PyPI already gets per-file metadata on upload.
tbh if i write a PEP i'd probably work on fixing and evolving PEP 440 / PEP 508
What do you wanna change about them?
- guy who just read through them a lot more than he'd like a couple of weeks ago
for pep 440, i've written that down in the second part of https://cohost.org/konstin/post/514863-reimplementing-pep-4
REIMPLEMENTING PEP 440
I've reimplemented PEP 440 [https://peps.python.org/pep-0440/], the python version standard, for monotrail [https://github.com/konstin/poc-monotrail]: pep440-rs [https://github.com/konstin/pep440-rs]. Did you now that 1a1.dev3.post1+deadbeef is a valid python version, there's not only == but also === and that version spec...
For PEP 508, fix the grammar in the main document (the parsley grammar seems correct, but parsley is unmaintained and imo the main text grammar should be the reference)
I have a PR open on packaging.python.org to fix some of the PEP 508 errors
Nice writeup! I love this line:
So python has a hard time doing modern packaging because they were trying to do modern packaging before it was being invented.
Also for PEP 508, restrict the marker grammar to essentially wsp* env_var wsp* marker_op wsp* python_str | wsp* python_str wsp* marker_op wsp* env_var
on a related note, packaging and thus pypi seem to allow the non-PEP markers os.name, sys.platform, python_implementation and platform.python_implementation (that's an exhaustive list from the entire ~8 million releases; thanks again @merry valve, being able to use bigquery is great)
Where are these numbers from:
For comparison, firefox estimates 16–20 minutes for the semver spec, but 57–73 minutes for PEP 440.
Is that reading time?
yes, this reader icon
nods.
was the quickest proxy i could find, copying and counting words doesn't work well for these kinds of docs
Yea, makes sense tho. PEP 440 does go into a lot more detail + discussion.
Yea, those come from another PEP number (packaging's parser has a comment with the number)
sure, and we have more historic complexity to handle, but i still think it could be simplified quite a bit, both the content as well as the text
Yea, that would be nice.
Moving to packaging.python.org was the first step for that. :)
oh thanks, will check
yeah i've seen that and i'd love to make that happen
PEP-345
i'm actually currently figuring out how to support both strict-PEP and a modern version with ^ and ~ in my code
thanks! what's their status though? i can't find them in https://packaging.python.org/en/latest/specifications/core-metadata/, so are they like soft-deprecated?
re PEP 508, i'd also like to define clear rules which fields are PEP 440 and which are stringly typed. If i define implementation_version, python_version and python_full_version as PEP 440 and the remainder as stringly i see a lot of comparisons that are mostly accidents or will often not do what was intended
Things like platform_release < '12.0', platform_machine >= 'armv0l' or python_version < '3.8.'
imho pypa/packaging should at least warn when there's a marker such as python_version < '3.8.' that is clearly a typo
and there's numpy==1.2; python_version >= '3.4, <3.7' which is kinda fine but also a challenge to parse
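To make the "stringly typed" problem concrete, here is a minimal sketch in plain Python. The helper as_version_tuple is hypothetical and deliberately naive (it is not PEP 440 parsing and not what pypa/packaging does); it only contrasts character-by-character string comparison with a numeric comparison.

```python
def as_version_tuple(text):
    """Naive dotted-number parse, for illustration only (not PEP 440)."""
    return tuple(int(part) for part in text.split("."))


# Plain string comparison works character by character, so "9" vs "1"
# decides the result before the rest of the string is even looked at.
string_says = "9.0" < "12.0"
numeric_says = as_version_tuple("9.0") < as_version_tuple("12.0")

# The trailing-dot typo silently "works" as a string comparison,
# while a numeric parse rejects it.
typo_as_string = "3.10" < "3.8."
try:
    as_version_tuple("3.8.")
    typo_rejected = False
except ValueError:
    typo_rejected = True
```

Here string_says is False while numeric_says is True, and the typo'd marker claims 3.10 is older than 3.8 instead of failing loudly.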
two very small steps towards that pending review: https://github.com/python/peps/pull/2898 and https://github.com/pypa/packaging.python.org/pull/1181
I don't think it's particularly worth trying to remove stuff from PEP 440
the pain of extra stuff is mostly borne by a handful of people who are writing packaging tools, and typically even a smaller set of people who are writing the version parsing stuff
the pain of removing stuff is borne by the wider ecosystem, and the benefit feels nebulous to me.
Also, removing stuff is harder than adding it, since people rely on it
Oh! I didn’t know it already has full metadata access. In that case the metadata PR should not do the extraction…
Is there a reason PEP 658 wants metadata to be served as a file? Is it just because it predates PEP 691? If so, if we put metadata in a PEP 691 response instead, would that work for pip?
We'd still need to solve https://github.com/pypi/warehouse/issues/8090, otherwise the metadata might be subtly wrong if it differs between distributions for a release.
pip has not implemented PEP 691 yet (if I recall correctly) so that’s needed. PEP 658 puts metadata in a different endpoint (separate file) simply because putting everything on the simple index page would make the response much too large for projects that have a large number of files on the index (e.g. numpy)
It does support 691: https://github.com/pypa/pip/pull/11158 🙂
Yeah, I just mean the JSON response, and omit it from the HTML response.
Right… I forget again
That’d work
Well
PEP 658 vs PEP 691 is an interesting thing to consider
I don't think PEP 691 definitely kills 658
Like there's 0% chance that we put everything that PEP 658 exposes into the simple index, since it includes stuff like the entire long description for every file, etc.
So something we have to decide is whether the extra information that 658 exposes is still useful, or if the only thing that really matters is the limited subset that pip would want in PEP 691
even in that case of that limited subset, it'd still be useful to determine how much that would bloat the PEP 691 response (with and without gzip), and whether having a larger response is better or worse than multiple smaller responses
I do think we need to fix more than just the way Warehouse stores metadata, the source of metadata inside Warehouse does not have to match the metadata inside the artifact itself
IOW, you can have metadata on Warehouse that thinks it depends on X, but when you read the metadata inside of the artifact, it says it depends on Y
If nothing else, PEPs 503 + 658 is still a much easier solution for alternative indexes
But for PyPI specifically I think it makes sense to only expose what’s relevant for pip
I was looking at https://peps.python.org/pep-0629/. What is the scope of changes that would be considered for a version change? is it down to css class names or just tag structure of HTML document?
Ah, so that's where you found this post 👀
There's not a set list, it basically comes down to making a decision when we make the change... but roughly speaking if existing clients are expected to still understand it in a meaningful way (even if the meaning has changed somewhat), then it would be a minor change
if clients can't understand it in a meaningful way anymore, then it's major
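From the client side, that major/minor split could be handled roughly like this. It is a sketch of how I read PEP 629 (a missing version is treated as 1.0, a newer major version is a hard failure, a newer minor version is a warning); the names SUPPORTED, UnsupportedRepositoryVersion and check_repository_version are made up for the example and do not come from pip or any other tool.

```python
import warnings
from typing import Optional, Tuple

# Highest repository version this hypothetical client understands.
SUPPORTED: Tuple[int, int] = (1, 0)


class UnsupportedRepositoryVersion(Exception):
    """Raised when the index declares a major version we cannot parse."""


def check_repository_version(
    declared: Optional[str], supported: Tuple[int, int] = SUPPORTED
) -> None:
    # A response without a declared version is treated as 1.0.
    major, minor = (int(part) for part in (declared or "1.0").split("."))
    if major > supported[0]:
        # Major bump: the response can no longer be understood meaningfully.
        raise UnsupportedRepositoryVersion(
            f"index speaks {major}.{minor}, client supports up to "
            f"{supported[0]}.{supported[1]}"
        )
    if major == supported[0] and minor > supported[1]:
        # Minor bump: still understandable, but newer features may be missed.
        warnings.warn(f"index speaks {major}.{minor}, some features may be ignored")
```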
hello! is a files.pythonhosted.org url to a specific wheel stable?
As in, will the URL change?
yup!
I am no expert, but I'd say that this is what Simple API was made for. It provides you direct links to packages
We don't make any guarantees but generally they don't change (and when they do we provide redirects)
The Simple API will always have the 'most accurate' file URLs for a given release, though
I know PDM generates lock files with the pythonhosted URLs: https://github.com/pdm-project/pdm/blob/main/pdm.lock
practically speaking we've never broken a pythonhosted URL that was linked from a simple index page unless the file was deleted
but as di mentioned, that's not a promise in our API structure
I was looking through the issues, but didn't find any answer. Is there any plan/feature request to add search API? basically nothing PEP-based, just the same output website search has, but in JSON for it to be easier to parse? In theory one might take Simple API index page, but for PyPI it's like ~23MB, so even the downloading could be slow on lower bandwidth. Another option would be to parse search results HTML page, but that's either requiring one of those big libs for HTML parsing, writing parser based on html.parser package or (with proper offer to dark forces) regexing that stuff.
for visibility https://github.com/pypi/support/issues/2545
thanks
does anyone know where requires_dist comes from for a .whl file in the pypi BigQuery dataset? Is it from the wheel's metadata?
Could different .whl files for the same package+version have different requires_dist values in the dataset?
Yes and yes
great, thanks
if my query is correct, out of 310,000+ latest versions of packages that ship wheels, there are 275 that ship wheels with different sets of requirements
You can rely on the URL redirects. I use those in Arch Linux PKGBUILDS, because one can’t use a dynamic API in those ones, and it’s annoying to manually use that API. It’s explained here: https://wiki.archlinux.org/title/Python_package_guidelines#Source
E.g. this is the URL template for wheels: https://files.pythonhosted.org/packages/py3/${_name::1}/$_name/${_name//-/_}-$pkgver-py3-none-any.whl
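For anyone not fluent in bash parameter expansion, the same template can be sketched in Python. The function name pythonhosted_wheel_url is made up for this example, and like the template above it only covers pure-Python py3-none-any wheels; it relies on the redirect behaviour described above rather than on any documented guarantee.

```python
def pythonhosted_wheel_url(name: str, version: str) -> str:
    """Build the redirecting wheel URL from the PKGBUILD-style template."""
    first_letter = name[:1]             # ${_name::1}
    file_stem = name.replace("-", "_")  # ${_name//-/_}
    return (
        "https://files.pythonhosted.org/packages/py3/"
        f"{first_letter}/{name}/{file_stem}-{version}-py3-none-any.whl"
    )
```

For a hypothetical project some-pkg at version 1.0 this gives https://files.pythonhosted.org/packages/py3/s/some-pkg/some_pkg-1.0-py3-none-any.whl.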
Announcing the launch of blog.pypi.org
is there a preferred/dedicated way to check whether there are new releases or new files for a larger number of projects compared to the local cache (i.e. an entire lockfile)? Currently i'm sending a lot of parallel requests against https://pypi.org/simple/{project}/?format=application/vnd.pypi.simple.v1+json with If-None-Match but it feels wasteful making 100 requests for that
I'm reading https://warehouse.pypa.io/api-reference/ and trying to make sense of the following line
Requests to the JSON, RSS and Legacy APIs also provide an ETag header. If you’re making a lot of repeated requests, ensure your API consumer will respect this header to determine whether to actually repeat a request or not.
I understand this provides an ETag header, but I'm not sure how to provide this header to the API when I make a request
it just seems to always return a 200
in addition, the releases key is deprecated from the json API, but the simple api in json form does NOT provide nearly the same amount of information
Have you tried the If-None-Match request header?
Seems to work for me:
$ curl -i -H 'If-None-Match: "vthI01QTzZkEAMR/jk8Lug"' https://pypi.org/pypi/installer/json
HTTP/2 304
date: Wed, 29 Mar 2023 22:19:50 GMT
cache-control: max-age=900, public
etag: "vthI01QTzZkEAMR/jk8Lug"
x-served-by: cache-hhn-etou8220068-HHN
x-cache: HIT
x-cache-hits: 1
x-timer: S1680128391.952514,VS0,VE1
vary: Accept-Encoding, Accept-Encoding
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-frame-options: deny
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-permitted-cross-domain-policies: none
Yea, that's a conditional http request
HTTP has a concept of conditional requests, where the result, and even the success of a request, can be changed by comparing the affected resources with the value of a validator. Such requests can be useful to validate the content of a cache, and sparing a useless control, to verify the integrity of a document, like when resuming a download, or ...
it can depend on ~things too
now I just have this problem
if the information you want is only available in that format, then you don't have a choice
we're not planning to remove that currently, but it was a problem on other responses so it's a warning that if you can find another way to get that information that you might be better off doing that
noted, begins to scrape the user facing html pages
😦
That's how the simple API got invented
so you're standing on the shoulders of giants
I have to say, that documentation fragment is written kinda weirdly.
Requests to the JSON, RSS and Legacy APIs also provide an ETag header.
Surely it's the responses that have the ETag header, not the requests?
If you’re making a lot of repeated requests, ensure your API consumer will respect this header to determine whether to actually repeat a request or not.
You can't actually determine whether to repeat a request based on the ETag, can you? You can make a conditional request, but that's still a request.
ye
my initial implementation was lazy so i just cached all of my requests locally for two hours
though that leads to issues when a package releases an update: the information doesn't show up for up to 2 hours
(for those that don't know, originally there were no installers, PyPI was just a bunch of user facing html pages for people to manually look at, then easy_install came along and started scraping those pages, then that started using a lot of bandwidth, so they made a "simple" html page that easy_install could scrape instead with just the links)
also for awhile there was a commented out <th><tr> in the simple api because easy_install used a regex to parse html and required that to work
what http library are you using?
aiohttp?
aiohttp, but for caching its my own custom implementation with an LRUCache
I'm refactoring it right now to use etags and redis
as i've done with all of my github related requests
the traditional way to implement conditional requests is to cache the response, and if the conditional request returns a 304, turn that into a 200 using the cached response
yea
I haven't used aiohttp, but for requests I've used https://pypi.org/project/CacheControl/ which does things correctly. If you're like me and find working examples helpful, that might be a useful thing to poke at.
ah i was considering writing it myself
so i can use redis and whatnot
wrap the existing aiohttp _request method with a wrapper method that simply checks for etag headers and sets them on request
Yea, basically you check if responses have an ETag, and if they do you store them for later use, then when later comes you check if you have a cached response, if so you add the If-None-Match, and if you get a 304 you just re-use the cached response, and if not you use the new response, caching it if it has an Etag. Then your client code doesn't have to think about it, so it's a nice abstraction
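A minimal, library-agnostic sketch of that flow, in case it helps. The names ETagCache and fetch are hypothetical: fetch stands in for whatever actually performs the request (aiohttp, requests, ...) and is assumed to return (status, headers, body). A real client would also want case-insensitive header lookup and an eviction policy, both left out here.

```python
from typing import Callable, Dict, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]
Fetch = Callable[[str, Dict[str, str]], Response]


class ETagCache:
    """Wraps a fetch function and transparently handles ETag revalidation."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        # url -> (etag, original response headers, original body)
        self._store: Dict[str, Tuple[str, Mapping[str, str], bytes]] = {}

    def get(self, url: str) -> Response:
        request_headers: Dict[str, str] = {}
        cached = self._store.get(url)
        if cached is not None:
            # We have a cached copy: ask the server whether it is still current.
            request_headers["If-None-Match"] = cached[0]

        status, headers, body = self._fetch(url, request_headers)

        if status == 304 and cached is not None:
            # Not modified: present the cached response as a normal 200.
            _, cached_headers, cached_body = cached
            return 200, cached_headers, cached_body

        etag = headers.get("ETag")
        if status == 200 and etag:
            # Fresh response with a validator: remember it for next time.
            self._store[url] = (etag, headers, body)
        return status, headers, body
```

The calling code only ever sees 200s with a body, so it doesn't have to know whether the answer came from the cache or from the network. Keeping the original headers alongside the body covers the case where the caller needs to inspect them later.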
hm although i might need to cache the original headers as well in the event that there was something to check 
@unreal jewel thx
it mostly works 
aside from header finagling
@fleet grove https://gist.github.com/konstin/fdfd1790b25d100566c6664983f546cc this is what currently successfully does etag/if-none-match in my prototype
I'd use httpx and httpx-cache TBH (https://obendidi.github.io/httpx-cache/guide/)
Simple caching transport for httpx.
if it was a new project I would, but I already use aiohttp, partially because it's already used by a library internally
oh, I didn't know this one. I was looking for caching library for httpx and only found httpx-caching which didn't look maintained
ah i just looked more in-depth and that doesn't do etag caching
Bit of an internal discussion that we're looking for clarification on if anyone has some insight into the lower levels of how warehouse/pypi works.
- A malicious package is uploaded under the name package-a and version 1.0. The package is reported and removed appropriately.
- Since names cannot be recycled (AFAIK), package-a and version 1.0 cannot exist anymore on PyPI.
- Does this mean that package-a can exist on version 2.0 even if the package was removed, or does it need to have a distinctly unique title from that point on, since the version can no longer be updated without a reupload?
This documentation page is broken: https://docs.pypi.org/trusted-publishers/using-a-publisher/
can file an issue later
Thanks! Will fix this now
alright, no need for an issue then I presume?
(thanks for the beta invitation btw)
no, I'll just fix it
In relation to this, it says that the risk can be mitigated using a dedicated environment but I think this is not currently possible? It doesn't seem like PyPI checks environment name during OIDC request (though it seems it could, based on OIDC documentation from GitHub) so it would seem that a committer could still modify the publishing workflow to simply not use that environment?
I do use a dedicated environment with manual approval but it seems like switching to OIDC would mean that the risk is increased in that regard since one could circumvent the environment while currently they can't since the token is only present in the environment that needs to be approved.
alright, seems like I found the relevant issue for this:
https://github.com/pypi/warehouse/issues/13270
Yes, looks like https://github.com/pypi/warehouse/pull/13272 fell off our radar, the goal is to make that available before the end of the beta.
@merry valve thanks for the oidc beta invite. I just made a pip-deepfreeze release with it. Everything went smoothly. 👍
hi, where can one check why a package was removed from PyPI (in this case "codecov")?
Added a "Report Malicious Package" button to the PyPI Package Inspector!
https://github.com/pypi/inspector/pull/93
what is the recommended way to upload a package for the first time?
as far as I understand one has to use the account password or create a temporary token with permissions for everything
is there another option?
There is another option, but it's in beta right now: https://docs.pypi.org/trusted-publishers/
oh well that is awesome, I didn't fully understand it before now
on the sign-up form for GitHub username should I put the organization?
You should put your personal GitHub username -- we'll use this to invite you to a private repo to discuss the beta
What do you guys think about proactive detection of known malicious code in pypi?
I don't want to keep bothering @merry valve with more reports 😅
the problem with proactive detection typically comes down to the false positive rate
atm PyPI relies on third parties to report malware, so that they do the work of sorting through the false positive rate in whatever means they're using to detect malware
Do you know what the current approach is for detecting this?
if there is one
Not sure if I understand the question, but there's some more context here: https://twitter.com/di_codes/status/1562160283178745858
A bit of context here...
The 'malware check' mentioned in the article isn't actually used on @pypi.
It was created as a proof of concept for a prototype detection system, and yeah, it's noisy (many false positives), but it also misses things & is hard to make better.
Why?
That's exactly what I was looking for, thanks!
There's some really interesting stuff in that thread.
Not exactly malicious per se, but a simple regex that flags some package names would fix a lot of the autogenerated clutter.
Like anything ending in _robux or whatever the Roblox currency is called. Same for the Fortnite currency and others.
The number of false positives would probably be very low, from a quick test of mine it would currently be zero. Even false positives in the range of 1-10 could easily be handled via support tickets.
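Roughly what I have in mind, as a sketch only. The suffix list is an assumption for illustration ("vbucks" is my guess at the other currency), and looks_like_currency_spam is a made-up name, not anything that exists in warehouse.

```python
import re

# Flag names whose last segment is a game-currency keyword.
# The keyword list is illustrative and would need curating.
SPAM_SUFFIX = re.compile(r"[-_.](robux|vbucks)$", re.IGNORECASE)


def looks_like_currency_spam(name: str) -> bool:
    """Return True if the project name ends in a flagged suffix."""
    return SPAM_SUFFIX.search(name) is not None
```

Requiring a separator before the keyword keeps a name like robuxlib from matching, which is part of why I'd expect very few false positives.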
Yeah haha. I have been reporting the same badly written piece of malware for the past week.
And there's like ten more after that.
And I just figured that there had to be some way of automating this.
This guy just released another one: pipcryptographylibaryV2
To circle back on this, myself and a team of individuals maintains an automatic scanning suite. I was out of the office today and it... appears nothing got reported.
wew
@rare wadi unclogged the pipe, the juice is flowing again.
Would there be any way to speed up registration process for the trusted publisher beta for a maintainer (username: Kowlin) that has access to a project that already has it enabled (I enabled it on few such projects since I already have access)? Trying to see if we can avoid bus factor of 1 for the time being
Done!
@merry valve I finally got around to reading that Twitter post you threw up. We have access to OSSF/Backstabbers/Chainguard malware collections and are working on consolidating information. Do you locally mirror any of the packages we report, or is that something we should be building up in our repository so we can expand the current malicious package datasets for research purposes? Our false positive rates aren't... nearly what Chainguard seems to be implying (though they're still quite high, roughly 15-20% if I had to guess.)
I have an interest in sharing what we're doing, because I believe it's wildly effective, but I have no context to what companies like Chainguard/Phylum/Snyk's report rates are with your org.
We keep essentially everything uploaded to PyPI. You'll note that inspector.pypi.io links from malware reports still work, that's because the underlying file is still available on files.pythonhosted.org. We don't do a great job of linking a malware report -> project -> files on PyPI currently so it's non trivial to come up with that "dataset" but it's not impossible and I do hope to do that eventually.
Actually, that's helpful enough. We have the list of known-bad packages. So long as the items still exist, we can likely query them effectively I believe. That might be a priority in the coming days.
See also https://github.com/pypa/advisory-database/issues/45 and https://github.com/pypa/pip/issues/5777 for some somewhat-related discussions
That might be something that we can effectively generate, but our false positive rate would have to go down substantially if we were using our locally provided index of malicious packages as opposed to any single field queried by the warehouse API itself.
I shall ponder.
When a package is uploaded, even if it's the first 'instance' of that package appearing on PyPI, does that package still appear on the RSS feed for recently updated packages? 
Antimalware team "Try not to break the automated mailing system for one consecutive day" challenge. Can the team do it? Find out next time on Open Source Security Z.
(My heartfelt condolences to Dustin and crew for dealing with our scuffed systems.)
just stumbled across this repo: https://github.com/pypi-data/pypi-json-data
There is also https://github.com/sethmlarson/pypi-data (which I trust more because of the maintainer)
pypi-data is maintained by Tom Forbes (GH orf), he’s a pretty active community member as well, but yeah it’s hard to beat Seth
anyone aware of sensible ways to list only packages starting with a prefix instead of all of them? also do any of the pip data libs support caching by last serial?
(i want to reduce the size/number of http requests the pytest plugin list updater makes without having to invent anything)
it seems tho, as if just using requests-cache already is enough to avoid any more issues for now
There is no such API, sorry
Trusted Publishers launched!
https://twitter.com/pypi/status/1649155995954823169
Starting today, PyPI package maintainers can adopt a new, more secure publishing method that does not require long-lived passwords or API tokens to be shared with external systems.
https://pypi.org/project/dj-test-queries/
[ character is throwing the error at the end of the description.
(The [ is fine, it's the Unicode ESC before it)
https://cdn.discordapp.com/attachments/1086881712434319501/1098934622596898866/image.png
https://cdn.discordapp.com/attachments/1086881712434319501/1098934622802415697/image.png
Thanks for the details, looks like we should filter invalid characters from XML.
I'm at PyCon, so if any of y'all would add an issue https://github.com/pypi/warehouse/issues/new?assignees=&labels=bug+%3Abug%3A%2Crequires+triaging&template=bug-report.md
I can get to it later on
Best of all time username to report the issue
Now I wish I would've reported it...
You’re just a garden variety anime girl
I’m actually
unique 
I put up what I think ought to fix it here: https://github.com/pypi/warehouse/pull/13474
Thanks again!
Thanks for the quick fix Mike!
Happy to help! Thanks for the fix!
Is there anything I can do to escalate https://github.com/pypi/support/issues/2587 ? I'm at a decision point now where I'm considering withdrawing my support for the project and finding or forking an alternative. My preference would be to maintain the project.
https://pypi.org/project/repelis-ver-john-wick-4-la-pelicula-pelicula-completa-en-espanol/
Any strong opinions on... whether or not advertising on PyPI constitutes a report? lol.
Yeah, that's spam
Shot an email out then, cheers.
does anyone have any idea what this error from PyPI means? https://github.com/pypa/hatch/issues/833
The filename needs to start with github3.py, not github3_py: https://github.com/pypi/warehouse/blob/fb439afb88a7c7b70726a672a313513fb582e338/warehouse/forklift/legacy.py#L1176-L1183
isn't that the same?
oh bummer. thank you!
Careful with the new blog, you might wear it out! https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/
Some other news today as well: https://twitter.com/pypi/status/1650167745936506881
Shimmed in some rules to catch those .pyc's from https://github.com/pypi/inspector/issues/98
Hopefully no more of that nonsense.
Hi. What is the maintainers take on https://github.com/pypi/warehouse/pull/9972? Is it open to be taken over by someone else or is the hostage situation just accepted state?
I would be happy to try and deliver this feature to PyPI
Is there any recommended way to give last serial ids as hints to http caches?
What are you trying to do?
When using a http cache it seems I get stale cached entries for projects, I'd like to ensure the cache makes use of the last serial i obtain from the project list to enable early revalidate
But I can fall back to forced refresh
I think the answer is yes, it's open to be taken over, though it may be hard to do until we have a better thing for our uploads to do things async. I think there's also an open question of if that PEP still makes sense given we can put JSON on the simple API now, so is it better to invest in getting info into that instead? But I don't think either of those block that PEP, just some things to consider
Is it just me who cannot access pypi at the moment? My Browser keeps telling me: Access Denied
My phone just downloaded a 0kb file from trying to hit it.
Same here, 403 error for https://pypi.org/ and https://test.pypi.org/
Thanks for confirming 🙏
https://status.python.org/ says Service Under Maintenance
Welcome to Python Infrastructure's home for real-time and historical data on system performance.
Ah, I was on this site before, but overlooked the banner and just saw this beautiful green saying "Operational".
https://pypi.org/ and https://test.pypi.org/ are back up
Hi, is this the place for Inspector too? It’s giving us 502(s)
I was super happy to read that organization accounts are now available for PyPI. I would like to start discussions at Canonical's sprint next week in Prague with upper management to introduce them for Canonical, but I cannot find any pricing information. Any help on that? cc @merry valve
We haven't published pricing yet, as we're still finalizing our terms of service for paid plans.
Our goal is for corporate teams to be able to finalize their signup with billing details in May.
GHA getting drunk or some infra issues today? Run python3 -m pip install --upgrade pip ERROR: HTTP error 502 while getting https://files.pythonhosted.org/packages/29/eb/5a56994b37d9141a6c7fa6ddb5c76a50b234a424eae4e79c56f33b61c686/tox-4.5.1-py3-none-any.whl (from https://pypi.org/simple/tox/) (requires-python:>=3.7) 31 ERROR: Could not install requirement tox>=4.0.0 from
I had an issue with a cron run 5 hours ago but not 4 hours ago so it seemed intermittent
We were having some issues with pythonhosted returning 5xx too
https://github.com/pypi/inspector/pull/107
Added disassembly and decompilation to the Inspector!
Woah, nice!
Thanks! 🙂
Neat! Did you add in the opcodes for 3.11? I use pycdc fairly regularly to... mixed success.
Oh never mind, saw the second screenshot.
Still extremely cool ❤️
Ah, is WITH_EXCEPT_START an opcode from 3.11?
Mmm not according to docs anyway. 
one of my backlogged projects is bringing 3.11 opcodes into pycdc so we can support new versions, frankly I'm not sure why that's popping up there.
Edit: That makes it sound like I'm the author of pycdc, I am not. I just need... 3.11 opcodes to work lol.
haha. It looks like the disassembly was successful though.
Yep! Definitely a massive change, would love to see it go through. This would alleviate a lot of work on our end from spooling up vm's every time we get a flag somewhere in .pyc's.
Glad to hear it!
One note -- You added to the Dockerfile at the very bottom, which will force it to redownload and recompile all of that every time the code changes.
See this comment: https://github.com/pypi/inspector/blob/main/Dockerfile#L19-L20
You should move the compiling up there
Thanks for the heads-up! I'm a little new to Docker. haha
Sorry for the ping, but does this perhaps look better? https://github.com/pypi/inspector/pull/107/commits/1330b526769a6d461eb33a0f28315a7f2b45cde2
Yeah, except you probably don't want them commented out 😉
Sorry, codebase is a little messy because I just ripped out what I needed from pypi/warehouse
Hahaha, no worries. Should I uncomment it?
No don't -- Don't comment out your own work
Don't worry about trying to && them either, literally just take what you had and cut it and paste it up underneath that comment. No need to change the code itself, just the location.
Oh, got it lmao. Sorry guys, I'm such a noob with Docker.
👍
Okay, that should do it! 😅
whoever was asking for other packages with .pyc files to test on, here's one: https://inspector.pypi.io/project/projz-py/2.3.6/packages/48/05/2483fb1447e4851f05d2485a40b93eb48e3c6389e09df5600758077d5d4c/ProjZ.py-2.3.6.tar.gz/ProjZ.py-2.3.6/projz/api/secret/secret.pyc
Oh nice, thank you. I'll give it a try!
decompiler works nicely with this one!
Okay. I think that's everything. PR is ready for review @merry valve whenever you get the time! 🙂
Hi there, I have a PR that's been open for a couple months. Could I get a review on it?
https://github.com/pypi/warehouse/pull/12778 "Fill api token form when user arrives from manage project page"
hey Risto, sorry for the delay, we've had other work consuming our time lately. I'll try to prioritize reviewing this!
thanks again for this!
My pleasure!
Do you keep any logs of failed requests for short-lived API tokens (security log doesn't seem to suggest so)?
I'm wondering why this run has failed for us: https://github.com/Cog-Creators/Red-DiscordBot/actions/runs/4879413470/attempts/1 with Red-DiscordBot package on PyPI. I was forced to temporarily add a publisher with the same owner, repo, and workflow name but without an environment specified so that I can release. Looking at it post-mortem, I still see no reason why it was failing as the "Release to PyPI" job does run in an environment named "Release" and I in fact had to approve the deployment:
https://github.com/Cog-Creators/Red-DiscordBot/blob/1d654c2edcd1b10e7521a4edc995817b68978cf0/.github/workflows/publish_release.yml#L83-L106
@obtuse torrent I'm trying to diagnose what I suspect is the same or a quite similar issue at the minute
I think it might be a PyPI bug? But it's hard to tell because I can't see any debug info
But I think what's happening is that environments are case insensitive (on GitHub) but either PyPI or gh-action-pypi-publish is not treating them as such
I have used the same capitalized environment name in other repos without issues
Aha, ok, then I'm wrong clearly.
But yeah, I was thinking it is a PyPI bug too
Just not sure what kind of bug
It could technically be some sort of limitation on GH's side but idk
I only see one publisher configured for this repository/project, did you delete the one that had the environment name?
It does have an environment name
er, without the environment name, sorry
I think what's happening is that we're not normalizing the environment name everywhere, will look into this, thanks for the report!
@merry valve where would have been the right place to file this (since I came to the channel mostly to ask that initially before seeing @obtuse torrent having the same issue :D)
warehouse?
(I suspect if https://github.com/python-jsonschema/sphinx-json-schema-spec/blob/006aecb2bc089c4ca06e475e296e1b417f5899b9/.github/workflows/ci.yml#LL75C1-L75C1 was pypi instead, this would work)
The main configuration should only be the one with env name, I temporarily added a publisher without the environment while cutting two releases today
yeah cool, that was my guess too
Cool, thanks
I was surprised because we have already tested a similar configuration with a different package/repo and it worked fine but I guess normalization may be missing for some specific case
Now that I think of it, we tested the exact same configuration too on a different branch in the same repo 😄
the logic here changed a little bit recently, I think this would have succeeded prior to a day or two ago.
Makes sense then
yep
welp, I have a fix but looks like GitHub is having an outage
Sounds like a good fix, no one can use the feature now anyhow then, problem solved.
Ok, should be fixed in a few minutes once the current deployment goes out. Thanks again for raising!
🎉
OH
BTW
@merry valve -- what's needed to complete support for OIDC trusted publishing through reusable workflows?
Is that something I could help along?
Amazing, thank you!
(The fix seems to indeed have fixed me!)
Are you open to adding an option to return JSON for PyPI search? Right now there is no viable way to search PyPI outside the web UI, and the HTML results are quite hard to parse when you want to search from another tool. The simple API is not really a solution, since the main index page for PyPI is like 40MB and it would be quite heavy to both download in the background and parse.
Thank you so much!
From pip search:
ERROR: XMLRPC request failed [code: -32500]
RuntimeError: PyPI no longer supports 'pip search' (or XML-RPC search). Please use https://pypi.org/search (via a browser) instead. See https://warehouse.pypa.io/api-reference/xml-rpc.html#deprecated-methods for more information.
And in https://warehouse.pypa.io/api-reference/xml-rpc.html#deprecated-methods
search(spec[, operator])
Permanently deprecated and disabled due to excessive traffic driven by unidentified traffic, presumably automated. See historical incident.
And the linked incident https://status.python.org/incidents/grk0k7sz6zkp
I am really curious what made that massive amount of requests, because the way this is written sounds like the requests were made from a single IP or at least from only a handful of IPs.
I'm wondering, has the culprit been identified?
How are orgs expected to work on Test PyPI? We don’t have a tab for signup on there.
We are working with the abuse contact at the owner of the IPs and trying to make contact with the maintainers of whatever tool is flooding us via other channels.
Due to the huge swath of IPs we were unable to make a more targeted block without risking more severe disruption, and were not able to receive a response from their abuse contact or direct outreach in an actionable time frame
The first sounds like at least the IP of the origin was identified, the second one sounds like it is a tool that is used by many IPs.
With no evidence or knowledge, I would wager a guess that it is some tool that does something completely different but once used the XML-RPC search in the background. It probably doesn't use the result anymore, because the requests did not stop when the service was suspended.
Maybe doing a search for a known package just to check if pypi is down or not. Would be stupid but yeah.
This is me just armchairing though.
Worst case thought: malware that uses the existence of a package as a shutdown signal. That package was never installed so now all infected systems constantly call and never will stop
it might be a defunct autoupdate for something with plugins, but as I'm unaware of what actual search queries were used, it's unclear
It has to be something that still works even when the search fails.
hundreds of thousands of requests per hour has me skeptical that the intent behind something like that would be benign.
https://twitter.com/ESETresearch/status/1654127211287560194
Found this fairly interesting, ESET is tracking on the same stuff getting spammed up on PyPI that us and other orgs occupying a similar space are. This stuff is pretty loud when it flags on our yara rules, I do wonder if maybe we can't quietly hand over our yara crib that flags specifically on this to Dustin and crew for automatic removal. I cannot fathom a false positive that could be generated from packages like this.
Ok I have to add to my curiosity:
What terms were queried in those mass requests?
Hi! As far as I can tell the specification says that all source distributions need to be gzipped tarballs -
The file name of a sdist was standardised in PEP 625. The file name must be in the form
{name}-{version}.tar.gz, [...]
But https://pypi.org/project/pyglet/#files has a .zip as a source distribution...
Are there any checks on valid file types? ... can distributions just be any file type?
As with most things like this, compatibility with history. Not everything works, but I think there are a couple more archive formats supported. bz2 and xz or something like that.
But please stick with tar.gz if you’re making a new tool
iirc the sdist can be one of tar.gz or zip, but you can't upload both (any more)
https://peps.python.org/pep-0527/ removed a bunch of old ones like tar.bz2 and tar.xz
Yeah, all of mine are built automatically by the PyPA GHA, so it’s all abstracted away from me, I don’t even have the option to heck it up even if I wanted to
But I’m building tooling for reading arbitrary dists, so I get to deal with all the edge cases 🥹
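If it helps for that kind of tooling, here is a minimal sketch of sorting sdist filenames into "currently accepted" versus "legacy" by extension. It only encodes what was said above (`.tar.gz` and `.zip` are accepted today, `.tar.bz2` and `.tar.xz` were removed by PEP 527), so the legacy list is almost certainly incomplete. The function and constant names are hypothetical.

```python
# Extensions mentioned in this conversation only; older uploads may use others.
CURRENT_SDIST_SUFFIXES = (".tar.gz", ".zip")
LEGACY_SDIST_SUFFIXES = (".tar.bz2", ".tar.xz")


def classify_sdist(filename):
    """Return 'current', 'legacy', or 'unknown' based on the file extension."""
    lower = filename.lower()
    if lower.endswith(CURRENT_SDIST_SUFFIXES):
        return "current"
    if lower.endswith(LEGACY_SDIST_SUFFIXES):
        return "legacy"
    return "unknown"
```

A reader for arbitrary dists would probably want to treat "unknown" as "inspect the file contents" rather than "reject", since the extension alone says nothing about what is inside the archive.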
Can you file an issue?
Si!
Kinda' rough to write up, because I'm not sure of the inner workings as to why that's failing, so I beg your patience with my haphazard documentation.
The malware group that goes by "KekWLTD" is at it again. This is their newest github user: https://github.com/patrickpogoda
The way inspector checks for package existence on pypi is by checking the status code of the project page at request time.
Mhm, I mentioned that in the issue.
my guess is that this is happening because somehow, pypi is randomly not returning 404
ah gotcha
It takes a little bit for PyPI to serve an actual page after a package is uploaded.
We receive notification often seconds after a malicious package is published.
This often means we fall in that window where PyPI isn't appropriately serving the webpage content yet, so we more or less always get the 'package removed' message. Waiting ~20 seconds or so and refreshing typically yields what we'd expect to see.
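A small sketch of how a checker could ride out that window instead of reporting "package removed" on the first miss. It assumes the existence check is just "did the project page return 200", as described above. `fetch_status` is a hypothetical callable supplied by the caller, and the attempt count and delay are guesses based on the ~20 seconds mentioned.

```python
import time


def wait_for_project_page(fetch_status, attempts=4, delay=20.0, sleep=time.sleep):
    """Return True once fetch_status() yields 200, retrying on any other code.

    fetch_status: zero-argument callable returning an HTTP status code.
    sleep: injectable so the function can be exercised without real waiting.
    """
    for attempt in range(attempts):
        if fetch_status() == 200:
            return True
        # Only wait if there is another attempt left.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Only after the retries are exhausted would the tool conclude the package is really gone, which should cut down on the false 'package removed' messages for freshly uploaded projects.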
Ah, that makes sense
Working on code analysis features for inspector...
On the PyPI end, is there a way to check to see if an account has a workflow created to automatically upload to PyPI? We found the GitHub accounts responsible for uploading all this Kekwltd malware.
They advertise their malware with a bunch of random crap on GitHub, and in the GitHub repo, the malicious payload is installed.
They have ~1130 commits in like 15 days, so there's an automated workflow here, not sure what the internals on that might look like though.
(Regardless, to preempt the question, we've filed a report.)
Mike has joined the party.
Not sure I'm following the question. Do you mean a way to check GitHub for a workflow? Are you talking about Trusted Publishing specifically?
Yes. Sorry, that could've been worded better.
Do you mean a way for you to check, or a way for PyPI admins to check?
IIUC, you're trying to find the GitHub repo that corresponds to a given PyPI project?
The latter. I'm not entirely spun up on the Trusted Publisher feature, but 1130 commits in that time makes me think that there's some heavily automated process behind this; whether that automation extends to the trusted publisher feature or not is kind of what I'm curious about. I'm not looking to breach that level of trust with PyPI users, I'm more curious if that's something that you guys have looked into-- how these guys are pipelining the spammed packages into PyPI, and specifically, if they're using Trusted Publishing to do so
I guess to bring that to a more clear point, it's clear they've automated the updating of their repositories to add in new malicious packages.
I wonder if they are using the trusted publisher to push packages to PyPI as well?
In terms of automation, there isn't really much difference between a workflow that publishes via trusted publishing and a workflow that just uses username/password
I doubt they're using trusted publishing because it would be a little more work to set up on the PyPI side
But it would be good to highlight if it's set up during our takedown process, because it would tell us if there was an upstream repo somewhere
Actually while I've got you here, entirely tangential question. I've been following on https://github.com/pypi/warehouse/issues/12612 and doing some brainstorming.
I'm acutely aware of the volume of reports you likely receive from orgs that are doing similar monitoring of the package index. Would there be an impending need for a third party solution to this issue, or are you fairly close to a solution internally?
We are in the process of establishing an API endpoint that would allow third party, authorized individuals to direct reports through and, if the package with the version associated has been seen before, individuals using that endpoint would be appropriately notified that that package had been reported previously.
We haven't announced it yet but we've gotten funding to essentially implement that issue (and some related things around malware). So yes, we'll be implementing this ourselves.
Good news! Hope the development process goes smoothly!
As I'm... sure you're no doubt aware by the volume of reports, I suspect they (malware authors aforementioned) have now entirely automated the upload process, to include checking if the package has been yanked. 
Is there anywhere I can point people to with more information/context about the incident right now with new account and package registrations being disabled, besides just the summary in the incident report? I have a number of people asking.
Nope. What more information/context do they need?
They were mostly asking about the cause of the latest major uptick in malicious activity. Seeing the limited comments here I surmised it seemed to likely relate to increasing automation of the process by malicious package authors to increase volume and bypass or reduce the impact of previous mitigations by the PyPI team, but I wasn't sure since I couldn't find much info other places. I didn't mean to bother you at a time when your bandwidth is most critical, sorry, I just figured someone might have additional insight to share or somewhere I could redirect people.
Not sure about the cause of the uptick but the cause of the pause on project/account creation is just more malware being uploaded and less time for us to deal with it.
If there's any specific answers they're looking for, let me know and we can get it added to the incident.
Thanks! (And sorry for the delay, didn't see a ping about this).
Yeah, it wasn't anything very specific besides the cause—you could mention something like "The specific reasons for the malware increase are currently uncertain." (at the end of the first paragraph) if you wanted to be explicit that it isn't currently known, but other than that there isn't anything I'd consider.
This possibly runs close to some EU stuff, à la what Meta just received.
Not saying it does, because you know I am not a lawyer and I don't even play one on TV.
Oof, that's never fun. The transparency is refreshing (and appreciated).
One thing I was curious about is why the users in question haven't been given any notice of the subpoena. Was this a condition of the court order, an individual decision for this case on PyPI's part, or PyPI's standard policy?
Is the storage in the USA?
I read it fully and it seems like actual personal identifying information was given to a US government entity.
Would this also happen for EU citizens? Is there a way for me as a user to see my data, or request deletion, easily?
I am sorry for all those loaded questions, but sharing PII with US agencies is really iffy, for lack of a better term, to me.
(Is this also the real reason for the account/project creation halt? To not accept more until you have the new system running?)
Is the storage in the USA?
The bulk of PyPI is hosted in the US, I think we have a backup server in... Ireland? England? Something. And of course the Fastly CDN has POPs all over.
I read it fully and it seems like actual personal identifying information was given to a US government entity.
From the post:
- Internal Databases IDs (all UUIDs IIRC)
- Username (public on PyPI)
- Display Name (public on PyPI, able to be changed at will)
- Email addresses (not public, required on PyPI to contact the user)
- Journals table records (all typically public except the username and IP address of the user that made the change that caused the record to be made)
- "User Events", you can see these on https://pypi.org/manage/account/ if you scroll down to "Security History", I believe that shows all of the information that can be given in that table.
- Date Joined and Date Last Login - Date Joined is available publicly (if it exists, sufficiently old accounts don't have a date joined), date of last login is not publicly available (not sure we even make it privately available... but you can figure it out from the security events).
- Download logs of "who downloaded X package"-- they asked for IP addresses but we don't store that information, in fact the only download logs we store are in the publicly available BigQuery tables that anyone can query, which were carefully designed not to hold PII.
So from that, the data that isn't publicly available on PyPI:
- various internal database IDs, which are all just random UUIDs (some might still be integers, but I think we got rid of all of those).
- email addresses
- which specific actions that got recorded in the journals table were made by you (these list things like "package X uploaded", "file Y deleted", "package X created", stuff like that).
- When your last login was
- Various events in the security history on your account page.
- IP addresses in the journals table and the security history.
Going to ignore database IDs because I don't think anyone could call them PII.
- We can't really get rid of emails, we need them to communicate with our users, deal with password resets, etc.
- We can't really get rid of the username in the journals table, that is important information for remediating compromises or even just confusion about who caused something to happen (why did a file get deleted, who uploaded this, etc).
- Last Login, I don't really think this is PII but we can't get rid of it because we use it for security reasons (things like password reset emails get invalidated if you login with your password)
- Security History (sans IP Addresses): This probably has the most PII in it, mine is kind of janky right now because of a "bug" that has some extra admin stuff getting logged in it, but it looks like for me the data is pretty minimal: name of my WebAuthn key, IP address, what 2FA method I used to log in, what orgs I was invited to, what emails were sent to me (no bodies, etc.)
- IP Addresses, definitely PII.
We're explicitly looking at all of those things that we can get rid of to either completely eliminate them or add time gated retention policies where we can.
Would this also happen for EU citizens? Is there a way for me as a user to see my data, or request deletion, easily?
I think I mentioned what data exists where, we don't have any specific tooling to get a dump of your data or anything like that, just what was posted above.
As far as what would happen with a EU citizen, I have no idea. I would suggest emailing legal@python.org so the lawyer types can respond, but I would guess the answer is "it depends", but that's a complete guess.
I know that https://pythonhosted.org/ hosts a similar informational disclaimer.
Third party content providers represent and warrant that they have obtained the proper governmental authorizations for the export and reexport of any software or other content contributed to this web site by the third-party content provider, and further affirm that any United States-sourced cryptographic software is not intended for use by a foreign government end-user. Individuals and organizations are advised that the PyPI website is hosted in the US, with content delivery network points of presence as well as unofficial mirrors in several countries outside the US. Any uploads of packages must comply with United States export controls under the Export Administration Regulations.
Is there a way for me as a user to request deletion, easily?
You can delete your account on PyPI at anytime, as long as you don't have any projects where you are the sole owner (because we don't allow a project to get abandoned with no owner). I believe (and I've double checked by quickly looking at the code, but I haven't actually tested it) that will remove everything about your user from the database except the entries in the journals table, where the username gets set to "deleted-user" but the IP address is left as is (something I'll make sure to note in our review of the data we keep).
Any package files that you have uploaded, do not get deleted from storage ever (even if you delete them from PyPI) and are available directly from their URL, these have long hard to guess URLs so it's unlikely someone would find one without already knowing the URL, and of course package files are public on PyPI so anyone could have downloaded it previously.
Is this also the real reason for the account/project creation halt? To not accept more until you have the new system running?
No, we get a ton of malware/spam reports most every day and it's pretty much all volunteer run to manage them. Almost everyone was away for the weekend and there was a big uptick, with nobody to respond to them.
I think I answered all of that, sorry for the lengthy responses and slow rolling them. I was trying to verify my answers with the behavior of the code by popping it open in GitHub and double checking my memory.
PyPI is completely open source, and it's pretty easy to get it running locally as well. You can read the code yourself at https://github.com/pypi/warehouse or run it locally and try out different things and inspect the database and see what data is there if you want. If you want to verify what I've said or want to look for yourself.
Thank you very much for the detailed answer, no need to apologize for the lengthy responses.
Just to make it clear, my questions were not meant as an accusation, but as something like "watch out for that" (though it was written aggressively, I blame my lack of sleep). I like PyPI, I like using it, I have no intention of getting my data deleted or anything, I just don't want it to get hit by some fine or such.
My question in brackets was basically just a stupid random thought that I really should have left out.
Again I want to thank you for your detailed answer and especially for checking the actual results in the db!
Extremely sorry if I caused some stress, it was really not my intention!
Nah you're fine 🙂
More info about the weekend's halt:
The problem at PyPI was not so much a surge of fake accounts and subverted packages, though the tide of dubious stuff did rise from the typical rate of about 20-30 reports per day to about 40 per day over the weekend. Rather, the staff who usually vet suspect submissions had ebbed to a single person who felt unable to adequately respond.
https://www.theregister.com/2023/05/22/python_package_index_on_call/
Thank you for that info, and sorry for actually writing that tinfoil hat thought out. Would delete it from the message, but then the answers would not make sense.
Sorry again, was not my brightest moment.
It really does sound like PyPI needs a modern retention policy, keeping user IPs around indefinitely is....urgh
We've filed an NCMEC and an IC3 report as a direct consequence of a particular portion of malware being distributed on the Python Package Index. While I understand the desire to respect user privacy, the information that's being sought in these subpoenas probably tracks in line with similar motivations. It's certainly a fine line to walk, but I'd also... like to see these individuals face some sort of consequences for their actions-- there's a fine line to walk between creating an environment where your platform is open and secure, and an environment where your platform is the wild west. Without a more substantial security infrastructure in place, the ever present reality that malware publishing was quite literally productionized and automated (to a volume that you'd find staggering) is a stark reminder that there are individuals out there actively looking to exploit the platform for financial (or more sinister) motivations.
We've found CobaltStrike Beacon droppers, Meterpreter payloads, etc. Fairly advanced attacks beyond just 'Steals your Discord Token', and often under the guise of perfectly legitimate packages. In the lack of a first-party infrastructure that supports the automated detection and identification of packages like these, I can see a world which necessitates the retention of that information to ensure a broader safe usage of the Python Package Index.
Also for what it's worth, the account perpetrating the sustained attack over the weekend referenced in this blog was actioned by GitHub. My team (and at least one other supply chain security org) are monitoring for any additional pop-ups that happen, hopefully to nip it in the bud sooner rather than later this time. The actual account was known at the very beginning of the month, myself and several other individuals were waiting on a response to our GitHub reports. The account was actioned on the 23rd of May. Several associated accounts were actioned at the same time, in an effort to prevent simply switching to another account.
My personal opinion is we should (and can) hold onto some of this data for a shorter period of time than "forever". We will have to figure out what that looks like over time, it won't be a fast fix.
Like the IP address that someone used to upload a package in 2005 is probably not super useful or relevant in 2023
I'll concur, though where one should draw that line is probably... fairly contested. In the threat intelligence community, we can trace some of these threat actors back for over a year (or two). While this information is being continually funneled upward to law enforcement in an effort to prevent some of the more persistent and advanced attacks, I'm personally unsure of what a 'reasonable date' might be.
I can see a world where data retention policies are revised after more infrastructure is devoted to detecting and mitigating threats and malicious behavior on a first-party basis. As it stands right now, I would imagine the security@pypi.org endpoint (especially during the timeframe that PyPI knocked over registration of new packages and users) is likely receiving ~150+ emails per day. That isn't to say that all of those are different packages, but there's a spectrum of monitoring that goes on between numerous organizations, and I'm absolutely dead confident we double-triple tap a lot on some of these packages between orgs.
I think, but don't quote me that the security engineer role is going to have the task of making better report infra
that doesn't involve emailing 4 specific people via a mail alias 😄
(though presumably it will still end up ultimately emailing 4 specific people 😢 )
Yeah we were working on a homebrewed solution that would bounce incoming reports off our database to see if the package was reported or not, and just transparently 'flip' the status of the package to reported if it hadn't already been done. But it sounds like they're going to knock that out pretty soon. Which should help, even if it's just removing duplicating reports.
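Roughly the idea, as a toy in-memory sketch: bounce each incoming report off a set of already-reported (project, version) pairs and only "flip" the status the first time. The class and method names are hypothetical, this is not our actual code, and a real version would sit on a database. The name normalization follows the PEP 503 rule (runs of `-`, `_`, `.` collapse to `-`, then lowercase).

```python
import re


def normalize_name(name):
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


class ReportRegistry:
    """Tracks which (project, version) pairs have already been reported."""

    def __init__(self):
        self._reported = set()

    def submit(self, name, version):
        """Record a report and say whether it was new or a duplicate."""
        key = (normalize_name(name), version)
        if key in self._reported:
            return "already reported"
        self._reported.add(key)
        return "reported"
```

Normalizing the name matters because different orgs may report the same package as `Evil_Pkg` and `evil-pkg`, and those refer to the same project on PyPI.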
We should be open sourcing soon (sans our rules themselves), which will make our dockerized package scanning and IOC accrual fairly accessible to anyone that needs some solution. Not really sure of the use cases for our little setup aside from... scanning PyPI itself, but I've learned that open source tends to just take projects and run with them if they're useful. 
Hey @merry valve, @unreal jewel et al, I just wanted to introduce my colleague and good friend @solid fable to you. Juanita is an open source scientific Python developer and community manager who previously worked with us at Spyder for several years, and is now at Scientific Python and pyOpenSci, among others.
She's currently getting her Ph.D in cybersecurity at the University of California Santa Cruz, and for her research she's really interested in working on a project related to improving the security of PyPI and the Python packaging ecosystem, and wanted to reach out to you folks to determine what ideas would be most valuable and to discuss the potential to collaborate on that. I'll let her take it away from here!
Hi everyone! Thank you for the warm introduction @latent pier. It's a pleasure to join this community! ✨ I'm Juanita Gomez, Computer Science PhD student at UCSC. As CAM mentioned, I worked previously as a Spyder developer and currently, I'm involved with Scientific Python and pyOpenSci.
I'm reaching out because I genuinely want to focus my Ph.D. research on something that will have a positive impact on the open-source Python community. Given my research focus in security and my involvement with the community, I believe that improving the security of PyPI and the Python packaging ecosystem presents a compelling opportunity.
I would love to collaborate with you on a project focused on enhancing the security of PyPI. However, I believe it would be beneficial to discuss and identify the most valuable ideas for such an endeavor. I'm open to suggestions and eager to learn about any ongoing initiatives or challenges you may have encountered in this domain.
Hey, welcome @solid fable, glad to hear you're interested! Any idea of the size or scope of work you want to take on? For ongoing work, we do have a backlog here that might give you some ideas: https://github.com/pypi/warehouse/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity
Thanks for the info @merry valve. I'm pretty open right now regarding the scope. I just finished my class requirements in my PhD so I will start my official research next fall and will probably be doing it for 3 years. For now I'm working on a survey paper to understand the literature regarding this topic but will have to present my official advancement proposal in a year.
hi, we (the scverse people) are wondering what this request is blocked on. We'd like to play around with that org
The underlying data models for the Organizations and org requests are currently being refactored to support the request flow. I don't have an exact timeline, but that's a prerequisite for some more org work and approvals.
I see, so it’s less ready than we thought, and we just have to be patient. Thanks for your work and the info! I’ll pass it on.
Software security is a critical aspect of developing and maintaining reliable and safe systems. In the case of large and popular open source ecosystems, such as Python, ensuring security across a wide and diverse set of users and use cases can be a daunting task.
In this talk, we will discuss the challenges of applying security improvements to ...
Gee whiz question for the class-- if we're finding exposed credentials/secrets in PyPI packages that seem or purport to be legitimate business use cases, should we be... contacting these organizations directly to let them know they have exposed secrets, or should that be routed through PyPI to evaluate/be made aware of/potentially action in the event that the individual doesn't/won't respond to a random e-mail? 
I would do both
Fair point, hooray emails.
Interesting, GitHub planned in 2020:
GitHub Packages users will have access to a public and private PyPI package management server for distributing Python packages both publicly and privately within their organization.
But now:
This is no longer planned due to a change in our strategic priorities and the allocation of our resources towards higher-priority initiatives.
They abandoned their plans for all languages, not just Python
Do we know why?
Anything more specific than "we've reevaluated our priorities"?
I doubt it
I assume there just wasn't that many corporate clients that wanted it
Maybe because they already have their own on-premise solutions
Yeah
All these registries already offer their own self-hosted solutions
@unreal jewel any idea what's going on here? https://twitter.com/ocefpaf/status/1667111866265378817
Yea, the owner transferred it to us, and I just aligned it with our other permissions without really thinking about it. It hadn't been maintained in ~3y and was archived on github.
gotcha makes sense
@ocefpaf First off, let me just apologize for not reaching out to you personally. I should have done that, and that was my bad.
It always amuses me that people's first choice is to make drama on social media. Why not just reach out directly to the PyPI team to discuss...
The owner of the repo also commented on an issue saying this was happening. The tweet author even commented on that 4 minutes before the tweet above https://github.com/pypi/stdlib-list/issues/55#issuecomment-1584321437
@jackmaney Thanks for this package! We are using it at https://github.com/reinout/z3c.dependencychecker and it is great! 💯 It would be nice if there could be a new release with support for 3.10 and...
Will wasn't the owner, he is a PyPI contributor who was going to help get the repo setup
Ah apologies. Still seems a little disingenuous to go to twitter and try to stir an outrage.
knowing the author, they are one of the conda-forge core people, they are just annoyed about losing permissions to a project that they contributed to as a volunteer. Easy to fix though in hindsight. Thanks for the response, @unreal jewel
I can understand it, it probably doesn't feel great to have a permission bit removed with nothing but an automated email saying that it was done. Totally just an oversight when I was just blindly copying settings between two pages. I habitually remove even myself from org owned projects in favor of teams, etc.
yeah, totally, shit happens
wondering if this has happened to them before? I know people are more sensitive to stuff like this if they experienced it before
I can certainly understand why they felt the way they did given the initial circumstances, even if they weren't actively maintaining the project. On the other hand, the efforts to apologize and offering to make things right were, IMO, supremely well handled by @hazy wagon along with @unreal jewel and @finite pulsar — thanks for that! It's a little disappointing that both seemed to have a bit of a bad taste left in their mouths, but at least things were calm and were civil.
yeah, i'm very sympathetic to their position -- it looks like they put a lot of effort into PRs that had piled up due to the primary maintainer's inactivity, so suddenly being removed without any clear explanation probably really shocked them
because they conflate faceless companies with volunteer run organizations and treat the latter like the former deserve to be treated
On the technical side: Maybe it’s possible to show in the UI what changes in effective permissions a configuration change would cause.
Then one can reach out to people manually before hitting the button, like “there’ll be an automated e-mail, don’t worry about it, we can add you back if you’re interested”
I would like to reuse a name for a package on PyPI for a different project.
(The existing one has been basically abandoned for years.)
I have permission of the original author who also made me an admin of the project.
Is there some "best way" to indicate that the project has been changed?
I would like to "delete the old releases". That is: if a user comes to the project site, I wouldn't want the old releases to be shown as older releases of my project. But I don't mind having them archived properly.
There is an article about requesting name transfer, but this has already been agreed with the owner and I have full rights already, so this doesn't apply:
https://peps.python.org/pep-0541/#how-to-request-a-name-transfer
And the second question: does pypi allow binary-only packages? We have something that I would call alpha stage. The SDK (written in C++) itself still needs to walk quite some path to become really stable/fully usable, and the python bindings themselves (written in pybind11) are even a tiny bit more experimental. We would like to make the python package easy to install, but we would like to wait a few more months before publishing the sources.
PyPI certainly has binary-only packages (for example, TensorFlow).
PyPI has no requirement to upload sdists
Re: project name take overs, there's no specific rules or best practices. Note that you cannot re-use filenames on PyPI, so if it's a project named foo, that had uploaded a foo-1.0.tar.gz previously, you won't be able to upload your own foo-1.0.tar.gz even if you delete the first one
So I would recommend starting your versioning at a higher version than the previous project had previously used
You could use an epoch number to separate your project from the old one.
That's already the case (with version numbers), so this is not an issue.
Awesome, thank you very much for the helpful link.
I think you’re basically set then. Maybe also yank releases from the old project. https://pypi.org/help/#yanked
I wonder if anyone has ever actually used the epoch
Not PyPI but at work we use epochs in our private repository
yanking is a good idea. do yanked releases still show in the UI?
Do you mean 306 projects?
no, 306 individual releases
Wow, not a lot.
I was too lazy to narrow it down by project on my phone
it doesn't surprise me many people don't use them, it's kind of a self fulfilling prophecy in a way, hardly anyone uses them, so most people don't know they exist or are weirded out by them, so then hardly anyone uses them
plus a non zero number of things in the world will get really confused by them, since a number of systems (not python ecosystem stuff generally at least) just blanket assume semver
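For anyone who hasn't met epochs before: under PEP 440 a version may carry an `N!` prefix, and any version with a higher epoch sorts after every version with a lower one, whatever the rest of the number says. A minimal sketch of that ordering rule follows. It is deliberately simplified (epoch plus purely numeric release segments only, no pre/post/dev handling, no zero padding), and `sort_key` is a made-up helper name; real tooling should rely on the `packaging` library instead.

```python
def sort_key(version: str) -> tuple[int, tuple[int, ...]]:
    """Simplified PEP 440 key: (epoch, release). Ignores pre/post/dev/local parts."""
    epoch, sep, rest = version.partition("!")
    if not sep:
        # No "!" present, so the implicit epoch is 0.
        epoch, rest = "0", version
    release = tuple(int(part) for part in rest.split("."))
    return int(epoch), release


# The old project's releases (epoch 0) and the new project's (epoch 1).
versions = ["2.5.0", "1!0.1.0", "10.0", "1!1.0"]
ordered = sorted(versions, key=sort_key)
print(ordered)
```

Here `1!0.1.0` sorts after `10.0`, which is what lets a reused name restart its numbering without colliding with the old releases.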
I'm looking at using the PEP 691 simple API from PyPI. It seems like PyPI returns an extra "versions" key in the response data, with a list of versions. That's super helpful, but it's not actually specified in PEP 691. Should I not rely on that being there, and process the filenames in "files" to get the versions instead?
Thanks! Is there a place I should look for documentation about the current API in total? I had a hard time finding that, sort of just searched through the Warehouse code to find stuff.
It was 700: https://peps.python.org/pep-0700/ "A new versions key is added at the top level."
I'm using this in a bit of JS in docs to check if they are for the latest release and show a banner if not.
Well
Technically it’s supposed to be the specifications section of packaging.p.o but we’ve never copied the simple api to it
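A rough sketch of how a client might consume that key, assuming the JSON form of the simple API (PEP 691) and the `versions` key that PEP 700 adds for `api-version` 1.1 and later. The function names are invented for illustration; an index that only speaks 1.0 would need the filename-parsing fallback mentioned above, which this sketch leaves out.

```python
import json
import urllib.request

ACCEPT = "application/vnd.pypi.simple.v1+json"


def fetch_project(name: str) -> dict:
    """Fetch the JSON form of the simple API page for one project."""
    request = urllib.request.Request(
        f"https://pypi.org/simple/{name}/", headers={"Accept": ACCEPT}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def versions_from(data: dict) -> list[str]:
    """Return the PEP 700 "versions" list, or an empty list if the index omits it."""
    major, minor = (int(part) for part in data["meta"]["api-version"].split("."))
    if (major, minor) >= (1, 1) and "versions" in data:
        return list(data["versions"])
    return []


# Example with a hand-written response body (no network access needed).
sample = {"meta": {"api-version": "1.1"}, "name": "example", "versions": ["1.0", "2.0"], "files": []}
print(versions_from(sample))
```

Checking `api-version` first keeps the client honest about which spec it is relying on, rather than assuming every index behaves like PyPI.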
I would imagine that slew of packages that just went up is probably a strong indicator that... there's another automated upload pipeline for malware distribution being tested. =/
Does anybody have an idea how I should go about requesting the permission to produce a look-a-like PyPI.org interface for an open-source project? The aim is that the design should feel familiar, but not try to pretend it is pypi.org (e.g. and cause people to enter passwords etc.).
PyPI appears to maintain an Apache 2 license, so you should be free to modify and distribute for private or commercial use provided you don't hold PyPI liable, trademark the product, and include the original license, copyright notice, and changes.
https://github.com/pypi/warehouse/blob/main/LICENSE
Thanks for the clarification. The use of the trademarked logos etc. is my biggest concern - especially if I produce derivatives to differentiate pypi from my look-a-like. For that reason, I would love to receive specific approval / confirmation. Do you know who has such authority to grant such a request, by any chance?
I think that’s mostly covered by the PSF
I know that the PyPI administrators lurk here reasonably often. But that might require a level of discussion above what PyPI itself is responsible for for derivative use of logos with modification.
https://www.python.org/psf/trademarks/
I'm not certain that PSF holds the specific trademark against PyPI (I would suspect they do, however.)
The psf-trademarks@python.org email address from the above URL seems to be the place you want to go though.
Also entertaining embed.
I'm not certain that PSF holds the specific trademark against PyPI
They do https://pypi.org/trademarks/
Aha! Cheers TP! Then it seems my advice holds true. The PSF trademark site will be your go-to! Good luck pelson!
If the goal is to not pretend to be PyPI.org, then it should be no problem for you to simply not use the PyPI trademarks (the logo and wordmark, the name, and ideally the exact colors/trade dress), no? That way, there is no need for any trademark permission, since you aren't using anything trademarked or potentially trademarkable.
Can I get someone from PyPI team to look into this: https://github.com/python-poetry/poetry/issues/8168 ? Is git LFS even possible with PyPI? I feel like it's not, but maybe I don't know about something.
It's not possible.
Yea that’s not a thing that can work
ok, thanks for swift response 😄
Apologies in advance if this is not the right place to ask this question, I will remove this comment if it isn't but I haven't found exactly where else to ask this hence asking it here. We (dClimate) have had an organization request pending on PyPi since May 23 and we were wondering what, if anything, can be done to expedite this process. We have quite a few open source packages waiting in the wings that we want to publish but would want to do so under our org 🙏 If we need to provide any information to verify our existence/relation to the organization we are more than happy to do so.
Is there a plan to remove JSON API at some point and only leave Simple API or is it just discouraged to use it?
ok
https://github.com/pypi/warehouse/issues/284 is related
I am working on reducing usage of JSON API in Poetry and it's leaking deeper than I expected lol
yea I don't think there's any risk we're going to delete the json api in the next several years
(specifically, the existing JSON API is not standardized so it's generally not recommended to integrate against, but it's not going anywhere anytime soon)
but random keys may or may not get deleted if they become a problem
well, our case is that we determine metadata per release based on JSON API (so basically the first wheel uploaded to PyPI) and switching to Simple API would require us to either analyse all artifacts for metadata and compare it or choose one artifact (kinda at random or some kind of sort by upload time) and base the metadata on that. the second option is kinda what PyPI is offering via JSON API anyway...
Sorry for being annoying here, just bumping this message in case it was overlooked. Is there a better place to ask this question or direct requests to? Is there an email, list, or some other channel?
was it a community org or a corporate org
At this point I don't quite remember how we applied but we are a company which creates open source software for community use (and contribution!) in GIS/climate space.
I think corporate orgs are currently all pending as we setup mechanisms to onboard them and get them onto a paid plan, and community orgs are being processed (not sure offhand if there is a back log of them or not)
I was only tangentially paying attention to the plans around handling corporate vs community orgs, so I have no idea what the right classification for such an org would be
I believe the pending org request should say what kind of org it was
Fair enough appreciate that insight (maybe this can be a banner on the website? as I know many people, from a quick search on twitter have the same question aside from friends of mine asking the same). Any idea when those mechanisms would be setup and/or what the backlog looks like for community orgs processing?
Not sure I see any information here re: the category
looks like it was community
We have a few libs we want to publish but they are dependencies to other packages which we would also like to publish & pin those dependencies so we are sort of blocked for doing the ones downstream until the ones upstream are published. Right now everything is just pointing to github which isn't the best (I hope that makes sense 😅 )
and we have uh, like 500+ pending community requests it looks like. I'm not sure what stage going through them is currently in, just that there are some that have been approved
so that doesn't really help you answer your real question
That's okay I guess, if there can be something done to expedite any of these for example if communities need to provide additional details for verification or something that can make the job easier please let us know 🙏 It might be worthwhile having some of this information in a banner on the PyPi site under that organizations section as it seems somewhat opaque. Really appreciate the information
I thought a banner got added, but I see it was for corporate orgs - https://github.com/pypi/warehouse/pull/14046
I'm not sure what status the community orgs are in, but I think it would be fine to open an issue asking for a banner on them or something to more clearly communicate what the status of the backlog is.
Thanks so much @unreal jewel !
@valid flame There is no metadata per release, metadata is per artifact: https://pypackaging-native.github.io/key-issues/pypi_metadata_handling/#metadata-contained-within-artifacts
As mentioned in that link, PyPI will soon have an API to get that: https://github.com/pypi/warehouse/issues/8254
PyPI already supports PEPs 658 and 714, but artifacts uploaded prior to the rollout are not covered.
Yeah, I am aware.
There's intention to backfill 714 at least, once we know that there aren't show-stopper bugs with it.
My understanding is that the resolver model is based on that assumption, so you'll basically have to rework that as well as everything it influences (the lockfile format, some of the output, some of the build logic). Is that accurate?
well, one day, maybe. for now, most probable way will be to replicate what PyPI does, which is taking the metadata from the first uploaded wheel
Oh, I'm not asking for a timeline. XD
I'm asking if my understanding of the problem you have is correct.
well, kinda. Poetry model was built around the JSON API (at least from what I dug up)
so the change of what is provided is kinda groundshaking
well, I guess once the lockfile PEP gets accepted, we will have a lot on our plate... for now I am poking Sebastien to work on PEP 621 migration 😛
If my understanding is even a rough approximation of reality -- that'd be a few months of work, and with volunteer labor, that's years.
especially since we have quite a big market share and we can't do some stuff as fast as we would want
As @gentle yacht said, you've got the problem of actually having users. :P
exactly 😄
The hardest problem in Computer Science
@merry valve ^ Whenever you get the chance 🙂
Hey all. Is anyone available to take a peek at this PR for the warehouse? https://github.com/pypi/warehouse/pull/14127
Hey, this is on my todo list this week!
Sorry for the delay
Not a problem! Am just doing some branch juggling before I start some work and figured I'd ping and also post my first message in this channel 😄
@merry valve I corrected the formatting issue. Apologies for mistakenly removing the global search bar. 😅
Bear with me, these are fairly large changes. I was also working on similar things to add a Google IdP so there is some duplication I need to untangle.
Thank you, @merry valve Please don't hesitate to request changes to either reduce the PR sizes or attempt to untangle said duplication. I tried to be mindful of the PR sizes but I'm sure I could figure out a way to shrink them into smaller logical chunks. Whatever makes your life easier.
quick q
ERROR HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
Wheel 'Red_DiscordBot-3.5.3-py3-none-any.whl' does not contain the
required METADATA file: red_discordbot-3.5.3.dist-info/METADATA
Should PyPI be lower-casing the name here? it seems incompatible with what I get when building with setuptools rn and the normalization change that I recall was supposed to have some deprecation period
yeah this is https://github.com/pypi/warehouse/issues/14202
working on it now
ah, sorry
@obtuse torrent no worries! fix should be out in 10-15m
release went through, thanks again
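The error above comes down to two spellings of the same project name. As a sketch of where the lower-cased `red_discordbot` comes from, this applies the PEP 503 normalization and then the underscore escaping the wheel filename spec describes. The function names are illustrative and this is not Warehouse's actual code.

```python
import re


def normalize_name(name: str) -> str:
    """PEP 503 normalization: collapse runs of -, _ and . into "-", then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dist_info_dir(name: str, version: str) -> str:
    """dist-info directory name built from the normalized, underscore-escaped name."""
    return f"{normalize_name(name).replace('-', '_')}-{version}.dist-info"


print(dist_info_dir("Red-DiscordBot", "3.5.3"))
```

A build backend that keeps the original casing would instead write `Red_DiscordBot-3.5.3.dist-info`, so a checker that only accepts the normalized form rejects the upload.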
Hey all. Is there a review process specifically for adding a package to the warehouse codebase? I could use a graphql package for the OIDC work I'm doing but I believe I could also very easily craft the query and send it with requests as a POST to our API. This could be a 2 part question I suppose:
- Is there a previous desire/need to have a graphql package in the warehouse codebase that has been waiting for more use cases?
- Is there a strict security policy to follow to include a new package in the code base?
- Not that I know of. Most of our HTTP calls are either handled by a service-specific SDK (e.g. boto3) or requests alone. I don't know of a GraphQL client library that is the best-in-class yet, and there aren't any other APIs that I'm aware of so far.
- Nothing strict yet, but generally we like to look at the project maintainers, license, and history as a few indicators.
for 2: as a PyPI dependency, it would also be designated a "critical project", requiring 2FA for the maintainers
but 2FA will be required for all projects by the end of 2023, so we're all enabling 2FA already, right? 🙂
@tribal sedge , thanks for that. I've written the code with requests so far and I suspect that will be just fine.
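For context, sending a GraphQL query without a client library really is just a JSON POST. A minimal sketch follows, using the standard library so it stays self-contained; with `requests` the equivalent call would be `requests.post(url, json=payload)`. The endpoint URL, the query, and the helper name are all hypothetical.

```python
import json
import urllib.request


def build_graphql_request(url: str, query: str, variables: dict) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a GraphQL query as JSON."""
    payload = {"query": query, "variables": variables}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and query, for illustration only.
QUERY = "query($name: String!) { organization(name: $name) { id } }"
request = build_graphql_request(
    "https://api.example.invalid/graphql", QUERY, {"name": "example-org"}
)
print(request.get_method(), request.full_url)
```

Passing values through `variables` rather than formatting them into the query string avoids having to escape them by hand.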
For point 2, I'm curious, are there any open source licenses that are not allowed in the PyPI warehouse code base?
@pliant obsidian thanks for that additional note. Is "critical project" a term in the PyPI docs that i can read more about?
there's something at https://pypi.org/security-key-giveaway/
Thank you!
and about requiring 2FA for all from the end of the year https://blog.pypi.org/posts/2023-05-25-securing-pypi-with-2fa/
That's really cool. Has the community been onboard with the idea? And does anyone know if other language ecosystems require this yet or plan to?
some folk were less than pleased when it was first announced for critical projects, but I think there was some confusion to do with the giveaway, and things seem better now.
also GitHub will be requiring 2FA by the end of the year. and npm for at least the top 500
Understood and sounds about right. But great that PyPI is doing this.
Hey all, question for whomever. Please, only off the top of your head answer though. Don't dig, I'll dig to find the answer eventually if needed.
I'm currently working in some Jinja templates here: https://github.com/pypi/warehouse/blob/main/warehouse/templates/manage/account/publishing.html#L211
I want to add a change along these lines:
{% if request.flags.disallow_oidc("disallow-activestate-oidc") == false %}
{% set publishers = publishers ++ ("ActiveState", activestate_form(request, pending_activestate_publisher_form)) %}
{% endif %}
I want to do that for both GitHub and ActiveState. The issue I'm having is that flags.disallow_oidc takes admin.flags.AdminFlagValue Enum class and I'm not sure how to get access to that in a Jinja template. I should be able to find it by digging but if anyone knows off the top of their head how I can do that, please let me know.
Oh i think I found it. Looks like I can add it to the View class that renders the template.
Ahh. I was still wrong. Turns out I didn't need to do that at all and I can just do
if not request.flags.disallow_oidc(AdminFlagValue.DISALLOW_ACTIVESTATE_OIDC)
in the Jinja template. Apologies for the noise.
Hey all. I had a question in one of my PRs a few weeks ago but haven't had a response yet. Does anyone have time to take a look and advise if my idea is ok? https://github.com/pypi/warehouse/pull/14063#issuecomment-1633304299
Hey Carey, sorry for the delay here, this is still on my todo list, will try to get to it this week
If someone has the "Member" role in an organization, do they have permission to upload releases to any projects in the organization, or do they have to be added directly or as a team to a project?
The docs are a little unclear on exactly who has permission to do what regarding users/teams/orgs/projects combinations.
And is it possible to require all members of an org have 2FA enabled?
Not yet, but may become a moot point by the end of 2023 when we expect to require all users to have 2FA enabled.
No, because it doesn't explicitly say who can publish releases to projects in the org.
"Own/maintain specific projects" ?
Which specific projects? All the projects specific to the org? Or projects they've been specifically added to? And does own/maintain mean "publish releases", or "modify the project settings in PyPI"
Have you read this already? A lot of details are in there. https://docs.pypi.org/organization-accounts/roles-entities/
yes, I've read that, it does not clarify the table, it basically just restates it
in fact it just has the same table at the bottom
There are two problems with the table that I need clarification on. It deals with the organization overall, but there's no description of how permissions apply to individual projects in the organization. And it doesn't distinguish "publish release" as a type of action/permission.
The "Project roles" section makes statements to explain the difference between Owner and Maintainer, which solves part of your question
Can you explain it to me in different language? I'm obviously not getting it.
Let's say I'm an owner of an org, and another person is a member. Can the member upload releases to a project owned by the org?
Only if they are a Collaborator for the project, which they could inherit if the Team they are on is added to the project
Projects have Collaborators, Collaborators have Roles (Owner, Maintainer). A Collaborator can be a Member, or a Team (of Members)
Adding a Project to an Org doesn't grant any Members other than the Org's Owners the Collaborator role of Owner.
Does this help?
Thanks, that's the clarification I needed.
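To restate that explanation as a toy model: upload rights come from being a collaborator on the project, either directly or through a team, with org owners covered automatically. The classes and the `can_upload` helper below are invented for illustration and only encode what was said above; they do not mirror Warehouse's real data model.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    # Direct collaborators: username -> "Owner" or "Maintainer".
    user_roles: dict[str, str] = field(default_factory=dict)
    # Team collaborators: team name -> "Owner" or "Maintainer".
    team_roles: dict[str, str] = field(default_factory=dict)


@dataclass
class Organization:
    owners: set[str] = field(default_factory=set)
    members: set[str] = field(default_factory=set)
    teams: dict[str, set[str]] = field(default_factory=dict)


def can_upload(user: str, org: Organization, project: Project) -> bool:
    """True if the user holds any collaborator role on the project."""
    if user in org.owners:
        return True  # Org owners are owners of the org's projects.
    if user in project.user_roles:
        return True  # Added to the project directly.
    # Otherwise the role must be inherited from a team added to the project.
    return any(user in org.teams.get(team, set()) for team in project.team_roles)


org = Organization(owners={"alice"}, members={"bob", "carol"}, teams={"release": {"carol"}})
project = Project(team_roles={"release": "Maintainer"})
print([name for name in ("alice", "bob", "carol") if can_upload(name, org, project)])
```

In this example `bob` is a plain org member with no project role, so he cannot upload, while `carol` inherits Maintainer through the `release` team.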
Great! If you think of some ways of expressing that more clearly in the docs, please send a PR!
@tribal sedge congratulations! https://blog.pypi.org/posts/2023-08-04-pypi-hires-safety-engineer/
https://github.com/pypi/inspector/issues/145
Tossing this up here, not imminent nor urgent.
This should serve the side effect of yanking code that has intentionally been whitespaced right (such as in malicious codebases) back into the parent element.
This whitespace wrapping is going to be the death of me. I cannot be the first one that has attempted to fix this. 🥴
Aha.
Hi, is slur an abandoned or prohibited project name?
Hello all, is there any chance PyPI changed the procedure for rendering the Description core metadata?
We received the following issue: https://github.com/pypa/setuptools/issues/4008#issuecomment-1670448675 saying that https://pypi.org/project/setuptools/61.0.0/ (and following versions) is not rendering correctly.
I checked on the wayback machine, and https://web.archive.org/web/20230329061036/https://pypi.org/project/setuptools/61.0.0/ seems to indicate the project page used to render correctly for v61 in 29/Mar/2023.
My hypothesis was that maybe there was a syntax error. So I did the following to verify (I am just verifying the latest version of the package, which also has problems to render):
addr="$(curl -sI https://files.pythonhosted.org/packages/py3/s/setuptools/setuptools-68.0.0-py3-none-any.whl | sed -n 's/location:\s*\(.*\)/\1/p' | tr -d '[\000-\037]\177')"
curl -s "${addr}.metadata" -o /tmp/setuptools-METADATA
tail -n 70 /tmp/setuptools-METADATA | pipx run rstcheck -
# ...
# Success! No issues detected.
This seems to indicate that the restructured text part of the file is fine.
My next hypothesis was that maybe the METADATA file itself had problems. So I did the following to verify (using on going work on https://github.com/pypa/packaging/pull/686):
rm -rf /tmp/.venv
python3.11 -m venv /tmp/.venv
/tmp/.venv/bin/python -m pip install 'packaging @ git+https://github.com/brettcannon/packaging@ef1be866e0f56939e50106f4393f4e9437bcc676'
/tmp/.venv/bin/python -c 'from packaging.metadata import Metadata; print(Metadata.from_email(open("/tmp/setuptools-METADATA", "rb").read()))'
# ...
# packaging.metadata.ExceptionGroup: ('unparsed', [InvalidMetadata("unrecognized field: 'license-file'")])
The validation complains about the non-canonical license-file field, but no other errors. My understanding is that PyPI has been OK with projects using license-file for a while.
So my next hypothesis is that maybe this is related to the absence of Description-Content-Type, but according to the spec this should be fine (assuming there is no error in the RST):
If a Description-Content-Type is not specified, then applications should attempt to render it as text/x-rst; charset=UTF-8 and fall back to text/plain if it is not valid rst.
I was wondering if anyone could help me to understand where is the problem.
It seems to be related to https://github.com/pypi/warehouse/issues/14064 - I did search the issue tracker before writing here, but I haven't seen this one because it was closed as solved 😝
The solution mentions "an admin page feature to rerender the page with the fix". I don't have the rights to access the management page for setuptools, so I checked the management interface of one of my projects, but I cannot see a button or link to rerender the page (the "options" drop-down in https://pypi.org/manage/project/validate-pyproject/release/0.13/ just shows Download, View Hashes and Delete).
Manually issuing a rerender of all versions between setuptools 61.0.0 and 68.0.0 is not an ideal solution :P, is there any other alternative?
closing loop on the above - https://github.com/pypa/setuptools/issues/4008#issuecomment-1671419873
I've requested a company organization (it's now pending), although we are a research group at a university. Is there any way to change it to a community organization?
(I'm not sure where I am supposed to ask.)
Just hit the 404 page for pypi. Well done all. 4 mins of my life i'll not get back, nor do i want it back. i love that sketch.
@merry valve have you done any work or had any thoughts about how to make the Pending Publishers table be more generic? I finally got to hooking up the ActiveState form to create a pending publisher and can now create but the table to display your pending publishers is specific to GitHub. Wanted to ask if you had anything in mind there before I try anything.
Here's a screenshot to jog memories:
Note that shows a bug in my code that it allowed me to create two for the same named project. I'm fixing that.
could I bother anyone to take action on https://github.com/pypi/support/issues/3094 ? Mypy is a pretty important project and it's currently blocked from making any more releases
Project URL https://pypi.org/project/mypy/ Does this project already exist? Yes New limit 20 Update issue title I have updated the title. Which indexes PyPI About the project Mypy is a type checker...
I'll take a pass through the current queue of limit requests today
great, thanks!
Yes, I actually have a branch for this almost ready
I thought so. Cool. Thanks for the infos
General PR practice question that I haven't seen referenced in the docs: What's the practice around squashing commits before merging a PR? Internally at work we fairly aggressively squash commits, often down to just one for a PR. Is that desired? Something you all would want to avoid? No opinion and I can go ahead and do that if it's what i'm used to?
no need for you to do it, happens automatically on merge because we use the "Squash and merge" option.
Ahhh perfect. Thank you.
Hi! I stumbled upon a problem with pytorch caused by incorrect metadata in pypi's json - the wheels themselves are fine.
I wonder 1) how did that happen? Isn't json metadata based on the wheel metadata? Could that be a pypi problem? 2) is there a way to fix it for me as an external contributor (pull request to pytorch that would fix it for the next release) or perhaps it requires a maintainer status on pypi?
🐛 Describe the bug Asking Pypi.org for a list of dependencies for torch does not provide a complete list of dependencies. See the list from pypi's json: ~$ curl -s 'https://pypi.org/pypi/to...
The /pypi/ API simply inspects the first artifact of a release and shows its metadata. It is famously inaccurate and you should not use it.
It is poetry that uses it in my case
https://github.com/pytorch/pytorch/issues/104259#issuecomment-1680225094
Is there an alternative that poetry should be using? I assume, other than downloading all wheels, a few GBs more than the one needed
Or maybe the solution would be for all wheels to have the same metadata? It should be OK for a windows wheel to have additional lines with platform_system == "Linux" and platform_machine == "x86_64" right?
this is exactly how it should be done
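The idea is that every wheel carries the same `Requires-Dist` lines, with environment markers deciding which ones apply at install time, so it no longer matters which artifact the JSON API happens to read. A small sketch of reading those lines out of a wheel's METADATA text with the standard library follows. The package names are made up and the `requirements` helper is illustrative.

```python
from email.parser import HeaderParser

# Hand-written METADATA excerpt; package names are placeholders.
METADATA = """\
Metadata-Version: 2.1
Name: example
Version: 1.0
Requires-Dist: filelock
Requires-Dist: example-cuda-runtime ; platform_system == "Linux" and platform_machine == "x86_64"
"""


def requirements(metadata_text: str) -> list[tuple[str, str]]:
    """Return (requirement, marker) pairs from the Requires-Dist headers."""
    message = HeaderParser().parsestr(metadata_text)
    pairs = []
    for value in message.get_all("Requires-Dist", []):
        # Everything after the first ";" is the environment marker, if any.
        requirement, _, marker = value.partition(";")
        pairs.append((requirement.strip(), marker.strip()))
    return pairs


for requirement, marker in requirements(METADATA):
    print(requirement, "|", marker or "(always)")
```

A Windows wheel shipping this same METADATA would simply skip the marked dependency at install time, while tools reading any single artifact still see the complete list.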
Hopefully easy question: I was going to get some work done on a flight yesterday, I'm too cheap to pay for wifi on the plane. When I tried to run the warehouse tooling with the make commands it hangs on trying to download some metadata for one of the images from docker.io. Has anyone else run into this and is there an easy way to work around it? Or am I doing something wrong? Maybe if I had run the make commands right before my flight things would have been updated and not needed to download metadata?
I have run these commands many times before so it wasn't that the images were being built from scratch for the first time (I don't think, anyway. I haven't dug into how the make commands are implemented)
Hrm, now that I'm on internet and running the command again, it looks like it's downloading a lot more than it normally would in this context. I think something else, not repo/tooling related, might have occurred and it needed to rebuild more than i was expecting. I'll remember to run the commands I need before my flight back home this time 😄
Beyond make build to build the container images for changes, run make initdb && make serve once to pull any dependencies and compile anything else needed to run the stack
Has the conversation about starjacking ever occurred to any real capacity?
We've noticed an uptick in Thonny related packages being uploaded, utilizing the default Thonny metadata, and subsequently presenting themselves heavily as an official Thonny client. It appears they are benign educational packages from a class being taught somewhere in China.
The concern I have is that a novice user could easily search Thonny and unintentionally choose 'x-thonny' from the above image, which is directly utilizing Thonny's metadata to populate things like GitHub stars and whatnot. This is a common attack, as I'm sure you're entirely aware of.
My proposition for solving this would be to detect already utilized Github links, or allow critical packages to reserve their Github links in the package metadata, and refuse the upload of packages utilizing already-occupied Github page metadata, or to automatically peel this metadata from the package's source before it's allowed.
There's some extensions to this idea; notably that packages being uploaded through Trusted Publishing likely have the ability to enumerate the repository that they came from, and subsequently, assert that the owner of the account is also the owner of said repository. There's some caveats to this, but I think it does a lot to cut down on potential attack vectors such as this.
Deconflicting this would be remarkably simple for individuals, to simply assert that they are the owners of the repository which is linked; and largely would be automated for the vast majority of packages that are using GHA and Trusted Publishing to push their packages from GH to PyPI.
is there a site for download popularity per package version it seems https://pypistats.org/packages/typing-extensions doesn't have it
I'm looking to see how popular >=4.0.0 is vs <=3.10.0 to justify updating a min-version for typing_extensions.Self
Have you tried using the download statistics dataset? https://warehouse.pypa.io/api-reference/bigquery-datasets.html#download-statistics-table
Yeah I used it a while ago it's very easy to consume all the free credit
you can do things like pypinfo -pc 'pip==21.*' pyversion version and pypinfo -pc 'numpy==1.23rc3' pyversion version with https://github.com/ofek/pypinfo, see usage in the readme
add --days 1 to reduce amount of quota used
This indeed came up recently. The tracking issue dates back to 2020, and I came to basically the same conclusion that Trusted Publishing is probably the way to go, since that's the only true real link we have.
https://github.com/pypi/warehouse/issues/8462#issuecomment-1701741254
Trying to deduplicate becomes an arms race of sort, similar to namesquatting, and introduces more complexity to the data models and validation, so I'd prefer to stay away from that if possible.
Would you be interested in submitting a PR that takes advantage of Trusted Publishing, and only displays the GH sidebar if the release was uploaded via TP?
Wouldn't it be possible to do validation by requesting that the project upload a particular file to the repository?
@meager umbra not a bad idea - very similar to other confirmation mechanisms, but we've already created the mechanism for Trusted Publishing for GitHub, and that confirms that both sides are correct, and is specific to a release, not a project or URL
I'll work on it in the coming weeks; school is just kicking off for me so I'm juggling a few plates, that was more of a braindump than any sort of refined idea lol.
Shockingly entertaining that this conversation came up a mere 24 hours prior to... me bringing that up.
PSA: For any folks here involved in FOSS supply chain security and related topics that will be in or somewhat near-ish the Bay Area on Thursday September 28, @solid fable is looking for speakers for a panel on supply chain security in open source, to be held in University of California Santa Cruz Engineering Building 2 (room 506) from 2:50 PM - 4:15 PM PDT as part of the 2023 UC Open Source Symposium. You can follow up with Juanita via Discord at @solid fable , or by email at jgomez91@ucsc.edu . Thanks!
FYI, from #build, is there a way to see all taken filenames for a project, even if they were from before the project was released? build 1.0.1 seems to be unavailable (but 1.0.0 was fine...)
Also, what's the status of orgs? I requested scikit-hep immediately back when they were announced at PyCon (Apr 23), and it's still pending. (Maybe I should have requested it from the scikit-hep user 🙂 )
This is https://github.com/pypi/warehouse/issues/12724, the short answer is: not via the PyPI web UI, but you could get it via the BigQuery metadata dataset: https://warehouse.pypa.io/api-reference/bigquery-datasets.html#project-metadata-table
we have a very large backlog and are slowly working through requests for community orgs. company orgs are currently on hold while we finalize details of the terms of service.
It is a community org. That's fine, no rush at all, just wondered what the status was. Given I submitted it within an hour of the announcement, that sounds like a scarily large backlog!
Okay, I'll see if I can set up the query, then, should be easy.
That's not showing a file with that name. I only see the 42 files I know about.
SELECT filename
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE name = "build"
This is really good stuff Mike, thanks! Helps us reconcile some of our internal statistics to PyPI's numbers for efficacy in our reporting/detection.
Glad to hear it, @spice hull ! 🤞 the next evolution of an API-based reporting will allow for y'all to get more direct feedback on actions taken as a result of a report.
should PyPI validate wheel tags on upload? All of the Mac wheels here have an invalid ABI tag: https://pypi.org/project/greenlet/3.0.0rc3/#files
I suppose that would mean referencing the entire list of known tags for all plats
😄 should've searched
This is I guess more of a maturin question than a PyPI one, but am I the first person to try to get maturin to use trusted publishers? Google+GitHub suggests maybe (by not showing me anything)
Anyone know how to trick it into using it? My latest attempt is to set MATURIN_PYPI_PASSWORD to "", I'll find out in... 3 minutes whether this time that works
My attempts are going here: https://github.com/crate-py/url/blob/main/.github/workflows/CI.yml#L192 in case anyone has done this before, otherwise I'll try a few times before giving up.
It workksssss. Hooray, I won't blow on it, lest it fall over. 4th time is a charm.
Curious, if you're already downloading the artifacts from a prior step, any reason to not use the trusted publisher action instead?
+1, is there some advantage to publishing with maturin?
~1,200 "security placeholder packages" uploaded by the Yandex Security Team
"to prevent Dependency Confusion attacks against Yandex": https://pypi.org/user/yandex-bot/
botocore a la carte pushes are my favorite.
Or tencent pushes.
Quite interesting, and possibly some work coming up for PyPI with regard to possible sanctions on Yandex itself.
It's what the maturin template generates really is the only reason, but yeah maybe I'll look into removing it
could someone take a look at these two requests? https://github.com/vmware/vsphere-automation-sdk-python/issues/38#issuecomment-1780409031
Are there any PyPI admins around that can help with a friendly PEP 541 transfer?
Hey guys, can someone kindly help me with a bandersnatch config issue?
I am trying to download packages for only one specific Python version, 3.6.8 (all files for windows/linux/egg/tgz etc.), to host in my offline lab (using bandersnatch)
Does this config file do what I need?
Please verify that the config file below will only download Python 3.6.8 (and no other versions)?
# cat /etc/bandersnatch.conf | grep -v '^;' | sed '/^$/d'
[mirror]
directory = /mnt/mylabnas01/repos/pypi
json = false
release-files = true
cleanup = false
master = https://pypi.org/
timeout = 10
global-timeout = 1800
workers = 5
hash-index = false
simple-format = ALL
stop-on-error = false
storage-backend = filesystem
verifiers = 3
compare-method = hash
[allowlist]
platforms =
py3.6.8
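One quick thing that can be checked locally, sketched here on the assumption that bandersnatch reads this file with Python's standard configparser: a value placed on the line after `platforms =` only counts as part of that setting if it is indented. The two config strings and the `read_platforms` helper are made up for illustration, and this says nothing about whether a `platforms` entry under `[allowlist]` is a filter bandersnatch actually honors.

```python
import configparser

# Continuation line indented: parsed as the value of "platforms".
INDENTED = """\
[allowlist]
platforms =
    py3.6.8
"""

# Continuation line not indented, as in the paste above.
UNINDENTED = """\
[allowlist]
platforms =
py3.6.8
"""


def read_platforms(text):
    """Parse an INI string and return the whitespace-separated platforms list."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["allowlist"]["platforms"].split()


print(read_platforms(INDENTED))

try:
    read_platforms(UNINDENTED)
except configparser.ParsingError as exc:
    # configparser rejects a bare line that is neither "key = value" nor indented.
    print("parse error:", type(exc).__name__)
```

If the indentation was simply lost when pasting into chat, the file itself may be fine; the sketch only covers the case where the file on disk looks exactly like the paste.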
Just preparing a post to discuss.python.org for the next few days. Is it fair to say that warehouse has limited its scope to PyPI operations only, and that there is no intention for it to be used for anything other than PyPI.org, or are there plans that warehouse might one day be a tool for running a repository in the same way that devpi is (and are there people already doing that?)?
I would say that's true. I am not an authority on the topic though
Warehouse is specifically the codebase for the official Python Package Index, and thus focuses on architecture and features for PyPI and Test PyPI. People and groups who want to run their own package indexes usually use other tools, like devpi.
You can use warehouse for other repositories, but it is architected for large scale, which means it uses a bunch of services that you might not really need on a smaller scale. I think it's designed such that you can use it without some of those services (e.g. Fastly), but I'm not sure if that's the case for all of them or how it could affect the experience
same as Secrus, am not an authority on the topic, just trying to help in the meantime
PyPI is global scale and it shows in its code
Warehouse makes zero attempts to be usable outside of PyPI itself. It’s open source, and we’re not going to actively try and make it painful if someone wants to set up their own instance. But we’re going to put zero effort into supporting it or thinking about it as a use case, so it probably won’t be the best experience to try and do that.
The stuff we have to make it possible to use without a specific service is mostly us trying to not integrate too deeply with one particular provider at any point in time, because we rely on donated services and those donations can always go away, so we try to add a layer of abstraction where we can to make it easier to swap if needed.
We also have to support running locally for development, and some services don’t have a reasonable local option, so we need abstractions that make it possible to swap in a more local-friendly option in development.
So to answer your question, it’s intended only to be for PyPI itself, but if someone wanted to put up with the above and run their own instance we’re not gonna be mad about it, we just won’t support that configuration either.
Thanks all. You've confirmed my understanding 👍.
Hello, I placed a question for bandersnatch, if anyone can help me --> https://discordapp.com/channels/267624335836053506/1169417293806194808
This is the #pypi channel.
There’s a #bandersnatch channel at the top of the channel list.
Also, anyone who’s not already in that other server, which is a fair number of the people here, is not going to be able to access that link. It’s just going to say “this link goes to a server that you are not a part of”. If you’re going to link to different servers, it’s good to let people know which server it is, or even add an invite for it.
Oh you actually did post there first 
Thought I would follow-up to say that I posted something: https://discuss.python.org/t/a-pypi-like-interface-to-browse-and-search-packages-in-any-pep-503-compliant-simple-repository/38048.
Thanks again for your responses. 👍
Hello, Can someone please help me. I posted my question on stackoverflow
I am trying to create an offline repo for pypi for only python 3.6.8 packages with bandersnatch mirror. It downloaded everything but skipped the "requests" package, and I don't know why.
Please, can someone kindly help me. Thank you so much!
New blogs:
@pypi@fosstodon.org has completed its first security audit
Read all about it in this 3-part blog series:
https://blog.pypi.org/posts/2023-11-14-1-pypi-completes-first-security-audit/
https://blog.pypi.org/posts/2023-11-14-2-security-audit-remediation-warehouse/
https://blog.pypi.org/posts/2023-11-14-3-security-audit-remediation-cabo...
this is really great to see, props to everyone involved :)
How is test.pypi.org managed? We recently completed a friendly transfer of snowflake on pypi.org, but can't do any testing on test.pypi.org because someone unrelated to the real package is already a registered owner on test.
TestPyPI is a separate instance of the Python Package Index (PyPI) that allows you to try out the distribution tools and process without worrying about affecting the real index.
-- https://packaging.python.org/en/latest/guides/using-testpypi/
mostly it's not
well technically I think you can still do PEP 541 on it, but it's so low down the priority list
Yeah, that's what I suspected. It'd be kind of nice if like the whole test.pypi.org db got reset once per day or some such. Anyway, for now, I won't worry about it, and will revisit if needed.
Perhaps a silly question but should I be able to tell what was the miss in this test coverage report? https://github.com/pypi/warehouse/actions/runs/6974763806/job/18980936684?pr=14063
I'm running the tests locally to try and get more information but they are taking a very long time to complete.
Ahhh line 242.
Any tricks for making the tests run faster and still get a full coverage report? I'm waiting close to an hour for each change to see if I've fixed the missing coverage.
Or is there a way to get the coverage report without actually running all the tests?
I'm kinda surprised it's taking more than 5 minutes to run the test suite, since we worked pretty hard to make it speedy. Testing in GitHub Actions, which usually has fewer resources than most developer laptops, takes about 3-4 minutes, and my fancy Mac M2 takes only a minute or so.
Tell me more about your dev setup, OS, Docker versions, etc?
We also have the ability to pass a specific test scope with T=tests/unit/accounts make tests, which will run a subset. The resulting coverage report may look weird, since not all code will be covered, but it might help some.
Yes, I've been doing this as I worked through broken tests but as you said, the coverage report looks wrong. I didn't take a very close look at it though. I'll take a closer look and see if it has the info I need buried in the other data.
oh that actually wouldn't be that hard

