#pypi

tribal sedge
unreal jewel
#

Done

tribal sedge
#

Thanks!

ancient compass
#

Is there a twine channel? I just got the "please use a token email" and I'm wondering if it can support a web workflow instead

obtuse torrent
#

Are you thinking of it remembering the token after you login once or more so being able to login each time you upload without twine remembering your credentials as you can do until August?

#

I was personally wondering about the latter case, because generating a token each time in the web UI, copying it to twine, and then deleting it after upload seems more annoying than simply logging in with 2FA each time, without having to do the create-and-delete token dance

#

All I did was wonder about it though, since I personally use a single-repo token in CI and have the whole workflow automated myself, but it did seem interesting to me that this use case is going to be hurt in August

ancient compass
#

Ideally the web viewer would show a list of files to release and I'd just click "go"

obtuse torrent
#

so authorization each time then

ancient compass
#

Yeah but I'd already be logged in in the browser

obtuse torrent
#

Ye, that's similar to what I was thinking about

obtuse torrent
ancient compass
#

So like atomic releases and web auth

obtuse torrent
#

since the cookie would still be valid. I don't go to PyPI that often which is why I mentioned having to login

#

Anyhow, I don't have my stake in it, was just curious if it's similar to what I was once thinking about 😄

ancient compass
#

How long is the cookie valid for?

obtuse torrent
#

1 day

ancient compass
#

Ah fair

#

Well I'd rather do 2fa auth each time than save a __token__ on my keyring

obtuse torrent
#

If I used twine locally, honestly same

ancient compass
#

I tend to just upload stuff by the web UI then kidnap the next person who PRs it and make them set up automatic releases

#

"you broke it you bought it"

obtuse torrent
#

I mostly enjoy playing with CI stuff lol

#

GH has some cool things like environments

#

I like the feeling when I just click the button and all the things start happening one by one as I watch it

#

when it works

unreal jewel
#

I thought it might be interesting to support a token that requires having re-authenticated to 2FA within the last N amount of time

tribal sedge
unreal jewel
#

conceptually a similar idea, but that's strict for the UI

tribal sedge
#

Oh, you're talking about the token itself for API

unreal jewel
#

yea

tribal sedge
#

Got it.

#

Same, but different. 😝

unreal jewel
#

I'm not sure if it actually makes sense to be clear, but in theory you could add what's called a "third party caveat", which is basically just "for this token to be valid, you have to get a second token from X service", then Warehouse could just have a 2FA endpoint.

That's a little different, because with that, anyone could attenuate a token, before passing it on to some(one|thing) else, to require 2FA, but might not be worthwhile over just making a web auth flow that will mint a short lived token that does the full username/pw + 2fa login.

tribal sedge
#

Hmmm. I'm thinking that it sounds a little bit like oauth2's access tokens vs refresh tokens, and when needing to use a refresh token, that's when the 2fa is invoked. Haven't seen it in practice yet, but something to think about.

#

Still needs some thinking for use cases like CI, where a user doesn't really input a 2fa there. I've been wanting to read more about the newer GitHub to AWS delegation via IAM roles, since that's an interesting auth flow which feels more sustainable

#

(for CI, of course)

unreal jewel
#

yea you just wouldn't add that caveat for CI

#

but for at least some CI providers, the answer is basically the OIDC stuff that is being worked on

#

that's probably the AWS thing you're thinking of too

tribal sedge
#

Aside, I often see OIDC and my brain reads "Oh, I don't care"

unreal jewel
#

I wish it stayed that way

#

OIDC itself is very ugh

#

but GH is using it to enable a bunch of stuff so

#

OIDC is the demon marriage of OAuth 2.0 and a cryptographic token standard called JWT.

#

anyways, the really interesting thing about third party caveats for pypi's token, is that you can implement them without warehouse having any knowledge of them whatsoever

#

(the client of course has to be aware of them)

#

But since our API tokens are Macaroons, and Macaroons allow the holder of a macaroon to add arbitrary additional restrictions to the token, with "third party caveats", you can do something like:

  1. Stand up a service called say "GithubUserInSpecificOrg" which when a user logs in, will check if they're a member of a specific organization.
  2. Create an API token on Warehouse, then locally you add a third party caveat that says "to validate this token, you also need a second macaroon that validates for X secret key"
  3. Give that attenuated token out to people.
  4. When people go to use that token, their client would have to take their PyPI token from (3), sign into "GithubUserInSpecificOrg", get the token from that, then submit both to Warehouse
#

This works basically because you can store encrypted data in a macaroon, and a macaroon's trust is rooted in some secret key, so you store an encrypted caveat that basically says "to validate this caveat, the person needs to send a second macaroon that is valid for X secret key". The bearer can't read that secret key because it's encrypted, but the person who made the token in step (2), or Warehouse itself, can.

The person who made the token in step (2) gives that secret key to the GithubUserInSpecificOrg service, and whenever the person logs in, it just makes a short lived macaroon from that secret key.

#

I'm not sure anyone would ever actually use this capability! but it wouldn't be hard to support in Warehouse, we basically just need to accept the additional macaroons instead of just the 1
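
To make the "holder can add restrictions" property concrete, here is a minimal sketch of the HMAC chaining that macaroons are built on. It is not Warehouse's implementation and it covers only plain (first party) caveats; a third party caveat would additionally embed an encrypted verification key, which is not shown. The function names and the caveat string are made up for illustration.

```python
import hashlib
import hmac


def _chain(sig: bytes, data: str) -> bytes:
    """Derive the next signature from the previous one plus new data."""
    return hmac.new(sig, data.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Issuer side: the first signature is rooted in the secret key."""
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}


def attenuate(token: dict, caveat: str) -> dict:
    """Holder side: add a restriction using only the current signature."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def verify(root_key: bytes, token: dict, satisfied) -> bool:
    """Issuer side: recompute the chain and require every caveat to hold."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        if not satisfied(caveat):
            return False
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, token["sig"])


# The holder narrows the token without ever seeing root_key.
root_key = b"issuer-secret"
narrowed = attenuate(mint(root_key, "token-1"), "project = example")
print(verify(root_key, narrowed, lambda c: c == "project = example"))
```

Because each signature is derived from the previous one, a holder can append caveats but cannot strip them: dropping a caveat leaves a signature that no longer matches the recomputed chain.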

ancient compass
#

I just want to add a GitHub user or org to a PyPI project and let any authentic CI run upload, à la coverage tools

tribal sedge
# unreal jewel https://fly.io/blog/api-tokens-a-tedious-survey/ is pretty interesting

Thanks for this, it's a really nice dive into a lot of the complexity. I appreciate the author's perspective on 'random tokens are underrated' which is probably true for most services that have a relatively simple go/no-go authorization policy.
It seems to me that most of the added functionality for Macaroons and Biscuits is for when you want to delegate some of the authorization policies to another party, hence the term "third party caveat", I think I get it now

I wonder if there are examples of Macaroon caveats that are used outside of a tech-centric ecosystem (i.e. platforms like pypi or GitHub), in a more consumer-focused integration universe. 🤔

lethal meadow
merry valve
lethal meadow
#

Thanks @merry valve and @unreal jewel ❤️

tribal sedge
#

@unreal jewel if I were to try again for the MathJax implementation, is there something else I'd need to consider for the cache busting to not break that page load? (I'm also unclear why the page details needs the JavaScript block cache busted...)

unreal jewel
#

any static files hosted by warehouse have to be included in the static manifest

#

that's the thing that turns a path like "foo.png" into "foo.cachebust.png"

#

@tribal sedge if my memory is correct, you need to make sure that the files will be picked up by manifest

#

er

#

you might be able to just add some ** in there, or just add the other vendor folders?
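
As a rough illustration of what a static manifest does, the sketch below maps each original path to a content-hashed name, in the spirit of "foo.png" becoming "foo.cachebust.png". It is an assumption-laden toy, not Warehouse's build step: the function names, the 8-character digest, and the naming scheme are all invented here.

```python
import hashlib
from pathlib import PurePosixPath


def cachebust_name(path: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map every original path to its cache-busted counterpart."""
    return {path: cachebust_name(path, content) for path, content in files.items()}


manifest = build_manifest({"images/foo.png": b"fake image bytes"})
print(manifest)
```

Only paths that pass through the manifest get rewritten, so a file that is left out of it, or that is referenced by name from inside another script, keeps pointing at the unhashed path.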

tribal sedge
#

Thanks for the pointers, I'll give it a shot!
Curious - how much should I balance putting into the vendored approach vs the CDN approach?

unreal jewel
#

I think CSP3 supports putting the resource integrity hashes for external resources in the script-src instead of putting the domain

#

I'm not sure what browser support looks like for that

#

but if browser support is there, I think all the arguments against cdn approach go away?

tribal sedge
unreal jewel
#

I think you don't include the url if you do that?

#

in the CSP policy

#

because I think just the URL allows anything from that domain
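
For reference, a hash source in a CSP header is the base64 digest of the script's bytes, prefixed with the algorithm name. The sketch below only shows how such a value could be computed; whether browsers honour hashes for external scripts is exactly the open question above, and the helper names are hypothetical.

```python
import base64
import hashlib


def csp_hash_source(content: bytes, algorithm: str = "sha384") -> str:
    """Return a quoted CSP hash source such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"'{algorithm}-{base64.b64encode(digest).decode('ascii')}'"


def script_src(hash_sources: list[str]) -> str:
    """Build a script-src directive that lists hashes and no domain."""
    return "script-src " + " ".join(hash_sources)


# Placeholder bytes standing in for the real script file.
print(script_src([csp_hash_source(b"console.log('hi')")]))
```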

tribal sedge
#

it all worked out - I might try it again without the URL and only the hashes - but the connect-src needs to be there since the js downloads other files

#

and I don't want to allow anything from the domain, only from the mathjax paths

#

well, I'll keep going with the vendored approach for another hour or so before flipping back to the CDN approach

#

oh yeah, this won't work - unless the internal references in the mathjax code get updated to point to the cachebust hashed resources.
I'll go back to CDN-style and try my best to trim as much as possible from the CSP.

tribal sedge
#

I submitted a new PR, this one using CDN.

ancient compass
fleet grove
#

what exactly is last_serial in the json api? I can't find good documentation on what it represents and how to mimic it. Additionally, when trying to find the source for xmlrpc.client I couldn't find that either 😅

obtuse torrent
#

it's mostly useful for the events api

fleet grove
#

but where is the source of xmlrpc.client

#

it uses that as the example, but i cannot find the module

obtuse torrent
#

xmlrpc.client is part of stdlib
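
Since it ships with Python, the module's source can be located from the interpreter itself. A small sketch using only the standard library:

```python
import inspect
import xmlrpc.client

# Path of the file that defines xmlrpc.client in this Python installation.
source_path = inspect.getsourcefile(xmlrpc.client)
print(source_path)
```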

fleet grove
#

ah

violet fable
#

Wait what? I don't think Black is its own organisation on PyPI?

plucky quest
#

it has started to be rolled out

#

and lukasz is involved with the PSF

#

so perhaps it has

violet fable
#

Huh, I'll ask him about it then

obtuse torrent
#

nah, it's happening on any project that has a sole owner

violet fable
#

Would've expected an email if they converted projects to organisations

obtuse torrent
plucky quest
#

I think they are gathering the initial feedback IIRC

obtuse torrent
#

before it gets like, deployed to PyPI

plucky quest
#

humokay

obtuse torrent
#

all the test rounds until at least 3 were happening outside of regular PyPI

plucky quest
#

I may be mistaken

#

alright then

obtuse torrent
#

saying until 3 since that's the last one I've been on

#

lol

plucky quest
#

it seems something at least partially got deployed because of this message though 🤣

violet fable
#

yeah, very confusing messaging for sure

#

"your organisation" ... but what organisation lol

violet fable
obtuse torrent
#

well okay, I don't know if it isn't perhaps deployed to PyPI but like, disabled through env var or sth

violet fable
#

I'm an owner of the project on PyPI alongside Łukasz

obtuse torrent
#

yeah, nevermind on the "sole owner" part, it's like that on all projects

#

I guess this is the UI for transferring projects from individual account to an org

violet fable
#

but the messaging claims there's a pre-existing org though hahaha

obtuse torrent
violet fable
#

I'll shut up :)

obtuse torrent
violet fable
#

I agree this is probably the UI for managing the organisational ownership of a package project.

#

.. and anyway the second box goes counter the first one signifying that the shadow organisation the first says exists actually does not lol 😆

obtuse torrent
#

Seems like they only considered how it will look if it's an org-owned project

#

so two problems:
a) "Cannot remove project from an organization" should only show if a project is in an organization.
b) Both of these should not be shown when AdminFlags.DISABLE_ORGANIZATIONS is set

#

ah yes, warehouse's infamous check for generated translation files has failed, time to run the docker thing for that

#

oh no, it doesn't support podman because it uses a docker lib that connects to the socket

#

or maybe it just doesn't like being run sudoless

#

yeah...

#

that's kinda weird for podman tbh, maybe the docker compatibility layer does that

violet fable
#

Is there a public list of prohibited / reserved names on PyPI somewhere? Presumably not, but I can't find the issue where its visibility was clarified :/

merry valve
#

There is not.

violet fable
#

Understood, thanks!

wet flax
gentle frost
tribal sedge
#

for folks that develop on warehouse, is there any appetite to include pre-commit that can run some actions in isolated (non-container) environments for fast-catch things like black? I find myself running either the tests or lint checkers and sometimes missing one or the other, only to have the CI catch my error.

#

I can imagine adding other checks like editorconfig to help prevent annoyances from creeping in

unreal jewel
#

I personally very much dislike pre-commit, and I think its existence in a repo is a footgun for new contributors

valid flame
unreal jewel
#

pre-commit wants you to install it as a pre-commit hook, which makes actually contributing to the repository awful (git commit becomes super slow, randomly doesn't work if you have a linting problem, etc).

Everyone I know who works on a project that uses pre-commit says "oh I never install the hooks, that would be awful, I just manually run it".

Except that requires knowing that you can do that. If you follow the setup instructions on the pre-commit page, they treat installing the hooks as a mandatory step, and manually running command as some optional thing. Which means that someone who isn't familiar with the tool, just following along the setup instructions is almost practically guaranteed a frustrating experience.

#

I'm very much not a fan of tools where the "golden path" has problems like that, and seemingly nobody actually uses it that way. It just ends up being a trap that ensnares unsuspecting people

pliant obsidian
#

Everyone I know who works on a project that uses pre-commit says "oh I never install the hooks, that would be awful, I just manually run it".

I usually install the hooks, the first run is slow, but subsequent ones should be quick: it's important not to put very slow things in there

unreal jewel
#

(Actually the first time I ever encountered pre-commit I just quickly ran through the setup instructions and didn't pay attention to what it was doing, so I didn't notice it was installing git hooks, as soon as I noticed that the project I was trying to contribute to's dev setup broke my git commit using that, I got mad enough I just deleted all my local work and moved on instead of contributing the fix I had for a bug)

pliant obsidian
#

pre-commit wants you to install it as a pre-commit hook, which makes actually contributing to the repository awful (git commit becomes super slow, randomly doesn't work if you have a linting problem, etc).

if you don't want to install it locally, it's useful as a pinning mechanism for the CI, the pinning should help avoid random linting problems

unreal jewel
#

we already pin our linting

pliant obsidian
#

okay, that's good!

#

one nice thing about pre-commit is it has a command for auto-updating the pins

unreal jewel
#

all our dependencies are generally pinned, so it seems more useful to have tooling that works generally across all of them. Lint dependencies don't really feel special to me

#

FWIW, even if you remove the speed aspect of it, I also just generally think linting as part of git commit is fundamentally the wrong thing to do

#

it frustrates me for the same reason go erroring out on unused variables frustrates me

#

it assumes that the only state code can be in is a final state

#

and there are no interim states

pliant obsidian
#

personally I don't mind the "fail early" aspect of it, but totally agree it's not for everyone.

and even if using pre-commit, if you don't like that aspect, you don't need to install it locally and can let the CI deal with it

#

another thing I personally like: if also using pre-commit.ci, it can send updates to PRs with lint fixes

for example, a contributor comes along and they didn't run Black. pre-commit.ci runs Black and updates the PR for them

tribal sedge
#

@unreal jewel thanks for sharing! I had no idea it was that polarizing - I've had an entirely different experience

#

I've enjoyed using pre-commit so much, and noticed that some other repos in pypa-land use it, so I thought I'd ask opinions

unreal jewel
#

Yea it was one of those that I happened to trash my entire local repo out of frustration

#

I also am not a huge fan of having ci automatically change peoples PRs

#

I’ve seen that before and it ends up causing merge conflicts when I’ve added more changes locally

#

Which isn’t the end of the world, but frustrating

obtuse torrent
#

yeah, to me the ci automatically changing PRs is likely to cause confusion to users who aren't experienced with git

#

like, it's not great to expect the user to know what to do if they made additional changes locally after some automatic tool pushed new change to their PR

gentle frost
#

oh wow nice, I thought I was the only one that strongly disliked pre-commit

obtuse torrent
#

git pull --rebase is (probably?) the best option in most cases but I'm guessing it might not be the first option a new user may find when trying to solve this

#

and it can introduce conflicts anyway

obtuse torrent
#

But (I think) I make it clear in contributing guidelines that it's optional to install

#

and also tell people how to run it separately

#

though looking at it, the current phrasing I have is:

(optional but recommended) Install pre-commit hook which automatically ensures that you meet our style guide when you make a commit:
which does still say it's recommended

unreal jewel
#

My big thing is the more stuff you cram into pre commit the less optional it is

obtuse torrent
#

but at least I describe later on how to check:

If you've done the optional step of installing a pre-commit hook in the 4.1 Setting up your development environment section,
you actually don't have to worry about anything as all of these style checks are ran automatically whenever you make a commit. However, if you chose not to, you can:

  • run all hooks on currently staged (git added) files with:
pre-commit
  • or run all hooks on all files with:
pre-commit run --all-files
unreal jewel
#

It also tends to break editor integration from what I can tell unless you’re very careful to never configure your linters via pre commit

obtuse torrent
obtuse torrent
#

since I already describe how to run it separately

unreal jewel
#

Sure but what’s the value add over nox or tox? Neither of those have the problem where unsuspecting users might accidentally install it as a hook

obtuse torrent
#

I personally like having the commit hook

#

so it allows me to do that

#

and tox adds a lot of overhead

#

so it's unsuitable as a hook

gentle frost
obtuse torrent
#

speed-wise

#

it's not going to run in less than a second

#

while ensuring consistent environment

gentle frost
#

🤔 in both cases a virtual env is used

obtuse torrent
#

it's not designed for running as a pre-commit hook

#

pre-commit runs on staged files

gentle frost
#

Hatch (and I've heard tox4) is as fast as you want

obtuse torrent
#

tox 4 is not released :)

#

But yeah, it's not that it wouldn't be possible to do this with enough effort. But it's not going to be as simple as me running pre-commit install after cloning a repo

#

like, do you even have functionality to run only on staged files

unreal jewel
#

I’m glad that workflow works for you

obtuse torrent
#

or is it another thing I would have to implement myself

#

oh ye, sorry, this is #pypi channel

unreal jewel
#

It’s completely opposed to how I want tools to work tbh

#

But cool that it does work for you

obtuse torrent
#

Yeah, to me it's just that pre-commit install is an optional step

unreal jewel
#

I don’t think I’ve ever had a positive experience trying to use a pre commit hook in git

#

No idea why those two experiences are so different

#

I hate anything that either slows down commit or prevents me from making less than perfect commits

obtuse torrent
#

So you can just do pre-commit run --all-files, same as you would tox lint or hatch run lint or something else, while also giving an additional benefit to those who do want to have pre-commit hooks

unreal jewel
#

I often make wip commits for instance

valid flame
#

git commit --no-verify

#

and pre-commit is off

unreal jewel
#

Sure but that requires me to think about whether my commit is perfect or not

#

Or I git commit

#

It fails

#

Then I get mad

#

And i do the just commit the damn thing flag

obtuse torrent
#

I may not do it enough for this to be a problem but when I do it, I'm fine with --no-verify. But it does sort of depend on how many things you put in pre-commit configuration. Formatting with black, isort-ing, spurious whitespace are the sort of things I would want to do before committing anyway.

#

But I recognise it's not for everyone which is why I list pre-commit run --all-files

unreal jewel
#

I also often times comment out large blocks of random code and that tends to make linters really mad

obtuse torrent
#

Anyhow, I actually came here to say I dislike CI autofixing as well, I kind of went on to discussing this which wasn't really my intention but ehh, it happened

unreal jewel
#

I mean it’s cool tbh it’s interesting to see other perspectives

#

I don’t care if other people use it

#

I’d just prefer warehouse doesn’t because then I’ll end up having to use it 🙂

obtuse torrent
#

I consider it a replacement for running each auto-formatter separately with the pre-commit hook being a nice (optional) bonus

tribal sedge
#

Thanks for the conversation! It was in the back of my head, not remembering that someone had already started this effort, and I’d been reviewing it all along. 🤦
Here’s the context of what the proposed implementation looks like so far https://github.com/pypi/warehouse/pull/11309

vapid coral
#

What I want (and may be coming in tox4? Or, may be possible now & I've missed it?) are tox meta-environments. Declare [env:lint] meta = flake8, mypy, interrogate and then tox -e lint runs those three environments.

midnight shard
#

Yes this isn't a thing just yet

tribal sedge
merry valve
#

Thanks for the boost. We have a big backlog of account recovery requests and are quite aware of delays here. We're working on some things to improve this but don't have additional details right now.

storm adder
#

Hey @merry valve I'm from Sourcegraph and we currently index ~4600 Python packages. We would like to increase that to 404k in the near future.

With that said, we want to be able to work together so we don't hammer your origin servers.

I attached our current config and was wondering if you could help provide a URL/endpoint that would be able to handle the large amount of requests we would be making.

Thanks

gentle frost
#

why not have a mirror?

midnight shard
#

Yeah seems like you'd want a bandersnatch mirror you run yourself

unreal jewel
#

It's pretty hard to hammer /simple/ hard enough that we notice, that page is cached really aggressively and has limited variants, but in general limiting concurrency is a good thing.

#

the /pypi/*/json endpoints are OK, but you need to limit concurrency or your overall rate, especially if you're hitting version specific URLs for all of PyPI or something like that.

#

The XMLRPC APIs are really bad and you should do whatever you can to not use them, but if you have to use them, you should do so really slowly

storm adder
#

thanks all! @unreal jewel this is great info, I'll keep you posted.

merry valve
#

What Donald said! Also, please put something identifiable in your user agent so we can know who to contact if there's issues
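
The advice above (limit your rate, identify yourself) can be sketched as follows. This is a minimal illustration under stated assumptions: the User-Agent string, the contact URL, and the one-request-at-a-time throttle are placeholders, and only the /pypi/<project>/json URL shape comes from the discussion.

```python
import time
import urllib.request

# Hypothetical identifier; use something that lets the admins reach you.
USER_AGENT = "example-indexer/0.1 (+https://example.com/contact)"


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def build_request(project: str) -> urllib.request.Request:
    """Build a JSON API request that carries the identifying User-Agent."""
    url = f"https://pypi.org/pypi/{project}/json"
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_all(projects, throttle: Throttle):
    """Fetch projects sequentially, pausing between requests."""
    for project in projects:
        throttle.wait()
        with urllib.request.urlopen(build_request(project)) as resp:
            yield project, resp.read()
```

Calling `fetch_all(["numpy"], Throttle(1.0))` would issue at most one request per second; for anything close to all of PyPI, a self-hosted mirror as suggested above is likely the better fit.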

storm adder
#

will do @merry valve

unreal jewel
#

Yea, if we can't find a way to contact you from the requests you're making, and you're causing problems we will just block your IP addresses and wait for you to contact us 😄

storm adder
#

lol

fringe pine
#

Shoot, I just remembered that Poetry uses the default requests User-Agent 😆

#

Should probably fix that sometime

gentle frost
gentle frost
sharp sapphire
#

Hey everybody! Currently doing a release for the PyTorch project and running into some issues uploading binaries. Turns out our binaries are a bit too big and we are in need of a size increase, is there anyone who can help expedite this to unblock our release?

Would be much appreciated!

https://github.com/pypa/pypi-support/issues/2341

fleet grove
#

I'm not sure where this occurs but where exactly in the build/upload process does a Project-url with a home-page label become the homepage (without a hyphen) on pypi?

#

is that label special cased in build tools, or does pypi change it from home-page to homepage?

obtuse torrent
#

Home-page is core metadata field

#

So PyPI transforms it into Homepage keys in the urls dict

fleet grove
#

so where would it be provided in a pep 621 configuration, and as what label?

#

I think it would be named Homepage in the [project.urls] toml dict

fleet grove
#

yeah but i mean

#

one sec

#

is there any benefit to continuing to provide Homepage in [project.urls]?

gentle frost
#

yes PyPI uses that table

fleet grove
#

yeah

gentle frost
#

pyproject.toml/pep 621 only support urls with that table

fleet grove
#

but any benefit to have the url with the label Homepage?

#

the only reason i can think of having it is for when packages want to look at the metadata of a package and get the "main" url

gentle frost
proud bison
#

While we're talking about it: i started specifying URLs like this, because why should they have their own section?

[project]
name = '...'
urls.Homepage = '...'
urls.Source = '...'
valid flame
#

From TOML side, it’s the same thing

proud bison
#

Of course, that's why it's an option.

fluid cypress
#

Hello 👋 Wondering if there's an up-to-date index (JSON, CSV, or otherwise) of all packages in PyPI?

grim rose
#

I currently can’t accept a project invitation, the error is simply “something went wrong, please check the PyPI status page” - any info on that? Status says everything is operational

agile sinew
#

Is there a reason that yank permissions require an owner? A maintainer can make a bad release and can’t yank it if it’s bad. I understand delete permissions being a bit stronger maybe, but yanking is pretty safe?

#

This happened on ninja today. It’s busted and we likely can’t get an owner to yank till Monday.

#

While I am a maintainer and the person who made the release is a maintainer too

merry valve
#

I currently can’t accept a project

#

Is there a reason that yank permissions

gentle frost
cinder dome
serene fern
#

It used to be the norm before PyPI merged the release creation and file upload API

#

Before the change you used to need to first create an empty release and then upload files for it

cinder dome
#

Thanks @serene fern, I see: the devs might have forgotten to upload the file.

pliant obsidian
wicked wind
#

I still lament the loss of the Cheeseshop name.

gentle frost
lament needle
#

Hi folks, is there a place where I can find some upload stats? Such as the number of new packages submitted per hour?

merry valve
#

For example, here's uploads per day for the last 30 days:

SELECT 
  DATE(upload_time) AS d,
  COUNT(*) as c
FROM
  `bigquery-public-data.pypi.distribution_metadata`
WHERE
  DATE(upload_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  AND CURRENT_DATE()
GROUP BY d
ORDER BY d
lament needle
#

@merry valve, thank you! I will play with it.

unborn nacelle
#

if i wanted to download as many metadata files (json is also good) as possible, how would i best go about that?

#

additionally, if i also wanted to download the metadata files for the n most popular packages (where n is maybe 10000), what would be the easiest way to do that

merry valve
#

Depends on what you're doing. Do you just need the metadata, or do you need the actual files?

unborn nacelle
#

just the metadata

merry valve
#

How do you measure 'popularity'?

unborn nacelle
#

the exact metric isn't important, "downloads last month" probably

merry valve
unborn nacelle
#

i want to test code i wrote and also collect some statistics about what edge cases are actually used in the wild

#

i'm aware of the bigquery sets, but can i actually use them to download large batches of metadata?

merry valve
#

What exactly do you mean by 'metadata'? Usually this refers to things like the project name, version classifiers, etc.

unborn nacelle
#

(i've previously used bigquery for version numbers, it's really neat that this exists!)

#

PEP 508 metadata

merry valve
#

All of which you can get from a BigQuery query

unborn nacelle
#

oh that's neat!

#

thanks!

unborn nacelle
#

is it expected that mirroring name, version, filename and requires_dist from the the-psf.pypi.distribution_metadata dataset takes ~1h?

#

i'm just using the default client.list_rows from the bigquery python client:

#
import json
from pathlib import Path

from google.cloud import bigquery
from tqdm import tqdm

# The client runs under my own project; the table is read from the-psf.
client = bigquery.Client(project="jupyter-local-project")
table_id = "the-psf.pypi.distribution_metadata"
table = client.get_table(table_id)

# Only fetch the columns needed for parsing requirements.
selected_fields = [
    field
    for field in table.schema
    if field.name in ["name", "version", "filename", "requires_dist"]
]
# Stream the rows to newline-delimited JSON, one object per line.
with Path("pipy_requires_dist.ndjson").open("w") as fp:
    rows_iter = client.list_rows(table_id, selected_fields=selected_fields)
    for row in tqdm(rows_iter):
        fp.write(json.dumps(dict(row)))
        fp.write("\n")
#

(i want to parse the marker fields, so i can't do it in sql)
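
Once the rows are on disk, pulling the marker text out of each requirement could look like the sketch below. It assumes the NDJSON layout written by the script above and naively splits on the first ";", which is a simplification rather than a full PEP 508 parser; the function name and sample rows are invented.

```python
import io
import json


def iter_markers(fp):
    """Yield (project name, marker text) for requirements that have a marker."""
    for line in fp:
        row = json.loads(line)
        # requires_dist may be missing or null for some files.
        for requirement in row.get("requires_dist") or []:
            _, sep, marker = requirement.partition(";")
            if sep:
                yield row["name"], marker.strip()


# Made-up sample rows in the same shape as the exported file.
sample = io.StringIO(
    json.dumps({"name": "demo", "requires_dist": ["numpy ; python_version < '3.8'"]})
    + "\n"
    + json.dumps({"name": "bare", "requires_dist": None})
    + "\n"
)
print(list(iter_markers(sample)))
```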

#

the script spends most of its time idling, io/network are not a bottleneck

#

total table size is 8,165,801 fwiw

merry valve
unborn nacelle
#

i kinda wanted to avoid setting up cloud storage/job handling/etc just to fetch those rows

#

what's confusing me is that i'm not even making a lot of round trips, i always spend many seconds until the api sends the next batch, which is then quickly downloaded and written

cobalt grail
#

drive by comment: the layout shift of pypi.org because of the banner on each page is a bit annoying. I always move my mouse into the wrong place initially because things shift away

unborn nacelle
#

Would it be feasible for pypi to indicate whether the metadata of all wheels is consistent? E.g. if i check https://pypi.org/pypi/numpy/1.2.0/json, it would be nice to have wheel_info_consistent key that is true if all wheels (and with metadata 2.2 also the sdist) have the same metadata as the info key

serene fern
#

Feasible yes, but first PyPI needs to learn how to extract metadata from a wheel (which is in progress but stalled a bit due to security considerations, see attached link).
https://github.com/pypi/warehouse/pull/9972

#

And if you want that functionality in the API response you’d want to write a PEP

unborn nacelle
#

i see, thanks!

wet flax
#

(there is interest in that, FWIW)

merry valve
#

(RE: metadata consistency)

merry valve
unborn nacelle
wet flax
#

What do you wanna change about them?

#
  • guy who just read through them a lot more than he'd like a couple of weeks ago
unborn nacelle
#

for pep 440, i've written that down in the second part of https://cohost.org/konstin/post/514863-reimplementing-pep-4

#

For PEP 508, fix the grammar in the main document (the parsley grammar seems correct, but parsley is unmaintained and imo the main text grammar should be the reference)

merry valve
#

Nice writeup! I love this line:

So python has a hard time doing modern packaging because they were trying to do modern packaging before it was being invented.

unborn nacelle
#

Also for PEP 508, restrict the marker grammar to essentially wsp* env_var wsp* marker_op wsp* python_str | wsp* python_str wsp* marker_op wsp* env_var
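
A parser for that restricted shape (one variable, one operator, one quoted string, in either order) fits in a few lines. This is a sketch of the proposal, not how pypa/packaging parses markers; the regexes ignore string escapes and the function name is hypothetical. The variable list is the one defined in PEP 508.

```python
import re

ENV_VARS = {
    "python_version", "python_full_version", "os_name", "sys_platform",
    "platform_release", "platform_system", "platform_version",
    "platform_machine", "platform_python_implementation",
    "implementation_name", "implementation_version", "extra",
}

_VAR = r"([a-z_]+)"
_OP = r"(<=|>=|===|==|!=|~=|<|>|not\s+in|in)"
_STR = r"""('[^']*'|"[^"]*")"""

_VAR_FIRST = re.compile(rf"\s*{_VAR}\s*{_OP}\s*{_STR}\s*")
_STR_FIRST = re.compile(rf"\s*{_STR}\s*{_OP}\s*{_VAR}\s*")


def parse_simple_marker(text: str) -> tuple[str, str, str, bool]:
    """Return (variable, operator, string value, variable_comes_first)."""
    m = _VAR_FIRST.fullmatch(text)
    if m:
        var, op, value, var_first = m.group(1), m.group(2), m.group(3), True
    else:
        m = _STR_FIRST.fullmatch(text)
        if not m:
            raise ValueError(f"not a simple marker: {text!r}")
        value, op, var, var_first = m.group(1), m.group(2), m.group(3), False
    if var not in ENV_VARS:
        raise ValueError(f"unknown marker variable: {var!r}")
    # Normalise "not   in" to "not in" and strip the surrounding quotes.
    return var, " ".join(op.split()), value[1:-1], var_first


print(parse_simple_marker("python_version < '3.8'"))
```

Compound expressions joined with `and`/`or`, and dotted names such as `os.name`, are rejected by this sketch; a real parser would layer boolean combinators on top.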

#

on a related note, packaging and thus pypi seem to allow the non-PEP markers os.name, sys.platform, python_implementation and platform.python_implementation (that's an exhaustive list from the entire ~8 million releases; thanks again @merry valve, being able to use bigquery is great)

wet flax
#

Where are these numbers from:

For comparison, firefox estimates 16–20 minutes for the semver spec, but 57–73 minutes for PEP 440.

#

Is that reading time?

unborn nacelle
#

yes, this reader icon

wet flax
#

nods.

unborn nacelle
#

was the quickest proxy i could find, copying and counting words doesn't work well for these kinds of docs

wet flax
#

Yea, makes sense tho. PEP 440 does go into a lot more detail + discussion.

wet flax
unborn nacelle
#

sure, and we have more historic complexity to handle, but i still think it could be simplified quite a bit, both the content as well as the text

wet flax
unborn nacelle
wet flax
#

PEP-345

unborn nacelle
unborn nacelle
#

re PEP 508, i'd also like to define clear rules for which fields are PEP 440 and which are stringly typed. If i define implementation_version, python_version and python_full_version as PEP 440 and the remainder as stringly, i see a lot of comparisons that are mostly accidents or will often not do what was intended

#

Things like platform_release < '12.0', platform_machine >= 'armv0l' or python_version < '3.8.'

#

imho pypa/packaging should at least warn when there's a marker such as python_version < '3.8.' that is clearly a typo

#

and there's numpy==1.2; python_version >= '3.4, <3.7' which is kinda fine but also a challenge to parse
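
A rough sketch of the kind of warning described above, assuming the split proposed in this thread (three fields are versions, everything else is a plain string). The version check is deliberately simplified to dotted numbers with an optional `.*` suffix and is not a full PEP 440 parser. The name `marker_warning` is hypothetical and this is not an existing pypa/packaging feature.

```python
import re

# Fields proposed in the discussion as PEP 440 versions; the rest are strings.
VERSION_FIELDS = {"implementation_version", "python_version", "python_full_version"}

# Operators that imply an ordering and are rarely meaningful on plain strings.
ORDERED_OPS = {"<", "<=", ">", ">=", "~="}

# Simplification: digits separated by dots, optionally ending in ".*".
# Real PEP 440 also allows epochs, pre/post/dev releases and local versions.
_SIMPLE_RELEASE = re.compile(r"\d+(?:\.\d+)*(?:\.\*)?")


def marker_warning(field, op, value):
    """Return a warning string for a suspicious comparison, else None."""
    if field in VERSION_FIELDS:
        if _SIMPLE_RELEASE.fullmatch(value.strip()) is None:
            return f"{field} {op} {value!r}: value does not look like a version"
        return None
    if op in ORDERED_OPS:
        return f"{field} {op} {value!r}: ordered comparison on a plain string field"
    return None
```

Under these assumptions `python_version < '3.8.'`, `platform_release < '12.0'` and `python_version >= '3.4, <3.7'` would all be flagged, while `python_version < '3.8'` would pass.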

wet flax
#

nods

#

Those all sound reasonable improvements.

#

*like

unborn nacelle
unreal jewel
#

I don't think it's particularly worth trying to remove stuff from PEP 440

#

the pain of extra stuff is mostly borne by a handful of people who are writing packaging tools, and typically even a smaller set of people who are writing the version parsing stuff

#

the pain of removing stuff is borne by the wider ecosystem, and the benefit feels nebulous to me.

valid flame
#

Also, removing stuff is harder than adding it, since people rely on it

serene fern
merry valve
merry valve
serene fern
merry valve
serene fern
unreal jewel
#

Well

#

PEP 658 vs PEP 691 is an interesting thing to consider

#

I don't think PEP 691 definitely kills 658

#

Like there's 0% chance that we put everything that PEP 658 exposes into the simple index, since it includes stuff like the entire long description for every file, etc.

#

So something we have to decide is whether the extra information that 658 exposes is still useful, or if the only thing that really matters is the limited subset that pip would want in PEP 691

#

even in that case of that limited subset, it'd still be useful to determine how much that would bloat the PEP 691 response (with and without gzip), and whether having a larger response is better or worse than multiple smaller responses
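
One way to get a feel for the "with and without gzip" question is to serialize a sample response both ways and compare sizes. The sketch below uses made-up data: the `requires-dist` key on file entries is hypothetical and not part of PEP 691, and the numbers it yields say nothing about real PyPI responses.

```python
import gzip
import json


def response_sizes(payload):
    """Return (raw_bytes, gzipped_bytes) for a JSON-serializable payload."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


def sample_index(with_metadata, files=200):
    """Build a fake PEP 691-style project response for size comparisons."""
    entries = []
    for i in range(files):
        filename = f"example-1.{i}-py3-none-any.whl"
        entry = {
            "filename": filename,
            "url": f"https://files.example.invalid/{filename}",
            "hashes": {"sha256": "0" * 64},
        }
        if with_metadata:
            # Hypothetical extra key, only here to simulate the added payload.
            entry["requires-dist"] = [
                "requests>=2",
                "click>=8; python_version >= '3.8'",
            ]
        entries.append(entry)
    return {"meta": {"api-version": "1.0"}, "name": "example", "files": entries}
```

Calling `response_sizes(sample_index(False))` and `response_sizes(sample_index(True))` gives the four numbers to compare. Real measurements would need actual project responses.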

#

I do think we need to fix more than just the way Warehouse stores metadata, the source of metadata inside Warehouse does not have to match the metadata inside the artifact itself

#

IOW, you can have metadata on Warehouse that thinks it depends on X, but when you read the metadata inside of the artifact, it says it depends on Y

serene fern
#

If nothing else, PEPs 503 + 658 is still a much easier solution for alternative indexes

#

But for PyPI specifically I think it makes sense to only expose what’s relevant for pip

valid flame
obtuse torrent
unreal jewel
#

if clients can't understand it in a meaningful way anymore, then it's major

pseudo bramble
merry valve
pseudo bramble
#

yup!

valid flame
#

I am no expert, but I'd say that this is what Simple API was made for. It provides you direct links to packages

merry valve
# pseudo bramble yup!

We don't make any guarantees but generally they don't change (and when they do we provide redirects)

#

The Simple API will always have the 'most accurate' file URLs for a given release, though

dire vessel
unreal jewel
#

practically speaking we've never broken a pythonhosted URL that was linked from a simple index page unless the file was deleted

#

but as di mentioned, that's not a promise in our API structure

valid flame
#

I was looking through the issues, but didn't find any answer. Is there any plan/feature request to add search API? basically nothing PEP-based, just the same output website search has, but in JSON for it to be easier to parse? In theory one might take Simple API index page, but for PyPI it's like ~23MB, so even the downloading could be slow on lower bandwidth. Another option would be to parse search results HTML page, but that's either requiring one of those big libs for HTML parsing, writing parser based on html.parser package or (with proper offer to dark forces) regexing that stuff.
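
For the html.parser route mentioned above, a minimal sketch using only the standard library. It assumes a Simple API style page where every project is an anchor whose text is the project name. The class and function names are made up, and the whole page still has to be downloaded before filtering, so this does not address the bandwidth concern.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the text content of every <a> element."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.names.append("")

    def handle_data(self, data):
        if self._in_anchor:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False


def project_names(html, prefix=""):
    """Return anchor texts from the page that start with the given prefix."""
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    wanted = prefix.lower()
    names = (name.strip() for name in parser.names)
    return [name for name in names if name.lower().startswith(wanted)]
```

Usage would be along the lines of `project_names(page_text, prefix="pytest")`, where `page_text` is the already downloaded index page.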

gentle frost
frail spindle
#

thanks

dire vessel
#

does anyone know where requires_dist comes from for a .whl file in the pypi BigQuery dataset? Is it from the wheel's metadata?

#

Could different .whl files for the same package+version have different requires_dist values in the dataset?

serene fern
#

Yes and yes

dire vessel
#

great, thanks

dire vessel
#

if my query is correct, out of 310,000+ latest versions of packages that ship wheels, there are 275 that ship wheels with different sets of requirements

proud bison
unborn nacelle
#

is there a preferred/dedicated way to check whether there are new releases or new files for a larger number of projects compared to the local cache (i.e. an entire lockfile)? Currently i'm sending a lot of parallel requests against https://pypi.org/simple/{project}/?format=application/vnd.pypi.simple.v1+json with If-None-Match but it feels wasteful making 100 requests for that

merry valve
#

is there a preferred dedicated way to

fleet grove
#

I'm reading https://warehouse.pypa.io/api-reference/ and trying to make sense of the following line

Requests to the JSON, RSS and Legacy APIs also provide an ETag header. If you’re making a lot of repeated requests, ensure your API consumer will respect this header to determine whether to actually repeat a request or not.
I understand this provides an ETag header, but I'm not sure how to provide this header to the API when I make a request

#

it just seems to always return a 200

#

in addition, the releases key is deprecated from the json API, but the simple api in json form does NOT provide nearly the same amount of information

meager umbra
#

Have you tried the If-None-Match request header?

fleet grove
#

I did

#

lemme try one more time

meager umbra
#

Seems to work for me:

$ curl -i -H 'If-None-Match: "vthI01QTzZkEAMR/jk8Lug"' https://pypi.org/pypi/installer/json
HTTP/2 304
date: Wed, 29 Mar 2023 22:19:50 GMT
cache-control: max-age=900, public
etag: "vthI01QTzZkEAMR/jk8Lug"
x-served-by: cache-hhn-etou8220068-HHN
x-cache: HIT
x-cache-hits: 1
x-timer: S1680128391.952514,VS0,VE1
vary: Accept-Encoding, Accept-Encoding
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-frame-options: deny
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-permitted-cross-domain-policies: none
unreal jewel
#

Yea, that's a conditional http request
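
The curl call above can be mirrored with the Python standard library. This is only a sketch with made-up function names. One thing to be aware of is that urllib raises `HTTPError` for a 304 instead of returning a response, so it has to be caught explicitly.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag):
    """Build a GET request that asks the server to skip an unchanged body."""
    request = urllib.request.Request(url)
    # The ETag value is sent back verbatim, including its double quotes.
    request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, etag):
    """Return (status, body); body is None when the server answers 304."""
    try:
        with urllib.request.urlopen(build_conditional_request(url, etag)) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # urllib treats every non-2xx status, including 304, as an error.
        if error.code == 304:
            return 304, None
        raise
```

The ETag passed in would be whatever a previous response for the same URL returned in its `etag` header.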

fleet grove
#

odd, it worked now

#

i probably set the header wrong

unreal jewel
#

it can depend on ~things too

unreal jewel
#

if the information you want is only available in that format, then you don't have a choice

#

we're not planning to remove that currently, but it was a problem on other responses so it's a warning that if you can find another way to get that information that you might be better off doing that

fleet grove
#

noted, begins to scrape the user facing html pages

unreal jewel
#

😦

#

That's how the simple API got invented

#

so you're standing on the shoulders of giants

meager umbra
#

I have to say, that documentation fragment is written kinda weirdly.

Requests to the JSON, RSS and Legacy APIs also provide an ETag header.
Surely it's the responses that have the ETag header, not the requests?
If you’re making a lot of repeated requests, ensure your API consumer will respect this header to determine whether to actually repeat a request or not.
You can't actually determine whether to repeat a request based on the ETag, can you? You can make a conditional request, but that's still a request.

fleet grove
#

ye

#

my initial implementation was lazy so i just cached all of my requests locally for two hours

#

though that leads to issues when a package releases an update, the information doesn't show up for up to 2 hours

unreal jewel
#

(for those that don't know, originally there were no installers, PyPI was just a bunch of user facing html pages for people to manually look at, then easy_install came along and started scraping those pages, then that started using a lot of bandwidth, so they made a "simple" html page that easy_install could scrape instead with just the links)

#

also for a while there was a commented out <th><tr> in the simple api because easy_install used a regex to parse html and required that to work

unreal jewel
unreal jewel
#

aiohttp?

fleet grove
#

aiohttp, but for caching its my own custom implementation with an LRUCache

#

I'm refactoring it right now to use etags and redis

#

as i've done with all of my github related requests

unreal jewel
#

the traditional way to implement conditional requests is to cache the response, and if the conditional request returns a 304, turn that into a 200 using the cached response

fleet grove
#

ah

#

integrating with the ClientSession

unreal jewel
#

yea

fleet grove
#

will consider that

#

think oh that actually wouldn't be that hard

unreal jewel
#

I haven't used aiohttp, but for requests I've used https://pypi.org/project/CacheControl/ which does things correctly. If you're like me and find working examples helpful, that might be a useful thing to poke at.

fleet grove
#

ah i was considering writing it myself

#

so i can use redis and whatnot

#

wrap the existing aiohttp _request method with a wrapper method that simply checks for etag headers and sets them on request

unreal jewel
#

Yea, basically you check if responses have an ETag, and if they do you store them for later use, then when later comes you check if you have a cached response, if so you add the If-None-Match, and if you get a 304 you just re-use the cached response, and if not you use the new response, caching it if it has an Etag. Then your client code doesn't have to think about it, so it's a nice abstraction
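
The steps described above can be sketched independently of any HTTP library. Here `fetch` is an assumed callable with the signature `fetch(url, headers) -> (status, response_headers, body)` that the caller supplies (for example a thin wrapper around an aiohttp or requests call). The in-memory dict stands in for redis or any other store, and header lookup is simplified to an exact `"ETag"` key.

```python
class ETagCache:
    """Turn 304 responses back into the cached 200 response."""

    def __init__(self, fetch):
        # fetch(url, headers) -> (status, response_headers, body)
        self._fetch = fetch
        # url -> (etag, response_headers, body)
        self._store = {}

    def get(self, url):
        request_headers = {}
        cached = self._store.get(url)
        if cached is not None:
            # We have an earlier response, so make the request conditional.
            request_headers["If-None-Match"] = cached[0]

        status, headers, body = self._fetch(url, request_headers)

        if status == 304 and cached is not None:
            # Unchanged: hand back the stored headers and body as a 200.
            return 200, cached[1], cached[2]

        etag = headers.get("ETag")
        if status == 200 and etag:
            # Remember the new response, including its original headers.
            self._store[url] = (etag, headers, body)
        return status, headers, body
```

Storing the headers next to the body also covers the case where client code later wants to inspect something from the original response.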

fleet grove
#

hm although i might need to cache the original headers as well in the event that there was something to check kek

#

@unreal jewel thx

#

it mostly works kek

fleet grove
#

aside from header finagling

unborn nacelle
wet flax
fleet grove
valid flame
fleet grove
spice hull
#

Bit of an internal discussion that we're looking for clarification on if anyone has some insight into the lower levels of how warehouse/pypi works.

  • A malicious package is uploaded under the name package-a and version 1.0. The package is reported and removed appropriately.
  • Since names cannot be recycled (AFAIK) package-a and version 1.0 cannot exist anymore on PyPI.
  • Does this mean that package-a can exist on version 2.0 even if the package was removed, or does it need to have a distinctly unique title from that point on, since the version can no longer be updated without a reupload?
obtuse torrent
#

can file an issue later

obtuse torrent
#

alright, no need for an issue then I presume?

#

(thanks for the beta invitation btw)

merry valve
obtuse torrent
#

In relation to this, it says that the risk can be mitigated using a dedicated environment but I think this is not currently possible? It doesn't seem like PyPI checks environment name during OIDC request (though it seems it could, based on OIDC documentation from GitHub) so it would seem that a committer could still modify the publishing workflow to simply not use that environment?

#

I do use a dedicated environment with manual approval but it seems like switching to OIDC would mean that the risk is increased in that regard since one could circumvent the environment while currently they can't since the token is only present in the environment that needs to be approved.

merry valve
dry nebula
#

@merry valve thanks for the oidc beta invite. I just made a pip-deepfreeze release with it. Everything went smoothly. 👍

white pawn
#

hi, where can I check why a package was removed from pypi (in this case "codecov")?

acoustic panther
gentle frost
#

what is the recommended way to upload a package for the first time?

#

as far as I understand one has to use the account password or create a temporary token with permissions for everything

#

is there another option?

merry valve
gentle frost
#

oh well that is awesome, I didn't fully understand it before now

gentle frost
merry valve
#

You should put your personal GitHub username -- we'll use this to invite you to a private repo to discuss the beta

acoustic panther
#

What do you guys think about proactive detection of known malicious code in pypi?

#

I don't want to keep bothering @merry valve with more reports 😅

unreal jewel
#

the problem with proactive detection typically comes down to the false positive rate

#

atm PyPI relies on third parties to report malware, so that they do the work of sorting through the false positive rate in whatever means they're using to detect malware

acoustic panther
#

if there is one

merry valve
# acoustic panther Do you know what the current approach is for detecting this?

Not sure if I understand the question, but there's some more context here: https://twitter.com/di_codes/status/1562160283178745858

A bit of context here...

The 'malware check' mentioned in the article isn't actually used on @pypi.

It was created as a proof of concept for a prototype detection system, and yeah, it's noisy (many false positives), but it also misses things & is hard to make better.

Why?

acoustic panther
#

There's some really interesting stuff in that thread.

steel trout
# acoustic panther What do you guys think about proactive detection of known malicious code in pypi...

Not exactly malicious per se, but a simple regex that flags some package names would fix a lot of the autogenerated clutter.

Like anything ending in _robux or whatever the Roblox currency is called. Same for the Fortnite currency and others.

The number of false positives would probably be very low, from a quick test of mine it would currently be zero. Even false positives in the range of 1-10 could easily be handled via support tickets.
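
As a rough illustration of that idea, the sketch below normalizes names the way PEP 503 specifies and then checks for a trailing keyword. The suffix list is purely hypothetical and the function names are made up. Nothing here reflects an actual PyPI check.

```python
import re

# Hypothetical list of suffixes seen in autogenerated spam names.
SPAM_SUFFIXES = ("robux", "vbucks")

_SPAM_PATTERN = re.compile(r"(?:^|-)(?:%s)$" % "|".join(SPAM_SUFFIXES))


def normalize(name):
    """Normalize a project name as specified by PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def looks_autogenerated(name):
    """Return True if the normalized name ends in one of the spam suffixes."""
    return _SPAM_PATTERN.search(normalize(name)) is not None
```

Because the suffix has to be a whole trailing component, a name like `robuxtools` would not be flagged under this sketch.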

spice hull
fleet grove
#

wew

spice hull
#

@rare wadi unclogged the pipe, the juice is flowing again.

obtuse torrent
#

Would there be any way to speed up the registration process for the trusted publisher beta for a maintainer (username: Kowlin) that has access to a project that already has it enabled (I enabled it on a few such projects since I already have access)? Trying to see if we can avoid a bus factor of 1 for the time being

spice hull
#

@merry valve I finally got around to reading that Twitter post you threw up. We have access to OSSF/Backstabbers/Chainguard malware collections and are working on consolidating information. Do you locally mirror any of the packages we report, or is that something we should be building up in our repository so we can expand the current malicious package datasets for research purposes? Our false positive rates aren't... nearly what Chainguard seems to be implying (though they're still quite high, roughly 15-20% if I had to guess.)
I have an interest in sharing what we're doing, because I believe it's wildly effective, but I have no context to what companies like Chainguard/Phylum/Snyk's report rates are with your org.

merry valve
# spice hull @merry valve I finally got around to reading that Twitter post you thre...

We keep essentially everything uploaded to PyPI. You'll note that inspector.pypi.io links from malware reports still work, that's because the underlying file is still available on files.pythonhosted.org. We don't do a great job of linking a malware report -> project -> files on PyPI currently so it's non trivial to come up with that "dataset" but it's not impossible and I do hope to do that eventually.

spice hull
#

Actually, that's helpful enough. We have the list of known-bad packages. So long as the items still exist, we can likely query them effectively I believe. That might be a priority in the coming days.

merry valve
spice hull
#

That might be something that we can effectively generate, but our false positive rate would have to go down substantially if we were using our locally provided index of malicious packages as opposed to any single field queried by the warehouse API itself.

I shall ponder.

spice hull
#

When a package is uploaded, even if it's the first 'instance' of that package appearing on PyPI, does that package still appear on the RSS feed for recently updated packages? pithink

obtuse torrent
#

It's appreciated :)

spice hull
#

Antimalware team "Try not to break the automated mailing system for one consecutive day" challenge. Can the team do it? Find out next time on Open Source Security Z.

#

(My heartfelt condolences to Dustin and crew for dealing with our scuffed systems.)

dire vessel
gentle frost
serene fern
#

pypi-data is maintained by Tom Forbes (GH orf), he’s a pretty active community member as well, but yeah it’s hard to beat Seth

narrow field
#

anyone aware of sensible ways to list only packages starting with a prefix instead of all of them? also do any of the pip data libs support caching by last serial?
(i want to reduce the size/number of http requests the pytest plugin list updater makes without having to invent anything)

narrow field
#

it seems tho, as if just using requests-cache already is enough to avoid any more issues for now

tribal sedge
spice hull
spice hull
tribal sedge
rare wadi
serene fern
spice hull
#

Now I wish I would've reported it...

rare wadi
tribal sedge
spice hull
rare wadi
cobalt bone
#

Is there anything I can do to escalate https://github.com/pypi/support/issues/2587 ? I'm at a decision point now where I'm considering withdrawing my support for the project and finding or forking an alternative. My preference would be to maintain the project.

spice hull
merry valve
#

Yeah, that's spam

spice hull
#

Shot an email out then, cheers.

gentle frost
merry valve
gentle frost
pliant obsidian
merry valve
spice hull
valid flame
#

I would be happy to try and deliver this feature to PyPI

narrow field
#

Is there any recommended way to give last serial ids as hints to http caches?

merry valve
narrow field
# merry valve What are you trying to do?

When using a http cache it seems I get stale cached entries for projects, I'd like to ensure the cache makes use of the last serial i obtain from the project list to enable early revalidate
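
A sketch of the comparison step only, under a loudly stated assumption: that the JSON project list carries a `_last-serial` value per project. That key is a PyPI-specific extension as far as this sketch is concerned and should be verified against a real response. The function name and the shape of the local cache are made up.

```python
def changed_projects(index, cached_serials, prefix=""):
    """Return names whose serial differs from the locally cached one.

    index: parsed JSON project list; each entry is assumed to look like
           {"name": "...", "_last-serial": 123} (assumption, verify first).
    cached_serials: dict mapping project name to the serial seen last time.
    """
    changed = []
    for project in index.get("projects", []):
        name = project["name"]
        if not name.startswith(prefix):
            continue
        serial = project.get("_last-serial")
        # Unknown serial or a mismatch both mean the cache entry is suspect.
        if serial is None or cached_serials.get(name) != serial:
            changed.append(name)
    return changed
```

Only the returned names would then need a forced refresh. Everything else could keep being served from the HTTP cache.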

#

But I can fall back to forced refresh

unreal jewel
# valid flame Hi. What is the maintainers take on https://github.com/pypi/warehouse/pull/9972?...

I think the answer is yes, it's open to be taken over, though it may be hard to do until we have a better thing for our uploads to do things async. I think there's also an open question of if that PEP still makes sense given we can put JSON on the simple API now, so is it better to invest in getting info into that instead? But I don't think either of those block that PEP, just some things to consider

silent grotto
#

Is it just me who cannot access pypi at the moment? My Browser keeps telling me: Access Denied

rare wadi
#

My phone just downloaded a 0kb file from trying to hit it.

pliant obsidian
silent grotto
#

Thanks for confirming 🙏

pliant obsidian
silent grotto
#

Ah, I was on this site before, but overlooked the banner and just saw this beautiful green saying "Operational".

pliant obsidian
rare wadi
#

Hi, is this the place for Inspector too? It’s giving us 502(s)

daring scaffold
#

I was super happy to read that organization accounts are now available for PyPI. I would like to start discussions at Canonical's sprint next week in Prague with upper management to introduce them for Canonical, but I cannot find any pricing information. Any help on that? cc @merry valve

merry valve
hybrid zenith
#

GHA getting drunk or some infra issues today?
Run python3 -m pip install --upgrade pip
ERROR: HTTP error 502 while getting https://files.pythonhosted.org/packages/29/eb/5a56994b37d9141a6c7fa6ddb5c76a50b234a424eae4e79c56f33b61c686/tox-4.5.1-py3-none-any.whl (from https://pypi.org/simple/tox/) (requires-python:>=3.7)
ERROR: Could not install requirement tox>=4.0.0 from

obtuse torrent
#

I had an issue with a cron run 5 hours ago but not 4 hours ago so it seemed intermittent

restive sapphire
acoustic panther
acoustic panther
#

Thanks! 🙂

spice hull
#

Neat! Did you add in the opcodes for 3.11? I use pycdc fairly regularly to... mixed success.

#

Oh never mind, saw the second screenshot.

#

Still extremely cool ❤️

acoustic panther
spice hull
#

Mmm not according to docs anyway. pithink

#

one of my backlogged projects is bringing 3.11 opcodes into pycdc so we can support new versions, frankly I'm not sure why that's popping up there.
Edit: That makes it sound like I'm the author of pycdc, I am not. I just need... 3.11 opcodes to work lol.

acoustic panther
#

haha. It looks like the disassembly was successful though.

spice hull
#

Yep! Definitely a massive change, would love to see it go through. This would alleviate a lot of work on our end from spooling up vm's every time we get a flag somewhere in .pyc's.

rare wadi
acoustic panther
merry valve
#

Sorry, codebase is a little messy because I just ripped out what I needed from pypi/warehouse

acoustic panther
#

Hahaha, no worries. Should I uncomment it?

rare wadi
acoustic panther
#

Oh, got it lmao. Sorry guys, I'm such a noob with Docker.

rare wadi
#

👍

acoustic panther
#

Okay, that should do it! 😅

acoustic panther
#

decompiler works nicely with this one!

acoustic panther
#

Okay. I think that's everything. PR is ready for review @merry valve whenever you get the time! 🙂

tranquil tulip
merry valve
acoustic panther
#

My pleasure!

obtuse torrent
#

Do you keep any logs of failed requests for short-lived API tokens (security log doesn't seem to suggest so)?
I'm wondering why this run has failed for us: https://github.com/Cog-Creators/Red-DiscordBot/actions/runs/4879413470/attempts/1 with Red-DiscordBot package on PyPI. I was forced to temporarily add a publisher with the same owner, repo, and workflow name but without an environment specified so that I can release. Looking at it post-mortem, I still see no reason why it was failing as the "Release to PyPI" job does run in an environment named "Release" and I in fact had to approve the deployment:
https://github.com/Cog-Creators/Red-DiscordBot/blob/1d654c2edcd1b10e7521a4edc995817b68978cf0/.github/workflows/publish_release.yml#L83-L106

chrome zealot
#

@obtuse torrent I'm trying to diagnose what I suspect is the same or a quite similar issue at the minute

#

I think it might be a PyPI bug? But it's hard to tell because I can't see any debug info

#

But I think what's happening is that environments are case insensitive (on GitHub) but either PyPI or gh-action-pypi-publish is not treating them as such

obtuse torrent
#

I have used the same capitalized environment name in other repos without issues

chrome zealot
#

Aha, ok, then I'm wrong clearly.

obtuse torrent
#

But yeah, I was thinking it is a PyPI bug too

#

Just not sure what kind of bug

#

It could technically be some sort of limitation on GH's side but idk

merry valve
obtuse torrent
merry valve
#

er, without the environment name, sorry

obtuse torrent
#

I only kept it while cutting the release

#

Since it's a security risk

merry valve
#

I think what's happening is that we're not normalizing the environment name everywhere, will look into this, thanks for the report!
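
To illustrate what "normalizing the environment name" could mean here, a tiny sketch of a case-insensitive comparison between the environment configured on a publisher and the one claimed in the OIDC token. This is not Warehouse's code. The function name and the rule that an empty configured environment accepts any claim are assumptions drawn from the discussion above.

```python
def environment_matches(configured, claimed):
    """Compare a configured environment with the one in the token claims.

    configured: environment set on the trusted publisher, "" or None if unset.
    claimed: environment from the token, or None if the job used none.
    """
    if not configured:
        # Assumption: a publisher without an environment accepts any job.
        return True
    if claimed is None:
        return False
    # GitHub environment names are treated as case-insensitive in this chat.
    return configured.lower() == claimed.lower()
```

Under this sketch a publisher configured with `Release` would accept a token claiming `release`, but not a job that ran without any environment.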

chrome zealot
#

@merry valve where would have been the right place to file this (since I came to the channel mostly to ask that initially before seeing @obtuse torrent having the same issue :D)

#

warehouse?

merry valve
obtuse torrent
#

The main configuration should only be the one with env name, I temporarily added a publisher without the environment while cutting two releases today

chrome zealot
obtuse torrent
#

I was surprised because we have already tested a similar configuration with a different package/repo and it worked fine but I guess normalization may be missing for some specific case

#

Now that I think of it, we tested the exact same configuration too on a different branch in the same repo 😄

merry valve
#

the logic here changed a little bit recently, I think this would have succeeded prior to a day or two ago.

obtuse torrent
#

Makes sense then

merry valve
#

welp, I have a fix but looks like GitHub is having an outage

chrome zealot
#

Sounds like a good fix, no one can use the feature now anyhow then, problem solved.

merry valve
#

Ok, should be fixed in a few minutes once the current deployment goes out. Thanks again for raising!

rare wadi
#

🎉

#

OH
BTW
@merry valve -- what's needed to complete support for OIDC trusted publishing through reusable workflows?
Is that something I could help along?

chrome zealot
#

Er, good job me.

merry valve
chrome zealot
#

(The fix seems to indeed have fixed me!)

valid flame
#

are you open to adding an option to return JSON for PyPI search? right now there is no viable option to search PyPI outside the web UI, and the HTML is quite hard to parse when one wants to search pypi from outside the web UI. simple API is not really a solution since the main page for PyPI is like 40MB and it would be quite heavy to both download in the background and parse

quaint ocean
# valid flame are you open to adding option to return JSON for PyPI search? right now there is...

From pip search:

ERROR: XMLRPC request failed [code: -32500]
RuntimeError: PyPI no longer supports 'pip search' (or XML-RPC search). Please use https://pypi.org/search (via a browser) instead. See https://warehouse.pypa.io/api-reference/xml-rpc.html#deprecated-methods for more information.

And in https://warehouse.pypa.io/api-reference/xml-rpc.html#deprecated-methods

search(spec[, operator])
Permanently deprecated and disabled due to excessive traffic driven by unidentified traffic, presumably automated. See historical incident.

steel trout
#

I am really curious what made that massive amount of requests, because the way this is written sounds like the requests were made from a single IP or at least from only a handful of IPs.

narrow field
#

I'm wondering, has the culprit been identified?

serene fern
#

How are orgs expected to work on Test PyPI? We don’t have a tab for signup on there.

steel trout
# narrow field Im wondering, has the culprit been identified?

We are working with the abuse contact at the owner of the IPs and trying to make contact with the maintainers of whatever tool is flooding us via other channels.

Due to the huge swath of IPs we were unable to make a more targeted block without risking more severe disruption, and were not able to receive a response from their abuse contact or direct outreach in an actionable time frame

The first sounds like at least the ip of the origin was identified, the second one sounds like it is a tool that is used by many ips.

With no evidence or knowledge, I would wager a guess that it is some tool that does something completely different but once used the xmlrpc search in the background. It probably doesn't use the result anymore, because the requests did not stop when the service was suspended.

Maybe doing a search for a known package just to check if pypi is down or not. Would be stupid but yeah.

This is me just armchairing though.

#

Worst case thought: malware that uses the existence of a package as a shutdown signal. That package was never installed so now all infected systems constantly call and never will stop

narrow field
steel trout
#

It has to be something that still works even when the search fails.

spice hull
#

hundreds of thousands of requests per hour has me skeptical that the intent behind something like that would be benign.

#

https://twitter.com/ESETresearch/status/1654127211287560194
Found this fairly interesting, ESET is tracking on the same stuff getting spammed up on PyPI that us and other orgs occupying a similar space are. This stuff is pretty loud when it flags on our yara rules, I do wonder if maybe we can't quietly hand over our yara crib that flags specifically on this to Dustin and crew for automatic removal. I cannot fathom a false positive that could be generated from packages like this.

steel trout
#

Ok I have to add to my curiosity:
What terms were queried in those mass requests?

rare wadi
serene fern
#

As with most things like this, compatibility with history. Not everything works, but I think there are a couple more archive formats supported. bz2 and xz or something like that.

#

But please stick with tar.gz if you’re making a new tool

pliant obsidian
#

iirc the sdist can be one of tar.gz or zip, but you can't upload both (any more)

rare wadi
spice hull
#

uhh

merry valve
#

Can you file an issue?

spice hull
#

Si!

#

Kinda' rough to write up, because I'm not sure of the inner workings as to why that's failing, so I beg your patience with my haphazard documentation.

acoustic panther
acoustic panther
spice hull
#

Mhm, I mentioned that in the issue.

acoustic panther
#

my guess is that this is happening because somehow, pypi is randomly not returning 404

acoustic panther
spice hull
#

It takes a little bit for PyPI to serve an actual page after a package is uploaded.
We receive notification often seconds after a malicious package is published.
This often means we fall in that window where PyPI isn't appropriately serving the webpage content yet, so we more or less always get the 'package removed' message. Waiting ~20 seconds or so and refreshing typically yields what we'd expect to see.
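
A small sketch of the "wait and retry" workaround described above. `check` is an assumed caller-supplied callable that returns True once the project page is being served properly. The attempt count and delay are arbitrary, and `sleep` is injectable so the helper can be exercised without actually waiting.

```python
import time


def wait_until_served(check, attempts=5, delay=5.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts are used up."""
    for attempt in range(attempts):
        if check():
            return True
        # No point in sleeping after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With `attempts=5` and `delay=5.0` this covers roughly the 20 second window mentioned above before giving up.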

acoustic panther
#

Ah, that makes sense

acoustic panther
#

Working on code analysis features for inspector...

spice hull
#

On the PyPI end, is there a way to check to see if an account has a workflow created to automatically upload to PyPI? We found the GitHub accounts responsible for uploading all this Kekwltd malware.
They advertise their malware with a bunch of random crap on GitHub, and in the GitHub repo, the malicious payload is installed.
They have ~1130 commits in like 15 days, so there's an automated workflow here, not sure what the internals on that might look like though.

#

(Regardless, to preempt the question, we've filed a report.)

spice hull
#

Party Mike has joined the party.

merry valve
spice hull
#

Yes. Sorry, that could've been worded better.

merry valve
#

Do you mean a way for you to check, or a way for PyPI admins to check?

#

IIUC, you're trying to find the GitHub repo that corresponds to a given PyPI project?

spice hull
#

The latter. I'm not entirely spun up on the Trusted Publisher feature, but 1130 commits in that time makes me think that there's some heavily automated process behind this; whether that automation extends to the trusted publisher feature or not is kind of what I'm curious about. I'm not looking to breach that level of trust with PyPI users, I'm more curious if that's something that you guys have looked into: how these guys are pipelining the spammed packages into PyPI, and specifically, if they're using Trusted Publishing to do so

#

I guess to bring that to a more clear point, it's clear they've automated the updating of their repositories to add in new malicious packages.
I wonder if they are using the trusted publisher to push packages to PyPI as well?

merry valve
#

In terms of automation, there isn't really much difference between a workflow that publishes via trusted publishing and a workflow that just uses username/password

#

I doubt they're using trusted publishing because it would be a little more work to set up on the PyPI side

#

But it would be good to highlight if it's set up during our takedown process, because it would tell us if there was an upstream repo somewhere

spice hull
#

Actually while I've got you here, entirely tangential question. I've been following on https://github.com/pypi/warehouse/issues/12612 and doing some brainstorming.
I'm acutely aware of the volume of reports you likely receive from orgs that are doing similar monitoring of the package index. Would there be an impending need for a third party solution to this issue, or are you fairly close to a solution internally?

#

We are in the process of establishing an API endpoint that would allow third party, authorized individuals to direct reports through and, if the package with the version associated has been seen before, individuals using that endpoint would be appropriately notified that that package had been reported previously.

merry valve
#

We haven't announced it yet but we've gotten funding to essentially implement that issue (and some related things around malware). So yes, we'll be implementing this ourselves.

spice hull
#

Good news! Hope the development process goes smoothly!

spice hull
#

As I'm... sure you're no doubt aware from the volume of reports, I suspect they (the aforementioned malware authors) have now entirely automated the upload process, including checking whether the package has been yanked.

latent pier
#

Is there anywhere I can point people to with more information/context about the incident right now with new account and package registrations being disabled, besides just the summary in the incident report? I have a number of people asking.

merry valve
latent pier
# merry valve Nope. What more information/context do they need?

They were mostly asking about the cause of the latest major uptick in malicious activity. From the limited comments here, I surmised it likely relates to malicious package authors increasingly automating the process to increase volume and bypass or reduce the impact of previous mitigations by the PyPI team, but I wasn't sure since I couldn't find much info elsewhere. I didn't mean to bother you at a time when your bandwidth is most critical, sorry; I just figured someone might have additional insight to share or somewhere I could redirect people.

merry valve
#

Not sure about the cause of the uptick but the cause of the pause on project/account creation is just more malware being uploaded and less time for us to deal with it.

#

If there's any specific answers they're looking for, let me know and we can get it added to the incident.

latent pier
merry valve
steel trout
#

This possibly runs close to some EU stuff, à la what Meta just received.

Not saying it does, because you know I am not a lawyer and I don't even play one on TV.

spice hull
latent pier
#

One thing I was curious about is why the users in question haven't been given any notice of the subpoena. Was this a condition of the court order, an individual decision for this case on PyPI's part, or PyPI's standard policy?

steel trout
#

Is the storage in the USA?

I read it fully and it seems like actual personal identifying information was given to a US government entity.

Would this also happen for EU citizens? Is there a way for me as a user to see my data, or request deletion, easily?

I am sorry for all those loaded questions, but sharing PII with US agencies is really iffy, for lack of a better term, to me.

(Is this also the real reason for the account/project creation halt? To not accept more until you have the new system running?)

unreal jewel
#

> Is the storage in the USA?

The bulk of PyPI is hosted in the US, I think we have a backup server in... Ireland? England? Something. And of course the Fastly CDN has POPs all over.

> I read it fully and it seems like actual personal identifying information was given to a US government entity.

From the post:

  • Internal Databases IDs (all UUIDs IIRC)
  • Username (public on PyPI)
  • Display Name (public on PyPI, able to be changed at will)
  • Email addresses (not public, required on PyPI to contact the user)
  • Journals table records (all typically public except the username and IP address of the user that made the change that caused the record to be made)
  • "User Events", you can see these on https://pypi.org/manage/account/ if you scroll down to "Security History", I believe that shows all of the information that can be given in that table.
  • Date Joined and Date Last Login - Date Joined is available publicly (if it exists, sufficiently old accounts don't have a date joined), date of last login is not publicly available (not sure we even make it privately available... but you can figure it out from the security events).
  • Download logs of "who downloaded X package"-- they asked for IP addresses but we don't store that information, in fact the only download logs we store are in the publicly available BigQuery tables that anyone can query, which were carefully designed not to hold PII.

So from that, the data that isn't publicly available on PyPI:

  • various internal database IDs, which are all just random UUIDs (some might still be integers, but I think we got rid of all of those).
  • email addresses
  • which specific actions recorded in the journals table were made by you (this lists things like "package X uploaded", "file Y deleted", "package X created", stuff like that).
  • When your last login was
  • Various events in the security history on your account page.
  • IP addresses in the journals table and the security history.

Going to ignore database IDs because I don't think anyone could call them PII.

  • We can't really get rid of emails, we need them to communicate with our users, deal with password resets, etc.
  • We can't really get rid of the username in the journals table, that is important information for remediating compromises or even just confusion about who caused something to happen (why did a file get deleted, who uploaded this, etc).
  • Last Login, I don't really think this is PII but we can't get rid of it because we use it for security reasons (things like password reset emails get invalidated if you login with your password)
  • Security History (sans IP Addresses): This probably has the most PII in it. Mine is kind of janky right now because of a "bug" that has some extra admin stuff getting logged in it, but it looks like for me the data is pretty minimal: name of my WebAuthn key, IP address, what 2FA method I used to log in, what orgs I was invited to, what emails were sent to me (no bodies, etc.)
  • IP Addresses, definitely PII.

We're explicitly looking at all of those things that we can get rid of to either completely eliminate them or add time gated retention policies where we can.

#

> Would this also happen for EU citizens? Is there a way for me as a user to see my data, or request deletion, easily?

I think I mentioned what data exists where, we don't have any specific tooling to get a dump of your data or anything like that, just what was posted above.

As far as what would happen with a EU citizen, I have no idea. I would suggest emailing legal@python.org so the lawyer types can respond, but I would guess the answer is "it depends", but that's a complete guess.

spice hull
#

Third party content providers represent and warrant that they have obtained the proper governmental authorizations for the export and reexport of any software or other content contributed to this web site by the third-party content provider, and further affirm that any United States-sourced cryptographic software is not intended for use by a foreign government end-user. Individuals and organizations are advised that the PyPI website is hosted in the US, with content delivery network points of presence as well as unofficial mirrors in several countries outside the US. Any uploads of packages must comply with United States export controls under the Export Administration Regulations.

unreal jewel
#

> Is there a way for me as a user to request deletion, easily?

You can delete your account on PyPI at anytime, as long as you don't have any projects where you are the sole owner (because we don't allow a project to get abandoned with no owner). I believe (and I've double checked by quickly looking at the code, but I haven't actually tested it) that will remove everything about your user from the database except the entries in the journals table, where the username gets set to "deleted-user" but the IP address is left as is (something I'll make sure to note in our review of the data we keep).

Any package files that you have uploaded never get deleted from storage (even if you delete them from PyPI) and remain available directly from their URL. These have long, hard-to-guess URLs, so it's unlikely someone would find one without already knowing the URL, and of course package files are public on PyPI, so anyone could have downloaded them previously.

#

> Is this also the real reason for the account/project creation halt? To not accept more until you have the new system running?

No, we get a ton of malware/spam reports most every day and it's pretty much all volunteer run to manage them. Almost everyone was away for the weekend and there was a big uptick, with nobody to respond to them.

#

I think I answered all of that, sorry for the lengthy responses and slow rolling them. I was trying to verify my answers with the behavior of the code by popping it open in GitHub and double checking my memory.

PyPI is completely open source, and it's pretty easy to get it running locally as well. If you want to verify what I've said or look for yourself, you can read the code at https://github.com/pypi/warehouse, or run it locally, try out different things, and inspect the database to see what data is there.

steel trout
#

Thank you very much for the detailed answer, no need to apologize for the lengthy responses.

Just to make it clear, my questions were not meant as an accusation, but as something like "watch out for that" (it was written aggressively, though; I blame my lack of sleep). I like PyPI, I like using it, and I have no intention of getting my data deleted or anything. I just don't want it to get hit by some fine or the like.

My question in brackets was basically just a stupid random thought that I really should have left out.

Again I want to thank you for your detailed answer and especially for checking the actual results in the db!

Extremely sorry if I caused some stress, it was really not my intention!

unreal jewel
#

Nah you're fine 🙂

pliant obsidian
# steel trout Is the storage in the USA? I read it fully and it seems like actual personal id...

More info about the weekend's halt:

The problem at PyPI was not so much a surge of fake accounts and subverted packages, though the tide of dubious stuff did rise from the typical rate of about 20-30 reports per day to about 40 per day over the weekend. Rather, the staff who usually vet suspect submissions had ebbed to a single person who felt unable to adequately respond.

https://www.theregister.com/2023/05/22/python_package_index_on_call/

steel trout
#

Thank you for that info, and sorry for actually writing that tinfoil hat thought out. I would delete it from the message, but then all the answers would not make sense.

Sorry again, was not my brightest moment.

buoyant anchor
#

It really does sound like PyPI needs a modern retention policy, keeping user IPs around indefinitely is....urgh

spice hull
# buoyant anchor It really does sound like PyPI needs a modern retention policy, keeping user IPs...

We've filed an NCMEC and an IC3 report as a direct consequence of a particular portion of malware being distributed on the Python Package Index. While I understand the desire to respect user privacy, the information being sought in these subpoenas probably tracks with similar motivations. It's certainly a fine line to walk, but I'd also... like to see these individuals face some sort of consequences for their actions. There's a balance to strike between an environment where your platform is open and secure and one where your platform is the wild west. Without a more substantial security infrastructure in place, the ever-present reality that malware publishing was quite literally productionized and automated (to a volume that you'd find staggering) is a stark reminder that there are individuals out there actively looking to exploit the platform for financial (or more sinister) motivations.

We've found CobaltStrike Beacon droppers, Meterpreter payloads, etc. These are fairly advanced attacks beyond just 'Steals your Discord Token', often under the guise of perfectly legitimate packages. In the absence of first-party infrastructure that supports automated detection and identification of packages like these, I can see a world which necessitates the retention of that information to ensure broader safe usage of the Python Package Index.

spice hull
# pliant obsidian More info about the weekend's halt: > The problem at PyPI was not so much a sur...

Also for what it's worth, the account perpetrating the sustained attack over the weekend referenced in this blog was actioned by GitHub. My team (and at least one other supply chain security org) are monitoring for any additional pop-ups that happen, hopefully to nip it in the bud sooner rather than later this time. The actual account was known at the very beginning of the month, myself and several other individuals were waiting on a response to our GitHub reports. The account was actioned on the 23rd of May. Several associated accounts were actioned at the same time, in an effort to prevent simply switching to another account.

unreal jewel
#

My personal opinion is we should (and can) hold onto some of this data for a shorter period of time than "forever". We will have to figure out what that looks like over time, it won't be a fast fix.

#

Like the IP address that someone used to upload a package in 2005 is probably not super useful or relevant in 2023

spice hull
#

I'll concur, though where one should draw that line is probably... fairly contested. In the threat intelligence community, we can trace some of these threat actors back for over a year (or two). While this information is being continually funneled upward to law enforcement in an effort to prevent some of the more persistent and advanced attacks, I'm personally unsure of what a 'reasonable date' might be.
I can see a world where data retention policies are revised after more infrastructure is devoted to detecting and mitigating threats and malicious behavior on a first-party basis. As it stands right now, I would imagine the security@pypi.org endpoint (especially during the timeframe that PyPI knocked over registration of new packages and users) is likely receiving ~150+ emails per day. That isn't to say that all of those are different packages, but there's a spectrum of monitoring that goes on between numerous organizations, and I'm absolutely dead confident we double-triple tap a lot on some of these packages between orgs.

unreal jewel
#

I think, but don't quote me, that the security engineer role is going to have the task of making better report infra

#

that doesn't involve emailing 4 specific people via a mail alias 😄

#

(though presumably it will still end up ultimately emailing 4 specific people 😢 )

spice hull
#

Yeah, we were working on a homebrewed solution that would bounce incoming reports off our database to see whether the package had already been reported, and just transparently 'flip' the status of the package to reported if it hadn't. But it sounds like they're going to knock that out pretty soon. Which should help, even if it's just removing duplicate reports.
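
A minimal sketch of that kind of report de-duplication, assuming reports are keyed by package name and version. `ReportRegistry` and `submit` are hypothetical names for illustration; they do not describe the actual tooling or any PyPI endpoint. The only borrowed fact is the PEP 503 name normalization (lowercase, runs of `-`, `_`, `.` collapsed to `-`).

```python
import re
from dataclasses import dataclass, field


def normalize(name):
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass
class ReportRegistry:
    """In-memory stand-in for a report database (hypothetical)."""

    reported: dict = field(default_factory=dict)

    def submit(self, package, version, reporter):
        """Record a report; return True if it is new, False if a duplicate."""
        key = (normalize(package), version)
        if key in self.reported:
            # Already reported: remember who else saw it, but do not re-file.
            self.reported[key].append(reporter)
            return False
        # First sighting: "flip" the status to reported.
        self.reported[key] = [reporter]
        return True
```

Only the first `submit` call for a given package and version would need to generate an outgoing report; later calls just tell the reporter it was seen before.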

#

We should be open sourcing soon (sans our rules themselves), which will make our dockerized package scanning and IOC accrual fairly accessible to anyone that needs some solution. Not really sure of the use cases for our little setup aside from... scanning PyPI itself, but I've learned that open source tends to just take projects and run with them if they're useful.

latent pier
#

Hey @merry valve, @unreal jewel et al, I just wanted to introduce my colleague and good friend @solid fable to you. Juanita is an open source scientific Python developer and community manager who previously worked with us at Spyder for several years, and is now at Scientific Python and pyOpenSci, among others.

She's currently getting her Ph.D in cybersecurity at the University of California Santa Cruz, and for her research she's really interested in working on a project related to improving the security of PyPI and the Python packaging ecosystem. She wanted to reach out to you folks to determine which ideas would be most valuable and whether there's potential to collaborate. I'll let her take it away from here!

solid fable
#

Hi everyone! Thank you for the warm introduction @latent pier. It's a pleasure to join this community! ✨ I'm Juanita Gomez, Computer Science PhD student at UCSC. As CAM mentioned, I worked previously as a Spyder developer and currently, I'm involved with Scientific Python and pyOpenSci.

I'm reaching out because I genuinely want to focus my Ph.D. research on something that will have a positive impact on the open-source Python community. Given my research focus in security and my involvement with the community, I believe that improving the security of PyPI and the Python packaging ecosystem presents a compelling opportunity.

I would love to collaborate with you on a project focused on enhancing the security of PyPI. However, I believe it would be beneficial to discuss and identify the most valuable ideas for such an endeavor. I'm open to suggestions and eager to learn about any ongoing initiatives or challenges you may have encountered in this domain.

merry valve
solid fable
#

Thanks for the info @merry valve. I'm pretty open right now regarding the scope. I just finished the class requirements for my PhD, so I will start my official research next fall and will probably be doing it for 3 years. For now I'm working on a survey paper to understand the literature on this topic, but I will have to present my official advancement proposal in a year.

proud bison
#

hi, we (the scverse people) are wondering what this request is blocked on. We'd like to play around with that org

tribal sedge
proud bison
#

I see, so it’s less ready than we thought, and we just have to be patient. Thanks for your work and the info! I’ll pass it on.

spice hull
spice hull
#

Gee whiz question for the class: if we're finding exposed credentials/secrets in PyPI packages that seem or purport to be legitimate business use cases, should we be... contacting these organizations directly to let them know they have exposed secrets, or should that be routed through PyPI so they can evaluate it, be made aware of it, and potentially action it in the event that the individual doesn't/won't respond to a random e-mail?

unreal jewel
#

I would do both

spice hull
#

Fair point, hooray emails.

pliant obsidian
#

Interesting, GitHub planned in 2020:

GitHub Packages users will have access to a public and private PyPI package management server for distributing Python packages both publicly and privately within their organization.

But now:

This is no longer planned due to a change in our strategic priorities and the allocation of our resources towards higher-priority initiatives.

https://github.com/github/roadmap/issues/94

obtuse torrent
#

They abandoned their plans for all languages, not just Python

rare wadi
#

Anything more specific than "we've reevaluated our priorities"?

obtuse torrent
#

I doubt it

#

I assume there just weren't that many corporate clients that wanted it

#

Maybe because they already have their own on-premise solutions

rare wadi
#

Yeah
All these registries already offer their own self-hosted solutions

ivory python
unreal jewel
#

yea moment, about to post a response

#

just woke up

ivory python
#

gotcha

#

thanks 🙂

#

I take it you're consolidating in a pypi org?

unreal jewel
#

Yea, the owner transferred it to us, and I just aligned it with our other permissions without really thinking about it. It hadn't been maintained in ~3y and was archived on GitHub.

ivory python
#

gotcha makes sense

unreal jewel
valid flame
#

It always amuses me that people's first choice is to make a drama on social media. Why not just reach out directly to the PyPI team to discuss...

swift axle
#

The owner of the repo also commented on an issue saying this was happening. The tweet author even commented on that 4 minutes before the tweet above https://github.com/pypi/stdlib-list/issues/55#issuecomment-1584321437

unreal jewel
#

Will wasn't the owner, he is a PyPI contributor who was going to help get the repo setup

swift axle
#

Ah apologies. Still seems a little disingenuous to go to twitter and try to stir an outrage.

ivory python
#

knowing the author, they are one of the conda-forge core people, they are just annoyed about losing permissions to a project that they contributed to as a volunteer. Easy to fix though, in hindsight. Thanks for the response, @unreal jewel

unreal jewel
#

I can understand it, it probably doesn't feel great to have a permission bit removed with nothing but an automated email saying that it was done. Totally just an oversight when I was just blindly copying settings between two pages. I habitually remove even myself from org owned projects in favor of teams, etc.

ivory python
#

yeah, totally, shit happens

#

wondering if this has happened to them before? I know people are more sensitive to stuff like this if they experienced it before

unreal jewel
latent pier
#

I can certainly understand why they felt the way they did given the initial circumstances, even if they weren't actively maintaining the project. On the other hand, the efforts to apologize and offering to make things right were, IMO, supremely well handled by @hazy wagon along with @unreal jewel and @finite pulsar — thanks for that! It's a little disappointing that both seemed to have a bit of a bad taste left in their mouths, but at least things were calm and were civil.

finite pulsar
#

yeah, i'm very sympathetic to their position -- it looks like they put a lot of effort into PRs that had piled up due to the primary maintainer's inactivity, so suddenly being removed without any clear explanation probably really shocked them

proud bison
#

On the technical side: maybe it’s possible to show in the UI what changes in effective permissions a change in configuration would cause.

Then one can reach out to people manually before hitting the button, like “there’ll be an automated e-mail, don’t worry about it, we can add you back if you’re interested”

boreal dagger
#

I would like to reuse a name for a package on PyPI for a different project.
(The existing one has been basically abandoned for years.)
I have permission of the original author who also made me an admin of the project.

Is there some "best way" to indicate that the project has been changed?
I would like to "delete the old releases". That is: if a user comes to the project site, I wouldn't want the old releases to be shown as older releases of my project. But I don't mind having them archived properly.

There is an article about requesting name transfer, but this has already been agreed with the owner and I have full rights already, so this doesn't apply:
https://peps.python.org/pep-0541/#how-to-request-a-name-transfer

#

And the second question: does pypi allow binary-only packages? We have something that I would call alpha stage. The SDK (written in C++) itself still needs to walk quite some path to become really stable/fully usable, and the python bindings themselves (written in pybind11) are even a tiny bit more experimental. We would like to make the python package easy to install, but we would like to wait a few more months before publishing the sources.

meager umbra
#

PyPI certainly has binary-only packages (for example, TensorFlow).

unreal jewel
#

PyPI has no requirement to upload sdists

#

Re: project name take overs, there's no specific rules or best practices. Note that you cannot re-use filenames on PyPI, so if it's a project named foo, that had uploaded a foo-1.0.tar.gz previously, you won't be able to upload your own foo-1.0.tar.gz even if you delete the first one

#

So I would recommend starting your versioning at a higher version than the previous project had previously used

meager umbra
#

You could use an epoch number to separate your project from the old one.
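
As a rough illustration of why an epoch separates the two version lines: under PEP 440, a version with an explicit epoch (`N!`) sorts after every version with a lower epoch, and a missing epoch counts as 0. The key function below is a simplified sketch that only handles plain numeric releases such as `1!0.1`; it is not the full PEP 440 ordering, and the version strings are made up.

```python
def simple_key(version):
    """Sort key for plain 'N!X.Y.Z' versions; a missing epoch counts as 0."""
    epoch, sep, release = version.rpartition("!")
    epoch_number = int(epoch) if sep else 0
    return (epoch_number, tuple(int(part) for part in release.split(".")))


# Hypothetical releases of the old project, plus a first release of the new one.
old_releases = ["1.0", "2.5", "10.0"]
new_release = "1!0.1"

# The epoch is compared first, so "1!0.1" lands after every old release.
ordered = sorted(old_releases + [new_release], key=simple_key)
```

For real comparisons, a full PEP 440 implementation is the safer choice, since pre-releases, post-releases and local versions all affect ordering.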

boreal dagger
#

That's already the case (with version numbers), so this is not an issue.

boreal dagger
serene fern
#

I think you’re basically set then. Maybe also yank releases from the old project. https://pypi.org/help/#yanked

unreal jewel
#

I wonder if anyone has ever actually used the epoch

serene fern
#

Not PyPI but at work we use epochs in our private repository

unreal jewel
#

apparently 306 versions on PyPI use epoch

#

neat

pliant obsidian
#

yanking is a good idea. do yanked releases still show in the UI?

unreal jewel
#

they do

#

with a marker

#

pip has had a yanked release if you look at their history

meager umbra
pliant obsidian
unreal jewel
meager umbra
#

Wow, not a lot.

unreal jewel
#

I was too lazy to narrow it down by project on my phone

#

it doesn't surprise me many people don't use them, it's kind of a self fulfilling prophecy in a way, hardly anyone uses them, so most people don't know they exist or are weirded out by them, so then hardly anyone uses them

#

plus a non zero number of things in the world will get really confused by them, since a number of systems (not python ecosystem stuff generally at least) just blanket assume semver

wet maple
#

I'm looking at using the PEP 691 simple API from PyPI. It seems like PyPI returns an extra "versions" key in the response data, with a list of versions. That's super helpful, but it's not actually specified in PEP 691. Should I not rely on that being there, and process the filenames in "files" to get the versions instead?

unreal jewel
#

there's another PEP that adds that

#

700? 701? something in the 7xx series
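
For reference, the "versions" key is specified by PEP 700. A minimal sketch of reading it, with a fallback to the filenames for indexes that only implement PEP 691, might look like the following. The function names are made up for illustration, and the filename parsing is deliberately simplified (it assumes well-formed wheel and sdist names), so treat it as an approximation rather than a full parser.

```python
import json
import urllib.request

SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"


def fetch_project(name, index="https://pypi.org/simple"):
    """Fetch the PEP 691 JSON document for one project."""
    request = urllib.request.Request(f"{index}/{name}/", headers={"Accept": SIMPLE_JSON})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def version_from_filename(filename):
    """Rough version guess from a wheel or sdist filename (simplified)."""
    if filename.endswith(".whl"):
        # Wheel names look like name-version(-build)-python-abi-platform.whl
        return filename.split("-")[1]
    for ext in (".tar.gz", ".zip"):
        if filename.endswith(ext):
            # sdist names look like name-version.ext
            parts = filename[: -len(ext)].rsplit("-", 1)
            return parts[1] if len(parts) == 2 else None
    return None


def project_versions(data):
    """Prefer the PEP 700 "versions" key; otherwise derive versions from "files"."""
    if "versions" in data:
        return list(data["versions"])
    found = []
    for entry in data.get("files", []):
        version = version_from_filename(entry["filename"])
        if version is not None and version not in found:
            found.append(version)
    return found
```

Calling `project_versions(fetch_project("pip"))` would then work against either kind of index, at the cost of the fallback being less exact.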

wet maple
#

Thanks! Is there a place I should look for documentation about the current API in total? I had a hard time finding that, and sort of just searched through the Warehouse code to find stuff.

#

I'm using this in a bit of JS in docs to check if they are for the latest release and show a banner if not.

unreal jewel
#

Well

#

Technically it’s supposed to be the specifications section of packaging.p.o but we’ve never copied the simple api to it

spice hull
#

I would imagine that slew of packages that just went up is probably a strong indicator that... there's another automated upload pipeline for malware distribution being tested. =/

surreal niche
#

Does anybody have an idea how I should go about requesting permission to produce a look-alike PyPI.org interface for an open-source project? The aim is that the design should feel familiar, but not try to pretend it is pypi.org (e.g. causing people to enter passwords etc.).

spice hull
# surreal niche Does anybody have an idea how I should go about requesting the permission to pro...

PyPI appears to use an Apache 2 license, so you should be free to modify and distribute it for private or commercial use, provided you don't hold PyPI liable, don't use its trademarks, and include the original license, copyright notice, and a statement of changes.
https://github.com/pypi/warehouse/blob/main/LICENSE

surreal niche
serene fern
#

I think that’s mostly covered by the PSF

spice hull
#

I know that the PyPI administrators lurk here reasonably often. But derivative use of logos with modification might require a level of discussion above what PyPI itself is responsible for.
https://www.python.org/psf/trademarks/
I'm not certain that PSF holds the specific trademark against PyPI (I would suspect they do, however.)
The psf-trademarks@python.org email address from the above URL seems to be the place you want to go though.

#

Also entertaining embed.

serene fern
#

> I'm not certain that PSF holds the specific trademark against PyPI
They do https://pypi.org/trademarks/

spice hull
#

Aha! Cheers TP! Then it seems my advice holds true. The PSF trademark site will be your go-to! Good luck pelson!

latent pier
valid flame
merry valve
#

It's not possible.

unreal jewel
#

Yea that’s not a thing that can work

valid flame
#

ok, thanks for swift response 😄

winter lion
#

Apologies in advance if this is not the right place to ask this question; I will remove this comment if it isn't, but I haven't found where else to ask. We (dClimate) have had an organization request pending on PyPI since May 23 and we were wondering what, if anything, can be done to expedite this process. We have quite a few open source packages waiting in the wings that we want to publish, but we would want to do so under our org 🙏 If we need to provide any information to verify our existence/relation to the organization we are more than happy to do so.

valid flame
#

Is there a plan to remove JSON API at some point and only leave Simple API or is it just discouraged to use it?

unreal jewel
#

no actual plans

#

it's just an API that isn't well defined and is harder to scale

valid flame
#

ok

merry valve
valid flame
#

I am working on reducing usage of JSON API in Poetry and it's leaking deeper than I expected lol

unreal jewel
#

yea I don't think there's any risk we're going to delete the json api in the next several years

merry valve
#

(specifically, the existing JSON API is not standardized so it's generally not recommended to integrate against, but it's not going anywhere anytime soon)

unreal jewel
#

but random keys may or may not get deleted if they become a problem

valid flame
#

well, our case is that we determine metadata per release based on JSON API (so basically the first wheel uploaded to PyPI) and switching to Simple API would require us to either analyse all artifacts for metadata and compare it or choose one artifact (kinda at random or some kind of sort by upload time) and base the metadata on that. the second option is kinda what PyPI is offering via JSON API anyway...

winter lion
unreal jewel
#

was it a community org or a corporate org

winter lion
unreal jewel
#

I think corporate orgs are currently all pending as we set up mechanisms to onboard them and get them onto a paid plan, and community orgs are being processed (not sure offhand if there is a backlog of them or not)

#

I was only tangentially paying attention to the plans around handling corporate vs community orgs, so I have no idea what the right classification for such an org would be

#

I believe the pending org request should say what kind of org it was

winter lion
#

Fair enough, appreciate that insight (maybe this can be a banner on the website? From a quick search on Twitter, many people have the same question, on top of friends of mine asking the same). Any idea when those mechanisms would be set up and/or what the backlog looks like for processing community orgs?

#

Not sure I see any information here re: the category

unreal jewel
#

looks like it was community

winter lion
#

We have a few libs we want to publish but they are dependencies to other packages which we would also like to publish & pin those dependencies so we are sort of blocked for doing the ones downstream until the ones upstream are published. Right now everything is just pointing to github which isn't the best (I hope that makes sense 😅 )

unreal jewel
#

and we have uh, like 500+ pending community requests it looks like. I'm not sure what stage going through them is currently in, just that there are some that have been approved

#

so that doesn't really help you answer your real question

winter lion
#

That's okay I guess. If anything can be done to expedite any of these, for example if communities need to provide additional details for verification or something else that can make the job easier, please let us know 🙏 It might be worthwhile having some of this information in a banner on the PyPI site under that organizations section, as it seems somewhat opaque. Really appreciate the information

unreal jewel
winter lion
#

Thanks so much @unreal jewel !

proud bison
# valid flame well, our case is that we determine metadata per release based on JSON API (so b...

@valid flame There is no metadata per release, metadata is per artifact: https://pypackaging-native.github.io/key-issues/pypi_metadata_handling/#metadata-contained-within-artifacts

As mentioned in that link, PyPI will soon have an API to get that: https://github.com/pypi/warehouse/issues/8254

GitHub

Currently a number of projects are trying to work around the fact that in order to resolve dependencies in Python you have to download the entire wheel in order to read the metadata. I am aware of ...

serene fern
#

PyPI already supports PEPs 658 and 714, but artifacts uploaded prior to the rollout are not covered.
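
A minimal sketch of how a client might locate the standalone metadata file that PEPs 658 and 714 describe, assuming it already has one entry from the "files" list of a JSON Simple API response. The function name and the sample entries (including the `example.invalid` URLs) are hypothetical and only illustrate the shape of the data.

```python
from typing import Optional


def metadata_url(file_entry: dict) -> Optional[str]:
    """Return the URL of the standalone metadata file, or None if not offered."""
    # PEP 714 names the key "core-metadata"; PEP 658 originally used
    # "dist-info-metadata". The value is a boolean or a dict of hashes.
    marker = file_entry.get("core-metadata", file_entry.get("dist-info-metadata"))
    if not marker:
        return None
    # PEP 658: the metadata is served at the artifact URL plus ".metadata".
    return file_entry["url"] + ".metadata"


# Hypothetical entries shaped like the "files" list of a JSON Simple API response.
files = [
    {
        "filename": "example-1.0-py3-none-any.whl",
        "url": "https://files.example.invalid/example-1.0-py3-none-any.whl",
        "core-metadata": {"sha256": "abc123"},
    },
    {
        # An artifact uploaded before the rollout: no metadata file offered.
        "filename": "example-0.9-py3-none-any.whl",
        "url": "https://files.example.invalid/example-0.9-py3-none-any.whl",
        "core-metadata": False,
    },
]
available = {f["filename"]: metadata_url(f) for f in files}
```

For artifacts where this returns None, a client would still have to download the artifact itself to read its metadata.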

wet flax
wet flax
valid flame
wet flax
valid flame
#

well, kinda. Poetry model was built around the JSON API (at least from what I dug up)

#

so the change of what is provided is kinda groundshaking

wet flax
#

nods.

#

100% understand why that would be a "one day, maybe" sort of thing then. :)

valid flame
#

well, I guess once the lockfile PEP gets accepted, we will have a lot on our plate... for now I am poking Sebastien to work on PEP 621 migration 😛

wet flax
valid flame
wet flax
#

As @gentle yacht said, you've got the problem of actually having users. :P

valid flame
#

exactly 😄

gentle yacht
acoustic panther
#

Working on an "analysis summary" box thingy

acoustic panther
#

@merry valve ^ Whenever you get the chance 🙂

oak pebble
merry valve
#

Sorry for the delay

oak pebble
acoustic panther
#

@merry valve I corrected the formatting issue. Apologies for mistakenly removing the global search bar. 😅

merry valve
oak pebble
obtuse torrent
#

quick q

#
ERROR    HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/        
         Wheel 'Red_DiscordBot-3.5.3-py3-none-any.whl' does not contain the     
         required METADATA file: red_discordbot-3.5.3.dist-info/METADATA        

Should PyPI be lower-casing the name here? it seems incompatible with what I get when building with setuptools rn and the normalization change that I recall was supposed to have some deprecation period
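
For context, a rough sketch of the name normalization the wheel (binary distribution format) spec describes, and of a lookup that tolerates a `.dist-info` directory written with the un-normalized name. This is an illustration only, not Warehouse's actual check; the helper names are hypothetical and version normalization is ignored.

```python
import re


def normalized_dist_info_dir(name, version):
    """Directory name with the distribution name normalized per the wheel spec."""
    # Runs of "-", "_" and "." become a single "_", and the name is lower-cased.
    normalized = re.sub(r"[-_.]+", "_", name).lower()
    return f"{normalized}-{version}.dist-info"


def locate_metadata(members, name, version):
    """Return (path, exact_match) for the METADATA member, or (None, False)."""
    expected = f"{normalized_dist_info_dir(name, version)}/METADATA"
    if expected in members:
        return expected, True
    # Fall back to comparing each candidate directory after normalizing it.
    for member in members:
        directory, _, filename = member.partition("/")
        if filename == "METADATA" and directory.endswith(".dist-info"):
            dist, _, ver = directory[: -len(".dist-info")].rpartition("-")
            if f"{normalized_dist_info_dir(dist, ver)}/METADATA" == expected:
                return member, False
    return None, False


# Member names as they might appear in the wheel from the error message.
members = ["Red_DiscordBot-3.5.3.dist-info/METADATA", "redbot/__init__.py"]
path, exact = locate_metadata(members, "Red-DiscordBot", "3.5.3")
```

A strict check corresponds to accepting only `exact` matches; a lenient one accepts any non-None `path`.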

merry valve
#

working on it now

obtuse torrent
#

ah, sorry

merry valve
#

@obtuse torrent no worries! fix should be out in 10-15m

obtuse torrent
#

yup, saw it

#

thank you!

obtuse torrent
#

release went through, thanks again

oak pebble
#

Hey all. Is there a review process specifically for adding a package to the warehouse codebase? I could use a graphql package for the OIDC work I'm doing, but I believe I could also very easily craft the query and send it with requests as a POST to our API. This could be a two-part question, I suppose:

  1. Is there a previous desire/need to have a graphql package in the warehouse codebase that has been waiting for more use cases?
  2. Is there a strict security policy to follow to include a new package in the code base?
tribal sedge
pliant obsidian
#

for 2: as a PyPI dependency, it would also be designated a "critical project", requiring 2FA for the maintainers

#

but 2FA will be required for all projects by the end of 2023, so we're all enabling 2FA already, right? 🙂

oak pebble
#

@tribal sedge , thanks for that. I've written the code with requests so far and I suspect that will be just fine.
For point 2, I'm curious, are there any open source licenses that are not allowed in the PyPI warehouse code base?
@pliant obsidian thanks for that additional note. Is "critical project" a term in the PyPI docs that i can read more about?

pliant obsidian
oak pebble
#

Thank you!

pliant obsidian
oak pebble
#

That's really cool. Has the community been onboard with the idea? And does anyone know if other language ecosystems require this yet or plan to?

pliant obsidian
#

some folk were less than pleased when it was first announced for critical projects, but I think there was some confusion to do with the giveaway, and things seem better now.

also GitHub will be requiring 2FA by the end of the year. and npm for at least the top 500

oak pebble
#

Understood and sounds about right. But great that PyPI is doing this.

oak pebble
#

Hey all, question for whomever. Please, only off the top of your head answer though. Don't dig, I'll dig to find the answer eventually if needed.

I'm currently working in some Jinja templates here: https://github.com/pypi/warehouse/blob/main/warehouse/templates/manage/account/publishing.html#L211

I want to add a change along these lines:

{% if request.flags.disallow_oidc("disallow-activestate-oidc") == false %}
  {% set publishers = publishers ++ ("ActiveState", activestate_form(request, pending_activestate_publisher_form)) %}
{% endif %}

I want to do that for both GitHub and ActiveState. The issue I'm having is that flags.disallow_oidc takes admin.flags.AdminFlagValue Enum class and I'm not sure how to get access to that in a Jinja template. I should be able to find it by digging but if anyone knows off the top of their head how I can do that, please let me know.

#

Oh i think I found it. Looks like I can add it to the View class that renders the template.

oak pebble
#

Ahh. I was still wrong. Turns out I didn't need to do that at all and I can just do
if not request.flags.disallow_oidc(AdminFlagValue.DISALLOW_ACTIVESTATE_OIDC)
in the Jinja template. Apologies for the noise.

oak pebble
#

Hey all. I asked a question in one of my PRs a few weeks ago but haven't had a response yet. Does anyone have time to take a look and advise whether my idea is OK? https://github.com/pypi/warehouse/pull/14063#issuecomment-1633304299

GitHub

Make it easier to mint tokens with additional publishers
This PR splits warehouse.oidc.view.mint_token_from_oidc into two functions. 1, mint_token, is OIDC provider specific. The other, mint_toke...

merry valve
wet maple
#

If someone has the "Member" role in an organization, do they have permission to upload releases to any projects in the organization, or do they have to be added directly or as a team to a project?

#

The docs are a little unclear on exactly who has permission to do what regarding users/teams/orgs/projects combinations.

wet maple
#

And is it possible to require all members of an org have 2FA enabled?

tribal sedge
wet maple
tribal sedge
wet maple
#

Which specific projects? All the projects specific to the org? Or projects they've been specifically added to? And does own/maintain mean "publish releases", or "modify the project settings in PyPI"

wet maple
#

yes, I've read that, it does not clarify the table, it basically just restates it

#

in fact it just has the same table at the bottom

#

There are two problems with the table that I need clarification on. It deals with the organization overall, but there's no description of how permissions apply to individual projects in the organization. And it doesn't distinguish "publish release" as a type of action/permission.

tribal sedge
#

The "Project roles" section makes statements to explain the difference between Owner and Maintainer, which solves part of your question

wet maple
#

Can you explain it to me in different language? I'm obviously not getting it.

#

Let's say I'm an owner of an org, and another person is a member. Can the member upload releases to a project owned by the org?

tribal sedge
#

Only if they are a Collaborator for the project, which they could inherit if the Team they are on is added to the project

#

Projects have Collaborators, Collaborators have Roles (Owner, Maintainer). A Collaborator can be a Member, or a Team (of Members)

#

Adding a Project to an Org doesn't grant any Members other than the Org's Owners the Collaborator role of Owner.
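
The rules above can be sketched as a small model. This is a hypothetical illustration of the explanation in this thread, not Warehouse's data model; the class and function names are made up, and it assumes that both the Owner and Maintainer project roles can upload releases.

```python
from dataclasses import dataclass, field


@dataclass
class Org:
    owners: set = field(default_factory=set)
    members: set = field(default_factory=set)
    teams: dict = field(default_factory=dict)  # team name -> set of usernames


@dataclass
class Project:
    org: Org
    user_roles: dict = field(default_factory=dict)  # username -> "Owner" / "Maintainer"
    team_roles: dict = field(default_factory=dict)  # team name -> "Owner" / "Maintainer"


def project_role(user, project):
    """Return the user's effective role on the project, or None."""
    # Org Owners are Owners of every project in the org.
    if user in project.org.owners:
        return "Owner"
    roles = []
    # Direct collaborator on the project.
    if user in project.user_roles:
        roles.append(project.user_roles[user])
    # Role inherited through a team that was added to the project.
    for team, role in project.team_roles.items():
        if user in project.org.teams.get(team, set()):
            roles.append(role)
    if "Owner" in roles:
        return "Owner"
    return roles[0] if roles else None


def can_upload(user, project):
    # Assumption: any project role (Owner or Maintainer) may upload releases.
    return project_role(user, project) is not None


org = Org(
    owners={"alice"},
    members={"alice", "bob", "carol"},
    teams={"release": {"carol"}},
)
project = Project(org=org, team_roles={"release": "Maintainer"})
```

Here `alice` can upload as an org Owner, `carol` can upload through the `release` team, and `bob` cannot, because plain org membership grants no project role.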

#

Does this help?

wet maple
#

Thanks, that's the clarification I needed.

tribal sedge
#

Great! If you think of some ways of expressing that more clearly in the docs, please send a PR!

pliant obsidian
spice hull
#

https://github.com/pypi/inspector/issues/145
Tossing this up here, not imminent nor urgent.
This should serve the side effect of yanking code that has intentionally been whitespaced right (such as in malicious codebases) back into the parent element.

GitHub

Closes #144
Significant whitespace overflows, such as those that may exist when many tabs or whitespace characters are intentionally inserted into the codebase, will now wrap to maintain a fixed wi...

spice hull
#

This whitespace wrapping is going to be the death of me. I cannot be the first one that has attempted to fix this. 🥴

spice hull
astral quarry
#

Hi, is slur an abandoned or prohibited project name?

cinder dome
#

Hello all, is there any chance PyPI changed the procedure for rendering the Description core metadata?

We received the following issue: https://github.com/pypa/setuptools/issues/4008#issuecomment-1670448675 saying that https://pypi.org/project/setuptools/61.0.0/ (and following versions) is not rendering correctly.
I checked on the wayback machine, and https://web.archive.org/web/20230329061036/https://pypi.org/project/setuptools/61.0.0/ seems to indicate the project page used to render correctly for v61 in 29/Mar/2023.

My hypothesis was that maybe there was a syntax error. So I did the following to verify (I am just verifying the latest version of the package, which also has problems to render):

addr="$(curl -sI https://files.pythonhosted.org/packages/py3/s/setuptools/setuptools-68.0.0-py3-none-any.whl | sed -n 's/location:\s*\(.*\)/\1/p' | tr -d '[\000-\037]\177')"
curl -s "${addr}.metadata" -o /tmp/setuptools-METADATA
tail -n 70 /tmp/setuptools-METADATA | pipx run rstcheck -
# ...
# Success! No issues detected.

This seems to indicate that the restructured text part of the file is fine.

#

My next hypothesis was that maybe the METADATA file itself had problems. So I did the following to verify (using on going work on https://github.com/pypa/packaging/pull/686):

rm -rf /tmp/.venv
python3.11 -m venv /tmp/.venv
/tmp/.venv/bin/python -m pip install 'packaging @ git+https://github.com/brettcannon/packaging@ef1be866e0f56939e50106f4393f4e9437bcc676'
/tmp/.venv/bin/python -c 'from packaging.metadata import Metadata; print(Metadata.from_email(open("/tmp/setuptools-METADATA", "rb").read()))'
# ...
# packaging.metadata.ExceptionGroup: ('unparsed', [InvalidMetadata("unrecognized field: 'license-file'")])

The validation complains about the non-canonical license-file field, but reports no other errors. My understanding is that PyPI has been OK with projects using license-file for a while.

So my next hypothesis is that maybe this is related to the absence of Description-Content-Type, but according to the spec this should be fine (assuming there is no error in the RST):

If a Description-Content-Type is not specified, then applications should attempt to render it as text/x-rst; charset=UTF-8 and fall back to text/plain if it is not valid rst.
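
A minimal sketch of the defaulting rule quoted above, using only the standard library to read a METADATA file. The function name and sample text are hypothetical; actually validating the RST and falling back to text/plain would need an RST parser and is not shown.

```python
from email import message_from_string

# Content type the spec says to assume when the field is missing.
DEFAULT_CONTENT_TYPE = "text/x-rst; charset=UTF-8"


def description_and_type(metadata_text):
    """Return (description, content_type) for a core metadata document."""
    msg = message_from_string(metadata_text)
    content_type = msg.get("Description-Content-Type", DEFAULT_CONTENT_TYPE)
    # The description may be a "Description" header or the message body.
    description = msg.get("Description") or msg.get_payload()
    return description, content_type


# Hypothetical metadata with no Description-Content-Type field.
sample = (
    "Metadata-Version: 2.1\n"
    "Name: example\n"
    "Version: 1.0\n"
    "\n"
    "Example\n"
    "=======\n"
)
description, content_type = description_and_type(sample)
```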

I was wondering if anyone could help me understand where the problem is.

cinder dome
#

It seems to be related to https://github.com/pypi/warehouse/issues/14064 - I did search the issue tracker before writing here, but I haven't seen this one because it was closed as solved 😝
The solution mentions "an admin page feature to rerender the page with the fix". I don't have the rights to access the management page for setuptools, so I checked the management interface of one of my projects, but I cannot see a button or link to rerender the page (the "options" drop-down in https://pypi.org/manage/project/validate-pyproject/release/0.13/ just shows Download, View Hashes and Delete).

Manually issuing a rerender of all versions between setuptools 61.0.0 and 68.0.0 is not an ideal solution :P, is there any other alternative?

tribal sedge
young niche
#

I've requested a company organization (it's now pending), although we are a research group at a university. Is there any way to change it to a community organization?
(I'm not sure where I am supposed to ask.)

oak pebble
#

Just hit the 404 page for pypi. Well done all. 4 mins of my life i'll not get back, nor do i want it back. i love that sketch.

oak pebble
#

@merry valve have you done any work or had any thoughts about how to make the Pending Publishers table be more generic? I finally got to hooking up the ActiveState form to create a pending publisher and can now create but the table to display your pending publishers is specific to GitHub. Wanted to ask if you had anything in mind there before I try anything.

#

Here's a screenshot to jog memories:

#

Note that shows a bug in my code that it allowed me to create two for the same named project. I'm fixing that.

subtle ember
merry valve
merry valve
oak pebble
oak pebble
#

General PR practice question that I haven't seen referenced in the docs: What's the practice around squashing commits before merging a PR? Internally at work we fairly aggressively squash commits, often down to just one for a PR. Is that desired? Something you all would want to avoid? No opinion and I can go ahead and do that if it's what i'm used to?

merry valve
vagrant parcel
#

Hi! I stumbled upon a problem with pytorch caused by incorrect metadata in pypi's json - the wheels themselves are fine.

I wonder 1) how did that happen? Isn't json metadata based on the wheel metadata? Could that be a pypi problem? 2) is there a way for me to fix it as an external contributor (a pull request to pytorch that would fix it for the next release), or does it perhaps require maintainer status on pypi?

https://github.com/pytorch/pytorch/issues/105731

GitHub

🐛 Describe the bug Asking Pypi.org for a list of dependencies for torch does not provide a complete list of dependencies. See the list from pypi's json: ~$ curl -s 'https://pypi.org/pypi/to...

serene fern
#

The /pypi/ API simply inspects the first artifact of a release and shows its metadata. It is famously inaccurate and you should not use it.
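
To illustrate why reading a single artifact can mislead, here is a hedged sketch that compares the Requires-Dist fields of several artifacts of one release. It assumes the per-artifact METADATA texts have already been obtained somehow; the function names, filenames and package names are all hypothetical.

```python
from email import message_from_string


def requires_dist(metadata_text):
    """Return the set of Requires-Dist values declared in one METADATA file."""
    msg = message_from_string(metadata_text)
    return frozenset(msg.get_all("Requires-Dist") or [])


def differing_requirements(metadata_by_file):
    """Map each requirement not shared by every artifact to the files declaring it."""
    per_file = {name: requires_dist(text) for name, text in metadata_by_file.items()}
    common = frozenset.intersection(*per_file.values()) if per_file else frozenset()
    diff = {}
    for name, reqs in per_file.items():
        for req in reqs - common:
            diff.setdefault(req, []).append(name)
    return diff


# Hypothetical release with two wheels whose dependency lists differ.
metadata_by_file = {
    "example-1.0-cp311-cp311-win_amd64.whl": (
        "Metadata-Version: 2.1\nName: example\nVersion: 1.0\n"
        "Requires-Dist: filelock\n"
    ),
    "example-1.0-cp311-cp311-manylinux_x86_64.whl": (
        "Metadata-Version: 2.1\nName: example\nVersion: 1.0\n"
        "Requires-Dist: filelock\n"
        "Requires-Dist: example-gpu-runtime\n"
    ),
}
diff = differing_requirements(metadata_by_file)
```

If the Windows wheel happened to be the one inspected, a consumer of that single artifact's metadata would never see `example-gpu-runtime`.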

vagrant parcel
# serene fern The `/pypi/` API simply inspects the first artifact of a release and shows its m...

It is poetry that uses it in my case

https://github.com/pytorch/pytorch/issues/104259#issuecomment-1680225094

Is there an alternative that poetry should be using? I assume, other than downloading all wheels, a few GBs more than the one needed


#

Or maybe the solution would be for all wheels to have the same metadata? It should be OK for a windows wheel to have additional lines with platform_system == "Linux" and platform_machine == "x86_64" right?

valid flame
oak pebble
#

Hopefully easy question: I was going to get some work done on a flight yesterday, I'm too cheap to pay for wifi on the plane. When I tried to run the warehouse tooling with the make commands it hangs on trying to download some metadata for one of the images from docker.io. Has anyone else run into this and is there an easy way to work around it? Or am I doing something wrong? Maybe if I had run the make commands right before my flight things would have been updated and not needed to download metadata?

I have run these commands many times before, so it wasn't that the images were being built from scratch for the first time (I don't think so, anyway; I haven't dug into how the make commands are implemented)

oak pebble
#

Hrm, now that I'm on internet and running the command again, it looks like it's downloading a lot more than it normally would in this context. I think something else, not repo/tooling related, might have occurred and it needed to rebuild more than i was expecting. I'll remember to run the commands I need before my flight back home this time 😄

tribal sedge
#

Beyond make build to build the container images for changes, run make initdb && make serve once to pull any dependencies and compile anything else needed to run the stack

spice hull
#

Has the conversation about starjacking ever occurred in any real capacity?
We've noticed an uptick in Thonny related packages being uploaded, utilizing the default Thonny metadata, and subsequently presenting themselves heavily as an official Thonny client. It appears they are benign educational packages from a class being taught somewhere in China.

The concern I have is that a novice user could easily search Thonny and unintentionally choose 'x-thonny' from the above image, which is directly utilizing Thonny's metadata to populate things like GitHub stars and whatnot. This is a common attack, as I'm sure you're entirely aware of.

My proposition for solving this would be to detect already utilized Github links, or allow critical packages to reserve their Github links in the package metadata, and refuse the upload of packages utilizing already-occupied Github page metadata, or to automatically peel this metadata from the package's source before it's allowed.

There's some extensions to this idea; notably that packages being uploaded through Trusted Publishing likely have the ability to enumerate the repository that they came from, and subsequently, assert that the owner of the account is also the owner of said repository. There's some caveats to this, but I think it does a lot to cut down on potential attack vectors such as this.

Deconflicting this would be remarkably simple for individuals, to simply assert that they are the owners of the repository which is linked; and largely would be automated for the vast majority of packages that are using GHA and Trusted Publishing to push their packages from GH to PyPI.

ancient compass
#

I'm looking to see how popular >=4.0.0 is vs <=3.10.0 to justify updating a min-version for typing_extensions.Self

ancient compass
#

Yeah I used it a while ago it's very easy to consume all the free credit

pliant obsidian
#

you can do things like pypinfo -pc 'pip==21.*' pyversion version and pypinfo -pc 'numpy==1.23rc3' pyversion version with https://github.com/ofek/pypinfo, see usage in the readme
add --days 1 to reduce amount of quota used

tribal sedge
# spice hull Has the conversation about starjacking ever occured to any real capacity? We've...

This indeed came up recently. The tracking issue dates back to 2020, and I came to basically the same conclusion that Trusted Publishing is probably the way to go, since that's the only true real link we have.
https://github.com/pypi/warehouse/issues/8462#issuecomment-1701741254

Trying to deduplicate becomes an arms race of sort, similar to namesquatting, and introduces more complexity to the data models and validation, so I'd prefer to stay away from that if possible.

Would you be interested in submitting a PR that takes advantage of Trusted Publishing, and only displays the GH sidebar if the release was uploaded via TP?
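
A rough sketch of that idea: only treat a GitHub URL from the package metadata as verified when it matches the repository that uploaded the release via Trusted Publishing. This is not Warehouse code; the function names, the `publisher_repo` parameter and the example repositories are hypothetical.

```python
from urllib.parse import urlparse


def github_slug(url):
    """Return 'owner/repo' in lower case for a github.com URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}".lower()


def show_github_sidebar(project_urls, publisher_repo):
    """Decide whether to display GitHub statistics for a release.

    publisher_repo is the 'owner/repo' that uploaded the release via
    Trusted Publishing, or None if the release was uploaded another way.
    """
    if publisher_repo is None:
        return False
    return any(github_slug(url) == publisher_repo.lower() for url in project_urls)


urls = ["https://github.com/example-org/example"]
verified = show_github_sidebar(urls, "example-org/example")
token_upload = show_github_sidebar(urls, None)
copied_metadata = show_github_sidebar(urls, "someone-else/x-example")
```

With this rule, a package that copies another project's repository URL but is published from a different repository (or with a plain token) gets no sidebar.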

meager umbra
#

Wouldn't it be possible to do validation by requesting that the project upload a particular file to the repository?

tribal sedge
#

@meager umbra not a bad idea - very similar to other confirmation mechanisms, but we've already created the mechanism for Trusted Publishing for GitHub, and that confirms that both sides are correct, and is specific to a release, not a project or URL

spice hull
#

Shockingly entertaining that this conversation came up a mere 24 hours prior to... me bringing that up.

latent pier
#

PSA: For any folks here involved in FOSS supply chain security and related topics that will be in or somewhat near-ish the Bay Area on Thursday September 28, @solid fable is looking for speakers for a panel on supply chain security in open source, to be held in University of California Santa Cruz Engineering Building 2 (room 506) from 2:50 PM - 4:15 PM PDT as part of the 2023 UC Open Source Symposium. You can follow up with Juanita via Discord at @solid fable , or by email at jgomez91@ucsc.edu . Thanks!

agile sinew
#

FYI, from #build, is there a way to see all taken filenames for a project, even if they were from before the project was released? build 1.0.1 seems to be unavailable (but 1.0.0 was fine...)

#

Also, what's the status of orgs? I requested scikit-hep immediately back when they were announced at PyCon (Apr 23), and it's still pending. (Maybe I should have requested it from the scikit-hep user 🙂 )

merry valve
merry valve
agile sinew
#

It is a community org. That's fine, no rush at all, just wondered what the status was. Given I submitted it within an hour of the announcement, that sounds like a scarily large backlog!

#

Okay, I'll see if I can set up the query, then, should be easy.

#

That's not showing a file with that name. I only see the 42 files I know about.

#
SELECT filename
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE name = "build"
tribal sedge
spice hull
tribal sedge
buoyant anchor
#

I suppose that would mean referencing the entire list of known tags for all plats

merry valve
buoyant anchor
#

😄 should've searched

chrome zealot
#

This is I guess more of a maturin question than a PyPI one, but am I the first person to try to get maturin to use trusted publishers? Google+GitHub suggests maybe (by not showing me anything)

#

Anyone know how to trick it into using it? My latest attempt is to set MATURIN_PYPI_PASSWORD to "", I'll find out in... 3 minutes whether this time that works

#

It workksssss. Hooray, I won't blow on it, lest it fall over. 4th time is a charm.

tribal sedge
#

Curious, if you're already downloading the artifacts from a prior step, any reason to not use the trusted publisher action instead?

merry valve
#

+1, is there some advantage to publishing with maturin?

pliant obsidian
#

~1,200 "security placeholder packages" uploaded by the Yandex Security Team
"to prevent Dependency Confusion attacks against Yandex": https://pypi.org/user/yandex-bot/

spice hull
#

Or tencent pushes.

steel trout
chrome zealot
gentle frost
wicked wind
#

Are there any PyPI admins around that can help with a friendly PEP 541 transfer?

candid trout
#

Hey guys, can someone kindly help me with a bandersnatch config issue?

#

I am only trying to download 1 specific version of python (all files for windows/linux/egg/tgz/ etc...) of Python 3.6.8 -- to host in my offline lab (using bandersnatch)

Does this config file do what I need?

Please verify that the config file below will only download Python 3.6.8 (and no other versions)?

# cat /etc/bandersnatch.conf | grep -v '^;' | sed '/^$/d'

[mirror]
directory = /mnt/mylabnas01/repos/pypi
json = false
release-files = true
cleanup = false
master = https://pypi.org/
timeout = 10
global-timeout = 1800
workers = 5
hash-index = false
simple-format = ALL
stop-on-error = false
storage-backend = filesystem
verifiers = 3
compare-method = hash
[allowlist]
platforms =
py3.6.8

surreal niche
#

Just preparing a post to discuss.python.org for the next few days. Is it fair to say that warehouse has limited its scope to PyPI operations only, and that there is no intention for it to be used for anything other than PyPI.org, or are there plans that warehouse might one day be a tool for running a repository in the same way that devpi is (and are there people already doing that?)?

valid flame
#

I would say that's true. I am not an authority on the topic though

obtuse torrent
#

Warehouse is specifically the codebase for the official Python Package Index, and thus focuses on architecture and features for PyPI and Test PyPI. People and groups who want to run their own package indexes usually use other tools, like devpi.

#

You can use warehouse for other repositories, but it is architected for large scale, which means it uses a bunch of services that you might not really need on a smaller scale. I think it's designed such that you can use it without some of those services (e.g. Fastly), but I'm not sure if that's the case for all of them or how it could affect the experience

#

same as Secrus, am not an authority on the topic, just trying to help in the meantime

valid flame
#

PyPI is global scale and it shows in its code

unreal jewel
#

Warehouse makes zero attempts to be usable outside of PyPI itself. It's open source, and we're not going to actively try to make it painful if someone wants to set up their own instance. But we're going to put zero effort into supporting it or thinking about it as a use case, so it probably won't be the best experience to try and do that.

#

The stuff we have to make it possible to use without a specific service is mostly us trying to not integrate too deeply with one particular provider at any point in time, because we rely on donated services and those donations can always go away, so we try to add a layer of abstraction where we can to make it easier to swap if needed.

We also have to support running locally for development, so some services don’t have a reasonable local option so we need abstractions to make it possible to swap to a more local friendly option in development.

unreal jewel
surreal niche
#

Thanks all. You've confirmed my understanding 👍.

candid trout
rare wadi
#

This is the #pypi channel.
There’s a #bandersnatch channel at the top of the channel list.

Also, anyone who’s not already in that other server, which is a fair number of the people here, is not going to be able to access that link. It’s just going to say “this link goes to a server that you are not a part of”. If you’re going to link to different servers, it’s good to let people know which server it is, or even add an invite for it.

#

Oh you actually did post there first

candid trout
#

Hello, Can someone please help me. I posted my question on stackoverflow

https://stackoverflow.com/questions/77400104/how-can-i-download-pypi-repository-which-only-contains-packages-for-python-3-6-8

I am trying to create an offline repo for pypi for only python 3.6.8 packages with bandersnatch mirror. It downloaded everything but skipped the "requests" package, and I don't know why.

Please, can someone kindly help me. Thank you so much!

violet fable
#

this is really great to see, props to everyone involved :)

wicked wind
#

How is test.pypi.org managed? We recently completed a friendly transfer of snowflake on pypi.org, but can't do any testing on test.pypi.org because someone unrelated to the real package is already a registered owner on test.

tribal sedge
unreal jewel
#

well technically I think you can still do PEP 541 on it, but it's so low down the priority

wicked wind
#

Yeah, that's what I suspected. It'd be kind of nice if like the whole test.pypi.org db got reset once per day or some such. Anyway, for now, I won't worry about it, and will revisit if needed.

oak pebble
#

I'm running the tests locally to try and get more information but they are taking a very long time to complete.

oak pebble
#

Ahhh line 242.

oak pebble
#

Any tricks for making the tests run faster and still get a full coverage report? I'm waiting close to an hour each change to see if I've fixed the missing coverage.

#

Or is there a way to get the coverage report without actually running all the tests?

tribal sedge
tribal sedge
oak pebble