#internals-and-peps


spark magnet
#

It might be that it needs new syntax to handle def f(**kwargs): properly, so we can use def f(***all_args):

dusk comet
#
```py
args = {'a': 1, 0: 98, 1: 99, 'b': 2}
def f(*args, **kwargs): return args, kwargs
```
if you pass `**args` to this function, positional args will be pulled out into `*args`, they will not be in `**kwargs`
spark magnet
#

right, so we might need triple-star

raven ridge
#

hm, yeah - my initial reaction is that it ought to raise TypeError when unpacking a dict containing positional args for a call to a function that doesn't accept positional args, but you're right that this would mean that wrapper functions would need to keep accepting *args.

#

maybe triple star is a good workaround for that... And reasonably intuitive, I guess

#

though if we had a triple star, I think we ought to have it both at the call site and the parameter list

faint river
#

***bargs = both args and kwargs lemon_smile

#

oh god imagine saying star star star bargs outloud in a class

raven ridge
#

this seems like a reasonably good idea to me. I've wished for something like this occasionally. You can usually just pass all parameters by keyword, which isn't so awkward - but that doesn't work if the function has positional-only arguments, and then you are stuck building up both a sequence and a mapping for the positional vs keyword arguments
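
A minimal sketch of the awkward case described here. The `connect` function and its parameters are made up for illustration; the point is only that positional-only parameters cannot be supplied through `**`, so the caller ends up building a separate sequence and mapping.

```python
def connect(host, port, /, *, timeout=10):
    """Hypothetical function with two positional-only parameters."""
    return (host, port, timeout)

# Passing everything from one mapping fails: host and port are positional-only.
all_in_one = {"host": "localhost", "port": 8080, "timeout": 5}
error = None
try:
    connect(**all_in_one)
except TypeError as exc:
    error = str(exc)

# The workaround available today: split into a sequence and a mapping.
args = ["localhost", 8080]
kwargs = {"timeout": 5}
result = connect(*args, **kwargs)
```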

#

actually, though - if that's the only major use case for this - using one mapping to provide all the arguments to a function that uses positional-only arguments - maybe we don't need new syntax at all. Maybe we just need something new in functools

faint river
raven ridge
#

yeah, that's what I'd expect

faint river
#

can't wait to see this in python 3.17 lemon_fingerguns

spark magnet
faint river
#
def f(a): ...
f(***{"a": "abc", 0: "def"})

what happens here?

raven ridge
#

an error, just like if you did f(*["def"], **{"a": "abc"})

spark magnet
#

i like that @raven ridge is writing the PEP for me!

faint river
#
def f(a, **kwargs): ...
f(***{0: "def", "a": "abc"})

how about this?

#

would you get "a" in kwargs?

spark magnet
faint river
#

would think as such

#

now for the kicker

#
def f(a, **kwargs): ...
f(***{"a": "abc", 0: "def"})
#

(order is swapped)

spark magnet
faint river
#

so first you must filter out all int keys and sort them I guess

spark magnet
#

yes

#

and decide what this means: f(***{99: "a"})

#

(a typeerror)

raven ridge
#

Maybe something in functools like ```py
def apply_mixed_args(func, mapping):
    pos = {}
    kwargs = {}
    for key, val in mapping.items():
        if isinstance(key, int):
            pos[key] = val
        else:
            kwargs[key] = val

    args = []
    try:
        for i in range(len(pos)):
            args.append(pos[i])
    except KeyError:
        raise TypeError("Arguments to pass positionally must be contiguous integers beginning with 0")

    return func(*args, **kwargs)
```
#

I've wanted this rarely enough that solving it with a new stdlib function seems better to me than solving it with new syntax

faint river
#

it does seem like a very niche problem for new syntax yeah. matmul operator is cowering

raven ridge
#

naming that function might be the toughest part, heh

faint river
#

apply_bargs lemon_fingerguns_shades

spark magnet
#

barge_into_function

raven ridge
#

heh

faint river
#

apply_neds_bats

feral cedar
#

kool_aid

raven ridge
#

the idea of a new stdlib function sidesteps a lot of the issues and ambiguities we mentioned above, too. There's no question of what happens if you do ```py
def foo(**kwargs):
    pass

apply_mixed_args(foo, {0: 42})
```
It's an error, because you pass positional args to a function that doesn't take them. There's no question of what happens if you do foo(*args, **kwargs, ***bargs) in one function call, because - well, you can't.

#

and I suspect that selling people on triple star would be considerably harder than selling them on an enhanced double star, but you're right - the enhanced double star couldn't be transparently proxied, so callables that currently accept *args and **kwargs would need to keep doing so in the future even if f(**{0: 42}) could be used at the call site.

#

there's another nasty case that happens if you allow two-star f(**{0: 42}) actually: the implementation would need to detect f(**{0: 42, 1: 43}, **{0: 10}) and raise a TypeError for that as well

#

I guess my opinion is that triple star is a bad idea (too magical for too niche a feature, especially for something that's possible today and just inconvenient). I think a new stdlib function would be enough, but if we did any new syntax, I don't think it should be more than just allowing integer keys in mappings unpacked with ** in a function call, without changing the behavior for **kwargs parameters in a function at all - they'd still only receive the keyword arguments, and you'd still need to use *args to receive positional arguments.

dusk comet
dusk comet
# raven ridge an error, just like if you did `f(*["def"], **{"a": "abc"})`

even simpler example: ```py
def f(a): ...
...

f(0, a=0)
╭─── Traceback (most recent call last) ────╮
│ in <module>                              │
│ ╭─ locals ─╮                             │
│ │ f = f    │                             │
│ ╰──────────╯                             │
╰──────────────────────────────────────────╯
TypeError: f() got multiple values for
argument 'a'
```

#
def _(a, b=0, /, c=1, *args, *, d=2, **e, ***f): ...

syntax hell

charred wagon
fallen slateBOT
#

Modules/clinic/posixmodule.c.h line 8067

```c
os_ftruncate_impl(PyObject *module, int fd, Py_off_t length);
```
fallen slateBOT
#

Modules/posixmodule.c line 11784

```c
os_ftruncate_impl(PyObject *module, int fd, Py_off_t length)
```
charred wagon
#

Thanks. GitHub search did not pick that up for some reason

#

I wonder why its docs say

Truncate the file corresponding to file descriptor fd, so that it is at most length bytes in size

When the linux manual says

The truncate() and ftruncate() functions cause the regular file
named by path or referenced by fd to be truncated to a size of
precisely length bytes.

My guess is the Python docs are trying to be more general, and for some platforms they cannot make the stronger guarantee of "precisely length bytes". Python doesn't seem to do anything special with the length when it calls ftruncate.

#

Or maybe I am misinterpreting the way it's worded

raven ridge
#

POSIX says:

If fildes refers to a regular file, the ftruncate() function shall cause the size of the file to be truncated to length. If the size of the file previously exceeded length, the extra data shall no longer be available to reads on the file. If the file previously was smaller than this size, ftruncate() shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled. The value of the seek pointer shall not be modified by a call to ftruncate().

#

Old versions instead said:

If the file previously was smaller than this size, ftruncate() shall either increase the size of the file or fail. [XSI] [Option Start] XSI-conformant systems shall increase the size of the file. [Option End]
but even that doesn't allow for ftruncate to succeed without setting the size of the file to exactly the given length. So... 🤷‍♂️

charred wagon
#

Thanks. Yeah, that is confusing. I will just trust the linux manual on this.

#

It's the behaviour I observed in practical tests anyway
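
A small check along the lines of those practical tests, assuming a POSIX-like system where `os.ftruncate` behaves as the quoted text describes. It shrinks and then grows a temporary file and records the resulting sizes; the variable names are just for illustration.

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"x" * 100)
    f.flush()
    fd = f.fileno()

    os.ftruncate(fd, 10)                 # shrink below the current size
    shrunk_size = os.fstat(fd).st_size

    os.ftruncate(fd, 50)                 # grow past the current size
    grown_size = os.fstat(fd).st_size

    f.seek(10)
    tail = f.read()                      # the extended area, expected to read as zeros
```

If the platform follows POSIX here, both sizes should equal the requested length exactly rather than merely being "at most" that length.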

grave jolt
#

the opposite of cringe

spark magnet
grave jolt
#

yes!

spark magnet
#

it always strikes me as the opposite. I'll have to try to remember 🙂

sand goblet
#

They mention it makes initialization faster, but it looks like it makes it MUCH faster, at least on my Windows 10.

#
static int
bytearray___init___impl(PyByteArrayObject *self, PyObject *arg,
                        const char *encoding, const char *errors)
/*[clinic end generated code: output=4ce1304649c2f8b3 input=1141a7122eefd7b9]*/
{
    void *sval;
    Py_ssize_t count;
    PyObject *it;
    PyObject *(*iternext)(PyObject *);
    
    //
    // existing code
    //
    
    /* Is it an int? */
    if (_PyIndex_Check(arg)) {
        count = PyNumber_AsSsize_t(arg, PyExc_OverflowError);
        if (count == -1 && PyErr_Occurred()) {
            if (!PyErr_ExceptionMatches(PyExc_TypeError))
                return -1;
            PyErr_Clear();  /* fall through */
        }
        else {
            if (count < 0) {
                PyErr_SetString(PyExc_ValueError, "negative count");
                return -1;
            }
            if (count > 0) {
                if (self->ob_alloc == 0) { // new bytearray
                    if (!_canresize(self))
                        return -1;
                    // remember to avoid overflow by using size_t. see issue #22335.
                    sval = PyObject_Calloc((size_t)count + 1, 1); // + 1 for null terminator
                    if (sval == NULL) {
                        PyErr_NoMemory();
                        return -1;
                    }
                    self->ob_bytes = self->ob_start = sval;
                    Py_SET_SIZE(self, count);
                    self->ob_alloc = (size_t)count + 1;
                    return 0;
                }
                if (PyByteArray_Resize((PyObject *)self, count))
                    return -1;
                memset(PyByteArray_AS_STRING(self), 0, count);
            }
            return 0;
        }
    }
#

here's how I timed this change:

#
from timeit import timeit

setup = """
def f(n):
    b = bytearray(n)
    return b
"""

for n in range(12):
    print(timeit(stmt=f"f({n**10})", setup=setup, number=1000))
#

and here are the timing results:

times using calloc:
0.00034580007195472717
0.00039679999463260174
0.0008060999680310488
0.0025179999647662044
0.011856100056320429
0.011817399994470179
0.013045100029557943
0.016800999990664423
0.03710949991364032
0.04992749996017665
0.1916151000186801
0.5652574999257922

times without using calloc:
0.00024830002803355455
0.00025889999233186245
0.000635799951851368
0.0014845000114291906
0.2839431999018416
2.6696265999926254
15.101725699962117
74.59119629999623
I gave up here
#

someone should update the __init__ method so it uses calloc.

#

They mention a bug with the other person's change not detecting an existing memoryview, but that's solved easily by just checking for it using _canresize

rose schooner
#

maybe just a local change

rose schooner
#

although uh

sand goblet
#

Maybe it's because it's a little faster with smaller sizes

rose schooner
sand goblet
#

Yeah

rose schooner
#

and there's not many cases where someone needs to allocate 1 MB with bytearray()... right?

sand goblet
#

Are there a lot of cases where they need to allocate a large number of small bytearray?

#

Either way, allocating a large one is still a one-time thing, so I guess it shouldn't matter either way

dusk comet
static hinge
#

do/while will always run at least once because the condition is after the block.
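
Python has no do/while statement, so here is a rough sketch of the same control flow using `while True` with the check at the bottom. The `do_while` helper is made up for illustration.

```python
def do_while(body, condition):
    """Run body() once, then keep running it while condition() is true."""
    while True:
        body()
        if not condition():   # the condition is only checked after the body
            break

calls = []
# The condition is false from the start, yet the body still runs one time.
do_while(lambda: calls.append("ran"), lambda: False)
```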

#

it's kind of like Chutes and Ladders

misty oxide
#

I'm writing a Bytecode -> Bytecode transform, and trying to support nonlocal variables.

My idea was to replace all freevar LOAD_DEREF instructions with a chain of instructions that does fn.__closure__[instr.arg - len(code.co_cellvars)].cell_contents, where fn is the python object of the currently executing function, and code is the code object being translated. Then, I turn all cellvars into local variables.

The replacing is actually sound. It's what the cpython does under the hood, so I'm just replacing one instruction with many.

What I'm worried about is what happens if I decide to empty the list of cellvars and freevars. Am I allowed to do that? Will something in python function creation or execution go wrong if I don't leave a proper trail of freevars and cellvars?
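
For reference, a minimal sketch of the lookup being described, done in plain Python rather than bytecode. It finds the cell by name instead of by instruction argument, because how the `LOAD_DEREF` argument is numbered has changed between CPython versions (worth checking the `dis` docs for the version in use). The `outer`/`inner` functions are placeholders.

```python
def outer():
    x = 10
    def inner():
        return x
    return inner

fn = outer()
code = fn.__code__

# co_freevars and __closure__ are parallel: the i-th name maps to the i-th cell.
index = code.co_freevars.index("x")
value = fn.__closure__[index].cell_contents

# The enclosing function lists the same name as a cell variable.
outer_cellvars = outer.__code__.co_cellvars
```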

feral island
#

you probably need to make sure that the relevant fields on the code object match reality

misty oxide
#

Probably

#

But, even if there are no STORE_DEREF/LOAD_DEREF instructions in the bytecode?

misty oxide
#

A related follow-up question that I just thought of. In python 3.11+, is it okay if there's a nonlocal variable and a local variable with the same name?

feral island
#

it isn't possible. if you generate your own bytecode and code objects you can probably make it work, but it will be fragile

misty oxide
#

👍

tacit hawk
#

Is there some C memcpy equivalent for Python's bytearrays?

buffer = bytearray(16)
data = b'1234'
buffer[:8] = data # len is now 16 - 4 = 12, it should be still 16
pliant tusk
#

!e ```py
data = b'1234'

# what you did:
buffer = bytearray(16)
buffer[:8] = data
print('replace first 8 with data:', buffer)

# replace first 4
buffer = bytearray(16)
buffer[:4] = data
print('replace first 4 with data:', buffer)

# replace last 4 of first half
buffer = bytearray(16)
buffer[4:8] = data
print('replace last 4 of first half with data:', buffer)
```

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | replace first 8 with data: bytearray(b'1234\x00\x00\x00\x00\x00\x00\x00\x00')
002 | replace first 4 with data: bytearray(b'1234\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
003 | replace last 4 of first half with data: bytearray(b'\x00\x00\x00\x001234\x00\x00\x00\x00\x00\x00\x00\x00')
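
One possible memcpy-style helper, as a sketch: writing through a `memoryview` refuses to change the buffer's length, so a size mismatch raises instead of silently shrinking the bytearray. The `copy_into` name is made up for illustration.

```python
def copy_into(buffer: bytearray, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes of buffer starting at offset, never resizing it."""
    view = memoryview(buffer)
    try:
        # A memoryview slice must match the source length exactly,
        # otherwise the assignment raises ValueError.
        view[offset:offset + len(data)] = data
    finally:
        view.release()

buffer = bytearray(16)
copy_into(buffer, 0, b"1234")
copy_into(buffer, 8, b"abcd")
```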
swift imp
#

PEP 727 proposal looks delicious

#

Opens up a lot of code de-duplication

steel solstice
#

how so?

swift imp
#

Like pandas has their doc decorators that transform and append docstrings between various methods and functions. You could instead define types that are annotated with a doc and bake it into the type annotation
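
A rough sketch of that idea. The `Doc` class below is a local stand-in written for this example, not the one from the PEP, and `UserName`, `greet`, and `rename` are made-up names; it only shows how documentation attached once via `Annotated` can be reused across signatures and read back at runtime.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

@dataclass(frozen=True)
class Doc:
    """Stand-in for a documentation marker carried inside Annotated."""
    documentation: str

# Define the documented type once...
UserName = Annotated[str, Doc("The name of the user")]

# ...and reuse it in several signatures without repeating the text.
def greet(name: UserName) -> str:
    return f"Hello, {name}"

def rename(old: UserName, new: UserName) -> None:
    pass

hints = get_type_hints(greet, include_extras=True)
doc = hints["name"].__metadata__[0].documentation
```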

fallen slateBOT
#

pandas/util/_decorators.py line 408

```py
class Substitution:
```
fallen slateBOT
#

pandas/util/_decorators.py line 455

```py
class Appender:
```
dusk comet
#

Editor developers (VS Code and PyCharm) have shown some interest, while showing concerns about the verbosity of the proposal, although not about the implementation (which is what would affect them the most). And they have shown they would consider adding support for this if it were to become an official standard. In that case, they would only need to add support for rendering, as support for editing, which is normally non-existing for other standards, is already there, as they already support editing standard Python syntax.

What does it mean "support for rendering"? Editors already can render code

tacit hawk
pliant tusk
#

you can just write only the first 4 instead of the first 8 with the slice

pliant tusk
#

then you don't need to align it

feral island
dusk comet
#

Ah, ok

peak spoke
#

I thought of the annotated docstring a while back and the editor support has been the main thing that came to mind, the code would look a bit too busy imo with something longer on the docstring and if the editor couldn't collapse it

#

though the doc call looks a bit weird when everything else you'd see in annotations uses brackets instead of parentheses

tacit hawk
ripe tinsel
#

I have a feature proposal to improve Python as a language: an asynchronous/multithreaded for loop.

It occurred to me that the majority of the "for item in list" loops in the code I have optimised are processes that can run independently. This appears to be the general case with for loops, but not with while loops. Therefore, it would be convenient for a user to have a built-in keyword like mfor (multi-threaded for) or afor (async for). Alternative syntax could be something like "for item in list.async()". Await() and join() functions will be necessary for these loops.

This will be convenient for developers and beginners alike, and should allow users to speed up loops in many instances with minimal code.

#

Also, if the compiler notices that only mathematical operations are happening inside a loop, you can have the compiler send it to the GPU if CUDA is available

flat gazelle
#

Unfortunately, due to the GIL, this is mostly useless. Unless the for loop is doing IO, splitting it across threads will do just about nothing, even if each iteration is independent.

#

This is something best left to the .map methods on threadpools etc.

urban sandal
feral cedar
#

there's also Executor in concurrent.futures
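
For comparison, a minimal sketch of what the `.map` approach looks like with `concurrent.futures`. The `work` function is a placeholder for an independent, ideally I/O-bound, loop body.

```python
from concurrent.futures import ThreadPoolExecutor

def work(item):
    """Stand-in for the body of a loop whose iterations are independent."""
    return item * 2

items = [1, 2, 3, 4]

# Equivalent in spirit to "for item in items: work(item)", but spread over threads.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(work, items))  # results keep the input order
```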

dark umbra
#

Does anyone know the book Beginning Python? How is it?

willow torrent
#

can I count on you?

feral island
static hinge
#

Guido has come for your GIL

feral island
#

Sam Gross rather

static hinge
#

I don't think we have a sticker of him

umbral plume
#

an mfor keyword for multi-threaded for loops still sounds like syntactic sugar for sending stuff to be run by a threadpool, which in turn means there'd have to be some trickery where the following indented block of code is secretly a function - it all sounds a little messy

static hinge
#

it introduces a new scope

#

Didn't we just eliminate a scope for list comprehension?

#

one step forward, 2 steps back

feral island
#

we also added one for type parameters 😄

#

and listcomps still have their own scope, it's mostly just an implementation change

static hinge
#

I would suggest just giving all for loops their own scope, but that would absolutely break things

umbral plume
#

if it had its own scope, with the same rules as classes or functions, that'd unfortunately break things as simple as
count = 0
for i in range(10):
    if i % 2 == 0:
        count += 1
since count is now no longer a local variable within the loop

feral island
#

the bigger problem is the data race

dusk comet
#

maybe we should introduce "weak scopes":

  • if variable you are assigning to appears in surrounding scopes - use it
  • if not - it is a local variable
count = 0
for i in range(10):
    is_even = i % 2 == 0 # local
    if is_even:
        count += 1 # nonlocal
count # some value
is_even # error
umbral plume
#

i can't quite find the words, but such scoping rules sound a little.. arbitrary, i dunno, since it kinda lulls you into a false sense of security of loops having their own scope, until you accidentally shadow a variable name from an outer scope

dusk comet
#

agree

umbral plume
#

i like the idea, but at that point we're approaching just bringing in a let or var keyword into python (though such an idea does sound super interesting!)

dusk comet
#

there is already a thing that does the same thing as let/var: it is an annotation: x: T, it forces x to be a local variable

umbral plume
#
```py
>>> stuff = [5,6,7,8]
>>> def foo():
...     stuff: list
...     stuff.append(9)
...
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in foo
UnboundLocalError: cannot access local variable 'stuff' where it is not associated with a value
```
TIL
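
The same effect can be seen by inspecting the compiled function, as a quick sketch: a bare annotation puts the name into the function's local variable table without assigning anything to it. The function names here are illustrative.

```python
stuff = [5, 6, 7, 8]

def annotated():
    stuff: list          # bare annotation: marks stuff as local, assigns nothing
    return stuff

def plain():
    return stuff         # no annotation: reads the module-level name

annotated_locals = annotated.__code__.co_varnames
plain_locals = plain.__code__.co_varnames

try:
    annotated()
    raised = False
except UnboundLocalError:
    raised = True
```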
dark umbra
#

Hello, does anyone know the book Beginning Python?

urban sandal
jade raven
static hinge
sand goblet
#

There's no way to do this in the current API

random thistle
#

Anybody else looking forward to the “nogil” project? It’s going to be quite complex to keep reference counting in a threadsafe way. I know there were some advocating moving to a pure garbage-collected model, like Java. But then even a simple script can end up swallowing all of memory, if it is left running long enough. And we would like to avoid that.

Also I know some people want to avoid the name “Python 4.0”, after all the pain of the 2.x→3.x transition. But I think the threadsafeness changes will have sufficient implications for backward (in)compatibility that calling the new version “4.0” would be a good idea. What do you think?

spark magnet
raven ridge
#

and to limit backwards incompatibility to things using the C API, as I understand. The intention is for nothing in the Python API to need to change.

feral island
#

I really don't think it will be anything like Python 2/3. Pure Python code will not need to change to support nogil; by contrast, pretty much every Python program had to be changed to support Python 3. C extension code will more often require changes and that migration will take a lot of ecosystem effort, but most people aren't writing C extensions

raven ridge
#

yeah. Extension module devs are a very important subset of the Python developer ecosystem, but they're a tiny portion - I'd wager that fewer than 1 in 1000 Python users ever interacts with the C API

random thistle
worthy sandal
#

Hi guys, I saw a programming question named knight’s sequence. What does it mean by a knight sequence? What is a 10-key sequence of a knight?

spark magnet
quick snow
#

I think the more interesting compatibility break is in the other direction: New code that is developed on nogil Python, and runs (but with terrible performance) on older versions. We have these breaks whenever a new Python version comes out, but usually it's easy to tell (because the code just won't work on older versions).

urban sandal
raven ridge
#

Things using Cython should get it for free, too.

#

Though there will still be separate gil/nogil ABI tags for a while, and that will need handling from each project

feral island
#

It's going to be pretty hard in Python code to distinguish two threads running truly concurrently (with nogil) or switching at arbitrary points (current situation)

sand goblet
#

So threading will make python faster like how it does in other languages?

feral island
#

And it will also make your code more prone to hard-to-debug race conditions

sand goblet
#

Compared to multiprocessing?

raven ridge
#

Yes, actually. Threads share more state than processes, so there's more ways to have data dependency bugs

urban sandal
#

people shouldn't have anything but the minimal necessary shared state (which is often "nothing") when doing things concurrently, and should use the appropriate strategy for guarding that shared state based on the concurrency patterns in use. But what people "should" do is very far away from a lot of real world code.

feral island
#

Yes, I guess the practical effect of nogil is that it will become a lot more important to make pure-Python code threadsafe, even if technically what you need to do to achieve thread-safety isn't terribly different from the current world

dusk comet
# feral island That assumption is already wrong though. You can already use threading and threa...

This is not true. Threads can switch only if the GIL is dropped, which can happen (usually) after executing any bytecode instruction => threads can't switch in the middle of an instruction

For example, consider the operation x[::] = y. Let's assume that x and y are huge lists. This operation happens in C with the GIL acquired, so no other thread can do anything => this operation is atomic in some sense.

But if there is no GIL, other threads can do different things to x and y while data is copied, which can result in broken data.

I can imagine a lot of code with these implicit assumptions about the atomicity of some operations. And some atomic operations (that happen in C code) will become non-atomic under noGIL, which will break code

feral island
raven ridge
#

also switching only happening between bytecode instructions or when the GIL is explicitly released doesn't help as much as you might assume, because all sorts of operations that look atomic can cause a Python __del__ method to fire, which runs new bytecode and gives new places where a context switch can happen.

#

!e ```py
class C:
    def __del__(self):
        print(x)

x = {1: C(), 2: C()}
x.update({1: 1, 2: 2})
```

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | {1: 1, 2: <__main__.C object at 0x7f75834eb2d0>}
002 | {1: 1, 2: 2}
raven ridge
#

the __del__ there sees an intermediate state where x is half-updated.

raven ridge
#

this is a persistent annoyance when writing C extensions: any Py_DECREF of an object of user-controlled type can cause a Python __del__ to run, and that __del__ could examine the state of your extension module or call into it (or more likely yield control to another thread that does), so the extension module dev needs to guarantee that their module and all of their objects are in a sane state that's OK to be exposed to users whenever they call Py_DECREF

random thistle
feral island
#

Python code doesn't explicitly give up the GIL though

urban sandal
#

^. Any Python code which breaks by changing this would have to have been relying on extremely subtle internal behaviors that aren't even guaranteed to be consistent across implementations, or on native code that was being called holding the GIL for them.

sand goblet
#

What does getting rid of it accomplish?

raven ridge
#

by "it" you mean the GIL? Greater parallelism for multi-threaded code

sand goblet
#

Will that do anything besides make multithreaded pure python faster? Because it seems like that wouldn't matter that much since it would still be slower than using Cython / C-extension.

raven ridge
#

it will make multithreaded C extensions faster as well

#

C extensions currently need to acquire the GIL whenever they want to create a Python object, store a reference to a Python object, call a Python callable, allocate memory with the Python allocators, inspect an object using the Python C API, etc. There's many things that extension modules can do today without holding the GIL, but there's also many that they can't, and so the GIL forces some operations to be done in serial rather than in parallel

dusk comet
#

Why is there no noGIL single-threaded Python build already?
Isn't that super simple? Just remove the GIL and any kind of locks, forbid threading, and that's it. It is a way to get free performance in single-threaded apps

#

I heard that dropping and reacquiring GIL constantly is pretty slow

feral island
sand goblet
#

You have to stop holding it manually, right? So in a single threaded program, you can just hold it the whole time?

feral island
#

the GIL is very nice for single-threaded performance, because it means interpreter-internal data structures don't have to worry much about threading

#

that's why past attempts to remove the GIL tended to lead to huge performance regressions

feral island
dusk comet
sand goblet
#

Shouldn't it not affect single threaded python programs?

#

Just never release it. Then there's no overhead for releasing/acquiring it.

dusk comet
raven ridge
raven ridge
sand goblet
#

So basically, it starts making a difference when running multithreaded C code which has a lot of interactions with Python

#

Or when running multithreaded pure python.

raven ridge
#

right, just in general: it allows multithreaded code to have greater parallelism

radiant garden
rotund furnace
#

hi

cyan raven
#

What is the best way of proposing a pep, I mean I have to attach some code snippet that adds the stuff to the source code or at least try to explain it.
Just create a patch? or add the stuff on my local fork, and link that directly?

steel solstice
#

What are you proposing?

cyan raven
steel solstice
#

Oh the cancel thing

cyan raven
steel solstice
#

Yeah so you probably need to double check with guido that he'll sponsor it, his email is guido@python.org I'm not entirely convinced it'd need a pep but I'd trust him

celest garden
#

Hi. I want to contribute to python, is this a good place to get started with that? Or is my best bet joining the mailing list and looking for a mentor? I'm a student and want to learn more about how python works "under the hood." I've read some of the documentation about contributing.

cyan raven
timber fossil
#

(C++) How do i have multiple python interpreters in one process?

#

i am using 3.12

#

I want to have a plugin system, where you can add files to a folder and they all will independently execute

#

Can be in sequence or threaded

dusk comet
#

Subinterpreters?

timber fossil
#

that sounds like it

#

how do i use them?

dusk comet
#

Read C-API about them

timber fossil
#

this ?

dusk comet
#

Yes

timber fossil
#

How do i append modules to it?

#

or are they global

dusk comet
#

Modules have to be in sys.path in order to be imported

timber fossil
#

well i am providing my own modules

#

and disallowing for any external c modules and such

#

no io no os ...

dusk comet
#

I dont think that is possible

#

Consider not using python as scripting language

timber fossil
#

I already modified the python source code

dusk comet
timber fossil
#

is PyImport_AppendInittab global? as in: every interpreter has the same modules from it?

dusk comet
#

IIRC, all subinterpreters have their own object, objects cannot be a part of two subinterpreter worlds

timber fossil
#

what about c functions

#

same thing?

dusk comet
#

There is a paragraph about this in the docs

timber fossil
#

so can the subinterpreters call built-ins which i added?

#

if the threads will each be executing different files do i still have to worry about the GIL?

#

nothing will be shared between threads

vagrant musk
#

who here is an expert in creating tools using python?

timber fossil
#

do the threads end themselves after the script is done executing?

#

or do i need to manually check whether a thread has stopped execution

cyan raven
spark magnet
cyan raven
# spark magnet this is "the cancel thing"? Is there a thread somewhere about it? Maybe it does...
#

it's not that hard, but it seems like everyone gave up working on it 😄

#

so it can be implemented in the task.py

spark magnet
#

and he commented twice in the last hour

cyan raven
#

I see

tacit hawk
#

is the byte ordering of the bytes passed in concurrent calls to socket.sendall() preserved?

tacit hawk
raven ridge
quick snow
unkempt rock
jade raven
#

!d collections.deque

fallen slateBOT
#

```py
class collections.deque([iterable[, maxlen]])
```
Returns a new deque object initialized left-to-right (using [`append()`](https://docs.python.org/3/library/collections.html#collections.deque.append)) with data from *iterable*. If *iterable* is not specified, the new deque is empty.

Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.

Though [`list`](https://docs.python.org/3/library/stdtypes.html#list) objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for `pop(0)` and `insert(0, v)` operations which change both the size and position of the underlying data representation.
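
A short usage sketch of the operations the docs mention, with made-up variable names: appends and pops at both ends, plus the optional `maxlen` bound.

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)        # cheap at the left end, unlike list.insert(0, v)
d.append(4)            # cheap at the right end
first = d.popleft()    # unlike list.pop(0), no shifting of the remaining items
last = d.pop()

# With maxlen set, the oldest items fall off the opposite end automatically.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
```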
jade raven
#

This was my solution

cyan raven
#

What is the difference between status accepted and status final?

#

In pep

umbral plume
#

IIRC, "accepted" means the PEP's been accepted and is being worked on, and "final" is for once said PEP has been added and, well, finalised

#

https://peps.python.org/pep-0001/#pep-review-resolution

Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and incorporated into the main source code repository, the status will be changed to “Final”.

cyan raven
#

is there a pep about descriptors as well?

steel solstice
feral island
cyan raven
raven ridge
#

It would be perfectly correct for a future version of CPython to change list.append in a way where if it was called concurrently from different threads, only one of the two items winds up being added in the end, for instance

dusk comet
#

that will break a lot of code

raven ridge
#

depending on the version of Python you're running. += for int isn't atomic. ```py
# cat test.py
from concurrent.futures import ThreadPoolExecutor

x = 0

def increment_x_n_times(n):
    global x
    for i in range(n):
        x += 1

with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(10):
        executor.submit(increment_x_n_times, 100_000)

print(x)
```
```shell-session
$ python3.9 test.py
360530
$ python3.9 test.py
655863
$ python3.11 test.py
1000000
$ python3.11 test.py
1000000
```

The fact that int += is atomic in some versions and not in others wasn't documented anywhere - if you look at the "what's new in Python 3.10" and "what's new in Python 3.11" pages, you won't find this mentioned anywhere - because it's an implementation detail that's subject to change.

raven ridge
#

and because neither behavior was documented, it wouldn't be surprising if this changes back in some future version, or behaves differently in some other Python implementation
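
If the count has to be right on every version and implementation, the usual approach is to stop relying on `+=` being atomic and guard it with a lock. A minimal sketch of the same script with a `threading.Lock` added (the name `x_lock` is made up here, and the iteration count is reduced to keep it quick):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

x = 0
x_lock = threading.Lock()  # hypothetical name; protects every update of x

def increment_x_n_times(n):
    global x
    for _ in range(n):
        with x_lock:       # only one thread at a time performs the read-modify-write
            x += 1

# Leaving the with-block waits for all submitted tasks to finish.
with ThreadPoolExecutor(max_workers=10) as executor:
    for _ in range(10):
        executor.submit(increment_x_n_times, 10_000)

print(x)
```

With the lock in place the result no longer depends on how the interpreter happens to schedule bytecode, at the cost of some contention.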

quick snow
#

Interesting. We use list.append with threads in production code, but it's a list of errors that is (hopefully) only rarely appended to (and if an entry went missing, it didn't really matter), so I think we're fine.

frigid bison
#

You use threading in production?

#

Threading has always been an unstable mess for me, what are you using to control the threads?

quick snow
raven ridge
#

"unstable"? that's an interesting take. I think threading is much less prone to subtle breakage than either multiprocessing or coroutine-based event loops like asyncio. multiprocessing is prone to subtle performance issues due to serializing data to send between processes as well as weird edge conditions (like, what happens if a process in the pool gets killed by the OOM killer while holding a multiprocessing lock?). coroutine-based event loops allow you to easily block the event loop and prevent parallelism without realizing you've done so

calm hawk
#

/avatar @dull ferry

dull ferry
#

w

#

gl finding it eheheh

swift sigil
#

yes

next dagger
cyan raven
#

is this a good way of testing this code?

    def test_task_cancel_and_await(self):
        # phase 1
        async def coro():
            t = self.new_task(self.loop, asyncio.sleep(1))
            await asyncio.cancel_and_await(t)
            self.assertTrue(t.cancelled())

        self.loop.run_until_complete(coro())

I haven't seen any specific section specialized for asyncio code testing(in cpython source code).
Should I just put it under the async BaseTask class?

cyan raven
neat delta
#

strictly speaking, internals of non-cpython python flavors might also fit here - they just don't usually (ever?) come up

raven ridge
#

We've talked occasionally about pypy internals here

#

MicroPython and CircuitPython, too, actually

worthy sandal
alpine rose
#

cpython gc question. for reasons, i have gc off.

fallen slateBOT
#

redis/connection.py line 515

```py
if isinstance(response, ResponseError):
```
alpine rose
#

are my only options: a) gc.collect, or b) in every frame that could be a parent of this delete any local that's costly to keep around?
is there an easy change i can make to redis to prevent the cycle?

dusk comet
#

You can delete your exception instance; that will decref all the frames, and your locals will be deallocated.
If you really need to store the exception, you can create a copy of it, like this: type(e)(*e.args) (I'm not sure).

#

And probably you can remove frames references from exception manually

alpine rose
fallen slateBOT
#

redis/connection.py line 372

```py
self.read_response()
```
alpine rose
#

but there's something i don't understand, since i'm not able to make a minimal repro that exactly corresponds

feral island
#

the function you linked is a little weird in that the exception is apparently returned from something it's calling?

alpine rose
fallen slateBOT
#

redis/_parsers/base.py line 88

```py
return ResponseError(response)
```
fallen slateBOT
#

redis/_parsers/resp2.py line 43

```py
return error
```
alpine rose
fallen slateBOT
#

redis/connection.py line 516

```py
raise response
```
fallen slateBOT
#

redis/connection.py line 373

```py
except ResponseError:
```
feral island
#

I tried this and now I'm wondering what this list could possibly be 😄

```py
>>> import gc
>>> gc.disable()
>>> def inner():
...     e = Exception()
...     try:
...         raise e
...     except: pass
...
>>> def outer():
...     x = ["special string"]
...     inner()
...
>>> outer()
>>> print([x for x in gc.get_objects() if type(x) is list and "special_string" in x])
[[b'print', 'print', b'(', 'print', b'[', b'x', 'x', b'for', 'x', 'x', 'x', b'x', 'x', b'in', 'x', b'gc', 'gc', b'.', b'get_objects', 'get_objects', b'(', b')', b'if', 'gc', b'type', 'type', b'(', b'x', 'x', b')', 'x', 'x', 'x', b'is', 'type', b'list', 'list', b'and', 'list', b'"special_string"', 'special_string', b'in', b'x', 'x', b']', 'x', b')', 'x', 'x', b'', 'print', 'print', 'print', 'print', 'print']]
```

#

oh this actually works, I just used an underscore instead of a space

#
>>> gc.disable()
>>> def inner():
...     e = Exception()
...     try:
...             raise e
...     except: pass
... 
>>> def outer():
...     x = ["special string"]
...     inner()
... 
>>> outer()
>>> print([x for x in gc.get_objects() if type(x) is list and len(x) == 1 and "special string" in x])
[['special string']]
#

I think there is a reference cycle between the exception and the frame locals for inner

#

and that's leaving the frame and locals for outer alive

#

putting finally: del e in inner fixes it

alpine rose
#

hmm i was doing something similar, but was doing print(gc.get_referrers(x)) in outer which ends up being empty

feral island
#

does that just not work if gc is disabled?

#

no that's not it

raven ridge
#

"I think there is a reference cycle between the exception and the frame locals for inner"
Right - the exception e has a reference to the most recent frame, which has a reference to that frame's locals, including e - so that's your reference cycle. And the reference to the most recent frame also holds a reference to the calling frame, so that's what's keeping x alive

#

So:
most recent frame -> locals() -> e -> most recent frame
most recent frame -> second most recent frame -> locals() -> x

feral island
#

Yes. And I think gc.get_referrers doesn't work at that point because the reference is still owned by the frame, which doesn't participate in GC because the interpreter knows how to dispose of the reference

#

But after the function returns, the locals survive in a frame object. If I do gc.get_referrers() on the list afterwards, I see a reference from a frame object

#
[<frame at 0x101b55010, file '<stdin>', line 3, code outer>, [['special string']]]
alpine rose
#

okay, thanks, the gc.get_referrers behaviour was what i didn't understand / confused me a little

#

i also don't know the answer to my original question: 1) is there anything i can do to resolve the cycle other than gc.collect(), 2) is there an easy change to redis that would remove the cycle? seems like i can't really weakref anything

feral island
#

the change to redis would be to do try: raise response finally: del response
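
A sketch of that fix in isolation, not the actual redis code: `Payload`, `inner_leaky`, `inner_fixed` and `outer` are made-up names. With the collector off, a weak reference to a local in the caller shows whether the caller's frame is being kept alive. How long the leaky variant hangs on to the object is a CPython implementation detail, so only the fixed variant's result is treated as reliable here.

```python
import gc
import weakref

class Payload:
    """Stand-in for a local that is costly to keep alive."""

def inner_leaky():
    e = Exception()
    try:
        raise e
    except Exception:
        pass  # e -> traceback -> this frame -> local e: a reference cycle

def inner_fixed():
    e = Exception()
    try:
        raise e
    except Exception:
        pass
    finally:
        del e  # the frame no longer refers to the exception, so no cycle

def outer(inner):
    x = Payload()
    ref = weakref.ref(x)
    inner()
    return ref  # x goes out of scope here; ref tells us if it was freed

gc.disable()
try:
    fixed_ref = outer(inner_fixed)
    fixed_freed = fixed_ref() is None             # freed by refcounting alone
    leaky_ref = outer(inner_leaky)
    leaky_alive_before_collect = leaky_ref() is not None
    gc.collect()                                  # the cycle needs the collector
    leaky_freed_after_collect = leaky_ref() is None
finally:
    gc.enable()

print(fixed_freed, leaky_alive_before_collect, leaky_freed_after_collect)
```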

#

possibly you can fix it in user code by finding the cycle objects by trawling through gc.get_referents and then mutating something so that the cycle goes away? that would be very fragile though

alpine rose
#

thank you!

cyan raven
dusk comet
#

```py
>>> from enum import Enum
>>> class X(Enum):
...   a = 1
...   b = 1.0
...
>>> X.a
<X.a: 1>
>>> X.b
<X.a: 1>
>>> X.a is X.b
True
```

is this the expected behaviour?
grave jolt
#

but I assume every item that compares equal to 1 will go as 1?..

#

but that's an uh

#

stretch

grave jolt
dusk comet
#

idk, i get why it is what it is
it is just a bit weird

#

and not explicitly documented

grave jolt
#

Maybe a bit implementation-specific, but there's a thing called _value2member_map_, which is a dictionary

dusk comet
#
>>> class X(IntEnum):
...  a = True
...  b = 1
...  c = 1.0
...
>>> X.a, X.b, X.c
(<X.a: 1>, <X.a: 1>, <X.a: 1>)
>>>
>>> class X(Enum):
...  a = True
...  b = 1
...  c = 1.0
...
>>> X.a, X.b, X.c
(<X.a: True>, <X.a: True>, <X.a: True>)
grave jolt
#

well that's a certified bruh moment

dusk comet
dusk comet
#

i think enums in python are overengineered

flat gazelle
#

yeah, there is no real way for those values to be an integer subtype, due to the way the inheritance works out (which is an interesting limitation of inheritance I think may be worth expanding on)

eager ocean
#

help me guys

dusk comet
radiant garden
#

Enum members can be arbitrary values

#

I think by default equal values are aliased too

#

but i agree that the standard enums are a bit odd
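
For anyone who wants the aliasing to be an error instead, the stdlib has `enum.unique`. A short sketch (class names `X` and `Y` are arbitrary) showing both the default aliasing and the decorator rejecting it:

```python
from enum import Enum, unique

class X(Enum):
    a = 1
    b = 1.0   # 1 == 1.0, so b becomes an alias of a

alias_is_same_member = X.b is X.a
canonical_members = list(X)        # aliases are skipped when iterating
all_names = list(X.__members__)    # but __members__ still lists every name

unique_error = None
try:
    @unique
    class Y(Enum):
        a = 1
        b = 1.0
except ValueError as exc:
    unique_error = exc             # @unique refuses classes with aliases
```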

dusk comet
#
>>> nan = float('nan')
>>>
>>> class X(Enum):
...   a = nan
...   b = nan
...
>>> X.a
<X.a: nan>
>>> X.b
<X.a: nan>
>>> X.a is X.b
True
>>>
>>> class X(Enum):
...   a = float('nan')
...   b = float('nan')
...
>>> X.a
<X.a: nan>
>>> X.b
<X.b: nan>
>>> X.a is X.b
False
#

there is a shortcut in most builtin collections: they first check for identity, and only then for equality
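
The identity-before-equality shortcut is easy to see outside of enums as well. A minimal sketch with a list and a dict:

```python
nan = float("nan")

nan_equals_itself = nan == nan                # NaN never compares equal, even to itself
same_object_found = nan in [nan]              # found anyway: identity is checked first
other_object_found = float("nan") in [nan]    # a different NaN object is not found

d = {nan: "first"}
d[nan] = "second"           # same object, so the existing key is reused
d[float("nan")] = "third"   # a different NaN object becomes a new key
```

That matches the two enum classes above: reusing one `nan` object gives an alias, while two separate `float('nan')` calls give two members.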

grave jolt
#

ah yes it probably uses a dict for the lookups

misty oxide
#

Does anybody have a favorite library to use for cpython bytecode/CodeType assembly?

dusk comet
#

dis + bytecode (from pypi)

#

!pypi bytecode

fallen slateBOT
misty oxide
#

I'm at a point where I have a list of valid dis.Instructions, but I need to populate co_codestring

dusk comet
#

you mean co_code? (which is a bytes array of opcodes)
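
A quick sketch of what that looks like from the stdlib side, assuming the goal is just to inspect a code object and build a modified copy. `greet` and `patched` are made-up names; `CodeType.replace()` takes the fields by their `co_*` attribute names, which sidesteps the constructor's positional-only parameters.

```python
import dis
import types

def greet():
    return "old"

code = greet.__code__
raw = code.co_code   # a bytes object holding the raw bytecode
opnames = [ins.opname for ins in dis.get_instructions(greet)]

# replace() returns a copy of the code object with selected fields swapped.
new_consts = tuple("new" if c == "old" else c for c in code.co_consts)
patched_code = code.replace(co_consts=new_consts)
patched = types.FunctionType(patched_code, greet.__globals__)
```

The exact opcodes in `opnames` differ between CPython versions, which is part of why a helper library is attractive for anything beyond small tweaks like this.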

misty oxide
#

Ah, cool.

#

I'm going to doublecheck, but I think it legitimately was co_codestring very briefly in 3.10.

#

But yeah

#

co_code

#

It may just be the CodeType constructor that calls it that.

#

Thank ya, looks like bytecode was what I was looking for.

dusk comet
#

even in python2

misty oxide
#

Intellisense is a dirty liar

#

(3.10)

dusk comet
#

hmm, it is indeed called __codestring in typeshed (the place where all typed signatures of stdlib live)

misty oxide
#

Wack

feral island
#

it's positional-only there though apparently

#

so have fun with that

misty oxide
#

Very fun

#

We already have a big series of if/elif statements on version, so not that big a deal.

#

Perfectly willing to fill up a screen or two.

dusk comet
#

there are other inconsistencies in naming and ordering in __init__ and replace

feral island
misty oxide
#

Are they supposed to match 1-1?

dusk comet
#

no, but it would be nice

merry bramble
swift imp
feral island
#

They change a lot from one version to another, which causes pain around releases and upgrades

grave jolt
#

they are... way too extensible, or whatever

swift imp
#

Gotcha. I think they're weird to be classes too. Like the way we define them, they're nothing like any other class def and it's difficult to tell what you can and cannot do

#

I also do not understand how enum.property works at all. Like this makes no sense to me

Note the property and the member must be defined in separate classes; for example, the value and name attributes are defined in the Enum class, and Enum subclasses can define members with the names value and name.
grave jolt
#

This might sound crazy, but with typing being so popular, we could just use strings or whatever

#

Like
```py
ColorChannel = Literal["red", "green", "blue"]
```

#

Well, one thing it doesn't let you do is iterate over the members

#

actually

#

!e

from typing import get_args, Literal
ColorChannel = Literal["red", "green", "blue"]
print(get_args(ColorChannel))
fallen slateBOT
#

@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.

('red', 'green', 'blue')
grave jolt
#

boom

#

And if you want flags, you don't want flags, use a set instead

feral island
#

maybe you're doing it wrong

grave jolt
#

I- ok that's strange

#

Well that page enumerates so many use cases

grave jolt
feral island
#

then to define your actual enum, you inherit from that abstract class and then you can have members with the same name

dusk comet
#

enums arent powerful enough
they cant represent values with two orthogonal properties
for example, my things are either red or green, and they can be round or rectangular
there is no way to represent that as enum conveniently

feral island
#

!e

import enum
class Abs(enum.Enum):
    @property
    def prop(self):
        return 42
class E(Abs):
    prop = 3
print(E.prop.prop)

grave jolt
#

which might be enums

fallen slateBOT
#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

42
feral island
#

wait I didn't even use enum.property

grave jolt
#

lmao

#

||bottom text||

#

Maybe eventually there will be enum2, which will be an adaptation of Rust enums. Which will also cover the trivial case

#

a.k.a. algebraic data type, union type, sum type, discriminated union, tagged union, variant record, sealed traits, disjoint union, variant, choice type, coproduct, disjoint coproduct, tagged variant, product dual, tagged product dual, discriminated coproduct, or intuitionistic logical disjunction under the Curry–Howard correspondence

dusk comet
# grave jolt a pair of `Color` and `Shape`?..

my use-case was a bit more complex
i had 5 file types:

```
  1LangCache
2Lang  2Cache
HDLang HDCache
```

rows represent one property, columns - another
`1LangCache` has both column properties
i wanted to represent these 5 values as an enum (or enum flag, i dont remember) in such a way that i would be able to check if a value has the 1st or 2nd property, but was unable to do that
#

i wanted API that looks like this:
```py
e = MyEnum(...)
e.is_lang()
e.is_cache()
e.is_1()
e.is_2()
e.is_hd()
```

grave jolt
dusk comet
grave jolt
dusk comet
grave jolt
#

Well, I can't do this without trickery
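
One possible way to get close to that API without metaclass tricks, as a sketch only: give each member a tuple value made of two smaller enums and unpack it in `__init__`. The names `Kind`, `Tier` and `FileType` are invented for this example, and the five members are a guess at the layout described above.

```python
from enum import Enum, Flag, auto

class Kind(Flag):
    LANG = auto()
    CACHE = auto()

class Tier(Enum):
    ONE = "1"
    TWO = "2"
    HD = "HD"

class FileType(Enum):
    # Each value is (row property, column property); the first member has both kinds.
    ONE_LANG_CACHE = (Tier.ONE, Kind.LANG | Kind.CACHE)
    TWO_LANG = (Tier.TWO, Kind.LANG)
    TWO_CACHE = (Tier.TWO, Kind.CACHE)
    HD_LANG = (Tier.HD, Kind.LANG)
    HD_CACHE = (Tier.HD, Kind.CACHE)

    def __init__(self, tier, kind):
        # Enum unpacks a tuple value into the __init__ arguments.
        self.tier = tier
        self.kind = kind

    def is_lang(self):
        return Kind.LANG in self.kind

    def is_cache(self):
        return Kind.CACHE in self.kind

    def is_1(self):
        return self.tier is Tier.ONE

    def is_2(self):
        return self.tier is Tier.TWO

    def is_hd(self):
        return self.tier is Tier.HD
```

It is still five hand-written members rather than a true product of two enums, so it only helps when the set of valid combinations is small and fixed.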

dusk comet
grave jolt
#

and I suck at understanding thoughts in english

#

If you have two orthogonal properties, why can't you like, unite them into a tuple?

dusk comet
#

i cant do (Color.RED & Color.GREEN, Shape.ROUND)

grave jolt
#

({"red", "green"}, {"round"})

dusk comet
#

then it is not enum!

#

so enums suck

radiant garden
#

you want something that fits a pattern except for special cases and want it to be elegant to implement?

dusk comet
grave jolt
#

({Color.RED, Color.GREEN}, {Shape.ROUND})

grave jolt
grave jolt
#

🙂

radiant garden
#

i find myself just refactoring that away when possible shrug

grave jolt
#

I just add ifs

#

makes my job veri secure

#

Like that file with a Cyrillic с in the name instead of a Latin c.

#

True story.

#

Someone probably did break-dancing on their keyboard and accidentally inserted the wrong c into a file.

dusk comet
grave jolt
dusk comet
grave jolt
#

but of course did not fix the import

#

and so when someone tried to build the code on the new system, there was a very nasty error message like No module named the_foo_and_с_things

#

It was there, just with a latin c

#

Now that's where VSCode's highlighting of suspicious characters would help. But everyone who has ever opened mixed cyrillic/latin text in VSCode just disables it

#

anyway...

dusk comet
#

oh, that is indeed hard to do without magic

grave jolt
#

...where each variant could hold some data, like
```rs
enum Color {
    Rgb { red: u8, green: u8, blue: u8 },
    Rgba { red: u8, green: u8, blue: u8, alpha: u8 },
    Variable { name: String },
}
```

#

I usually just make a union of dataclasses, but it's kinda verbose and might be a bit WTF-ish to the reader

feral island
#

ADTs are nice. I'm not sure they should be the same concept as enums though

#

I mostly use enums for things I want to make sure can go in an integer column in a database

grave jolt
#

Yeah, I think Rust has some unfortunate naming here

#

what I meant was, if we had such a feature, it would technically also cover the trivial case of
```rs
enum Color { Red, Green, Blue }
```

radiant garden
#

data goosedance

grave jolt
dusk comet
# grave jolt ...where each variant could hold some data, like ```rs enum Color { Rgb { re...

how would you store values inside enum members?
i can think of this:
```py
class Color(...):
    rgb: tuple[int, int, int]
    rgba: tuple[int, int, int, int]

c = Color(...)

# how to check if it is a rgb kind?
c.kind == Color.rgb

# how to get value from it?
c.value == (1,2,3) # ?

class IPAddrKind(...):
    V4, V6

k = IPAddrKind(...)
if k.kind == IPAddrKind.V4:
    print('v4 detected')
k.value # what is this? None? AttributeError?
```

#

i think i dont quite understand rust enums right now

grave jolt
#

something like
```py
class Color(ADT):
    @case
    def rgb(r: int, g: int, b: int): ...

    @case
    def rgba(r: int, g: int, b: int, alpha: int): ...

    @case
    def variable(name: str): ...

color1 = Color.rgb(255, 0, 255)
color2 = Color.variable("foo")
```

#

though this is still pretty verbose, idk

dusk comet
#

can rust "enums" have methods?
if no, you can omit @case

grave jolt
#
class Color(ADT):
    class Rgb(Case):
        r: int
        g: int
        b: int

    class Rgba(Case):
        r: int
        g: int
        b: int
        alpha: int

    class Var(Case):
        name: str
grave jolt
grave jolt
dusk comet
#

how would this work at runtime?

#
  • Color.Rgb(1,2,3) - what is this object? is it an instance of Color or Rgb? Or both?
  • how to check if my thing is of kind Color.Rgb?
#
class IPAddrKind(ADT):
    class V4(Case): pass
    class V6(Case): pass


grave jolt
# dusk comet how would this work at runtime?

Well, it almost works now if you remove the inheritance. Only minor touches are needed, like:

  • make it impossible to inherit from Color
  • swap the bare classes so that Rgb, Rgba and Var all inherit from Color
  • dataclassify the classes
#

So it would desugar to something like
```py
class Color:
    pass

@dataclass(frozen=True)
class __Rgb(Color):
    r: int
    g: int
    b: int

@dataclass(frozen=True)
class __Rgba(Color):
    r: int
    g: int
    b: int
    alpha: int

@dataclass(frozen=True)
class __Var(Color):
    name: str

Color.Rgb = __Rgb
Color.Rgba = __Rgba
Color.Var = __Var
prohibit_further_subclasses(Color)
```

dusk comet
grave jolt
#

well, that's bikeshedding

dusk comet
#

prohibit_further_subclasses is doable by patching cls.__flags__ (there is some flag for final classes, that is why you cant subclass NoneType or bool)
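
A less invasive sketch of the same idea that stays in pure Python: instead of touching type flags, the base class rejects new subclasses from `__init_subclass__` once a flag is set. The `_sealed` attribute is a made-up name, and this is only one way the hypothetical `prohibit_further_subclasses` could behave.

```python
from dataclasses import dataclass

class Color:
    _sealed = False  # hypothetical flag, flipped once all cases are defined

    def __init_subclass__(cls, **kwargs):
        if Color._sealed:
            raise TypeError("Color cannot be subclassed further")
        super().__init_subclass__(**kwargs)

@dataclass(frozen=True)
class Rgb(Color):
    r: int
    g: int
    b: int

@dataclass(frozen=True)
class Var(Color):
    name: str

Color.Rgb = Rgb
Color.Var = Var
Color._sealed = True  # from here on, new subclasses are rejected

sealed_error = None
try:
    class Extra(Color):
        pass
except TypeError as exc:
    sealed_error = str(exc)
```

Unlike a real "final" type flag, this is only a convention enforced at class-creation time; anyone can flip `_sealed` back.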

swift imp
grave jolt
#

This still requires some changes to type checkers so that they understand exhaustiveness of matching a Color against all 3 cases. I have no idea how easy or hard it is to implement

#

and the further question is: do we really need this? are ADTs common in Python code? wouldn't a union of dataclasses work just as well?

#

Though asking if ADTs are common in Python when the only way to have them is my butt-backwards union is perhaps not very fair. It's like asking why the city should build a bike path if nobody cycles on the highway

#

but maybe it's not that backwards

#

It is composing already existing constructs, and it's not totally clear what a new "official" construct would add

feral island
fallen slateBOT
#

taxonomy/db/models/name.py line 2278

```py
class NameTag(adt.ADT):
```
feral island
#

it does work but is um not type-checker friendly

grave jolt
#

ah I did something similar

#

i don't remember why but it required meta-metaclasses

#

it was many years ago...

#

oh no, I am old

#

😦

swift imp
#

I'm reading PEP 3115 again, and I noticed this bit. Does it mean you can use __prepare__ as an instance method and have different effects?

The __prepare__ method will most often be implemented as a class method rather than an instance method because it is called before the metaclass instance (i.e. the class itself) is created.
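
To make the quoted passage concrete, here is a small sketch of the usual classmethod form. `__prepare__` runs before the class body executes, so there is no class object yet to act as an instance. The names `Recorder`, `Meta` and `_definition_order` are invented for the example.

```python
class Recorder(dict):
    """Namespace that remembers the order in which names are assigned."""

    def __init__(self):
        super().__init__()
        self.order = []

    def __setitem__(self, key, value):
        self.order.append(key)
        super().__setitem__(key, value)

class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Called before the class body runs; the return value is the
        # namespace the body is executed in.
        return Recorder()

    def __new__(mcls, name, bases, ns, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(ns), **kwargs)
        # Skip the dunder names the compiler adds on its own.
        cls._definition_order = [k for k in ns.order if not k.startswith("__")]
        return cls

class Example(metaclass=Meta):
    a = 1

    def b(self):
        return self.a

    c = 3
```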
merry bramble
#

Colour me intrigued

grave jolt
#

the secret dies with me

cyan raven
#

is it a bad idea to establish a new attribute for a task(asyncio) which describes the cancelled message? At the moment this is what I have found but this is strange and probably doing some unexpected stuff under the hood.

task._cancel_message  # users shouldnt use it

Basically, this attribute would have the same behaviour.

 except asyncio.CancelledError as e:
     print(e.args[0])
cyan raven
#

So we could implement a public attribute, something like this: -> task.cancel_message
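
For context, a sketch of what is already reachable through public API on Python 3.9+: the `msg` argument of `Task.cancel()` ends up in the `args` of the `CancelledError` raised inside the task. The coroutine names and the message text are arbitrary.

```python
import asyncio

captured = []

async def worker():
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError as e:
        captured.append(e.args)   # the cancel message, if one was given
        raise                     # always re-raise cancellation

async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)        # let the worker start and reach its await
    task.cancel(msg="shutting down")
    try:
        await task
    except asyncio.CancelledError:
        pass

asyncio.run(main())
print(captured)
```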

grave jolt
#

My personal issue is that if this becomes default, it could have unintended side effects

#

Suppose you have a web application that logs every traceback. There might be some information you really do not want to log, like a credit card number together with its holder name.

misty oxide
#

How can I unbind a cellvar?

#
def make_cell() -> CellType:
    unbound: None # Unbound cellvar.
    return (lambda: unbound).__closure__[0]

cell = make_cell()
cell.cell_contents = 5
del cell.cell_contents # Is this sufficient?
misty oxide
#

Also, how can I set fn.__closure__ to a new tuple?
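
A sketch touching both questions, using `types.CellType` (available since Python 3.8). `__closure__` itself is read-only, so the usual workaround is to build a new function object around the same code with a different closure tuple; `make_adder` and friends are illustrative names.

```python
import types

# An empty cell: reading its contents raises ValueError.
cell = types.CellType()
cell.cell_contents = 5
del cell.cell_contents            # unbinds the cell again

try:
    cell.cell_contents
    cell_is_empty = False
except ValueError:
    cell_is_empty = True

def make_adder(n):
    def add(x):
        return x + n
    return add

add5 = make_adder(5)

# Same code object, new closure tuple: one cell per free variable.
add10 = types.FunctionType(
    add5.__code__,
    add5.__globals__,
    add5.__name__,
    add5.__defaults__,
    (types.CellType(10),),
)
```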

quiet crane
#

Is it possible to use threads/multiprocessing in python? I'm surprised and disappointed of the poor quality assurance. Is all of python like this?

https://github.com/python/cpython/issues/105829#issuecomment-1714593169

GitHub

Bug report Submitting many tasks to a concurrent.futures.ProcessPoolExecutor pool deadlocks with all three start methods. When running the same example with multiprocessing.pool.Pool we have NOT be...

hybrid relic
#

heard that python's getting a JIT compiler in 3.13, has it landed yet? Looking up optimizations on cpython on google hasn't yielded much other than the initial announcement about the compiler

rose schooner
jade raven
#

yes it is

#

"I'm surprised and disappointed of the poor quality assurance. Is all of python like this?"
bugs are often fixed when people report them, so if people don't report them or the issue isn't noticed by the core devs, then it might not be fixed promptly. it's the same in any other OSS language

paper echo
paper echo
#

consider paying a consultant to fix the bug for you

formal wyvern
#

wsg

#

im having a problem in the most basic thing

#

i dont want to take help from the forms

#

can anyone of yall help me?

#

dm me if you can help

grave jolt
formal wyvern
grave jolt
formal wyvern
#

It's a dumb problem

grave jolt
# formal wyvern It's a dumb problem

Help posts are fine for any kind of question. You will get help much quicker if you open a help post. Very few people are willing to help via DMs, especially when you haven't described your problem

formal wyvern
#

Sure

unkempt rock
#

Bros the PyPI trusted publishers docs have this link to octo-org/sample-project on GitHub which doesn't exist and I'm crying

unkempt rock
#

I'm not quite sure if pypi warehouse wants the GitHub issue because it's not a pypi bug.. I got legacy upload for me personally just now

mild moss
unkempt rock
#

Ah I see, that's too bad the release.yml thing seemed cool

clear kindle
#

is there a clean way to use logging lib to output to a different file every log? basically RotatingFileHandler but not based on time or size

dusk comet
#

You can reimplement RotatingFileHandler with small changes
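
A rough sketch of one way to do that, assuming "a different file every log" means one file per record. Rather than adapting the rotating handler, it subclasses plain `logging.Handler`; the class name `PerRecordFileHandler` and the file naming scheme are made up.

```python
import itertools
import logging
import tempfile
from pathlib import Path

class PerRecordFileHandler(logging.Handler):
    """Hypothetical handler that writes every record to its own file."""

    def __init__(self, directory, prefix="log"):
        super().__init__()
        self.directory = Path(directory)
        self.prefix = prefix
        self._counter = itertools.count(1)

    def emit(self, record):
        try:
            path = self.directory / f"{self.prefix}-{next(self._counter):06d}.txt"
            path.write_text(self.format(record) + "\n", encoding="utf-8")
        except Exception:
            self.handleError(record)   # standard logging error path

# Small demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    logger = logging.getLogger("per_record_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = PerRecordFileHandler(tmp)
    logger.addHandler(handler)
    logger.info("first")
    logger.info("second")
    written = sorted(p.name for p in Path(tmp).iterdir())
    logger.removeHandler(handler)
```

The counter resets on every run, so a real version would probably want timestamps or some other unique component in the file name.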

quiet crane
quiet crane
raven ridge
# quiet crane there was no stress test at all 😳

it has tens of thousands of users who use it in production, and no one else reported the bug you encountered in the many years since it was introduced. I think you overestimate how common the conditions required to trigger the bug are. If this is your first time finding a bug in a language's standard library, welcome to the club! All software has bugs, and I think it's quite sad that your attitude here is "how could a bug make it into the standard library?", rather than focusing on the good things - that there's an immediate workaround (using multiprocessing.pool instead), that there's an easy patch to apply (just affecting Python code, which you can patch without needing to recompile anything), that there's no security implications, etc, etc

#

You're acting as though this is a major issue, when in fact it's quite minor, as far as issues in a language's internals go. At least there's no CVE attached, heh

#

also, if you're disappointed by the level of testing that some particular module receives, note that the test suite is open source and you're welcome to contribute enhancements. Trying to assign blame to people who worked on this module in the past seems much less productive than trying to improve its quality going forward.

#

Though of course, note that race conditions are by definition non-deterministic, and are just about the hardest type of bug for a test suite to detect.

quiet crane
#

I don't mean to put the blame on a person. I'm just bummed I got bit by a bug and needed to vent my frustrations.

quiet crane
raven ridge
quiet crane
#

Yes, I'll be more mindful of my approach and tone. Sorry 🙏

hybrid relic
little bloom
#

Hey Everyone
I'm starting a course on udemy named 'The complete 2023 web development boot camp'.
I am searching for a buddy to join me on the course.
Let's learn together and help each other.
Please drop me a message if an anybody's interested. Thank you!

jade raven
hybrid relic
#

In the JVM ... and other adaptive VMs, switching between tiers can be expensive

#

surprisingly accurate haha

dusk comet
#

Where are all CPython branches? I dont see them on github, i see only branches for major versions...

rose schooner
dusk comet
#

Im pretty sure in the past there were a lot of different branches

#

Maybe im wrong

steel solstice
#

They could've just been deleted

burnt rose
#

hey

steel solstice
#

I do remember there being a few branches that were feature related

burnt rose
#

is there someone i could talk?

rose schooner
#

they're still there but they don't show up for some reason

#

oh

#

i think they're tags now

worthy salmon
#

hey

#

i am learning python but sometime can't do a simple problem if it is new

#

can you please help me

#

and how to make coding notes, if anyone knows please give me suggestions

cyan raven
#

are there any peps about __call__ and __new__?

spark magnet
cyan raven
merry bramble
# cyan raven well, I just want to learn a bunch of low-level stuff about them.

The data model docs are often quite good for this kind of thing. They don't have much on __call__, but they have information on __new__ and on metaclasses:

merry bramble
#

Often you can learn a lot just by playing around with code, as well. Here's a little script to illustrate the order in which various methods are called when classes are created and called:

class Meta(type):
    def __new__(mcls, name, *args, **kwargs):
        print(f'{name}: entering metaclass __new__')
        cls = super().__new__(mcls, name, *args, **kwargs)
        print(f'{name}: exiting metaclass __new__')
        return cls

    def __init__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering metaclass __init__')
        super().__init__(*args, **kwargs)
        print(f'{cls.__name__}: exiting metaclass __init__')

    def __call__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering metaclass __call__')
        new = super().__call__(*args, **kwargs)
        print(f'{cls.__name__}: exiting metaclass __call__')
        return new


class Klass(metaclass=Meta):
    def __init_subclass__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering class __init_subclass__')
        super().__init_subclass__(*args, **kwargs)
        print(f'{cls.__name__}: exiting class __init_subclass__')

    def __new__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering class __new__')
        obj = super().__new__(cls, *args, **kwargs)
        print(f'{cls.__name__}: exiting class __new__')
        return obj

    def __init__(self, *args, **kwargs):
        print(f'{self.__class__.__name__}: entering class __init__')
        super().__init__(*args, **kwargs)
        print(f'{self.__class__.__name__}: exiting class __init__')

    def __call__(self, *args, **kwargs):
        print(f'{self.__class__.__name__}: entering class __call__')
        print(f'{self.__class__.__name__}: exiting class __call__')
        return 42


class SubKlass(Klass): pass

obj = Klass()
obj()
#

idk how to do the bot command thing but here's the output:

Klass: entering metaclass __new__
Klass: exiting metaclass __new__
Klass: entering metaclass __init__
Klass: exiting metaclass __init__
SubKlass: entering metaclass __new__
SubKlass: entering class __init_subclass__
SubKlass: exiting class __init_subclass__
SubKlass: exiting metaclass __new__
SubKlass: entering metaclass __init__
SubKlass: exiting metaclass __init__
Klass: entering metaclass __call__
Klass: entering class __new__
Klass: exiting class __new__
Klass: entering class __init__
Klass: exiting class __init__
Klass: exiting metaclass __call__
Klass: entering class __call__
Klass: exiting class __call__
cyan raven
#

!e

class Meta(type):
    def __new__(mcls, name, *args, **kwargs):
        print(f'{name}: entering metaclass __new__')
        cls = super().__new__(mcls, name, *args, **kwargs)
        print(f'{name}: exiting metaclass __new__')
        return cls

    def __init__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering metaclass __init__')
        super().__init__(*args, **kwargs)
        print(f'{cls.__name__}: exiting metaclass __init__')

    def __call__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering metaclass __call__')
        new = super().__call__(*args, **kwargs)
        print(f'{cls.__name__}: exiting metaclass __call__')
        return new


class Klass(metaclass=Meta):
    def __init_subclass__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering class __init_subclass__')
        super().__init_subclass__(*args, **kwargs)
        print(f'{cls.__name__}: exiting class __init_subclass__')

    def __new__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering class __new__')
        obj = super().__new__(cls, *args, **kwargs)
        print(f'{cls.__name__}: exiting class __new__')
        return obj

    def __init__(self, *args, **kwargs):
        print(f'{self.__class__.__name__}: entering class __init__')
        super().__init__(*args, **kwargs)
        print(f'{self.__class__.__name__}: exiting class __init__')

    def __call__(self, *args, **kwargs):
        print(f'{self.__class__.__name__}: entering class __call__')
        print(f'{self.__class__.__name__}: exiting class __call__')
        return 42


class SubKlass(Klass): pass

obj = Klass()
obj()
fallen slateBOT
#

@cyan raven :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | Klass: entering metaclass __new__
002 | Klass: exiting metaclass __new__
003 | Klass: entering metaclass __init__
004 | Klass: exiting metaclass __init__
005 | SubKlass: entering metaclass __new__
006 | SubKlass: entering class __init_subclass__
007 | SubKlass: exiting class __init_subclass__
008 | SubKlass: exiting metaclass __new__
009 | SubKlass: entering metaclass __init__
010 | SubKlass: exiting metaclass __init__
011 | Klass: entering metaclass __call__
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/RMNJCU4Y64IGDAQTEAKUBR2THE

worthy salmon
#

can we make

cyan raven
feral island
#

I have never had to touch that file myself though

#

Mark Shannon described async generators as a Jenga tower of state machines

flat gazelle
#

That does check out

naive saddle
#

aren't coroutines mostly just a generator under the hood though? with extra state to make the "generator" awaitable

#

or is that assumption outdated nowadays?

flat gazelle
#

If you stare at the file enough it will start to make sense, C or no C knowledge

#

A coroutine is internally very similar to a generator, though they differ in one field each IIRC.

#

Most C functions that work on generators also work on coroutines
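
The family resemblance is visible from Python too: a coroutine object can be driven by hand with `send()`, just like a generator, and its return value arrives in `StopIteration`. A minimal sketch (no event loop involved; `yield_value` and `coro` are made-up names):

```python
import types

@types.coroutine
def yield_value(v):
    # A generator-based awaitable: hands v to whoever drives the coroutine
    # and returns whatever is sent back in.
    got = yield v
    return got

async def coro():
    a = await yield_value("first")
    return a * 2

c = coro()
first_yield = c.send(None)    # runs until the inner yield, like next() on a generator

try:
    c.send(21)                # resumes; the coroutine then runs to completion
    result = None
except StopIteration as stop:
    result = stop.value       # the coroutine's return value
```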

random thistle
# naive saddle aren't coroutines mostly just a generator under the hood though? with extra stat...

I would say, generators were initially introduced as a cut-down form of coroutine. The asyncio module was first introduced in 3.4, and implemented in what I thought was a really horrible way, entirely based on generators. Thankfully, the language designers realized that there was a need for proper coroutines, and so async/await was added in 3.5, and asyncio reworked to use it.

I did this diagram I call “Van Rossum’s Triangle” https://www.deviantart.com/default-cube/art/Van-Rossum-s-Triangle-679791228 which tries to illustrate the ways that control gets transferred between generators, coroutines and regular “mainline” code.

wary fern
#

Hey folks, is there a way to mark a test as 'not multiprocess safe' when run via test.regrtest?

jaunty steeple
#

A function to kill a thread would be nice, similar to multiprocessing.Process.terminate()

urban sandal
#

generally speaking, you really really don't want to be trying to kill threads. design your threads such that you can signal to them to end what they are doing if you need that.

#

and that isn't python specific

verbal escarp
#

however there's the issue of applying numba to speed those up, so now i was wondering how to keep the code in an un-executed state until they are officially initialized

#

i was contemplating .py modules but only compile them to ast, then apply numba to that instead of importing things directly

#

keeping the inner functions as strings is a big no-no because that would negate any help from IDEs etc

#

so, any idea how to have python code parseable but not immediately executed at runtime?

jaunty steeple
paper echo
#

dealing with graceful shutdown can get hard very quickly in even trivial situations

urban sandal
# paper echo tbh i think this is a bit of FUD, *most* programs don't do things with external ...

I'd rather give someone generally true advice that errs on the side of them doing it correctly while steering them to the right means of doing it, than add something that's more likely than not a footgun for someone else down the line. This isn't a new problem, it isn't python specific, it's a question that comes up somewhat regularly across languages from people who usually don't have the experience with os threads and concurrency to know when it is or isn't "safe".

spark magnet
#

the reason you can't kill threads isn't because of external resources. It's because they could be holding locks.

maiden dune
#

qt's QThreads have a forcible terminate so if you absolutely need killable threads those are an option via pyqt/pyside

raven ridge
#

if you terminate a QThread while it holds the GIL, no other thread will ever be able to acquire the GIL, and you'll deadlock your Python process. That sounds super unwise to me.

maiden dune
#

yeah, that is an issue. i guess one way to handle that could be to make sure the termination is run later in the event loop, like with a QTimer.singleShot. so e.g. if you need to terminate a QThread within itself QTimer.singleShot(0, self.terminate). would that be ok?

#

could there actually even be any case where a QThread is terminated from the outside while holding the GIL?

#

by the time you reach the .terminate call in another thread, the GIL would have already been handed over, and it wouldnt be released again until the terminate call is completed, right?

raven ridge
maiden dune
# raven ridge you'd need to read the code of pyside or pyqt to be sure. It wouldn't surprise m...

based on sip file for QThread, terminate wont drop the GIL:

public slots:
    void start(QThread::Priority priority = QThread::InheritPriority) /ReleaseGIL/;
    void terminate();
    void quit();

public:
    bool wait(unsigned long msecs = ULONG_MAX) /ReleaseGIL/;

same for pyside's shiboken xml, where these are the only methods with allow-thread=yes modifier, which according to here means the call gets wrapped in a Py_BEGIN/END_ALLOW_THREADS:

  <object-type name="QThread">
    <enum-type name="Priority"/>
    <modify-function signature="run()" thread="yes" />
    <modify-function signature="exec()" rename="exec_" allow-thread="yes" />
    <modify-function signature="msleep(unsigned long)" allow-thread="yes" />
    <modify-function signature="sleep(unsigned long)" allow-thread="yes" />
    <modify-function signature="usleep(unsigned long)" allow-thread="yes" />
    <modify-function signature="wait(unsigned long)" allow-thread="yes" />
    <modify-function signature="start(QThread::Priority)" allow-thread="yes">
      <modify-argument index="1">
        <rename to="priority"/>
      </modify-argument>
    </modify-function>
    <modify-function signature="exit(int)" allow-thread="yes" />
  </object-type>

so tl;dr is: seems like both in pyside and pyqt, QThread.terminate from the outside wont cause a deadlock since the GIL wont be handed back to the QThread before its termination is complete.

raven ridge
#

Well, it won't cause a deadlock on the GIL, at least. It could still cause a deadlock on a different mutex or semaphore, though

maiden dune
#

sure, but you specifically highlighted a deadlock involving the GIL, and that was what i was referring to

maiden dune
#

actually, circling back around to the original idea of adding a terminate to threading. how about a soft interruption instead? something that just sets a queryable flag on the thread. would reduce the rigamarole of setting up a graceful exit. something like

import time
from threading import interruption_requested, Thread  # interruption_requested is the proposed API, it does not exist today

def loop():
  while not interruption_requested():
    time.sleep(1)
  print('thread interruption requested!')

thread = Thread(target=loop)
thread.start()
time.sleep(10)
thread.interrupt()

and maybe it could automatically be set for any child threads after things like KeyboardInterrupt/SIGINT, SIGHUP, sys.exit, etc.

quick snow
peak spoke
#

That is somewhat funky because the thread can just be stuck in C code with no way of receiving the exception in a reasonable time if it's not made to expect it

quick snow
#

I haven't tested this in the real world, of course, but I think it should just raise at the next possible Python moment.

peak spoke
#

yeah, just that it could be far away depending on what it's doing. I also had some weird issue where I had delayed exceptions with pyside because some python code didn't check for exceptions

static hinge
fallen slateBOT
quick snow
dusk comet
#

Cant you do the same in Err.__del__?
Doing this by spawning threads feels weird

quick snow
unkempt rock
#

Result > *, error as value

urban sandal
#

I really think this comes down to teaching people to design concurrent code to handle gracefully shutting down and error handling, rather than trying to come up with a "one-size can't quite fit all" solution.

flat gazelle
#

I do think there is value in having some better abstraction for thread exits than "if should_terminate:stop" in a loop every once in a while. But I am not really aware of one

urban sandal
#

when the best possible abstraction is worse than not abstracting it, it probably shouldn't be abstracted.

daemon threads already close with the interpreter, because there you don't care if the (interpreter's) locks remain held

for 1-shot things in a thread, like moving blocking fileio to a thread in an async program, you probably just want to let the fileio finish most of the time

for long lived background threads, you probably should have a work queue or some other means for the background thread to communicate, and this becomes a viable means to also send "okay finish up" and handle that as appropriate to your application, without looping on that.
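
A minimal sketch of the work-queue approach described above. The `worker` function and the `STOP` sentinel are hypothetical names, not something from the discussion. The blocking `queue.get()` doubles as the wait for the "okay finish up" message, so there is no separate flag to poll:

```python
import queue
import threading

STOP = object()  # hypothetical sentinel meaning "okay, finish up"

def worker(jobs: queue.Queue, results: list) -> None:
    while True:
        item = jobs.get()          # blocks until work or the sentinel arrives
        if item is STOP:
            break                  # leave the loop and fall through to cleanup
        results.append(item * 2)   # placeholder for real work
    results.append("cleaned up")   # placeholder for releasing resources

jobs: queue.Queue = queue.Queue()
results: list = []
thread = threading.Thread(target=worker, args=(jobs, results))
thread.start()
for n in (1, 2, 3):
    jobs.put(n)
jobs.put(STOP)                     # request shutdown through the same channel
thread.join()
```

Because the sentinel is queued behind the real work, everything submitted before it is still processed before the thread exits.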

maiden dune
# urban sandal This would be significantly worse in many real world code bases, and definitely ...

> This would be significantly worse in many real world code bases, and definitely shouldn't be done automatically.

Could you elaborate on why that would be the case? With this approach, a thread would be free to completely ignore and never use the interruption flag. Having it be set automatically wouldn't change anything about existing code, while making it easier to cover some commonly encountered exit conditions for code that wants it.

> If you have a thread that's handling a queue, just send a special value for the queue to finish work, and then you also allow gracefully closing the queue and aren't constantly busy looping for "maybe they want that cancelled"

Not all threads are handling queues.

> I really think this comes down to teaching people to design concurrent code to handle gracefully shutting down and error handling, rather than trying to come up with a "one-size can't quite fit all" solution.

I agree with promoting better code design, but how would that be achieved without teaching the use of some kind of signalling mechanism? Whether dequeuing a sentinel, or polling a flag, the goal is to provide some update to the thread about the wider program state, right? They're variations of the same thing. The threading.Event case in particular is a fairly common approach, and often seen as a solution to many questions about graceful shutdowns. This would be a more convenient version of that.
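
For reference, the `threading.Event` pattern mentioned here can be sketched roughly as follows. The `loop` function and the `log` list are made up for illustration. Using `Event.wait(timeout)` in place of `time.sleep` lets the thread wake up as soon as the flag is set:

```python
import threading

def loop(stop: threading.Event, log: list) -> None:
    while not stop.is_set():
        log.append("tick")     # placeholder for one unit of work
        stop.wait(0.01)        # sleeps, but returns early once stop is set
    log.append("clean exit")   # placeholder for cleanup

stop = threading.Event()
log: list = []
thread = threading.Thread(target=loop, args=(stop, log))
thread.start()
stop.set()                     # request a graceful shutdown
thread.join()
```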

maiden dune
# urban sandal when the best possible abstraction is worse than not abstracting it, it probably...

> when the best possible abstraction is worse than not abstracting it, it probably shouldn't be abstracted

An explanation for why it's worse would be more helpful.

> for long lived background threads, you probably should have a work queue or some other means for the background thread to communicate, and this becomes a viable means to also send "okay finish up" and handle that as appropriate to your application, without looping on that.

Unless, again, the thread isn't doing anything queue oriented, in which case forcing a queue into the mix would result in the same looped polling behaviour but now with even more overhead and complexity. And wouldn't any other means, short of injecting exceptions into the thread, also reduce to the same thing?

maiden dune
maiden dune
# quick snow What would be interesting as a "soft" way to stop a thread is `Thread.throw`. Ri...

that's cool, i like the idea of interruptions via injecting exceptions, though only if there's some explicit control over exactly where that could occur, like being able to catch a asyncio.CancelledError on an await in coroutines, or with generator.throw on a yield. maybe with a context manager? e.g.

from threading import allow_interruptions, InterruptionError

def loop():
  # do some stuff to set up...
  # so far, the code out here is guaranteed to not be interrupted by any injected exception

  while True:
    try:
      with allow_interruptions:
        # but the code inside here can be interrupted

        # do some stuff that blocks...
        info = queue.get()
    except InterruptionError:
      break
    else:
      # do uninterruptable stuff with dequeued info...

  # do clean up stuff
      
thread = Thread(target=loop)
thread.start()

# then later on:
thread.throw(CustomException) # raise a custom exception at the next interruption point

# or
thread.interrupt() # now equivalent to thread.throw(InterruptionError)

thread.join()
urban sandal
# maiden dune > This would be significantly worse in many real world code bases, and definitel...

you're adding a "way to do things" that encourages a specific way as the way to do it which leads to looping on the "should I stop" rather than it being driven by being told to stop. API design encourages code design. Sometimes, checking something like that might be the only way, but I would rather not encourage the worst way to check this with API design, there are already enough pitfalls in concurrency for API design to lead someone to thinking this is a good way to do it.

#

This would be better handled by a section in threading docs showing basic ways to handle thread shutdown if it's that common, and then letting people pick the one that fits what matches their needs best. (and building more on it from there based on their needs)

urban sandal
# maiden dune > when the best possible abstraction is worse than not abstracting it, it probab...

if the thread has no need to communicate already, it probably shouldn't be terminated abruptly as it has no way to communicate anything about the cancellation back. Unlike asyncio, which has builtin ways to still handle this in done_callback (if you're at low level cancellation), there's no equivalent in threading without already needing a means of communicating. Event driven code performs significantly better in concurrent systems than code that busy loops or polls, and the latter should be avoided when possible (yes, it isn't always possible)

maiden dune
# urban sandal you're adding a "way to do things" that encourages a specific way as the way to ...

> you're adding a "way to do things" that encourages a specific way as the way to do it which leads to looping on the "should I stop" rather than it being driven by being told to stop. API design encourages code design. Sometimes, checking something like that might be the only way, but I would rather not encourage the worst way to check this with API design, there are already enough pitfalls in concurrency for API design to lead someone to thinking this is a good way to do it.

this argument could be made for literally any feature. by this logic nothing new should ever be added because someone somewhere might misunderstand how to use it or assume it's automatically better than every alternative in every case. that's not an issue with the feature itself, it's at most an issue with presentation, or just plain old user error. also not sure how this explains why it would be 'significantly worse for many real world code bases'; the flag poll method is already common for code that doesnt do any queue oriented work and a built-in flag would be a more convenient way of doing that.

> This would be better handled by a section in threading docs showing basic ways to handle thread shutdown if it's that common, and then letting people pick the one that fits what matches their needs best. (and building more on it from there based on their needs)

there's nothing about this feature which would prevent the addition of such a section. a new feature isn't in competition or mutually exclusive with more docs.

maiden dune
# urban sandal if the thread has no need to communicate already, it probably shouldn't be termi...

> if the thread has no need to communicate already, it probably shouldn't be terminated abruptly as it has no way to communicate anything about the cancellation back.

not sure i follow the logic here. if a thread already doesn't need to communicate overall, then not being able to communicate about a cancellation wont be an issue either, since it's already been established that there's no need to. also to be clear, this feature is about enabling a graceful termination, as in allowing the thread to do cleanup, etc. on its own terms. not abrupt as in killing a process.

> Unlike asyncio, which has builtin ways to still handle this in done_callback (if you're at low level cancellation), there's no equivalent in threading without already needing a means of communicating.

if the problem is recovering information from threads after theyre done, thats a separate issue which has always been there with threads (though I suppose ThreadPoolExecutor addresses this to an extent with concurrent.futures.Future). dont see how making one method of interruption more convenient could make this worse.

in any case, something like this could suffice:

def loop():
    while not interruption_requested():
        # do stuff
    # clean up
    return 'cool result'

class ThreadWithResult(Thread):
    def __init__(self, target, args=None, kwargs=None):
        super().__init__()
        self.target = target
        self.args = args or ()
        self.kwargs = kwargs or {}
        self.result = None
        self.error = None
    def run(self):
        try:
            self.result = self.target(*self.args, **self.kwargs)

        except Exception as e:
            self.error = e

thread = ThreadWithResult(loop)
thread.start()
# ... later on
thread.interrupt()
thread.join()
if thread.error is None:
    # do things with result

(or alternatively add these attributes to the default Thread class to capture run's return value and uncaught exceptions)

#

if it specifically has to involve a callback, i suppose that could also be set with another method, or just as another attribute on the thread.

from threading import interruption_requested, interruption_callback, Thread

def loop():
    while not interruption_requested():
        ...  # do stuff

    # get callback 
    callback = interruption_callback()

    if callback:
        callback(...)

thread = Thread(target=loop)
thread.start()
# ... later on
thread.set_interruption_callback(print)
thread.interrupt()
thread.join()
urban sandal
#

The argument I made doesn't apply to all features, it's saying the base case of not adding it is a better state than the API provided in the standard library encouraging the worst way to do it, especially when you can already do what you want there yourself without it being at the language level. Sometimes abstractions aren't necessary, and the social impacts of them are negative.

maiden dune
# urban sandal if the thread has no need to communicate already, it probably shouldn't be termi...

> Event driven code performs significantly better in concurrent systems than code that busy loops or polls, and the latter should be avoided when possible (yes, it isn't always possible)

Far as I know, it's possible to have event driven code that involves busy waiting or polling; these aren't mutually exclusive concepts. Also I feel like characterizing this kind of loop as a busy wait would be inaccurate since there would be work being done between each poll. It wouldn't be like some kind of spinlock, just sitting there repeatedly checking for an interruption doing nothing else. I think maybe the use of sleep in original example loop, intended as a placeholder for work being done, might have given a misleading impression.

paper echo
spark magnet
urban sandal
#

They could be holding external locks too, like a named semaphore, but the problem remains the same whether you consider the resources being held as internal or external: killing a thread instead of communicating to close it prevents any necessary cleanup and releasing of resources from happening. The same is not true with killing a process, as they receive a signal from the OS.

charred fulcrum
#

An in-process lock may not be external to the system, but it is still a resource that is external to the current thread.

spark magnet
#

i guess i would say, "clean up held resources, especially locks". Internal vs external is vague and irrelevant.

paper echo
spark magnet
#

now anything else needing the lock is deadlocked.

paper echo
#

right, but i can also deadlock my app in 100 other ways

#

actually i manage to deadlock my scripts almost guaranteed every time any time i have to write while (item := queue.get()) is not None: ...

#

with asyncio there's a workaround:

while True:
    queue_get_task = asyncio.create_task(queue.get())
    shutdown_signal_task = asyncio.create_task(shutdown_event.wait())
    tasks = (shutdown_signal_task, queue_get_task)
    done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    if queue_get_task in done:
        ...
    if shutdown_signal_task in done:
        break   

but i don't know of any equivalent with threads, other than "more threads" which seems kind of.. sketchy? bad?

with concurrent.futures.ThreadPoolExecutor(2) as local_executor:
    while True:
        queue_get_fut = local_executor.submit(queue.get)
        shutdown_signal_fut = local_executor.submit(shutdown_event.wait)
        futures = (shutdown_signal_fut, queue_get_fut)
        done, _ = concurrent.futures.wait(futures, return_when=concurrent.futures.FIRST_COMPLETED)

maybe this is just me being bad at software design, but i feel like this is not exactly a great experience for people who need to do a bunch of i/o and don't have an async-ready library available for that purpose
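
One thread-only variant that avoids the extra executor is a bounded wait on the queue, re-checking the shutdown flag on each timeout. This is only a sketch: `consume`, `seen` and the 0.05 second timeout are illustrative choices, and it is still polling, so shutdown latency is bounded by the timeout rather than being immediate:

```python
import queue
import threading

def consume(jobs: queue.Queue, shutdown: threading.Event, seen: list) -> None:
    while not shutdown.is_set():
        try:
            item = jobs.get(timeout=0.05)  # bounded wait instead of blocking forever
        except queue.Empty:
            continue                       # nothing arrived; re-check the shutdown flag
        seen.append(item)                  # placeholder for real work
        jobs.task_done()

jobs: queue.Queue = queue.Queue()
shutdown = threading.Event()
seen: list = []
thread = threading.Thread(target=consume, args=(jobs, shutdown, seen))
thread.start()
jobs.put("a")
jobs.put("b")
jobs.join()        # wait until both items have been processed
shutdown.set()     # then ask the consumer to stop
thread.join()
```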

#

another option was floated in #async-and-concurrency when i last brought this up, which was to create my external resource handle in the main thread, but never actually use it there, and keep passing it off to run in worker threads, which is also kind of scary because such handles are often not even close to thread-safe

raven ridge
# paper echo right, but a thread holding an _internal_ lock (i.e. one that i deliberately cre...

you've conflated two different things here. Not every mutex your thread might be holding is one that you deliberately created inside your application. There's one inside of the libc stdio objects used by CPython to print to stdout, for instance. There's almost certainly one inside the libc malloc function that CPython uses to allocate memory. There's one in the C code for initializing a block scoped static variable, and possibly one for initializing a thread-local variable inside an extension module. And those are just examples of mutexes in libc / libpthread. There's also mutexes inside of, for instance, the logging module, so if your thread was in the middle of logging when you killed it, it could die holding a mutex that could cause a deadlock on any future attempt to log anything from any thread.

#

in other words, "process-local mutexes are an internal resource rather than an external one" does not imply "internal resources were deliberately created by me in my own application"

flat gazelle
#

Another interesting thing to keep in mind is that there are synchronization scenarios where releasing a lock on exit will also cause a bug.

grand rain
#

I mean

dusk comet
#

yield from () also doesn't use yield under the hood, because tuple iterator is not a generator

#

yielding from if False: yield function will also do the thing

grand rain
#

I mean that actually executes

#

but returns stop instantly, and looks same in asyncio

uncut ridge
#

and await is maybe await maybe block

uncut ridge
static hinge
#

yield from is also how you access the return value of a generator.

#

very niche feature

#

at least since async/await came along
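
A small illustration of that point, with made-up generator names: the value in a generator's `return` statement travels in `StopIteration.value`, and `yield from` hands it back as the value of the expression.

```python
def inner():
    yield 1
    yield 2
    return "inner result"          # carried by StopIteration.value

def outer(collected):
    result = yield from inner()    # re-yields 1 and 2, then captures the return value
    collected.append(result)

collected = []
values = list(outer(collected))
```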

grand rain
#

well, await is actually a synonym for yield from

#

This is more about coming of async in python in general

feral island
#

They can be used in a similar way semantically, but they do not do the same thing under the hood

dusk comet
#

i guess they do similar thing
historically, yield from was used instead of await. Then await was introduced and it replaced yield from usage for async functions
there is still a function in stdlib that converts generator to async function
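
The stdlib function referred to here is presumably `types.coroutine` (an assumption on my part; `asyncio.coroutine` did a similar job before it was removed). A minimal sketch that drives the coroutine by hand, with no event loop, to show the generator machinery underneath:

```python
import types

@types.coroutine
def pause(value):
    # A generator-based awaitable: yields to whoever is driving the coroutine.
    received = yield value
    return received

async def main():
    return await pause("hello")

coro = main()
first = coro.send(None)        # runs until the yield inside pause()
try:
    coro.send("reply")         # resumes pause(), which returns what it was sent
except StopIteration as stop:
    result = stop.value        # the return value of main()
```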

grand rain
#

also, __iter__ in futures is bound to __await__

static hinge
#

yield from is also a way you can implement __await__
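
A rough sketch of what that can look like. `Delayed` is a hypothetical class, and delegating to `asyncio.sleep(0)` is just one convenient inner awaitable:

```python
import asyncio

class Delayed:
    """Hypothetical awaitable that yields to the event loop once, then returns a value."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        # Delegate to another awaitable's iterator; its yields pass through unchanged.
        yield from asyncio.sleep(0).__await__()
        return self.value

async def main():
    return await Delayed(42)

result = asyncio.run(main())
```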

#

that's right, remove your 👎 reaction

uncut ridge
#

replaced for async functions
not in async functions

static hinge
#

lol.

cyan raven
uncut ridge
feral island
#

Seems like people are mostly in agreement but not always using the most precise language

feral island
uncut ridge
#

we need to go back to 09-Apr-2015

#

damn

#

ModuleNotFoundError: No module named 'timetravel'

grand rain
#

man

neat delta
#

!rule 9 6 perhaps you should re-read the channel description. this channel is about python internals, which your message isn't. and in case you're wondering, nowhere in this entire server do we allow resumes

fallen slateBOT
#

6. Do not post unapproved advertising.

9. Do not offer or ask for paid work of any kind.

raven ridge
#

!warn 1129321197448986636 Please don't attempt to solicit work here, per rules 6 and 9.

fallen slateBOT
#

:incoming_envelope: :ok_hand: applied warning to @gaunt sleet.

raven ridge
#

I've deleted your message accordingly

soft drum
#
lis = [1,2,3,4,5,6]
print(lis[5:65])

Why does this not result in an error?

dusk comet
#

because builtin sequences ignore index errors in case of slicing
otherwise it would be VERY annoying

#

imagine x[:5] erroring because there are fewer than 5 elements
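
To make the clamping concrete, `slice.indices()` reports the bounds that are actually used for a sequence of a given length:

```python
lis = [1, 2, 3, 4, 5, 6]

tail = lis[5:65]                           # stop is clamped to len(lis)
clamped = slice(5, 65).indices(len(lis))   # the (start, stop, step) actually used
empty = lis[10:20]                         # entirely out of range: empty list, no error

index_failed = False
try:
    lis[65]                                # plain indexing is not clamped
except IndexError:
    index_failed = True
```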

cyan raven
#

where is the source code of the python discourse site (discuss.python.org), or is that just a fork of the original one?

soft drum
dusk comet
#

golang is not python

deep jolt
cyan raven
#

https://discuss.python.org/t/official-list-of-core-developers/924/4
Any thoughts on this, I'm not sure if this git repository was created.

spark magnet
cyan raven
# spark magnet what git repo are you asking about?

I mean, the person who wrote the post mentioned there's no official list of core developers and asked a bunch of other stuff. I'm kinda curious if anything's changed since then or what's up with it now.

unkempt rock
#

Is there a PEP for implementation of package managers?

cyan raven
#

pep about pip?

unkempt rock
safe basalt
#

!pep 518

fallen slateBOT
safe basalt
#

no wait

#

!pep 517

fallen slateBOT
dusk comet
#

there are some packaging-related peps, iirc

safe basalt
#

There are several

dusk comet
#

there is a lot of peps with "metadata" in their names

unkempt rock
dusk comet
#

"build-system independent format"

#

it is the universal format for describing building process, i guess

safe basalt
#

not like a setup.py

#

!pep 621 provides the specification for a pyproject.toml

fallen slateBOT
dusk comet
#

then maybe setup.cfg?

safe basalt
#

no

dusk comet
#

ok 👍

#

so pyproject.toml is the only build-system independent format

safe basalt
#

PEP 517 creates a specification for how to turn a pyproject.toml into a correct package

safe basalt
#

They're still faaar too prevalent to actually do that, but the talk is there

#

And not all pyproject.tomls are created equal, either.
Poetry uses the same file name, and it even looks almost the same, but they use their own structure and their own process that is not standards compliant

cyan raven
fallen slateBOT
#

`Cython/Compiler/Symtab.py` line 2175
```py
if name == "classmethod":
```
cyan raven
#

build-backend = 'setuptools.build_meta'

alpine rose
cyan raven
merry bramble
#

And most important discussions take place in public

cyan raven
merry bramble
#

The aim is of course that all important discussions should take place in public (either on GitHub or at discuss.python.org) - it's open source, after all. But occasionally there are things where a quick back-and-forth on an instant-messaging platform is really useful

cyan raven
merry bramble
cyan raven
merry bramble
#

no

merry bramble
cyan raven
merry bramble
#

yes

cyan raven
alpine rose
#

no, why would it?

cyan raven
# alpine rose no, why would it?

well, I'm thinking about making more contributions to CPython and if I have enough experience I might try being a core developer. I wasn't sure whether my age would fit(this is why I asked).

spark magnet
merry bramble
#

Core developers need to be people who have demonstrated commitment to the project, people we're confident will work well as part of the team and people whose judgement we have confidence in. But there's certainly no age requirement

spark magnet
#

or time commitment.

upper timber
#

Hi, is this a good place to ask about pypy?

#

I couldn't find any pypy related discussion in python-help forum (or maybe it's just that discord is autocorrecting it to pypi ? hmm)

upper timber
#

I was wondering what would be the best way of studying machine codes emitted by pypy JIT.

It seems like my options are vmprof (which functions as both profiler and JIT log visualizer) and jitviewer. Am I missing anything?

I was just being wary because vmprof.com is down and jitviewer has not been updated for a few years.

#

I want some Godbolt-esque tool that I could use to study how pypy is responding to my attempt at optimizing my code

cyan raven
unkempt rock
#

are each pep styles attributed to different python versions and does this mean each has linguistic differences or syntax variety

cyan raven
#

how is pypy getting on with the latest features, is it still up-to-date?

spark magnet
cyan raven
merry bramble
cyan raven
spark magnet
upper timber
#

An impression I got is that many previous core developers are no longer active, but bug fixes are still being resolved and all. Wasn't there an HN thread recently where a large number of people came out of the woodwork and explained how they are deploying pypy at work?

static hinge
#

||totally not because of the famous site ending in 621||

cyan raven
cyan raven
#

Could someone link me to the code where the self is being passed as the first argument to the methods under the hood?

feral island
#

The other level is that in practice, as an optimization, we usually bypass that and the bytecode calls the function object directly with the extra argument added

fallen slateBOT
#

`Objects/funcobject.c` line 962
```c
func_descr_get(PyObject *func, PyObject *obj, PyObject *type)
```
`Objects/classobject.c` line 108
```c
PyMethod_New(PyObject *func, PyObject *self)
```
`Objects/classobject.c` line 43
```c
method_vectorcall(PyObject *method, PyObject *const *args,
```
feral island
#

and there you can see some code like newargs[0] = self

fallen slateBOT
#

`Python/bytecodes.c` line 3374
```c
inst(CALL_METHOD_DESCRIPTOR_O, (unused/1, unused/2, callable, self_or_null, args[oparg] -- res)) {
```
cyan raven
# feral island there's two levels of that. One is that FunctionType has a `__get__` implementat...

I wonder if I could implement something in pure Python that passes the self in using descriptor terminology.
like this for classmethod.

import functools
from types import MethodType

class ClassMethod:
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f
        functools.update_wrapper(self, f)

    def __get__(self, obj, cls=None):
        if cls is None:
            cls = type(obj)
        if hasattr(type(self.f), '__get__'):
            # This code path was added in Python 3.9
            # and was deprecated in Python 3.11.
            return self.f.__get__(cls, cls)
        return MethodType(self.f, cls)
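
The same idea works for plain `self` binding. Below is a sketch in the spirit of the Descriptor HowTo; the `Function` wrapper and `Greeter` class are hypothetical and only approximate what the C implementation does:

```python
from types import MethodType

class Function:
    """Hypothetical stand-in for a plain function's binding behaviour."""

    def __init__(self, f):
        self.f = f

    def __call__(self, *args, **kwargs):
        return self.f(*args, **kwargs)

    def __get__(self, obj, cls=None):
        if obj is None:
            return self                # accessed on the class: stay unbound
        return MethodType(self, obj)   # accessed on an instance: bind obj as first argument

class Greeter:
    def __init__(self, name):
        self.name = name

    @Function
    def greet(self, punctuation):
        return f"hello {self.name}{punctuation}"

g = Greeter("world")
bound = g.greet                        # triggers Function.__get__(g, Greeter)
```

Calling `bound("!")` then goes through `MethodType`, which prepends `g` to the arguments, which is the `newargs[0] = self` step mentioned above.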
feral island
merry bramble
lavish leaf
#

hello

#

4 years ago i failed in senior secondary

#

can you tell me some valuable certifications that would make companies to overlook my gap and failure and still hire me

fervent pawn
#

@fallen slate source reminder

fallen slateBOT
#
Command: remind

Commands for managing your reminders.

Source Code
inland halo
#

: guys is it necessary to build a team for online hackthons and competitions like machine learning projects on kaggle

wanton aspen
#

@feral island
I know its a really stupid question, but i got nervous a little bit, can you produce yourself?

feral island
wanton aspen
feral island
wanton aspen
cyan raven
#

is it common to have peps accepted but still unimplemented?
or are they implemented as soon as they're accepted?

feral island
cyan raven
feral island
cyan raven
#

yes just checked out the current state of pep 649:

static hinge
#

I once saw a PR for a pep before it was officially submitted. The PR was done by guido btw

cyan raven
#

what is the hash algorithm that Python is using? Like the maths formula.

peak spoke
#

hash for what?

cyan raven
peak spoke
#

different types implement it differently

steel solstice
#

default hash?

cyan raven
steel solstice
#

no lol

cyan raven
# steel solstice no lol

not sure what default hash means in this context.

    attr_tuple = tuple(getattr(self, attr) for attr in type(self).__slots__)
    return hash(attr_tuple)
#

hash(...)

grave jolt
#

as Gobot said, there's no single algorithm. Everyone is free to implement their own hash

cyan raven
steel solstice
#

i cant find it but it includes the id and stuff

feral island
#

object.__hash__ is mostly the same as id(), right?

cyan raven
feral island
fallen slateBOT
#

`Python/bltinmodule.c` line 1600
```c
static PyObject *
```
feral island
#

!e ```
o = object()
print(id(o))
print(hash(o))
```

fallen slateBOT
#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140255253185424
002 | 8765953324089
feral island
#

guess not!

flat gazelle
#

I think it's just divided by 16

cyan raven
#

well, they are different.

flat gazelle
#

!e

o = object()
print(id(o))
print(hash(o) * 16)
fallen slateBOT
#

@flat gazelle :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140145685562256
002 | 140145685562256
flat gazelle
#

(doesn't always work since the hash can end up negative - not sure what the circumstance would be, but I did just do it)
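
A possible explanation for the negative case, as far as I understand CPython's pointer hashing: the address is rotated right by 4 bits rather than divided, so any set low bits wrap around to the top and can flip the sign. The function below is a pure-Python sketch of that rotation, not the actual C code, and the addresses are made up:

```python
import struct

POINTER_BITS = struct.calcsize("P") * 8    # 64 on most modern builds

def rotated_pointer_hash(address: int) -> int:
    """Rotate right by 4 bits and reinterpret the result as a signed integer."""
    mask = (1 << POINTER_BITS) - 1
    rotated = ((address >> 4) | (address << (POINTER_BITS - 4))) & mask
    if rotated >= 1 << (POINTER_BITS - 1):   # top bit set: negative when signed
        rotated -= 1 << POINTER_BITS
    return -2 if rotated == -1 else rotated  # -1 is reserved as an error marker

aligned = rotated_pointer_hash(0x1000_0010)    # low 4 bits zero: same as dividing by 16
unaligned = rotated_pointer_hash(0x1000_0018)  # bit 3 set: wraps into the sign bit
```

Since object addresses are normally 16-byte aligned on 64-bit builds, the "divide by 16" reading holds for typical objects.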

fallen slateBOT
#

Objects/object.c line 878

```c
PyObject_Hash(PyObject *v)
```
flat gazelle
#

There is only one (non security/digest-related) hash algo that is actually part of python rather than an implementation detail of CPython, and that is the numeric hash - https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types

raven lark
#

!res

fallen slateBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

static hinge
#

Here's a common hash function used in Java.
```java
int hash = 7;
hash = 31 * hash + (int) id;
hash = 31 * hash + (name == null ? 0 : name.hashCode());
hash = 31 * hash + (email == null ? 0 : email.hashCode());
return hash;
```

trim merlin
#

I'd like to propose implementing __instancecheck__ for types.GenericAlias and types.TypeAliasType. Has there already been a discussion about this before?

dusk comet
#

it is impossible to check that some list is an instance of list[int]

static hinge
#

It would be an O(n) operation

#
from typing import Any, TypeGuard
def is_int_list(value: Any) -> TypeGuard[list[int]]:
    return isinstance(value, list) and all(isinstance(item, int) for item in value)
dusk comet
static hinge
#

of course it wouldn't stop you from adding some non-int to the list at some other point.

#

and it would probably break if the list is empty

dusk comet
#

even adding ints to list[int] isn't always safe

static hinge
#

What if you had
```py
class IntList(list[int]): pass
```

dusk comet
#

then [0] is not an instance of IntList, so it is useless

static hinge
#

what about list[int] in cls.__orig_bases__?

dusk comet
trim merlin
#

but what's the problem with an O(n) impl in isinstance? why do you say it's literally impossible

feral island
trim merlin
#

this usecase is definitely not for static typing, even though it might help in type narrowing

misty oxide
#

I'm doing bytecode analysis (3.10), and I'm trying to figure out the number of positional args vs kwargs in a CALL_FUNCTION_KW. Is there any way to determine this statically, or do I need to know the runtime value of the kwargs tuple? Will CPython ever generate this instruction when the kwargs tuple is not static and easily inferable?

flat gazelle
trim merlin
#

this makes sense for new type aliases as well:

class Item: ...
type Items = list[Item]

...

isinstance(mylist, Items)
flat gazelle
#
```py
def f(l, cb):
    if isinstance(l, list[int]):
        cb()
        for i in l:
            print(i + 2)
x = [1]
f(x, lambda: x.append('a'))
```
I do not think it is all that sensible to have an each-element check as the default behaviour of a type check. If you do need an each-element check, you should just write one explicitly, but even that is not always correct (consider taking the list as a ctor argument and using the field later), and I would argue it is not a sane default.
static hinge
#

Maybe limit it to Sequence[int]

swift imp
static hinge
#

best to make it isinstance(x, Sequence) and (len(x) == 0 or all(isinstance(el, int) for el in x))

#

actually, all([]) => True
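
A small sketch of the element-wise check being discussed, assuming a hypothetical helper named `is_int_sequence`. It shows that the empty case needs no special handling and that the answer only describes the sequence at the moment of the check.

```python
from collections.abc import Sequence

def is_int_sequence(value: object) -> bool:
    """Hypothetical helper: O(n) check that every element is an int."""
    return isinstance(value, Sequence) and all(isinstance(el, int) for el in value)

items = [1, 2, 3]
print(is_int_sequence([]))      # True, because all() of an empty iterable is True
print(is_int_sequence(items))   # True
items.append("a")               # nothing stops a later mutation
print(is_int_sequence(items))   # False
```

Note that an empty `str` is also a `Sequence` and would pass this check, and `bool` values count as `int`, so a real implementation would likely need extra rules.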

raven ridge
trim merlin
#

actually yeah, Iterable would be a problem as the check could exhaust it. hm.

#

point taken, O(1) thing also is a fair assumption as i haven't seen isinstance implementations ever do a for loop

dusk comet
#

isinstance is doing both loop and recursion (to iterate through all types in given type tuple, and it does recursion because tuples can be nested)

#
>>> isinstance(1, ((((), ()), str), str))
False
>>> isinstance(1, ())
False
trim merlin
#

it's a reasonable assumption that isinstance(x, T) will be constant time

dusk comet
#

how would you do isinstance(x, list[int]) ?
will it be equivalent to isinstance(x, list)?

urban sandal
# trim merlin I'd like to propose implementing `__instancecheck__` for `types.GenericAlias` an...

I'd rather __instancecheck__ not exist at all :) (this won't happen for pragmatic reasons and backwards compatibility, among a lot else)

I think rather than having this, it would have been better for a builtin to be added that can determine if things are structurally equivalent even if not nominally a subclass for runtime structural subtyping, but that ship has long sailed.

> this usecase is definitely not for static typing, even though it might help in type narrowing
It won't. If you look at pytype, much stronger inference than this can already be done should a type checker choose to.

The runtime use of this can't be any better than exhausting the iterator (as was already said by others) and with such a cost attached, people are free to do it themselves, but I agree with others that hiding a cost in there for iterables when most people won't need it at runtime isn't ideal.

magic zodiac
#

Is it a good idea to join free internships advertised on LinkedIn by small tech companies, which offer small projects to showcase? Mostly these small tech companies are Indian
E.g
Meriskill
Info aid tech
Code samurai
Bharat intern
Etc

magic zodiac
#

They were already having a discussion there, and I didn't want to interrupt in the middle of it

misty oxide
#

The same question applies to later versions, and to CALL in 3.12+.

#

Is there any situation where the kwargs are not statically known, other than fn(**kwargs), which uses a different instruction?
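
One way to explore this, as a sketch rather than a definitive answer: for a call written with explicit keywords, CPython stores the keyword names as a constant tuple of strings in the code object, so the keyword count can be read without running anything. How the call instruction refers to that tuple differs between versions, so the example only inspects `co_consts` and prints the disassembly instead of assuming particular opcodes. The expression `fn(1, 2, a=3, b=4)` is just an illustrative input.

```python
import dis

code = compile("fn(1, 2, a=3, b=4)", "<example>", "eval")

# Keyword names of the call are kept as a non-empty tuple of strings among the constants.
kw_name_tuples = [
    const for const in code.co_consts
    if isinstance(const, tuple) and const and all(isinstance(n, str) for n in const)
]
print(kw_name_tuples)

# The disassembly shows how this Python version hands that tuple to the call.
dis.dis(code)
```

On 3.10 the oparg of `CALL_FUNCTION_KW` is documented as the total number of positional and keyword arguments, so the positional count would be that oparg minus the length of the names tuple.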

ripe tinsel
#

Feature proposal: async import

I'm busy polishing code for production (desktop application) and I am using asynchronous functions to load some of the heavier libraries with minimal impact on startup time. Multithreading does not provide significant time advantages over async in my case, the main hurdle is the linear flow which has plenty of waiting time downstream. It occurs to me that this is probably a common problem, and a common solution might be useful for the greater community.

I propose an "async import" function that is a built-in function or part of the asyncio library. The syntax could be something like pd = async import pandas (equivalent to "import pandas as pd"), where "async" acts almost like a decorator, but defines and instantiates this function instead:

async def import_function(package): 
    import package as x
    return x
grave jolt
#

well, not exactly this function

spark magnet
#

when would you be able to use pandas after that?

grave jolt
ripe tinsel
#

When you return a value that was imported, it gets assigned to the namespace you chose.

grave jolt
#

I mean, you can do this:
```py
pd = await asyncio.to_thread(__import__, "pandas")
```

ripe tinsel
spark magnet
feral island
ripe tinsel
grave jolt
#

I suppose importing a package can be expensive. But is this expense from reading from the disk? Or is it from the CPU-bound work of just executing a lot of code?

#

In any case, I think the problem of UI taking too long to start is solved by running the UI in a separate thread

ripe tinsel
# feral island You aren't really making clear how your proposal is different from just `import ...

The software stops until the import statement returns a value. With a library like sentence_transformers this takes a second or two. With multiple libraries that take a second it adds up. Very few of these libraries are needed until the user actually does something.

My greatest pain in python is making the client wait for libraries to load. Async works really well for that, but @brittle mantle gave a great response so I will use that instead.

grave jolt
#

that's not a great response if you haven't measured what is slow in the importing

#

Is it disk I/O or is it compiling and running Python code (which is CPU-bound)?

#

If it's CPU-bound, then you have no choice but to wait for the import to finish. If you use threading you can at least interleave this CPU-bound work with the work in your UI thread

#

I guess to_thread kinda does this

#

The main problem with adding a whole language feature for this is: async/await is not coupled to a particular loop implementation, be it asyncio or trio. So how do you decide what to use for the I/O?

cyan raven
#

How does the dataclasses standard lib make sure that the fields are using type annotations?
Could someone link me to that part in the source code?

fallen slateBOT
#

Lib/dataclasses.py line 970

```py
cls_annotations = inspect.get_annotations(cls)
```
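
A short illustration of what that line implies in practice: `dataclass` builds its fields from the class's annotations, so an attribute without an annotation is left as an ordinary class attribute. `Point` and `label` are made-up names for the example.

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int
    y: int = 0
    label = "unannotated"  # no annotation, so dataclass does not treat it as a field

print([f.name for f in fields(Point)])  # ['x', 'y']
print(Point(1))                         # Point(x=1, y=0)
```

So nothing is "enforced" as such. Unannotated names are simply not picked up as fields.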
ripe tinsel
# grave jolt Is it disk I/O or is it compiling and running Python code (which is CPU-bound)?

In the context of a GUI-based app, the underlying code is often written in C (I think matplotlib is based on Matlab). Matplotlib specifically appears to run in its own process that interacts with the GUI package through a backend. In my specific case I want to delay imports until matplotlib does its thing (which is why multithreading doesn't differ much from async implementations).

Some imports have to happen linearly, but when an import can be delayed then there is actually very little literature on that. I've looked at lazy import implementations, but async seemed to work better.

I also have a case where I have a function which has an import in it, but the return value is cached. This solved a different problem though, since the client has to wait for data to process in anyway if he inputs new data. However, I mention it because it was another workaround to the long import problem.

But thanks for your advice, the code you suggested was exactly what I was looking for.

deft pagoda
#

I find this slightly annoying, possibly inconsistent that:

>>> tuple(range(3))
(0, 1, 2)

but

>>> from typing import NamedTuple
>>> class MyTuple(NamedTuple):
...     a: int
...     b: int
...     c: int
... 
>>> MyTuple(range(3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: MyTuple.__new__() missing 2 required positional arguments: 'b' and 'c'

I know this can be fixed with unpacking, e.g.,:

>>> MyTuple(*range(3))
MyTuple(a=0, b=1, c=2)

But it gets really ugly with generator expressions:

>>> MyTuple(i for i in range(3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: MyTuple.__new__() missing 2 required positional arguments: 'b' and 'c'
>>> MyTuple(*(i for i in range(3)))
MyTuple(a=0, b=1, c=2)

Is there some given reason why NamedTuples can't be constructed from iterables?

feral island
cyan raven
deft pagoda
#

since they changed initialization from tuple([1, 2, 3]) to MyTuple(1, 2, 3) i guess there's nothing to be done about it

deft pagoda
#

maybe they should add a .from_iterable class method

cyan raven
feral island
deft pagoda
#

because it's uglier, especially in front of generator expressions

#

prefer:

MyTuple.from_iterable(i for i in range(3))

over

MyTuple(*(i for i in range(3)))

itertools.chain has similar
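
For what it's worth, named tuples already ship a classmethod that behaves like the `from_iterable` wished for here: `_make`, which builds an instance from any iterable. A minimal sketch reusing the `MyTuple` class from above:

```python
from typing import NamedTuple

class MyTuple(NamedTuple):
    a: int
    b: int
    c: int

print(MyTuple._make(range(3)))               # MyTuple(a=0, b=1, c=2)
print(MyTuple._make(i for i in range(3)))    # MyTuple(a=0, b=1, c=2)
```

The leading underscore is there to avoid clashing with field names, not to mark the method as private.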

ripe tinsel
grave jolt
#

but that would be hardcoding asyncio as the loop implementation to use

feral island
grave jolt
#

I mean, this could be considered, but that's a major change. For very little gain IMO

#

And yes, this exact line does the same as import pandas as pd

ripe tinsel
# feral island That line doesn't really do anything over `import pandas as pd`. You need to act...

According to this website, .to_thread() creates a coroutine that executes in a separate thread from the main thread when awaited (this syntax gives me an error which is why I use asyncio.run() to execute the coroutine.)

https://superfastpython.com/asyncio-to_thread/

You can run a blocking function in asyncio via the asyncio.to_thread() function. In this tutorial, you will discover how to execute blocking functions in new threads separate from the asyncio event…

feral island
#

so the end result is basically the same: you are blocking waiting for the import to finish

#

the actual import happens in a separate thread, but I don't see how that helps you

grave jolt
#

Yeah, like asyncio.run(asyncio.sleep(5)) is exactly the same as time.sleep(5)
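
A sketch of the difference being pointed at: moving the import to a thread only pays off if something else runs before the result is awaited. The example uses the stdlib `decimal` module as a stand-in for a heavy library, and `asyncio.sleep` as a stand-in for waiting on data or user input. Both are assumptions for illustration.

```python
import asyncio
import importlib

async def main():
    # Start the import in a worker thread without waiting for it yet.
    pending = asyncio.create_task(asyncio.to_thread(importlib.import_module, "decimal"))
    await asyncio.sleep(0.01)   # other work overlaps with the import here
    return await pending        # wait only when the module is actually needed

decimal_module = asyncio.run(main())
print(decimal_module.__name__)  # decimal
```

If the import is CPU-bound Python code, the GIL still limits how much real overlap there is.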

winged sphinx
# ripe tinsel A syntax like "async import" would be a convenience, but this works so it is alr...

I have a similar problem/workflow, with a 3-4 second import delay (total) after pruning and adding lazy imports where helpful (reviewed with -X importtime). This kills me on multiprocessing, where every process ends up with another 3-4 sec delay. Pandas is one of my worst cases. In my particular use case, there's a few points where this async solution actually would be helpful: where we're waiting on data (via async apis/requests), but before we need the rest of the stack (pandas, charting libraries, pyarrow, etc).

ripe tinsel
feral island
#

You might be interested in

#

!pep 690

fallen slateBOT
feral island
#

which was rejected, but discusses some of the relevant design space

grave jolt
#

!e

import asyncio

async def foo():
    asyncio.run(asyncio.sleep(1))

asyncio.run(foo())
fallen slateBOT
#

@grave jolt :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "/home/main.py", line 6, in <module>
003 |     asyncio.run(foo())
004 |   File "/lang/python/default/lib/python3.11/asyncio/runners.py", line 190, in run
005 |     return runner.run(main)
006 |            ^^^^^^^^^^^^^^^^
007 |   File "/lang/python/default/lib/python3.11/asyncio/runners.py", line 118, in run
008 |     return self._loop.run_until_complete(task)
009 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
010 |   File "/lang/python/default/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
011 |     return future.result()
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/Q3K53TJDACDHEHDU3AVADX6PYE

ripe tinsel
# grave jolt Yeah, like `asyncio.run(asyncio.sleep(5))` is exactly the same as `time.sleep(5)...

Using time.perf_counter(), I find the normal import is consistently slower than the async one by ~10% (0.13 sec) for pandas. I don't think it is significant enough to rave about, and it isn't measuring the import time but rather how quickly the event loop moves on from the import statement.

I agree the async version is actually waiting for the import, but it is consistently ~0.1 seconds faster than the 1.2-1.3 seconds the normal import statement takes.

grave jolt
#

How did you benchmark it?

ripe tinsel
feral island
#

I tried this
```py
import asyncio
import sys
import time

before = time.perf_counter()
if sys.argv[1] == "asyncio":
    pd = asyncio.run(asyncio.to_thread(__import__, "pandas"))
else:
    import pandas as pd
after = time.perf_counter()

print(f"imported pandas in {after - before:.2f} seconds")
```

grave jolt
#

I tried this, and I get

Sync: 2.1705752294510603
Async: 3.07330556132365
ripe tinsel
#

I think it is a fluke. Pandas seems to be faster but sentence_transformers takes longer. Definitely not loading async though

feral island
#

you really have to do it in a fresh process

grave jolt
#

big sad

feral island
#

I can definitely see the effect of pyc files though
```
% python ~/py/tmp/importpandas.py sync
imported pandas in 1.76 seconds
% python ~/py/tmp/importpandas.py asyncio
imported pandas in 0.21 seconds
% python ~/py/tmp/importpandas.py sync
imported pandas in 0.18 seconds
% python ~/py/tmp/importpandas.py asyncio
imported pandas in 0.21 seconds
```

grave jolt
#

that means you have to create a new venv for each test run? 😬

#

or well, clean the pyc files somehow

feral island
#

I feel like a benchmark with pyc files is more useful, though I guess it depends on your application

grave jolt
#

For me both versions take about 0.2 seconds

#

(after pyc)

peak spoke
ripe tinsel
# grave jolt I tried this, and I get ``` Sync: 2.1705752294510603 Async: 3.07330556132365 ```

I am now convinced that an async approach on average will always be more expensive than normal imports. That got me thinking about multithreading (to get past the blocking event). I modified your code as provided below and got the following results for N=10 000:

Normal: 57.36153389397077
Multithreaded: 55.91058199480176

and

Normal: 51.267108304426074
Multithreaded: 53.34445761004463

However, with N = 1000 I got

Normal: 13.458581696497276
Multithreaded: 4.948981299996376

and

Normal: 15.239287502830848
Multithreaded: 5.18649219837971

The discrepancy is beyond me, but the multithreading implementation is literally the simplest. I modified your code and included sentence_transformers as a second library because it also has a long import time:


import sys
import time
import threading


N = 1000

def sync_import():
    total = 0
    for _ in range(N):
        start = time.perf_counter()
        import pandas
        import sentence_transformers
        end = time.perf_counter()
        total += (end - start)
        sys.modules.pop("pandas")
        sys.modules.pop("sentence_transformers")
    print("Normal:", total)



def threaded_import():
    total = 0
    for _ in range(N):
        start = time.perf_counter()

        t1 = threading.Thread(target=__import__, args=("pandas",))
        t2 = threading.Thread(target=__import__, args=("sentence_transformers",))
 
        t1.start()
        t2.start()

        end = time.perf_counter()
        total += (end - start)

        t1.join()
        t2.join()
        try:
            sys.modules.pop("pandas")
            sys.modules.pop("sentence_transformers")
        except KeyError: pass
    print("Multithreaded:", total)

sync_import()
threaded_import()
feral island
#

your benchmark is measuring the time to start the thread, not the time to run the import
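
To make the point concrete, a sketch with a hypothetical `slow_task` standing in for an expensive import. Stopping the clock right after `start()` measures only thread start-up, while stopping it after `join()` includes the work itself.

```python
import threading
import time

def slow_task():
    time.sleep(0.2)  # stand-in for an expensive import

start = time.perf_counter()
worker = threading.Thread(target=slow_task)
worker.start()
started = time.perf_counter() - start    # thread start-up only
worker.join()
finished = time.perf_counter() - start   # includes the work done in the thread

print(f"after start(): {started:.4f}s, after join(): {finished:.4f}s")
```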

ripe tinsel
urban sandal
#

This seems extremely unlikely to help anything, presumably if you're importing it, you need these things to exist. What happens when something defined in the module relies on the import?

ripe tinsel
urban sandal
#

I don't see how that's the fault of current synchronous imports at all?

you can easily have your application entrypoint not import anything expensive until starting a separate thread for the GUI

ripe tinsel
# grave jolt I tried this, and I get ``` Sync: 2.1705752294510603 Async: 3.07330556132365 ```

It just occurred to me that sys.modules.pop() does not work. It says I loaded sentence_transformers 1000 times in 13 seconds, but loading it once takes 6 seconds on a good day.

I must implement it using multiprocessing and a shared value for the counter, since that will ensure each instance actually imports the full package and not from memory. I'll do that tomorrow and let you know if anything changes. Otherwise great chat
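
A sketch of the fresh-process approach suggested earlier, as an alternative to popping entries from `sys.modules` (which presumably leaves submodules and already-initialised extension modules behind). `time_in_fresh_process` is a hypothetical helper, and `json` is a placeholder for the module under test.

```python
import subprocess
import sys
import time

def time_in_fresh_process(statement: str) -> float:
    """Run `statement` in a new interpreter and return the wall-clock time."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", statement], check=True)
    return time.perf_counter() - start

baseline = time_in_fresh_process("pass")             # interpreter start-up only
with_import = time_in_fresh_process("import json")   # swap in the module under test
print(f"start-up: {baseline:.2f}s, with import: {with_import:.2f}s")
```

The difference between the two numbers approximates the import cost, and `python -X importtime`, mentioned above, gives a per-module breakdown.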

urban sandal
#

Not to be too pessimistic here, but you should have multiple threads in any application taking user input anyhow, at least 2, as you don't want your GUI blocked on any of the work the GUI is driving.

#

synchronous imports then work just fine as long as the GUI is tossed into a thread, before all the import statements (which is perfectly possible by doing something like:)

    ...
    queue_pair = ...
    some_handle = start_gui_thread(...)
    import ...  # leave a comment here about it not being a module level import due to impact on gui
ripe tinsel
# urban sandal synchronous imports then work just fine as long as the GUI is tossed into a thre...

Also, when a user is not likely to use modules every time you can return the module in a function attached to a lru cache. That way it only loads once if needed (cache is redundant but useful to limit memory use).

In general, putting imports inside functions has the most impact on reducing waiting times for clients. Caches are useful to avoid reloading if one import is in several functions (and it is too expensive to load if not needed)

urban sandal
#

importing in a function already defers to sys.modules just like module level imports, it just changes when that gets executed
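
A quick demonstration of that point, using the stdlib `decimal` module and a hypothetical `get_decimal` function: the second call finds the module in `sys.modules`, so an extra `lru_cache` adds little.

```python
import sys

def get_decimal():
    import decimal   # executed on every call, but only loads the module once
    return decimal

first = get_decimal()
second = get_decimal()
print(first is second)                  # True
print(sys.modules["decimal"] is first)  # True
```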

ripe tinsel
feral island
ripe tinsel
#

In light of our discussion yesterday, I would like to redefine my feature proposal:

I propose a feature for the Python versions being developed without a GIL that allows a user to specify that an import must run in the background as a parallel thread. This will still work with the GIL, but less efficiently.

The aim is not for a lazy import, but just to prevent imports from being blocking events on the start up of a program.

The interest group for whom I suggest this is desktop developers, where waiting times depend on client hardware and any optimization leads to a better product.

The syntax I suggest is something similar to split import package, which tells the compiler to perform the import in a separate thread and assign the resulting object to a namespace in the main thread with the same name as the package.

quick snow
ripe tinsel
#

It is a common case that software has to wait for user input before doing most of its computations. Those imports aren't needed immediately, but if the import is lazy you still have to wait later.

quick snow
ripe tinsel
#

You can use an alias in the separate thread and only assign the module to the package name upon completion, so the name won't be in the name space in your case, causing a NameError.

#

A developer can then catch that error with a retry loop to wait out the import

#

Alternatively, the compiler can implement all imports like this, with a built-in retry loop that waits for the import. I'm not proposing that, just speculating about an alternative implementation as food for thought.

ripe tinsel
#

I have a working prototype of a system for parallel imports. It works incredibly well, thank you to everyone who contributed. A join statement is used to ensure the package has been loaded. The results are crazy:

Import statement pass time: 0.0024886999744921923
Import statement execution time: 7.9670460999477655
Name: sentence_transformers

and here is the code:

import threading
import time

class SplitImport(threading.Thread):
    def __init__(self, name=None, Verbose=None) -> None:
        args = (name,)
        self.target = __import__
        threading.Thread.__init__(self, None, self.target, name, args, {})
        self._return = None

    def run(self):
        if self.target is not None:
            self._return = self._target(*self._args)
    
    def join(self, *args):
        threading.Thread.join(self, *args)
        return self._return


# Timing statements
t = time.perf_counter()

split_import = SplitImport("sentence_transformers")
split_import.start()
print("Import statement pass time:", time.perf_counter()-t)

sentence_transformers = split_import.join()
print("Import statement execution time:", time.perf_counter()-t)

print("Name:", sentence_transformers.__name__)
dusk comet
#

you are still measuring thread creation time, so comparing it to normal import doesn't make sense

ripe tinsel
#

I can't let the user wait 7 seconds on startup, and I can't let them wait the first time they perform a query. A background import is the ideal solution. And the usecase is general for GUI developers.

#

Those 7 seconds are 12 seconds on my smaller laptop

dusk comet
#

loading modules in different threads doesn't happen often enough to justify inventing new syntax for it

ripe tinsel
#

Perhaps, but Python has a reputation for being slow and a speed up like this can really boost the Python GUI development community. Imports that block the event loop consume time that is not always justified by the use case of the software.

Certain features could be described as redundant, but they add to the flavour by affecting how a product is used.

For me, I get the full functionality of my software, but my startup time has almost disappeared. I'm actually concerned the splash screen disappears too fast.

This is not an arbitrary thing from the perspective of a desktop developer. I am genuinely excited about reducing the lag in my software, because speed impacts the client's opinion of the software.

People who use Python in other ways might not need the feature, but running the prototype I provided above is tedious and repetitive for multiple imports.

So does the language need the feature? No. But would the feature make Python feel faster for the user in certain use cases? Certainly.

dusk comet
#

a lot of features were rejected because their benefits were not very big, or because the part of the community affected by the feature was too small

ripe tinsel
#

Don't get me wrong. I proposed a feature, we had a discussion, it resulted in a genuine solution for me, so I used that as a basis for a better proposal. Whether that proposal is accepted or rejected is not my concern.

I shared my code as a proof of concept. In the case of the sentence_transformers library (which imports Pytorch, a ~800 MB C library), I only need to wait 0.002 seconds (instead of 7.96 seconds, a ~4000 times improvement).

As such, I also showed that this approach is useful in bringing AI-based applications to the user, on-device.

peak spoke
#

I think there is a rejected lazy import pep that partially addressed this

ripe tinsel
#

Lazy import makes you wait later rather than waiting now. That is not a solution in my opinion

peak spoke
#

The later waiting can be in a different thread if you use it in one without having to explicitly take care of it

urban sandal
#

and you're adding a lot of machinery to thread imports instead of just doing what was suggested earlier in this channel: launching the GUI in a thread prior to the expensive imports on the main thread. I didn't see any added details that explained why this wouldn't work here.

ripe tinsel
urban sandal
#

there are a lot of theoretical issues that people with experience with concurrency could point to with what you're wanting out of this. I was trying to stick to demonstrable issues that arise from it along with something easy and which behaves correctly* that you can do now to get what you want (a responsive GUI that isn't waiting on your expensive imports)

* This is somewhat limited by "are your libraries also doing the right things?"

spark magnet
#

@ripe tinsel i feel like we keep asking, "How do we know when the async import is done so that we can use it?" and you haven't given an answer.

cyan raven
ripe tinsel
# spark magnet @ripe tinsel i feel like we keep asking, "How do we know when the async...

Sorry, I thought my answer is clear. Whenever you use any module incorrectly, an error is raised and every developer has to be aware of these errors and how to catch them.

Nevertheless, I proposed maintaining a list of pending imports and removing names from the list if the import is complete. Then, when the interpreter encounters a NameError it can check if the name is in the list and halt the event loop in 0.1 second cycles until the name disappears from the list. If the name is not on the list then it raises the error like normal.

Alternatively, calling "split import" can create an intermediary object with the name of the package and a method called join() which returns "True" if the module is loaded (insert it after loading the module and assigning to the namespace) or otherwise acts like the normal Thread.join() function if the name is assigned to the intermediary object and not the loaded module, but also returns True when finished.

Either way, if the module is not finished loading and a function from the module is called, that will result in an error which can be handled like any other error.

spark magnet
# ripe tinsel Sorry, I thought my answer is clear. Whenever you use any module incorrectly, an...

Definitely if you are going to pursue this idea, you want a way to use the module with a guarantee of no errors. "Every developer has to be aware of these errors and how to catch them" isn't enough. I'm importing the module because I want to use it. A race condition during an async import isn't something to catch, it's something to prevent. A join operation somehow is the way to do that.
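
One possible shape for such a join, sketched with the standard library only: submit the import to a thread pool and call `result()` at the first point where the module is needed. The stdlib `fractions` module stands in for a heavy dependency, and the comment marks where the application's own start-up work would go.

```python
import importlib
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
pending = executor.submit(importlib.import_module, "fractions")  # returns immediately

# ... start the GUI, wait for user input, and so on ...

fractions = pending.result()   # blocks only if the import has not finished yet
print(fractions.Fraction(1, 3))  # 1/3
executor.shutdown()
```

`result()` also re-raises any exception from the import, so a failure surfaces at a well-defined place instead of as a `NameError` later.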

feral island
#

A useful variant could be that the import statement returns a special object so that if you access an attribute on it, it blocks until the import is complete

#

That

#

is something you could write today, no language change needed
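
A rough sketch of what such an object could look like, assuming a hypothetical class named `BackgroundModule`. It starts the import in a thread and makes the first attribute access wait for it. It ignores the threading pitfalls raised elsewhere in this discussion, so treat it as an illustration rather than something production-ready.

```python
import importlib
import threading

class BackgroundModule:
    """Hypothetical proxy: imports `name` in a thread, blocks on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None
        self._error = None
        self._thread = threading.Thread(target=self._load, daemon=True)
        self._thread.start()

    def _load(self):
        try:
            self._module = importlib.import_module(self._name)
        except BaseException as exc:   # keep the failure so it can be re-raised on access
            self._error = exc

    def __getattr__(self, attr):
        # Only called for names not found on the proxy itself.
        self._thread.join()
        if self._error is not None:
            raise self._error
        return getattr(self._module, attr)

stats = BackgroundModule("statistics")   # returns immediately
print(stats.mean([1, 2, 3]))             # waits for the import if necessary
```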

dusk comet
#

you can even replace __import__ to do that, and i believe not much will break

#

except for imports with side-effects

feral island
#

well you open yourself up to fun threading bugs, like the issue @urban sandal linked above 🙂

dusk comet
cyan raven
#

Does anyone know which PEP introduced slots?

spark magnet