#internals-and-peps
```py
args = {'a': 1, 0: 98, 1: 99, 'b': 2}
def f(*args, **kwargs): return args, kwargs
```
if you pass `**args` to this function, positional args will be pulled out into `*args`; they will not be in `**kwargs`
right, so we might need triple-star
hm, yeah - my initial reaction is that it ought to raise TypeError when unpacking a dict containing positional args for a call to a function that doesn't accept positional args, but you're right that this would mean that wrapper functions would need to keep accepting *args.
maybe triple star is a good workaround for that... And reasonably intuitive, I guess
though if we had a triple star, I think we ought to have it both at the call site and the parameter list
definitely
***bargs = both args and kwargs 
oh god imagine saying star star star bargs out loud in a class
this seems like a reasonably good idea to me. I've wished for something like this occasionally. You can usually just pass all parameters by keyword, which isn't so awkward - but that doesn't work if the function has positional-only arguments, and then you are stuck building up both a sequence and a mapping for the positional vs keyword arguments
actually, though - if that's the only major use case for this - using one mapping to provide all the arguments to a function that uses positional-only arguments - maybe we don't need new syntax at all. Maybe we just need something new in functools
```py
def f(***bargs):
    print(bargs)

f("a", 123, flag=True, other_flag=False)
# {0: "a", 1: 123, "flag": True, "other_flag": False}
```
?
yeah, that's what I'd expect
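The parameter-side half of the idea can be emulated today with a decorator. A minimal sketch, assuming a hypothetical helper name `collect_bargs` (not part of functools or any library), that packs positional arguments under integer keys and keyword arguments under their names, matching the output expected in the example above:

```python
import functools


def collect_bargs(func):
    """Hypothetical decorator emulating a ***bargs parameter: the wrapped
    function receives one dict with positional args under 0, 1, ... and
    keyword args under their names."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bargs = dict(enumerate(args))  # positional args -> integer keys
        bargs.update(kwargs)           # keyword args -> string keys (never clash with ints)
        return func(bargs)
    return wrapper


@collect_bargs
def f(bargs):
    return bargs


print(f("a", 123, flag=True, other_flag=False))
# {0: 'a', 1: 123, 'flag': True, 'other_flag': False}
```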
can't wait to see this in python 3.17 
let me know when the pull request is ready to review
```py
def f(a): ...
f(***{"a": "abc", 0: "def"})
```
what happens here?
an error, just like if you did f(*["def"], **{"a": "abc"})
i like that @raven ridge is writing the PEP for me!
```py
def f(a, **kwargs): ...
f(***{0: "def", "a": "abc"})
```
how about this?
would you get "a" in kwargs?
a = "def", kwargs = {"a": "abc"}
would think as such
now for the kicker
```py
def f(a, **kwargs): ...
f(***{"a": "abc", 0: "def"})
```
(order is swapped)
that's the same result
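For comparison, here is how today's Python treats the split equivalent of that call, assuming `***{0: "def", "a": "abc"}` would desugar to `*["def"], **{"a": "abc"}`. A keyword that matches a named parameter binds to that parameter rather than landing in `**kwargs`, so the call is rejected; making `a` positional-only (PEP 570) gives the `a = "def", kwargs = {"a": "abc"}` result suggested in the discussion.

```python
def f(a, **kwargs):
    return a, kwargs


def g(a, /, **kwargs):  # a is positional-only, so the keyword "a" can go to kwargs
    return a, kwargs


try:
    f(*["def"], **{"a": "abc"})
except TypeError as exc:
    f_error = str(exc)  # f() got multiple values for argument 'a'

g_result = g(*["def"], **{"a": "abc"})
print(f_error)
print(g_result)  # ('def', {'a': 'abc'})
```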
so first you must filter out all int keys and sort them I guess
Maybe something in functools like
```py
def apply_mixed_args(func, mapping):
    pos = {}
    kwargs = {}
    for key, val in mapping.items():
        if isinstance(key, int):
            pos[key] = val
        else:
            kwargs[key] = val
    args = []
    try:
        for i in range(len(pos)):
            args.append(pos[i])
    except KeyError:
        raise TypeError("Arguments to pass positionally must be contiguous integers beginning with 0")
    return func(*args, **kwargs)
```
I've wanted this rarely enough that solving it with a new stdlib function seems better to me than solving it with new syntax
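A variant of the same helper that follows the "filter out all int keys and sort them" approach, shown calling a function with positional-only parameters, which is the case that motivated it. The name `apply_mixed_args` and the error message are just the hypothetical ones from this thread, not an existing functools API; `power` is an illustrative target function.

```python
def apply_mixed_args(func, mapping):
    """Hypothetical helper: call func with int keys as positional args
    (in key order) and all other keys as keyword args."""
    positions = sorted(k for k in mapping if isinstance(k, int))
    # Positional keys must be exactly 0, 1, ..., n-1.
    if positions != list(range(len(positions))):
        raise TypeError(
            "Arguments to pass positionally must be contiguous integers beginning with 0"
        )
    args = [mapping[k] for k in positions]
    kwargs = {k: v for k, v in mapping.items() if not isinstance(k, int)}
    return func(*args, **kwargs)


def power(base, exp, /, *, mod=None):  # base and exp are positional-only
    return pow(base, exp, mod)


print(apply_mixed_args(power, {"mod": 1000, 1: 10, 0: 2}))  # 24
```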
it does seem like a very niche problem for new syntax yeah. matmul operator is cowering
naming that function might be the toughest part, heh
apply_bargs 
barge_into_function
heh
apply_neds_bats
kool_aid
the idea of a new stdlib function sidesteps a lot of the issues and ambiguities we mentioned above, too. There's no question of what happens if you do
```py
def foo(**kwargs):
    pass

apply_mixed_args(foo, {0: 42})
```
It's an error, because you pass positional args to a function that doesn't take them. There's no question of what happens if you do `foo(*args, **kwargs, ***bargs)` in one function call, because - well, you can't.
and I suspect that selling people on triple star would be considerably harder than selling them on an enhanced double star, but you're right - the enhanced double star couldn't be transparently proxied, so callables that currently accept *args and **kwargs would need to keep doing so in the future even if f(**{0: 42}) could be used at the call site.
there's another nasty case that happens if you allow two-star f(**{0: 42}) actually: the implementation would need to detect f(**{0: 42, 1: 43}, **{0: 10}) and raise a TypeError for that as well
I guess my opinion is that triple star is a bad idea (too magical for too niche a feature, especially for something that's possible today and just inconvenient). I think a new stdlib function would be enough, but if we did any new syntax, I don't think it should be more than just allowing integer keys in mappings unpacked with ** in a function call, without changing the behavior for **kwargs parameters in a function at all - they'd still only receive the keyword arguments, and you'd still need to use *args to receive positional arguments.
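For reference, current CPython already rejects both of the shapes mentioned here at the call site with a `TypeError`: non-string keys in a `**` unpacking, and the same key supplied by two `**` unpackings. A quick check (the exact error messages vary between versions, so the sketch only records whether an error was raised; `raises_type_error` is an illustrative helper):

```python
def f(*args, **kwargs):
    return args, kwargs


def raises_type_error(call):
    """Return True if call() raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


int_key_rejected = raises_type_error(lambda: f(**{0: 42}))
duplicate_rejected = raises_type_error(lambda: f(**{"x": 1}, **{"x": 2}))
print(int_key_rejected, duplicate_rejected)  # True True
```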
yeah, i was literally thinking about that
even simpler example:
```py
def f(a): ...
f(0, a=0)
```
```
Traceback (most recent call last):
  in <module>
TypeError: f() got multiple values for argument 'a'
```
def _(a, b=0, /, c=1, *args, *, d=2, **e, ***f): ...
syntax hell
Where is os.truncate/os.ftruncate implemented for Linux? I traced it to https://github.com/python/cpython/blob/042aa88bcc6541cb8b312f1119452f7a58a5b4df/Modules/clinic/posixmodule.c.h#L8067 but now I'm lost
Modules/clinic/posixmodule.c.h line 8067
```c
os_ftruncate_impl(PyObject *module, int fd, Py_off_t length);
```
Modules/posixmodule.c line 11784
```c
os_ftruncate_impl(PyObject *module, int fd, Py_off_t length)
```
Thanks. GitHub search did not pick that up for some reason
I wonder why its docs say
Truncate the file corresponding to file descriptor fd, so that it is at most length bytes in size
When the linux manual says
The truncate() and ftruncate() functions cause the regular file
named by path or referenced by fd to be truncated to a size of
precisely length bytes.
My guess is the Python docs are trying to be more general, and for some platforms they cannot make the stronger guarantee of "precisely length bytes". Python doesn't seem to do anything special with the length when it calls ftruncate.
Or maybe I am misinterpreting the way it's worded
POSIX says:
If fildes refers to a regular file, the ftruncate() function shall cause the size of the file to be truncated to length. If the size of the file previously exceeded length, the extra data shall no longer be available to reads on the file. If the file previously was smaller than this size, ftruncate() shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled. The value of the seek pointer shall not be modified by a call to ftruncate().
Old versions instead said:
If the file previously was smaller than this size, ftruncate() shall either increase the size of the file or fail. [XSI] [Option Start] XSI-conformant systems shall increase the size of the file. [Option End]
but even that doesn't allow for ftruncate to succeed without setting the size of the file to exactly the given length. So...
Thanks. Yeah, that is confusing. I will just trust the Linux manual on this.
It's the behaviour I observed in practical tests anyway
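A small check of the behaviour described above, using `os.ftruncate` on a temporary file: shrinking drops the tail, growing zero-fills, and the size afterwards is exactly `length` in both directions. This was written with a regular file in mind; other file types or platforms may behave differently.

```python
import os
import tempfile

with tempfile.TemporaryFile() as fh:
    fd = fh.fileno()
    os.write(fd, b"hello world")          # 11 bytes
    os.ftruncate(fd, 5)                   # shrink: the extra data is discarded
    size_after_shrink = os.fstat(fd).st_size
    os.ftruncate(fd, 16)                  # grow: the new area reads back as zeros
    size_after_grow = os.fstat(fd).st_size
    os.lseek(fd, 0, os.SEEK_SET)          # ftruncate does not move the file offset
    contents = os.read(fd, 32)

print(size_after_shrink, size_after_grow)  # 5 16
print(contents)  # b'hello' followed by 11 zero bytes
```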
the opposite of cringe
so it's a good thing?
yes!
it always strikes me as the opposite. I'll have to try to remember
https://bugs.python.org/issue21644
here people are talking about using calloc in bytearray.__init__
They mention it makes initialization faster, but it looks like it makes it MUCH faster, at least on my Windows 10 machine.
```c
static int
bytearray___init___impl(PyByteArrayObject *self, PyObject *arg,
                        const char *encoding, const char *errors)
/*[clinic end generated code: output=4ce1304649c2f8b3 input=1141a7122eefd7b9]*/
{
    void *sval;
    Py_ssize_t count;
    PyObject *it;
    PyObject *(*iternext)(PyObject *);

    //
    // existing code
    //

    /* Is it an int? */
    if (_PyIndex_Check(arg)) {
        count = PyNumber_AsSsize_t(arg, PyExc_OverflowError);
        if (count == -1 && PyErr_Occurred()) {
            if (!PyErr_ExceptionMatches(PyExc_TypeError))
                return -1;
            PyErr_Clear();  /* fall through */
        }
        else {
            if (count < 0) {
                PyErr_SetString(PyExc_ValueError, "negative count");
                return -1;
            }
            if (count > 0) {
                if (self->ob_alloc == 0) { // new bytearray
                    if (!_canresize(self))
                        return -1;
                    // remember to avoid overflow by using size_t. see issue #22335.
                    sval = PyObject_Calloc((size_t)count + 1, 1); // + 1 for null terminator
                    if (sval == NULL) {
                        PyErr_NoMemory();
                        return -1;
                    }
                    self->ob_bytes = self->ob_start = sval;
                    Py_SET_SIZE(self, count);
                    self->ob_alloc = (size_t)count + 1;
                    return 0;
                }
                if (PyByteArray_Resize((PyObject *)self, count))
                    return -1;
                memset(PyByteArray_AS_STRING(self), 0, count);
            }
            return 0;
        }
    }
```
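Whatever allocation strategy `bytearray___init___impl` uses, the Python-visible behaviour has to stay the same. A quick sanity check one might run against a patched build (a sketch that only exercises the integer-argument path touched by the patch; the sizes are arbitrary):

```python
sizes = [0, 1, 1024, 1 << 20]
for n in sizes:
    b = bytearray(n)
    assert len(b) == n
    assert b == bytes(n)       # bytes(n) is n zero bytes
    b.append(1)                # the result must still be resizable
    assert len(b) == n + 1

try:
    bytearray(-1)
except ValueError as exc:
    negative_message = str(exc)

print(negative_message)  # negative count
```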
here's how I timed this change:
```py
from timeit import timeit

setup = """
def f(n):
    b = bytearray(n)
    return b
"""
for n in range(12):
    print(timeit(stmt=f"f({n**10})", setup=setup, number=1000))
```
and here are the timing results (total seconds for 1000 calls, size = n**10 bytes):
```
 n  size (bytes)   with calloc (s)          without calloc (s)
 0  0              0.00034580007195472717   0.00024830002803355455
 1  1              0.00039679999463260174   0.00025889999233186245
 2  1024           0.0008060999680310488    0.000635799951851368
 3  59049          0.0025179999647662044    0.0014845000114291906
 4  1048576        0.011856100056320429     0.2839431999018416
 5  9765625        0.011817399994470179     2.6696265999926254
 6  60466176       0.013045100029557943     15.101725699962117
 7  282475249      0.016800999990664423     74.59119629999623
 8  1073741824     0.03710949991364032      (I gave up here)
 9  3486784401     0.04992749996017665
10  10000000000    0.1916151000186801
11  25937424601    0.5652574999257922
```
someone should update the __init__ method so it uses calloc.
They mention a bug with the other person's change not detecting an existing memoryview, but that's solved easily by just checking for it using _canresize
i think i had this done once in a PR
maybe just a local change
Maybe it's because it's a little faster with smaller sizes
this seems to be only worth it when the size allocated exceeds 1 MB
Yeah
and there's not many cases where someone needs to allocate 1 MB with bytearray()... right?
Are there a lot of cases where they need to allocate a large number of small bytearray?
Either way, allocating a large one is still a one-time thing, so I guess it shouldn't matter much
do/while will always run at least once because the condition is after the block.
it's kind of like chutes and ladders
I'm writing a Bytecode -> Bytecode transform, and trying to support nonlocal variables.
My idea was to replace all freevar LOAD_DEREF instructions with a chain of instructions that does fn.__closure__[instr.arg - len(code.co_cellvars)].cell_contents, where fn is the Python function object that is currently executing, and code is the code object being translated. Then, I turn all cellvars into local variables.
The replacing is actually sound. It's what CPython does under the hood, so I'm just replacing one instruction with many.
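For reference, the cell that a free-variable load reads can be reached from Python through the function's `__closure__`, whose entries line up with `__code__.co_freevars`. Looking the index up by name, as below, sidesteps the instruction-argument arithmetic; as far as I know the `dis` docs describe the `LOAD_DEREF` argument differently from 3.11 on (as a slot in the combined "fast locals" storage), so `instr.arg - len(code.co_cellvars)` is worth double-checking against the target version. `outer` and `inner` are illustrative names.

```python
def outer():
    x = 10

    def inner():
        return x  # x is a free variable of inner

    return inner


fn = outer()
code = fn.__code__
print(code.co_freevars)  # ('x',)

# __closure__ holds one cell per name in co_freevars, in the same order.
idx = code.co_freevars.index("x")
value = fn.__closure__[idx].cell_contents
print(value)  # 10
```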
What I'm worried about is what happens if I decide to empty the list of cellvars and freevars. Am I allowed to do that? Will something in python function creation or execution go wrong if I don't leave a proper trail of freevars and cellvars?
you probably need to make sure that the relevant fields on the code object match reality
Probably
But, even if there are no STORE_DEREF/LOAD_DEREF instructions in the bytecode?
A related follow-up question that I just thought of. In python 3.11+, is it okay if there's a nonlocal variable and a local variable with the same name?
it isn't possible. if you generate your own bytecode and code objects you can probably make it work, but it will be fragile
Is there some C memcpy equivalent for Python's bytearrays?
```py
buffer = bytearray(16)
data = b'1234'
buffer[:8] = data  # len is now 12 (8 bytes replaced by 4), it should still be 16
```
that code is replacing the first 8 bytes with data, which is only 4 bytes long, so the buffer shrinks; you need to adjust your slice
!e
```py
data = b'1234'

# what you did:
buffer = bytearray(16)
buffer[:8] = data
print('replace first 8 with data:', buffer)

# replace first 4
buffer = bytearray(16)
buffer[:4] = data
print('replace first 4 with data:', buffer)

# replace last 4 of first half
buffer = bytearray(16)
buffer[4:8] = data
print('replace last 4 of first half with data:', buffer)
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | replace first 8 with data: bytearray(b'1234\x00\x00\x00\x00\x00\x00\x00\x00')
002 | replace first 4 with data: bytearray(b'1234\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
003 | replace last 4 of first half with data: bytearray(b'\x00\x00\x00\x001234\x00\x00\x00\x00\x00\x00\x00\x00')
how so?
Like how pandas has doc decorators that transform and append docstrings between various methods and functions. You could instead define types that are annotated with a doc and bake it into the type annotation
pandas/util/_decorators.py line 408
```py
class Substitution:
```
pandas/util/_decorators.py line 455
```py
class Appender:
```
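A rough sketch of the "annotate the type with a doc" idea using only the standard library. `Doc` here is a hypothetical local class standing in for whatever marker a standard would define (it is not `typing_extensions.Doc`), and `param_docs` is a hypothetical helper showing how a tool could read the docs back out of the annotations:

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:  # hypothetical stand-in for a standardized doc marker
    text: str


def greet(name: Annotated[str, Doc("Who to greet.")]) -> str:
    return f"Hello, {name}!"


def param_docs(func):
    """Collect Doc(...) metadata from a function's Annotated hints."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    docs = {}
    for param, hint in hints.items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Doc):
                docs[param] = meta.text
    return docs


print(param_docs(greet))  # {'name': 'Who to greet.'}
```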
Editor developers (VS Code and PyCharm) have shown some interest, while showing concerns about the verbosity of the proposal, although not about the implementation (which is what would affect them the most). And they have shown they would consider adding support for this if it were to become an official standard. In that case, they would only need to add support for rendering, as support for editing, which is normally non-existing for other standards, is already there, as they already support editing standard Python syntax.
What does "support for rendering" mean? Editors can already render code
ah yes, so I need to align the length
buffer[:8] = data + b'\x00' * 4
I want to keep the buffer with the same length, but here this length alignment seems to be a waste
you can just write only the first 4 instead of the first 8 with the slice
buffer[:4] = data
then you don't need to align it
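A memcpy-style helper along those lines, as a sketch: `copy_into` is a hypothetical name, and it writes through a `memoryview`, whose slice assignment requires equal lengths, so it can never resize the bytearray the way the `buffer[:8] = data` assignment did.

```python
def copy_into(buffer: bytearray, offset: int, data: bytes) -> None:
    """Hypothetical memcpy-style helper: overwrite len(data) bytes of buffer
    starting at offset without ever changing len(buffer)."""
    end = offset + len(data)
    if offset < 0 or end > len(buffer):
        raise ValueError("data does not fit in buffer at this offset")
    # Slice assignment on a memoryview requires equal lengths, so it can
    # never grow or shrink the underlying bytearray.
    with memoryview(buffer) as view:
        view[offset:end] = data


buffer = bytearray(16)
copy_into(buffer, 0, b"1234")
copy_into(buffer, 8, b"abcd")
print(len(buffer), buffer)
# 16 bytearray(b'1234\x00\x00\x00\x00abcd\x00\x00\x00\x00')
```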
rendering parameter documentation when showing a tooltip for a function
Ah, ok
I thought of the annotated docstring a while back and the editor support has been the main thing that came to mind, the code would look a bit too busy imo with something longer on the docstring and if the editor couldn't collapse it
though the doc call looks a bit weird when everything else you'd see in annotations uses brackets instead of parentheses
ah yes you are right haha, thanks
I have a feature proposal to improve Python as a language: an asynchronous/multithreaded for loop.
It occurred to me that the majority of the "for item in list" loops in the code I have optimised are processes that can run independently. This appears to be the general case with for loops, but not with while loops. Therefore, it would be convenient for a user to have a built-in keyword like mfor (multi-threaded for) or afor (async for). Alternative syntax could be something like "for item in list.async()". Await() and join() functions will be necessary for these loops.
This will be convenient for developers and beginners alike, and should allow users to speed up loops in many instances with minimal code.
Also, if the compiler notices that only mathematical operations are happening inside a loop, you can have the compiler send it to the GPU if CUDA is available
Unfortunately, due to the GIL, this is mostly useless. Unless the for loop is doing IO, splitting it across threads will do just about nothing, even if each iteration is independent.
This is something best left to the .map methods on threadpools etc.
This particular case is better served by existing libraries like Numba and CuPy with explicit opt-in in hot spots (where initial costs can be paid at import rather than at observation by the interpreter). Initial warmup for code like this can cause unexpected performance losses if it were to happen in a short-lived application.
there's also Executor in concurrent.futures
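For the proposed `mfor`, the existing tool mentioned here already covers the common case. A minimal sketch with `concurrent.futures.ThreadPoolExecutor.map`; `process` is a placeholder for the loop body, and with the GIL this only pays off when the body spends its time waiting on I/O (for CPU-bound work, `ProcessPoolExecutor` offers the same `map` interface):

```python
from concurrent.futures import ThreadPoolExecutor


def process(item):
    # Placeholder loop body; imagine an I/O-bound call here.
    return item * item


items = list(range(10))

# Roughly what the hypothetical "mfor item in items: process(item)" would do:
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process, items))  # results keep input order

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```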
Does anyone know the book Beginning Python? How is it?
can I count on you?
there will not be a GIL in the near future
Sam Gross rather
I don't think we have a sticker of him
an mfor keyword for multi-threaded for loops still sounds like syntactic sugar for sending stuff to be run by a threadpool, which in turn means there'd have to be some trickery where the following indented block of code is secretly a function - it all sounds a little messy
it introduces a new scope
Didn't we just eliminate a scope for list comprehension?
one step forward, 2 steps back
we also added one for type parameters ๐
and listcomps still have their own scope, it's mostly just an implementation change
I would suggest just giving all for loops their own scope, but that would absolutely break things
if it had its own scope, with the same rules as classes or functions, that'd unfortunately break things as simple as
```py
count = 0
for i in range(10):
    if i % 2 == 0:
        count += 1
```
since `count` is now no longer a local variable within the loop
nah it could be a nonlocal
the bigger problem is the data race
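To make the scoping point concrete: if a loop body were secretly turned into a function (as a threadpool-backed `mfor` would presumably need), the `count` example only keeps working with an explicit `nonlocal`. A sketch, with `count_evens` and `body` as illustrative names:

```python
def count_evens(n):
    count = 0

    def body(i):
        # Without this line, "count += 1" makes count local to body and
        # raises UnboundLocalError.
        nonlocal count
        if i % 2 == 0:
            count += 1

    for i in range(n):
        body(i)  # sequential here; running bodies concurrently would race on count
    return count


print(count_evens(10))  # 5
```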
maybe we should introduce "weak scopes":
- if variable you are assigning to appears in surrounding scopes - use it
- if not - it is a local variable
```py
count = 0
for i in range(10):
    is_even = i % 2 == 0  # local
    if is_even:
        count += 1  # nonlocal
count  # some value
is_even  # error
```
i can't quite find the words, but such scoping rules sound a little.. arbitrary, i dunno, since it kinda lulls you into a false sense of security of loops having their own scope, until you accidentally shadow a variable name from an outer scope
agree
i like the idea, but at that point we're approaching just bringing in a let or var keyword into python (though such an idea does sound super interesting!)
there is already a thing that does the same thing as let/var: it is an annotation: x: T, it forces x to be a local variable
```py
>>> stuff = [5,6,7,8]
>>> def foo():
...     stuff: list
...     stuff.append(9)
...
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in foo
UnboundLocalError: cannot access local variable 'stuff' where it is not associated with a value
```
TIL
Hello, does anyone know the book Beginning Python?
and the walrus escapes, so you aren't even safe from edge cases by linting for shadowing of variables.
```py
>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> [(x:=i) for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x
9
```
I should email him and ask him if he’d be willing to pose for a sticker
local count similar to global count
I added an API function for bytearray which resizes it to an exact size without overallocating:
https://pastebin.com/dy31smce (lines 78 and 265)
https://pastebin.com/6ZjU96Mx
https://pastebin.com/2risAhc8
Is this good enough to submit as a pull request?
There's no way to do this in the current API
Anybody else looking forward to the “nogil” project? It’s going to be quite complex to keep reference counting in a threadsafe way. I know there were some advocating moving to a pure garbage-collected model, like Java. But then even a simple script can end up swallowing all of memory, if it is left running long enough. And we would like to avoid that.
Also I know some people want to avoid the name “Python 4.0”, after all the pain of the 2.x to 3.x transition. But I think the threadsafeness changes will have sufficient implications for backward (in)compatibility that calling the new version “4.0” would be a good idea. What do you think?
the core devs are dead set on not needing to call it 4.0
and to limit backwards incompatibility to things using the C API, as I understand. The intention is for nothing in the Python API to need to change.
I really don't think it will be anything like Python 2/3. Pure Python code will not need to change to support nogil; by contrast, pretty much every Python program had to be changed to support Python 3. C extension code will more often require changes and that migration will take a lot of ecosystem effort, but most people aren't writing C extensions
yeah. Extension module devs are a very important subset of the Python developer ecosystem, but they're a tiny portion - I'd wager that fewer than 1 in 1000 Python users ever interacts with the C API
Some of it will. Because there will be some Python code that assumes that pure-Python code will never execute concurrently on multiple threads; that assumption will be broken by removal of the GIL.
Hi guys, I saw a programming question named knight’s sequence. What is meant by a knight sequence? What is the 10-key sequence of a knight?
I think that will be declared incorrect code that should have been using locks. Changes to dict iteration also broke some incorrect code.
I think the more interesting compatibility break is in the other direction: New code that is developed on nogil Python, and runs (but with terrible performance) on older versions. We have these breaks whenever a new Python version comes out, but usually it's easy to tell (because the code just won't work on older versions).
Some C extensions may get this handled for them automatically by using HPy.
Things using Cython should get it for free, too.
Though there will still be separate gil/nogil ABI tags for a while, and that will need handling from each project
That assumption is already wrong though. You can already use threading and threads can switch at any point.
It's going to be pretty hard in Python code to distinguish two threads running truly concurrently (with nogil) or switching at arbitrary points (current situation)
So threading will make python faster like how it does in other languages?
Only if you add threads to your code.
And it will also make your code more prone to hard-to-debug race conditions
Compared to multiprocessing?
Yes, actually. Threads share more state than processes, so there's more ways to have data dependency bugs
people shouldn't have anything but the minimal necessary shared state (which is often "nothing") when doing things concurrently, and should use the appropriate strategy for guarding that shared state based on the concurrency patterns in use. But what people "should" do is very far away from a lot of real-world code.
Yes, I guess the practical effect of nogil is that it will become a lot more important to make pure-Python code threadsafe, even if technically what you need to do to achieve thread-safety isn't terribly different from the current world
This is not true. Threads can switch only if the GIL is dropped, which can (usually) happen after executing any bytecode instruction => threads can't switch in the middle of an instruction
For example, consider the operation x[::] = y. Let's assume that x and y are huge lists. This operation happens in C with the GIL held, so no other thread can do anything => this operation is atomic in some sense.
But if there is no GIL, other threads can do different things to x and y while the data is copied, which can result in broken data.
I can imagine a lot of code with these implicit assumptions about the atomicity of some operations. And some atomic operations (that happen in C code) will become non-atomic under noGIL, which will break code
You are generally correct that currently switching can happen in fewer points, but I'm not sure in practice that makes much of a difference. Under nogil the list would be locked during the execution of the slicing operation, so other code can't interfere with it.
also switching only happening between bytecode instructions or when the GIL is explicitly released doesn't help as much as you might assume, because all sorts of operations that look atomic can cause a Python __del__ method to fire, which runs new bytecode and gives new places where a context switch can happen.
!e
```py
class C:
    def __del__(self):
        print(x)

x = {1: C(), 2: C()}
x.update({1: 1, 2: 2})
```
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | {1: 1, 2: <__main__.C object at 0x7f75834eb2d0>}
002 | {1: 1, 2: 2}
the __del__ there sees an intermediate state where x is half-updated.
this is a persistent annoyance when writing C extensions: any Py_DECREF of an object of user-controlled type can cause a Python __del__ to run, and that __del__ could examine the state of your extension module or call into it (or more likely yield control to another thread that does), so the extension module dev needs to guarantee that their module and all of their objects are in a sane state that's OK to be exposed to users whenever they call Py_DECREF
Currently the switch only happens when you give up the GIL. In the future, there will be no GIL.
Can you say “Heisenbug”?
Python code doesn't explicitly give up the GIL though
^. Any Python code which breaks by changing this would have to have been relying on extremely subtle internal behaviors that aren't even guaranteed to be consistent across implementations, or on native code that was being called holding the GIL for them.
What does getting rid of it accomplish?
by "it" you mean the GIL? Greater parallelism for multi-threaded code
Will that do anything besides make multithreaded pure python faster? Because it seems like that wouldn't matter that much since it would still be slower than using Cython / C-extension.
it will make multithreaded C extensions faster as well
C extensions currently need to acquire the GIL whenever they want to create a Python object, store a reference to a Python object, call a Python callable, allocate memory with the Python allocators, inspect an object using the Python C API, etc. There's many things that extension modules can do today without holding the GIL, but there's also many that they can't, and so the GIL forces some operations to be done in serial rather than in parallel
Why isn't there a noGIL single-threaded Python build already?
Isn't that super simple? Just remove the GIL and any kinds of locks, forbid threading, and that's it. It is a way to get free performance in single-threaded apps
I heard that dropping and reacquiring GIL constantly is pretty slow
if "just remove gil" was simple we'd live in a different world
You have to stop holding it manually, right? So in a single threaded program, you can just hold it the whole time?
the GIL is very nice for single-threaded performance, because it means interpreter-internal data structures don't have to worry much about threading
that's why past attempts to remove the GIL tended to lead to huge performance regressions
in C code you release the GIL manually. In Python code you never interact with it directly
Just delete some lines from ceval.c, redefine some macros to expand to nothing...
I think this will work perfectly fine in single-threaded apps
Shouldn't it not affect single threaded python programs?
Just never release it. Then there's no overhead for releasing/acquiring it.
Actually, you can drop the GIL using ctypes, and then horrible things will happen
it actually used to work that way, up until (I think) Python 3.7 - the GIL used to be created on demand, the first time a thread was created (or maybe when something tried to acquire the GIL from a different thread)
releasing and reacquiring it isn't very expensive, honestly. Mutexes are pretty close to free when there's no contention.
So basically, it starts making a difference when running multithreaded C code which has a lot of interactions with Python
Or when running multithreaded pure python.
right, just in general: it allows multithreaded code to have greater parallelism
Best not make assumptions lest you rediscover the implications
hi
What is the best way of proposing a PEP? I mean, I have to attach some code snippet that adds the stuff to the source code, or at least try to explain it.
Just create a patch? or add the stuff on my local fork, and link that directly?
What are you proposing?
sorry, it's more like an extra function to asyncio, which was discussed before, and guido said he would sponsor it.
Oh the cancel thing
yes
Yeah so you probably need to double check with guido that he'll sponsor it, his email is guido@python.org I'm not entirely convinced it'd need a pep but I'd trust him
oh okay thank you
Hi. I want to contribute to python, is this a good place to get started with that? Or is my best bet joining the mailing list and looking for a mentor? I'm a student and want to learn more about how python works "under the hood." I've read some of the documentation about contributing.
Pick an issue on GitHub and check out the projects board first; you can just send a PR or fix small issues
(C++) How do I have multiple Python interpreters in one process?
I am using 3.12
I want to have a plugin system, where you can add files to a folder and they all will independently execute
Can be in sequence or threaded
Subinterpreters?
Read C-API about them
this ?
Yes
Modules have to be in sys.path in order to be imported
well I am providing my own modules
and disallowing any external C modules and such
no io no os ...
I already modified the python source code
Then remove all modules you dont want to be imported
Keep in mind that some modules cannot be removed
is PyImport_AppendInittab global? as in: every interpreter has the same modules from it?
IIRC, each subinterpreter has its own objects; objects cannot be part of two subinterpreter worlds
There is a paragraph about this in the docs
so can the subinterpreters call built-ins which I added?
if the threads will each be executing different files, do I still have to worry about the GIL?
nothing will be shared between threads
who here is an expert in creating tools using python?
do the threads end themselves after the script is done executing?
or do i need to manually check whether a thread has stopped execution
well, he said that he can't help through the whole process, so I might need some help.
If you're proposing to write the PEP and asking me to mentor you through the process, alas, I don't have time for that. In any case, it seems the OP in that thread is no longer interested -- perhaps you can add your opinion to the thread?
this is "the cancel thing"? Is there a thread somewhere about it? Maybe it doesn't need a full PEP.
yes, you can find one here: https://discuss.python.org/t/asyncio-cancel-a-cancellation-utility-as-a-coroutine-this-time-with-feeling/26304
I propose asyncio.cancel(task, *, msg=None) a novel utility for helping in cancelling tasks/futures. Why? I have seen too much code that calls task.cancel() and assumes immediate cancellation. With no await of the cancelled task or even a callback (yuck) The first thing one needs to do after calling cancel() on the task is to await the task....
it's not that hard, but it seems like everyone gave up working on it
so it can be implemented in tasks.py
i see, Guido suggested a PEP. Hmm, that's trickier then.
and he commented twice in the last hour
oh, I just refreshed the page.
I see
is the ordering of the bytes passed in concurrent calls to socket.sendall() preserved?
looks like it is not https://github.com/python/cpython/blob/6f97eeec222f81bd7ae836c149872a40b079e2a6/Modules/socketmodule.c#L4384-L4449 sendall() is done by consecutive calls to send(); this loop is not atomic
having multiple threads access any mutable object without synchronization is a bug, unless that object is explicitly documented as being thread safe (like queue.Queue is)
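Given that, one way to keep concurrent `sendall()` calls from interleaving is to serialize them yourself. A minimal sketch: `LockedSender` is a hypothetical wrapper, and `socket.socketpair()` is used only so the example runs without a network.

```python
import socket
import threading


class LockedSender:
    """Hypothetical wrapper that serializes sendall() calls on one socket."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()

    def sendall(self, data):
        # Only one thread at a time may be inside sendall(), so each
        # message goes out as one contiguous run of bytes.
        with self._lock:
            self._sock.sendall(data)


a, b = socket.socketpair()
sender = LockedSender(a)
messages = [bytes([ord("A") + i]) * 1000 for i in range(4)]
threads = [threading.Thread(target=sender.sendall, args=(m,)) for m in messages]
for t in threads:
    t.start()
for t in threads:
    t.join()
a.close()

received = b""
while chunk := b.recv(65536):
    received += chunk
b.close()

# Each 1000-byte block should consist of a single repeated letter.
blocks = [received[i:i + 1000] for i in range(0, len(received), 1000)]
print(len(received), [blk[:1] for blk in blocks])
```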
list.append isn't marked as thread safe (at least here). Are you telling me it isn't guaranteed to be?
From past experience, it isn’t thread safe
!d collections.deque
```py
class collections.deque([iterable[, maxlen]])
```
Returns a new deque object initialized left-to-right (using [`append()`](https://docs.python.org/3/library/collections.html#collections.deque.append)) with data from *iterable*. If *iterable* is not specified, the new deque is empty.
Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.
Though [`list`](https://docs.python.org/3/library/stdtypes.html#list) objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for `pop(0)` and `insert(0, v)` operations which change both the size and position of the underlying data representation.
This was my solution
IIRC, "accepted" means the PEP's been accepted and is being worked on, and "final" is for once said PEP has been added and, well, finalised
https://peps.python.org/pep-0001/#pep-review-resolution
Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and incorporated into the main source code repository, the status will be changed to “Final”.
oh so it's not completed yet, but the idea seems good.
is there a pep about descriptors as well?
in practice we're not very good at moving peps from accepted to final
what does that mean? if a pep is accepted it needs to be implemented so this is how it can go into final?
Yes. As an implementation detail of current versions of CPython it happens to be, but that's not a guarantee that the language, or even the implementation, makes
It would be perfectly correct for a future version of CPython to change list.append in a way where if it was called concurrently from different threads, only one of the two items winds up being added in the end, for instance
that will break a lot of code
Depending on the version of Python you're running, += on an int isn't atomic. Here's test.py:
```py
from concurrent.futures import ThreadPoolExecutor

x = 0

def increment_x_n_times(n):
    global x
    for i in range(n):
        x += 1

with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(10):
        executor.submit(increment_x_n_times, 100_000)

print(x)
```
```shell-session
$ python3.9 test.py
360530
$ python3.9 test.py
655863
$ python3.11 test.py
1000000
$ python3.11 test.py
1000000
```
The fact that int += is atomic in some versions and not in others wasn't documented anywhere - if you look at the "what's new in Python 3.10" and "what's new in Python 3.11" pages, you won't find this mentioned anywhere - because it's an implementation detail that's subject to change.
and because neither behavior was documented, it wouldn't be surprising if this changes back in some future version, or behaves differently in some other Python implementation
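A sketch of the usual fix for the counter example: guard the read-modify-write with a `threading.Lock`, so the result no longer depends on whether a given interpreter version happens to make `x += 1` atomic. The count is reduced here to keep it quick:

```py
import threading
from concurrent.futures import ThreadPoolExecutor

x = 0
x_lock = threading.Lock()


def increment_x_n_times(n: int) -> None:
    global x
    for _ in range(n):
        with x_lock:  # only one thread at a time may read, add and store x
            x += 1


with ThreadPoolExecutor(max_workers=10) as executor:
    for _ in range(10):
        executor.submit(increment_x_n_times, 10_000)

print(x)  # 100000 on any version
```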
Interesting. We use list.append with threads in production code, but it's a list of errors that is (hopefully) only rarely appended to (and if an entry went missing, it didn't really matter), so I think we're fine.
You use threading in production?
Threading has always been an unstable mess for me, what are you using to control the threads?
A ThreadPoolExecutor. The threads are pretty short-lived.
"unstable"? that's an interesting take. I think threading is much less prone to subtle breakage than either multiprocessing or coroutine-based event loops like asyncio. multiprocessing is prone to subtle performance issues due to serializing data to send between processes as well as weird edge conditions (like, what happens if a process in the pool gets killed by the OOM killer while holding a multiprocessing lock?). coroutine-based event loops allow you to easily block the event loop and prevent parallelism without realizing you've done so
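To illustrate the last point, here is a small sketch. Three coroutines that each wait 0.1 s run concurrently with `asyncio.sleep`, but a blocking `time.sleep` inside a coroutine stalls the whole event loop, so the coroutines run one after another. Timings are approximate:

```py
import asyncio
import time


async def blocking_worker() -> None:
    time.sleep(0.1)  # blocks the event loop: nothing else can run meanwhile


async def cooperative_worker() -> None:
    await asyncio.sleep(0.1)  # suspends, letting the loop run other coroutines


async def run_three(worker) -> float:
    start = time.perf_counter()
    await asyncio.gather(worker(), worker(), worker())
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(run_three(blocking_worker))        # roughly 0.3 s
cooperative_elapsed = asyncio.run(run_three(cooperative_worker))  # roughly 0.1 s
print(f"blocking: {blocking_elapsed:.2f}s, cooperative: {cooperative_elapsed:.2f}s")
```

Nothing warns you about the blocking version. Moving the call into a thread, for example with `await asyncio.to_thread(time.sleep, 0.1)` on Python 3.9+, restores the overlap.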
/avatar @dull ferry
yes
image compression creates unreadable strings. and your </> covers data
but offtopic so move to #ot0-psvm’s-eternal-disapproval
is this a good way of testing this code?
def test_task_cancel_and_await(self):
    # phase 1
    async def coro():
        t = self.new_task(self.loop, asyncio.sleep(1))
        await asyncio.cancel_and_await(t)
        self.assertTrue(t.cancelled())
    self.loop.run_until_complete(coro())
I haven't seen any section specifically for testing asyncio code (in the CPython source code).
Should I just put it under the async BaseTask class?
it seems that some people still don't understand that this channel is for CPython internals; what we can do is tell them aggressively not to do it.
strictly speaking, internals of non-cpython python flavors might also fit here - they just don't usually (ever?) come up
We've talked occasionally about pypy internals here
MicroPython and CircuitPython, too, actually
Tips and tricks on python programming, could be useful for beginners -- https://book.pythontips.com/en/latest/exceptions.html
cpython gc question. for reasons, i have gc off.
this code https://github.com/redis/redis-py/blob/e3de026a90ef2cc35a5b68934029a0ef2a5b2f53/redis/connection.py#L515 seems to raise (and later handle) an exception. but because the exception is stored in a local, i think it's keeping everything i care about in my code alive.
redis/connection.py line 515
if isinstance(response, ResponseError):
are my only options: a) gc.collect, or b) in every frame that could be a parent of this delete any local that's costly to keep around?
is there an easy change i can make to redis to prevent the cycle?
You can delete your exception instance, it will decref all frames, and your locals will be deallocated
If you really need to store the exception, you can create a copy of it, like this: type(e)(*e.args) (I'm not sure).
And you can probably remove the frame references from the exception manually
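A sketch of stripping the frame references before storing an exception, using `BaseException.with_traceback(None)`. `saved_errors`, `Payload` and the `inner`/`outer` functions are hypothetical stand-ins. A weak reference shows whether the caller's local survives:

```py
import weakref


class Payload:
    """Hypothetical stand-in for an expensive local in a calling frame."""


saved_errors: list = []  # hypothetical place where exceptions are kept for later


def inner(strip_traceback: bool) -> None:
    try:
        raise ValueError("boom")
    except ValueError as e:
        if strip_traceback:
            # with_traceback(None) clears e.__traceback__ and returns e itself,
            # so the stored exception no longer references any frame.
            e = e.with_traceback(None)
        saved_errors.append(e)


def outer(strip_traceback: bool) -> "weakref.ref[Payload]":
    payload = Payload()
    inner(strip_traceback)
    return weakref.ref(payload)


def payload_alive(strip_traceback: bool) -> bool:
    saved_errors.clear()
    ref = outer(strip_traceback)
    alive = ref() is not None
    saved_errors.clear()
    return alive


print(payload_alive(strip_traceback=False))  # True: saved error -> traceback -> frames
print(payload_alive(strip_traceback=True))   # False
```

Without the traceback, nothing points back at `inner`'s frame, so `outer`'s locals are freed as soon as it returns.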
i don't have access to the exception instance. it's handled internally within redis. the only way i know of its existence is by following the gc.get_referrers to see what's keeping my locals alive
https://github.com/redis/redis-py/blob/e3de026a90ef2cc35a5b68934029a0ef2a5b2f53/redis/connection.py#L372
redis/connection.py line 372
self.read_response()
but there's something i don't understand, since i'm not able to make a minimal repro that exactly corresponds
the function you linked is a little weird in that the exception is apparently returned from something it's calling?
yeah, there's a bunch of exception returning. i believe (but am not 100%) that this is where the exception is created: https://github.com/redis/redis-py/blob/e3de026a90ef2cc35a5b68934029a0ef2a5b2f53/redis/_parsers/base.py#L88
redis/_parsers/base.py line 88
return ResponseError(response)
redis/_parsers/resp2.py line 43
return error
then returned in another function, then actually raised at https://github.com/redis/redis-py/blob/e3de026a90ef2cc35a5b68934029a0ef2a5b2f53/redis/connection.py#L516
redis/connection.py line 516
raise response
redis/connection.py line 373
except ResponseError:
I tried this and now I'm wondering what this list could possibly be:
```
>>> import gc
>>> gc.disable()
>>> def inner():
...     e = Exception()
...     try:
...         raise e
...     except: pass
...
>>> def outer():
...     x = ["special string"]
...     inner()
...
>>> outer()
>>> print([x for x in gc.get_objects() if type(x) is list and "special_string" in x])
[[b'print', 'print', b'(', 'print', b'[', b'x', 'x', b'for', 'x', 'x', 'x', b'x', 'x', b'in', 'x', b'gc', 'gc', b'.', b'get_objects', 'get_objects', b'(', b')', b'if', 'gc', b'type', 'type', b'(', b'x', 'x', b')', 'x', 'x', 'x', b'is', 'type', b'list', 'list', b'and', 'list', b'"special_string"', 'special_string', b'in', b'x', 'x', b']', 'x', b')', 'x', 'x', b'', 'print', 'print', 'print', 'print', 'print']]
```
oh this actually works, I just used an underscore instead of a space
>>> gc.disable()
>>> def inner():
...     e = Exception()
...     try:
...         raise e
...     except: pass
...
>>> def outer():
...     x = ["special string"]
...     inner()
...
>>> outer()
>>> print([x for x in gc.get_objects() if type(x) is list and len(x) == 1 and "special string" in x])
[['special string']]
I think there is a reference cycle between the exception and the frame locals for inner
and that's leaving the frame and locals for outer alive
putting finally: del e in inner fixes it
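The cycle and the `del e` fix can be checked with a weak reference. A sketch, with `Payload` as a hypothetical stand-in for an expensive local and the automatic collector disabled, as in the thread:

```py
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a local that is expensive to keep alive."""


def inner(cleanup: bool) -> None:
    e = Exception()
    try:
        raise e  # e.__traceback__ now references this frame
    except Exception:
        pass
    finally:
        if cleanup:
            del e  # break the cycle: frame locals -> e -> traceback -> frame


def outer(cleanup: bool) -> "weakref.ref[Payload]":
    payload = Payload()
    inner(cleanup)
    return weakref.ref(payload)


def payload_survives(cleanup: bool) -> bool:
    was_enabled = gc.isenabled()
    gc.disable()  # from here on, only reference counting frees objects
    try:
        ref = outer(cleanup)
        return ref() is not None
    finally:
        gc.collect()  # the cyclic collector can still reclaim the leftover cycle
        if was_enabled:
            gc.enable()


print(payload_survives(cleanup=False))  # True: kept alive via inner's frame -> outer's frame
print(payload_survives(cleanup=True))   # False
```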
hmm i was doing something similar, but was doing print(gc.get_referrers(x)) in outer which ends up being empty
I think there is a reference cycle between the exception and the frame locals for inner
Right, the exception `e` has a reference (via its `__traceback__`) to the most recent frame, which has a reference to that frame's locals, including `e`, so that's your reference cycle. And the most recent frame also holds a reference to the calling frame, so that's what's keeping `x` alive
So:
most recent frame -> locals() -> e -> most recent frame
most recent frame -> second most recent frame -> locals() -> x
Yes. And I think gc.get_referrers doesn't work at that point because the reference is still owned by the frame, which doesn't participate in GC because the interpreter knows how to dispose of the reference
But after the function returns, the locals survive in a frame object. If I do gc.get_referrers() on the list afterwards, I see a reference from a frame object
[<frame at 0x101b55010, file '<stdin>', line 3, code outer>, [['special string']]]
okay, thanks, the gc.get_referrers behaviour was what i didn't understand / confused me a little
i also don't know the answer to my original question: 1) is there anything i can do to resolve the cycle other than gc.collect(), 2) is there an easy change to redis that would remove the cycle? seems like i can't really weakref anything
the change to redis would be to do try: raise response finally: del response
possibly you can fix it in user code by finding the cycle objects by trawling through gc.get_referents and then mutating something so that the cycle goes away? that would be very fragile though
thank you!
is it a good approach in CPython documentation to use Links if I want to redirect the user to a category?
Or should I go with this:
:meth:`cancel() <asyncio.Task.cancel>`
so
:meth:`cancel() <asyncio.Task.cancel>`
vs
`cancelled <https://docs.python.org/3/library/asyncio-task.html#task-cancellation>`_
>>> from enum import Enum
>>> class X(Enum):
...     a = 1
...     b = 1.0
...
>>> X.a
<X.a: 1>
>>> X.b
<X.a: 1>
>>> X.a is X.b
True
is this the expected behaviour?
Hmm, the documentation is suspiciously vague about this
but I assume every item that compares equal to 1 will go as 1?..
I guess there's a hint here: https://docs.python.org/3/howto/enum.html#duplicatefreeenum that aliases are considered by equality
but that's an uh
stretch
Though why wouldn't this be expected behaviour?
Would you expect it to explicitly differentiate values based on type?
Maybe a bit implementation-specific, but there's a thing called _value2member_map_, which is a dictionary
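A sketch of why `b = 1.0` becomes an alias. `1`, `1.0` and `True` compare equal and hash the same, so a dict keyed by value (which is what a value-to-member map is) sees only one key. `enum.unique` can turn such accidental aliases into an error:

```py
from enum import Enum, unique

# 1, 1.0 and True are equal and hash the same, so a dict treats them as one key:
# the first key object is kept and the last value wins.
d = {1: "a", 1.0: "b", True: "c"}
print(d)  # {1: 'c'}


class X(Enum):
    a = 1
    b = 1.0  # equal to 1, so it becomes an alias of X.a


print(X.b is X.a)           # True
print(list(X))              # [<X.a: 1>]  (iteration skips aliases)
print(list(X.__members__))  # ['a', 'b']  (__members__ still lists them)

rejected = False
try:
    @unique
    class Y(Enum):
        a = 1
        b = 1.0
except ValueError as exc:  # unique() refuses enums that contain aliases
    rejected = True
    print(exc)
```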
>>> class X(IntEnum):
...     a = True
...     b = 1
...     c = 1.0
...
>>> X.a, X.b, X.c
(<X.a: 1>, <X.a: 1>, <X.a: 1>)
>>>
>>> class X(Enum):
...     a = True
...     b = 1
...     c = 1.0
...
>>> X.a, X.b, X.c
(<X.a: True>, <X.a: True>, <X.a: True>)
well that's a certified bruh moment
i was expecting that for a moment
but i dont think this feature would be very useful
IntEnum is a subclass of int, so it calls int(True), int(1), int(1.0) somewhere internally, and all of this becomes just 1
i think enums in python are overengineered
yeah, there is no real way for those values to be an integer subtype, due to the way the inheritance works out (which is an interesting limitation of inheritance I think may be worth expanding on)
help me guys
please open a help thread: #❓｜how-to-get-help
Enum members can be arbitrary values
I think by default equal values are aliased too
but i agree that the standard enums are a bit odd
>>> nan = float('nan')
>>>
>>> class X(Enum):
...     a = nan
...     b = nan
...
>>> X.a
<X.a: nan>
>>> X.b
<X.a: nan>
>>> X.a is X.b
True
>>>
>>> class X(Enum):
...     a = float('nan')
...     b = float('nan')
...
>>> X.a
<X.a: nan>
>>> X.b
<X.b: nan>
>>> X.a is X.b
False
there is a shortcut in most builtin collections: they first check for identity, and only then for equality
ah yes it probably uses a dict for the lookups
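The identity shortcut explains the NaN results. `x in container` and dict lookups check `is` before `==`, so the same NaN object is found even though NaN never equals itself:

```py
nan = float("nan")

print(nan == nan)                      # False: NaN never equals itself
print(nan in [nan])                    # True: same object, so the identity check wins
print(float("nan") in [float("nan")])  # False: two distinct NaN objects

d = {nan: "found"}
print(d.get(nan))                      # found
print(d.get(float("nan")))             # None
```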
Does anybody have a favorite library to use for cpython bytecode/CodeType assembly?
I'm at a point where I have a list of valid dis.Instructions, but I need to populate co_codestring
you mean co_code? (which is a bytes array of opcodes)
Ah, cool.
I'm going to double-check, but I think it legitimately was co_codestring very briefly in 3.10.
But yeah
co_code
It may just be the CodeType constructor that calls it that.
Thank ya, looks like bytecode was what I was looking for.
hmm, it is indeed called __codestring in typeshed (the place where all typed signatures of stdlib live)
Wack
Very fun
We already have a big series of if/elif statements on version, so not that big a deal.
Perfectly willing to fill up a screen or two.
there are other inconsistencies in naming and ordering in __init__ and replace
I'd accept a typeshed PR to make them match. It probably doesn't matter practically though because the names of the pos-only args to __init__ are ignored and the order of the kw-only arguments to replace doesn't matter
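For reference, here is a sketch of `CodeType.replace`. It takes keyword-only arguments (so their order doesn't matter) and returns a new code object. Swapping a constant this way is only for illustration:

```py
def greet() -> str:
    return "hello"


code = greet.__code__
# replace() takes keyword-only arguments and returns a new code object.
new_code = code.replace(
    co_consts=tuple("goodbye" if c == "hello" else c for c in code.co_consts)
)
greet.__code__ = new_code  # allowed because the number of free variables matches

print(greet())  # goodbye
```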
Are they supposed to match 1-1?
no, but it would be nice
many people are saying this
In what way and why is that bad
They change a lot from one version to another, which causes pain around releases and upgrades
Yes, absolutely
they are... way too extensible, or whatever
Gotcha. I think they're weird to be classes too. Like the way we define them, they're nothing like any other class def and it's difficult to tell what you can and cannot do
I also do not understand how enum.property works at all. Like this makes no sense to me
Note the property and the member must be defined in separate classes; for example, the value and name attributes are defined in the Enum class, and Enum subclasses can define members with the names value and name.
This might sound crazy, but with typing being so popular, we could just use strings or whatever
Like
```py
ColorChannel = Literal["red", "green", "blue"]
```
Well, one thing it doesn't let you do is iterate over the members
actually
!e
from typing import get_args, Literal
ColorChannel = Literal["red", "green", "blue"]
print(get_args(ColorChannel))
@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.
('red', 'green', 'blue')
and if you want this https://docs.python.org/3.12/howto/enum.html#enum-dataclass-support
maybe you're doing it wrong
from __future__ import annotations
from re import Match
from typing import AbstractSet, Literal

Flag = Literal["ignore_case", "multiline", "verbose"]
def re_search(pattern: str, haystack: str, flags: AbstractSet[Flag] = frozenset()) -> Match | None:
    ...
re_search("^[0-9]+ # yo are those digits??", "foo\n123\nbar", {"multiline", "verbose"})
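The snippet above only declares the signature. Here is a sketch of a body, assuming a hypothetical `_FLAG_MAP` that translates the literal names to the real `re` flags:

```py
from __future__ import annotations

import re
from typing import AbstractSet, Literal, get_args

Flag = Literal["ignore_case", "multiline", "verbose"]

# Hypothetical translation table from the string literals to real re flags.
_FLAG_MAP = {
    "ignore_case": re.IGNORECASE,
    "multiline": re.MULTILINE,
    "verbose": re.VERBOSE,
}
assert set(_FLAG_MAP) == set(get_args(Flag))  # keep the table in sync with the Literal


def re_search(
    pattern: str, haystack: str, flags: AbstractSet[Flag] = frozenset()
) -> re.Match[str] | None:
    combined = 0
    for name in flags:
        combined |= _FLAG_MAP[name]
    return re.search(pattern, haystack, combined)


m = re_search("^[0-9]+ # yo are those digits??", "foo\n123\nbar", {"multiline", "verbose"})
print(m.group() if m else None)  # 123
```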
I think the point is you define the property in an "abstract enum" class, a class that inherits from Enum but doesn't define members
then to define your actual enum, you inherit from that abstract class and then you can have members with the same name
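Here is a sketch of that rule using the built-in `name` and `value` attributes. They are defined on `Enum` itself, so a subclass can still use those names for members:

```py
from enum import Enum


class Field(Enum):
    # `name` and `value` are defined on Enum itself, so members may reuse them here.
    name = 1
    value = 2


print(Field.name)         # Field.name  (the member)
print(Field.name.name)    # name        (the member's name attribute)
print(Field.name.value)   # 1
print(Field.value.value)  # 2
```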
enums arent powerful enough
they cant represent values with two orthogonal properties
for example, my things are either red or green, and they can be round or rectangular
there is no way to represent that as enum conveniently
!e
import enum
class Abs(enum.Enum):
    @property
    def prop(self): return 42
class E(Abs):
    prop = 3
print(E.prop.prop)
a pair of Color and Shape?..
which might be enums
@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.
42
wait I didn't even use enum.property
lmao
||bottom text||
Maybe eventually there will be enum2, which will be an adaptation of Rust enums. Which will also cover the trivial case
a.k.a. algebraic data type, union type, sum type, discriminated union, tagged union, variant record, sealed traits, disjoint union, variant, choice type, coproduct, disjoint coproduct, tagged variant, product dual, tagged product dual, discriminated coproduct, or intuitionistic logical disjunction under the Curry–Howard correspondence
my use case was a bit more complex
i had 5 file types:
1LangCache
2Lang     2Cache
HDLang    HDCache
rows represent one property, columns another
`1LangCache` has both column properties
i wanted to represent these 5 values as enum (or enumflag, i dont remember) in such a way, that i will be able to check if value has 1st or 2nd property, but was unable to do that
i wanted an API that looks like this:
```py
e = MyEnum(...)
e.is_lang()
e.is_cache()
e.is_1()
e.is_2()
e.is_hd()
```
I mean, it's not hard to do. The hard part is convincing all the editors and type checkers to understand them
i could create 5 distinct enum values, and in each method perform manual check, but i didnt like that
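One way to get that API with `enum.Flag` is sketched below. It assumes each file type can be modeled as a combination of one bit per property. `ONE`, `TWO`, `HD`, `LANG` and `CACHE` are hypothetical names for the rows and columns of the table above:

```py
from enum import Flag, auto


class FileKind(Flag):
    # One bit per property (hypothetical names for the table's rows and columns).
    ONE = auto()
    TWO = auto()
    HD = auto()
    LANG = auto()
    CACHE = auto()

    # The five concrete file types as combinations of those bits.
    ONE_LANG_CACHE = ONE | LANG | CACHE
    TWO_LANG = TWO | LANG
    TWO_CACHE = TWO | CACHE
    HD_LANG = HD | LANG
    HD_CACHE = HD | CACHE

    def is_lang(self) -> bool:
        return FileKind.LANG in self

    def is_cache(self) -> bool:
        return FileKind.CACHE in self

    def is_1(self) -> bool:
        return FileKind.ONE in self

    def is_2(self) -> bool:
        return FileKind.TWO in self

    def is_hd(self) -> bool:
        return FileKind.HD in self


e = FileKind.ONE_LANG_CACHE
print(e.is_lang(), e.is_cache(), e.is_1(), e.is_hd())  # True True True False
```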
I don't think I understand at all
The hard part is convincing all the editors and type checkers to understand them
just dont do weird trickery with them, and they will be able to infer everything from stubs
Well, I can't do this without trickery
i suck at expressing my thoughts in english...
and I suck at understanding thoughts in english
If you have two orthogonal properties, why can't you like, unite them into a tuple?
because there is a thing that is red and green at the same time
i cant do (Color.RED & Color.GREEN, Shape.ROUND)
({"red", "green"}, {"round"})
you want something that fits a pattern except for special cases and want it to be elegant to implement?
or maybe i can with EnumFlag?
({Color.RED, Color.GREEN}, {Shape.ROUND})
yes you can, Color.Red | Color.Green
yes
I thought you were complaining that enums are too complicated, no?
i find myself just refactoring that away when possible 
I just add ifs
makes my job veri secure
Like that file with a Cyrillic с in the name instead of a Latin c.
True story.
Someone probably did break-dancing on their keyboard and accidentally inserted the wrong c into a file.
why do you need trickery?
```py
class Color(enum2.Enum[int]):  # can hold only ints
    red = 0
    green = 1
    blue = 2
Color.red  # 0
Color(1)  # 1 (new returns value of the same type)
0 in Color  # True
```
Then they imported said file with auto-import, very convenient.
they are complicated, they are weak, and they suck
all at once ๐
Then someone "fixed" the filename when transferring the code to a different VCS/repo/whatever
but of course did not fix the import
and so when someone attempted to build the code on the new system, there was a very nasty error message like No module named the_foo_and_с_things
It was there, just with a latin c
Now that's where VSCode's highlighting of suspicious characters would help. But everyone who has ever opened mixed cyrillic/latin text in VSCode just disables it
anyway...
I meant like Rust enums
oh, that is indeed hard to do without magic
...where each variant could hold some data, like this in Rust: `enum Color { Rgb { red: u8, green: u8, blue: u8 }, Rgba { red: u8, green: u8, blue: u8, alpha: u8 }, Variable { name: String }, }`
I usually just make a union of dataclasses, but it's kinda verbose and might be a bit WTF-ish to the reader
ADTs are nice. I'm not sure they should be the same concept as enums though
I mostly use enums for things I want to make sure can go in an integer column in a database
Yeah, I think Rust has some unfortunate naming here
what I meant was, if we had such a feature, it would technically also cover the trivial case of
```rs
enum Color { Red, Green, Blue }
```
data 
Allow me to introduce you to https://www.postgresql.org/docs/current/datatype-enum.html
how would you store values inside enum members?
i can think of this:
```py
class Color(...):
    rgb: tuple[int, int, int]
    rgba: tuple[int, int, int, int]

c = Color(...)
# how to check if it is a rgb kind?
c.kind == Color.rgb
# how to get value from it?
c.value == (1,2,3)  # ?

class IPAddrKind(...):
    V4, V6

k = IPAddrKind(...)
if k.kind == IPAddrKind.V4:
    print('v4 detected')
    k.value  # what is this? None? AttributeError?
```
i think i dont quite understand rust enums right now
calling them enums was a mistake IMO, it is pretty confusing
something like
```py
class Color(ADT):
    @case
    def rgb(r: int, g: int, b: int): ...
    @case
    def rgba(r: int, g: int, b: int, alpha: int): ...
    @case
    def variable(name: str): ...

color1 = Color.rgb(255, 0, 255)
color2 = Color.variable("foo")
```
though this is still pretty verbose, idk
can rust "enums" have methods?
if no, you can omit @case
class Color(ADT):
    class Rgb(Case):
        r: int
        g: int
        b: int
    class Rgba(Case):
        r: int
        g: int
        b: int
        alpha: int
    class Var(Case):
        name: str
it's complicated, but yes they can, but rust is a very different language
I mean, it's pretty much as verbose as any current alternative ๐คท
how would this work at runtime?
- `Color.Rgb(1,2,3)`: what is this object? Is it an instance of `Color` or `Rgb`? Or both?
- how to check if my thing is of kind `Color.Rgb`?
class IPAddrKind(ADT):
    class V4(Case): pass
    class V6(Case): pass
Well, it almost works now if you remove the inheritance. Only minor touches are needed, like:
- make it impossible to inherit from `Color`
- swap the bare classes so that `Rgb`, `Rgba` and `Var` all inherit from `Color`
- dataclassify the classes
So it would desugar to something like
```py
class Color:
    pass

@dataclass(frozen=True)
class __Rgb(Color):
    r: int
    g: int
    b: int

@dataclass(frozen=True)
class __Rgba(Color):
    r: int
    g: int
    b: int
    alpha: int

@dataclass(frozen=True)
class __Var(Color):
    name: str

Color.Rgb = __Rgb
Color.Rgba = __Rgba
Color.Var = __Var
prohibit_further_subclasses(Color)
```
maybe cases with no fields should become singletons
well, that's bikeshedding
prohibit_further_subclasses is doable by patching cls.__flags__ (there is some flag for final classes, that is why you cant subclass NoneType or bool)
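Here is a pure-Python stand-in for `prohibit_further_subclasses`. It uses an `__init_subclass__` hook instead of patching `__flags__`, and `_sealed` is a hypothetical attribute name. It also answers the runtime question above: a case instance is both a `Color` and an `Rgb`:

```py
from dataclasses import dataclass


class Color:
    """Base of a hypothetical sealed union: only the cases below may subclass it."""

    _sealed = False

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if Color._sealed:
            raise TypeError(f"cannot subclass sealed class Color ({cls.__name__})")


@dataclass(frozen=True)
class Rgb(Color):
    r: int
    g: int
    b: int


@dataclass(frozen=True)
class Var(Color):
    name: str


Color.Rgb = Rgb
Color.Var = Var
Color._sealed = True  # stand-in for prohibit_further_subclasses(Color)

c = Color.Rgb(1, 2, 3)
print(isinstance(c, Color), isinstance(c, Rgb))  # True True
```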
That's a lot of boiler plate
i like that ๐
This still requires some changes to type checkers so that they understand exhaustiveness of matching a Color against all 3 cases. I have no idea how easy or hard it is to implement
and the further question is: do we really need this? are ADTs common in Python code? wouldn't a union of dataclasses work just as well?
Though asking if ADTs are common in Python when the only way to have them is my butt-backwards union is perhaps not very fair. It's like asking why the city should build a bike path if nobody cycles on the highway
but maybe it's not that backwards
It is composing already existing constructs, and it's not totally clear what a new "official" construct would add
I wrote this thing when I was overexcited about things you can do with __prepare__: https://github.com/JelleZijlstra/taxonomy/blob/master/taxonomy/db/models/name.py#L2278
taxonomy/db/models/name.py line 2278
class NameTag(adt.ADT):
it does work but is um not type-checker friendly
ah I did something similar
i don't remember why but it required meta-metaclasses
it was many years ago...
oh no, I am old
the history on __prepare__ was that it was introduced for enum.Enum right?
I'm reading PEP 3115 again, and I noticed this bit. Does it mean you can use __prepare__ as an instance method and have different effects?
The __prepare__ method will most often be implemented as a class method rather than an instance method because it is called before the metaclass instance (i.e. the class itself) is created.
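Here is a sketch of what `__prepare__` is for, in the spirit of what `enum` does with its own namespace class. The metaclass hands the class body a custom mapping, here one that rejects a name being defined twice. `NoDuplicatesDict` and `NoDupMeta` are hypothetical names:

```py
class NoDuplicatesDict(dict):
    """Hypothetical class namespace that refuses to bind the same name twice."""

    def __setitem__(self, key, value):
        # Dunder names are set by the compiler (e.g. __module__, __qualname__); skip them.
        if key in self and not key.startswith("__"):
            raise TypeError(f"{key!r} is defined more than once")
        super().__setitem__(key, value)


class NoDupMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Runs before the class body, when no class object exists yet,
        # hence a classmethod. Its return value becomes the body's namespace.
        return NoDuplicatesDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        # Copy into a plain dict before handing it to type.__new__.
        return super().__new__(mcls, name, bases, dict(namespace), **kwargs)


class Ok(metaclass=NoDupMeta):
    a = 1
    b = 2


duplicate_rejected = False
try:
    class Bad(metaclass=NoDupMeta):
        a = 1
        a = 2
except TypeError as exc:
    duplicate_rejected = True
    print(exc)  # 'a' is defined more than once
```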
okay I've done some crazy things with metaclasses in my time (we all went through that phase) but I've never needed a meta-metaclass
Colour me intrigued
is it a bad idea to add a new attribute to an asyncio task which describes the cancellation message? At the moment this is what I have found, but it is strange and probably doing some unexpected stuff under the hood.
task._cancel_message # users shouldnt use it
Basically, this attribute would have the same behaviour.
except asyncio.CancelledError as e:
    print(e.args[0])
So we could implement a public attribute, something like this: -> task.cancel_message
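For comparison, the message passed to `task.cancel(msg=...)` already reaches whoever awaits the task, through the exception's `args`. The asyncio docs state that it is propagated to the awaiter since Python 3.11. A sketch:

```py
import asyncio
import sys


async def get_cancel_args() -> tuple:
    task = asyncio.create_task(asyncio.sleep(10))
    await asyncio.sleep(0)            # let the task start and block in sleep()
    task.cancel(msg="shutting down")  # the msg parameter exists since Python 3.9
    try:
        await task
    except asyncio.CancelledError as e:
        return e.args
    return ()


args = asyncio.run(get_cancel_args())
print(sys.version_info[:2], args)  # on 3.11+: ('shutting down',)
```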
I just stumbled upon this thread https://discuss.python.org/t/traceback-showing-local-variable-values-at-call-site-hacking-frame-f-locals-frame-f-lineno-etc/21411
I wonder if this idea gets some traction or if it's mostly forgotten?
My personal issue is that if this becomes default, it could have unintended side effects
Suppose you have a web application that logs every traceback. There might be some information you really do not want to log, like a credit card number together with its holder name.
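That concern already applies to an opt-in stdlib feature. `traceback.TracebackException` accepts `capture_locals=True`, which adds the `repr()` of every local to the formatted traceback. A sketch with made-up card data:

```py
import traceback


def charge(card_number: str, holder: str) -> None:
    raise ValueError("payment failed")


try:
    charge("4111 1111 1111 1111", "Jane Doe")  # made-up test data
except ValueError as exc:
    te = traceback.TracebackException.from_exception(exc, capture_locals=True)
    text = "".join(te.format())

print(text)  # the frame for charge() lists card_number = '4111 1111 1111 1111'
```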
How can I unbind a cellvar?
from types import CellType

def make_cell() -> CellType:
    unbound: None  # Unbound cellvar.
    return (lambda: unbound).__closure__[0]
cell = make_cell()
cell.cell_contents = 5
del cell.cell_contents # Is this sufficient?
Also, how can I set fn.__closure__ to a new tuple?
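Here is a sketch answering both questions, assuming Python 3.8+ for `types.CellType`. Deleting `cell_contents` empties the cell. Since `__closure__` is read-only, the usual route is to build a new function with `types.FunctionType` and a fresh closure tuple:

```py
import types

# 1) Unbinding: deleting cell_contents leaves the cell empty again.
cell = types.CellType(5)  # Python 3.8+: create a cell directly
del cell.cell_contents
try:
    cell.cell_contents
    is_empty = False
except ValueError:        # reading an empty cell raises ValueError
    is_empty = True
print(is_empty)  # True


# 2) Rebinding a closure: build a new function object around the same code.
def make_adder(n):
    return lambda x: x + n


add1 = make_adder(1)
add10 = types.FunctionType(
    add1.__code__,
    add1.__globals__,
    add1.__name__,
    add1.__defaults__,
    (types.CellType(10),),  # one cell per name in __code__.co_freevars
)
print(add1(5), add10(5))  # 6 15
```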
Is it possible to use threads/multiprocessing in python? I'm surprised and disappointed of the poor quality assurance. Is all of python like this?
https://github.com/python/cpython/issues/105829#issuecomment-1714593169
heard that python's getting a JIT compiler in 3.13, has it landed yet? Looking up optimizations on cpython on google hasn't yielded much other than the initial announcement about the compiler
there's now a tier 2 optimization thing in cpython + branch prediction but not much about converting to native machine code
What does tier 2 include?
Is it possible to use threads/multiprocessing in python?
yes it is
I'm surprised and disappointed of the poor quality assurance. Is all of python like this?
bugs are often fixed when people report them, so if people don't report them or the issue isn't noticed by the core devs, then it might not be fixed promptly. it's the same in any other OSS language
Of course it's possible, many many applications are built this way. And all software has bugs
oh i see this is your own open issue
consider paying a consultant to fix the bug for you
wsg
im having a problem in the most basic thing
i dont want to take help from the forms
can anyone of yall help me?
dm me if you can help
If you have a question, you should see #❓｜how-to-get-help and make a help post
That's what I don't want to do
why?
It's a dumb problem
Help posts are fine for any kind of question. You will get help much quicker if you open a help post. Very few people are willing to help via DMs, especially when you haven't described your problem
Sure
Bros, the PyPI trusted publishers docs have this link to octo-org/sample-project on GitHub, which doesn't exist, and I'm crying
what's the link exactly?
I'm not quite sure if PyPI Warehouse wants the GitHub issue because it's not a PyPI bug.. I got legacy upload for me personally just now
https://docs.github.com/en/search?query=octo-org
github docs have a lot of fake links for octo-org too, they're merely serving as placeholders like how you might write https://example.com/ in place of a real website
Ah I see, that's too bad, the release.yml thing seemed cool
is there a clean way to use logging lib to output to a different file every log? basically RotatingFileHandler but not based on time or size
You can reimplement RotatingFileHandler with small changes
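Here is a sketch of a slightly different route: subclassing `logging.Handler` directly rather than reimplementing `RotatingFileHandler`. `FilePerRecordHandler` is a hypothetical handler that writes each record to its own numbered file:

```py
import logging
import os
import tempfile


class FilePerRecordHandler(logging.Handler):
    """Hypothetical handler: writes every log record to its own numbered file."""

    def __init__(self, directory: str, prefix: str = "log") -> None:
        super().__init__()
        self.directory = directory
        self.prefix = prefix
        self.counter = 0

    def emit(self, record: logging.LogRecord) -> None:
        # Handler.handle() holds the handler's lock around emit(), so the counter is safe.
        try:
            msg = self.format(record)
            self.counter += 1
            path = os.path.join(self.directory, f"{self.prefix}-{self.counter:06d}.log")
            with open(path, "w", encoding="utf-8") as f:
                f.write(msg + "\n")
        except Exception:
            self.handleError(record)


with tempfile.TemporaryDirectory() as tmp:
    logger = logging.getLogger("file_per_record_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = FilePerRecordHandler(tmp)
    logger.addHandler(handler)
    logger.info("first")
    logger.info("second")
    logger.removeHandler(handler)
    files = sorted(os.listdir(tmp))
    contents = []
    for name in files:
        with open(os.path.join(tmp, name), encoding="utf-8") as f:
            contents.append(f.read())

print(files)  # ['log-000001.log', 'log-000002.log']
```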
there was no stress test at all
We have already contributed a fix. I'm just surprised the level of testing was so low.
it has tens of thousands of users who use it in production, and no one else reported the bug you encountered in the many years since it was introduced. I think you overestimate how common the conditions required to trigger the bug are. If this is your first time finding a bug in a language's standard library, welcome to the club! All software has bugs, and I think it's quite sad that your attitude here is "how could a bug make it into the standard library?", rather than focusing on the good things - that there's an immediate workaround (using multiprocessing.pool instead), that there's an easy patch to apply (just affecting Python code, which you can patch without needing to recompile anything), that there's no security implications, etc, etc
You're acting as though this is a major issue, when in fact it's quite minor, as far as issues in a language's internals go. At least there's no CVE attached, heh
also, if you're disappointed by the level of testing that some particular module receives, note that the test suite is open source and you're welcome to contribute enhancements. Trying to assign blame to people who worked on this module in the past seems much less productive than trying to improve its quality going forward.
Though of course, note that race conditions are by definition non-deterministic, and are just about the hardest type of bug for a test suite to detect.
I don't mean to put the blame on a person. I'm just bummed I got bit by a bug and needed to ventilate my frustrations.
This is why a multithreading library should have a lot of testing. Random testing or stress testing. Anyway this bug seems like it will be resolved soon
I think you ought to reconsider the way that you do that. If you look at the responses you got from people, it's very evident that people found your chosen method of venting to be rude.
Yes, I'll be more mindful of my approach and tone. Sorry
guess we'll never know
Hey Everyone
I'm starting a course on udemy named 'The complete 2023 web development boot camp'.
I am searching for a buddy to join me on the course.
Let's learn together and help each other.
Please drop me a message if anybody's interested. Thank you!
@hybrid relic @frigid bison https://docs.google.com/presentation/d/1_cvQUwO2WWsaySyCmIy9nj9by4JKnkbiPCqtluLP3Mg
In the JVM ... and other adaptive VMs, switching between tiers can be expensive
surprisingly accurate haha
Where are all CPython branches? I dont see them on github, i see only branches for major versions...
wdym "all cpython branches"?
They could've just been deleted
hey
I do remember there being a few branches that were feature related
is there someone i could talk?
yea actually
they're still there but they don't show up for some reason
oh
i think they're tags now
hey
i am learning python but sometime can't do a simple problem if it is new
can you please help me
and how to make coding notes? If anyone knows, please give me suggestions
are there any peps about __call__ and __new__?
those probably predated PEPs. Not everything in the language is covered by a PEP. What information are you looking for?
well, I just want to learn a bunch of low-level stuff about them.
The data model docs are often quite good for this kind of thing. They don't have much on __call__, but they have information on __new__ and on metaclasses:
Thank you!
Often you can learn a lot just by playing around with code, as well. Here's a little script to illustrate the order in which various methods are called when classes are created and called:
class Meta(type):
    def __new__(mcls, name, *args, **kwargs):
        print(f'{name}: entering metaclass __new__')
        cls = super().__new__(mcls, name, *args, **kwargs)
        print(f'{name}: exiting metaclass __new__')
        return cls

    def __init__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering metaclass __init__')
        super().__init__(*args, **kwargs)
        print(f'{cls.__name__}: exiting metaclass __init__')

    def __call__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering metaclass __call__')
        new = super().__call__(*args, **kwargs)
        print(f'{cls.__name__}: exiting metaclass __call__')
        return new

class Klass(metaclass=Meta):
    def __init_subclass__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering class __init_subclass__')
        super().__init_subclass__(*args, **kwargs)
        print(f'{cls.__name__}: exiting class __init_subclass__')

    def __new__(cls, *args, **kwargs):
        print(f'{cls.__name__}: entering class __new__')
        obj = super().__new__(cls, *args, **kwargs)
        print(f'{cls.__name__}: exiting class __new__')
        return obj

    def __init__(self, *args, **kwargs):
        print(f'{self.__class__.__name__}: entering class __init__')
        super().__init__(*args, **kwargs)
        print(f'{self.__class__.__name__}: exiting class __init__')

    def __call__(self, *args, **kwargs):
        print(f'{self.__class__.__name__}: entering class __call__')
        print(f'{self.__class__.__name__}: exiting class __call__')
        return 42

class SubKlass(Klass): pass

obj = Klass()
obj()
idk how to do the bot command thing but here's the output:
Klass: entering metaclass __new__
Klass: exiting metaclass __new__
Klass: entering metaclass __init__
Klass: exiting metaclass __init__
SubKlass: entering metaclass __new__
SubKlass: entering class __init_subclass__
SubKlass: exiting class __init_subclass__
SubKlass: exiting metaclass __new__
SubKlass: entering metaclass __init__
SubKlass: exiting metaclass __init__
Klass: entering metaclass __call__
Klass: entering class __new__
Klass: exiting class __new__
Klass: entering class __init__
Klass: exiting class __init__
Klass: exiting metaclass __call__
Klass: entering class __call__
Klass: exiting class __call__
can we make
thank you btw. how hard is it to understand the genobject.c file? I'm not a professional in C, but I'd love to see what's going on in the background.
that file covers coroutines and async generators, which I hear are pretty complicated under the hood
I have never had to touch that file myself though
Mark Shannon described async generators as a Jenga tower of state machines
That does check out
aren't coroutines mostly just a generator under the hood though? with extra state to make the "generator" awaitable
or is that assumption outdated nowadays?
If you stare at the file enough it will start to make sense, C or no C knowledge
A coroutine is internally very similar to a generator, though they differ in one field each IIRC.
Most C functions that work on generators also work on coroutines
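One quick way to see that generator heritage from Python code is to drive a coroutine by hand with `send()`, exactly as you would a generator. A minimal sketch; `suspend` and `compute` are illustrative names, not anything from CPython:

```python
import types

@types.coroutine
def suspend():
    # A generator-based coroutine: the bare yield hands control back
    # to whoever is driving the outermost coroutine.
    received = yield "suspended"
    return received

async def compute():
    value = await suspend()  # await delegates to the generator above
    return value * 2

coro = compute()
first = coro.send(None)      # runs until the inner yield
try:
    coro.send(21)            # resumes; 21 becomes the value of the yield
except StopIteration as stop:
    final = stop.value       # the coroutine's return value travels in StopIteration
print(first, final)          # suspended 42
```

This is essentially what an event loop does for you: it calls `send()` and reacts to whatever bubbles up from the innermost `yield`.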
huh, okay. I'm scared.
I would say, generators were initially introduced as a cut-down form of coroutine. The asyncio module was first introduced in 3.4, and implemented in what I thought was a really horrible way, entirely based on generators. Thankfully, the language designers realized that there was a need for proper coroutines, and so async/await was added in 3.5, and asyncio was reworked to use it.
I did this diagram I call "Van Rossum's Triangle" https://www.deviantart.com/default-cube/art/Van-Rossum-s-Triangle-679791228 which tries to illustrate the ways that control gets transferred between generators, coroutines and regular "mainline" code.
Hey folks, is there a way to mark a test as 'not multiprocess safe' when run via test.regrtest?
A function to kill a thread would be nice, similar to multiprocessing.Process.terminate()
generally speaking, you really really don't want to be trying to kill threads. design your threads such that you can signal to them to end what they are doing if you need that.
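A common way to do that signalling is a `threading.Event` that the thread checks between units of work. A minimal sketch; `worker` and the 0.01 s timeout are arbitrary choices for illustration:

```python
import threading
import time

def worker(stop: threading.Event, results: list) -> None:
    # Event.wait() doubles as an interruptible sleep: it returns True as
    # soon as the event is set instead of always waiting the full timeout.
    while not stop.wait(timeout=0.01):
        results.append("tick")       # stand-in for a unit of real work
    results.append("cleaned up")     # cleanup runs on the thread's own terms

stop = threading.Event()
results = []
t = threading.Thread(target=worker, args=(stop, results))
t.start()
time.sleep(0.05)
stop.set()        # ask the thread to stop instead of killing it
t.join()
print(results[-1])  # cleaned up
```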
and that isn't python specific
i was just re-thinking my approach in https://github.com/Python-Fuzzylogic/fuzzylogic/blob/master/src/fuzzylogic/functions.py to initialize functions and pre-computing things, then return an inner function for work. i realized it's basically the factory pattern for functions
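The "factory pattern for functions" described here is a closure: the outer call does validation and precomputation once and returns an inner function that only does the per-call work. A minimal sketch; `make_triangular` is a hypothetical fuzzy-membership example, not code from the linked repository:

```python
def make_triangular(low: float, peak: float, high: float):
    # Validation and precomputation happen once, when the factory runs...
    if not low < peak < high:
        raise ValueError("expected low < peak < high")
    rise = peak - low
    fall = high - peak

    def membership(x: float) -> float:
        # ...so the returned inner function only does the per-call work.
        if x <= low or x >= high:
            return 0.0
        if x <= peak:
            return (x - low) / rise
        return (high - x) / fall

    return membership

tri = make_triangular(0, 5, 10)
print(tri(2.5), tri(5), tri(12))  # 0.5 1.0 0.0
```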
however there's the issue of applying numba to speed those up, so now i was wondering how to keep the code in an un-executed state until they are officially initialized
i was contemplating .py modules but only compile them to ast, then apply numba to that instead of importing things directly
keeping the inner functions as strings is a big no-no because that would negate any help from IDEs etc
so, any idea how to have python code parseable but not immediately executed at runtime?
I tried implementing signal handling to shut down gracefully, but gunicorn suppresses signals and kills the process hard
tbh i think this is a bit of FUD, most programs don't do things with external resources that would be unsafe to terminate suddenly
dealing with graceful shutdown can get hard very quickly in even trivial situations
I'd rather give someone generally true advice that errs on the side of them doing it correctly while steering them to the right means of doing it, than add something that's more likely than not a footgun for someone else down the line. This isn't a new problem, it isn't python specific, it's a question that comes up somewhat regularly across languages from people who usually don't have the experience with OS threads and concurrency to know when it is or isn't "safe".
the reason you can't kill threads isn't because of external resources. It's because they could be holding locks.
qt's QThreads have a forcible terminate so if you absolutely need killable threads those are an option via pyqt/pyside
if you terminate a QThread while it holds the GIL, no other thread will ever be able to acquire the GIL, and you'll deadlock your Python process. That sounds super unwise to me.
yeah, that is an issue. i guess one way to handle that could be to make sure the termination is run later in the event loop, like with a QTimer.singleShot. so e.g. if you need to terminate a QThread within itself QTimer.singleShot(0, self.terminate). would that be ok?
could there actually even be any case where a QThread is terminated from the outside while holding the GIL?
by the time you reach the .terminate call in another thread, the GIL would have already been handed over, and it wouldn't be released again until the terminate call is completed, right?
you'd need to read the code of pyside or pyqt to be sure. It wouldn't surprise me if they drop the GIL before calling terminate on QThread. You usually drop the GIL whenever you're doing anything where you'd like another thread to do something
based on the sip file for QThread, terminate won't drop the GIL:
public slots:
    void start(QThread::Priority priority = QThread::InheritPriority) /ReleaseGIL/;
    void terminate();
    void quit();
public:
    bool wait(unsigned long msecs = ULONG_MAX) /ReleaseGIL/;
same for pyside's shiboken xml, where these are the only methods with allow-thread=yes modifier, which according to here means the call gets wrapped in a Py_BEGIN/END_ALLOW_THREADS:
<object-type name="QThread">
    <enum-type name="Priority"/>
    <modify-function signature="run()" thread="yes" />
    <modify-function signature="exec()" rename="exec_" allow-thread="yes" />
    <modify-function signature="msleep(unsigned long)" allow-thread="yes" />
    <modify-function signature="sleep(unsigned long)" allow-thread="yes" />
    <modify-function signature="usleep(unsigned long)" allow-thread="yes" />
    <modify-function signature="wait(unsigned long)" allow-thread="yes" />
    <modify-function signature="start(QThread::Priority)" allow-thread="yes">
        <modify-argument index="1">
            <rename to="priority"/>
        </modify-argument>
    </modify-function>
    <modify-function signature="exit(int)" allow-thread="yes" />
</object-type>
so tl;dr is: seems like both in pyside and pyqt, QThread.terminate from the outside won't cause a deadlock, since the GIL won't be handed back to the QThread before its termination is complete.
Well, it won't cause a deadlock on the GIL, at least. It could still cause a deadlock on a different mutex or semaphore, though
sure, but you specifically highlighted a deadlock involving the GIL, and that was what i was referring to
actually, circling back around to the original idea of adding a terminate to threading: how about a soft interruption instead? something that just sets a queryable flag on the thread. it would reduce the rigamarole of setting up a graceful exit. something like
import time
from threading import interruption_requested, Thread  # interruption_requested is the proposed API, not in threading today
def loop():
    while not interruption_requested():
        time.sleep(1)
    print('thread interruption requested!')
thread = Thread(target=loop)
thread.start()
time.sleep(10)
thread.interrupt()  # proposed method
and maybe it could automatically be set for any child threads after things like KeyboardInterrupt/SIGINT, SIGHUP, sys.exit, etc.
What would be interesting as a "soft" way to stop a thread is Thread.throw. Right now it is only possible via ctypes, AFAIK.
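For reference, the ctypes route goes through the C API function `PyThreadState_SetAsyncExc`. A minimal, CPython-only sketch; `throw` and `StopThread` are illustrative names. The exception is only raised when the target thread next executes Python bytecode, so a thread blocked inside a C call won't see it until that call returns:

```python
import ctypes
import threading
import time

class StopThread(Exception):
    """Exception injected into the worker thread."""

def throw(thread: threading.Thread, exc_type: type) -> None:
    # Schedules exc_type to be raised asynchronously in the thread with this id.
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type)
    )
    if affected == 0:
        raise ValueError("no thread with that id")
    if affected > 1:
        # Should not happen; passing NULL (None) clears the pending exception.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(thread.ident), None)
        raise SystemError("PyThreadState_SetAsyncExc affected several threads")

outcome = []

def busy():
    try:
        while True:
            time.sleep(0.001)  # returns to bytecode often, so the exception lands quickly
    except StopThread:
        outcome.append("stopped")

t = threading.Thread(target=busy)
t.start()
time.sleep(0.05)
throw(t, StopThread)
t.join(timeout=5)
```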
That is somewhat funky because the thread can just be stuck in C code with no way of receiving the exception in a reasonable time if it's not made to expect it
I haven't tested this in the real world, of course, but I think it should just raise at the next possible Python moment.
yeah, just that it could be far away depending on what it's doing. I also had some weird issue where I had delayed exceptions with pyside because some python code didn't check for exceptions
!pip result sounds like this
Yes, but less serious, mine "ensures" you're handling the error :D
Can't you do the same in Err.__del__?
Doing this by spawning threads feels weird
not for my usecase, because I want to prevent assigning to err but never actually doing anything with it
Result > *, error as value
This would be significantly worse in many real world code bases, and definitely shouldn't be done automatically. If you have a thread that's handling a queue, just send a special value for the queue to finish work, and then you also allow gracefully closing the queue and aren't constantly busy looping for "maybe they want that cancelled"
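The sentinel approach might look like this. A minimal sketch; `_SENTINEL` and `consumer` are illustrative names, and a unique `object()` is used rather than `None` so that `None` can still be a legitimate work item:

```python
import queue
import threading

_SENTINEL = object()  # unique marker meaning "no more work"

def consumer(q: queue.Queue, processed: list) -> None:
    # Blocks on get() instead of polling; it only wakes when handed something.
    while (item := q.get()) is not _SENTINEL:
        processed.append(item * 2)
    processed.append("done")  # graceful cleanup after the sentinel

q = queue.Queue()
processed = []
t = threading.Thread(target=consumer, args=(q, processed))
t.start()
for n in range(3):
    q.put(n)
q.put(_SENTINEL)  # finish the queued work, then exit
t.join()
print(processed)  # [0, 2, 4, 'done']
```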
I really think this comes down to teach people to design concurrent code to handle gracefully shutting down and error handling, rather than trying to come up with a "one-size can't quite fit all" solution.
I do think there is value in having some better abstraction for thread exits than "if should_terminate:stop" in a loop every once in a while. But I am not really aware of one
when the best possible abstraction is worse than not abstracting it, it probably shouldn't be abstracted.
daemon threads already close with the interpreter, because there you don't care if the (interpreter's) locks remain held
for 1-shot things in a thread, like moving blocking fileio to a thread in an async program, you probably just want to let the fileio finish most of the time
for long lived background threads, you probably should have a work queue or some other means for the background thread to communicate, and this becomes a viable means to also send "okay finish up" and handle that as appropriate to your application, without looping on that.
This would be significantly worse in many real world code bases, and definitely shouldn't be done automatically.
Could you elaborate on why that would be the case? With this approach, a thread would be free to completely ignore and never use the interruption flag. Having it be set automatically wouldn't change anything about existing code, while making it easier to cover some commonly encountered exit conditions for code that wants it.
If you have a thread that's handling a queue, just send a special value for the queue to finish work, and then you also allow gracefully closing the queue and aren't constantly busy looping for "maybe they want that cancelled"
Not all threads are handling queues.
I really think this comes down to teach people to design concurrent code to handle gracefully shutting down and error handling, rather than trying to come up with a "one-size can't quite fit all" solution.
I agree with promoting better code design, but how would that be achieved without teaching the use of some kind of signalling mechanism? Whether dequeuing a sentinel, or polling a flag, the goal is to provide some update to the thread about the wider program state, right? They're variations of the same thing. The threading.Event case in particular is a fairly common approach, and often seen as a solution to many questions about graceful shutdowns. This would be a more convenient version of that.
when the best possible abstraction is worse than not abstracting it, it probably shouldn't be abstracted
An explanation for why it's worse would be more helpful.
for long lived background threads, you probably should have a work queue or some other means for the background thread to communicate, and this becomes a viable means to also send "okay finish up" and handle that as appropriate to your application, without looping on that.
Unless, again, the thread isn't doing anything queue oriented, in which case forcing a queue into the mix would result in the same looped polling behaviour but now with even more overhead and complexity. And wouldn't any other means, short of injecting exceptions into the thread, also reduce to the same thing?
Yeah, I think the threading module could do with some sprucing up in general, especially with GIL removal on the horizon.
that's cool, i like the idea of interruptions via injecting exceptions, though only if there's some explicit control over exactly where that could occur, like being able to catch an asyncio.CancelledError on an await in coroutines, or with generator.throw on a yield. maybe with a context manager? e.g.
from threading import allow_interruptions, InterruptionError  # proposed API, not in threading today
def loop():
    # do some stuff to set up...
    # so far, the code out here is guaranteed to not be interrupted by any injected exception
    while True:
        try:
            with allow_interruptions:
                # but the code inside here can be interrupted
                # do some stuff that blocks...
                info = queue.get()
        except InterruptionError:
            break
        else:
            # do uninterruptible stuff with dequeued info...
            ...
    # do clean up stuff
thread = Thread(target=loop)
thread.start()
# then later on:
thread.throw(CustomException)  # raise a custom exception at the next interruption point
# or
thread.interrupt()  # now equivalent to thread.throw(InterruptionError)
thread.join()
you're adding a "way to do things" that encourages a specific way as the way to do it which leads to looping on the "should I stop" rather than it being driven by being told to stop. API design encourages code design. Sometimes, checking something like that might be the only way, but I would rather not encourage the worst way to check this with API design, there are already enough pitfalls in concurrency for API design to lead someone to thinking this is a good way to do it.
This would be better handled by a section in threading docs showing basic ways to handle thread shutdown if it's that common, and then letting people pick the one that fits what matches their needs best. (and building more on it from there based on their needs)
if the thread has no need to communicate already, it probably shouldn't be terminated abruptly, as it has no way to communicate anything about the cancellation back. Unlike asyncio, which has builtin ways to still handle this in done_callback (if you're at low level cancellation), there's no equivalent in threading without already needing a means of communicating. Event driven code performs significantly better in concurrent systems than code that busy loops or polls, and the latter should be avoided when possible (yes, it isn't always possible)
you're adding a "way to do things" that encourages a specific way as the way to do it which leads to looping on the "should I stop" rather than it being driven by being told to stop. API design encourages code design. Sometimes, checking something like that might be the only way, but I would rather not encourage the worst way to check this with API design, there are already enough pitfalls in concurrency for API design to lead someone to thinking this is a good way to do it.
this argument could be made for literally any feature. by this logic nothing new should ever be added because someone somewhere might misunderstand how to use it or assume it's automatically better than every alternative in every case. that's not an issue with the feature itself, it's at most an issue with presentation, or just plain old user error. also not sure how this explains why it would be 'significantly worse for many real world code bases'; the flag poll method is already common for code that doesn't do any queue oriented work, and a built-in flag would be a more convenient way of doing that.
This would be better handled by a section in threading docs showing basic ways to handle thread shutdown if it's that common, and then letting people pick the one that fits what matches their needs best. (and building more on it from there based on their needs)
there's nothing about this feature which would prevent the addition of such a section. a new feature isn't in competition or mutually exclusive with more docs.
if the thread has no need to communicate already, it probably shouldn't be terminated abrupty as it has no way to communicate anything about the cancellation back.
not sure i follow the logic here. if a thread already doesn't need to communicate overall, then not being able to communicate about a cancellation won't be an issue either, since it's already been established that there's no need to. also to be clear, this feature is about enabling a graceful termination, as in allowing the thread to do cleanup, etc. on its own terms, not an abrupt one as in killing a process.
Unlike asyncio, which has builtin ways to still handle this in done_callback (if you're at low level cancellation), there's no equivalent in threading without already needing a means of communicating.
if the problem is recovering information from threads after they're done, that's a separate issue which has always been there with threads (though I suppose ThreadPoolExecutor addresses this to an extent with concurrent.futures.Future). i don't see how making one method of interruption more convenient could make this worse.
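To illustrate the `concurrent.futures.Future` point: a future captures either the return value or the uncaught exception of the work submitted to the pool. A minimal sketch with made-up `succeed`/`fail` functions:

```python
from concurrent.futures import ThreadPoolExecutor

def succeed() -> str:
    return "cool result"

def fail() -> str:
    raise RuntimeError("something broke in the worker")

with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(succeed)
    bad = pool.submit(fail)
# leaving the with-block waits for both futures to finish

print(ok.result())      # cool result
print(bad.exception())  # something broke in the worker
try:
    bad.result()        # re-raises the worker's exception in this thread
except RuntimeError as e:
    caught = e
```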
in any case, something like this could suffice:
from threading import Thread, interruption_requested  # interruption_requested is the proposed API
def loop():
    while not interruption_requested():
        ...  # do stuff
    # clean up
    return 'cool result'
class ThreadWithResult(Thread):
    def __init__(self, target, args=None, kwargs=None):
        super().__init__()
        self.target = target
        self.args = args or ()
        self.kwargs = kwargs or {}
        self.result = None
        self.error = None
    def run(self):
        try:
            self.result = self.target(*self.args, **self.kwargs)
        except Exception as e:
            self.error = e
thread = ThreadWithResult(loop)
thread.start()
# ... later on
thread.interrupt()  # proposed method
thread.join()
if thread.error is None:
    print(thread.result)  # do things with result
(or alternatively add these attributes to the default Thread class to capture run's return value and uncaught exceptions)
if it specifically has to involve a callback, i suppose that could also be set with another method, or just as another attribute on the thread.
from threading import interruption_requested, interruption_callback, Thread  # proposed API
def loop():
    while not interruption_requested():
        # get callback
        callback = interruption_callback()
        if callback:
            callback(...)
thread = Thread(target=loop)
thread.start()
# ... later on
thread.set_interruption_callback(print)
thread.interrupt()
thread.join()
The argument I made doesn't apply to all features, it's saying the base case of not adding it is a better state than the API provided in the standard library encouraging the worst way to do it, especially when you can already do what you want there yourself without it being at the language level. Sometimes abstractions aren't necessary, and the social impacts of them are negative.
Event driven code performs significantly better in concurrent systems than code that busy loops or polls, and the latter should be avoided when possible (yes, it isn't always possible)
As far as I know, it's possible to have event driven code that involves busy waiting or polling; these aren't mutually exclusive concepts. Also I feel like characterizing this kind of loop as a busy wait would be inaccurate, since there would be work being done between each poll. It wouldn't be like some kind of spinlock, just sitting there repeatedly checking for an interruption and doing nothing else. I think maybe the use of sleep in the original example loop, intended as a placeholder for work being done, might have given a misleading impression.
isn't that kind of an external resource? other than the GIL
I wouldn't call an in-process lock an external resource. What is it external to?
They could be holding external locks too, like a named semaphore, but the problem remains the same whether you consider the resources being held as internal or external: killing a thread instead of communicating to close it prevents any necessary cleanup and releasing of resources from happening. The same is not true with killing a process, as they receive a signal from the OS.
An in-process lock may not be external to the system, but it is still a resource that is external to the current thread.
i guess i would say, "clean up held resources, especially locks". Internal vs external is vague and irrelevant.
right, but a thread holding an internal lock (i.e. one that i deliberately created inside my application) falls within the territory of "i know i'm not doing that, please let me just kill the thread"
if the thread is holding the lock, your app would be wrong to terminate the thread.
now anything else needing the lock is deadlocked.
right, but i can also deadlock my app in 100 other ways
actually i manage to deadlock my scripts almost every time i have to write while (item := queue.get()) is not None: ...
with asyncio there's a workaround:
while True:
    queue_get_task = asyncio.create_task(queue.get())
    shutdown_signal_task = asyncio.create_task(shutdown_event.wait())
    tasks = (shutdown_signal_task, queue_get_task)
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # otherwise a still-waiting task leaks on every iteration
    if queue_get_task in done:
        item = queue_get_task.result()
        ...
    if shutdown_signal_task in done:
        break
but i don't know of any equivalent with threads, other than "more threads" which seems kind of.. sketchy? bad?
with concurrent.futures.ThreadPoolExecutor(2) as local_executor:
    while True:
        queue_get_fut = local_executor.submit(queue.get)
        shutdown_signal_fut = local_executor.submit(shutdown_event.wait)
        futures = (shutdown_signal_fut, queue_get_fut)
        done, _ = concurrent.futures.wait(futures, return_when=concurrent.futures.FIRST_COMPLETED)
maybe this is just me being bad at software design, but i feel like this is not exactly a great experience for people who need to do a bunch of i/o and don't have an async-ready library available for that purpose
another option was floated in #async-and-concurrency when i last brought this up, which was to create my external resource handle in the main thread, but never actually use it there, and keep passing it off to run in worker threads, which is also kind of scary because such handles are often not even close to thread-safe
you've conflated two different things here. Not every mutex your thread might be holding is one that you deliberately created inside your application. There's one inside of the libc stdio objects used by CPython to print to stdout, for instance. There's almost certainly one inside the libc malloc function that CPython uses to allocate memory. There's one in the C code for initializing a block scoped static variable, and possibly one for initializing a thread-local variable inside an extension module. And those are just examples of mutexes in libc / libpthread. There's also mutexes inside of, for instance, the logging module, so if your thread was in the middle of logging when you killed it, it could die holding a mutex that could cause a deadlock on any future attempt to log anything from any thread.
in other words, "process-local mutexes are an internal resource rather than an external one" does not imply "internal resources were deliberately created by me in my own application"
i see, thanks for clarifying
Another interesting thing to keep in mind is that there are synchronization scenarios where releasing a lock on exit will also cause a bug.
o/
https://peps.python.org/pep-0492/#await-expression
Any yield from chain of calls ends with a yield.
Isn't that false?
I mean
yield from () also doesn't use yield under the hood, because tuple iterator is not a generator
yielding from a function whose body is just if False: yield will also do the thing
so yield from not always leads to yield?
and await is maybe await maybe block
yield from iterable generally?
yield from is also how you access the return value of a generator.
very niche feature
at least since async/await came along
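Both points can be checked directly: a `yield from` chain can finish without ever reaching a `yield`, and `yield from` evaluates to the sub-generator's return value. A minimal sketch with illustrative function names:

```python
def never_yields():
    if False:
        yield            # makes this a generator function, but never runs
    return "finished"

def from_tuple():
    yield from ()        # delegates to a tuple iterator, not a generator

def delegate():
    result = yield from never_yields()  # captures the sub-generator's return value
    yield from from_tuple()
    return result

gen = delegate()
try:
    next(gen)            # no yield is ever reached, so this finishes immediately
except StopIteration as stop:
    returned = stop.value
print(returned)  # finished
```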
well, await is actually a synonym for yield from
This is more about coming of async in python in general
no, it's not
They can be used in a similar way semantically, but they do not do the same thing under the hood
i guess they do similar thing
historically, yield from was used instead of await. Then await was introduced and it replaced yield from usage for async functions
there is still a function in the stdlib that converts a generator into a coroutine
in case of coroutines also?
also, __iter__ in futures is bound to __await__
yield from was never used in async functions. It was used with @asyncio.coroutine.
yield from is also a way you can implement __await__
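A minimal sketch of that: `__await__` only has to return an iterator, so a generator method that uses `yield from` on another awaitable's `__await__()` works. `Delayed` is an illustrative class, not a stdlib one:

```python
import asyncio

class Delayed:
    """Awaitable that waits `delay` seconds, then produces `value`."""

    def __init__(self, value, delay=0):
        self.value = value
        self.delay = delay

    def __await__(self):
        # Delegate the suspension to asyncio.sleep's coroutine iterator;
        # whatever this generator returns becomes the result of `await`.
        yield from asyncio.sleep(self.delay).__await__()
        return self.value

async def main():
    return await Delayed(42)

print(asyncio.run(main()))  # 42
```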
that's right, remove your reaction
replaced for async functions
not in async functions
lol.
you must be a really smart person, thank you for helping this conversation going forward.
wdym
where am I wrong?
unalivejoy simply misinterpreted denball's message and is disputing a statement that wasn't made
of course yield from was not used in async functions; per the PEP, it's a syntax error there.
it was used in generator based coroutines before async await
no?
Seems like people are mostly in agreement but not always using the most precise language
but still, what about that?
Yeah I don't know what to make of that sentence either.
we need to go back to 09-Apr-2015
damn
ModuleNotFoundError: No module named 'timetravel'
man
!rule 9 6 perhaps you should re-read the channel description. this channel is about python internals, which your message isn't. and in case you're wondering, nowhere in this entire server do we allow resumes
6. Do not post unapproved advertising.
9. Do not offer or ask for paid work of any kind.
!warn 1129321197448986636 Please don't attempt to solicit work here, per rules 6 and 9.
:incoming_envelope: :ok_hand: applied warning to @gaunt sleet.
I've deleted your message accordingly
lis = [1,2,3,4,5,6]
print(lis[5:65])
Why does this not result in an error?
because builtin sequences clamp out-of-range slice bounds to the sequence length instead of raising IndexError
otherwise it would be VERY annoying
imagine x[:5] erroring because there are less than 5 elements
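To make the difference concrete: slice bounds are clamped to the sequence length (`slice.indices` shows the effective bounds), while plain indexing still raises. A minimal sketch:

```python
lis = [1, 2, 3, 4, 5, 6]

print(lis[5:65])                        # [6]
print(slice(5, 65).indices(len(lis)))   # (5, 6, 1): stop clamped to len(lis)
print(lis[10:20])                       # []: both bounds clamped

try:
    lis[65]                             # plain indexing is not clamped
except IndexError as exc:
    error_message = str(exc)
print(error_message)                    # list index out of range
```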
where is the source code of the Python Discourse site (discuss.python.org), or is it just a fork of the original one?
But Golang shows errors on this !
golang is not python
looks interesting
https://discuss.python.org/t/official-list-of-core-developers/924/4
Any thoughts on this, I'm not sure if this git repository was created.
I can be wrong. If you are aware of someone else having an "OSS day" (@emily maybe?), tell me. Yeah, I tried to put this number in perspective with the popularity of the Python language: #3 most popular language in the world according to TIOBE index… and only 2 full-time paid developers… Python is not a product. It's hard to justify to your m...
what git repo are you asking about?
I mean, the person who wrote the post mentioned there's no official list of core developers and asked a bunch of other stuff. I'm kinda curious if anything's changed since then or what's up with it now.
Is there a PEP for implementation of package managers?
you mean pip?
pep about pip?
No, just in general
!pep 518
there are some packaging-related peps, iirc
There are several
there are a lot of PEPs with "metadata" in their names
What does "build-system independent system" mean here?
"build-system independent format"
it is the universal format for describing building process, i guess
like setup.py or pyproject.toml
then maybe setup.cfg?
no
PEP 517 specifies how a build frontend calls the build backend declared in pyproject.toml to turn a source tree into a correct package
Yes.
setup.py and setup.cfg are a part of setuptools. They are not a part of any other build system.
And even the setuptools maintainers have started talking about getting rid of the setup.{py,cfg} now that they've added support for pyproject.tomls.
They're still faaar too prevalent to actually do that, but the talk is there
And not all pyproject.tomls are created equal, either.
Poetry uses the same file name, and it even looks almost the same, but they use their own structure and their own process that is not standards compliant
is this how Cython implements class methods?
https://github.com/cython/cython/blob/9827c6085e2141db71c55ae231a4a09a878dd524/Cython/Compiler/Symtab.py#L2175
I suppose this module refers to the symbol table inside the compiler.
`Cython/Compiler/Symtab.py` line 2175
```py
if name == "classmethod":```
I think the best combo for someone who wants to go with setuptools is to use the setuptools backend in pyproject.toml.
build-backend = 'setuptools.build_meta'
the official list that i'm aware of is not public, presumably because it contains contact details
oh okay, is there a discord server or something where core developers are talking with each other, just wondering tho.
yes, but many core developers aren't on discord
And most important discussions take place in public
fair enough.
The aim is of course that all important discussions should take place in public (either on GitHub or at discuss.python.org); it's open source, after all. But occasionally there are things where a quick back-and-forth on an instant-messaging platform is really useful.
yes, but I'd love to see a list in public so I can see all core developers and their work, I don't think some public information can cause issues.
Thank you, is there a requirement about how many hours you should spend developing per day as a core developer?
no
Some of the people listed here have not made contributions for many years, but are still officially core developers
so you can have commit privileges if you are a core dev I assume.
yes
age doesn't matter, does it?
no, why would it?
well, I'm thinking about making more contributions to CPython, and if I have enough experience I might try becoming a core developer. I wasn't sure whether my age would fit (this is why I asked).
The release manager for 2.7 was a teenager I believe. Age doesn't matter.
Core developers need to be people who have demonstrated commitment to the project, people we're confident will work well as part of the team and people whose judgement we have confidence in. But there's certainly no age requirement
or time commitment.
Hi, is this a good place to ask about pypy?
I couldn't find any pypy related discussion in python-help forum (or maybe it's just that discord is autocorrecting it to pypi ? hmm)
Yes.
I was wondering what would be the best way of studying machine codes emitted by pypy JIT.
It seems like my options are vmprof (which functions as both profiler and JIT log visualizer) and jitviewer. Am I missing anything?
I was just being wary because vmprof.com is down and jitviewer was not updated for few years.
I want some Godbolt-esque tool that I could use to study how pypy is responding to my attempt at optimizing my code
I'm sure that the others will be able to help, I'm not too familiar with Pypy.
are different PEP styles tied to different Python versions, and does that mean each has linguistic differences or syntax variations?
what styles do you mean?
how is pypy getting on with the latest features, is it still up-to-date?
they have a 3.10 version.
so it's not that active?
what do these bars mean?
the activity of pypy on GitLab. I was just looking at the project. I was wondering why it doesn't support Python 3.11.
3.10 isn't too far behind current.
An impression I got is that many previous core developers are no longer active, but bugs are still being fixed and all. Wasn't there an HN thread recently where a large number of people came out of the woodwork and explained how they are deploying pypy at work?
It's so easy to remember. 621 is what to do when attending a convention. 6 hours of sleep, 2 meals, 1 shower, every day.
||totally not because of the famous site ending in 621||
I didn't see any tasks in the todos text file, but I suppose they are working on it.
Could someone link me to the code where the self is being passed as the first argument to the methods under the hood?
there are two levels of that. One is that FunctionType has a __get__ implementation that returns a bound method object that inserts the extra argument
The other level is that in practice, as an optimization, we usually bypass that and the bytecode calls the function object directly with the extra argument added
you can trace the first one at https://github.com/python/cpython/blob/d73c12b88c2275fd44e27c91c24f3ac85419d2b8/Objects/funcobject.c#L962 (implementation of tp_descr_get for functions), which creates a method object (https://github.com/python/cpython/blob/d73c12b88c2275fd44e27c91c24f3ac85419d2b8/Objects/classobject.c#L108), which has a __call__ that ends up at https://github.com/python/cpython/blob/d73c12b88c2275fd44e27c91c24f3ac85419d2b8/Objects/classobject.c#L43
`Objects/funcobject.c` line 962
```c
func_descr_get(PyObject *func, PyObject *obj, PyObject *type)
```
`Objects/classobject.c` line 108
```c
PyMethod_New(PyObject *func, PyObject *self)
```
`Objects/classobject.c` line 43
```c
method_vectorcall(PyObject *method, PyObject *const *args,
```
and there you can see some code like newargs[0] = self
for the second level, I think you'd have to look at the ways the CALL opcode is specialized, e.g. https://github.com/python/cpython/blob/d73c12b88c2275fd44e27c91c24f3ac85419d2b8/Python/bytecodes.c#L3374
`Python/bytecodes.c` line 3374
```c
inst(CALL_METHOD_DESCRIPTOR_O, (unused/1, unused/2, callable, self_or_null, args[oparg] -- res)) {
```
I wonder if I could implement something in pure Python that passes the self in using descriptor terminology.
like this for classmethod:
```py
import functools
from types import MethodType

class ClassMethod:
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f
        functools.update_wrapper(self, f)

    def __get__(self, obj, cls=None):
        if cls is None:
            cls = type(obj)
        if hasattr(type(self.f), '__get__'):
            # This code path was added in Python 3.9
            # and was deprecated in Python 3.11.
            return self.f.__get__(cls, cls)
        return MethodType(self.f, cls)
```
thank you
yes, pretty sure you can implement classmethod and friends in Python using __get__ pretty easily
The descriptor HOWTO has a whole section giving pure-Python equivalents of property, classmethod, staticmethod and others: https://docs.python.org/3/howto/descriptor.html#pure-python-equivalents
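For the first level, a rough pure-Python sketch of what `func_descr_get` does, in the style of the HOWTO's equivalents. `Function` and `Greeter` are hypothetical names for illustration; the real function type is implemented in C, and as mentioned above the bytecode often skips this step entirely.

```python
from types import MethodType


class Function:
    """Hypothetical pure-Python stand-in for FunctionType's descriptor behaviour."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: hand back the function-like object itself.
            return self
        # Accessed on an instance: bind obj so it is passed as the first argument.
        return MethodType(self, obj)


class Greeter:
    def _greet(self, name):
        return f"{type(self).__name__} greets {name}"

    greet = Function(_greet)


g = Greeter()
print(g.greet("Ada"))           # Greeter greets Ada  (self inserted by the bound method)
print(Greeter.greet(g, "Ada"))  # Greeter greets Ada  (self passed explicitly)
```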
hello
4 years ago i failed in senior secondary
can you tell me some valuable certifications that would make companies overlook my gap and failure and still hire me?
oh okay, thank you.
@fallen slate source reminder
Commands for managing your reminders.
guys, is it necessary to build a team for online hackathons and competitions like machine learning projects on Kaggle?
@feral island
I know it's a really stupid question, but I got a little nervous. Can you produce yourself?
No need to be nervous, questions is what this place is for. Not sure what "produce yourself" means though, do you mean "introduce"?
I'm sorry, I just learned English on YouTube. Anyway, yeah, I mean that.
I'm Jelle Zijlstra, I work on several open source projects related to the Python language, and I answer questions in a few channels on this server sometimes
That's good, nice to meet you.
That's good!
Are you free?
is it common to have PEPs accepted but still unimplemented?
or are they implemented as soon as they're accepted?
There can be a time delay between acceptance and implementation. Usually when a PEP is accepted there is at least a prototype implementation
I mean, I was wondering whether it's possible to have peps accepted and still unimplemented. (if I were about to go back and check out every single pep)
It's theoretically possible but I think no PEP is currently in that state. For PEP 649 and 703 the SC has said they'd accept the PEP but I think there's no formal acceptance yet for either
okay, thank you.
yes just checked out the current state of pep 649:
I once saw a PR for a pep before it was officially submitted. The PR was done by guido btw
what is the hash algorithm that Python is using? Like the maths formula.
hash for what?
I'm talking about the built-in hash.
different types implement it differently
default hash?
SHA1?
no lol
not sure what default hash means in this context.
```py
def __hash__(self):
    attr_tuple = tuple(getattr(self, attr) for attr in type(self).__slots__)
    return hash(attr_tuple)
```
hash(...)
as Gobot said, there's no single algorithm. Everyone is free to implement their own hash
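A minimal sketch of the slots-based `__hash__` above, filled out with a matching `__eq__` so equal objects hash equally (the class `Point` is hypothetical). It assumes `__slots__` is declared on the class itself: `type(self).__slots__` only returns the slots declared on one class in the hierarchy, not inherited ones, so subclasses that add slots would need more care.

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _key(self):
        # Collect the slot values in declaration order.
        return tuple(getattr(self, attr) for attr in type(self).__slots__)

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Delegate to the tuple hash, keeping hash consistent with __eq__.
        return hash(self._key())


print(len({Point(1, 2), Point(1, 2), Point(2, 1)}))  # 2
```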
hmm, Return the hash value of the object (if it has one).
I can't find it, but it includes the id and stuff
object.__hash__ is mostly the same as id(), right?
you mean something like this
a = 10
print(id(a))
no, that will use int.__hash__ which just returns the value (for small ints)
`Python/bltinmodule.c` line 1600
```c
static PyObject *
```
!e
```py
o = object()
print(id(o))
print(hash(o))
```
@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 140255253185424
002 | 8765953324089
guess not!
I think it's just divided by 16
well, they are different.
!e
```py
o = object()
print(id(o))
print(hash(o) * 16)
```
@flat gazelle :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 140145685562256
002 | 140145685562256
(doesn't always work since the hash can end up negative - not sure what the circumstance would be, but I did just do it)
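A sketch of the CPython implementation detail behind the "divided by 16" observation: as far as I know, `object.__hash__` takes the object's address (which is what `id()` returns in CPython) and rotates it right by 4 bits. When the low 4 bits are zero that is exactly `id(o) // 16`; if any of them are set they get rotated into the top bits, which can make the result negative. This is not guaranteed by the language and won't hold on other implementations such as PyPy; `pointer_hash` is a hypothetical helper.

```python
import sys

# Assumption: CPython, where id() is the address and object.__hash__ rotates it by 4 bits.
BITS = sys.maxsize.bit_length() + 1  # 64 on a 64-bit build


def pointer_hash(address: int) -> int:
    mask = (1 << BITS) - 1
    rotated = ((address >> 4) | (address << (BITS - 4))) & mask
    # Reinterpret the unsigned result as a signed C integer.
    if rotated >= 1 << (BITS - 1):
        rotated -= 1 << BITS
    # -1 signals an error in the C API, so it is replaced by -2.
    return -2 if rotated == -1 else rotated


o = object()
print(hash(o) == pointer_hash(id(o)))  # True on CPython
```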
not sure if this is what I'm looking for: https://github.com/python/cpython/blob/main/Objects/object.c#L878
`Objects/object.c` line 878
```c
PyObject_Hash(PyObject *v)
```
There is only one (non security/digest-related) hash algo that is actually part of python rather than an implementation detail of CPython, and that is the numeric hash - https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types
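A condensed sketch adapted from the pure-Python reference code in that docs section (NaN handling left out, since it changed in 3.10). The idea is that `hash(x)` for ints, floats and `Fraction`s is `x` reduced modulo the prime `sys.hash_info.modulus`, which is what makes `1`, `1.0` and `Fraction(1)` hash equally.

```python
import math
import sys
from fractions import Fraction

P = sys.hash_info.modulus  # 2**61 - 1 on 64-bit builds


def hash_fraction(m: int, n: int) -> int:
    """hash(m / n) for integers m and n > 0, following the documented numeric hash."""
    # Remove common factors of P (not needed if m and n are already coprime).
    while m % P == n % P == 0:
        m, n = m // P, n // P
    if n % P == 0:
        value = sys.hash_info.inf
    else:
        # Fermat's little theorem: pow(n, P - 2, P) is the inverse of n modulo P.
        value = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        value = -value
    # -1 is reserved as an error value, so it is replaced by -2.
    return -2 if value == -1 else value


def hash_float(x: float) -> int:
    """hash(x) for a finite or infinite float (NaN not handled here)."""
    if math.isinf(x):
        return sys.hash_info.inf if x > 0 else -sys.hash_info.inf
    return hash_fraction(*x.as_integer_ratio())


print(hash_fraction(1, 3) == hash(Fraction(1, 3)))  # True
print(hash_float(1.5) == hash(1.5))                 # True
```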
!res
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Here's a common hash function used in Java:
```java
@Override
public int hashCode() {
    int hash = 7;
    hash = 31 * hash + (int) id;
    hash = 31 * hash + (name == null ? 0 : name.hashCode());
    hash = 31 * hash + (email == null ? 0 : email.hashCode());
    return hash;
}
```
I'd like to propose implementing __instancecheck__ for types.GenericAlias and types.TypeAliasType. Has there already been a discussion about this before?
it is impossible to check that some list is an instance of list[int]
It would be an O(n) operation
```py
from typing import Any, TypeGuard

def is_int_list(value: Any) -> TypeGuard[list[int]]:
    return isinstance(value, list) and all(isinstance(item, int) for item in value)
```
no, it is not
it is literally impossible to check
of course it wouldn't stop you from adding some non-int to the list at some other point.
and it would probably break if the list is empty
even adding ints to list[int] isn't always safe
What if you had
```py
class IntList(list[int]): pass
```
then [0] is not an instance of IntList, so it is useless
what about list[int] in cls.__orig_bases__?
same problem
creating an object of list[int] returns a regular list object anyway, so that's not really a thing
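A quick illustration of the two points above (`IntList` is the class from the earlier message): calling `list[int]` just builds a plain `list`, and `__orig_bases__` records the parameterized base on a subclass but doesn't make ordinary lists instances of it.

```python
import types

# list[int] is a types.GenericAlias; calling it just builds a plain list.
value = list[int]([1, 2, 3])
print(type(value) is list)                        # True
print(isinstance(list[int], types.GenericAlias))  # True


class IntList(list[int]):
    pass


# The parameterized base is remembered on the class...
print(IntList.__orig_bases__ == (list[int],))  # True
# ...but an ordinary list is still not an instance of the subclass.
print(isinstance([0], IntList))                # False
```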
but what's the problem with an O(n) impl in isinstance? why do you say it's literally impossible
you can't know whether an empty list is a list[int] at runtime
think about it as runtime type validation. will protobuf accept an empty list for list[int]? it will.
isinstance() has always been about runtime type validation
this usecase is definitely not for static typing, even though it might help in type narrowing
I'm doing bytecode analysis (3.10), and I'm trying to figure out the number of positional args vs keyword args in a CALL_FUNCTION_KW. Is there any way to determine this statically, or do I need to know the runtime value of the kwargs tuple? Will CPython ever generate this instruction when the kwargs tuple is not static and easily inferrable?
Protobuf will copy the list, so it can do the check fine. In a regular program, some other piece of code could also refer to the list, and at any point after the check add a str to it
yeah, I'm not saying that the list object should be a list[str] object, I'm just saying that isinstance(mylist, list[str]) should work at that given point in time, for any list
this makes sense for new type aliases as well:
```py
class Item: ...
type Items = list[Item]
...
isinstance(mylist, Items)
```
```py
def f(l, cb):
    if isinstance(l, list[int]):
        cb()
        for i in l:
            print(i + 2)

x = [1]
f(x, lambda: x.append('a'))
```
I do not think it is all that sensible to have an each-element check sort of default as a type check. If you do need an each-element check, you should just use an each-element check, but it is not always correct (consider taking it as a ctor argument and using the field), and I would argue it is not a sane default.
Maybe limit it to Sequence[int]
Maybe there can be a new builtin that does sort of thing.
Sugar around isinstance(x, list) and all(isinstance(el, int) for el in x)
best to make it isinstance(x, Sequence) and (len(x) == 0 or all(isinstance(el, int) for el in x))
actually, all([]) => True
People expect isinstance() to be O(1). Besides, even if you could check whether a given list is a list-of-int programmatically, you can't check whether a given iterable is an iterable-of-int programmatically, so this proposal would result in asymmetrical interfaces
actually yeah, Iterable would be a problem as it could be exhaustive. hm.
point taken, O(1) thing also is a fair assumption as i haven't seen isinstance implementations ever do a for loop
isinstance is doing both loop and recursion (to iterate through all types in given type tuple, and it does recursion because tuples can be nested)
```py
>>> isinstance(1, ((((), ()), str), str))
False
>>> isinstance(1, ())
False
```
i meant to say __instancecheck__ implementations are generally O(1)
it's a reasonable assumption that isinstance(x, T) will be constant time
how would you do isinstance(x, list[int]) ?
will it be equivalent to isinstance(x, list)?
I'd rather __instancecheck__ not exist at all :) (this won't happen for pragmatic reasons and backwards compatibility, among a lot else)
I think rather than having this, it would have been better for a builtin to be added that can determine if things are structurally equivalent even if not nominally a subclass for runtime structural subtyping, but that ship has long sailed.
> this usecase is definitely not for static typing, even though it might help in type narrowing
It won't. If you look at pytype, much stronger inference than this can already be done should a type checker choose to.
The runtime use of this can't be any better than exhausting the iterator (as was already said by others) and with such a cost attached, people are free to do it themselves, but I agree with others that hiding a cost in there for iterables when most people won't need it at runtime isn't ideal.
Is it a good idea to join internships on LinkedIn from small tech companies provided for free with small projects to showgirl? Mostly these small tech companies are Indian
E.g
Meriskill
Info aid tech
Code samurai
Bharat intern
Etc
They were already having a discussion; I didn't want to interrupt in the middle of it.
Does anybody know if bytecode could ever be generated by cpython for the kwargs tuple in CALL_FUNCTION_KW, where the kwargs tuple is not in co_consts?
The same question applies to later versions, and to CALL in 3.12+.
Is there any situation where the kwargs are not statically known, other than fn(**kwargs), which uses a different instruction?
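A way to check this on a given version, assuming CPython: as far as I can tell, for a call like `f(a, b=1, c=2)` the compiler stores the keyword names as a constant tuple in `co_consts` and loads it right before the call (the opcode names differ between versions), while `f(**kwargs)` goes through `CALL_FUNCTION_EX` instead. `caller` and `keyword_name_tuples` are hypothetical helpers.

```python
import dis


def caller(f, a):
    return f(a, b=1, c=2)


def keyword_name_tuples(func):
    """Return the non-empty all-str tuples in func's constants (keyword-name tuples among them)."""
    return [
        const
        for const in func.__code__.co_consts
        if isinstance(const, tuple) and const and all(isinstance(x, str) for x in const)
    ]


# Opcodes vary: CALL_FUNCTION_KW in 3.10, KW_NAMES + CALL in 3.11/3.12, CALL_KW in 3.13.
dis.dis(caller)
print(keyword_name_tuples(caller))  # [('b', 'c')]
```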
Feature proposal: async import
I'm busy polishing code for production (desktop application) and I am using asynchronous functions to load some of the heavier libraries with minimal impact on startup time. Multithreading does not provide significant time advantages over async in my case; the main hurdle is the linear flow, which has plenty of waiting time downstream. It occurs to me that this is probably a common problem, and a common solution might be useful for the greater community.
I propose an "async import" function that is a built-in function or part of the asyncio library. The syntax could be something like pd = async import pandas (equivalent to "import pandas as pd"), where "async" acts almost like a decorator, but defines and instantiates this function instead:
```py
async def import_function(package):
    import package as x
    return x
```
well, not exactly this function
when would you be able to use pandas after that?
(because it will block and act exactly as import package as x)
When you return a value that was imported, it gets assigned to the namespace you chose.
I mean, you can do this:
```py
pd = await asyncio.to_thread(__import__, "pandas")
```
I use pandas as an example, since it is often imported as pd. You can then call it as pd at any time
but the import might not be done yet, right?
You aren't really making clear how your proposal is different from just import pandas as pd
Looks interesting, will read up on that. Thank you
I suppose importing a package can be expensive. But is this expense from reading from the disk? Or is it from the CPU-bound work of just executing a lot of code?
In any case, I think the problem of UI taking too long to start is solved by running the UI in a separate thread
The software stops until the import statement returns a value. With a library like sentence_transformers this takes a second or two. With multiple libraries that take a second it adds up. Very few of these libraries are needed until the user actually does something.
My greatest pain in python is making the client wait for libraries to load. Async works really well for that, but @brittle mantle gave a great response so I will use that instead.
that's not a great response if you haven't measured what is slow in the importing
Is it disk I/O or is it compiling and running Python code (which is CPU-bound)?
If it's CPU-bound, then you have no choice but to wait for the import to finish. If you use threading you can at least interleave this CPU-bound work with the work in your UI thread
I guess to_thread kinda does this
The main problem with adding a whole language feature for this is: async/await is not coupled to a particular loop implementation, be it asyncio or trio. So how do you decide what to use for the I/O?
How does the dataclasses standard lib make sure that the fields are using type annotations?
Could someone link me to that part in the source code?
`Lib/dataclasses.py` line 970
```py
cls_annotations = inspect.get_annotations(cls)
```
thank you
In the context of a GUI-based app, the underlying code is often written in C (I think matplotlib is based on Matlab). Matplotlib specifically appears to run in its own process that interacts with the GUI package through a backend. In my specific case I want to delay imports until matplotlib does its thing (which is why multithreading doesn't differ much from async implementations).
Some imports have to happen linearly, but when an import can be delayed then there is actually very little literature on that. I've looked at lazy import implementations, but async seemed to work better.
I also have a case where I have a function which has an import in it, but the return value is cached. This solved a different problem though, since the client has to wait for data to process anyway if he inputs new data. However, I mention it because it was another workaround to the long import problem.
But thanks for your advice, the code you suggested was exactly what I was looking for.
I find this slightly annoying, possibly inconsistent that:
```py
>>> tuple(range(3))
(0, 1, 2)
```
but
```py
>>> from typing import NamedTuple
>>> class MyTuple(NamedTuple):
...     a: int
...     b: int
...     c: int
...
>>> MyTuple(range(3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: MyTuple.__new__() missing 2 required positional arguments: 'b' and 'c'
```
I know this can be fixed with unpacking, e.g.,:
```py
>>> MyTuple(*range(3))
MyTuple(a=0, b=1, c=2)
```
But it gets really ugly with generator expressions:
```py
>>> MyTuple(i for i in range(3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: MyTuple.__new__() missing 2 required positional arguments: 'b' and 'c'
>>> MyTuple(*(i for i in range(3)))
MyTuple(a=0, b=1, c=2)
```
Is there some given reason why NamedTuples can't be constructed from iterables?
It's a weird inconsistency, yes, but I don't think there's an alternative. If you interpret a single argument to the NamedTuple constructor as an iterable, you'll get to weird edge cases with single-element NamedTuples
wondering what you'd get from this: https://github.com/brettcannon/record-type/tree/main
since they changed initialization from tuple([1, 2, 3]) to MyTuple(1, 2, 3) i guess there's nothing to be done about it
nvm, you'd get the same thing.
maybe they should add a .from_iterable class method
yes, there might be a solution since it's a really fresh idea.
You might open an issue on the GitHub page if you want.
I can do it as well, I'm personally interested.
Why add that when * unpacking already works?
because it's uglier, especially in front of generator expressions
prefer:
```py
MyTuple.from_iterable(i for i in range(3))
```
over
```py
MyTuple(*(i for i in range(3)))
```
itertools.chain has similar
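For what it's worth, classes created with `typing.NamedTuple` are built on `collections.namedtuple`, which already provides a documented `_make` classmethod that does what the proposed `from_iterable` would (the leading underscore is there to avoid clashing with field names, not to mark it private):

```python
from typing import NamedTuple


class MyTuple(NamedTuple):
    a: int
    b: int
    c: int


# _make builds an instance from any iterable, including a generator expression.
print(MyTuple._make(range(3)))                 # MyTuple(a=0, b=1, c=2)
print(MyTuple._make(i * i for i in range(3)))  # MyTuple(a=0, b=1, c=4)
```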
A syntax like "async import" would be a convenience, but this works so it is already a feature.
```py
pd = asyncio.run(asyncio.to_thread(__import__, "pandas"))
```
but that would be hardcoding asyncio as the loop implementation to use
That line doesn't really do anything over import pandas as pd. You need to actually make the code async so that it can do other work while the import is running
I mean, this could be considered, but that's a major change. For very little gain IMO
And yes, this exact line does the same as import pandas as pd
According to this website, .to_thread() creates a coroutine that executes in a separate thread from the main thread when awaited (this syntax gives me an error which is why I use asyncio.run() to execute the coroutine.)
yes, but then you are blocking on the result
so the end result is basically the same: you are blocking waiting for the import to finish
the actual import happens in a separate thread, but I don't see how that helps you
Yeah, like asyncio.run(asyncio.sleep(5)) is exactly the same as time.sleep(5)
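A sketch of the version that actually overlaps work: start the import as a task and only await it where the module is needed. `decimal` stands in for pandas so the example runs anywhere. On a GIL build, executing a module's Python code holds the GIL, so how much real overlap you get depends on how much of the import is I/O or GIL-releasing C code.

```python
import asyncio
import importlib


async def main() -> str:
    # Start the import in a worker thread, but don't await it yet.
    import_task = asyncio.create_task(
        asyncio.to_thread(importlib.import_module, "decimal")
    )

    # Meanwhile the event loop is free to do other work (a stand-in sleep here).
    await asyncio.sleep(0.1)

    # Only wait for the import at the point where the module is actually needed.
    decimal = await import_task
    return str(decimal.Decimal("1.1") + decimal.Decimal("2.2"))


print(asyncio.run(main()))  # 3.3
```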
I have a similar problem/workflow, with a 3-4 second import delay (total) after pruning and adding lazy imports where helpful (reviewed with -X importtime). This kills me on multiprocessing, where every process ends up with another 3-4 sec delay. Pandas is one of my worst cases. In my particular use case, there's a few points where this async solution actually would be helpful: where we're waiting on data (via async apis/requests), but before we need the rest of the stack (pandas, charting libraries, pyarrow, etc).
In my experience, async only blocks if the await keyword is used. If you use asyncio.run() on an async function where there is no await keyword, my experience is that it behaves like multithreading provided you are not doing heavy calculations (but I can draw scatterplots and pie charts simultaneously with word clouds using async without await, and that feels pretty instantaneous to me)
which was rejected, but discusses some of the relevant design space
asyncio.run() will not work in an async function (running under asyncio) at all
!e
```py
import asyncio

async def foo():
    asyncio.run(asyncio.sleep(1))

asyncio.run(foo())
```
@grave jolt :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 6, in <module>
003 | asyncio.run(foo())
004 | File "/lang/python/default/lib/python3.11/asyncio/runners.py", line 190, in run
005 | return runner.run(main)
006 | ^^^^^^^^^^^^^^^^
007 | File "/lang/python/default/lib/python3.11/asyncio/runners.py", line 118, in run
008 | return self._loop.run_until_complete(task)
009 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
010 | File "/lang/python/default/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
011 | return future.result()
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/Q3K53TJDACDHEHDU3AVADX6PYE
Using time.perf_counter(), I find the normal import is consistently slower by ~10% (0.13 sec) for pandas than the async. I don't think it is significant enough to rave about, but it isn't measuring the import time but rather how quickly the event loop moves on from the import statement.
I agree async is actually awaiting the import, but the import is consistently ~0.1 seconds less than the 1.2-1.3 seconds the normal import statement takes.
How did you benchmark it?
I run the two import methods individually using time.perf_counter() for the timepoints. I get consistent results across repeats
Can you share the code?
I tried this
```py
import asyncio
import sys
import time

before = time.perf_counter()
if sys.argv[1] == "asyncio":
    pd = asyncio.run(asyncio.to_thread(__import__, "pandas"))
else:
    import pandas as pd
after = time.perf_counter()
print(f"imported pandas in {after - before:.2f} seconds")
```
I tried this, and I get
Sync: 2.1705752294510603
Async: 3.07330556132365
I think it is a fluke. Pandas seems to be faster but sentence_transformers takes longer. Definitely not loading async though
I think that's not a fair benchmark because the first import takes the bulk of the time. You pop pandas itself out of sys.modules but not all of its submodules
you really have to do it in a fresh process
big sad
I can definitely see the effect of pyc files though
```
% python ~/py/tmp/importpandas.py sync
imported pandas in 1.76 seconds
% python ~/py/tmp/importpandas.py asyncio
imported pandas in 0.21 seconds
% python ~/py/tmp/importpandas.py sync
imported pandas in 0.18 seconds
% python ~/py/tmp/importpandas.py asyncio
imported pandas in 0.21 seconds
```
that means you have to create a new venv for each test run?
or well, clean the pyc files somehow
I feel like a benchmark with pyc files is more useful, though I guess it depends on your application
spin up a docker container
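A sketch of the fresh-process approach: time the import in a new interpreter via `subprocess`, so nothing is already in `sys.modules`. This still reuses any `.pyc` files on disk, so repeated runs measure the warm-cache case shown above; `python -X importtime -c "import pandas"` gives a per-module breakdown as well. `fresh_import_time` is a hypothetical helper.

```python
import subprocess
import sys

# Time an import in a brand-new interpreter so nothing is cached in sys.modules.
SNIPPET = (
    "import time; t = time.perf_counter(); "
    "import {module}; "
    "print(time.perf_counter() - t)"
)


def fresh_import_time(module: str) -> float:
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET.format(module=module)],
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout)


print(f"decimal: {fresh_import_time('decimal'):.4f} s")
```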
I am now convinced that an async approach on average will always be more expensive than normal imports. That got me thinking about multithreading (to get past the blocking event). I modified your code as provided below and got the following results for N=10 000:
Normal: 57.36153389397077
Multithreaded: 55.91058199480176
and
Normal: 51.267108304426074
Multithreaded: 53.34445761004463
However, with N = 1000 I got
Normal: 13.458581696497276
Multithreaded: 4.948981299996376
and
Normal: 15.239287502830848
Multithreaded: 5.18649219837971
The discrepancy is beyond me, but the multithreading implementation is literally the simplest. I modified your code and included sentence_transformers as a second library because it also has a long import time:
```py
import sys
import time
import threading

N = 1000

def sync_import():
    total = 0
    for _ in range(N):
        start = time.perf_counter()
        import pandas
        import sentence_transformers
        end = time.perf_counter()
        total += (end - start)
        sys.modules.pop("pandas")
        sys.modules.pop("sentence_transformers")
    print("Normal:", total)

def threaded_import():
    total = 0
    for _ in range(N):
        start = time.perf_counter()
        t1 = threading.Thread(target=__import__, args=("pandas",))
        t2 = threading.Thread(target=__import__, args=("sentence_transformers",))
        t1.start()
        t2.start()
        end = time.perf_counter()
        total += (end - start)
        t1.join()
        t2.join()
        try:
            sys.modules.pop("pandas")
            sys.modules.pop("sentence_transformers")
        except KeyError:
            pass
    print("Multithreaded:", total)

sync_import()
threaded_import()
```
your benchmark is measuring the time to start the thread, not the time to run the import
That is the point of the exercise, to move past the blocking event while importing in the background
This seems extremely unlikely to help anything, presumably if you're importing it, you need these things to exist. What happens when something defined in the module relies on the import?
In the context of a GUI, there is plenty of time to do imports while the user navigates the mouse and types in a query or something. On the other hand, sentence_transformers takes 8 seconds to load on my PC (intel core i7) which is a very noticable and cumbersome waiting time for an application
I don't see how that's the fault of current synchronous imports at all?
you can easily have your application entrypoint not import anything expensive until starting a separate thread for the GUI
It just occurred to me that sys.modules.pop() does not work. It says I loaded sentence_transformers 1000 times in 13 seconds, but loading it once takes 6 seconds on a good day.
I must implement it using multiprocessing and a shared value for the counter, since that will ensure each instance actually imports the full package and not from memory. I'll do that tomorrow and let you know if anything changes. Otherwise great chat
Not to be too pessimistic here, but you should have multiple threads in any application taking user input anyhow, at least 2, as you don't want your GUI blocked on any of the work the GUI is driving.
synchronous imports then work just fine as long as the GUI is tossed into a thread before all the import statements (which is perfectly possible by doing something like):
```py
...
queue_pair = ...
some_handle = start_gui_thread(...)
import ...  # leave a comment here about it not being a module level import due to impact on gui
```
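A runnable version of that outline, with made-up names (`run_gui`, `module_queue`) and `decimal` standing in for an expensive import: the "GUI" thread starts first and picks up the module over a queue once the main thread has imported it. Some GUI toolkits expect to own the main thread (Cocoa on macOS, for example), in which case you'd swap the roles and do the import in the worker thread instead.

```python
import importlib
import queue
import threading

module_queue = queue.Queue()


def run_gui() -> None:
    # Hypothetical stand-in for a GUI event loop: it is already running and
    # only picks up the module once the main thread has finished importing it.
    module = module_queue.get()
    print("GUI received", module.__name__)


gui_thread = threading.Thread(target=run_gui)
gui_thread.start()

# The expensive import happens on the main thread after the GUI thread is up.
heavy = importlib.import_module("decimal")  # stand-in for something like pandas
module_queue.put(heavy)

gui_thread.join()
```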
Also, when a user is not likely to use modules every time, you can return the module from a function wrapped in an lru_cache. That way it only loads once if needed (cache is redundant but useful to limit memory use).
In general, putting imports inside functions has the most impact on reducing waiting times for clients. Caches are useful to avoid reloading if one import is in several functions (and it is too expensive to load if not needed)
importing in a function already defers to sys.modules just like module level imports, it just changes when that gets executed
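To illustrate: the first call to a function containing `import` runs the module, and later calls just find it in `sys.modules`, so an `lru_cache` on top mostly saves that dictionary lookup. `get_decimal` and `get_module` are hypothetical helpers.

```python
import functools
import importlib
import sys


def get_decimal():
    # Deferred import: executes the module on the first call, then hits sys.modules.
    import decimal
    return decimal


@functools.lru_cache(maxsize=None)
def get_module(name: str):
    # The cache only skips the (already cheap) sys.modules lookup.
    return importlib.import_module(name)


print(get_decimal() is sys.modules["decimal"])  # True
print(get_module("decimal") is get_decimal())   # True
```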
I get the impression that sys.modules.pop() wasn't working because the compiler had a cache of the imports somewhere. Including a sleep command might give enough time for garbage collection (my prime suspect)
I mentioned this above. It only pops the top-level module, but the rest of the package is still cached.
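A sketch for dropping a package together with all its submodules (`purge_package` is a hypothetical helper; `json` is used because it's in the stdlib). Even this isn't the same as a fresh process: extension modules can't really be unloaded, and other modules may still hold references to the old objects.

```python
import importlib
import sys


def purge_package(name: str) -> list[str]:
    """Drop a package and all of its submodules from sys.modules."""
    removed = [m for m in list(sys.modules) if m == name or m.startswith(name + ".")]
    for m in removed:
        del sys.modules[m]
    return removed


import json.decoder  # make sure a submodule is loaded

removed = purge_package("json")
print(sorted(removed))
print("json.decoder" in sys.modules)  # False
json = importlib.import_module("json")  # re-runs json/__init__.py and its submodule imports
```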
In light of our discussion yesterday, I would like to redefine my feature proposal:
I propose a feature for the Python versions being developed without a GIL that allows a user to specify that an import must run in the background as a parallel thread. This will still work with the GIL, but less efficiently.
The aim is not for a lazy import, but just to prevent imports from being blocking events on the start up of a program.
The interest group for whom I suggest this is desktop developers, where waiting times depends on client hardware and any optimization leads to a better product.
The syntax I suggest is something similar to split import package, which tells the compiler to perform the import in a separate thread and assign the resulting object to a namespace in the main thread with the same name as the package.
And what happens if I do
```py
split import slow_module
slow_module.do_thing()
```
Attribute access blocks until the import has finished?
For that you use a normal import. The extra keyword is so you have to do it by choice
It is a common case that software has to wait for user input before doing most of its computations. Those imports aren't needed immediately, but if the import is lazy you still have to wait later.
I'm not asking what you should use, but what should happen if you don't.
Then you raise a NameError.
You can use an alias in the separate thread and only assign the module to the package name upon completion, so the name won't be in the name space in your case, causing a NameError.
A developer can then catch that error with a retry loop to wait out the import
Alternatively, the compiler can implement all imports like this, with a built-in retry loop that waits for the import. I'm not proposing that, just speculating about an alternative implementation as food for thought.
I have a working prototype of a system for parallel imports. It works incredibly well, thank you to everyone who contributed. A join statement is used to ensure the package has been loaded. The results are crazy:
Import statement pass time: 0.0024886999744921923
Import statement execution time: 7.9670460999477655
Name: sentence_transformers
and here is the code:
```py
import threading
import time

class SplitImport(threading.Thread):
    def __init__(self, name=None, Verbose=None) -> None:
        args = (name,)
        self.target = __import__
        threading.Thread.__init__(self, None, self.target, name, args, {})
        self._return = None

    def run(self):
        if self.target is not None:
            self._return = self._target(*self._args)

    def join(self, *args):
        threading.Thread.join(self, *args)
        return self._return

# Timing statements
t = time.perf_counter()
split_import = SplitImport("sentence_transformers")
split_import.start()
print("Import statement pass time:", time.perf_counter()-t)
sentence_transformers = split_import.join()
print("Import statement execution time:", time.perf_counter()-t)
print("Name:", sentence_transformers.__name__)
```
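The same idea can be sketched with `concurrent.futures`, which avoids subclassing `Thread` and also re-raises an `ImportError` from the worker when you call `.result()` (with the `Thread` subclass, an exception in the worker just gets reported and `join()` returns `None`). `split_import` is a hypothetical name and `decimal` stands in for `sentence_transformers`; the first timestamp only measures handing the job to a thread, not the import itself.

```python
import importlib
import time
from concurrent.futures import Future, ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)


def split_import(name: str) -> Future:
    """Start importing `name` in a worker thread; .result() waits for it."""
    return executor.submit(importlib.import_module, name)


t = time.perf_counter()
pending = split_import("decimal")  # stand-in for sentence_transformers
print("submitted after", time.perf_counter() - t)

# ... other start-up work would go here ...

decimal = pending.result()  # blocks only if the import hasn't finished yet
print("usable after", time.perf_counter() - t, decimal.__name__)
```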
you are still measuring thread creation time, so comparing it to normal import doesn't make sense
I am not attempting to impact import time at all, I am trying to prevent import statements from blocking the flow. Sentence_transformers is a good example, I only need it when the user makes a query so it can import in the background while matplotlib renders the various graphs
I can't let the user wait 7 seconds on startup, and I can't let them wait the first time they perform a query. A background import is the ideal solution. And the usecase is general for GUI developers.
That 7 second is 12 seconds on my smaller laptop
loading modules in different threads doesn't happen often enough to invent new syntax for it
Perhaps, but Python has a reputation for being slow and a speed up like this can really boost the Python GUI development community. Imports that block the event loop consume time that is not always justified by the use case of the software.
Certain features could be described as redundant, but they add to the flavour by affecting how a product is used.
For me, I get the full functionality of my software, but my startup time has almost disappeared. I'm actually concerned the splash screen disappears too fast.
This is not an arbitrary thing from the perspective of a desktop developer. I am genuinely excited about reducing the lag in my software, because speed impacts the client's opinion of the software.
People who use Python in other ways might not need the feature, but running the prototype I provided above is tedious and repetitive for multiple imports.
So does the language need the feature? No. But would the feature make Python feel faster for the user in certain use cases? Certainly.
a lot of features were rejected because benefits from them were not very big or part of community affected by the feature was too small
Don't get me wrong. I proposed a feature, we had a discussion, it resulted in a genuine solution for me, so I used that as a basis for a better proposal. Whether that proposal is accepted or rejected is not my concern.
I shared my code as a proof of concept. In the case of the sentence_transformers library (which imports Pytorch, a ~800 MB C library), I only need to wait 0.002 seconds (instead of 7.96 seconds, a ~4000 times improvement).
As such, I also showed that this approach is useful in bringing AI-based applications to the user, on-device.
I think there is a rejected lazy import pep that partially addressed this
Lazy import makes you wait later rather than waiting now. That is not a solution in my opinion
The later waiting can be in a different thread if you use it in one without having to explicitly take care of it
You may also run into stuff like this: https://github.com/python/cpython/issues/83065 (still open, and not the only problem you may run into here; here's another: https://github.com/python/cpython/issues/91238)
and you're adding a lot of machinery to thread imports instead of just doing what was suggested earlier in #internals-and-peps: launching the GUI in a thread before running the expensive imports on the main thread. I didn't see any added details explaining why that wouldn't work here.
There appears to be an issue loading submodules; I'm not sure whether you hit the same problem if you load the module and return the submodule from a thread. I haven't encountered the issue in my use case, but it's useful to keep in mind.
My code does not produce the error when loading tensorflow.estimator, so I'm not sure whether the issue was fixed in later versions (I am using 3.10).
there are a lot of theoretical issues that people experienced with concurrency could point to in what you want from this. I was trying to stick to demonstrable issues that arise from it, along with something easy that behaves correctly* and that you can do now to get what you want (a responsive GUI that isn't waiting on your expensive imports)
* This is somewhat limited by "are your libraries also doing the right things?"
@ripe tinsel I feel like we keep asking, "How do we know when the async import is done so that we can use it?" and you haven't given an answer.
Is core-mentorship still functioning? Is it active?
Sorry, I thought my answer was clear. Whenever you use any module incorrectly, an error is raised, and every developer has to be aware of these errors and how to catch them.
Nevertheless, I proposed maintaining a list of pending imports and removing names from the list if the import is complete. Then, when the interpreter encounters a NameError it can check if the name is in the list and halt the event loop in 0.1 second cycles until the name disappears from the list. If the name is not on the list then it raises the error like normal.
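The pending-imports list can be approximated in user code, but user code cannot make the interpreter consult a list every time a NameError is raised. In this sketch, an explicit call to the hypothetical resolve() function stands in for that check: it waits in 0.1-second cycles while the name is pending, then returns the module or raises NameError. start_import, resolve, and the registry variables are illustrative names, not an existing API.

```python
import importlib
import threading
import time

# Hypothetical registry of names whose background import has not finished yet.
_pending = set()
_pending_lock = threading.Lock()
_loaded = {}


def start_import(name):
    """Mark `name` as pending, then import it in a background thread."""
    with _pending_lock:
        _pending.add(name)

    def load():
        try:
            _loaded[name] = importlib.import_module(name)
        finally:
            # Clear the pending flag even if the import failed, so waiters stop polling.
            with _pending_lock:
                _pending.discard(name)

    threading.Thread(target=load, daemon=True).start()


def resolve(name, poll=0.1):
    """Stand-in for the proposed NameError check: poll while `name` is pending."""
    while True:
        with _pending_lock:
            if name not in _pending:
                break
        time.sleep(poll)
    if name in _loaded:
        return _loaded[name]
    raise NameError(f"name {name!r} is not defined")  # not pending: the normal error
```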
Alternatively, calling "split import" could create an intermediary object bearing the package's name, with a method called join(). Once the module has loaded and been assigned to the namespace, a join() that simply returns True is inserted into it; while the name still refers to the intermediary object, join() behaves like the normal Thread.join() but also returns True when loading finishes.
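A sketch of the intermediary-object variant, with SplitImport as a hypothetical name for the "split import" handle. It covers join() blocking like Thread.join() and returning True, but rebinding the name to the loaded module is left to the caller, since an ordinary function cannot reassign names in the caller's namespace. Returning False when a timeout expires and re-raising a failed import from join() are additions of this sketch, not part of the proposal.

```python
import importlib
import threading


class SplitImport:
    """Hypothetical handle for a "split import": loads `name` in a background thread."""

    def __init__(self, name):
        self.name = name
        self.module = None
        self.error = None
        self._thread = threading.Thread(target=self._load, daemon=True)
        self._thread.start()

    def _load(self):
        try:
            self.module = importlib.import_module(self.name)
        except Exception as exc:  # kept so join() can re-raise it in the caller's thread
            self.error = exc

    def join(self, timeout=None):
        """Block like Thread.join(); True once loaded, False if the timeout expired."""
        self._thread.join(timeout)
        if self._thread.is_alive():
            return False
        if self.error is not None:
            raise self.error
        return True
```

Typical use would be `handle = SplitImport("json")` at startup and, before first use, `if handle.join(): json = handle.module`, so the race is prevented by the join rather than caught afterwards.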
Either way, if the module is not finished loading and a function from the module is called, that will result in an error which can be handled like any other error.
Definitely if you are going to pursue this idea, you want a way to use the module with a guarantee of no errors. "Every developer has to be aware of these errors and how to catch them" isn't enough. I'm importing the module because I want to use it. A race condition during an async import isn't something to catch, it's something to prevent. A join operation somehow is the way to do that.
A useful variant could be that the import statement returns a special object so that if you access an attribute on it, it blocks until the import is complete
That is something you could write today, no language change needed
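A minimal sketch of such an object, assuming concurrent.futures does the background work. BlockingModuleProxy is a hypothetical name: each attribute access not found on the proxy itself waits on the Future until the import is done, and Future.result() re-raises the import's exception if it failed. The executor has a single worker thread, so queued imports run one after another.

```python
import importlib
from concurrent.futures import ThreadPoolExecutor

# One worker thread: background imports are queued and run sequentially.
_executor = ThreadPoolExecutor(max_workers=1)


class BlockingModuleProxy:
    """Hypothetical stand-in for a module that is still being imported."""

    def __init__(self, name):
        self._future = _executor.submit(importlib.import_module, name)

    def __getattr__(self, attr):
        # Only called for attributes the proxy itself lacks; blocks until the import is done.
        return getattr(self._future.result(), attr)
```

One trade-off of this design: the proxy keeps delegating on every attribute access and is not a types.ModuleType instance.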
you can even replace __import__ to do that, and I believe not much will break
except for imports with side-effects
well, you open yourself up to fun threading bugs, like the issue @urban sandal linked above
a special object
I think it could even be a ModuleType subclass that replaces its __dict__ with the actual module's __dict__ on first attribute access
Does anyone know which PEP introduced slots?
slots might predate the PEP process.
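A sketch of the ModuleType-subclass idea, with one adjustment: a module's __dict__ attribute is read-only, so instead of replacing it, this version copies the real module's namespace into its own __dict__ and then switches its __class__ back to types.ModuleType (the same trick importlib.util.LazyLoader uses internally). PendingModule is a hypothetical name, and colorsys stands in for a slow import.

```python
import importlib
import threading
import types


class PendingModule(types.ModuleType):
    """Hypothetical placeholder: imports `name` in the background and, on first
    access to a missing attribute, waits and then becomes the real module."""

    def __init__(self, name):
        super().__init__(name)
        self._pending_done = threading.Event()
        self._pending_error = None
        self._pending_real = None
        threading.Thread(target=self._pending_load, daemon=True).start()

    def _pending_load(self):
        try:
            self._pending_real = importlib.import_module(self.__name__)
        except Exception as exc:  # stored so the waiting thread can re-raise it
            self._pending_error = exc
        finally:
            self._pending_done.set()

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. before the swap below.
        self._pending_done.wait()
        if self._pending_error is not None:
            raise self._pending_error
        # A module's __dict__ cannot be reassigned, so copy the real namespace in.
        self.__dict__.update(self._pending_real.__dict__)
        # Drop the subclass so later lookups are ordinary module lookups.
        self.__class__ = types.ModuleType
        return getattr(self, attr)
```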