#internals-and-peps
this code #internals-and-peps message prints "hi" three times in 3.10+, and only two times in 3.9-. Seems like the 3.10+ behaviour is correct.
<3.10 it prints a line event only once
3.10+ it prints a line event thrice
Oh, right. In my original code it was 2 vs 1.
@pliant tusk I think I fixed my releasebuffer problems now. The issue was that it became impossible to actually call the C bf_releasebuffer slot on classes that override it with a Python __release_buffer__. Now when we call the slot from C code, we invoke both the Python method and any C slot in a base class, to ensure the C slot is called correctly and only when required.
and I wrote a blog post 12 years ago (almost exactly!) where I called this a 10-year-old bug: https://nedbatchelder.com/text/trace-function.html
Thanks! I should have thought about asking you about this (although I'm guessing you never want to stop tracing in coverage.py).
That sounds right, I'll pull the changes later today to dig thru them
thanks!
I realize that PEP 702 is still in draft/under consideration, but this docstring in your PR is outdated @feral island https://github.com/python/cpython/blob/342f28b6ef83fcf2aa60fdd11f74df914d3ae0bf/Lib/typing.py#L3585
Lib/typing.py line 3585
```
No runtime warning is issued. The decorator sets the ``__deprecated__``
```
thanks, I guess that probably means it's wrong on typing-extensions too
actually, it's important to stop tracing sometimes. Also, PEP669 (landed in 3.12) completely changes how tracing can work, though sys.settrace is now implemented on top of it in a backward-compatible way.
oh wait, I already wrote a better version in typing-extensions
I've read about that. Does the "bug" remain then, if it's backward-compatible?
I haven't checked. I think it would have to remain. I'm writing up a PEP669 bug right now...
Damn i love decorators, but holy sh*t they're way too difficult sometimes.
Only took me about 5 hours to get a decorator working that could wrap around a function with parameters and had 2 parameters of its own.
Basically i wanted a decorator that had its own parameters, could wrap around a function with parameters so it would still be able to use those parameters on call.
So i came up with this:
https://pastebin.com/XiiauQMK
This is probably a really bad way to implement it. as i haven't got that much experience with decorators yet.
I would have loved if i could have worked out a way to access the object the decorator is wrapping, that isn't also the first parameter.
Again, i assume there is already such a thing, and i just didn't find it. But just in case it isn't; might i suggest adding a property to decorators constructed from a class instead of a function. Just an exposed property that holds the object you've wrapped, in order to make sure the decorator's parameters are still left open.
I'm sure it is technically possible to find ways other than a special property to get the same results as i wanted, but i genuinely couldn't find them.
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
the way you did it is fairly standard as far as I can tell?
you can take a look at how dataclasses.dataclass is implemented, and/or some of discord.py's decorators
Does anyone know of proposals that try to address the problem of dependencies with conflicting subdependencies? Like you depend on module A and module B but they both depend on different versions of module C? A naive solution could be to automatically rewrite module A and B to have a private module C. I imagine solutions like that fall apart quickly?
Thank you!
you can do something like:
```py
from functools import wraps

def decorator(func=None, *, boolean_1=False, boolean_2=False):
    def actual_decorator(func):
        @wraps(func)
        def wrapper(*func_args, **func_kwargs):
            print(boolean_1, boolean_2)
            result = func(*func_args, **func_kwargs)
            return result
        return wrapper
    # Called as @decorator(...): no function yet, so return the real decorator
    if func is None:
        return actual_decorator
    # Called as bare @decorator: wrap the function directly
    return actual_decorator(func)
```
```py
@decorator
# or
@decorator()
# or
@decorator(boolean_1=True)
# or
@decorator(boolean_1=True, boolean_2=True)
# not like this:
@decorator(True)
```
my answer from the other channel:
```python
from importlib import import_module
if loader.name == "A":  # current module name
    C = import_module("CforA")
... # etc
```
The closest thing I know of is vendoring - https://pypi.org/project/vendoring/ - which isn't super supported, but is used internally by pip
I think doing this for pure Python modules is actually much trickier than doing it for C modules, because those Python modules do return objects that came from the library dependency. If you've got two libraries that want to pull in two different versions of pandas, let's say, and both of those libraries return dataframes to the user, then the user can get access to dataframes created by two different versions of pandas, and won't be able to use isinstance(obj, pandas.DataFrame) on them both (that'll be false for at least one of them, if not both), and won't necessarily be able to call the same methods on both (the newer version of the library might have added or removed methods), etc.
vendoring shared library dependencies for C extension modules the way auditwheel repair does is basically safe because you can't take an object created in a non-Python C library and return it to the Python caller of your library. With Python libraries, you can, and that'll cause you a ton of trouble.
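To make the isinstance problem above concrete, here is a minimal sketch. It does not use pandas; the `DataFrame` class, the `load_private_copy` helper and the module names are all hypothetical stand-ins for two privately vendored copies of one library.

```python
import types

# Hypothetical library source; stands in for a real dependency.
SOURCE = """
class DataFrame:
    def __init__(self, rows):
        self.rows = rows
"""

def load_private_copy(name):
    """Execute SOURCE in a fresh module object, as a vendored copy would be."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module

# Two private copies of the "same" library, one per dependent package.
lib_for_a = load_private_copy("a_vendored_lib")
lib_for_b = load_private_copy("b_vendored_lib")

frame_from_a = lib_for_a.DataFrame([1, 2, 3])

# The class objects are distinct even though the source is identical.
same_copy = isinstance(frame_from_a, lib_for_a.DataFrame)
other_copy = isinstance(frame_from_a, lib_for_b.DataFrame)
print(same_copy, other_copy)
```

Each copy defines its own class object, so an instance from one copy fails the isinstance check against the other.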
thanks for the link!
in the situation of Blender, the plugins can only interact with Blender apis and can't pass in outside module types so I think this works out in this situation
well, that might be a reason why you won't find anyone else who's done this, and will need to build your own home-grown solution, at least.
if you have enough control over the environment where these plugins are imported, you might be able to play a trick like manipulating sys.path and sys.modules before importing a plugin, so that it has its own private set of "what modules have been imported" as well as its own private set of "what directories do I look for modules that haven't yet been imported in"
that ones tricky because modules can dynamically load dependencies at unknown times right?
yes - that's not super common, but it is possible. But perhaps you could also mess with sys.path and sys.modules before calling from your core into any function defined by a module?
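A rough sketch of the sys.path / sys.modules trick described above, assuming a single-threaded host. The `private_import_state` helper and the `plugin_dep` module name are made up for illustration, and a real implementation would need to handle far more edge cases (threads, C extensions, lazy imports).

```python
import contextlib
import importlib
import pathlib
import sys
import tempfile

@contextlib.contextmanager
def private_import_state(extra_paths, private_modules):
    """Temporarily expose a plugin's own search path and module cache."""
    saved_path = list(sys.path)
    saved_modules = dict(sys.modules)
    sys.path[:0] = extra_paths
    sys.modules.update(private_modules)
    try:
        yield
    finally:
        # Move anything new or replaced into the plugin's private cache,
        # then restore the host's view of sys.modules and sys.path.
        for name in list(sys.modules):
            if saved_modules.get(name) is not sys.modules[name]:
                private_modules[name] = sys.modules.pop(name)
        sys.modules.update(saved_modules)
        sys.path[:] = saved_path

with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    # Two directories holding different versions of the same module name.
    pathlib.Path(dir_a, "plugin_dep.py").write_text("VERSION = '1.0'\n")
    pathlib.Path(dir_b, "plugin_dep.py").write_text("VERSION = '2.0'\n")
    state_a, state_b = {}, {}
    with private_import_state([dir_a], state_a):
        version_a = importlib.import_module("plugin_dep").VERSION
    with private_import_state([dir_b], state_b):
        version_b = importlib.import_module("plugin_dep").VERSION

print(version_a, version_b)
```

To cover imports that happen later, the same context manager would have to wrap every call from the core into plugin code, as suggested above.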
ah right
I'd be remiss if I didn't point out: the simple and obvious solution here is subprocesses. Run each plugin in its own dedicated Python process, where it has its own sys.path and sys.modules. That comes with the cost of needing to serialize data that needs to be shuffled back and forth to the "controller" process, but gives you isolation basically for free.
on IRC someone suggested subinterpreters in the same vein
yeah, that works as well, with basically the same caveats - you need to serialize data to get it between interpreters, or to whatever is controlling and coordinating all of the interpreters
you would still need to have a differently named package for each version of the dependency installed into venv right?
you'd just have different venvs
instead of one site-packages directory with two versions of numpy installed under different names, you'd have two site-packages directories that each have their own numpy
ohh, can you just switch the site_packages folder at runtime then?
for each subprocess?
yeah, that's what the sys.path manipulation I was talking about above would be doing.
oh right yeah that makes sense
each subprocess would just invoke the python in different venv
yeah that makes sense I forgot that subprocesses execute a new program, not just spawn a new interpreter
you could make it work by just spawning a new copy of the same interpreter and then messing with sys.path before importing the module, too.
but just using different venvs with different interpreters is the easy mode solution. Heck, that could potentially even let different plugins require different versions of Python.
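A minimal sketch of the subprocess approach, with JSON as the serialization format. `PLUGIN_CODE` and `run_plugin` are hypothetical names, and `sys.executable` stands in for the interpreter of a per-plugin venv.

```python
import json
import subprocess
import sys

# Hypothetical plugin: reads a JSON request on stdin, writes a JSON reply.
PLUGIN_CODE = """
import json, sys
request = json.load(sys.stdin)
json.dump({"total": sum(request["values"])}, sys.stdout)
"""

def run_plugin(python_executable, payload):
    """Run the plugin in a separate interpreter and exchange JSON with it."""
    completed = subprocess.run(
        [python_executable, "-c", PLUGIN_CODE],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)

# In a real setup this would be the python binary inside the plugin's venv.
result = run_plugin(sys.executable, {"values": [1, 2, 3]})
print(result)
```

Everything crossing the process boundary has to be serializable, which is the cost mentioned above.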
I knew about this but i am extremely perfectionistic and since i'm using the decorator as a way of creating test functions, using my unit testing library. it would have been really annoying having to write the_setting=False every time you wanted to disable a setting
Yea just concerned about performance with subprocess. Much more expensive to start a full python process
Another option is to restrict decorator such that it cannot be applied to function without calling. So @deco() is good, @deco - bad
In this case you can use arbitrary arguments, including positional and callable in first place of *args
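A small sketch of that "always call it" variant. The names `deco`, `settings` and `options` are illustrative only. Because the factory never has to guess whether its first argument is the decorated function, positional arguments (even callables) are unambiguous.

```python
from functools import wraps

def deco(*settings, **options):
    """Decorator factory that must always be called: @deco(...)."""
    def actual_decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Record the decorator's own arguments so they can be inspected.
            wrapper.seen = (settings, options)
            return func(*args, **kwargs)
        return wrapper
    return actual_decorator

@deco(True, print, retries=2)   # a callable as a positional argument is fine
def add(a, b):
    return a + b

total = add(1, 2)
print(total, add.seen)
```

The tradeoff is that a bare `@deco` would silently treat the function as a setting, so this style relies on callers remembering the parentheses.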
Again, i knew about this, but i'm just wayy too finicky about this kind of stuff. sorry
it's all good though, i found a way that works for me eventually
I have a template that I used to recreate a CPython bug that shows how to correctly (IIRC) use subinterpreters if you wanna go that route: https://gist.github.com/juliusgeo/da7eb99440161a5d0227b1d516688545
thanks!
true, but it's hard to get more isolated than this. You're paying a cost for that isolation. Other options will perform better, with the tradeoff of needing to try much harder to keep things isolated.
also if these plugins are going to be running for a significant period of time, the performance hit from spawning new processes won't be that bad
Is it possible to send object from one interpreter to another?
For example, one interpreter creates some object (with no references to it), returns it to C level, and then it is passed to other interpreter.
no
the C level is isolated by interpreter (not subinterpreter though) right?
I don't think I understand what you mean by that
couldn't you pass objects between subinterpreters if they have the same parent interpreter?
you can't pass objects between subinterpreters at all
each interpreter has its own set of objects, you can't move an object from one interpreter to another
the way I interpreted the original question was you would DECREF in one interpreter, convert to a C object, then send that object to another interpreter
like using PyInterpreterState_GetDict
I think almost every nontrivial object is tracked somewhere inside garbage collector, and you cannot extract it from there.
Also object is stored inside arena, and it can cause problems if other interpreter will try to deallocate that arena
I think if you want to send objects between subinterpreters, you should treat it as a form of interprocess communication. It's more difficult and expensive than just giving one subinterpreter a pointer to the other's object. But that's by design; it keeps the subinterpreters isolated from each other so that they don't need to share a mutex.
in my use case (Blender with multiple add-ons), I don't think anything has to move in between the subinterpreters. Just between the main process and individual subinterpreters
I guess the same point still applies
yep, it does
@feral island I just put a comment on the pull request for the fix from last night but i found an issue where the buffer passed into release_buffer is not properly tracked
so you can use it to access the memory of the exported buffer even after that memory is freed
>>> class B(bytearray):
...     def __release_buffer__(self, buffer):
...         B.leak = self.clear() or bytearray()
...         B.backing = buffer
...
>>> b = B(bytearray.__basicsize__)
>>> m = memoryview(b)
>>> m.release()
>>> B.leak
bytearray(b'')
>>> B.backing
<memory at 0x109d1f700>
>>> B.backing.cast('P').tolist()
[1, 4454897440, 0, 0, 0, 0, 0]
>>> B.backing.cast('P')[2] = -1
>>> len(B.leak)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <built-in function len> returned NULL without setting an exception
>>>
probably need to track down where this view is being exported and ensure that its ownership is tracked properly from the beginning
thanks, will take a look
maybe https://github.com/python/cpython/blob/main/Objects/typeobject.c#L9068 should explicitly .release() the memoryview because its underlying buffer won't remain valid
Objects/typeobject.c line 9068
```c
Py_DECREF(mv);
```
The memory view would still be unsafe inside of __release_buffer__
right, I may need to swap the order of calling the Python and C release slots
I think this would still be an issue due to the memoryview not being owned properly.
Inside of release_buffer you could reexport that buffer with memoryview.cast or similar and then even an explicit release wouldn't be enough
you're right. I concluded that there was no way with existing memoryviews to get this to work, so I am adding a new "restricted" mode to memoryview that doesn't allow any way to create a new memoryview off the existing one
does the implementation not allow for the buffer passed there to have proper ownership? that might be easier than a restricted memoryview mode
I don't think so. We'd have to "resurrect" the buffer, which I don't think the buffer protocol generally allows
where is the buffer created?
at various places in C code
- i meant the buffer that is passed into
release_buffer
restricted mode wasn't that hard to implement actually either
trying to figure out why that view doesnt have proper ownership
oh it's passed to the slot
it's created from PyMemoryView_FromBuffer
all we have is a Py_buffer*
oh its made here: https://github.com/JelleZijlstra/cpython/blob/pep688fix/Objects/typeobject.c#L8068-L8088
maybe alternatively we could do like PyMemoryView_FromObject(buffer->obj), but that feels like it's opening an even bigger can of worms
because then we're requesting a new buffer just to release the buffer
that's for __buffer__, not __release_buffer__
yea thats where the buffer being passed into release is made right?
i am trying to find the line of code that is resulting in an un-owned memoryview being passed into __release_buffer__ in this code: #internals-and-peps message
look at releasebuffer_call_python
Objects/typeobject.c line 9119
```c
mv = PyMemoryView_FromBuffer(buffer);
```
it almost feels like to do this elegantly you would need to redesign the buffer protocol
not something I'm signing up for
fair enough, i'm just wondering if it would be more worthwhile moving forwards to do that before implementing this PEP
I think the "restricted memoryview" solution isn't too bad. It's not extremely invasive, and we can always relax the rules later if someone has a use case and a way to do it safely
I'm not sure what use cases people would have for overriding __release_buffer__ in Python anyway
my motivation is just to make buffers representable in the type system
i have the beginnings of an idea on how to implement this pep with less of the memory risks we are hitting now.
implement a new type ownedmemoryview that has an extra property owner, which is the Python exporter, and whose obj would point to the object from the memoryview returned by __buffer__. It would implement a bf_getbuffer slot that creates a Py_buffer* that has its obj member set to the ownedmemoryview instance. Then in slot_bf_releasebuffer you can check if the type of Py_buffer->obj is ownedmemoryview and call the respective __release_buffer__ with the values stored on the ownedmemoryview instance. Then, regardless, afterwards you can call the bf_releasebuffer of the obj properly.
isn't that basically what buffer_wrapper does?
the trouble is that sometimes things don't go through the __buffer__ codepath, so we can't make a wrapper
e.g. if you only override __release_buffer__
but you should still have PyBuffer->obj to make a proper owned memoryview right?
I don't understand what you mean
the current implementation has that too. but I think to make a "proper" memoryview you'd need to re-call the bf_getbuffer slot
might need to do that to ensure safety in all of the edgecases
because for example, even with a restricted memoryview you could still clear the underlying buffer because the exporter (bytearray) does not know it exists
( I assume, I do not know how you implemented the restricted memoryview )
this call happens before we call bf_releasebuffer on the underlying object, so it knows it still has a reference active
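For context, this is how a bytearray behaves today while it knows about an active export. It only shows ordinary memoryview behaviour, not the restricted mode discussed above.

```python
data = bytearray(8)
view = memoryview(data)          # data now has one active export

try:
    data.extend(b"more")         # resizing would invalidate the view
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()                   # drop the export
data.extend(b"more")             # the resize is allowed again

try:
    view[0]                      # a released view refuses further access
    usable_after_release = True
except ValueError:
    usable_after_release = False

print(resized_while_exported, len(data), usable_after_release)
```

The safety problem in the discussion arises when a memoryview exists that the exporter is not counting, so this protection never triggers.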
i can't think of any situations that would break the new setup
thanks for checking! I pushed another commit with some over-paranoid checks
(under class TCPServer(BaseServer):)
https://github.com/python/cpython/blob/472938316a85c706c06ad1b3727a205d5bffcb1f/Lib/socketserver.py#L450-L452
shouldn't that be a super() call? was super() not a thing wayyy back in time or could there be some reason to do it like that instead?
(BaseServer.__init__ doesn't do anything too special either, and it does not inherit from any other classes)
Lib/socketserver.py lines 450 to 452
```py
def __init__(self, server_address, RequestHandlerClass, bind_and_activate=True):
    """Constructor. May be extended, do not override."""
    BaseServer.__init__(self, server_address, RequestHandlerClass)
```
there was a point in time before super() existed, where explicitly calling your base class's __init__ was the thing to do
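A side-by-side sketch of the two styles, using toy classes rather than the real socketserver ones. With single inheritance like this, both spellings end up calling the same base `__init__`.

```python
class BaseServer:
    def __init__(self, server_address, handler_class):
        self.server_address = server_address
        self.handler_class = handler_class

class ExplicitServer(BaseServer):
    def __init__(self, server_address, handler_class, bind_and_activate=True):
        # Older style: name the base class and pass self by hand.
        BaseServer.__init__(self, server_address, handler_class)
        self.bind_and_activate = bind_and_activate

class SuperServer(BaseServer):
    def __init__(self, server_address, handler_class, bind_and_activate=True):
        # Modern style: let super() find the next class in the MRO.
        super().__init__(server_address, handler_class)
        self.bind_and_activate = bind_and_activate

explicit = ExplicitServer(("localhost", 0), object)
modern = SuperServer(("localhost", 0), object)
print(vars(explicit) == vars(modern))
```

The two only start to differ under multiple inheritance, where super() follows the MRO instead of a hard-coded base.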
You still have to do that in JS
Call the super's constructor, at least. I wasn't paying attention.
when I run code as a module (with -m), why is the type of __builtins__ dict, but when run as a file its type is module?
(py311_venv) E:\co\tpy>py -m tpy
<class 'dict'>
...
(py311_venv) E:\co\tpy>py tpy/tpy.py
<class 'module'>
...
(py311_venv) E:\co\tpy>
structure:
E:\co\
    ...
    tpy\
        ...
        tpy\
            __init__.py
            tpy.py
            __main__.py
            ...
__init__.py:
```py
from .tpy import main
__all__ = ['main']
__version__ = '1.8'
```
__main__.py:
```py
from . import tpy
tpy.main()
```
it becomes annoying when doing some esoteric stuff (code doesn't run everywhere) and I am making a REPL and I want __builtins__ to be a module
By default, when in the `__main__` module, `__builtins__` is the built-in module `builtins`; when in any other module, `__builtins__` is an alias for the dictionary of the `builtins` module itself.
Source: https://docs.python.org/3.10/reference/executionmodel.html?highlight=__builtins__
Probably just an implementation detail. You're not supposed to touch __builtins__ anyways.
well I am making a REPL and for its namespace I want __builtins__ to be a module instead of a dict
tell us more about why it's important to be a module.
in order to mimic the namespace of the default REPL
as the author of a tool that tries to emulate default execution, i can say: that way lies madness.
and for running this code
... and there is madness
I am the madness
basically I don't wanna change how code executes in stock REPL and my REPL
and that's the only reason I want __builtins__ as a module
is there anything I can do?
you can make a new module object, populate it from the dict, and define it as __builtins__.
but i recommend not driving yourself crazy like this
import builtins as __builtins__?
its possible to do that way?
I think that should work, yes.
let me try
seems to work
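A minimal check of that approach: hand `exec` a namespace whose `__builtins__` entry is the `builtins` module itself, then ask the executed code what it sees. The variable names here are illustrative.

```python
import builtins
import types

# Namespace for the emulated REPL, with the module (not its dict) installed.
namespace = {"__builtins__": builtins, "__name__": "__main__"}

exec("kind = type(__builtins__)", namespace)

is_module = namespace["kind"] is types.ModuleType
print(namespace["kind"], is_module)
```

Built-in names such as `type` still resolve normally, since the interpreter accepts either the module or its dict under that key.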
heres the namespace now is it ok?
```py
namespace: dict = {'__builtins__': __builtins__, '__name__': '__main__', '__doc__': 'Automatically created module for TPython interactive environment', '__package__': None, '__loader__': None, '__spec__': None, '__annotations__': {}}
namespace_local: dict = {'__builtins__': __builtins__, '__name__': '__main__', '__doc__': 'Automatically created module for TPython interactive environment', '__package__': None, '__loader__': None, '__spec__': None, '__annotations__': {}}
```
goofy ahh executor function
```py
try:
    eval_return = eval(code, namespace, namespace_local)
    if eval_return != None:
        print(repr(eval_return))
    err = False
except:
    run = True
if run:
    try:
        exec(code, namespace, namespace_local)
        err = False
    except Exception:
        exc()
        err = True
```
If by "ok" you mean "behaves like a normal REPL", then no: globals and locals should be the same dict.
well aren't they different dicts?
In [1]: globals() is locals()
Out[1]: True
ok
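One way to see why the two dicts should be the same object: with separate dicts, top-level assignments land in the locals dict, but functions defined by the code look names up in the globals dict. A small sketch, where `CODE` is just an illustrative snippet:

```python
CODE = """
x = 1
def f():
    return x
result = f()
"""

# Same dict for globals and locals: behaves like a normal module or REPL.
shared = {}
exec(CODE, shared)

# Separate dicts: x is stored in split_locals, but f() searches split_globals.
split_globals, split_locals = {}, {}
try:
    exec(CODE, split_globals, split_locals)
    split_error = None
except NameError as exc:
    split_error = exc

print(shared["result"], repr(split_error))
```

This matches how class bodies behave, which is roughly what exec emulates when given distinct globals and locals.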
done
```py
try:
    eval_return = eval(code, namespace)
    if eval_return != None:
        print(repr(eval_return))
    err = False
except:
    run = True
if run:
    try:
        exec(code, namespace)
        err = False
    except Exception:
        exc()
        err = True
```
```py
namespace: dict = {'__builtins__': __builtins__, '__name__': '__main__', '__doc__': 'Automatically created module for TPython interactive environment', '__package__': None, '__loader__': None, '__spec__': None, '__annotations__': {}}
[1]-> globals() is locals()
True
[2]->
```
You still need to do exec(code, namespace, namespace), otherwise it inherits the globals of your REPL's code, right?
though I never understand how locals() works
```
>>> def a():
...     b=1
...     return locals()
...
>>> def b():
...     return locals()
...
>>> a()
{'b': 1}
>>> b()
{}
>>> locals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'a': <function a at 0x00000215109004A0>, 'b': <function b at 0x0000021510E572E0>}
>>>
```
welp
```
[2]-> def a():
[:]->     b=1
[:]->     return locals()
[:]->
[:]->
[3]-> def b():
[:]->     return locals()
[:]->
[:]->
[4]-> a()
{'b': 1}
[5]-> b()
{}
[6]-> locals()
{'__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__', '__doc__': 'Automatically created module for TPython interactive environment', '__package__': None, '__loader__': None, '__spec__': None, '__annotations__': {}, 'a': <function a at 0x0000020F10D26B60>, 'b': <function b at 0x0000020F10D26E80>}
[7]->
```
the behavior is the same but ¯\_(ツ)_/¯
Right, but what's in globals() afterwards?
Ah, I guess it doesn't happen if you have __builtins__ set already
```
[7]-> globals()
{'__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__', '__doc__': 'Automatically created module for TPython interactive environment', '__package__': None, '__loader__': None, '__spec__': None, '__annotations__': {}, 'a': <function a at 0x0000020F10D26B60>, 'b': <function b at 0x0000020F10D26E80>}
[8]->
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'a': <function a at 0x00000215109004A0>, 'b': <function b at 0x0000021510E572E0>}
>>>
```
exec() and eval() set the __builtins__ key in your globals dict if it's not already there I think
yeah they do but they still set it as __dict__
and that's a problem
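A quick way to observe this is to run trivial code in an empty dict and look at what was added. What type gets inserted is printed rather than assumed here, since it is a detail of the implementation.

```python
namespace = {}
exec("pass", namespace)          # no __builtins__ key was supplied

has_builtins = "__builtins__" in namespace
inserted_type = type(namespace["__builtins__"])
print(has_builtins, inserted_type)
```

Supplying the key up front, as done with the module earlier in the conversation, avoids relying on whatever exec inserts by default.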
what do you mean?
What @feral island said. The issue I thought I saw doesn't exist.
so currently in code there is no issue right?
and everything seems to work fine
```
----------------------------------------------------------------------
AttributeError in Cell 9 Traceback (most recent call last)
AttributeError: module 'builtins' has no attribute '__getitem__'
[10]->
```
that's unexpected
what code is that running?
here
also import builtins as __builtins__
this, (it is expected to give IndexError)
oh wait I realized why it wasn't working
nvm
Hi, I have a question. running this code which behaves weirdly. It crashes and it is impossible to catch:
from itertools import chain
foo = []
for _ in range(10**6):
    foo = chain([], foo)
try:
    next(foo)
    print("No Error")
except Exception as e:
    print("Got Error: ", e)
# Process crashes, nothing is printed!
Presumably because the error occurs inside the underlying C call. My question is, is this intended design (ie. known about and not going to be changed), or a bug? And if it is a bug, is it reported somewhere already?
I believe these are considered bugs. chain doesn't check for recursion error, which is probably what's happening here.
ok, thanks
This sounds like https://github.com/python/cpython/issues/102356 so it may have been fixed recently
Crash report The in-built function filter() crashes as the following: i = filter(bool, range(10000000)) for _ in range(10000000): i = filter(bool, i) Error messages Segmentation fault (core dumped)...
hello why can't i post in #1035199133436354600
you need to !close your open help thread first -> #1105326652294176848
<@&831776746206265384> cross posting scam
Thanks!
I noticed weird wording in typing module docs (in TypedDict section):
Deprecated since version 3.11, will be removed in version 3.13: The keyword-argument syntax is deprecated in 3.11 and will be removed in 3.13. It may also be unsupported by static type checkers.
https://docs.python.org/3/library/typing.html#typing.TypedDict
what weird wording?
"Deprecated since version 3.11, will be removed in version 3.13" - this is repeated twice for some reason
Deprecated since version 3.11, will be removed in version 3.13: The keyword-argument syntax is deprecated and may also be unsupported by static type checkers.
this looks better for me
you could maybe open an issue or directly submit a PR to python/cpython
python/cpython#101441 wow
Oh, that pep was accepted
And implemented
@feral island found a small issue with the buffer protocol pep involving ReleaseBuffer, there are a few places where it is called with an exception already set, I think it is meant to be exception agnostic. But with __release_buffer__ those standing exceptions are raised in weird places. (one example is https://github.com/python/cpython/blob/main/Objects/bytearrayobject.c#L555-L558)
Objects/bytearrayobject.c lines 555 to 558
```c
res = bytearray_setslice_linear(self, lo, hi, bytes, needed);
if (vbytes.len != -1)
    PyBuffer_Release(&vbytes);
return res;
```
oh good call, I guess we have to PyErr_Fetch/Restore around the call to the Python __release_buffer__?
```
>>> class A:
...     def __buffer__(self, flags):
...         return memoryview(bytes(8))
...     def __release_buffer__(self, view):
...         pass # do not need to do anything here, just needs to exist
...
>>> b = bytearray(8)
>>> m = memoryview(b) # now b.extend will raise an exception due to exports
>>> b.extend(A())
Exception ignored in: <__main__.A object at 0x10c19c530>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <function A.__release_buffer__ at 0x10c306160> returned a result with an exception set
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <method 'extend' of 'bytearray' objects> returned NULL without setting an exception
>>>
```
heres a simple replication example
yea that would solve the issue
you already have it setup to ignore errors in __release_buffer__ so that would not cause any new issues anywhere else
yes
would have submitted an issue and a pull request but i'm busy with end of my semester, figured pinging you would suffice
thanks! I'll make a PR later today
on a debug build it actually aborts, nice
```
>>> b.extend(A())
Assertion failed: (!PyErr_Occurred()), function _PyType_Lookup, file typeobject.c, line 4707.
zsh: abort ./python.exe
```
!pep 703 was submitted to the SC :)
I dont get the point. It says that ~10 python threads is a bottleneck. Why? If they are calling C-side, GIL is released, so at the same moment several C-threads (threads that run C code) and at most one Python-thread can run. So if code on python side is very fast (or time on C-side is very big), there is no performance issue
Yes, but if they're not calling C-code, or the C-code doesn't release the GIL*, then there is.
it sounds like your question is, "Why is the GIL a problem?" or have I misunderstood?
ah yes
Goday I Learned
No, i know why gil is the problem. Im just trying to understand beginning of this pep.
There is also this phrase at the end:
the overhead of acquiring and releasing the GIL typically prevents this from scaling efficiently beyond a few threads.
I dont think it is true. I believe it is possible to acquire/release gil at least thousands times a second. If ratio "(time with released gil)/(gil acquire+ gil release time)" is low then there will be no problem
maybe it comes down to how many times per second you'd have to acq/rel the GIL for it to be performant?
it's odd to say, "if CONDITION then there will be no problem" after saying you understand that the GIL is a problem.
GIL prevents several python threads from executing simultaneously
If almost all work is happening with released GIL, then there is no performance issue.
yes, of course. Most programs are not like that though. It seems like you are trying to say that the GIL isn't a problem, but you've already said you know it is. I don't understand the point you are trying to make?
It sounds like you're saying "the GIL isn't a problem when there's low contention for the GIL", which is true, of course, but most multithreaded programs have fairly high GIL contention.
Consider a web server with a JSON API, for instance. That's primarily an IO bound program, but every request that comes in needs to be JSON parsed. That's something that needs the GIL held, so no matter how many threads you've got, a second thread can't start parsing the request to execute until the last request has been parsed. Likewise, the GIL needs to be held while turning the response from a Python object into a JSON string, so one thread may need to wait for another to finish serializing a response before it can serialize its own.
The GIL needs to be held:
- When executing Python bytecode through the Python VM
- When creating Python objects
- When calling methods or accessing attributes of Python objects
- When saving or dropping a reference to a Python object
- When calling almost any function from the CPython C API
Now, that's not everything. There are lots of things you can do that don't require the GIL to be held. But that is a lot of things, and multithreaded Python code generally needs to do some (if not all) of those things in every thread
also, a lot of things that could be done with the GIL released are done with it held instead. There's two major reasons for that:
- Releasing and re-acquiring the GIL is relatively expensive. It can lead to unnecessary context switches and cache thrashing under contention. The less expensive the operation to be done is, the less valuable it is to release the GIL before doing it and reacquire it after. For instance, hashlib only releases the GIL if the string it's asked to hash is at least 2048 bytes, and keeps it held otherwise. It could unconditionally release it, but for short strings that's more likely to hurt performance than to help it.
- Even when an operation is slow enough to justify releasing the GIL, libraries might hold the GIL anyway because releasing and reacquiring the GIL introduces extra complexity. It's more lines of code that obscures the important stuff the library is doing, and it requires more error handling (you need to ensure that the GIL gets picked back up even if an exception is thrown, for instance). And of course, it's unreasonable to expect extension authors to even know which things are slow enough to justify releasing the GIL and which aren't. That's fundamentally something based on heuristics...
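To illustrate the hashlib case, here is a sketch that hashes a few large buffers from several threads. The inputs are well above the 2048-byte threshold mentioned above, so the hashing work may overlap, but no speedup is claimed or measured here. The `digest` helper and buffer sizes are arbitrary choices.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def digest(blob):
    """Hash one buffer; large inputs let hashlib drop the GIL while hashing."""
    return hashlib.sha256(blob).hexdigest()

# Four 1 MB buffers, each far larger than the 2048-byte cutoff.
blobs = [bytes([i]) * 1_000_000 for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blobs))

sequential = [digest(blob) for blob in blobs]
print(threaded == sequential)
```

Whether the threaded version is actually faster depends on the machine and the build, which is the heuristic nature of the tradeoff described above.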
https://blog.danslimmon.com/2016/08/26/the-most-important-thing-to-understand-about-queues/ is an interesting thing to consider here, too (when there's contention for the GIL, there's a queue of threads waiting to acquire it)
!pep 684 has been accepted, targeting 3.12
already been implemented
python/cpython#104210
That makes sense, thank you
Another thing that I don't think has been mentioned so far is that Python does not have its own thread scheduler, it's all deferred to the OS. So that means that if you have 10 threads, if one thread is holding the GIL, you would potentially have to check whether the current thread holds the GIL 9 times in the absolute worst case (if the OS scheduler happened to schedule all of the non-GIL-holding threads before ever going back to the one that holds it)
a short example:
```python
from threading import Thread
import sys
from timeit import timeit

def thread_func(num_iter):
    x = 0
    for i in range(num_iter):
        x += 1

def test_case():
    num_iter = 10000
    num_threads = 10
    threads = []
    for i in range(num_threads):
        threads.append(Thread(target=thread_func, args=(num_iter,)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

sys.setswitchinterval(0.005)
print(timeit(test_case, number=10))
sys.setswitchinterval(0.0000001)
print(timeit(test_case, number=10))
```
this is the output I get:
0.06188675000157673
0.2476623330003349
setswitchinterval changes how long python waits before asking the thread with the GIL to drop it, so it simulates a lot of GIL contention if you put it super low
what things can you do that don't require it to be held?
A couple of things come to mind
waiting for
- a child process to die
- an IO operation to finish
- a thread to stop
- OS synchronization primitives
notably, you can also perform a non-python CPU bound operation (e.g. using some number crunching C library).
Also theoretically waiting user input, though that generally falls into one of the previous categories already
but all those things are only triggered by events that do require it to be held
what do you mean?
like an IO operation is triggered by the Python bytecode or a Cpython api function, and those do require the GIL to be held
yup, the general flow is
lock gil
compute arguments for IO operation in python and convert them into C variables
unlock gil
recv(whatever)
lock gil
wrap the result in python objects
unlock gil
this is one such example https://github.com/python/cpython/blob/3.10/Modules/_multiprocessing/multiprocessing.c#L110-L112
Modules/_multiprocessing/multiprocessing.c lines 110 to 112
```c
Py_BEGIN_ALLOW_THREADS
nread = recv((SOCKET) handle, PyBytes_AS_STRING(buf), size, 0);
Py_END_ALLOW_THREADS
```
Yes, the normal way of working with the GIL is that it's held "by default" when executing Python code, and dropped temporarily while performing an operation that doesn't require it.
The exception would be threads that are spawned by a C library instead of a Python library. Those will start off in a C entry point without the GIL held, and for those threads the pattern will be to temporarily acquire the GIL to do things that need it (like calling a Python callback provided by the user, or saving a reference to a Python object), and then drop it when you're done doing those things
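a rough sketch of the "held by default, dropped while waiting" pattern as seen from Python. time.sleep stands in here for any blocking call that releases the GIL (like the recv above); the helper names, thread count and delay are made up for illustration:

```python
import time
from threading import Thread

def blocking_wait(delay):
    # time.sleep releases the GIL while it waits, much like a blocking recv() would
    time.sleep(delay)

def run_threads(num_threads, delay):
    """Start num_threads sleeping threads and return the total wall-clock time."""
    threads = [Thread(target=blocking_wait, args=(delay,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start

elapsed = run_threads(4, 0.2)
# If the GIL stayed held during the sleeps, this would take about 4 * 0.2 seconds.
print(f"4 threads x 0.2s sleep took {elapsed:.2f}s")
```

the waits overlap, so the total should be close to one sleep rather than four.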
Hi, I've been told my question could have more probabilities to be answered here so: https://discord.com/channels/267624335836053506/1107644388793929819. Thanks in advance
!e
try:
    [][0]
except Exception as e:
    t = e.__traceback__
    print(t.tb_lasti)
    print(t.tb_frame.f_lasti)
edit this actually works fine, see below
@quick trellis :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 8
002 | 96
guys am i stupid or what shouldn't they be equal
btw what i actually want to find is the last instruction that occurred in this g() call
try:
    g()
except Exception as e:
    tb = e.__traceback__
    # ... tb.tb_lasti is something else
tb_lasti should be correct. check if tb_lineno is what you expect
Gaiz need help with data extraction from website
not here
Anyone knows how to do that
ok i see now that the issue persists only across function calls, consider the following
!e
def g():
    [][1]
try:
    g()
except Exception as e:
    t = e.__traceback__
    print(list(t.tb_frame.f_code.co_positions())[t.tb_lasti])
@quick trellis :white_check_mark: Your 3.11 eval job has completed with return code 0.
(7, 7, 8, 23)
guess ill make an issue
no, they shouldn't
t.tb_lasti is captured at the moment when the exception occurs, so it contains some opcode position from this: [][0]
t.tb_frame is the current frame, so t.tb_frame.f_lasti contains position of opcode that is currently executing: print(t.tb_frame.f_lasti)
[][0] and print(t.tb_frame.f_lasti) are obviously different places, so you get different numbers
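a small sketch of that difference, plus the cross-function case from earlier. The inspect_traceback helper is hypothetical, and walking tb_next to reach g's own traceback entry is my assumption about what was wanted:

```python
def g():
    [][1]  # raises IndexError inside g

def inspect_traceback():
    try:
        g()
    except Exception as e:
        tb = e.__traceback__
        # tb_lasti: where this frame was when the exception passed through it (the call to g).
        # f_lasti: where this frame is right now (inside the except block).
        outer = (tb.tb_lasti, tb.tb_frame.f_lasti)
        # The first traceback entry belongs to the frame that caught the exception;
        # follow tb_next to reach the frame where it was actually raised.
        while tb.tb_next is not None:
            tb = tb.tb_next
        return outer, tb.tb_frame.f_code.co_name, tb.tb_lineno

outer, innermost_name, innermost_line = inspect_traceback()
print(outer)           # two different instruction offsets
print(innermost_name)  # name of the function that raised
```

so for "the last instruction in g", the tb_lasti to look at is the one on the last entry of the tb_next chain, not the one on e.__traceback__ itself.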
python/cpython#103764 PEP 695 is here
this all happened before 3.12 beta 1
yay!
Inlining comprehensions was un-accepted for 3.12 :(
huh
oh it's a release blocker
damn
Because of this, basically:
foo = 23
class Example:
    foo = 42
    bar = [foo for _ in range(1)]
Example.bar is [23] in 3.11 and below, and would have been [42] now.
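runnable version of that, for anyone who wants to check their interpreter. As far as I know, the comprehension inlining that eventually shipped kept the old class-scope behaviour, so treat the [23] below as what I'd expect rather than a guarantee:

```python
foo = 23

class Example:
    foo = 42
    # The comprehension body does not see class-level names,
    # so `foo` here resolves to the outer foo, not Example.foo.
    bar = [foo for _ in range(1)]

print(Example.bar)
```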
ok so the PR is still a draft
I think it was merged
and it doesn't really remove everything
python/cpython#104519
Ah, I was talking about the previous one that added this. I see, so there's still hope for the rest?
python/cpython#104528
why is deque implemented as a linked list of arrays and not a ring buffer?
recent discussion in pygen on tradeoffs of each #python-discussion message
ah i c
yeah I came here exactly because I saw that Rust uses a ring buffer 
||btw||
hey I know that guy
isn't he like, boring? ||jk||
<@&831776746206265384> I thought the NFT scammers would be done by now
NFT scammers
sounds like a tautology
not a tautology because there could be other kinds of scammers, like Bitcoin scammers
A pleonasm!
has anyone ever suggested explicit is_empty functions for str and the built in containers
I'm pretty tired of truthiness, and len(s) == 0 feels very clumsy
has this been attempted? Would it just get shut down immediately?
There's also s == "" i suppose, but what's wrong with the truthiness check? if your program is typed, and/or you're also checking the object's type beforehand, then there's no issues with ambiguity or anything
I just prefer to be explicit. Ironically, I just recently wrote a program and had a bunch of such checks, and was lazy, and used truthiness
and almost instantly got burned, because I assumed that Path("") was falsy
explicit is better than implicit, right
whether or not pathlib paths implemented __bool__ wasn't even something I had considered up to this point, so I probably would've made the same mistake, i'll give you that
but IMO, string truthiness is something that most intermediate python devs are aware of or at least have heard of, so the solution of doing if str(some_path): would be explicit enough
yeah, I'm not advocating for path to implement truthiness
but if member functions were used or at least could be used more consistently then there wouldn't need to be a question about it, or opportunity to make a mistake
Yeah I would definitely prefer explicit stuff. I don't really see the point of the implicit coercion
I don't understand this. Presumably a .is_empty() method should then exist on all containers? How would that differ in behaviour from bool(the_thing), if you want to be explicit?
Because not everything will have this method
Whereas erroring on __bool__ is a bit strange, I have only seen numpy arrays do it
So it's like arguing that object.__bool__ should raise an error.
fun fact, object.__bool__ doesn't exist
bool(the_thing) is terrible because it basically always works
and if there's no implementation it returns
true
if bool(x) adds no value at all over if x
that's not the sense of "explicit" that's valuable
if numpy arrays decided to deliberately error on bool(x) that actually tells you a lot
Yeah, the problem is that the "emptiness check" (__bool__) is implemented for stuff which doesn't have the concept of empty/non-empty
Bool will use len if no bool dunder is present
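quick sketch of how that lookup plays out. The three classes are made-up examples, not anything from the stdlib:

```python
from pathlib import Path

class Plain:
    """Defines neither __bool__ nor __len__."""

class Sized:
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

class Explicit:
    def __len__(self):
        return 0

    def __bool__(self):
        # __bool__ takes priority over __len__ when both exist
        return True

print(bool(Plain()))                   # no protocol at all: always truthy
print(bool(Sized(0)), bool(Sized(3)))  # falls back to __len__
print(bool(Explicit()))                # __bool__ wins over a zero length
print(bool(Path("")))                  # Path defines neither, so it is truthy
print(hasattr(object, "__bool__"))     # object itself has no __bool__
```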
<@&831776746206265384> I am on mobile, please do something^
yeah, that's a good way to put it, thank you.
Also the fact that it's not really consistently an "emptiness" check, I would say. You could argue that a container and a string being empty are the same but it's a lot harder to make that argument that those are the same as None
A string is a "container" is it not?
Like, I still think a named function would be better, regardless, but __bool__ erroring by default would already be a huge improvement
(but obviously, that's not going to be changed)
idk, it depends how you look at it
either way, my main point wasn't to draw a line between strings and containers
Yes, if you mean that the emptiness of a value can be contextual. For example, if you accept a list[Foo] | None it has two different ways of being "empty"
yeah, that's yet another issue with it
I mean, idk, I feel like truthiness just has, so many gotchas and downsides, and there's literally no benefit but saves a few characters
obviously it cannot be eliminated in python but at least facilitating truthy-avoiding coding styles seems reasonable
And shortcuts like or make it easy and convenient to write very subtly wrong code in these cases
yeah. I tend to dislike that trick in python.
it's a lisp classic
(or first-option second-option third-option) etc
Yeah, I have to disagree there. It's just far too useful for stuff that returns something or None that any object is truthy by default.
Yes, but how often does the difference matter?
That's a feature, not an issue.
how is it "far too useful"
it's literally saving you a few characters
"stuff that returns something or none"
x = returns_something_or_none()
if x is not None:
    ...
and when someone writes if x: you may have to figure out whether they knew that "containers" could be returned from the function and intended to skip empty containers
or if they assumed that the function wouldn't return a container
or they assumed that returned containers are always nonempty
is argumentless super() using something like frame->locals[0] (this is a C pseudocode) to get first argument passed to a method?
Objects/typeobject.c line 10340
```c
static int
```
whats the difference between that and this
https://github.com/python/cpython/blob/main/Python/compile.c#L4751
Python/compile.c line 4751
```c
static int
```
that happens at compile time, it's a new optimization in 3.12
oh damn
you can see that code essentially translating super() into super(__class__, self). It then uses the opcode LOAD_ZERO_SUPER_METHOD which will load the method object directly if possible (bypassing creation of the super object) and otherwise (e.g., if super has been shadowed) falls back to regular execution
though maybe that's a virtual opcode that gets translated into something else too? didn't follow this feature too closely. the interpreter code for super() is around https://github.com/python/cpython/blob/main/Python/bytecodes.c#L1597
Python/bytecodes.c line 1597
```c
inst(LOAD_SUPER_ATTR, (unused/1, global_super, class, self -- res2 if (oparg & 1), res)) {
```
Who is the "someone" in this scenario? I want to be able to write if x:, and often I don't care why the object is not falsy: could be None, empty list, False, ..., doesn't matter, because I only care about the "positive" case here.
It's not clear to the reader of the code which of those were actually considered
"someone" means, another person on your team, or even you, 6 months ago
Like it's clear from your response you're thinking mostly about how code writes, rather than mostly how code reads
No, I often see code written by e.g. coworkers that do if x is not None: and I have to think "is it really important to check for None here, or does this person just not like duck-typing?"
Whether reading or writing the code I often don't need to consider this (see previous message). Does re.search return False, None, or some falsy object when it doesn't match, and is it the same for re.findall? No idea, don't care.
Err, no, you don't have to think that
They checked for None
that's the point of the code, to check for None
But maybe it shouldn't check for None..
if x, you don't know if their intent was to check against None, against empty, or against both
....
this is just disingenuous
yes, any code could just be completely wrong
the issue is that if you're using if x wherever it's applicable, you're going to be using it in several overlapping situations
If this were to change, writing the equivalent of if x: would become much more complicated. If you want to check for one exact kind of truthiness now, you already can, with e.g. == "", which is readable and very clear.
I'm not suggesting anything change that would affect if x, the truthiness ship has sailed in python (though it's not likely to sail anywhere else)
just inquiring about adding a member function
you might notice that basically all languages have .is_empty() or equivalent functions for strings and lists
so, it's not exactly a minority opinion that .is_empty() is more readable than == ""
(or len(s) == 0)
Is that so, in other dynamically typed languages? I checked only strings, and I see Ruby does, JS doesn't, PHP doesn't (as far as I can tell, there's empty but it does something different), Perl doesn't, and then I stopped checking.
(I must also note that I'm not arguing that I share my opinions with the majority of Python users)
Whether there's an .is_empty sorta method seems to depend on how the language actually treats strings - if they're mutable, there's pretty much always one, whereas if they are immutable then it kinda just varies
I wouldn't have a problem with str.is_empty, it just seems rather pointless. If you think s == "" isn't clear, I don't know what to say. (Is it beautiful? No.)
if JS and PHP are doing the same thing then it's probably safe to do the opposite
In JS, only numbers, strings, null and undefined can be falsey
s.is_empty() is also better than s == "" because, s == "" will run fine no matter what the type of s is
if I think something is a string, then I want to call a string method, that's the whole point
that at least gives me a good chance of an error, if my expectations are violated
On top of all that there's the static typing dimension to this
if s will basically never error and it will basically never trigger a mypy error
does this person just not like duck-typing?
I'm actually not sure whether I like "duck typing"
well, depends on what you mean by it
could you give an example of such case? I usually know what kind of objects I'm working with
i'm actually gonna +1 that, i'm not a huge fan of relying on truthiness if i don't actually concretely know what type i'm dealing with (or someone reading the source wouldn't be able to quickly know) - but instead, its actually fairly useful if you do know the types you're dealing with
like, where would you want to be polymorphic over the type but still care whether it's truthy
fairly useful meaning, again, it saves you a few characters right?
like if there's literally no benefit except saving a few characters, i think reader clarity easily trumps that.
if there's some benefit other than golfing I'd like to know what it is
hey now, that's not trivial
my cost of ownership for an advantage kinesis, well, I'm afraid to calculate
responding to your joke probably cost me a quarter
unfortunately you can't send me an invoice
def safe_frobnicate(x, items=None):
    if items:
        for item in items:
            pre_frobble(item)
    x.frobnicate_for_real()
so in this case you explicitly know that items is nullable, and a container, so why not be explicit about that
how is that better than if items is None?
items could be an empty list, or False, or whatever else. I admit it's not a good example since the for-loop would still work with the empty list, I'll try to find a real-world example.
def safe_frobnicate(x, items=None):
    if items:
        pre_pre_frobble()
        for item in items:
            pre_frobble(item)
    x.frobnicate_for_real()
but, again, in terms of reading this, it's alot clearer to do
saving a few characters is still an overall benefit, and at no cost when anybody reading it is aware of truthiness (or at least will know very quickly after, since its not that complex of a concept)
if items is not None and not items.is_empty()
but items might not be None, but still be falsy. You just invented a new protocol that every iterable should implement..
there is cost though... we've discussed up and down how there's cost. forming expectations that are not violated
like, i've used the trick of doing string_that_may_be_empty or default_string quite a lot, which also has the added benefit of not evaluating that string twice like a ternary operator would
yes. And at least that protocol is only implemented when it makes sense.
Compared to a protocol that is automatically implemented for every object
whether or not it makes any sense
like, truthiness is bad enough, but like, python has the worst possible version of it. automatic opt in truthiness.
I actually had a production bug with code like this. Namely passing an empty iterator as items
it is truthy! but also empty!
great example
the relationship between iteration and truthiness isn't even consistent out of standard python types
At least if you did is_empty and iterators (as opposed to collections) did not implement that
you would get an immediate error
instead of the program quietly running through an unexpected codepath
I suppose that's one kind of confusion that type annotations can solve, you can require items to be None or a Collection[Stuff]
Yeah, but if items is annotated as iterable, this still doesn't actually solve the problem
Yep, that was the problem
i guess i'm just yet to ever fall into the trap of relying on the truthiness of something that hasn't actually implemented __bool__ itself, so its never been an issue to me - though i can easily see myself probably doing it at some point
def safe_frobnicate(x, items: Optional[Iterable[Whatever]] = None):
    if items:
        pre_pre_frobble()
        for item in items:
            pre_frobble(item)
    x.frobnicate_for_real()
containers are still iterables
Yeah that's what I had at work, I think
so even with code that type checks perfectly
this code behaves in very different ways for non-container iterables
and iterables
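a stripped-down, runnable version of that trap. describe is a hypothetical stand-in for safe_frobnicate that only reports which branch was taken:

```python
from collections.abc import Iterable
from typing import Optional

def describe(items: Optional[Iterable[int]] = None) -> str:
    # Same shape as the `if items:` guard above, reduced to the branch decision.
    if items:
        return "setup ran"
    return "setup skipped"

print(describe(None))                   # None is falsy
print(describe([]))                     # an empty list is falsy
print(describe(iter([])))               # an empty iterator is still truthy
print(describe(n for n in range(0)))    # same for an empty generator
```

all four calls satisfy the annotation, but the last two take the "setup ran" branch even though there is nothing to iterate over.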
truthiness is, after all, a remnant of python's past for the most part
this is exactly why it's wildly bad to have truthiness implemented by default
back in the super early versions before the bool type was a thing, boolean logic was done using the integers 0 and 1 (which is why bool is a subclass of int for compatibility reasons)
like, C++ has truthiness but in practical terms it's never caused me 1% of the grief of python's truthiness
that's also why bool is a subclass of int and lets you do cursed stuff like adding booleans
I'm totally fine with that, I understand that these things evolve over time, etc
oh yeah I didn't finish reading
but then why do style guides still push truthiness so hard, so often
if people just said "yeah, this is historical, please do explicit is None", and added is_empty() checks
I'd be perfectly happy
I've seen advice to be explicit in Brandon Rhodes's talk https://www.youtube.com/watch?v=S0No2zSJmks
though I think he had a different rationale in mind
PEP 8 still recommends using the implicit stuff though
gross
if len(users) is a pretty decent recommendation for containers. I probably still prefer to just do len(users) > 0 but at least len(users) is clearly an integer
and this enforces users to support len, which is really the most important part
(enforces it/communicates it to the reader)
I can only really circle around to just like, the fact that its never been an issue for me when reading/writing code, because its generally only ever used (and implemented) by the built-in sequences and numbers
"a feature so intuitive that it works well as long as it's only touched by a small handful of built-in types"
well, it's implemented by everything
and i'm not gonna argue too much in favour of object.__bool__, if that was removed i wouldn't be too upset
unfortunately nothing can be removed for backwards compat etc
the classic gotcha is if self.is_done instead of if self.is_done()
was about to write that @feral island
mypy could have a no-truthiness mode
might not even be that hard to implement, tbh
it does in part
I think for functions among others
welllllll, you can't really remove it fully
why not?
Because functions you call can rely on truthiness. Like filter(None, your_things)
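for reference, this is what filter(None, ...) does with a mixed bag of values (the list is just an illustration):

```python
your_things = [0, 1, "", "a", None, [], [0], False, True]

# With None as the function, filter keeps exactly the items that are truthy.
kept = list(filter(None, your_things))
print(kept)

# It is equivalent to filtering on bool(item) yourself.
same = [item for item in your_things if item]
print(kept == same)
```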
Ah this is fantastic:
Warn when the type of an expression in a boolean context does not implement __bool__ or __len__. Unless one of these is implemented by a subtype, the expression will always be considered true, and there may be a bug in the condition.
thanks jelle
when you say "remove it", obviously at the language level it's a huge breaking change
so you can't even regardless of that filter example
yeah, this should catch a lot of stuff
mypy should be able to catch that filter example, in principle
not sure if it actually does
like, mypy has all the information it needs
but this would still pass
because items can be falsey
yep
truthiness is the devil overall
it's just different degrees
everywhere implemented truthiness is definitely the worst of it
the best is to just not have it in your language because the benefits are so close to 0, if not negative
just one less thing to deal with
but that is obviously too late
conclusion: use <insert_language> btw
definitely
Lua is different from both Python and JS... In Lua, everything has a truthiness, but only false and nil are falsey
so 0 and "" are truthy
that seems to sorta link in with lua's disregard of 0 or emptiness being special, like how there's 1-indexing and the inability to get errors from accessing undefined table entries
in JS you also don't get an exception when accessing an array/map out of bounds
i wouldn't be upset if obj.__bool__ emitted a warning of some kind, like how numpy/pandas objects do
having multiple languages that do truthiness all with slightly different takes is another thing that also makes this painful
or even multiple libraries within python
somebody told me about a lib that had an operator bool but deprecated it
if we can do this for super(), then we can also do this for other built-ins?
What if it is True? Then you will get TypeError
!zen implicit
Explicit is better than implicit.
Good! If it's True, a TypeError seems like the sensible thing to throw.
But if it's an iterator that has no sensible truthiness operator, you don't get a type error ๐
Just behavior worthy of a JavaScript "wat" talk
Why should I get a TypeError then? If it's a generic iterable or iterator then there's no problem, as we can iterate over it.
For the if check
For a regular container, if it's empty, an if check fails
If it's an empty iterator, an if check succeeds
mypy has a check for this fyi (it's documented just after the one I linked to earlier)
Nice!
Our mypy is so behind, I really want to find time to upgrade it and "tighten up" some of the checks but it's not easy
With PEP 695, diffs like these will become common for a while to reduce boilerplate for typing
-from typing import TypeVar
 from collections.abc import Iterator
-
-K = TypeVar('K')
-V = TypeVar('V')
-def pick(
+def pick[K, V](
     dict_: dict[K, V],
     keys: list[K],
 ) -> Iterator[tuple[K, V]]:
     for k, v in dict_.items():
         if k in keys:
             yield (k, v)
already did it to typing itself https://github.com/python/cpython/pull/104553/files
Guys think this funky? I want to use __class_getitem__ to act as a factory for creating a type alias with attached metadata. Then do something with newly generated type alias at runtime. No idea how mypy will react, cannot test.
from typing import TypedDict, Annotated, TypeVar
import pandas as pd
Column = str
Dtype = TypeVar("Dtype")
class Table:
    # the hook for Table[...] is __class_getitem__, not __class_get__
    def __class_getitem__(cls, col_info: dict[Column, Dtype]):
        return Annotated[pd.DataFrame, col_info]
UserData = Table[{"name": str, "age": int}]
Mypy will yell at you because this class is not generic
Did a similar thing in mine
https://github.com/Keyacom/likepep403/commit/ec456fc9d1da1fb8bbf6dd57b040437e44d77e4d
So if I subclass typing.Generic with TypeVar DF then it will work?
requires-python = ">=3.12" is brave half a year before 3.12 is actually released
this is really nice!
thanks for sharing
Hey! So, I've been thinking about this for quite a while and Ive asked some experienced people but nobody quite seems to know either.
I know about name mangling and the concept behind it, but I dont think thats related here (?)
Basically, if you have a look at the builtins.pyi, you will often times see function parameters defined the following:
def pow(__x: int, __y: int, __z: int) -> int: ...
def getattr(__o: object, __name: str, __default: None) -> Any | None: ...
Now these are inconsistent too, sometimes there are no underscores for the same parameters in different overloads and sometimes it's only one.
Really curious what the purpose of this is? I know it probably isnt of much importance, I just cant stop questioning it
you forgot to include the actual question (which was the title in the help post)
What do the double underscores indicate in the builtins stubs?
oww my bad
this means the parameter is positional-only
shouldn't it use / instead?
we can't use the native syntax for that (/) for technical reasons
ahh! Yea we thought it would use / if it was for positional only purposes, but I guess that was it afterall then
thanks for clearing that up!
After Python 3.7 goes EOL (pretty soon), I think we can switch to the / syntax since PEP 570 was in 3.8
Okay finally thats off my mind haha. Thank you
The consensus is that we should wait for 6 months after Python 3.7 goes EOL before we start using syntax that only works in Python 3.8 (https://github.com/python/typeshed/issues/10113). So we'll probably start using / for positional-only parameters in early January 2024 over at typeshed.
The convention around using __ for positional-only parameters was introduced in PEP-484: https://peps.python.org/pep-0484/#positional-only-arguments
Some functions are designed to take their arguments only positionally, and expect their callers never to use the argument's name to provide that argument by keyword. All arguments with names beginning with __ are assumed to be positional-only, except if their names also end with __.
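to make the difference concrete: the __ prefix is only a convention that type checkers understand, while / (PEP 570, 3.8+) is enforced by the interpreter. Both functions below are made-up examples:

```python
def stub_style(__x: int) -> int:
    # Double-underscore convention: type checkers treat __x as positional-only,
    # but at runtime nothing stops a keyword call.
    return __x * 2

def native_style(x: int, /) -> int:
    # PEP 570 syntax: the interpreter itself rejects native_style(x=...).
    return x * 2

print(stub_style(__x=2))  # works at runtime, even though a type checker would complain

try:
    native_style(x=2)
except TypeError as exc:
    keyword_call_error = exc
    print(type(exc).__name__)
```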
:incoming_envelope: :ok_hand: applied timeout to @worldly schooner until <t:1684961099:f> (10 minutes) (reason: role mentions spam - sent 4 role mentions).
The <@&831776746206265384> have been alerted for review.
That's interesting. Why not just immediately make a new PyPI release of typeshed with a python_requires=">= 3.8" and tell everyone vendoring to make sure not to pull in that new version without also updating their python_requires? Is there something that makes it hard to do that?
I guess I'm not seeing who benefits from being able to install new stubs that support being parsed by 3.7 at this point, and what would be lost by just saying "sorry, no more updates for you" to people who still need to be able to parse the stubs with a version that doesn't have positional only arguments
typeshed doesn't do releases by itself
well, not for the stdlib at least
we probably could be more aggressive in dropping support, but mypy still supports 3.7
Ah, right.
So are you insinuating that the mypy and typeshed releases are coupled enough that typeshed must support versions until mypy drops them?
There are no typeshed releases, only mypy (and other typechecker) releases that update their vendored typeshed
So if we drop 3.7 support, mypy won't be able to update typeshed while still supporting 3.7
Because they literally won't be able to parse it on 3.7
Right, I guess I'm asking: how does that vendoring work? Does each release grab the latest main, or are the updates explicit?
it's explicit. look in the mypy/typeshed/ directory inside the mypy repo
(though this is only the stdlib: third-party packages in typeshed are distributed separately)
I see. I guess your constraint is that, once you start merging PRs to add positional only args, mypy can't update its vendored typeshed without dropping 3.7 support, and you don't want to force mypy to make that choice indirectly because of something typeshed has done
Yes. Personally I'm also not that eager to drop support as soon as possible
The double underscores for pos-only args work fine
Though admittedly they confuse new contributors on a regular basis
I dunno. "If you need to run mypy using Python 3.7, the last release of mypy that came before Python 3.7's EOL date is the latest version you can use" doesn't sound like a bad policy to me
Sure. But 3.7's EOL hasn't arrived yet
I expect mypy and typeshed will drop support soon after that
Sure, but it's in ~4 weeks, not the 6 months Alex mentioned
If new mypy versions that vendor new 3.7-incompatible stubs have a python_requires=">= 3.8", then pip3.7 install mypy would automatically install the old version that still supports being parsed by 3.7, and pip3.11 install mypy could install a version that still knows how to analyze 3.7 code even though the stubs can't be parsed with a 3.7 parser - right?
That is, mypy would be giving up running under a 3.7 interpreter, but not necessarily giving up analyzing 3.7 code
That's possible, yes. I think historically Jukka has been fairly hesitant to drop support for old versions, and he's in charge where mypy is concerned
Fair enough.
I typically think of people running linters as being more sophisticated users from an SDLC point of view, and more likely to keep up to date on interpreter updates. I'd have guessed that the portion of mypy's user base running the tool using a 3.7 interpreter would be minuscule. With the possible exception of projects that are no longer being actively maintained but still have CI running, but for those it wouldn't matter if they stopped getting updates to the mypy tool.
They could also be people maintaining large proprietary codebases that are slow to upgrade
They may not have a choice about their environment, though.
That's... Entirely fair.
I think at work we barely got off 3.6 in time
I once had to maintain code that needed to run simultaneously on 2.6 and 3.6. It was Not Fun.
I've still got some (C API) code that needs to run on 2.7 and 3.6 through 3.11... I haven't quite figured out how to split that so I can freeze the 2.7 part while everything else moves on.
It shares one header across all versions, as it's deployed today.
Which database should I learn
Which is preferable for interacting with Python
If I learn python and want to learn web developement is django enough?
not really on-topic to this channel
I would be OK with a more aggressive schedule personally, but I'm also OK waiting. I hate the __ syntax for positional-only arguments: it's ugly, and I found it really alienating when I first started reading through typeshed's stubs, prior to contributing. But it also doesn't really cause any massive problems right now; it basically works fine. We've put up with it for many years now, so waiting another six months doesn't make much of a difference
try coding in notepad++ for 3 weeks; not a very fun experience to say the least.
idk i've coded in notepad++ for a year and counting
i've coded in npp also for about a year, it was good enough
it wasnt python, i used some very unpopular language, so there is no huge difference between editors (because they all have no idea what this language is and what to do with it)
2018
You should ask in a help channel, see #how-to-get-help. But yes, you should random.shuffle the list and then go through it from start to end.
but then why do style guides still push truthiness so hard, so often
Maybe it's because PEP8 suggests it
Or maybe python programmers like code golfing and a fog of mystery
there's nothing like the moment of wonder the first time you saw this:
!e
print(False in [False] == True)
@ancient jackal :white_check_mark: Your 3.11 eval job has completed with return code 0.
False
yeah, this is atrocious
!e
unrelated but
from math import nan
print(nan == nan)
print(nan in [nan])
print([nan] == [nan])
@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | False
002 | True
003 | True
(False in [False]) and ([False] == True)?
yep
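spelled out, since `in` and `==` are both comparison operators and chain like a < b < c:

```python
# The chained form: `a in b == c` means `(a in b) and (b == c)`.
chained = False in [False] == True

# What the chain expands to.
expanded = (False in [False]) and ([False] == True)

# What most people expect on first reading.
grouped = (False in [False]) == True

print(chained, expanded, grouped)
```

the chained and expanded forms agree; only the explicitly parenthesised one gives the "expected" answer.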
The first case being False is obviously correct. The second one feels like it's "fine" on a surface level, but it still breaks the "nan shouldn't compare equal to itself" detail, and the 3rd case seems like it blatantly violates that detail. Has this been discussed somewhere as intended?
no clue
but nan not being equal to itself is honestly the core issue...
I know that's how the floating point standard works, but it's cursed
not really something I'm going to agree with you on. nan needing to be handled separately is a good thing that prevents other errors, and I'm eyeing that list comparison as breaking that. If you have nan in a dataset, something is wrong with the data and it shouldn't be used for comparisons.
Could you give an example maybe?
nan in a dataset is an error of some sort. pretty much everything you do that implicitly involves the nan without explicitly correcting for it (probably by exclusion, but this could impact statistics) would be a problem
IIRC the reason NaN was defined to never equal anything is that the actual bits of different NaNs could be different
it's because NaN isn't a number, it's a signaling tool
and yeah, you can use the multiple different nans as part of that signaling
i don't think that's the reason because then they could have had equal-bit NaN's compare equal
Yeah that is true. And negative zero is special-cased anyway
but doing that, doesn't really give you much, then you have to think about ordering of NaN's.... it's just not going anywhere good
i will also say that in real data analysis, most times, if you're comparing two columns in a dataframe, you do want them to compare equal when they have nans at matching spots
I guess it all comes down to the fact that identity is a concept only present in Python land and the language explicitly specifies that identity is compared before value in both list equality and membership (nan in [nan] is also true) so you'd have to "lose" somewhere for the list check to fail, changing nan is nan to be false would probably break a very important invariant and changing the list semantics is probably too big of a change so it's just kinda what we have
but that should be properly handled by the library, the [nan] == [nan] thing is still wrong
that just seems like a bad specification though?
what does
that identity is compared before value in list equality
that seems totally incorrect to me
lots of languages have a notion of identity equality and I don't know of any where list equality does that
it would be silly for a list to compare unequal to itself I think
why does list do that btw?
It's not though?
it's only silly if its always silly for things to compare unequal to themselves
nan compares unequal to itself and relatively few people think this is silly
||I do think it is silly
||
list1 == list2 should mean exactly len(list1) == len(list2) && for x, y in zip(list1, list2): x == y
yeah, I would argue [nan] == [nan] being True is incorrect.
as an optimization, I suppose. And possibly for consistency with dict, which needs this special case because otherwise you could use nan as a dict key and then never remove it or access its value
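To make the two positions concrete, here is a rough model of both rules. This is a sketch, not CPython's actual implementation: `list_eq_sketch` and `list_eq_strict` are hypothetical helper names, and the first one only imitates the "identity first, then ==" behaviour described above.

```python
import math

def list_eq_sketch(xs, ys):
    """Rough model of list == as discussed: identity is tried before ==."""
    if len(xs) != len(ys):
        return False
    return all(x is y or x == y for x, y in zip(xs, ys))

def list_eq_strict(xs, ys):
    """The 'defer to == only' rule proposed in the chat."""
    return len(xs) == len(ys) and all(x == y for x, y in zip(xs, ys))

nan = math.nan
same_object_model = list_eq_sketch([nan], [nan])      # same object on both sides
same_object_builtin = [nan] == [nan]
same_object_strict = list_eq_strict([nan], [nan])
distinct_objects = list_eq_sketch([float("nan")], [float("nan")])

print(same_object_model, same_object_builtin, same_object_strict, distinct_objects)
```

With the same `nan` object on both sides the model and the builtin agree on `True`, the strict rule gives `False`, and two separately created NaN objects compare unequal under either rule.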
making an IEEE implementation overtake your language design seems like a silly rabbit hole to fall down
yeah, probably an optimization
it's bad language design though
Like:
identity is a concept only present in Python land
This is not correct, identity is present in basically all mainstream programming languages
but IG it doesn't really matter in practice, so whatever.
and this is a very unusual way to handle things
yeah, a is b implying a == b is... wrong
but if we consider that nan == nan being false is not silly, then I think [nan] == [nan] should also be False
I think two lists should compare identical if they contain the same objects, even if you override those objects not to compare equal to themselves, because they are the same list. I think breaking this invariant would be more catastrophic than the current behaviour
it's just not how python works
how would it be "catastrophic"
and the phrase "compare identical" is very confusing here
i assume you meant compare equal
two lists are equal if their elements are equal and in the same order.
yes
I don't see anything "catastrophic" still
NaN and its related semantics seems really weird to me to be honest. It's like... trying to hack error handling into a number format
i have someones ip address what can i do
write it on a piece of paper
yeah, that's what it is. It is very hard to put error handling into CPUs, so IEEE tried to make sure people avoid doing the stupid decisions they made before (try a custom callback for every fp error) by making a slightly less stupid decision
oh my god, I just discovered kotlin has the same behavior
fun main() {
val x = Double.NaN
println(x == x)
println(listOf(x) == listOf(x))
}
prints false true
raku simply compares NaN equal to NaN, which is IMO a fair enough decision
well, that's exactly what it is, so that's why it feels that way
I would say that straying from IEEE is a pretty bad decision
surprisingly, PHP is the one who does this correctly and compares [NAN] != [NAN]
Haskell does this, but that's because it doesn't really have a notion of identity
They don't have the same model that Python does.
well, languages which don't have identity don't have this dilemma at all
C++ and Rust have identity
Not in the same way, though.
that's not a meaningful thing to say, or at least, you mean something by it that's not clear
not sure about that
well, then you better define what you mean by identity
&a == &b is identity in C, no?
even in C++ if you compare pointers you'll get a true value because a pointer to NaN is still equal to itself
for many years for example, it was common when writing copy assignment to do exactly a check for identity
Only if a and b have addresses, which is not guaranteed.
Yeah I guess some values in Rust have identity, basically everything on the heap and such
ye^
not really
everything "has" identity. The identity of things is not really meaningful in most cases
js has this but for a completely different reason
that's true in python too
afaik rvalues don't have a useful notion of identity (unless C++ allows taking an address of an rvalue)
in python identity is a fundamental language-level feature and is not tucked away the same way, is pretty well defined as everything is boxed, and everything that follows is pretty logical
it's just being used as a performance shortcut here, most likely
The & operator is a fundamental language level feature in C++ too
but it doesn't work on everything
yeah but it's not really the same
it's not "the same" but these very vague "well but C++ doesn't have identity" just isn't accurate
and doesn't really clarify the issues
to be more specific, not every single type in C++/rust has a valid operation that returns a value that is unique to that object for its entire lifetime.
Aren't compilers required to assign an address to a variable if you use & on it? Like, doesn't using & preclude the optimization where a variable lives only in a register and isn't copied into memory?
you could pass the rvalue into a function and take the address inside the function (of course, in there, it's not an rvalue)
since getting the address of something is a function call in python anyway it's not worlds apart
Yes, I think they are. But if you never use & in the source code and then open up a debugger, the equivalent of &a may not make sense.
I mean Rust is different, I don't know if they have a formal enough definition of lifetime.
but yes, that is in fact exactly what the address of an object is in C++, subject to certain restrictions (i.e. objects can share addresses in very limited circumstances)
not every object has an address as per the C++ abstract machine afaik
you can of course give it an address through various means, but objects without an address can exist
prvalues surely don't, but prvalues aren't really values at all, they're basically just expressions that the compiler is delaying the instantiation of
"objects" in fact do
values are not objects
An object, in C++, has
size (can be determined with sizeof);
alignment requirement (can be determined with alignof);
storage duration (automatic, static, dynamic, thread-local);
lifetime (bounded by storage duration or temporary);
type;
value (which may be indeterminate, e.g. for default-initialized non-class types);
optionally, a name.
After implicitly creating objects within a specified region of storage, some operations produce a pointer to a suitable created object. The suitable created object has the same address as the region of storage. Likewise, the behavior is undefined if and only if no such pointer value can give the program defined behavior, and it is unspecified which pointer value is produced if there are multiple values giving the program defined behavior.
ah, interesting, didn't know this distinction was made, my bad
it's fine, I didn't really mean to drag us into the C++ weeds
my point was just that it's not so simple to say "well python has identity and X doesn't"
sorry sorry my bad
all good. at any rate, I think python's behavior here is insane but at least it has company
Well, originally the discussion was about NaNs.
container == should recursively defer to ==
I would never have thought that was controversial
using identity as an optimization is always okay, but not when it can change semantics
I think it's important to keep in mind that the IEEE specification was designed by and for numerical analysis. It's great if you're doing scientific computation and need to get things exactly correct.
I mean at this point it's also how the vast majority of computers work at the native level
ye, the assumption a is b implies a == b is used to optimise container comparisons in python/kotlin/likely java, which doesn't happen in C++/rust/other languages where nan is a 4/8byte string with no extra runtime information, PHP is noteworthy because despite having identity, it compares [NAN] == [NAN] as false
that was what I was trying to say
Yeah, I mean the term which I think helps here is not "have identity" but rather "have reference semantics"
python, kotlin, and java have reference semantics throughout, so when you "copy" nan you don't really copy it, you just have pointers to the same nan
C++ and Rust have value semantics
so that optimization is mostly worthless to begin with
java primitive arrays surprisingly also compare equal here, which is just... odd.
just to add fuel to the fire [nan] == [nan] is also sometimes false in python if you have two different float instances
```
>>> l1 = [float('nan')]
>>> l2 = [float('nan')]
>>> l1 == l2
False
```
ah yeah, that makes sense
in Rust there's actually an explicit thing, floats implement PartialEq but not Eq
yeah, the real takeaway here is that equality of IEEE floats is questionably useful at best
well, their equality is questionably useful for additional reasons to this
but yeah, python's behavior here is pretty terrible
like, it's not even consistent
if float('nan') were at least a singleton, it would at least be consistent
I'm kind of surprised it's not
hmmm not sure I'm convinced
about which
This is an intentional IEEE design choice.
math operations can return nans, should they also all be converted to the same singleton? (and therefore the same bit pattern)
Most people think of NaN as a constant. Actually, there are different types of NaNs.
actually, I wonder how pypy and graalpy do this - they both do simplify lists of floats into direct unboxed arrays
The existence of multiple NaNs allows you to track the source of a NaN in a computation. Plus, you can give the NaNs different behavior (signaling versus non-signaling; basically, whether or not you raise an exception).
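The "many bit patterns" point can be seen from Python with `struct`. A small sketch; `float_from_bits` is a hypothetical helper, and the two hex constants are just example quiet-NaN encodings with different payload bits:

```python
import math
import struct

def float_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double (illustrative helper)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# Exponent bits all ones plus a nonzero mantissa means NaN; the remaining
# mantissa bits (the "payload") are free to differ between NaNs.
quiet_nan_a = float_from_bits(0x7FF8000000000000)
quiet_nan_b = float_from_bits(0x7FF8000000000001)

both_nan = math.isnan(quiet_nan_a) and math.isnan(quiet_nan_b)
compare_equal = quiet_nan_a == quiet_nan_b

print(both_nan, compare_equal)
```

Both values are NaN, and like any pair of NaNs they do not compare equal. Whether the payload survives arithmetic or a round trip is platform-dependent, so the sketch does not rely on it.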
```
>>>> l1 = [float('nan')]
>>>> l2 = [float('nan')]
>>>> l1 == l2
True
```
pypy is different (which does make sense, since pypy lies about identity preservation in containers a lot)
err what
nan being a singleton in python or not has nothing to do with IEEE
and this has nothing to do with the different kinds of nans either
NaN should not be a singleton, ever.
float('nan') could easily have returned a reference to the same object each time
that doesn't make any sense
Why not?
because float('nan') is a deterministic function, it's obviously returning one specific NaN bit pattern
there's no reason why it can't be a singleton in python
pypy has it effectively be a singleton
Since NaN is not a singleton in IEEE, doing anything else at the Python level is an invitation to disaster.
you're misunderstanding
IEEE for starters does not specify things like "singleton" at all; that's a software engineering term and IEEE is a spec
second, yes, everyone in this conversation understands that many bit patterns are associated with NaN
You're asking why float('nan') is float('nan') returns False, right?
obviously, NaN's of different bit patterns cannot be represented by a singleton
but there's no reason why a NaN of the same bit pattern cannot be represented by a singleton
I'm not really asking "why" because it's just an optimization choice
there's nothing preventing it, contrary to what you're saying
to be fair, a lot of things with the same bit pattern are not optimised to singletons, nan is not special in that regard.
Oh, you're asking why there isn't a singleton Python object for each NaN with a particular bit pattern. Not why there isn't a singleton Python NaN object.
Well, that would be fine.
I was asking why float('nan') specifically didn't use a singleton for its return
if float('nan') were at least a singleton, it would at least be consistent
It could always return math.nan, for example.
I agree that there's nothing wrong with that behavior.
I would have thought it would be worthwhile to make it a singleton since it's kind of the "default" NaN you'd use if it didn't happen directly from a computation
but I guess for some reason it's not
I just hit "publish" on my blog post series for the Python Language Summit this year -- read all about it here! https://pyfound.blogspot.com/2023/05/the-python-language-summit-2023_29.html
Had the most fun writing this one: https://pyfound.blogspot.com/2023/05/the-python-language-summit-2023-pattern.html
But there's also a blog that's ~1800 words or so on the C API, if that's your kind of thing
you might want to DM @summer lichen to request the appropriate roles by the way
you were saying?
btw, re the nan stuff above and the java/kotlin
apparently double/double comparison works properly but when you compare it "as an object" (which will occur in any generic code, like list ==)
it has these special cases
apparently, to make hashing work correctly...
that's the same reason as dict special cases nan in Python - if it didn't work that way,
```py
nan = float("nan")
d = {}
d[nan] = 1
d[nan] = 2
print(d)
del d[nan]
```
would print
```text
{nan: 1, nan: 2}
```
and then raise a KeyError
i dont think it is special cased
containers like dict and list are first doing identity check and only if it fails they do ==
so hash(nan) == hash(nan) and nan is nan, so nan's work the same way as all other objects
>>> class X: __eq__ = lambda*_: False; __hash__ = lambda x: id(x)
...
>>> hash(X())
2_657_535_760_144
>>> x = X()
>>> d = {}
>>> d[x] = 1
>>> d[x] = 2
>>> d
{<__main__.X object at 0x0000026AC17E6710>: 2}
>>> d[x]
2
so dict is also assuming that identity implies equality
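The same thing with an actual NaN key, plus a rough model of the key test. This is only a sketch: `keys_match` is a hypothetical name for "identity first, then ==" once the hashes already agree, not the real C code.

```python
def keys_match(stored, probe):
    """Rough model of dict's key test once the hashes agree."""
    return stored is probe or stored == probe

nan_a = float("nan")
nan_b = float("nan")

d = {nan_a: "first"}
d[nan_a] = "second"   # same object: the identity check hits, entry is overwritten
d[nan_b] = "other"    # different object that is never == : a new entry

print(len(d), d[nan_a])
print(keys_match(nan_a, nan_a), keys_match(nan_a, nan_b))
```

So a NaN key is retrievable only through the very same object, which is what the identity check buys here.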
joy
at least there it's more reasonable somehow, as it's a real performance increase. And it feels less dirty because you didn't explicitly ask for ==
Indeed, Java's hacks and python's hacks for NaN are actually totally different.
In [1]: nan = float('nan')
In [2]: nan2 = float('nan')
In [3]: nan is nan2
Out[3]: False
In [4]: d = {nan: 5}
In [5]: d[nan2]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[5], line 1
----> 1 d[nan2]
KeyError: nan
In [6]:
java is using structural equality, rather than doing this dubious thing of falling back on identity. But it wants nan to be well behaved, when treated as an object, so it forces nan to be equal to itself
Java's solution for NaN at least yields predictable behavior.
python is just using identity as a shortcut over structural equality in many places, for performance reasons. This doesn't consistently fix any issue with nan, rather the opposite, it makes the behavior inconsistent
it's checking identity first not just as an optimization, but as a requisite condition for being able to use nan (or any other object which compares unequal to itself) as a dict key
```
>>> x = float('nan')
>>> y = float('nan')
>>> x is y
False
```
why there is no global `nan` object in interpreter? every `float('nan')` could just return already created object
or float('nan') has to return new object every time?
I can't see a reason why it couldn't always return the same object. I think it's just like tuples or ints or strs: the interpreter is free to reuse equivalent immutable objects rather than creating a new instance
looks like you're right, though: it is entirely an optimization in the dict case that identity is checked for before equality. https://github.com/python/cpython/commit/ca756f2a1d8353812223af380166c826ffefcc2c
The best solution for this would probably break too many things. math.isnan exists, so deciding that python's nans don't need to compare unequal and that people should be explicitly handling nans anyhow could have been a pragmatic decision... years ago, but not now.
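For reference, explicit handling with `math.isnan` looks something like this. A minimal sketch; `readings` is made-up sample data:

```python
import math

readings = [1.5, float("nan"), 2.0, math.nan]   # made-up sample data

# `x != x` would also detect NaN, but math.isnan states the intent.
clean = [x for x in readings if not math.isnan(x)]
nan_count = sum(math.isnan(x) for x in readings)

print(clean, nan_count)
```

This sidesteps `==` and `in` entirely, so it does not depend on whether the NaNs happen to be the same object.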
I am installing Python on my another machine. I would like to know the best path/directory (the one that is generally use which does not have any issues related to admin privileges and some libraries not working) for it. Where should I install it?
bytearrays have two "sizes":
- len(obj) - size visible from python world
- size of internal buffer
there are two functions:
- PyByteArray_Size(PyObject *bytearray) - Return the size of bytearray after checking for a NULL pointer.
- PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len) - Resize the internal buffer of bytearray to len.
PyByteArray_Resize explicitly says that it modifies size of internal buffer
PyByteArray_Size returns "the size of bytearray", but i dont understand what size: is it a len(obj) or size of internal buffer?
i looked into source code, but it confused me even more
i want to:
- get len(bytearray) efficiently (avoiding calling .__len__() if possible)
- set to len(bytearray) after i resized and modified internal buffer
can you help me please?
seems like PyByteArray_Size is the same as len(), they both end up calling Py_SIZE
is there a fast way to set to len()?
to set the allocated size to len()?
PyByteArray_Resize should do that, but resizing presumably means copying the internal buffer, so that's slower than not resizing
i think PyByteArray_Resize should not copy internal buffer if requested size is close enough to actual size of internal buffer
i am calling PyByteArray_Resize to ensure that some amount of bytes can fit into buffer, i dont want to change len() at this moment
in some cases i want to change len() (without copying anything, new len() definitely can fit into internal buffer)
i am confused, i will think about what i want and ask questions later
by the way seems like you can get the allocated size (ob_alloc in the C struct) from the __alloc__ method in Python
```
>>> ba = bytearray(4)
>>> ba.__alloc__()
5
>>> len(ba)
4
>>> ba.pop()
0
>>> ba.__alloc__()
5
>>> len(ba)
3
```
obj->ob_alloc - size of internal buffer (obj.__alloc__())
obj->ob_size - len(obj)
right?
yes
i did not know this existed
I don't think it's documented, found it from reading the C code
i knew about it, i found it from looking at bytearray.__dict__
>>> help(bytearray.__alloc__)
Help on method_descriptor:
__alloc__(...)
B.__alloc__() -> int
Return the number of bytes actually allocated.
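A quick way to see the two sizes side by side from Python. This is a sketch that assumes CPython, since `__alloc__` is undocumented and the exact allocation sizes are an implementation detail; only the relationship between the two numbers is relied on:

```python
ba = bytearray(b"abcd")

allocated_before = ba.__alloc__()   # undocumented, CPython-specific
ba.pop()                            # len() shrinks by one
allocated_after = ba.__alloc__()

# The internal buffer is never smaller than the visible length.
print(len(ba), allocated_after >= len(ba))
```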
what is float.__getformat__?
i cannot call it, it either raises TypeError: __getformat__() argument must be str, not float if i pass float to it, or ValueError: __getformat__() argument 1 must be 'double' or 'float' if i pass str to it
'IEEE, little-endian'
That one exists for testing purposes I believe, which is why it's undocumented.
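The two error messages quoted above suggest the call shape: it is invoked on the type with a string naming the C type. A sketch, assuming CPython; the result depends on the platform, so no particular value is promised:

```python
# Accepts exactly 'double' or 'float' (the C types), not a float instance.
fmt = float.__getformat__("double")
print(fmt)

try:
    float.__getformat__("decimal")   # not one of the two accepted names
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```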
yes
!rule 6 7 -- This is offtopic and inappropriately here. Sorry
6. Do not post unapproved advertising.
7. Keep discussions relevant to the channel topic. Each channel's description tells you the topic.
!cban 895141929971486741 Spam
:incoming_envelope: :ok_hand: applied ban to @teal torrent permanently.
>>> help(property)
class property(object)
...
| deleter(...)
| Descriptor to obtain a copy of the property with a different deleter.
|
| getter(...)
| Descriptor to obtain a copy of the property with a different getter.
|
| setter(...)
| Descriptor to obtain a copy of the property with a different setter.
...
why are the docstrings saying that these methods are descriptors? they are just regular methods
because they were written by people, and people don't always write good docstrings.
interestingly, the commit that added them called them methods in the commit message, but not in the docstrings:
r58929 | guido.van.rossum | 2007-11-10 14:12:24 -0800 (Sat, 10 Nov 2007) | 3 lines
Issue 1416. Add getter, setter, deleter methods to properties that can be
used as decorators to create fully-populated properties.
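A short illustration that `getter`/`setter`/`deleter` behave as ordinary methods returning a new property object. The `Celsius` class and the lambdas are hypothetical examples:

```python
class Celsius:
    def __init__(self, degrees):
        self._degrees = degrees

    @property
    def degrees(self):
        return self._degrees

    # `degrees.setter` is a plain method call on the property object; it
    # returns a copy of the property with this function as its setter.
    @degrees.setter
    def degrees(self, value):
        self._degrees = float(value)

getter_only = property(lambda self: 42)
with_setter = getter_only.setter(lambda self, value: None)

is_same_object = with_setter is getter_only
shares_getter = with_setter.fget is getter_only.fget
print(is_same_object, shares_getter)
```

The original property is left untouched (`getter_only.fset` stays `None`), which is why the decorator form has to rebind the same name in the class body.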
!cban 1099566257948327946 Advertising and probable scam
:incoming_envelope: :ok_hand: applied ban to @rich zenith permanently.
https://peps.python.org/pep-0695/ next step, add mypy as built in module 
can anyone tell me where I can find links to classes teaching html through pycharm?
This channel isn't the place for this question
Long time Pythoneer Tim Peters succinctly channels the BDFL's guiding principles for Python's design into 20 aphorisms, only 19 of which have been written down.
where is the 20th aphorism?
it is not offtopic, because it is about pep20!
You'd have to ask Tim, I suppose
IIRC, the 20th was left for guido to define, which he never ended up doing
yet
let's-hear-it-for-lambda-in-curly-assignment-stmts-ly y'rs - tim
Not sure which board to post this on, so here goes:
There has been a lot of hype with the recent AI showpieces that have come out, but I believe the true future of AI is likely to be single-task applications using small, focused models that fit on small devices. One such application can have a significant impact on the performance of Python.
There are two kinds of environment in which Python operates: the "development environment" is served well by the JIT compiler, but the "production environment" is not. Python is incredibly slow compared to other languages, but with the advent of AI a python app could be deployed in binary.
Theoretically, AI could be used to translate every package in PyPI into machine language (or you can just compile it) and include the binary files in the packages. You can then run a python program in the command line using a prompt like "python -P app.py" to let the JIT compiler load the binary for packages, rather than compiling packages every time they are imported.
Alternatively, you could provide two separate compilers: one for development and casual use (JIT) and another which compiles the whole program, including the import tree, into binary for production. Having the binaries in the package files will be more efficient though.
Python doesn't compile packages every time they are imported (by default). In general, this would not work like you think, and there are already third-party packages that more-or-less do something like this (translate Python code into machine-code), but they have limitations.
The reason Python isn't compiled to machine code isn't because people aren't smart enough to write a compiler and we need an AI for it
If you're interested in compiling Python code, one such project is Cython, which has a comparison to competitors on its README: https://github.com/cython/cython#differences-to-other-python-compilers
It can't be hard to train a transformer with input that is python code and the output is the output from the compiler after it has compiled everything (but it could be expensive). If it is small enough to run quickly on a CPU then it changes the design prospects for future compilers
It can and is.
Machine learning tools are inherently "black box", you can't peek inside and be relatively sure that it will do the right thing
And you would expect a compiler to be relatively correct.
In CPython (or any hand-written compiler/interpreter for that matter), if there's a bug which is causing otherwise valid code to be rejected or compile incorrectly, then you can (eventually) trace down exactly where that bug is originating from in the human-readable source code, and patch it
If your source-to-executable model has a similar bug, then good luck reading the weightings to figure out why it's happening, let alone how to fix it
A compiler is one of the worst tasks for AI
You need your output to be semantically correct
and you better be able to justify to people that it is correct (as opposed to how a black box behaves) or else what you have is a hazard
And like @quick snow said, python files are not by default compiled each time an import is run. (The __pycache__ is built on first load)
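The caching can be observed with the standard library. A sketch, assuming default settings (no `PYTHONPYCACHEPREFIX`); `demo_mod.py` is a throwaway module name, and `py_compile` is used here to do explicitly what the first import does implicitly:

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "demo_mod.py"   # hypothetical module
    source.write_text("VALUE = 1\n")

    # Where the import system would look for cached bytecode.
    expected_cache = importlib.util.cache_from_source(str(source))

    # Compile once; later imports can reuse this file instead of recompiling.
    written = py_compile.compile(str(source))

    cache_exists = pathlib.Path(written).exists()
    print(pathlib.Path(written).parent.name, cache_exists)
```

The cached file holds bytecode for the interpreter, not machine code, which is the distinction being drawn in the replies above.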
(not to mention that machine code is highly platform-specific, so in practice you'd need to both train the model for each target)
i think it can generate some internal representation (which is (i hope) relatively simple compared to machine code)
then normal compiler can optimize it and compile to machine code
If the model loses track of its context at any point (which for large enough input is inevitable) you will have a miscompilation
anyways at that point you have a python parser
these techniques might have novel use in optimization passes, guaranteed that there is some proof of correctness
but it's a tough sell since "executing the output of a language model" is a known security risk
Yeah I don't see how "AI" is relevant to this problem
seem like PEP 703 - Making the Global Interpreter Lock Optional in CPython will not get accepted or not get progressed
from Sam Gross author of the PEP
Hi Gregory and the Steering Council,
Thanks for reviewing the PEP. The PEP was posted five months ago, and it has been 20 months since an end-to-end working implementation (that works with a large number of extensions) was discussed on python-dev. I appreciate everyone who has taken the time to review the PEP and offer comments and suggestions.
You wrote that the Steering Council's decision does not mean "no," but the steering council has not set a bar for acceptance, stated what evidence is actually needed, nor said when a final decision will be made. Given the expressed demand for PEP 703, it makes sense to me for the steering committee to develop a timeline for identifying the factors it may need to consider and for determining the steps that would be required for the change to happen smoothly.
Without these timelines and milestones in place, I would like to explain that the effect of the Steering Council's answer is a "no" in practice. I have been funded to work on this for the past few years with the milestone of submitting the PEP along with a comprehensive implementation to convince the Python community. Without specific concerns or a clear bar for acceptance, I (and my funding organization) will have to treat the current decision-in-limbo as a "no" and will be unable to pursue the PEP further.
https://github.com/python/steering-council/issues/188#issuecomment-1581534250
Please consider PEP 703 -- Making the Global Interpreter Lock Optional in CPython https://peps.python.org/pep-0703/ The PEP has been discussed in threads listed in its Post-History header The PEP w...
<@&831776746206265384> cross posting and rule 9 in #python-discussion earlier
Honestly, it's way too early to tell if this message spells the end of PEP 703.
For reference, here's what Sam Gross was responding to: https://github.com/python/steering-council/issues/188#issuecomment-1575106739
Given all the buzz generated by Sam Gross's nogil work, it's pretty clear that there is interest in offering GIL-free code execution for CPython, both from the developer side and community side.
The current steering council seems to have been discussing PEP 703 a lot (it's quite a long and substantial PEP) over the past months (see e.g. https://github.com/python/steering-council/blob/main/updates/2023-03-steering-council-update.md or the Discourse thread or mailing list).
Sam Gross's initial work at https://github.com/colesbury/nogil has been ongoing since late 2021, but only in January did he do a larger set of changes to the PEP and started a new repository at https://github.com/colesbury/nogil-3.12 for updating nogil to incorporate newer changes (e.g. immortal objects). Seems like other core devs want more time for discussion, and from what I can tell it seems like the steering council wants the same. Then again one could say that there has been enough time for discussion around nogil (there were language summits covering nogil in this and last year's PyConUS event, in addition to all the Discourse discussions around it, media references to nogil on e.g. YouTube, etc.).
Full-time work on nogil for such a long period of time must be pretty arduous (especially alone), so perhaps Sam is waiting for some kind of guarantee that PEP 703 will be accepted down the line, presumably soon.
Without specific concerns or a clear bar for acceptance, I (and my funding organization) will have to treat the current decision-in-limbo as a "no" and will be unable to pursue the PEP further
sounds like a big part of that message is the funding.
Not necessarily even that. It sounds like it's mostly just a lack of clarity. It sounds like funding wouldn't be an issue if the SC committed to move forward with the idea.
i mean it is reasonable that an organization would want to stop funding or he would want to stop working on it if it's going to go to waste
because that's a lot of time, work, and money
Without specific concerns or a clear bar for acceptance
It was pretty clear that this was the fundamental hangup. There's a "not no", but no path to a "Yes" given either.
To be honest this feels like both sides of the decision would have quite a lot of consequences, but also that there was perceived competition with the per interpreter gil.
This not only feels like almost each person on the council didn't want to be the one that possibly decided the wrong way and be linked to the consequences, but also, from the thread on discourse, that quite a few didn't actually really study the pep.
If this is the case (which I do not say it is, just that the optic of it is there) then that is actually worse for the structure as a whole.
Everyone can make easy or clear decisions, the council is needed for the hard ones, the dilemmas. By not making a decision this became the worst outcome and really gives off a bad vibe.
A No would be ok, a yes too, a missing decision can call the purpose of the body into question.
There is an even worse way of looking at it, but this would imply actual malice, which I don't see nearly enough clear evidence for to explicitly state it and therefore accuse people (clearly: I don't think there was any malice, but it is something that will infect the discussion if the whole situation is not addressed by the council)
Sorry for the long text and to somewhat also state my bias:
I am someone who was hoping this would be accepted and actually prioritized.
A yes would have made me happy.
A no would have made me sad and maybe frustrated.
A nothing is making me irrationally annoyed and angry.
It's not necessarily a nothing, it's just nothing yet. The core devs have all been very busy putting out fires for the 3.12 beta release for the last month or two at least. "It takes a long time to make decisions" is a necessary consequence of too few volunteers with too many responsibilities and too little time.
Historically, one of the biggest issues with attempts to remove the GIL is that people would go off and build some huge change set in total isolation without any feedback from the people who actually maintain the interpreter from day to day and year to year, and then when they get something that more or less works, they try to convince other people to take ownership of this huge thing that only they really understand. I'm not saying that's what's happening here, but clearly there's costs to moving forward with this PEP, and clearly there's value in taking the time to fully understand the approach, figure out the pros and cons, and carefully decide whether it's a good idea.
15,000 lines of changes to CPython code, plus an additional 15,000 lines of vendored third party code, is no small maintenance cost.
Though that's quite a lot smaller than some previous GIL removal attempts.
I need some use cases and a use case diagram. I need a system test design. I need object, class, and component diagrams. I need possible risks and a risk analysis. I need a persistent data management explanation: the description of data schemes, the selection of a database, and the description of the encapsulation of the database. I need those for my project. Is there anyone who could help me? If there is, please contact me by DM.
and you need to get tucked in to bed with a bedtime story
harsh
are there any peps that cite future features as reasons to implement things?
in the form "x will serve as a basis for y" where "y" is its own pep, I think so, but I can't remember any from memory.
There are certain PEPs which reference other at-the-time unimplemented PEPs, e.g. 684 and 554 (Per-interpreter GIL and subinterpreters in stdlib) reference one another, or 695 (new 3.12 typehinting) references 649 (different way of evaluating type annotations)
or the one which introduced __matmul__ (a @ b) specifically said "we're not gonna use this in the stdlib, it's for other libraries to make use of"
Oh subinterpreters yeah
Has there ever been discussion of implementing C-style ++ and -- operators as a shorthand for += 1 and -= 1?
not sure... but why do you want that?
this sounded odd to me when I first heard it, but it's true: Python doesn't need incrementing as much as other languages do.
+= already isn't guaranteed to be in place, and ++/-- wouldn't be either. what would the perceived benefit be?
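To make the "not guaranteed to be in place" point concrete, here is a small sketch (the variable names are made up for illustration): a list mutates in place under +=, while a tuple gets rebound to a new object, so any alias sees different results.

```python
# += on a list calls list.__iadd__, which mutates the existing object.
items = items_alias = [1, 2]
items += [3]
list_result = items_alias          # the alias sees the change

# Tuples have no __iadd__, so += falls back to __add__ and rebinds the name.
point = point_alias = (1, 2)
point += (3,)
tuple_result = point_alias         # the alias still refers to the old tuple

print(list_result, tuple_result, point)
```

A hypothetical ++ would presumably inherit the same split between mutating and rebinding, depending on the type.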
it's a good point that += is more general than adding to a number, but ++ doesn't translate as easily to other datatypes.
s = "Hello"; s++ # ????
And there's evaluation order as well
and also how many chars do you save? one
More to my point, if the motivation is that another language has it, it's probably important to make sure the person asking knows that both the current solution and what's proposed do (or would have to, in the case of the proposal) behave differently anyhow.
It was mostly used in C-like for loops, but they are vulnerable to off-by-one bugs.
That was in Swift as well, but was removed in 3.0.
I see, that makes sense
i think it can be applied to iterators to fetch next(it)
but difference between it++ and ++it is not clear in this case
also it-- is not possible, because iterators don't support backwards steps
I think C++ actually uses it for other purposes...
like advancing an iterator
indeed it does.
s++ is "ello"
in C, yes
++x and --x already exist in Python
somewhere is the hack of using __pos__ to add .5 to a number....
very handy
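For anyone wondering what "already exist" means: ++x parses as +(+x) and --x as -(-x), so they are just stacked unary operators and never modify x. The __pos__ hack mentioned above can be sketched like this (the Half class is made up for illustration, not something from the stdlib):

```python
x = 2
plus_plus = ++x      # parsed as +(+x): unary plus applied twice
minus_minus = --x    # parsed as -(-x): negation applied twice
# x itself is untouched; neither expression assigns anything.


class Half:
    """Hypothetical wrapper whose unary plus adds 0.5."""

    def __init__(self, value):
        self.value = value

    def __pos__(self):
        # Each unary + returns a new Half bumped by 0.5.
        return Half(self.value + 0.5)


bumped = ++Half(1)   # two __pos__ calls, so 1 + 0.5 + 0.5

print(plus_plus, minus_minus, x, bumped.value)
```

So "++obj adds one" is possible for your own types, but only as an expression that returns a new value, never as an in-place increment.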
in C++, you can't know what s++ is without knowing what type s is. You've assumed it's const char*, but it could be any type that defines an operator=(const char*)
I want to naively say "HelloHello" bc s+s=="HelloHello" but then you think about how ++ works for ints in other languages and my assumption feels weird
in python there isn't a logical meaning to s++
Now that would be cool, a different way to do next
x = 2
x++ # x is now 3
Yeah, I'm saying if we were to define it, that's what I think it should be for str naively, like without much thought
Yeah it doesn't
I'm coming late to this discussion, but I just want to say: 
y'all know how to regenerate the configure script in the Interpreter's source code?
These instructions cover how to get a working copy of the source code and a compiled version of the CPython interpreter (CPython is the version of Python available from https://www.python.org/). It...
ah, thanks
Is cpython using CMake?
why are enums implemented as inheritance and dataclasses with a decorator? what's the reasoning?
all enum classes have some common methods, so it is more reasonable to use inheritance
dataclasses generate new methods, so there is no way to inherit from something, because methods are generated at runtime (__init__ can have arbitrary number of arguments, __eq__ can check equality for arbitrary number of fields)
Dataclasses are more of an implementation detail than an actual parent, they just generate boilerplate for you, and as such they are designed to be specifically opt-in - if you inherit class Child from a dataclass-decorated class Parent, Child will not be a dataclass itself unless you decorate it explicitly.
Enums, on the other hand, are generally way easier to spot than dataclasses, very rarely inherited from, and also have a fundamentally different use case compared to regular classes, so the effects on inheritance will come up less and be more predictable
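A small sketch of the opt-in behaviour described above (Parent, Child and DecoratedChild are illustrative names). The undecorated subclass inherits the generated methods, but its own annotations are not turned into fields:

```python
from dataclasses import dataclass, fields


@dataclass
class Parent:
    x: int


class Child(Parent):
    # Not processed by @dataclass: this is only a class attribute.
    y: int = 0


@dataclass
class DecoratedChild(Parent):
    # Processed: y becomes a real field and __init__ is regenerated.
    y: int = 0


child_fields = [f.name for f in fields(Child)]
decorated_fields = [f.name for f in fields(DecoratedChild)]

try:
    Child(1, 2)              # inherited __init__ only accepts x
    child_accepts_y = True
except TypeError:
    child_accepts_y = False

print(child_fields, decorated_fields, child_accepts_y)
```

Note that dataclasses.is_dataclass(Child) still reports True, because the check only looks for the inherited field table; what Child lacks is processing of its own annotations.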
Is there any footgun regarding inheriting from an enum? I use a base enum that only changes the way _missing_ is handled to a way I personally like more. There have been no problems with having almost all my enums inherit from that base enum, BUT should I be on the lookout for trouble?
Why is there no float.is_nan and float.is_inf?
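For reference, the pattern being asked about might look like the sketch below. LenientEnum and its case-insensitive lookup are assumptions made up for illustration, not the poster's actual code. One restriction worth knowing: a base enum can only be subclassed while it has no members.

```python
from enum import Enum


class LenientEnum(Enum):
    """Hypothetical member-less base that customises failed lookups."""

    @classmethod
    def _missing_(cls, value):
        # Called only when the normal value lookup fails.
        if isinstance(value, str):
            for member in cls:
                if isinstance(member.value, str) and member.value.lower() == value.lower():
                    return member
        return None  # returning None makes Enum raise ValueError


class Color(LenientEnum):
    RED = "red"
    GREEN = "green"


found = Color("RED")         # falls through to _missing_

try:
    class MoreColors(Color):  # Color has members, so this is rejected
        BLUE = "blue"
    extend_error = None
except TypeError as exc:
    extend_error = exc

print(found, type(extend_error).__name__)
```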
I think it is natural to put these functions inside float class instead of math module.
(There is one downside: it would require int.is_nan and complex.is_nan to exist)
Also, float('inf') to construct a special value seems ugly to me. Why are there no float.nan and float.inf classvars?
there's math.nan and math.inf
in fact there can be many different nans iirc
I thought cpython creates only one "static" nan object and converts any c-level nans to that canonical nan object
There are only two infinities, right?
I think they also can be allocated only once
Nope, many different nans can exist.
!e
import struct
nan1 = float("nan")
nan2, = struct.unpack("f", b"\x12\x00\xc0\x7f")
print(nan1, nan2, nan1 is nan2)
print(struct.pack("f",nan1), struct.pack("f", nan2))
@quick snow :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | nan nan False
002 | b'\x00\x00\xc0\x7f' b'\x12\x00\xc0\x7f'
That feels wrong, but ok
nan doesn't mean infinity. It means, "I don't know what this is, but I know it's not a number".
Yes, i know. But nans and infs are similar because they are special-cased in float('inf')/float('nan') and math.isinf/math.isnan
And they have no literals
i guess i didn't understand what you meant by "that feels wrong"
I think it would be better to have only one "canonical" nan object, so you can do checks like x is float.nan instead of math.isnan(x). And in this case we would be able to rely on the fact that float('nan') (or float('inf')) always returns the same object
the floating-point standard says there can be different NaNs
And it would be more space and time efficient because there is no more need to create new nans (which can happen a lot of times if there is some nan in the beginning of your calculation)
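As a quick sketch of how NaN checks behave today (variable names are illustrative): NaN never compares equal to anything, itself included, so the dependable checks are math.isnan or the x != x idiom. Whether two NaNs happen to be the same object is an implementation detail and is deliberately not relied on here.

```python
import math

a = float("nan")
b = math.nan
c = a + 1.0                   # arithmetic on NaN yields NaN

equal_to_itself = (a == a)    # False: NaN is unequal to everything
self_unequal = (a != a)       # True: import-free NaN test
all_detected = all(math.isnan(v) for v in (a, b, c))

print(equal_to_itself, self_unequal, all_detected)
```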
!e print(sum(range(10), start=float('nan')))
@dusk comet :white_check_mark: Your 3.11 eval job has completed with return code 0.
nan
There's no need for float.__add__ (or the C equivalent) to return a new nan object here
Objects/floatobject.c lines 594 to 602
```c
static PyObject *
float_add(PyObject *v, PyObject *w)
{
    double a,b;
    CONVERT_TO_DOUBLE(v, a);
    CONVERT_TO_DOUBLE(w, b);
    a = a + b;
    return PyFloat_FromDouble(a);
}
```
I think it creates new object every time
It might, but it wouldn't need to.
I see a potential benefit for one pre-allocated nan, like for small ints (although of course it'd need a benchmark). But that doesn't require enforcing singletonness everywhere
it does
It does do, or it does need to?
it does create a new object every time
if your code is too slow because it's creating too many NaNs, you have other things to worry about....
adding a branch just for the NaN (and inf?) special casings wouldn't really make sense
every floating point add would incur a huge performance penalty for what
saving an allocation on a select few cases
NaN was originally a signaling option. The bit pattern of NaNs is semantically important in some cases. Keep in mind that FPUs (and now GPUs) can often do floating point math in specialized ways, and rather than error out, you get back a bunch of operations at once, and then handle whatever wasn't valid for whatever reason.
I didn't know about that, thank you.
Just reported my first bug to python/cpython. Please take a look and see if u can reproduce it.
https://github.com/python/cpython/issues/105829
Works fine on my machine:
Traceback (most recent call last):
File "/home/myusername/scratch/deadlock.py", line 56, in <module>
main()
File "/home/myusername/scratch/deadlock.py", line 23, in main
assert False # Never reached
^^^^^^^^^^^^
AssertionError
(Linux 5.19.0-41, Ubuntu 22.04, a Lenovo T14 (x86_64, 32 GB RAM, 16 processors), Python 3.12.0a3)
I also tested it on 3.11 and 3.10, same behaviour on both.
Thanks. Did you try removing the prints or increasing the tweaking parameters in the script?
If I turn off DO_PRINT it does indeed appear to hang.
Does anyone here happen to know how to raise an error in cpython? I found PyErr_SetString(PyExc_ValueError, "the text of the value error"); but that doesn't seem to actually raise it.
that works, but you also have to explicitly return NULL afterwards
(or sometimes something else, depending on where you are)
Will try adding return NULL, thanks
That worked, thank you
For context, I finally decided to start working on https://github.com/python/cpython/issues/102450
is that a leap second
Nope, that would be 23:59:60
It's only in the fromisoformat calls (not constructors), and it converts to 00:00 so .hour is still 0..23
hi, could someone perhaps help me comprehend this portion of the grammar?
```
simple_stmts:
    | simple_stmt !';' NEWLINE  # Not needed, there for speedup
    | ';'.simple_stmt+ [';'] NEWLINE
```
(https://docs.python.org/3/reference/grammar.html)
what i understand is:
- either a single simple stmt, followed by a newline
- one or more simple stmts, separated by semis, with an optional trailing semi, followed by a newline
effectively, that ends up being equivalent to the latter (which i presume is what the comment means?)
does that sound right?
also, i don't understand why the !';' is in the one at the top. is that part of the performance optimization?
The top line there prevents the parser from looking around too much in the case of a simple statement followed by a new line. (which is much more common due to commonly accepted style conventions)
If you're more familiar with regex than formal grammar, it's a similar trick people use with alternating patterns to skip a version of the pattern that requires backtracking if the more common case can be captured without it.
got it, thanks, i see why that's used. so, then, would it be right to look at that as the following, if the optimizations are ignored?
simple_stmts: simple_stmt (';' simple_stmt)* ';'? NEWLINE
At a glance, that looks as if you can consider it equivalent outside of nudging the parser to better performance, but I'm hesitant to give an affirmative "yes" without viewing everywhere simple_stmts is used as well in case there's any additional subtlety here.
I'm mostly sure it's fine to consider that equivalent.
(I'm not sure if there's a reason the second line is constructed the way it is rather than how you have constructed it which isn't obvious in some way)
that's good enough for me haha. thanks!
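If you want to poke at the rule empirically, a small sketch using the stdlib ast module (the helper names are made up here) shows the behaviour the folded form predicts: one or more statements, single semicolons between them, and an optional trailing semicolon.

```python
import ast


def count_statements(source: str) -> int:
    """Number of top-level statements the parser sees in source."""
    return len(ast.parse(source).body)


def parses(source: str) -> bool:
    """True if source is accepted by the parser."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


single = count_statements("a = 1\n")
several = count_statements("a = 1; b = 2; c = 3\n")
trailing = count_statements("a = 1; b = 2;\n")   # trailing ';' allowed
lone_semicolon = parses(";\n")                   # needs at least one stmt
double_semicolon = parses("a = 1;; b = 2\n")     # empty slot between ';'

print(single, several, trailing, lone_semicolon, double_semicolon)
```

This only checks which inputs are accepted; it says nothing about why the grammar is written as two alternatives.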
i have a feeling that's just how the writers of the grammar designed that syntax. for example, lists are written
list:
| '[' [star_named_expressions] ']'
star_named_expressions: ','.star_named_expression+ [',']
no idea if it's a performance thing or what but that seems to be a fairly common pattern
probably, but I'd still want to either test or inspect some other ways things are defined that I don't remember from memory in case this is somehow important before giving an affirmative yes on. I am at least sure that conceptually you can fold the two lines of simple_stmts together outside of performance concerns. As for the other part, it's entirely possible it's just stylistic choice here.
makes sense. thanks again!
hi, could i get a little sanity check on my understanding of how one might implement semantic indentation while lexing python? i've summarized an implementation of the rules as i understand them and was wondering if anyone would be willing to look it over. the summary is here:
The lexer handles them by keeping a stack of Indentations, each a pair of the count of tabs and spaces on a given indentation level. Determining what the whitespace at the start of a line represents is done by the following rules:
- If either the space or tab count increases, and the other increases or remains the same, it is an indent, and a new level is pushed onto the indentation stack.
- If either the space or tab count decreases, and the other decreases or remains the same, it is a dedent. All indentations greater than the current are popped off the stack, and if the new indentation level is not the same as any of the ones currently on the stack, an error is emitted.
- If the space and tab counts change in different directions (that is, either the space count increases and the tab count decreases, or the space count decreases and the tab count increases), an error is emitted, as per the following clause in the reference:
Indentation is rejected as inconsistent if a source file mixes tabs and spaces in a way that makes the meaning dependent on the worth of a tab in spaces; a TabError is raised in that case.
- If the space and tab counts both remain the same, then no indents or dedents are emitted.
There are a few exceptions to the indentation rules. If a line is empty, only whitespace, only a comment, or a mixture thereof, then it does not contribute to the addition or removal of a level to the indentation stack.
the rules are officially documented at https://docs.python.org/3.13/reference/lexical_analysis.html#indentation. i had a fair amount of trouble fully comprehending what that section all meant, and this is what i've understood of it, but i'm not exactly confident in its accuracy. if anything there is not clear, i can try to explain.
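One way to sanity-check a summary like this against CPython itself is to watch what the real tokenizer and compiler do. The sketch below uses the stdlib tokenize module and compile(); the helper names and sample sources are made up for illustration.

```python
import io
import tokenize


def indent_events(source: str) -> list[str]:
    """Names of the INDENT/DEDENT tokens produced for source, in order."""
    events = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            events.append(tokenize.tok_name[tok.type])
    return events


def compile_error(source: str):
    """Name of the SyntaxError subclass raised by compile(), or None."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError as exc:
        return type(exc).__name__
    return None


# The blank line and the over-indented comment line do not affect the stack.
VALID = "def a():\n    b = 2\n\n        # comment\n    c = 3\nd = 4\n"
# Dedent to a level that was never pushed onto the stack.
BAD_DEDENT = "if True:\n        x = 1\n    y = 2\n"
# A tab on one line, eight spaces on the next: depends on tab width.
MIXED = "if True:\n\tx = 1\n        y = 2\n"

print(indent_events(VALID))
print(compile_error(VALID), compile_error(BAD_DEDENT), compile_error(MIXED))
```

The same helpers can be pointed at other edge cases (form feeds, backslash continuations) to see how the reference implementation treats them.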
i think it's accurate
although there are more special cases
- if a form feed (\014/\f) is encountered, the indents reset as it is considered a newline(?)
- if a backslash (\) is encountered in the middle of indenting, the indentation remains the same as the indentation before the first backslash in a group of continued lines
i just realized i forgot to copy in the backslash-at-eol and open-delimiter-pair rules too
could you give an example of the last? I'm a little confused there
but thanks for looking over it! that's very helpful
def a():
b = 2
\
c = 2 # the indents on this line are ignored
\
\ # so are the indents on this line
\ # and this line
c = 3 # until this line
oh right that's just the backslash-at-eol rule