#internals-and-peps
how do you keep track of what classes and hooks are made?
i don't keep track as of rn
I've been trying to weakref.finalize hooked methods but that doesn't seem to work well with non-weakrefables like property
maybe I'll switch to finalizing on the type instead
all types are weakref-able (I think?)
why do you want to keep track?
restoring the original attribute on finalize I guess 
finalize isn't passed the class tho
thought about atexit but I think the type might not exist by then?
atexit and finalize both get called at the same place for static types it seems
this was hit by weakref.finalize(int, hit_bp);exit()
I attach it to the method and it gets the type as *args
```pycon
>>> weakref.finalize(A(), lambda *A:print(A))
()
<finalize object at 0x105f6a8c0; dead>
>>>
```
wdym?
like here the weakref.finalize is on the __hash__ object, and it's passed a stored strong ref to int
```py
@impl(int, detach=True)
def __hash__(self):
    print("in hash", self)
    return orig(int).__hash__(self)
```
finalize doesn't pass any arguments as far as i can tell
you can't store the object itself though, since that will make it never be GC'd
seemed to work here
or actually weakref.finalize handlers that are not called by destructors are just called at exit
so cyclical ones will be called at interpreter teardown
Can anyone point me to lexers for python source code?
I've seen a few, but they're few and far between
https://github.com/gvanrossum/ctok what about this one?
Wonderful!
is it possible to subclass ctypes.Structure without defining _fields_
Hi
I just want to provide some common mixin methods using a base Structure class
!e i think you can
```py
from ctypes import *

class A(Structure):
    def method(self):
        print(self)

class B(A):
    _fields_ = [('ob_refcount', c_ssize_t)]

b = B.from_address(id(1))
print(b.ob_refcount)
b.method()
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 1000000084
002 | <__main__.B object at 0x7f4c952f4710>
huh... 
I thought it said _fields_ must be defined before the class is subclassed
¯\_(ツ)_/¯
!e
```py
from ctypes import *

class Struct(Structure):
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._fields_ = [(k, v) for k, v in cls.__annotations__.items()]

class Foo(Struct):
    ob_refcnt: c_ssize_t
    ob_type: py_object
```
@warm breach :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 8, in <module>
003 | File "<string>", line 6, in __init_subclass__
004 | TypeError: ctypes state is not initialized
__init_subclass__ can't assign to _fields_ apparently though, weird
weird
it is because the STGdict is not initialized
is it possible to delay init_subclass to happen after that
when does init_subclass happen anyways?
I thought it was after the class was defined, as if you had a decorator
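On when `__init_subclass__` runs: this sketch (all names hypothetical) shows it firing inside `type.__new__`, before the metaclass `__init__` and before any class decorator. Assuming ctypes' metaclass only finishes its StgDict setup after `type.__new__` returns, that would explain the "ctypes state is not initialized" error above, and why a metaclass `__init__` sees a fully built class.

```py
order = []


class Meta(type):
    def __new__(mcls, name, bases, namespace, **kwargs):
        order.append(f"Meta.__new__ start ({name})")
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)  # __init_subclass__ runs in here
        order.append(f"Meta.__new__ end ({name})")
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        order.append(f"Meta.__init__ ({name})")
        super().__init__(name, bases, namespace, **kwargs)


class Base(metaclass=Meta):
    def __init_subclass__(cls, **kwargs):
        order.append(f"__init_subclass__ ({cls.__name__})")
        super().__init_subclass__(**kwargs)


def decorate(cls):
    order.append(f"decorator ({cls.__name__})")
    return cls


order.clear()  # only record what happens while Child is created


@decorate
class Child(Base):
    pass


print(*order, sep="\n")
```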
!e
```py
from ctypes import *

class Cmeta(type(Structure)):
    def __init__(self, name, bases, mapping, **kwargs):
        super().__init__(name, bases, mapping, **kwargs)
        self._fields_ = list(mapping.get('__annotations__', {}).items())

class Struct(Structure, metaclass=Cmeta):
    # methods here
    pass

class PyObject(Struct):
    ob_refcount: c_ssize_t
    ob_type: py_object

print(sizeof(PyObject), PyObject._fields_)
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
16 [('ob_refcount', <class 'ctypes.c_long'>), ('ob_type', <class 'ctypes.py_object'>)]
I got it to work
o.O
Gotta love metaclasses
so when does Cmeta.__init__ get called here
When subclasses are initialized
Modules/_ctypes/stgdict.c line 427
```c
if (!stgdict) {
```
still don't understand how this is null in __init_subclass__ though
seems like a bug
idk, ctypes has a lot of hackery in its use of STGdict
the Cmeta trick should work for einspect tho @warm breach
yeah thanks for that, I'll probably switch over
(you can technically do class PyObject(Structure, metaclass=Cmeta) instead of having the interim class)
having to do the decorator plus ctypes.Structure plus mixins
@struct
class Foo(Structure, AsRef, Display)
was quite annoying
yea I can imagine
you should be able to define multiple utility classes ex: class AsRef(Struct) to inherit from and they should compose correctly.
you should also test what happens if _fields_ is already set, cause I did not
can i use chat gpt to help me
if you don't care if the answers are correct, yes.
so much easier to override __setattr__ for every struct now, finally have my NULL singleton working everywhere
```py
from einspect import view, NULL
v = view(int)
v.tp_as_number[0].nb_power = NULL
print(3 ** 85)
>> TypeError: unsupported operand type(s) for ** or pow(): 'int' and 'int'
```
got NULL comparisons working as well
```py
from einspect import view, NULL
n = view(int).tp_as_number.contents
print(n.nb_add == NULL)
# False
print(n.nb_matrix_multiply == NULL)
# True
```
does `is` work or no?
well no but you wouldn't do that in C either right
== NULL just checks if a pointer is null, not that the pointer address has to be the same
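The `== NULL` semantics described here can be sketched with a sentinel whose `__eq__` treats any null ctypes pointer as equal. This is a hypothetical stand-in, not einspect's actual NULL; it relies on ctypes pointers, `c_void_p` and function pointers being falsy when null, and on ctypes objects not defining their own `__eq__`, so `ptr == NULL` falls back to the sentinel's reflected `__eq__`.

```py
import ctypes


class NullType:
    """Hypothetical NULL sentinel: equal to any null ctypes pointer."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance  # always the same singleton

    def __eq__(self, other):
        if other is self:
            return True
        if isinstance(other, (ctypes._Pointer, ctypes.c_void_p, ctypes._CFuncPtr)):
            return not other  # null pointers are falsy
        return NotImplemented

    __hash__ = object.__hash__  # defining __eq__ would otherwise disable hashing

    def __repr__(self):
        return "<NULL>"


NULL = NullType()

null_ptr = ctypes.POINTER(ctypes.c_int)()
live_ptr = ctypes.pointer(ctypes.c_int(42))
print(null_ptr == NULL, live_ptr == NULL)  # True False
```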
You could just make it return your NULL singleton if the pointer is null
Then `is` would work
how would I do that with Structures though
I guess I could override __getattr__ to detect returned LP_PyObject pointers and replace them
but the types like ctypes Arrays of PyObject pointers I wouldn't be able to change iirc
like arr here comes from cast(<ob_item_0_ptr>, POINTER(PyObject) * 2)
```py
from einspect import view
t = (1, 2)
arr = view(t).item
```
idk it just feels like == None for me
since it's just a ctypes.Array I'm not sure how I'd override what it returns
!e
```py
from ctypes import *

base_size = sizeof(c_ssize_t)

def getclsdict(cls):
    d = cls.__dict__  # hold reference due to cls.__dict__ being a getter in static classes
    if isinstance(d, dict):
        return d
    return py_object.from_address(id(d) + 2 * base_size).value

# create a Null sentinel type to shim NULL values
Null = type('Null', (), {'__repr__': lambda self: '<NULL>'})()

GETFUNC = PYFUNCTYPE(py_object, c_void_p, c_ssize_t)

class StgDictObject(Structure):
    _fields_ = [
        ('-', c_ubyte * (
            dict.__sizeof__({}) +
            sizeof(c_ssize_t * 7) +
            sizeof(c_ushort * 2)
        )),
        ('getfunc', GETFUNC)
    ]

def get_stg_dict(cls):
    return StgDictObject.from_address(id(getclsdict(cls)))

p_stg = get_stg_dict(py_object)
orig_getfunc = p_stg.getfunc

@GETFUNC
def getfunc(ptr, size):
    if c_void_p.from_address(ptr).value:
        return orig_getfunc(ptr, size)
    return Null

p_stg.getfunc = getfunc
print(py_object().value is Null)
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
True
huh
@warm breach that's an example of changing the py_object getfunc
(you can also make your own subclasses of _SimpleCData and inject get and set funcs)
but like... that wouldn't change a POINTER(PyObject) field?
the same thing should work with any subclass of _SimpleCData (with some edits)
can I override what the POINTER type does in general?
I am checking rn
!e
```py
from ctypes import *
from einspect.structs import PyObject
x = POINTER(PyObject)
print(type(x))
print(type(x).__mro__)
```
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | <class '_ctypes.PyCPointerType'>
002 | (<class '_ctypes.PyCPointerType'>, <class 'type'>, <class 'object'>)
hm
https://docs.python.org/3/library/dis.html#opcode-PUSH_NULL pushes a C NULL to the stack
part of the method caching in 3.11 iirc
something which is not a python object but is in the stack hmm
does NULL point to a python object
well they're all PyObject pointers
NULL will be interpreted as a NULL PyObject pointer if it is casted to one
but they probably check it for NULL and do something with it
a *PyObject can be a null pointer, yeah
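On the ctypes side, a null `PyObject *` is just a pointer instance whose address is 0. A minimal sketch using `POINTER(c_int)` in place of einspect's `PyObject` pointer: null pointers are falsy, and `.contents` raises `ValueError` instead of dereferencing, which is the kind of safety check that raw C code paths skip.

```py
from ctypes import POINTER, c_int, c_void_p, cast, pointer

null_ptr = POINTER(c_int)()  # calling a pointer type with no args gives NULL
live_ptr = pointer(c_int(7))

print(bool(null_ptr), bool(live_ptr))  # False True
print(cast(null_ptr, c_void_p).value)  # None: address 0 is reported as None

try:
    null_ptr.contents
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # NULL pointer access

if live_ptr:  # check before dereferencing
    print(live_ptr.contents.value)  # 7
```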
!e
```py
from einspect import NULL
from einspect.structs import *
t = PyTupleObject(
    ob_refcnt=1,
    ob_type=PyTypeObject(tuple).as_ref(),
    ob_size=3,
    ob_item=[NULL] * 3
).into_object()
print(t)
```
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
(<NULL>, <NULL>, <NULL>)
python builtin collections are able to show reprs of NULL PyObject pointers somehow
but if you try to access those indices it will segfault
strange that it can print NULL
!e
```py
from einspect import NULL
print(NULL)
print(id(NULL))
```
@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | <NULL ptr[PyObject] at 0x7fc6b2a861d0>
002 | 140491379206784
it has handling for it for some reason
why don't the ids match
```pycon
0x7fc6b2a861d0
140491377631696
```
that's just a singleton instance of a null POINTER(PyObject) I have, not really a null object
NULL pointer is not 0 smh
the repr is showing ctypes.addressof
ok
!e
```py
from ctypes import addressof
from einspect import NULL
print(NULL)
print(hex(addressof(NULL)))
```
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | <NULL ptr[PyObject] at 0x7f34fee562f0>
002 | 0x7f34fee562f0
!e
```py
from einspect import NULL
from einspect.structs import *
t = PyTupleObject(
    ob_refcnt=1,
    ob_type=PyTypeObject(tuple).as_ref(),
    ob_size=1,
    ob_item=[NULL]
).into_object()
print(t)
```
@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.
(<NULL>,)
!e
```py
from einspect import NULL
from einspect.structs import *
t = PyTupleObject(
    ob_refcnt=1,
    ob_type=PyTypeObject(tuple).as_ref(),
    ob_size=1,
    ob_item=[NULL]
).into_object()
print(t)
print(t[0])
```
@gray galleon :x: Your 3.11 eval job has completed with return code 139 (SIGSEGV).
(<NULL>,)
well before you print it, t[0] tries to convert the pointer into a python object for you
the repr happens in C so they can handle NULLs
!e
```py
from einspect import NULL
from einspect.structs import *
t = PyTupleObject(
    ob_refcnt=1,
    ob_type=PyTypeObject(tuple).as_ref(),
    ob_size=1,
    ob_item=[NULL]
).into_object()
null = t[0]
print(dir(null))
```
@gray galleon :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).
[No output]
!e
```py
# python is doing this essentially (but without the safety check, hence segfault)
from einspect import NULL
print(NULL.contents.into_object())
```
@warm breach :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 3, in <module>
003 | ValueError: NULL pointer access
yeah
the t[0] is the culprit
thx
now i know how to create a tuple that breaks when indexed
also it's impressive that python can be so unsafe
at the point where you're pulling in a C FFI, you're not really writing Python anymore.
well, python you write in CPython should be safe, the C which CPython is written in is naturally not safe
just rewrite python in rust then
it'd be pretty much the same thing but slower
"safe" calls in rust aren't free speed-wise
also making python in rust without unsafe calls would probably be close to impossible
how
You can even make it without using the CFFI
how
ctypes?
that is an ffi
!e
```py
import gc

class magic:
    def __length_hint__(self):
        return 1

    def __iter__(self):
        for obj in gc.get_objects():
            if isinstance(obj, tuple):
                try: 0 in obj
                except SystemError:
                    yield obj
                    break

weird = tuple(magic())
print(weird[0] is weird, weird)
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
True ((...),)
sure. I guess another option is directly creating a code object
@gray galleon that abuses the gc to grab a tuple as it is being created. You can change __length_hint__ to return larger values and it will leak NULLs
does it break python?
looks impressive tho
It makes a tuple that contains itself
hm that seems like a fixable bug. I guess it should call __length_hint__ before creating the tuple
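For context, `__length_hint__` is only a preallocation hint (it is what `operator.length_hint` reports): `tuple()` sizes its buffer from it, then shrinks or grows the tuple to whatever the iterator actually produced. The window between "allocated" and "filled" is what the gc trick above exploits. A harmless sketch with a hypothetical `Hinted` class:

```py
import operator


class Hinted:
    """Iterable that deliberately over-reports its length."""

    def __length_hint__(self):
        return 10

    def __iter__(self):
        yield from (1, 2, 3)


print(operator.length_hint(Hinted()))  # 10: the hint, not the real length
print(tuple(Hinted()))                 # (1, 2, 3): resized after iteration
```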
Hash it
```pycon
>>> weird = tuple(magic())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: D:\_w\1\s\Objects\tupleobject.c:927: bad argument to internal function
```
um
ik
that doesn't cause python to crash when indexed
Hash it
!e
```py
import gc

class magic:
    def __length_hint__(self):
        return 10

    def __iter__(self):
        for obj in gc.get_objects():
            if isinstance(obj, tuple):
                try: 0 in obj
                except SystemError:
                    yield obj
                    break

weird = tuple(magic())
print(weird)
```
@gray galleon :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 15, in <module>
003 | SystemError: Objects/tupleobject.c:927: bad argument to internal function
lmao
it fails one of these checks https://github.com/python/cpython/blob/3.11/Objects/tupleobject.c#L923-L929
Objects/tupleobject.c lines 923 to 929
```c
if (v == NULL || !Py_IS_TYPE(v, &PyTuple_Type) ||
    (Py_SIZE(v) != 0 && Py_REFCNT(v) != 1)) {
    *pv = 0;
    Py_XDECREF(v);
    PyErr_BadInternalCall();
    return -1;
}
```
!e
```py
import gc

class magic:
    def __length_hint__(self):
        return 1

    def __iter__(self):
        for obj in gc.get_objects():
            if isinstance(obj, tuple):
                try: 0 in obj
                except SystemError:
                    yield obj
                    break

weird = tuple(magic())
hash(weird)
```
@pliant tusk :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).
[No output]
or passes, actually
worked on my 3.11.1 though
not on my 3.11.0
o
hash bug?
It fails those when you change the length hint. Just leak the tuple by setting it as a global inside __iter__
you can't really hash self-referencing objects
or at least python collections don't prepare for that
wdym?
is there a non memory-patch way of making custom get set funcs?
```py
from ctypes import *
from einspect.structs import PyObject
PyCPointerType = type(POINTER(c_void_p))
class LP_PyObject(PyCPointerType):
    _type_ = PyObject
```
not as far as I know
like can I customize how LP_PyObject gets converted when it's a Structure member
I think you need to use memory patching
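One documented, non-patching partial workaround: the ctypes docs state that subclasses of fundamental data types do not inherit the automatic conversion to native Python types, so a `c_void_p` subclass used as a structure field or array item comes back as an instance you can add methods to. This does not change how `POINTER(...)` fields behave (those already come back as pointer instances), and `NullableVoidP`, `Plain` and `Wrapped` are hypothetical names.

```py
from ctypes import Structure, c_void_p


class NullableVoidP(c_void_p):
    """Hypothetical subclass: ctypes will not auto-unwrap it to int/None."""

    def is_null(self):
        return not self

    def __repr__(self):
        return "<NULL>" if self.is_null() else f"<ptr {self.value:#x}>"


class Plain(Structure):
    _fields_ = [("p", c_void_p)]


class Wrapped(Structure):
    _fields_ = [("p", NullableVoidP)]


print(Plain().p)    # None: a plain c_void_p field is converted automatically
print(Wrapped().p)  # <NULL>: the subclass instance is returned as-is
```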
!e
```py
import gc

class magic:
    def __length_hint__(self):
        return 1

    def __iter__(self):
        global weird
        for obj in gc.get_objects():
            if isinstance(obj, tuple):
                try: 0 in obj
                except SystemError:
                    weird = obj
                    return
        yield

try: tuple(magic())
except: pass
print(weird)
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
(<NULL>,)
ok
wtf 🥴
so uh - has someone reported that bug? The tuple probably shouldn't be getting tracked by the GC (and thus discoverable through gc.get_objects()) until after it's in a valid state
does gc.get_objects() get all living instances in python?
!e
```py
import gc
print(len(gc.get_objects()))
```
@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.
4026
big
It's been known for a while afaik
don't all tuples start GC-tracked before they're released?
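On the two questions above, a short check (assuming CPython): `gc.get_objects()` returns only the objects the cycle collector currently tracks, not every live object, and containers such as tuples are tracked from the moment they are created, which is why a half-built tuple can show up there.

```py
import gc

fresh = ([],)  # a tuple holding a list: tracked from the moment it exists
print(gc.is_tracked(fresh))                  # True
print(gc.is_tracked([]))                     # True: containers are tracked
print(gc.is_tracked(0), gc.is_tracked("a"))  # False False: atomic objects never are

marker = "an untracked string"
# Not every live object is listed: strings are never GC-tracked.
print(any(obj is marker for obj in gc.get_objects()))  # False
```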
!e
```py
from gc import get_objects
print(len(get_objects()))
```
@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.
4027
how does the from..import version have more instances than import version
probably - that seems like a bug, though
isn't the gc module unreachable?
I first posted code with that bug in 2021
wdym
this
@warm breach
you imported get_objects, that's another gc tracked reference
sorry i pinged the wrong person
but should gc be unreachable after get_objects was imported
no, it's still in sys.modules
@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.
4035
There are many objects that classes are made up of
`__name__`, `__qualname__`, `__module__`: str
`__bases__`, `__orig_bases__`: tuple
`__subclasses__()`: list
`__mro__`: tuple
`__annotations__`, `__dict__`: dict
`__flags__`, `__dictoffset__`, `__itemsize__`, ...: int
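A quick way to check the kinds of objects listed above (the `Example` class is hypothetical). Note that `__dict__` on a class hands back a read-only `mappingproxy` view over the underlying dict.

```py
class Example:
    x: int = 0


checks = {
    "__name__": type(Example.__name__),
    "__qualname__": type(Example.__qualname__),
    "__module__": type(Example.__module__),
    "__bases__": type(Example.__bases__),
    "__mro__": type(Example.__mro__),
    "__subclasses__()": type(Example.__subclasses__()),
    "__annotations__": type(Example.__annotations__),
    "__dict__": type(Example.__dict__),
    "__flags__": type(Example.__flags__),
    "__dictoffset__": type(Example.__dictoffset__),
    "__itemsize__": type(Example.__itemsize__),
}
for attr, kind in checks.items():
    print(f"{attr}: {kind.__name__}")
```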
Yes, it is possible to make custom get and set functions without using memory patching. One way to do this is to subclass the desired data type, such as PyCPointerType in your example, and define custom get and set methods as class methods or properties. Then, you can use instances of the subclass to perform operations with custom behavior.
I don't think ctypes even recognizes the subclass as a ctypes type though
trying to make it a Structure field will have an error that it's not a ctypes type
Yes, you are correct. Subclassing a ctypes data type does not automatically make the subclass a recognized ctypes type. To use your subclass as a field in a ctypes Structure, you need to register the subclass as a ctypes data type using the ctypes.POINTER function.
Here's an example:
```py
from ctypes import *

class LP_PyObject(c_void_p):
    pass

LP_PyObject_p = POINTER(LP_PyObject)

class MyStructure(Structure):
    _fields_ = [("obj", LP_PyObject_p)]
```
I mean, the whole point was having a custom POINTER type that I can override from_param on null values
you sound an awful lot like ChatGPT
I believe @warm breach was referring to the automatic unwrapping that basic ctypes types have
I think just overriding my structure __getattr__ is probably the least cursed way to do it though, then I could have this work
```py
from einspect import view, NULL
n = view(int).tp_as_number.contents
print(n.nb_add is NULL)
# False
print(n.nb_matrix_multiply is NULL)
# True
```
that would work
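A minimal sketch of the `__getattribute__` route, using a plain sentinel in place of einspect's NULL (`NullAwareStructure`, `Node` and `NULL` here are hypothetical). One detail matters: ctypes fields are descriptors on the class, so `__getattr__`, which only runs for missing attributes, never sees them; `__getattribute__` does.

```py
from ctypes import POINTER, Structure, _Pointer, c_int, pointer


class _Null:
    """Hypothetical NULL singleton standing in for einspect's NULL."""

    __slots__ = ()

    def __repr__(self):
        return "<NULL>"


NULL = _Null()


class NullAwareStructure(Structure):
    """Hypothetical base: null pointer fields read back as the NULL singleton."""

    # Fields are class-level descriptors, so __getattr__ (only called for
    # missing attributes) never sees them; __getattribute__ sees every lookup.
    def __getattribute__(self, name):
        value = super().__getattribute__(name)
        if isinstance(value, _Pointer) and not value:
            return NULL
        return value


class Node(NullAwareStructure):
    _fields_ = [("value", c_int), ("next", POINTER(c_int))]


node = Node(1)
print(node.next is NULL)  # True
target = c_int(7)
node.next = pointer(target)
print(node.next.contents.value)  # 7
```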
not sure about Array types made with LP_PyObject * n though
is subclassing ctypes.Array a thing
or is it one of those dynamic type types
I'm taking a look rn to see if you can modify Array type unwrapping
it looks like Arrays use their proto get/set funcs @warm breach
Pointers are a bit weirder tho
yeah looks like I can just
```py
class MyArray(ctypes.Array):
    _length_ = 3
    _type_ = ptr[PyObject]
```
seems length has to be known at define time though 
you can just make a class factory
I suppose it's not too different from what I do now with dynamically ptr[PyObject] * 3
yea
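A possible shape for that class factory, assuming a hypothetical NULL sentinel: `_type_` and `_length_` have to be in the class body, so one `Array` subclass is built and cached per `(item_type, length)`, and a Python-level `__getitem__` override swaps null pointer items for the sentinel. This has not been checked against einspect's own `ptr[...]` types; `null_aware_array` is a made-up name.

```py
from ctypes import POINTER, Array, _Pointer, c_int, pointer, sizeof
from functools import lru_cache


class _Null:
    """Hypothetical NULL sentinel."""

    __slots__ = ()

    def __repr__(self):
        return "<NULL>"


NULL = _Null()


@lru_cache(maxsize=None)
def null_aware_array(item_type, length):
    """Hypothetical factory: one cached Array subclass per (item_type, length)."""

    class NullAwareArray(Array):
        _type_ = item_type
        _length_ = length

        def __getitem__(self, index):
            value = super().__getitem__(index)
            # Single items come back as pointer instances; slices come back as lists.
            if isinstance(value, _Pointer) and not value:
                return NULL
            return value

    return NullAwareArray


IntPtrArray2 = null_aware_array(POINTER(c_int), 2)
arr = IntPtrArray2()  # zero-initialised, so both items start as NULL
target = c_int(5)
arr[1] = pointer(target)
print(arr[0], arr[1].contents.value)  # <NULL> 5
```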
I have no idea how to even make a class of a pointer type though
use _ctypes._Pointer and set the _type_
can I just define from_param or something for that
wait no
that's python to ctypes
noooo I deleted my nicely crafted message
!e
```py
from ctypes import *
from ctypes import _Pointer
from einspect.structs import PyObject

class LP_PyObject(_Pointer):
    _type_ = PyObject

    @classmethod
    def from_buffer(cls, buffer):
        print("in from_buffer")
        return super().from_buffer(buffer)

    @classmethod
    def from_param(cls, param):
        print("in from_param")
        return super().from_param(param)

    @classmethod
    def from_address(cls, address):
        print("in from_address")
        return super().from_address(address)

class MyObject(Structure):
    _fields_ = [
        ("ob_refcnt", c_ssize_t),
        ("ob_type", LP_PyObject),
    ]

x = MyObject.from_address(id(5))
print(x.ob_type)
```
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
<__main__.LP_PyObject object at 0x7f62a1c124e0>
seems like it doesn't call any of those
you would need to wrap the internal C functions to get them called
afaik
eh honestly might not do this
would also break usages of assignments of pointers after getting them from structs
src/einspect/structs/py_object.py lines 64 to 67
```py
if obj_ptr:
    obj_ptr.contents.DecRef()
# Set new
obj_ptr.contents = PyObject.try_from(value).with_ref()
```
== NULL is a thousand times easier since the class just does its own __eq__ and compares whatever it wants
crazy idea, __matmul__ alias for Structure.from_address? 🥴
```py
from einspect.structs import PyFloatObject
obj = PyFloatObject @ id(1.5)
print(obj.ob_fval)
>> 1.5
```
!e
```py
from ctypes import *
from fishhook import hook

@hook(type(Structure))
@hook(type(c_void_p))
def __matmul__(cls, addr):
    return cls.from_address(addr)

print(py_object @ (id(1) + 8))
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
py_object(<class 'int'>)
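For classes you define yourself, the same `@` shorthand needs no hooking of built-in types; a metaclass method is enough. A sketch with hypothetical names (`AtMeta`, `Pair`); keep in mind that `from_address` creates a view that does not keep the original memory alive.

```py
from ctypes import Structure, addressof, c_int


class AtMeta(type(Structure)):
    """Hypothetical metaclass: `Cls @ address` means `Cls.from_address(address)`."""

    def __matmul__(cls, address):
        return cls.from_address(address)


class Pair(Structure, metaclass=AtMeta):
    _fields_ = [("a", c_int), ("b", c_int)]


original = Pair(3, 4)
view = Pair @ addressof(original)  # a second view over the same memory, no copy
view.b = 40
print(original.b)  # 40
```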
Hey peeps! Quick question about match statement mechanics
I'm told that this (following) desugars to an if-else ladder with approximately O(n) lookup time
```py
match some_character:
    case 'a':
        ...
    case 'b':
        ...
    case 'c':
        ...
```
But what about this...?
```py
match some_character:
    case 'a' | 'b' | 'c':
        ...
```
I could see a smart optimization step seeing this and converting the characters into some kind of set
```
  4           0 LOAD_FAST                0 (some_character)
  5           2 DUP_TOP
              4 LOAD_CONST               1 ('a')
              6 COMPARE_OP               2 (==)
              8 POP_JUMP_IF_FALSE        8 (to 16)
             10 POP_TOP
  6          12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
  8     >>   16 DUP_TOP
             18 LOAD_CONST               2 ('b')
             20 COMPARE_OP               2 (==)
             22 POP_JUMP_IF_FALSE       15 (to 30)
             24 POP_TOP
  9          26 LOAD_CONST               0 (None)
             28 RETURN_VALUE
 11     >>   30 LOAD_CONST               3 ('c')
             32 COMPARE_OP               2 (==)
             34 POP_JUMP_IF_FALSE       20 (to 40)
 12          36 LOAD_CONST               0 (None)
             38 RETURN_VALUE
 11     >>   40 LOAD_CONST               0 (None)
             42 RETURN_VALUE
```
```
 15           0 LOAD_FAST                0 (some_character)
 16           2 DUP_TOP
              4 LOAD_CONST               1 ('a')
              6 COMPARE_OP               2 (==)
              8 POP_JUMP_IF_FALSE        8 (to 16)
             10 POP_TOP
 17          12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
 16     >>   16 DUP_TOP
             18 LOAD_CONST               2 ('b')
             20 COMPARE_OP               2 (==)
             22 POP_JUMP_IF_FALSE       15 (to 30)
             24 POP_TOP
 17          26 LOAD_CONST               0 (None)
             28 RETURN_VALUE
 16     >>   30 DUP_TOP
             32 LOAD_CONST               3 ('c')
             34 COMPARE_OP               2 (==)
             36 POP_JUMP_IF_FALSE       22 (to 44)
             38 POP_TOP
 17          40 LOAD_CONST               0 (None)
             42 RETURN_VALUE
 16     >>   44 POP_TOP
             46 LOAD_CONST               0 (None)
             48 RETURN_VALUE
```
this is the bytecode, respectively
you can see it's essentially identical (the same chain of == comparisons)
Awesome!
Thanks
How do you get this bytecode? Pass the source code as a string to dis?
!e
```py
import dis
dis.dis('''
match some_character:
    case 'a' | 'b' | 'c':
        ...
''')
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 0 0 RESUME 0
002 |
003 | 2 2 LOAD_NAME 0 (some_character)
004 |
005 | 3 4 COPY 1
006 | 6 LOAD_CONST 0 ('a')
007 | 8 COMPARE_OP 2 (==)
008 | 14 POP_JUMP_FORWARD_IF_FALSE 1 (to 18)
009 | 16 JUMP_FORWARD 17 (to 52)
010 | >> 18 COPY 1
011 | 20 LOAD_CONST 1 ('b')
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/ohosimuyav.txt?noredirect
Just be careful, make sure it's faster:
!timeit
```py
x = 'q'
x == 'a' or x == 'b' or x == 'c' or x == 'd'
```
@radiant garden :white_check_mark: Your 3.11 timeit job has completed with return code 0.
2000000 loops, best of 5: 160 nsec per loop
!timeit
```py
x = 'q'
x in {'a', 'b', 'c', 'd'}
```
@radiant garden :white_check_mark: Your 3.11 timeit job has completed with return code 0.
5000000 loops, best of 5: 46.4 nsec per loop
Good to know it is in fact faster!
yup that's a handy optimization
...which breaks if you introduce a module-level constant
this would be a good use case for macros
because then all usages of the macro would expand to a set literal which would then turn into a single frozenset at compile time
eh, not sure it deserves expanding the language with such a complex feature
if you really want faster lookups, create the frozenset once explicitly
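A sketch of both points, assuming CPython's constant folding: a set literal made only of constants and used with `in` is stored as a frozenset constant, a set built from names (such as a module-level constant) is rebuilt on every call, and an explicit module-level frozenset sidesteps the question. All function and constant names are hypothetical.

```py
FIRST, SECOND = "a", "b"
LETTERS = frozenset({"a", "b", "c"})  # built once at import time


def literal_set(x):
    return x in {"a", "b", "c"}  # folded into a frozenset constant


def named_set(x):
    return x in {FIRST, SECOND}  # names aren't constants: a new set per call


def explicit_set(x):
    return x in LETTERS


def has_frozenset_const(func):
    return any(isinstance(c, frozenset) for c in func.__code__.co_consts)


print(has_frozenset_const(literal_set), has_frozenset_const(named_set))  # True False
```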
these are not functionally equivalent though
```py
x == 'a' or x == 'b' or x == 'c' or x == 'd'
x in {'a', 'b', 'c', 'd'}
```
!e
```py
def char_match(some_char):
    match some_char:
        case "a" | "b" | "c":
            return 1
        case "d" | "e" | "f":
            return 2
        case _:
            return None

print(char_match("a"))
print(char_match([1, 2]))
```
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 1
002 | None
!e
```py
def set_match(some_char):
    if some_char in {"a", "b", "c"}:
        return 1
    elif some_char in {"d", "e", "f"}:
        return 2
    else:
        return None

print(set_match("a"))
print(set_match([1, 2]))
```
@warm breach :x: Your 3.11 eval job has completed with return code 1.
001 | 1
002 | Traceback (most recent call last):
003 | File "<string>", line 10, in <module>
004 | File "<string>", line 2, in set_match
005 | TypeError: unhashable type: 'list'
and custom types can == a str but not necessarily be hashable
was there not already a PEP for macros?
!pep 638
Hey internals people
that's called "surgeons"
Should the literal 0b123 be lexed as an invalid-integer token, or as two tokens (0b1 and 23)
i would assume an invalid integer token
```
0b123
^
SyntaxError: invalid digit '2' in binary literal
```
that's what Python does
as opposed to
```
Input In [47]
0b1 23
^
SyntaxError: invalid syntax
```
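A sketch of the single-invalid-token approach, which is consistent with CPython's error message: take the longest run that looks like a number, then validate it. `lex_int` is a hypothetical helper that covers integer literals only (no floats, imaginary numbers or other edge cases) and is not how CPython's tokenizer is written.

```py
import re

# Hypothetical lexer rules, integer literals only.
_NUMBER_RUN = re.compile(r"[0-9][0-9A-Za-z_]*")  # longest run that looks numeric
_VALID_INT = [
    re.compile(r"0[bB](?:_?[01])+"),             # binary
    re.compile(r"0[oO](?:_?[0-7])+"),            # octal
    re.compile(r"0[xX](?:_?[0-9a-fA-F])+"),      # hex
    re.compile(r"0(?:_?0)*|[1-9](?:_?[0-9])*"),  # decimal
]


def lex_int(source, pos=0):
    """Return (kind, text, end) for the integer-like token starting at pos."""
    run = _NUMBER_RUN.match(source, pos)
    if run is None:
        raise ValueError(f"no number at position {pos}")
    text = run.group()
    kind = "INT" if any(p.fullmatch(text) for p in _VALID_INT) else "INVALID_INT"
    return kind, text, run.end()


print(lex_int("0b123"))   # ('INVALID_INT', '0b123', 5)
print(lex_int("0b1 23"))  # ('INT', '0b1', 3)
```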
the mailing list has been talking about macros recently
i don't know how i feel about it honestly
for all its benefits it's still a massive change
it gives a huge amount of possibilities but I feel the danger is it will be used in a lot of places as well
when everyone is using custom parsed macros it's harder to look at python code and be able to tell what's going on
which I feel like is kind of where rust macros are at
a lot of things that really shouldn't be macros are macros in rust libraries because why not and everyone wants to do something "magical" for themselves
since python doesn't really have a compile time (that at least it can provide to macros) it will be limited to literals, which I kind of question how useful it would be
wdym by limited to literals 
maybe I'm misunderstanding how this works
does the compiler compile the ast at bytecode time? or is it fully runtime
the source gets parsed into an AST and then compiled to bytecode by the compiler
so it's just an AST object that is inlined in the bytecode?
it seems using it will be fairly complex
I imagine you'd implement them as code that runs at compile time and outputs something like an AST
but there are other options
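One way to picture "code that runs at compile time and outputs an AST" with today's tools: parse the source, run a transformer over the tree, then compile the result. The `SQUARE` "macro" and the `run_with_macros` helper below are hypothetical, and this is not how PEP 638 specifies macros.

```py
import ast
import copy


class ExpandSquare(ast.NodeTransformer):
    """Hypothetical macro: rewrite SQUARE(expr) into (expr) * (expr)."""

    def visit_Call(self, node):
        self.generic_visit(node)  # expand nested SQUARE(...) calls first
        if (
            isinstance(node.func, ast.Name)
            and node.func.id == "SQUARE"
            and len(node.args) == 1
            and not node.keywords
        ):
            arg = node.args[0]
            product = ast.BinOp(left=arg, op=ast.Mult(), right=copy.deepcopy(arg))
            return ast.copy_location(product, node)
        return node


def run_with_macros(source):
    tree = ast.parse(source)                 # 1. source -> AST
    tree = ExpandSquare().visit(tree)        # 2. "macro expansion": AST -> AST
    ast.fix_missing_locations(tree)
    code = compile(tree, "<macro>", "exec")  # 3. AST -> bytecode
    namespace = {}
    exec(code, namespace)                    # 4. run it
    return namespace


print(run_with_macros("y = SQUARE(3 + 1)")["y"])  # 16
```

Note that `SQUARE` never exists at runtime and its argument is evaluated twice, two classic macro pitfalls that also hint at why static tools would struggle with such code.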
how would the macro creator resolve clashing local / global names or something
PEP 638 probably discusses this, I haven't read it recently
sounds like it might have good implications for rewrite libraries though or FFI
numba.njit is pretty much only interested in AST so a macro would be perfect there
It seems unreasonable to assume that people will start using it "just because". People don't tend to use metaclasses Just Because, or import hooks, or .pth files, or __init_subclass__, or any of the other tons of customization points that the language provides for advanced use cases
Python programmers tend to be pretty judicious about only using a more magical feature when the alternatives would provide a much worse user experience.
though under that standard, what would really justify needing a macro?
for example this is cool but currently numba works just fine with decorators
it's just a minor performance overhead (which considering LLVM and everything else it's almost negligible)
pytest could potentially use macros for its assert rewriting, instead of the nastiness that it does today. There was a proposal to implement match/case using syntactic macros, rather than adding it to the language - or even to trial it with macros as a library to decide on the desired syntax and semantics, and then upgrade it to a first class language feature.
fair yeah
not sure how tooling support will go though
even rust IDEs have a hard time giving completions in macros
people keep asking for macros in order to build DSLs in Python - from that PoV, it makes sense that completion would be tough, since you're effectively in the domain of a new language when you're using a DSL
would type checkers and linters even be able to parse python without running run-time code
That would depend on the implementation, I suppose. If the implementation of the macros is generating Python code dynamically while creating the AST, they'd need to match that
That is, the static analysis tools would need to run compile time code, I guess
But static analysis tools already can't understand all sorts of stuff you can do in Python.
but they can check syntax without runtime code right?
I think I can see a fairly comprehensive implementation of this but it sounds more like a python 4 level feature
Not necessarily, no. They can't handle https://pypi.org/project/cstyle/ for instance
but ideally we do want macros to be statically analyzable?
🤷‍♂️
I guess it would be a nice to have, but I don't think it's a requirement
if we're losing IDE syntax checks and type checks for macros I'm not sure how good of a trade that is over strings / decorators and type hints (or whatever we use instead of a macro currently)
dataclasses are already not statically analyzable for type checkers, for instance - they all needed to add special support recognizing and special casing them
that's just a type thing though, this changes ast
pytest changes the AST.
hm, how?
It rewrites assert statements
In order to include information about why an assertion fails
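A toy version of the idea, to show the shape of AST-level assert rewriting. This is not pytest's implementation (which handles far more cases and hooks in through an import hook); `AssertRewriter` is hypothetical and only handles `assert left == right`. It also injects helper names (`_left`, `_right`) into the enclosing scope, exactly the kind of change that can trip up static analysis.

```py
import ast

TEMPLATE = """
_left = LEFT
_right = RIGHT
if not (_left == _right):
    raise AssertionError(f"{_left!r} != {_right!r}")
"""


class AssertRewriter(ast.NodeTransformer):
    """Hypothetical rewriter: make `assert a == b` report both values."""

    def visit_Assert(self, node):
        test = node.test
        if not (
            isinstance(test, ast.Compare)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
        ):
            return node  # leave every other assert untouched
        new_body = ast.parse(TEMPLATE).body
        new_body[0].value = test.left            # fill the LEFT placeholder
        new_body[1].value = test.comparators[0]  # fill the RIGHT placeholder
        for stmt in new_body:
            ast.copy_location(stmt, node)
        return new_body  # a list of statements replaces the single assert


source = """
def check(a, b):
    assert a + 1 == b
"""
tree = ast.fix_missing_locations(AssertRewriter().visit(ast.parse(source)))
namespace = {}
exec(compile(tree, "<rewritten>", "exec"), namespace)

try:
    namespace["check"](1, 5)
except AssertionError as exc:
    message = str(exc)
print(message)  # 2 != 5
```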
yeah but that doesn't concern what the user writes right
you don't need a special ast support to have IDE completions for writing pytest tests
(which, not saying all macros must be invalid python ast, but just that it seems they can be now)
intellij / pycharm currently supports injected language ast natively
but it seems pylance / vscode has decided not to
Sure, and it wouldn't if pytest could use a macro to do its assertion rewriting, either.
that's assuming we just treat the macro as in-line python ast and have all the normal rules of statements and types
Just because macros could modify the AST doesn't imply that all uses of macros would be totally opaque to static analysis, is all I'm saying. Pytest runs tests with a modified AST, and pytest tests are understood by static analysis tools. In the pytest case, static analysis tools work fine because it rewrites assert to do something nearly totally compatible with what it would ordinarily do
My biggest concern with macros is how easy they make it to obfuscate code. I have actually seen C code where someone did #define BEGIN {, #define END }, and #define LOOP for. (Actually I think LOOP might have been a little fancier, but I don't remember how.) The result was completely illegible to anyone but the original author; several years later he admitted this had not been a good idea.
I don't want to tell people that all macros are evil, because they're not. But I feel like they're different from other advanced language features because they're so easy to use. There's a steep barrier before most people even know what a metaclass is, but there's very little to prevent you from littering your code with awful macros.
(Though I have to admit that I'm a little tempted to see if there's a sneaky way to convert ! into a factorial operator.)
It wouldn't be the first time that an advanced feature was made easy to use and got overused, I guess. That's basically the situation with namespace packages today
isn't this an example of how pytest does not need a macro?
if all you're doing is things you don't need a macro for, why use one in the first place?
Python is Turing complete, so you never need a macro.
In the pytest case, it manages because it's the runner. Instead of importing your code, it compiles, rewrites, and executes the rewritten code. That technique isn't broadly applicable, it only really works for frameworks. And it's a lot of work.
I think I'm more referring to whether it makes a difference in the public api experience
like np.einsum
Oh, I'm only half right there. It does import your code, but only after installing its own import hook. I'm right that that only works because it's importing you and not the other way around, though.
how would using pytest with macros look like anyways?
Why not also consider how it affects the implementation of the library? https://www.pythoninsight.com/2018/02/assertion-rewriting-in-pytest-part-4-the-implementation/ describes how pytest does what it does, and it's incredibly complex
Who knows, they don't exist so we'd just be guessing at hypothetical syntax
I'm assuming with macros we'd have pytest without type inference or attribute suggestions, so I'm not seeing how that's better
I'll agree it'll make the library simpler, but I don't see how the end product is better
which is purely an assumption, but how would an IDE know what the macro would do to your statement
Why would it need to affect the end user experience at all? I don't think that follows
and if types and attributes are still valid
Lots of things make a difference in the experience. This is really a question of aesthetics, not functionality. There is nothing you can do in Python that you can't also do in C, Rust, Lisp, various assembly dialects, BASIC, etc. Part of why I like Python is because I like its aesthetics. Judicious use of macros can make certain things clearer and easier. But I expect that if they're easy to use, there will be codebases where they're used in preference to function calls (with some flimsy justification like "it avoids the overhead of setting up a stack frame"), and those will be horrible to work on.
having things like IDE autocomplete work, flake8, mypy, etc.
It already doesn't know what assert will do, and it works fine anyway
huh?
it does the same thing as normal python assert
No, it doesn't
I'm talking about the statement after assert
it's a valid normal python statement
where names need to exist and normal rules need to be obeyed
but knowing whether or not it can be is not statically inferable (again assuming, but it seems to be the case from the pep)
once your IDE sees the macro all bets are off about what is valid inside
that's already the case for assert in pytest
you're saying that the expression fed as an argument to assert is still using the names visible in the function scope, and so the IDE can blindly use its existing inference machinery without needing to know that assert has been replaced and isn't the normal assert statement anymore.
That's true, but that's only because pytest implemented it that way. There's nothing that would stop pytest from injecting a name into the scope that that expression is evaluated in, for instance.
and if pytest was implemented with macros, it would still make the same guarantees about what names are visible in the expression that's fed as an argument to its asserting macro. Because that's the contract that it wants to provide to its users.
so macros could indicate "this is a normal python statement" somewhere I guess?
or they can indicate they have custom ast and the IDE can skip parsing name and other checks for that part?
perhaps, but that's not what I'm getting at. My point is that it's certainly not the case that static analysis tools would need to throw their hands up and give up whenever they encounter any macro, just as it's certainly not the case that static analysis tools need to give up whenever there's any AST rewriting happening
it depends entirely on what the macro/AST rewriting does
if it injects new variables, or changes the flow of control, or something, then sure, they might get confused. If it just expands to a bunch of valid Python statements, they probably won't.
so we'll have macros but tooling won't actually work if they do anything outside of what we could already do without macros?
we should probably aim for the kind of working inference and AST parsing that something like Rust at least tries to do for macros
we can already do everything without macros - you can literally rewrite files at import time
if you're looking only for things that require macros to do, you won't find any.
you can't have a library that you just import and be able to write your own ast-valid statements right?
like say... np einsum
don't you need to run python with a special argument for this to work?
no, you just need to call ideas.examples.fractions_ast.add_hook() before importing your code.
hello people, I have just started out in Python and installed PyCharm, but the text isn't colourful. Why is that, and how can I fix it?
try asking in #editors-ides
your file needs an extension .py
okay thank you for guiding!
is that all?
oh yes, it worked flawlessly, thank you @warm breach
this works through an import hook, rewriting the AST when new modules are imported. It's the same trick that pytest plays, basically, except pytest rewrites assert and this rewrites /
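A rough sketch of the rewriting half of that trick, leaving out the import hook itself: an `ast.NodeTransformer` that turns every `a / b` into `Fraction(a) / Fraction(b)`, applied to a source string with `compile` and `exec`. The class and helper names are hypothetical, and this is not how the ideas library or pytest actually structure their code; a real hook would run the same kind of transform from a loader when the module is imported.

```py
import ast
from fractions import Fraction


class DivToFraction(ast.NodeTransformer):
    """Hypothetical transformer: rewrite `a / b` into `Fraction(a) / Fraction(b)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if isinstance(node.op, ast.Div):
            def wrap(operand):
                return ast.Call(
                    func=ast.Name(id="Fraction", ctx=ast.Load()),
                    args=[operand],
                    keywords=[],
                )
            node.left = wrap(node.left)
            node.right = wrap(node.right)
        return node


def run_rewritten(source):
    """Parse, rewrite, compile and execute `source`; return its namespace."""
    tree = DivToFraction().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # the new nodes need lineno/col_offset
    namespace = {"Fraction": Fraction}
    exec(compile(tree, "<rewritten>", "exec"), namespace)
    return namespace


ns = run_rewritten("x = 1 / 3 + 1 / 6")
print(ns["x"])  # 1/2
```

Floor division (`//`) is a different AST operator (`ast.FloorDiv`), so it is left alone.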
but with macros you can do the same in the current code you're writing right?
🤷‍♂️ you can do that with an import hook by installing it and then re-importing yourself, I assume
not if your code contains syntax errors
(removing your entry from sys.modules in the middle)
it will fail at compile time before imports run
true, for that you'd have to use something like https://aroberge.github.io/ideas/docs/html/lambda.html
hm... curious, can you do anything with this, or is it limited to replacing valid Python identifiers?
you can do anything with it.
that's you defining a function that gets called with the bytes of your .py file, and that returns a str that will be compiled
ah it hooks the string source?
yep. That lets you make any textual substitutions you'd like on the contents of the file
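For the custom-encoding variant, the moving part is a codec registered with `codecs.register` whose decoder hands back already-rewritten text. A minimal sketch, assuming a made-up encoding name `lambda_dsl` that turns `λ` into `lambda`. To be picked up from a `# coding: lambda_dsl` comment it would have to be registered before the file is compiled (for example from a .pth file), and it would most likely also need an incremental decoder, which this sketch leaves out.

```py
import codecs


def _decode(data, errors="strict"):
    """Decode UTF-8 source, then apply a textual substitution (λ -> lambda)."""
    text = bytes(data).decode("utf-8", errors)
    return text.replace("λ", "lambda"), len(data)


def _encode(text, errors="strict"):
    return text.encode("utf-8", errors), len(text)


def _search(name):
    # codecs passes the normalised (lower-case) encoding name to search functions
    if name == "lambda_dsl":
        return codecs.CodecInfo(_encode, _decode, name="lambda_dsl")
    return None


codecs.register(_search)

text = "square = λ x: x * x".encode("utf-8").decode("lambda_dsl")
print(text)  # square = lambda x: x * x
```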
honestly why does python still support custom encodings 🥴
I'd be willing to bet there are people using this trick for real DSLs.
and even without all of these tricks, it's always been possible to read a file written in some DSL language of your choice, transpile it to valid Python code, and then exec that Python code, all from a Python session.
macros aren't giving you any new capability in that sense - they're just making it easier to use, and making it integrate more nicely with the rest of the language
eh...
I agree it would be nicer but
macros aren't giving you any new capability in that sense
not sure if this can really be said though
it's kind of like saying making tuples mutable doesn't give us any new capability, we could mutate them via ctypes all along
I disagree - ctypes is jumping through an FFI and breaking out of the Python language. All of those things I've been linking are things that can be done in the Python language
a lot of what you describe is already only supported in CPython
C is also an implementation detail of CPython
I think other implementations support coding comments, but I'm certain other implementations support import hooks
In any case just because something is possible through some hack doesn't mean it deserves a place in the formal language
if macros are added they should be due to their own merits
which it seems like it may be already
sure, obviously. But those merits can't be "must enable you to do things you couldn't do without macros", because it's already possible to do anything at all without macros, whether by rewriting bytecode or ASTs or the text of the imported module on the fly.
or indeed, by generating Python code and exec'ing it
well, implementation-wise it'll be vastly different from the current import hooks or custom encodings
the pep also describes a runtime specification for the parser and a static value / type structure for static inferencing?
as well as restrictions on side effects a macro can have
it seems it will be a fairly complex reference implementation if we get one
I'm just arguing that this
isn't a good argument, because you could use it to apply to absolutely everything. Given sufficient setup (.pth files, usercustomize.py, sitecustomize.py, PYTHONSTARTUP, etc) you can already modify any Python file to do something drastically different than what a static analysis tool thinks it will do
Isn't the whole point of macros better static analysis, though, compared with source rewrites via .pth files and encodings?
that's one possible advantage. I don't think it's the whole point - it's hard to imagine any implementation of macros that would be more arcane and difficult to set up and use than hacking in custom import hooks to rewrite ASTs
but the difference is the macros would be a standard that we would adhere to, and tools could possibly support
also I assume these could be offered support by IDEs in some way
though performance-wise, repeatedly rerunning macro preprocessors isn't ideal
also I guess python would finally have non-syntax compile time errors?
or, runtime errors in the preprocessor?
does that count as compile time or runtime?
can preprocessors use macros themselves
why is PyObject_DelAttr not in the stable ABI?
though PyObject_SetAttr and PyObject_HasAttr already are
isn't SetAttr with the last argument set to NULL equivalent to DelAttr?
Include/abstract.h line 101
```c
#define PyObject_DelAttr(O, A) PyObject_SetAttr((O), (A), NULL)
```
it's not in the stable ABI because it's not in any ABI
there's other macros that have been added to the stable ABI as functions, though
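Since the macro only forwards to PyObject_SetAttr with a NULL value, the same thing can be done from ctypes. A minimal sketch, assuming CPython (where `ctypes.pythonapi` exists); passing `None` for a `c_void_p` argument sends a NULL pointer. `del_attr` is a hypothetical helper name.

```py
import ctypes

# ctypes.pythonapi is a PyDLL: the GIL is held during the call and the
# Python error indicator is checked afterwards (CPython only).
_set_attr = ctypes.pythonapi.PyObject_SetAttr
_set_attr.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_void_p]
_set_attr.restype = ctypes.c_int


def del_attr(obj, name):
    """Hypothetical helper mirroring the PyObject_DelAttr macro."""
    # None for a c_void_p argument is passed as NULL, which means "delete".
    return _set_attr(obj, name, None)


class Foo:
    pass


foo = Foo()
foo.x = 1
print(del_attr(foo, "x"))  # 0
print(hasattr(foo, "x"))   # False
```

Deleting an attribute that doesn't exist makes PyObject_SetAttr return -1 with AttributeError set, which ctypes then raises.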
I'm just gonna pretend it exists 🥴
```py
@bind_api(pythonapi["PyObject_SetAttr"])
def SetAttr(self, name: str, value: object) -> int:
    """Set attribute `name` of the PyObject. Returns -1 on failure."""

def DelAttr(self, name: str) -> int:
    """Delete attribute `name` of the PyObject. Returns -1 on failure."""
    return self.SetAttr(name, ctypes.py_object())
```
I've only needed to limit myself to the stable ABI once, and I found the experience pretty painful. There's so many convenience things that are missing from the stable ABI, forcing you to reimplement stuff yourself
I still don't know how to set a list slice from the C API
both PyList_GetSlice and PyList_SetSlice only work on start:end without steps
and start and end need to be computed as real indices (not negative)
you can't do steps even from Python, right?
!e ```py
x = list(range(10))
x[::2] = [0, 0, 0, 0, 0]
print(x)
```
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
[0, 1, 0, 3, 0, 5, 0, 7, 0, 9]
TIL! I had no idea that worked.
ls[::-1] is popular for reversing
I'm guessing that that's just handled manually somewhere in the implementation of list, then
!e ```py
ls = [1, 2, 3, 4, 5, 6]
ls[0:6:2] = ["a", "b", "c"]
print(ls)
```
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
['a', 2, 'b', 4, 'c', 6]
assignment steps work as well
yeah, that's special cased. https://github.com/python/cpython/blob/main/Objects/listobject.c#L2953-L3085
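From the C API side, one generic route into that special case is PyObject_SetItem with a slice object as the key (built with PySlice_New in C). It ends up in the list's own subscript-assignment code, which handles steps and negative bounds. A sketch that exercises it through ctypes, assuming CPython; a plain Python `slice` stands in for the PySlice_New result.

```py
import ctypes

# PyObject_SetItem(o, key, v) is the C-level equivalent of `o[key] = v`.
set_item = ctypes.pythonapi.PyObject_SetItem
set_item.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.py_object]
set_item.restype = ctypes.c_int

ls = [1, 2, 3, 4, 5, 6]
# From C, the key would come from PySlice_New(start, stop, step);
# a Python slice object is the same kind of object.
set_item(ls, slice(0, 6, 2), ["a", "b", "c"])
print(ls)  # ['a', 2, 'b', 4, 'c', 6]
```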
!e had to implement that myself for tuple set slice
```py
from einspect import view
t = (1, 2, 3, 4, 5, 6)
view(t)[0:6:2] = ("a", "b", "c")
print(t)
```
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
('a', 2, 'b', 4, 'c', 6)
that's easy enough to deal with, though - you add the length of the list to any negative index to get its positive counterpart
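For code that has to do this bookkeeping itself (a tuple, say, or a C-level container), `slice.indices(length)` performs exactly that normalisation and also clamps out-of-range bounds. A minimal Python sketch of extended-slice assignment built on it; `assign_extended_slice` is a hypothetical name, and unlike real lists it doesn't support the resizing that plain `start:stop` assignment allows.

```py
def assign_extended_slice(seq, start, stop, step, values):
    """Emulate `seq[start:stop:step] = values` without resizing `seq`."""
    # slice.indices() turns None/negative bounds into concrete, clamped ones,
    # e.g. a negative index i becomes len(seq) + i.
    positions = range(*slice(start, stop, step).indices(len(seq)))
    values = list(values)
    if len(values) != len(positions):
        raise ValueError(
            f"attempt to assign sequence of size {len(values)} "
            f"to extended slice of size {len(positions)}"
        )
    for index, value in zip(positions, values):
        seq[index] = value


ls = [1, 2, 3, 4, 5, 6]
assign_extended_slice(ls, -6, None, 2, "abc")
print(ls)  # ['a', 2, 'b', 4, 'c', 6]
```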
Objects/object.c lines 1003 to 1012
```c
PyObject_SetAttr(PyObject *v, PyObject *name, PyObject *value)
{
    PyTypeObject *tp = Py_TYPE(v);
    int err;

    if (!PyUnicode_Check(name)) {
        PyErr_Format(PyExc_TypeError,
                     "attribute name must be string, not '%.200s'",
                     Py_TYPE(name)->tp_name);
        return -1;
```
is there a way to get errors raised for functions like this, which return -1 instead of NULL?
ctypes.pythonapi automatically raises NULL returns with errors
different C functions have different conventions for errors. NULL is the most common one but not all functions return a pointer
some don't return any error sentinel, forcing you to check yourself after every call
if PyObject_SetAttr returns -1, that means that PyErr_Occurred() is true, and you can fetch the exception that occurred with PyErr_Fetch
oh huh
!e ```py
from ctypes import *

SetAttr = pythonapi.PyObject_SetAttr
SetAttr.argtypes = [py_object, py_object, py_object]
SetAttr.restype = c_int

class Foo:
    pass

SetAttr(Foo, [], 123)
```
@warm breach :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 10, in <module>
003 | TypeError: attribute name must be string, not 'list'
it's automatic as well? 
I guess -1 is also special-cased?
can no pythonapi function return -1 as a real value, then?
I'm sure that's not the case. Without checking the implementation, I'd bet that pythonapi always calls PyErr_Occurred(), and always propagates an exception if so
ah yeah probably
I guess that's why PyDict_GetItem segfaults when it returns NULL with restype py_object
since it doesn't set an exception
actually, thinking more about it, I'm betting that ctypes doesn't do anything special to handle an exception having been raised. You're calling ctypes via the normal eval loop, and the normal eval loop has machinery in it to propagate exceptions
I don't think ctypes.pythonapi needs to check whether an exception occurred - it just blithely ignores it, and then the eval loop that made the call into ctypes notices that the exception indicator is set and propagates the exception
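According to the ctypes documentation, `ctypes.pythonapi` is a `PyDLL`, and for those the GIL is held during the call and the Python error flag is checked after the function returns, raising if it is set. So the return value itself isn't what triggers the exception. A quick check, assuming CPython: a void function that sets an error still raises, while `PyLong_AsLong(-1)` returns -1 with no error set and comes back as an ordinary value.

```py
import ctypes

api = ctypes.pythonapi  # a PyDLL: the error flag is checked after every call

# void PyErr_SetString(PyObject *type, const char *message)
api.PyErr_SetString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyErr_SetString.restype = None

# long PyLong_AsLong(PyObject *obj): -1 doubles as its error sentinel
api.PyLong_AsLong.argtypes = [ctypes.py_object]
api.PyLong_AsLong.restype = ctypes.c_long

try:
    api.PyErr_SetString(ValueError, b"raised from C")
except ValueError as exc:
    print("raised:", exc)     # raised: raised from C

print(api.PyLong_AsLong(-1))  # -1, no exception
```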
@pliant tusk made slot deletes work now
```py
from einspect import view

del view(int)["__pow__"]
try:
    print(2 ** 65)
except TypeError as e:
    print(e)

view(int).restore("__pow__")
print(2 ** 65)
```
```
unsupported operand type(s) for ** or pow(): 'int' and 'int'
36893488147419103232
```
nice
@warm breach I just noticed a super weird bug #esoteric-python message
@feral island do you know if it is possible to download the exact disk image that the eval command uses? I want to run that binary in a debugger
!gh pythondiscord/snekbox
I know nothing about the internals of the bot
ah whoops, misread your roles
you may know the other side of the bug, though: do you know if there are any conditions where Py_CLEAR will decref a pointer but not set it to NULL? Because that's what I think might be happening
thanks, saw it
I reproduced it on ubuntu 3.11
on windows there is no segfault
but on Windows with PYTHONDEVMODE=1 it segfaults with Windows fatal exception: access violation
that is even weirder
on my macbook it gives SystemError (which is what it should be doing given the C code that runs)
PYTHONMALLOC=debug will make it segfault on windows
oh in 3.12 there is no segfault
3.12.0a4 windows:
```
    print(corrupt.__reduce__())
          ^^^^^^^^^^^^^^^^^^^^
SystemError: NULL object passed to Py_BuildValue
```
that's what it should be doing in 3.11 as well
wait no
3.12.0a4 ubuntu, prints with no segfault:
(<built-in function iter>, (<function at 0x7f7cf79e9f80>, 0))
wtf
yea this is a weird one
most recent branch from github?
it's an error to make a call to a Python C API while the exception indicator is already set, with few exceptions
I get this too on 3.11 ubuntu
that's what you should get, but it seems that on some platforms the Py_CLEAR isn't properly clearing the pointer
what compiler did you build with?
makes sense. I don't have too much knowledge here. I looked at the definition of Py_CLEAR and it does seem like it should always set its arg to NULL (unless perhaps there's a threading race condition, but that seems unlikely here)
yea this shouldn't be a threading issue, I wonder if it is the compiler optimizing something out incorrectly on specific platforms
the compiler is (almost) never wrong
that's the only reason I can think of for the bug only happening on specific platforms
or some flags that are passed cause this
if it's happening on more than 1 platform, with different compilers, the odds of it being a bug in two different compilers is basically 0
the much more likely explanation is that there's undefined behavior in CPython
so summary
- 3.11, Windows
- 3.12.0a4, Windows
  (<built-in function iter>, (<function at 0x000001B8D6A8C720>, 0))
- 3.11, Windows, PYTHONMALLOC=debug
- 3.12.0a4, Windows, PYTHONMALLOC=debug
  Windows fatal exception: access violation
  > exit code -1073741819 (0xC0000005)
- 3.11, Ubuntu
  (<built-in function iter>, (<function at 0x7fb772c3c4a0>, 0))
  > terminated by signal SIGSEGV (Address boundary error)
- 3.12.0a4, Ubuntu
  (<built-in function iter>, (<function at 0x7f3480d71f80>, 0))
- 3.12.0a4, Ubuntu, PYTHONMALLOC=debug
  Fatal Python error: Segmentation fault
oh yea fair enough, I just don't see anything in the code it hits that looks like it would be undefined behavior
could be a use-after-free, if PYTHONMALLOC=debug is changing the behavior
that'd be my first educated guess, before looking at the code at all...
try it with valgrind or asan, perhaps... (and PYTHONMALLOC=malloc)
also this was me accidentally using python 3.12.0a4 I built with debug mode
the resulting bug would be a use after free, since Py_CLEAR is supposed to clear iter->seq_callable
otherwise I can't get the SystemError in release binaries
and the rest of the code there sets up the Py_CLEAR to happen after iter->seq_callable is checked
but it should be NULLed out by Py_CLEAR and raise a SystemError Exception
weird, I only get SystemError on my machine(s) so far
on 3.11?
hm, my 3.11 on ubuntu was built with GCC via pyenv
windows 3.11 is using binary from python.org
hm so the SystemError would happen if only one of it_callable and it_sentinel is NULLed out?
Yea, the SystemError is triggered inside Py_BuildValue
what pointer is it that Py_CLEAR ought to be clearing and you think it isn't?
it->it_callable
Objects/iterobject.c line 208
```c
calliter_iternext(calliterobject *it)
```
From callable_iterator
specifically I think the branch on line 226 should clear both it_callable and it_sentinel
It should, and the fact that it is segfaulting means that it is at least decreasing the refcount
well, the crash is happening in tuplerepr, because the first element of the tuple is an already freed object
Yea that's where the use after free actually gets hit
I wouldn't assume that, I think there's memory corruption somewhere and anything could be happening
But I'm pretty sure the root cause is Py_CLEAR
I was assuming that because I have printed functions directly after freeing them and that's what it typically looks like
Because the func_name is cleared it displays like that
But we could guarantee with a custom callable and a __del__
this isn't true, it's PyEval_GetBuiltin that is NULL
The system error is triggered in PyEval_GetBuiltin?
well at least, it checks for both fields being non-NULL, so the SystemError can't be from only one of them being NULL
the one I get on 3.12 debug build triggers here https://github.com/python/cpython/blob/main/Python/modsupport.c#L472-L489 in do_mkvalue
Objects/iterobject.c lines 243 to 244
```c
    return Py_BuildValue("N(OO)", _PyEval_GetBuiltin(&_Py_ID(iter)),
                         it->it_callable, it->it_sentinel);
```
the call to _PyEval_GetBuiltin to find the iter builtin is calling Cstr.__eq__, which exhausts the iterator, causing the Py_CLEAR in calliter_iternext to be executed, setting it->it_callable and it->it_sentinel to NULL. But the order of evaluation of arguments in a function call isn't specified, and modifying an argument by evaluating another argument is a bug
this is basically the same bug as printf("%d %d\n", i++, i++); just in a trickier package.
if it->it_callable and it->it_sentinel are evaluated before _PyEval_GetBuiltin(&_Py_ID(iter)) then it passes pointers to freed objects to Py_BuildValue, and if they're evaluated after then it passes null pointers.
that's unspecified behavior, not undefined, actually - but same difference.
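Python, unlike C, does define left-to-right evaluation of call arguments and tuple items, so the hazard can be replayed deterministically in a toy model. A sketch in which every name is hypothetical: `lookup_builtin` stands in for `_PyEval_GetBuiltin` running a user `__eq__` that clears the field, the buggy version reads the field again inside the same expression as that call, and the fixed version hoists the lookup out first, mirroring the fix suggested for the C code.

```py
class FakeIter:
    """Hypothetical stand-in for a C iterator struct with one pointer field."""

    def __init__(self, seq):
        self.seq = seq


def lookup_builtin(victim):
    """Stand-in for _PyEval_GetBuiltin running user code that exhausts `victim`."""
    victim.seq = None  # plays the role of the Py_CLEAR in the iternext function
    return iter


def reduce_buggy(it):
    # The field is checked, then a side-effecting call runs in the same
    # expression that reads the field again. Python evaluates left to right,
    # so this always sees None; C gives no ordering guarantee at all.
    if it.seq is not None:
        return lookup_builtin(it), (it.seq,)
    return lookup_builtin(it), ((),)


def reduce_fixed(it):
    builtin = lookup_builtin(it)  # hoisted: run the side effect first
    if it.seq is not None:        # then test the possibly-cleared field
        return builtin, (it.seq,)
    return builtin, ((),)


print(reduce_buggy(FakeIter([1, 2])))  # (<built-in function iter>, (None,))
print(reduce_fixed(FakeIter([1, 2])))  # (<built-in function iter>, ((),))
```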
does this really need to use _PyEval_GetBuiltin(&_Py_ID(iter))?
can't it just do a direct call to the object
it'd do something entirely different if it didn't look up __builtins__.iter, right?
hm, does __reduce__ need this behavior? (fetching iter from __builtins__)
nice work, thanks for finding this
can you take it from here?
sure, I'll file a bug and PR tonight
or if anyone else here wants to make a CPython contribution, feel free to do that and ping me for a review
the fix should just be hoisting the _PyEval_GetBuiltin(&_Py_ID(iter)) out of the if, and then adding a big comment explaining why that's necessary
yep and a unit test
I made python/cpython#101765 if you want to post that there
Ah, I didn't know that the order of argument evaluation is unspecified
wonders of C 🥴
Hello, I'm stuck on a problem and I can't seem to figure out a solution. I have a dictionary whose key item is a class object. When I change that key item class object's .name attribute, the key no longer works in the original dictionary. Is there any way around this?
I had it right the first time, it's undefined behavior and not unspecified. Check the "undefined behavior" section at the bottom of https://en.cppreference.com/w/c/language/eval_order
This is hitting case (2) there.
don't include the name in the hash. It's up to you to define a __hash__ that doesn't use any mutable fields.
ehm, sorry, can you elaborate a bit? I'm a little new to Python still
you'd have to show your code for me to be any more specific than that
Hopefully this makes sense. key_signal is an object that's used as a key to ldf._signal_representations dictionary. It works when I do not modify the key_signal object members, but when I update the .name member of key_signal, the key no longer works in the ldf._signal_representations dictionary
but the memory addresses look intact
There's a lot of places where that pattern happens and I think they all would need to be fixed
dicts in Python are implemented using a data structure called a hash map. The idea behind a hash map is that, when you look for a key, you only need to look at other objects with the same hash code, and you can totally ignore every key in the hash table except for ones that have the same hash code. When you change the .name attribute, that changes the object's hash code: https://github.com/c4deszes/ldfparser/blob/06e9cd02f5fbf120de112c92df22a588279ffa55/ldfparser/signal.py#L44-L45
Which is why the dict stops being able to find that object.
ldfparser/signal.py lines 44 to 45
```py
def __hash__(self) -> int:
    return hash((self.name))
```
ohh ok
I know the problem now, but how do I fix it?
just get the hash and update the dictionary and remove the old instance?
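Re-keying does work, as long as the entry is removed *before* the attribute changes, while the old hash still finds it. A minimal sketch with a hypothetical `Signal` class that, like the one in ldfparser, hashes on `name`; the other option is to base `__hash__` (and `__eq__`) only on fields that never change.

```py
class Signal:
    """Hypothetical stand-in for a key class hashed by its mutable name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Signal) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


def rename_key(mapping, key, new_name):
    """Rename a key whose hash depends on .name without losing its entry."""
    value = mapping.pop(key)  # remove while the old hash still matches
    key.name = new_name
    mapping[key] = value      # re-insert under the new hash


sig = Signal("speed")
table = {sig: "km/h"}
rename_key(table, sig, "velocity")
print(table[sig])  # km/h
```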
why does reduce need to get itself from builtins dict anyways?
haven't seen that outside reduce
Knowing this, there's actually a lot of places where you can produce undefined behavior with this pattern. I'll start sifting through the ones that I know about and see if they're actually triggerable.
It needs to get it so that it can return the information needed to reproduce the object
calliter_reduce(calliterobject *it, PyObject *Py_UNUSED(ignored)): couldn't this just use the ob_type of it here?
No because ob_type is not iter it's callable_iter which cannot be constructed from python code
@raven ridge
I was genuinely confused for a moment
Hey internals people. You're the ones to talk to when it comes to nuanced problems
I've got a question happening over here, if anyone wants to weigh in https://discord.com/channels/267624335836053506/1073382454028664902
do we have iter tests? or would it go in pickletester
there are a lot of places that this pattern exists, not just callable_iter. most of the default iterable types have it
like the builtins dict call?
Objects/tupleobject.c lines 1049 to 1056
```c
tupleiter_reduce(_PyTupleIterObject *it, PyObject *Py_UNUSED(ignored))
{
    if (it->it_seq)
        return Py_BuildValue("N(O)n", _PyEval_GetBuiltin(&_Py_ID(iter)),
                             it->it_seq, it->it_index);
    else
        return Py_BuildValue("N(())", _PyEval_GetBuiltin(&_Py_ID(iter)));
}
```
hm, yeah all of these appear affected
Makes me wonder how often similar patterns exist
how come compiling with debug mode makes the systemerror work?
does it change the arg eval order?
Nasal demons
because anything like function(nested_call(), object->member) would have the bug if nested_call can be manipulated into calling python code that can manipulate object->member
seems like vs knows something is off as well 🥴
Only if it can manipulate that pointer, not if it manipulates the pointed to object
ah yea you're right, unless the pointed-to object is checked for some condition that you can then bypass
seems like a lot of calls can though
tbh I found this while working on my list of different C functions that can eventually call into python code
although this didn't add any, as it just ends up at PyDict_GetItemWithError
It's a very persistent source of issues in the interpreter that calls into Python code can invalidate assumptions or state of C code higher up on the stack. There's a ton of ugly stuff in the interpreter just guarding against cases where this can occur
yea most of the bugs I have reported are caused by this exact issue
why is __builtins__.__dict__ mutable anyways
the repl uses that to add help and license
I wonder if you could add some sort of global flag that can be used as part of a sort of PROTECT macro. Basically at the start the flag would be set to 0, then if bytecode is executed the flag would be set to 1. then C code can check that flag to see if it needs to re-adjust assumptions
to see if it needs to re-adjust assumptions
If it's able to re-adjust its assumptions, then it's also able to just not make those assumptions at all
that might be slow for very hot paths as well
the overhead of the check would probably overcome the performance benefits in the branched "assumption" path
ah yea fair enough
```diff
 static PyObject *
 listiter_reduce_general(void *_it, int forward)
 {
     PyObject *list;
+    PyObject *builtin_iter;
+    PyObject *builtin_reversed;
     /* the objects are not the same, index is of different types! */
     if (forward) {
         _PyListIterObject *it = (_PyListIterObject *)_it;
         if (it->it_seq) {
+            builtin_iter = _PyEval_GetBuiltin(&_Py_ID(iter));
             return Py_BuildValue("N(O)n", builtin_iter,
                                  it->it_seq, it->it_index);
         }
     } else {
         listreviterobject *it = (listreviterobject *)_it;
         if (it->it_seq) {
+            builtin_reversed = _PyEval_GetBuiltin(&_Py_ID(reversed));
             return Py_BuildValue("N(O)n", builtin_reversed,
                                  it->it_seq, it->it_index);
         }
     }
     /* empty iterator, create an empty list */
     list = PyList_New(0);
     if (list == NULL)
         return NULL;
     return Py_BuildValue("N(N)", _PyEval_GetBuiltin(&_Py_ID(iter)), list);
 }
```
these look kind of weird tbh
could probably get away with just PyObject *builtin; and using that variable in both places
is there really no frozen constant pointer to builtins?
__builtins__.__dict__ holds the only reference to them?
__builtins__ can change depending on what frame you are evaluating
so you can't use a constant frozen pointer if there is some utility lib that changes some
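That per-frame behaviour is easy to see from Python: builtin names are resolved through the `__builtins__` entry of the globals the code runs with. A small sketch, assuming CPython; the fake `len` is only there to make the lookup visible.

```py
# Builtins are looked up through the "__builtins__" entry of the globals
# that the code runs with, so swapping that mapping changes what names mean.
fake_builtins = {"len": lambda obj: 42}
namespace = {"__builtins__": fake_builtins}
exec("n = len([1, 2, 3])", namespace)
print(namespace["n"])  # 42

# With an empty mapping, even `iter` is no longer a known name.
try:
    exec("iter([])", {"__builtins__": {}})
except NameError as exc:
    print(exc)  # name 'iter' is not defined
```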
This doesn't look correct to me. You need to call _PyEval_GetBuiltin before you check if (it->it_seq)
because (presumably) calling _PyEval_GetBuiltin can unset it_seq
ah yea that would fix the SystemError exception from passing NULL to Py_BuildValue
currently this hits SystemError since it->it_seq would be NULL
if called before it would go to the empty iterator
ok, but every SystemError is a programming bug
I guess the empty case is more correct?
yeah.
every SystemError is raised because code inside the interpreter or an extension module has a bug.
if you're able to provoke the interpreter to set a SystemError from one of its own calls, that's proof there's a bug in the interpreter
when you can't import help and license
!e ```py
help
```
@pliant tusk :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 1, in <module>
003 | NameError: name 'help' is not defined
!e ```py
from sitebuiltins import *
print(help)
```
@gray galleon :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 1, in <module>
003 | ModuleNotFoundError: No module named 'sitebuiltins'
!e ```py
from _sitebuiltins import *

print(help)
```
@gray galleon :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 3, in <module>
003 | NameError: name 'help' is not defined
could I just copy this into every iter_reduce
```c
/* _PyEval_GetBuiltin can invoke arbitrary `__eq__` code
 * calls must be *before* access of _it pointers
 * since C/C++ parameter eval order is undefined.
 * see issue #101765 */
```
or is that too much repetition
you also have to do help = _Helper()
Wouldn't hurt. I don't think you need to specify __eq__
this won't work with a star import though
!e ```py
from _sitebuiltins import _Helper
print(help := _Helper())
```
@rose schooner :white_check_mark: Your 3.11 eval job has completed with return code 0.
Type help() for interactive help, or help(object) for help about object.
!e ```py
from _sitebuiltins import _Helper
help = _Helper()
print(help)
```
@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.
Type help() for interactive help, or help(object) for help about object.
help is not instantiated in _sitebuiltins smh
given that it is almost always a singleton
That thread is closed, so I'll weigh in here.
The structure seems very odd to me. Normally, I think of tokenizers as extracting a single token at a time. Yours seems to follow a pattern where you look for as many three-character tokens as you can, and if you can't find any, then you look for as many two-character tokens as you can, and if you still can't find any, then you look for one-character tokens. Maybe your functions return at most a single token? I can't tell. Regardless, it seems very odd: ultimately, for each possible token, you want to know whether it appears at the present location (True) or not (False). That boolean structure does not appear very clearly in your code.
A more common pattern for a hand-crafted lexer, I think, is to attempt to match your first token possibility; if it matches, yield it; if not, attempt to match your next token possibility; and so on. This turns tokenization into a simple loop: You just iterate over regular expressions, one for each token, testing each one for a match; when you get a match, you return the matched characters and advance your input pointer by the length of the match.
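A minimal sketch of that loop, assuming a made-up token table (the names, patterns and ordering are illustrative, not taken from anyone's lexer):

```py
import re

# Hypothetical token table, tried in order. Longer operators sharing a
# prefix come first so ">>=" wins over ">>" and ">".
TOKEN_SPEC = [
    ("NUMBER", re.compile(r"\d+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("OP", re.compile(r">>=|>>|>|[+\-*/=()]")),
    ("SKIP", re.compile(r"[ \t]+")),
]


def lex(text):
    pos = 0
    while pos < len(text):
        for kind, regex in TOKEN_SPEC:
            match = regex.match(text, pos)  # anchored at `pos`
            if match:
                if kind != "SKIP":
                    yield kind, match.group()
                pos = match.end()  # advance past the matched characters
                break
        else:  # no pattern matched at this position
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")


print(list(lex("x >>= 12")))
```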
For an automatically generated lexer, you use the same idea, but you combine all the regular expressions into a single DFA. This is faster (when implemented correctly). (You could replicate this effect in pure Python by taking all the regular expressions you're interested in and combining them with | branches. To find out which token you matched, you examine the match object.)
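The combined version of the same hypothetical table, using named groups and `Match.lastgroup` to recover which token matched. Note that Python's `re` is a backtracking engine rather than a true DFA, and alternation picks the first alternative that matches rather than the longest, so `>>=` still has to be listed before `>>` and `>`.

```py
import re

# Same hypothetical table as a single pattern; order still matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r">>=|>>|>|[+\-*/=()]"),
    ("SKIP", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex_combined(text):
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = match.lastgroup  # name of the alternative that matched
        if kind != "SKIP":
            yield kind, match.group()
        pos = match.end()


print(list(lex_combined("x >>= 12")))
```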
If your lexer is a more general grammar (like we've discussed here for handling indentation and f-strings) then you'll need a more complicated strategy. But in this case, the right strategy really depends on the complexity of the grammar you're parsing.
how would you make a DFA in python tho
given that it has no goto
You rely on re to do a good job of that.
You could build the state machine yourself: you just assign every state a number and have an outer loop which looks at the number and determines the possible transitions. But it's going to be terribly slow compared to re.
skywalker's intention is not to use regex i think
Well, it's totally possible. You just end up with a big table.
I can understand how you might have gotten that impression, but in actuality I'm doing what you're describing. Obviously, among tokens that share a prefix, you need to try the longest ones first. This way, the token >>= will match before the token >>, which will match before >
that'd be a dict mapping state number to state handler function, and each state handler would return the number of the next state to transition to. Or something like that.
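A table-driven sketch of that idea for the `>`, `>>`, `>>=` case: a dict keyed by `(state, character)` gives the next state, and the matcher remembers the last accepting state it passed through, which is what gives longest-match behaviour without any ordering tricks. The state numbers and token names are hypothetical.

```py
# Hypothetical DFA for the operators ">", ">>" and ">>=".
TRANSITIONS = {
    (0, ">"): 1,
    (1, ">"): 2,
    (2, "="): 3,
}
ACCEPTING = {1: "GT", 2: "RSHIFT", 3: "RSHIFT_EQ"}


def longest_match(text, pos):
    """Run the DFA from `pos`; return (kind, lexeme, end) for the longest match."""
    state, i, last = 0, pos, None
    while i < len(text) and (state, text[i]) in TRANSITIONS:
        state = TRANSITIONS[(state, text[i])]
        i += 1
        if state in ACCEPTING:
            last = (ACCEPTING[state], i)  # remember the latest accepting point
    if last is None:
        raise SyntaxError(f"no token at position {pos}")
    kind, end = last
    return kind, text[pos:end], end


print(longest_match(">>= 1", 0))  # ('RSHIFT_EQ', '>>=', 3)
```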
what i had in mind is trampolining lol
which is a technique to optimize tail calls
then you can write each state as a function
and a transition as a tail call
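A minimal trampoline sketch: each state function returns the next state (plus the new position) instead of calling it, and a plain loop drives them, so no Python frames pile up however long the input is. The two token kinds and all function names are hypothetical.

```py
def start(text, pos, out):
    """Dispatch state: pick the state that handles the next character."""
    if pos >= len(text):
        return None  # no next state: the trampoline stops
    ch = text[pos]
    if ch.isdigit():
        return number, pos
    if ch.isalpha():
        return name, pos
    if ch == " ":
        return start, pos + 1
    raise SyntaxError(f"unexpected {ch!r} at {pos}")


def number(text, pos, out):
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    out.append(("NUMBER", text[pos:end]))
    return start, end  # the "tail call" back to start, made by the loop


def name(text, pos, out):
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
    out.append(("NAME", text[pos:end]))
    return start, end


def run_lexer(text):
    out = []
    step = (start, 0)
    while step is not None:  # the trampoline: call whatever state comes back
        state, pos = step
        step = state(text, pos, out)
    return out


print(run_lexer("abc 12 x"))
```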
I need to yield from each of the "sub tokenizers" because one or two of the methods for creating tokens yield multiple tokens in a single pass (namely, a newline triggers the creation of a newline token as well as indents or dedents). Thus, I've got to either yield multiple tokens in one pass or else cache any tokens beyond the first and address them in the next loop
The thread you saw was me asking about a more graceful way of short-circuiting the outer loop upon collecting tokens from one of the inner tokenizers
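One way to produce several tokens from a single step is to have a generator yield them back to back, with the outer loop simply consuming whatever comes out. A sketch of the indentation part only, assuming spaces-only indentation and using a placeholder `LINE` token where the rest of a line's tokens would go; `indent_tokens` is a hypothetical name and the token order is simplified compared with CPython's tokenizer.

```py
def indent_tokens(lines):
    """Yield INDENT/DEDENT/LINE/NEWLINE tokens for spaces-only indentation."""
    stack = [0]  # indentation widths of the enclosing blocks
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped.strip():
            continue  # blank lines never change indentation
        width = len(line) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        else:
            while width < stack[-1]:  # one DEDENT per closed block
                stack.pop()
                yield ("DEDENT", width)
            if width != stack[-1]:
                raise IndentationError("unindent does not match any outer indentation level")
        yield ("LINE", stripped.rstrip("\n"))  # placeholder for the real tokens
        yield ("NEWLINE", "\n")
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        yield ("DEDENT", 0)


print(list(indent_tokens(["if x:\n", "    y\n", "z\n"])))
```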
I actually built a DFA generator
using yield from?
A few weeks ago. It turned out well, and I'm going to revisit it for the lulz when this hand rolled lexer is done
Nooooooo, totally different thing
that will create a lot of frames
so how
Lemme just find the code
It isn't perfect, but it was at least mostly operational. What you end up with is a big directed graph of states, connected by transitions. Each tick of an outer loop you'd query the next character, and then move from the current state to the next accordingly
Long term, though, you'd convert the data structure to an if-else FSM in raw source code
so you just simulate a dfa bruh
_>
<_<
That depends on your definition. As far as I'm concerned, a DFA is an abstract specification which might be implemented in any number of ways
Question
About python's handling of escaped newlines
```py
def escaped_newline(self) -> str:
    # The backslash has already been consumed; accept \r\n, \r or \n.
    if self.observe(0) != '\r' and self.observe(0) != '\n':
        raise SyntaxError("backslashes must be immediately followed by newlines")
    if self.observe(0) == '\r' and self.observe(1) == '\n':
        return self.advance(2)
    if self.observe(0) == '\r':
        return self.advance(1)
    if self.observe(0) == '\n':
        return self.advance(1)
```
Does that about cover it? The backslash is already consumed by the time the function is entered. Thus: check that some kind of newline occurs after it, consume that newline, and otherwise raise an error?
Leading whitespace on the next line will just be ignored the same as any other whitespace, and there are no indentation considerations which need to be made?
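One cheap way to check a hand-rolled lexer against CPython is to feed small snippets to `compile()` and see which ones are accepted. A sketch of such probes (the `compiles` helper is hypothetical). On CPython, a backslash followed by anything other than a newline, trailing spaces included, is a SyntaxError, and the continuation line's leading whitespace does not count as indentation.

```py
def compiles(source):
    """Return True if CPython accepts `source` as a module."""
    try:
        compile(source, "<probe>", "exec")
    except SyntaxError:
        return False
    return True


probes = {
    "backslash + newline": "x = 1 + \\\n2\n",
    "continuation inside a block": "if True:\n    x = 1 + \\\n2\n",
    "backslash + other char": "x = 1 \\ + 2\n",
    "backslash + trailing space": "x = 1 + \\ \n2\n",
}
for label, source in probes.items():
    print(f"{label}: {compiles(source)}")
```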
what does observe(0) do?
@warm breach here are all of the places I have found so far that trigger undefined behavior due to the same issue as callable_iter:
```py
class A:
    def __len__(self):
        return 0
    def __getitem__(self, name):
        raise StopIteration

types = [
    ([A],),
    ([list], range(64)),
    ([bytes], 64),
    ([bytearray], 64),
    ([tuple], range(64)),
    ([lambda:(lambda:0), 0],)
]

def do(item):
    (callable, *flag), *initializer = item
    corrupt = iter(callable(*initializer), *flag)

    class Cstr:
        def __hash__(self):
            return hash('iter')
        def __eq__(self, other):
            [*corrupt]
            return other == 'iter'

    builtins = __builtins__.__dict__ if hasattr(__builtins__, '__dict__') else __builtins__
    oiter = builtins['iter']
    del builtins['iter']
    builtins[Cstr()] = oiter
    try:
        print(callable, corrupt.__reduce__())
    except Exception as e:
        print(callable, e)

for typ in types:
    do(typ)
```
I did this as well
Look at the next character
do you intend to support files with legacy Mac line endings?
nothing has used \r as a line ending for over 20 years
but my backwards compatibility! /s
yeah, Windows uses \r\n, and everything else that still exists uses \n
Gotta love that consistency
Mac OS 9 and earlier used to use \r
That aside
With respect to handling escaped newlines
I just need to make sure the backslash is followed by a newline and consume it, right? And otherwise throw an error?
seems reasonable - I can't think of anything else that can come after a \ outside of a string literal
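The handler above can be exercised in isolation with a tiny stand-in for the lexer. This sketch assumes `observe(n)` peeks `n` characters ahead and `advance(n)` consumes and returns `n` characters, as described in the thread. The `Lexer` class itself is hypothetical. The error message mirrors the one CPython reports for a stray character after a backslash.

```py
class Lexer:
    """Hypothetical minimal cursor; observe/advance mirror the method names used above."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0

    def observe(self, offset: int) -> str:
        """Peek at the character `offset` places ahead, or '' past the end of input."""
        i = self.pos + offset
        return self.source[i] if i < len(self.source) else ""

    def advance(self, n: int) -> str:
        """Consume n characters and return them."""
        text = self.source[self.pos:self.pos + n]
        self.pos += n
        return text

    def escaped_newline(self) -> str:
        """Called just after a backslash (outside a string literal) has been consumed."""
        if self.observe(0) == "\r" and self.observe(1) == "\n":
            return self.advance(2)  # Windows line ending
        if self.observe(0) in ("\r", "\n"):
            return self.advance(1)  # Unix (or legacy Mac) line ending
        raise SyntaxError("unexpected character after line continuation character")

lx = Lexer("\\\r\nx")
lx.advance(1)                    # consume the backslash
print(repr(lx.escaped_newline()))  # '\r\n'
```

Leading whitespace on the continued line is left for the normal whitespace-skipping path, which matches the assumption in the question.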
I think this is still amenable to the kind of structure I described, as long as you make some minor adjustments. Think of each potential token as a pair. One entry of the pair is a regular expression that matches when you find the token. The other entry is the sequence of tokens you emit. The point of this division is that it lets you separate the yes/no question of whether you have a match from the question of what to do if you did match. The structure is still a loop over regular expressions (or still a DFA). When you match, you look up the sequence to emit for that token and yield from it. Something like:
```py
for tok_re, new_toks in token_data:
    if tok_re.match(input_str):
        ...  # Update internal state
        yield from new_toks
        break
else:
    raise SyntaxError
```
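Here is a runnable version of that loop, offered as a sketch. The token set, the spaces-only indentation rule, and the names `tokenize` and `newline` are assumptions for illustration, and mismatched dedents are not checked. It shows how a single match (a newline) can emit several tokens through `yield from` without caching anything between iterations.

```py
import re
from typing import Iterator

Token = tuple[str, str]

def tokenize(source: str) -> Iterator[Token]:
    indents = [0]  # widths of the currently open indentation levels

    def newline(m: re.Match) -> list[Token]:
        # One match can produce several tokens: NEWLINE plus INDENT/DEDENTs.
        width = len(m.group(1))  # spaces only, for simplicity
        toks = [("NEWLINE", "\n")]
        if width > indents[-1]:
            indents.append(width)
            toks.append(("INDENT", ""))
        while width < indents[-1]:
            indents.pop()
            toks.append(("DEDENT", ""))
        return toks

    # (regex, emitter) pairs: the regex answers "does this token match here?",
    # the emitter decides which token(s) to produce for the matched text.
    table = [
        (re.compile(r"\n( *)"), newline),
        (re.compile(r"[ \t]+"), lambda m: []),  # skip inline whitespace
        (re.compile(r"\d+"), lambda m: [("NUMBER", m.group())]),
        (re.compile(r"[A-Za-z_]\w*"), lambda m: [("NAME", m.group())]),
        (re.compile(r"[:=+\-*/()]"), lambda m: [("OP", m.group())]),
    ]
    pos = 0
    while pos < len(source):
        for tok_re, emit in table:
            m = tok_re.match(source, pos)
            if m:
                pos = m.end()       # update internal state
                yield from emit(m)  # zero, one, or many tokens
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")

print(list(tokenize("if x:\n    y = 1\nz\n")))
```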
@warm breach I feel the test can go into Lib/test/test_iter.py
Objects/methodobject.c lines 176 to 184
```c
static PyObject *
meth_reduce(PyCFunctionObject *m, PyObject *Py_UNUSED(ignored))
{
    if (m->m_self == NULL || PyModule_Check(m->m_self))
        return PyUnicode_FromString(m->m_ml->ml_name);

    return Py_BuildValue("N(Os)", _PyEval_GetBuiltin(&_Py_ID(getattr)),
                         m->m_self, m->m_ml->ml_name);
}
```
it looks like builtin functions / PyCFunctionObjects might also be affected
but I couldn't reproduce with your example structure
oh I guess we'd have to mutate m_self or ml_name in eq
I wonder if you could do something even more evil where Py_BuildValue allocates a new tuple -> GC is triggered -> that mutates the object
should the scope of this be all _reduce methods with pointer access alongside _PyEval_GetBuiltin or only things that could be affected by python code?
technically in this one m_self and ml_name shouldn't be mutable
but the call order is still UB
hm now that you found the same issue in a bunch of other tests I think it makes more sense to group the tests
I currently am just running https://paste.pythondiscord.com/zakeseguca
will try to fit it into test_iter somewhere
I don't think it's UB if there's no way the _PyEval_GetBuiltin call can have a side effect on the other arguments
but I feel like it's cleaner to just always do the _PyEval_GetBuiltin call separately so it's clearer that the behavior is safe
also I think the UB is a bit of a red herring here. It doesn't really matter what order the args are evaluated in; what matters is that the _PyEval_GetBuiltin call has a side effect that invalidates the earlier if statement
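To see why the builtins lookup itself is the problem, here is a small pure-Python sketch (the `SneakyKey` class is hypothetical). It shows that a plain dict lookup by a `str` key can run arbitrary Python code. When a stored key has the same hash but is a different object, the dict calls that key's `__eq__`. Builtins lookups go through the same dict machinery, which is how the reproducer above gets to exhaust the iterator in the middle of `__reduce__`.

```py
class SneakyKey:
    """Hypothetical str stand-in: hashes like `name`, runs a callback when compared."""

    def __init__(self, name, on_compare):
        self.name = name
        self.on_compare = on_compare

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        self.on_compare()  # arbitrary side effect during a dict lookup
        return other == self.name

events = []
namespace = {SneakyKey("iter", lambda: events.append("__eq__ ran")): iter}

# Looking up the plain string "iter" finds the SneakyKey entry...
found = namespace["iter"]
# ...but only after SneakyKey.__eq__ has run.
print(found is iter, events)  # True ['__eq__ ran']
```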
```py
(A(),),
(list(range(64)),),
(bytes(64),),
(bytearray(64),),
(tuple(range(64)),),
((lambda: 0), 0),
```
all of these are definitely affected, with segfaults on 3.11
good work
well, the unbound thing is what causes the segfault sometimes as opposed to SystemError
right, but it's a bug either way
but moving it before the if fixes the SystemError too
so there was kind of 2 levels of bug here I guess
Really just one bug with two possible effects depending on argument evaluation order, I'd say
The bug being that PyEval_GetBuiltin is modifying the object in a way that violates the invariants of the running __reduce__ call.
is there some way to modify __builtins__.__dict__ for the test but not affect the other tests?
maybe we should just run your test in a subprocess
you could also restore the old builtins after the test
Monkeypatch it in a context manager's __enter__, restore it in __exit__
Or a subprocess, but that's way slower, with far more overhead...
Which adds up, especially if you're adding a bunch of similar tests
I mean...
if it fails it might segfault and crash, not sure how much __exit__ will help
we'll count segfaults as test failures
It won't, but that's not what you're restoring it for. You're restoring it for the case where the test succeeds, because the fixes are applied, and you need to put things back into a sane state for the next test to run
```py
def test_reduce_mutating_builtins_iter(self):
    # Backup of original iter
    builtins = __builtins__.__dict__ if hasattr(__builtins__, "__dict__") else __builtins__
    orig_iter = builtins["iter"]

    def run(item):
        (fn, *flag), *initializer = item
        corrupt = iter(fn(*initializer), *flag)

        class CustomStr:
            def __hash__(self):
                return hash("iter")

            def __eq__(self, other):
                list(corrupt)
                return other == "iter"

        _iter = builtins["iter"]
        del builtins["iter"]
        builtins[CustomStr()] = _iter
        return corrupt.__reduce__()

    types = [
        ([EmptyIterClass],),
        ([bytes], 8),
        ([bytearray], 8),
        ([tuple], range(8)),
        ([lambda: (lambda: 0), 0],)
    ]

    self.assertEqual(run(([str], "xyz")), (orig_iter, ("xyz",), 0))
    self.assertEqual(run(([list], range(8))), (orig_iter, ([],)))
    for case in types:
        self.assertEqual(run(case), (orig_iter, ((),)))

    # Restore original iter
    del builtins["iter"]
    builtins["iter"] = orig_iter
```
I think I'll do this?
```py
try:
    self.assertEqual(run(([str], "xyz")), (orig_iter, ("xyz",), 0))
    self.assertEqual(run(([list], range(8))), (orig_iter, ([],)))
    for case in types:
        self.assertEqual(run(case), (orig_iter, ((),)))
finally:
    # Restore original iter
    del builtins["iter"]
    builtins["iter"] = orig_iter
```
might be simpler than having a context manager
Sure. It's effectively the same thing, every context manager can be rewritten as a try/finally. But splitting it out into a context manager might let you reduce duplication and copy/pasting between tests
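As a sketch of the context-manager route (the class name `SavedBuiltins` is made up for illustration): snapshot the whole builtins dict in `__enter__` and put it back in `__exit__`. This also removes any odd keys a test inserted. It won't help if the interpreter segfaults, but it keeps a passing test from leaking a patched `iter` into the next one.

```py
import builtins


class SavedBuiltins:
    """Hypothetical test helper: snapshot builtins.__dict__, restore it on exit."""

    def __enter__(self):
        self._ns = builtins.__dict__
        self._saved = self._ns.copy()
        return self._ns

    def __exit__(self, exc_type, exc, tb):
        # Runs whether the test passed or raised, so later tests see the real builtins.
        self._ns.clear()
        self._ns.update(self._saved)
        return False  # never swallow the test's exception


with SavedBuiltins() as ns:
    ns["len"] = lambda obj: -1  # patch a builtin for the duration of the block
    print(len([1, 2, 3]))       # -1
print(len([1, 2, 3]))           # 3
```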
!e
```py
x = list[int]
it = iter(x)
print(repr(next(it)))
print(it.__reduce__())
```
@warm breach :x: Your 3.11 eval job has completed with return code 1.
001 | *list[int]
002 | Traceback (most recent call last):
003 | File "<string>", line 5, in <module>
004 | SystemError: NULL object passed to Py_BuildValue
The SystemError path is easy to reproduce, even with this simple example
hm
what should be done about genericaliasobject here? Moving the _PyEval_GetBuiltin call will still result in a SystemError
```c
static PyObject *
ga_iter_reduce(PyObject *self, PyObject *Py_UNUSED(ignored))
{
    gaiterobject *gi = (gaiterobject *)self;
    return Py_BuildValue("N(O)", _PyEval_GetBuiltin(&_Py_ID(iter)), gi->obj);
}
```
gi->obj is NULL when the iterator is exhausted
I'm guessing we need something like:
```c
PyObject *iter = _PyEval_GetBuiltin(&_Py_ID(iter));
if (gi->obj)
    return Py_BuildValue("N(O)", iter, gi->obj);
else
    return Py_BuildValue("N(())", iter);
```
kind of surprised that wasn't there before
Tomorrow is the day
That I wrap my mind around whatever the hell python does to handle leading tabs and spaces
Okay, so, walk me through this
I'm not sure I'll be able to sleep until I've given this a bit of effort
I know Python gets grumpy about mixed tabs and spaces, but I know it can at the very least handle tabs followed by spaces
And I know it does some kind of normalization, converting every tab to exactly eight spaces
Beyond that, how's it all work?
are you trying to parse python?
In the interest of simplicity, I'll just say yes
I've read the docs (the lexical analysis document, at least). Many times. And I think I understand most of it
But I do my best learning by way of Q&A
"Tab characters count as one, then round up to the nearest multiple of eight."
Wut?
as an example
3 spaces + 1 tab = 4 characters
those characters are rounded to 8 (because of the tab)
so the final amount of indentation is 8 spaces
it will do that for every tab character it encounters
thats how i interpret it
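```py
def round_to(x):
    # Round *up* to the next multiple of 8 (round() would round to the nearest one)
    return 8 * ((x + 7) // 8)

indentation = 0
while char := self.next_char():
    if char == ' ':
        indentation += 1
    elif char == '\t':
        indentation += 1
        indentation = round_to(indentation)
    else:
        break  # the first non-whitespace character ends the indentation
```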
Like this?
yeah
it keeps 2 numbers: one for total "space length" of the indent and one for keeping track of the number of spaces/tabs used in the indent
those have to always be consistent with the top of 2 stacks, one for the "space length" and the other for the number of spaces/tabs
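Here is a sketch of that two-counter, two-stack scheme, assuming a simple line-by-line view of the input. The function names and token tuples are made up for illustration. CPython's real tokenizer is written in C and handles more cases, such as form feeds and continuation lines. `col` rounds each tab up to the next multiple of 8, so 3 spaces plus a tab measures 8, matching the example above. `altcol` counts a tab as 1. An indent that is consistent under one measure but not the other is rejected with `TabError`.

```py
def measure(line: str) -> tuple[int, int]:
    """Width of line's leading whitespace, measured two ways.

    col:    tabs advance to the next multiple of 8 (how Python measures indents)
    altcol: tabs count as a single column (used only for the consistency check)
    """
    col = altcol = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col = (col // 8 + 1) * 8
        else:
            break
        altcol += 1
    return col, altcol


def indent_tokens(lines: list[str]):
    """Yield ("INDENT" | "DEDENT", lineno) pairs, keeping two parallel stacks."""
    indents, alt_indents = [0], [0]
    lineno = 0
    for lineno, line in enumerate(lines, 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank and comment-only lines don't affect indentation
        col, altcol = measure(line)
        if col == indents[-1]:
            if altcol != alt_indents[-1]:
                raise TabError(f"line {lineno}: inconsistent use of tabs and spaces")
        elif col > indents[-1]:
            if altcol <= alt_indents[-1]:
                raise TabError(f"line {lineno}: inconsistent use of tabs and spaces")
            indents.append(col)
            alt_indents.append(altcol)
            yield ("INDENT", lineno)
        else:
            while col < indents[-1]:
                indents.pop()
                alt_indents.pop()
                yield ("DEDENT", lineno)
            if col != indents[-1]:
                raise IndentationError(f"line {lineno}: unindent does not match any outer level")
            if altcol != alt_indents[-1]:
                raise TabError(f"line {lineno}: inconsistent use of tabs and spaces")
    # Close any blocks still open at the end of input.
    for _ in indents[1:]:
        yield ("DEDENT", lineno + 1)

print(measure("   \t"))  # (8, 4)
print(list(indent_tokens(["if x:\n", "    y\n", "z\n"])))  # [('INDENT', 2), ('DEDENT', 3)]
```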
Hmmmm
I might have to give this problem a little more thought, and employ another method. Apparently Python's handling of whitespace is one of the reasons it can't support multiline lambdas
really?
And that's an absolute must for me (though I intend to go with the much more attractive => syntax)
Supposedly, yeah. Something to do with switching of context between whitespace sensitivity and whitespace agnosticism
Which sounds like a job for the lazy lexer I've already got planned to handle fstrings, now that I think of it
look at nim for a language that manages to have both
Siiiiiiick. I'll do that tomorrow
it's totally possible to do multiline lambdas in Python since they're expressions
and expressions ignore indents
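Here is a small sketch of that point (the names are illustrative). Because expressions ignore indentation inside brackets, a lambda's single expression can already span several physical lines. What it can't contain is a statement, such as an assignment or a loop, which is the real limitation being debated here.

```py
log = []

on_click = lambda event: (
    log.append(f"hello {event}"),
        log.append("hi"),  # odd indentation is fine inside brackets
)

on_click("button-1")
print(log)  # ['hello button-1', 'hi']
```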
I had a feeling. I've always had the impression that the hatred for multiline lambdas is more about dogma than anything. Especially now in the era of async (and hence, callbacks as arguments), the extra flexibility is important
unless indents determine what part is in the lambda and what part isn't
a core part of python is the strict separation between statements and expressions, which multiline lambdas very fundamentally cannot be
lambdas were meant to be for convenience, and so were allowed only one expression in their body
anything else should use a def
though honestly, that whole idea is pretty simply incorrect, separating the two leads to a worse language
On account of the fact that it's an expression that contains statements?
ye
as evidenced by about every modern language except go.
I had never considered that. That said, it seems perfectly palatable in that you've got an expression with an isolated environment in it.
You guys are too smart for your own good. I'll be back tomorrow to soak up some more knowledge
XD You know what happens when you leave smart people alone too long? Programming languages.
And no good can come from such things
i'm trying to make a programming language but i made it too powerful, which meant it required too much work, which caused me to lose motivation because i'm a lazy person
i'm still waiting for the motivation to spike again
It's a design decision. It was decided that an expression can't contain a "block" structure.
other langs can do fine without that decision, and easily so if they're "expression-oriented" (i.e. everything is an expression)
That's far too big of a concept for me to grapple with as of right now
What I'll say is that I'm glad I took my time with this lexer. Everyone has been "gently encouraging" me to just slap something together and move on to "the more important things"
Which isn't exactly an unwise position. But taking pains to really consider everything has let me take a much longer, deeper view of what I want. I'd have hated to write a poor lexer, move on to the parser, and then halfway through realize I need to scrap it all and start over because I hadn't considered multiline lambdas early on
which is a shame, this won't work :sob:
```py
btn.on_click = lambda:
    print('hello')
    print('hi')
```
eww, yet another named function
```py
def btn_on_click():
    print('hello')
    print('hi')

btn.on_click = btn_on_click
```
if python multiline lambdas are a thing then it's the wrong thing for any job
a def is the "only one obvious way to do it"
despite having a name which seems to be a constant problem for a lot of programmers
if that way is suboptimal in these use cases, better have another way
if "only one obvious way" is allowed, might as well remove lambda altogether
that's what guido wants
and you agree with that unironically
lambda is pretty convenient for use cases that evaluate and return only one expression
the majority of anonymous function uses in python satisfy that requirement
a multiline lambda is pretty much too niche and its costs outweigh the benefits and frequency of use
could allow def btn.on_click(): ...
that is until you write event driven code
and that is a big enough use case
some of the costs include:
- special handling for statements in an expression
- new grammar
- more handling in the compiler
the compiler isn't so easily readable and by extension maintainable (i've experienced it myself)
^ this is better
the tokenizer is even worse and that's where all the indent/dedent handling happens
it's already very hard to explain and adding special cases for statements in expressions will just make it harder to maintain
sure
won't work in the general case:
```py
btn.add_event_handler('click', btn_on_click)
```
I'd imagine frameworks would expose an attribute for the callbacks if it was a possibility, but yes it is somewhat limited as it doesn't solve the lambdas in args
ig at that point you can just demand the gui framework to try something else more pythonic?
the convention in python for this kinda stuff is decorators
if you really wanna use a lambda here in this specific case you'll just do
```py
btn.on_click = lambda: (print('hello'), print('hi'))
```
"But how about other cases like this, that, etc?"
that's where you just use `def`
it's easier to just use `def` instead of arguing for something that doesn't make sense cost-wise
just use _ as a name
yeah, that's easiest
not being able to do assignment in lambdas has been the most annoying thing for me with event callbacks
for every python statement there is an expression equivalent
although you probably won't like the equivalent
Yes, doing setattr isn't exactly nice
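For the event-callback case, here is a sketch of the usual expression-level stand-ins for assignment statements. `state`, `counts`, and the lambdas are illustrative. `setattr`, `dict.__setitem__`, and the walrus operator (Python 3.8+) are standard.

```py
from types import SimpleNamespace

state = SimpleNamespace(clicks=0)
counts = {}

# Attribute assignment as an expression: setattr
bump = lambda: setattr(state, "clicks", state.clicks + 1)
# Item assignment as an expression: dict.__setitem__
record = lambda key, value: counts.__setitem__(key, value)
# Binding a local name inside the lambda: the walrus operator
scaled = lambda x: (y := x * 2, y + 1)[-1]

bump()
bump()
record("ok", 1)
print(state.clicks, counts, scaled(5))  # 2 {'ok': 1} 11
```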
like
```py
@btn.event_handler("click")
def _():
    ...
```
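To make the decorator convention concrete, here is a minimal sketch with a hypothetical `Button` class. Real GUI toolkits expose their own registration APIs, so `event_handler` and `fire` are assumptions. The decorator registers the function and hands it back unchanged, so naming it `_` costs nothing.

```py
from collections import defaultdict
from typing import Callable


class Button:
    """Hypothetical widget used only to illustrate decorator-based registration."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[], None]]] = defaultdict(list)

    def event_handler(self, event: str):
        """Decorator factory: register the decorated function for `event`."""
        def register(fn):
            self._handlers[event].append(fn)
            return fn
        return register

    def fire(self, event: str) -> None:
        for handler in self._handlers[event]:
            handler()


btn = Button()
clicks = []

@btn.event_handler("click")
def _():
    clicks.append("hello")
    clicks.append("hi")

btn.fire("click")
print(clicks)  # ['hello', 'hi']
```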
ye
So what exactly are the implications of PEP 649 being accepted?
I see it won't break pydantic and the like but for the future, do we think more libraries that support type hints based features are going to pop up?
all of this conversation gave me an idea
how about Ruby-inspired blocks?
```py
btn.event_handler("click") do:
    ...
```
should support all those use cases
most importantly it is cleaner-looking than the current "define a named function then use it as an arg" approach
and does not introduce named functions
people like typing
so demand for them won't go down
Less maintenance overhead and a cleaner implementation of annotations overall? Not all that much from a consumer perspective. Annotation-handling code using (good practice) typing.get_type_hints() will be unaffected, and code using eval just needs to skip that call. External tools that parse python won't be affected much, as PEP 563 support already means they handle forward references in annotations.
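Here is a sketch of the difference between raw annotations and `typing.get_type_hints()`. The forward reference is written as a string by hand, which is what `from __future__ import annotations` (PEP 563) does to every annotation automatically. The `Node` class is illustrative.

```py
import typing


class Node:
    def link(self, other: "Node") -> "Node":
        return other


# Raw annotations are just the source strings...
print(Node.link.__annotations__)  # {'other': 'Node', 'return': 'Node'}
# ...while get_type_hints() evaluates them, resolving the forward reference.
hints = typing.get_type_hints(Node.link)
print(hints["other"] is Node, hints["return"] is Node)  # True True
```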
I'd say lambdas being so limited is annoying on a fairly regular basis. Certainly, anytime you want to do something slightly more complicated in a list or dict comprehension, I'd much rather have nice lambdas.
I don't really get the whole python thing of "multi line lambdas are niche".
In other languages that have both good lambdas and nested functions, you still see multiline lambdas used a lot.
there's nothing massively different about python that makes that not the case. Just folks looking at a python downside and justifying it, rather than simply admitting that it's a downside.
I wouldn't say unaffected: some code currently writes to the __annotations__ string dict expecting get_type_hints to be affected, and the proposed new future co_annotations import will mean existing code importing future annotations will stop evaluating forward references and likely break when annotations is deprecated (which, we have never deprecated a future import)
cough cough more complicated parsing
sure, that's a downside for implementers though, not users
a valid reason not to have multi-line lambdas
but not a valid reason not to admit that it's a downside
Is there any plan for an alternative, forced statically typed mode for 3.12 or 3.13 (where I believe a JIT will be added)?
no i think
And will the annotations ever be used in the JIT compiler?
Ahh okok
the only way is to use external typechecking tools
Yeah I am aware of this
I am just wondering if python will ever use the annotations while jitting code
python annotation system is pretty inconvenient tbh
Why ?
I wish they overhauled it a bit
So that we would be able to add type qualifiers and type modifiers
But it doesn't make sense if they never use it in jit
can this be fixed with the same syntax?
yes
pep 649
you mean the types need to be defined before we use it ? (you cant annotate a method of a class with the class itself, it needs to be a string or requires the import statement)
?
ahh okok I was right then ^
quick question @feral island , are we allowed to use functools.partial in test_iter?
sure, why not?
not sure
thought it was supposed to not depend on anything or
hm...
```
NameError: name 'reversed' is not defined
Warning -- Unraisable exception
Exception ignored in: <module 'threading' from '/home/ionite/repos/C/cpython/Lib/threading.py'>
Traceback (most recent call last):
  File "/home/ionite/repos/C/cpython/Lib/threading.py", line 1571, in _shutdown
    for atexit_call in reversed(_threading_atexits):
                       ^^^^^^^^
NameError: name 'reversed' is not defined
```
did you not put it properly back in builtins?
ah it was a separate test failure
apparently format exception happens before finally and fails there due to the patches
wait no it doesn't, I just had a failed del in finally, will fix
also currently empty reversed iter reduce results in iter instead of reversed, is that intentional?
Hey smart people
Question about lexing, parsing, and context sensitivity. I just read this little article here: http://trevorjim.com/python-is-not-context-free/
I think it's fine since either way it's an empty iterable
or iterator rather
It makes the obvious claim that lexing and parsing Python (and I assume many other languages) is not context free. A lexer shouldn't (puritanically speaking) store any internal state except its current position in the input stream. In practice, who cares. Keeping a stack representing indentation/parentheses is a value add, without any real drawbacks™
But he does raise the point that "we're, technically, using the wrong tools for the job". Alternatively, it might be that we're doing the job inside out in some way. This raises the question: what's the alternative?
so essentially
```py
>>> it = iter(reversed([]))
>>>
>>> try:
...     next(it)
... except StopIteration:
...     pass
...
>>> it.__reduce__()
(<built-in function iter>, ([],))
```
Hm. That would unpickle to an object of a different type. That seems not great...
Might be worth filing a bug report for that, too...
I feel like the exact iterator type is an implementation detail. You pickle an empty iterator, you get an empty iterator back
You don't think it's reasonable to expect this to hold?
```py
assert type(it) == type(pickle.loads(pickle.dumps(it)))
```
That's reasonable, but until the type discrepancy causes a real-world issue I'm not sure it's worth changing. In some cases it may not be practical to create an empty iterator of the same type.
True, though in this case it is
I'm not sure it's worth fixing, but it's probably worth reporting so at least there's a record of it in Google and some documentation about why it wasn't worth fixing
Assuming it hasn't been reported already
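Here is a quick way to check the round-trip behavior being discussed (`roundtrip` is just a local helper). The non-exhausted list reverse iterator comes back as the same type with its position preserved. For the exhausted one, the clone's type depends on how that CPython version's `__reduce__` handles the empty case, so the sketch prints the type rather than assuming it.

```py
import pickle

def roundtrip(it):
    """Pickle and unpickle an iterator, returning the clone."""
    return pickle.loads(pickle.dumps(it))

fresh = reversed([1, 2, 3])
next(fresh)  # consume the 3
clone = roundtrip(fresh)
print(type(fresh).__name__, type(clone).__name__, list(clone))
# list_reverseiterator list_reverseiterator [2, 1]

exhausted = reversed([])
list(exhausted)  # exhaust it
clone = roundtrip(exhausted)
print(type(exhausted).__name__, type(clone).__name__, list(clone))
```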
not that it's changed by this PR, it's still like that now
just that something even more dangerous happens when builtins dict access mutates reversed
Right, that's why it warrants a totally separate bug report
is there something you can do with a reversed list iter and not with iter?
actually there is technically one behavior change for generic alias iter
it now also returns a plain iter on __reduce__ of an exhausted one
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
(<built-in function iter>, (tuple[()],))
Looks like it always returns a plain iterator, exhausted or not
That doesn't seem changed, either.
!e
```py
it = iter(reversed(""))
list(it)
print(it.__reduce__())
```
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
(<class 'reversed'>, ((),))
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
(<built-in function iter>, ([],))
yeah it's only for lists
reversed() on a list gives a different type than general reversed() I believe
Strange. That seems very odd
Ah. Ok, that makes sense
Well, then - this is the existing behavior for pickling an exhausted reverse list iterator, right?
```c
static PyObject *
reversed_reduce(reversedobject *ro, PyObject *Py_UNUSED(ignored))
{
    if (ro->seq)
        return Py_BuildValue("O(O)n", Py_TYPE(ro), ro->seq, ro->index);
    else
        return Py_BuildValue("O(())", Py_TYPE(ro));
}
```
seems reversed reduce calls Py_TYPE instead of builtins dict access
we care about the reduce for iter(reversed) though, right?
Regardless, calling that at a different point (before the Py_BuildValue call) doesn't result in a behavior change
iter(reversed(...)) returns itself it seems
for anything besides list
I think the one-obvious-way doesn't really work in practice...
It doesn't matter that it's already kind of weird, we can fix the bug without worrying about its other weirdness
incidentally, why is reversed() in enumobject.c of all places?
Because
- Languages and language features evolve over time. What's obvious today is not obvious tomorrow. That's why we have 4 ways of formatting strings, not 1.
- That kind of assumes the language creator has thought about all possible use cases of all the users.
It's pretty clear from modern usage that sometimes people prefer lambdas over def'd functions.
You could go all the way in on the one-obvious-way philosophy and remove lambda. But that would make a lot of existing code extremely verbose, with function names that don't add any meaning
It's okay to offer options, and it's true that there is such a thing as too many options.
Good points all around
At the end of the day, though
Multiline lambdas are nice. They're useful, they're pretty, and people want them
I want them, lots of other people want them
Purity be damned, that's what I'm going to give them
As an aside, we don't have four ways of formatting strings. We have two ways: f-strings and .format(). We also have two atrophied, wholly intolerable vestigial means of formatting that don't bear thinking about
string concatenation will be forever in my heart
also, %-formatting is used in logging, though the utility is questionable
I think #4 was meant to be string.Template?
replace my 4 with 5 then
Five XD
and then add number 6 standing for all the templating engines...
I do use string concatenation occasionally. Suppose you want to implement the repr for a tuple. I would do it like this:
```py
def __repr__(self) -> str:
    return "(" + ", ".join(map(repr, self)) + ")"
```
(let's just ignore recursive repr handling...)
Hehe, I'm not sure I'd count straight-up concatenation as a method of formatting
But yeah, sometimes it's the only tool for the job
%-formatting is useful for creating binary strings. I also used it recently for generating TypeScript code so I wouldn't have to keep writing {{
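Here is a short illustration of both uses. The header and the TypeScript snippet are made-up examples. `bytes` has no `.format()` method but does support `%` interpolation (PEP 461). When generating brace-heavy code, `%` leaves `{` and `}` alone, whereas an f-string needs every literal brace doubled.

```py
# bytes support %-interpolation, but there is no bytes.format()
header = b"%s: %d\r\n" % (b"Content-Length", 42)

# Code generation: braces stay literal with %, no {{ }} escaping needed
name, value = "answer", 42
with_percent = "export function %s(): number { return %d; }" % (name, value)
with_fstring = f"export function {name}(): number {{ return {value}; }}"

print(header)                       # b'Content-Length: 42\r\n'
print(with_percent == with_fstring)  # True
```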
These are the alternatives, it seems:
```py
def __repr__(self) -> str:
    return "({0})".format(", ".join(map(repr, self)))

def __repr__(self) -> str:
    return "(%s)" % (", ".join(map(repr, self)),)

def __repr__(self) -> str:
    return f"({', '.join(map(repr, self))})"  # yuck

def __repr__(self) -> str:
    amogus = ", ".join(map(repr, self))
    return f"({amogus})"
```
Yeah I've definitely seen it used with code generation when you have {}s
Oh, there's also this:
```py
def __repr__(self) -> str:
    return "".join(["(", ", ".join(map(repr, self)), ")"])
```
Seems like the counter is 7 now
it's too bad that logging uses %, at least by default
ideally, what you really want is to pass logging functions lambdas that return a string, rather than actual strings. then the logging framework decides whether to evaluate them.
Then you could just use f-strings for logging and still be perfectly efficient
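Here is a sketch of the "pass a lambda" idea using only the existing logging API. logging stores whatever object you pass as the message and calls `str()` on it only when a handler actually formats the record. That means a small wrapper can defer an f-string until it is needed. The `Lazy` class is hypothetical and not part of logging. The built-in alternative, `logger.debug("value = %s", value)`, defers only the `%` interpolation, not the computation of `value`.

```py
import io
import logging


class Lazy:
    """Hypothetical helper: wraps a zero-argument callable that builds the message."""

    def __init__(self, build):
        self.build = build

    def __str__(self):
        # LogRecord.getMessage() calls str(record.msg) only when a handler formats it.
        return self.build()


calls = []

def expensive() -> str:
    calls.append("called")
    return "expensive"


stream = io.StringIO()
logger = logging.getLogger("lazy_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.INFO)

logger.debug(Lazy(lambda: f"value = {expensive()}"))  # DEBUG disabled: lambda never runs
logger.info(Lazy(lambda: f"value = {expensive()}"))   # INFO enabled: built once, then written
print(calls, repr(stream.getvalue()))  # ['called'] 'value = expensive\n'
```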
lambdas again
also lambda is such an awkward keyword
assuming that the cost of creating a lambda doesn't exceed the cost of creating an f-string
especially for a language that tries to emphasize readability, in theory

