#internals-and-peps
You want to make sure to relock int afterwards, otherwise you could hit code paths that will cause a segfault
hm
was wondering if I could just do what this does in python to bypass the Py_TPFLAGS_IMMUTABLETYPE check
https://github.com/python/cpython/blob/main/Objects/typeobject.c#L4333
Objects/typeobject.c line 4333
```c
type_setattro(PyTypeObject *type, PyObject *name, PyObject *value)
```
but looks like it calls a lot of internal methods
The first version of fishhook used to reimplement type_setattro in python
https://github.com/chilaxan/fishhook/blob/2859cef28ece63aa31d4a6456bd9610a7bed03fa/fishhook/__init__.py as you can see it was way more complex
(Yes it did sort through every method on every object subclass to calculate offsets for dunders)
I initially just tried doing
from einspect.views import MappingProxyView
MappingProxyView(int.__dict__).mapping["__getitem__"] = lambda self, index: "abc"
and the direct dunder call works
(5).__getitem__(50)
>> abc
but subscripting still wouldn't
5[1]
>> TypeError: 'int' object is not subscriptable
Oh, that also would not update the method cache, so you would run the risk of it vectorcalling the wrong address. You need to call PyType_Modified
What is PyType_Modified doing? Does it check the whole class dictionary, find the dunders and update all the slots?
It recursively invalidates the type cache
And more recently notifies any watchers of a given type that it has changed
so um, what is this thing doing 
https://github.com/chilaxan/fishhook/blob/master/fishhook/fishhook.py#L101-L107
fishhook/fishhook.py lines 101 to 107
```py
def patch_object():
    '''
    adds fake class to inheritance chain so that object can be modified
    also patches type.__base__ to never return fake class
    in theory is safe, if not, possible alternative would be injecting a class
    into all lookups by modifying type.__bases__?
    '''
```
would patching inherited classes not work without this?
Patching inherited classes would be fine, this is needed to allow for hook(object)
To see why, comment out patch_object, then hook(object) some dunder, then make a new class
In some versions it triggers a segfault due to assumptions made about object
like class Foo(int) or such?
tbh i cant remember off the top of my head let me look at my notes
was wondering if that had anything to do with how you backup the original slot function
since after unlocking and setattr the cfunc is literally overwritten right?
oh no, that's done separately. I back it up using orig; python has the wiring to grab a given func pointer out of a slot_function if it is set into a class
@warm breach without patch_object() this happens
with patch_object() classes can be initialized properly
hm 
I can replicate that with fishhook but not with my thing somehow
its a side effect of me allocating structs, the crash happens inside of PyType_Ready
ah, wait which ones do you allocate?
all 4 of them
isn't there only these?
PyAsyncMethods as_async;
ah hm
does tp_as_buffer matter?
does cpython list anywhere which one of these may be null 
not for my code, you cannot change it with python functions, tp_as_buffer is only for classes defined in C
nope
this is where the crash happens if you hook(object) without patch_object()
for now, PEP 688 will change that
good to know, ill have to special case that in fishhook
@warm breach the reason the crash happens is that type is the new type (which is a heap class, so it has all of the tp_as_* structs), and base is object (which now has tp_as_number due to our hook). line 6265 grabs object->tp_base, which is NULL (without patch_object), and then accesses tp_as_number on that NULL pointer
so patch_object avoids that by adding a type to the chain before object that does not have any tp_as_* structs
makes sense, thanks
yea, that was actually a big bug in the first version of fishhook for a while
Wait is PEP 688 accepted?
I was under the impression that the buffer protocol would never be accessible from the python layer because it involves interacting directly with PyBuffer objects
Or at least not accessible directly
read PEP 688
do you have a fork with it implemented?
also @pliant tusk your code helped defeat a CPython optimization https://github.com/python/cpython/pull/100818 (check the grep.app link)
whoops lol, the irony is that file is hacky as hell and should not be used as an example of code in the wild, it implements brainfuck using list slicing
https://github.com/JelleZijlstra/cpython/blob/pep688v2/Objects/typeobject.c#L8570-L8576 so you wrap the returned memoryview? (also I would probably make a point that __buffer__ should support returning anything that follows the buffer protocol, not just memoryview)
Objects/typeobject.c lines 8570 to 8576
```c
wrapper = PyObject_GC_New(PyBufferWrapper, &_PyBufferWrapper_Type);
if (wrapper == NULL) {
    goto fail;
}
((PyBufferWrapper *)wrapper)->mv = ret;
((PyBufferWrapper *)wrapper)->obj = Py_NewRef(self);
_PyObject_GC_TRACK(wrapper);
```
yeah these contortions were necessary to prevent various bad scenarios and to enable __release_buffer__ to work correctly
yea I was reading the first version of that function and thinking about bad scenarios
what file is PyBufferWrapper implemented in?
typeobject.c
(can't search forks, annoying github)
it's a very simple type
Objects/typeobject.c lines 8488 to 8492
```c
typedef struct _PyBufferWrapper {
    PyObject_HEAD
    PyObject *mv;
    PyObject *obj;
} PyBufferWrapper;
```
btw how do you drop into the (clion I think?) debugger from python code like that?
you need to compile python locally (git clone ... && ./configure && make -j4) then you run it outside of clion. then you can open clion in the python git repo and use attach debugger to process
im sure there is a better way than using attach but it has worked for my purposes pretty well
huh..
I guess it'd have to be a REPL or something?
or it wouldn't really last long enough to attach to?
or you can stick a wait loop somewhere in your code
if im testing something that I don't want to be running in the repl then i just add a input() call at the top of the file
like i said, not a perfect solution, but it works
(afaik, you can also attach clion to an already running gdb/lldb session, but i haven't done that)
This looks pretty complete, it'll be interesting to see if it is accepted
o I have
what's the func_type mode in compile() for?
does anyone know why i cant speak in the help section
it looks like its for parsing this syntax for type hints (argtype, ...) -> return_type into an ast.FunctionType? I have no idea why
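For what it's worth, the ast docs describe func_type as the mode for PEP 484 "signature type comments" (the `# type: (int, str) -> bool` form). A minimal sketch of what it produces, going through ast.parse (which forwards the mode to compile()); the signature string here is only an illustrative input:

```python
import ast

# Parse the body of a PEP 484 signature type comment.
tree = ast.parse("(int, str) -> bool", mode="func_type")

# The result is an ast.FunctionType node with `argtypes` and `returns`.
arg_types = [ast.unparse(node) for node in tree.argtypes]
return_type = ast.unparse(tree.returns)

print(type(tree).__name__, arg_types, return_type)
```

So the mode is really only useful to tools that read type comments, such as type checkers.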
when you use annotations with from __future__ import annotations, annotations are parsed and then unparsed and stored as strings:
```pycon
>>> from __future__ import annotations
>>> def f() -> a+b: ...
...
>>> f.__annotations__
{'return': 'a + b'}
>>> def f() -> 1+2: ...
...
>>> f.__annotations__
{'return': '1 + 2'}
```
This is implemented in C; there is a restricted AST unparser somewhere that is doing that.
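The same stringification can be reproduced outside the REPL without putting the future import at the top of a file. A rough sketch, assuming the `compiler_flag` from the `__future__` module is passed to compile(); the names `a` and `b` are deliberately undefined to show the annotation is never evaluated:

```python
import __future__

source = "def f() -> a + b: ...\n"

# Compile as if `from __future__ import annotations` were in effect.
flags = __future__.annotations.compiler_flag
code = compile(source, "<example>", "exec", flags=flags)

namespace = {}
exec(code, namespace)

# The annotation was unparsed back to source text instead of being evaluated.
annotations = namespace["f"].__annotations__
print(annotations)
```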
discord updated colors?
it looked like this
yes but it just gets converted into an annexpr_ty in Python/compile.c then this is done https://github.com/python/cpython/blob/main/Python/compile.c#L2386-L2392
Python/compile.c lines 2386 to 2392
```c
static int
compiler_visit_annexpr(struct compiler *c, expr_ty annotation)
{
    location loc = LOC(annotation);
    ADDOP_LOAD_CONST_NEW(c, loc, _PyAST_ExprAsUnicode(annotation));
    return SUCCESS;
}
```
there is literally no use for func_type
is setting attributes not currently supported by fishhook?
from fishhook import hook
@hook.property(int)
def __dict__(self):
return {"a": 1, "b": 2}
>> AttributeError: attribute '__dict__' of 'type' objects is not writable
for type objects
it isn't even supported by normal classes
```pycon
>>> class A:pass
...
>>> A.__dict__ = {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute '__dict__' of 'type' objects is not writable
```
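To spell out what the supported surface looks like: a class's `__dict__` is a read-only mappingproxy, and neither rebinding it nor writing through it works. A small sketch on an ordinary class (`A` is just a throwaway name):

```python
class A:
    pass

# Writing through the mappingproxy is rejected.
try:
    A.__dict__["x"] = 1
except TypeError:
    proxy_write = "TypeError"
else:
    proxy_write = "allowed"

# Rebinding __dict__ on the class is rejected as well.
try:
    A.__dict__ = {}
except AttributeError:
    rebind = "AttributeError"
else:
    rebind = "allowed"

# The supported route is setattr, which also keeps slots and caches in sync.
setattr(A, "x", 1)
print(proxy_write, rebind, A.__dict__["x"])
```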
from einspect.types import ptr
from einspect.structs import PyTypeObject, PyObject
obj = PyTypeObject.from_object(list)
d = {"x": 1, "__str__": 2}
obj.tp_dict = ptr(PyObject.from_object(d))
print(list.__dict__)
>> {'x': 1, '__str__': 2}
print(list.x, list.__str__)
>> 1 2
how about one with a property
hm?
like assign a property to it
well, you can but
print(list.__dict__)
^^^^^^^^^^^^^
TypeError: mappingproxy() argument must be a mapping, not property
the internal call to generate a mappingproxy seems to fail
obj.tp_dict is a *PyObject, so I'm not sure if you can ever put a property there
or you can but it wouldn't actually resolve the property's normal descriptor protocols
this will be an interesting thing to implement
or like try to do
nvm it's easy
to override the .__dict__() descriptor at least
hm
?
it's a property no? .__dict__ where's the call from
>>> import ctypes, fishhook as fh
>>> ctypes.pythonapi.PyType_Modified.argtypes = [ctypes.py_object]
>>> fh.unlock(type)
2148031744
>>> (type.__dict__ | type('',(),{'__ror__':lambda _,o:o})())['__dict__'] = property(lambda _: {"x": 1, "__str__": 2})
>>> ctypes.pythonapi.PyType_Modified(ctypes.py_object(type))
>>> type.__dict__
{'x': 1, '__str__': 2}
yeah but it seems this doesn't affect the actual lookup
print(type.__dict__)
>> {'x': 1, '__str__': 2}
print(type.x)
>> AttributeError: type object 'type' has no attribute 'x'
why does for k in a_dict loop over keys instead of (key, value)
einspect not updated with PyTypeObject
yeah is not in the current one 
i'm trying to make something that'll work with the property
mainly since I'm still going through all the PyTypeObject fields
there's like a million
how to find the offset of tp_dict
@struct
class PyTypeObject(PyVarObject[_T, None, None]):
tp_name: c_char_p
tp_basicsize: int
tp_itemsize: int
tp_dealloc: CFUNCTYPE(None, POINTER(PyObject))
tp_vectorcall_offset: int
tp_getattr: ptr[c_void_p] # TODO
tp_setattr: ptr[c_void_p] # TODO
tp_as_async: ptr[c_void_p] # TODO
tp_repr: CFUNCTYPE(POINTER(PyObject), POINTER(PyObject))
# Method suites for standard classes
tp_as_number: ptr[c_void_p] # TODO
tp_as_sequence: ptr[PySequenceMethods]
tp_as_mapping: ptr[PyMappingMethods]
# More standard operations (here for binary compatibility)
tp_hash: c_void_p
tp_call: c_void_p
tp_str: c_void_p
tp_getattro: c_void_p
tp_setattro: c_void_p
tp_as_buffer: c_void_p
tp_flags: Annotated[int, c_ulong]
tp_doc: c_char_p # Documentation string
tp_traverse: c_void_p # TODO
tp_clear: c_void_p # TODO: delete references to contained objects
tp_richcompare: c_void_p # TODO
tp_weaklistoffset: int # weak reference enabler
tp_iter: c_void_p # TODO
tp_iternext: c_void_p # TODO
tp_methods: c_void_p # TODO
tp_members: c_void_p # TODO
tp_getset: c_void_p # TODO
tp_base: pointer[Self]
tp_dict: ptr[PyObject]
tp_descr_get: c_void_p # TODO
tp_descr_set: c_void_p # TODO
there's still more attributes
well I ran print(PyTypeObject.tp_dict.offset) and it's 264 apparently
but otherwise I guess just count through source
Yea, fishhook can't modify those properties safely, so it doesn't work through hook.property. But you can use find_offset and a class's memory to find the offset for tp_dict, then overwrite it. It's just not safe
btw I was meaning to ask, where exactly does the set attributes go after unlocking with fishhook and setattr? It doesn't actually overwrite the tp_... slots right?
!e
```py
from fishhook import find_offset, getmem, pythonapi, py_object, getdict
n = {'a': 1}
lmem = getmem(list)
lmem[find_offset(lmem, id(getdict(list)))] = id(n)
pythonapi.PyType_Modified(py_object(list))
print(list.__dict__)
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
{'a': 1}
The py_object is placed in class.tp_dict, and python automatically places the generic handler function in the slot that calls the py_object
so it does override the original c function pointer there?
but how do you get the original function then 
Look at the fishhook code (specifically stuff regarding add_cache and get_cache)
not sure if I fully understand but it seems you make a backup of the slot wrapper and not the original c function directly?
Yea
slot_wrappers hold a reference to the C function directly
So they continue to function even if you change what's in the slot
I was just trying to make a memory copy of, say, tp_repr somewhere and calling a casted call to there to get the original function
but it seems the setattr also affects my copied one somehow 
Show your code?
from ctypes import *
from einspect.structs.py_type import PyTypeObject, reprfunc
st = PyTypeObject.from_object(int)
buf = create_string_buffer(sizeof(st))
memmove(buf, st.tp_repr, sizeof(st.tp_repr))
orig_tp_repr = reprfunc.from_buffer(buf)
@reprfunc
def __repr__(self):
return "hi"
st.tp_repr = __repr__
print(10)
>> hi
but if I do
print(orig_tp_repr(10))
> Process finished with exit code 138 (interrupted by signal 10: SIGBUS)
That makes sense you're trying to call memory that is not executable
Also, if I'm not mistaken, you're only copying the size of a function pointer?
You should be able to just store the original tp_repr, and then call it later
reprfunc = PYFUNCTYPE(py_object, py_object)
Like just orig = st.tp_repr
from einspect.structs.py_type import PyTypeObject, reprfunc
st = PyTypeObject.from_object(int)
orig = st.tp_repr
print(orig)
>> <CFunctionType object at 0x10091e150>
@reprfunc
def __repr__(self):
return "hi"
st.tp_repr = __repr__
print(orig)
>> <CFunctionType object at 0x10091e150>
print(orig(10))
>> hi
it seems to be affected by the set somehow
PyTypeObject is a ctypes.Structure with the field ("tp_repr", PYFUNCTYPE(py_object, py_object))
Oh, that's weird. It doesn't store the function address, it stores the address of that address
Yea
I tried doing POINTER(PYFUNCTYPE(...)) but that immediately segfaults so
Yea that would break
I would need to see more of your code to be able to help fix this. I don't know how ctypes and your code is interacting.
hm
I would try to get something reproducible but PyTypeObject is really long
can I not define some fields and just put an offset? Is that a thing
I have no idea, but modifying a structure should not be changing things that you've already stored off of the structure
!e
from ctypes import *
# typedef PyObject *(*reprfunc)(PyObject *)
reprfunc = PYFUNCTYPE(py_object, py_object)
class PyTypeObject(Structure):
# Needs offset +88
_fields_ = [
("tp_repr", reprfunc)
]
st = PyTypeObject.from_address(id(int)+88)
orig = st.tp_repr
@reprfunc
def __repr__(self):
return "hi"
st.tp_repr = __repr__
print(repr(40))
print(orig(40))
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | hi
002 | hi
Yeah, that's weird
!e
from ctypes import *
reprfunc = PYFUNCTYPE(py_object, py_object)
class PyTypeObject(Structure):
# Needs offset +88
_fields_ = [
("tp_repr", reprfunc)
]
st = PyTypeObject.from_address(id(int)+88)
orig = st.tp_repr
@reprfunc
def __repr__(self):
return "hi"
st.tp_repr = __repr__
print(st.tp_repr)
print(orig)
print(PyTypeObject.from_address(id(int)+88).tp_repr)
print(st.tp_repr is orig)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | <CFunctionType object at 0x7ff57cf10870>
002 | <CFunctionType object at 0x7ff57cf106d0>
003 | <CFunctionType object at 0x7ff57cf10870>
004 | False
!e seems to be something on the Structure casting for PYFUNCTYPE
from ctypes import *
reprfunc = PYFUNCTYPE(py_object, py_object)
class PyTypeObject(Structure):
# Needs offset +88
_fields_ = [
("tp_repr", c_void_p)
]
st = PyTypeObject.from_address(id(int)+88)
orig = cast(st.tp_repr, reprfunc)
@reprfunc
def __repr__(self):
return "hi"
st.tp_repr = cast(__repr__, c_void_p)
print(repr(40))
print(orig(40))
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | hi
002 | 40
if manually casting, the orig is unaffected
Yeah, that makes sense, but the other way of doing it should've worked as far as I can tell
Are slots orthogonal to dunder methods? I'm inspecting some C code mypyc generated very carefully (I don't trust my patch :P) and I was surprised it generated two wrappers (one slot, one method) for the underlying dunder functions. I thought if a dunder method wasn't defined, then the C-API / CPython runtime would fallback to calling the slot functions.
I'm a little confused about the difference between slots and methods (for dunder methods specifically).
I suppose the slots are used by the C-API while the (dunder) methods are used by the Python runtime via a PyObject_CallMethod(obj, "__dunder__", NULL) call (or an equivalent)...
so on classes implemented in python a given slot function will parse some arguments, then use PyObject_CallMethod or something similar to call the dunder in the class dictionary. on classes written in C that indirection can be skipped and the slot function can directly do whatever it needs to do
when you create a dunder, the slot is automatically populated and vice versa
not sure how mypyc handles this, but in CPython there are wrappers both ways (slot -> Python dunder and Python dunder -> slot)
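A pure-Python way to see both directions of that wiring on CPython. This is only a sketch with a made-up `Box` class, not anything mypyc-specific:

```python
class Box:
    def __len__(self):
        return 3

box = Box()

# len() goes through the type's slot, so an instance attribute is ignored.
box.__len__ = lambda: 99
via_slot = len(box)        # uses Box.__len__ through the slot function
via_attr = box.__len__()   # plain attribute lookup finds the instance attribute

# Assigning the dunder on the class re-populates the slot.
Box.__len__ = lambda self: 7
after_patch = len(box)

# For a C type, the Python-visible dunder is a wrapper around the slot.
wrapper_kind = type(int.__abs__).__name__

print(via_slot, via_attr, after_patch, wrapper_kind)
```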
slot function
static PyObject *CPyDunder___abs__Duck(PyObject *obj_self) {
PyObject *arg_self;
if (likely(Py_TYPE(obj_self) == CPyType_Duck))
arg_self = obj_self;
else {
CPy_TypeError("unary.Duck", obj_self);
return NULL;
}
return CPyDef_Duck_____abs__(arg_self);
}
method function
PyObject *CPyPy_Duck_____abs__(PyObject *self, PyObject *const *args, size_t nargs, PyObject *kwnames) {
PyObject *obj_self = self;
static const char * const kwlist[] = {0};
static CPyArg_Parser parser = {":__abs__", kwlist, 0};
if (!CPyArg_ParseStackAndKeywordsNoArgs(args, nargs, kwnames, &parser)) {
return NULL;
}
PyObject *arg_self;
if (likely(Py_TYPE(obj_self) == CPyType_Duck))
arg_self = obj_self;
else {
CPy_TypeError("unary.Duck", obj_self);
goto fail;
}
PyObject *retval = CPyDef_Duck_____abs__(arg_self);
return retval;
fail: ;
CPy_AddTraceback("unary.py", "__abs__", 11, CPyStatic_globals);
return NULL;
}
actual dunder logic
PyObject *CPyDef_Duck_____abs__(PyObject *cpy_r_self) {
PyObject *cpy_r_r0;
PyObject *cpy_r_r1;
PyObject *cpy_r_r2;
CPyL0: ;
cpy_r_r0 = ((unary___DuckObject *)cpy_r_self)->_value;
CPy_INCREF(cpy_r_r0);
cpy_r_r1 = PyNumber_Absolute(cpy_r_r0);
CPy_DECREF(cpy_r_r0);
if (unlikely(cpy_r_r1 == NULL)) {
CPy_AddTraceback("unary.py", "__abs__", 12, CPyStatic_globals);
goto CPyL2;
}
CPyL1: ;
return cpy_r_r1;
CPyL2: ;
cpy_r_r2 = NULL;
return cpy_r_r2;
}
For instance, this is how mypyc implements __abs__.
so on classes implemented in python a given slot function will parse some arguments [...]
This slot function is autogenerated right?
on classes written in C that indirection can be skipped [...]
Makes sense. So when PyNumber_Absolute() is called on this object, it'll always use the slot function?
maybe I should go read the C-API source code, that'll clarify a lot of the confusion probably
it is not autogenerated, it is compiled at compile time, the same c slot function is used for all of the same given dunder on all python classes (like one for all __abs__ implemented in python)
right, I'm dumb.
nah you're all good, slot functions and how they work is a weird part of python internals
ah the C-API only looks at slots, or at least PyNumber_Absolute https://github.com/python/cpython/blob/bc0a686f820d7d298a0b1450b155a717972de0fc/Objects/abstract.c#L1366-L1377
Objects/abstract.c lines 1366 to 1377
```c
PyNumber_Absolute(PyObject *o)
{
    if (o == NULL) {
        return null_error();
    }
    PyNumberMethods *m = Py_TYPE(o)->tp_as_number;
    if (m && m->nb_absolute) {
        PyObject *res = m->nb_absolute(o);
        assert(_Py_CheckSlotResult(o, "__abs__", res != NULL));
        return res;
    }
```
So in a loose sense, slots are the C level interface while methods are the Python level interface?
exactly
cool, doesn't help that slots (the term) is also used on the Python side sometimes
oh yea, makes it a bit more confusing
__slots__ on the python side refers to declaring a fixed set of members on python classes, to remove the memory footprint of needing a dict on every instance
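A quick illustration of the Python-side meaning, using a hypothetical `Point` class:

```python
class Point:
    # Fixed set of attributes; instances get no per-instance __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
has_dict = hasattr(p, "__dict__")

# Attributes outside __slots__ cannot be added.
try:
    p.z = 3
except AttributeError:
    rejected = True
else:
    rejected = False

print(has_dict, rejected)
```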
Thanks (and thanks @feral island too) for explaining! This makes a lot more sense now.
I think my confusion stems from when I read this issue: mypyc/mypyc#839
The part that I was missing when I first read this issue is that I forgot len() is implemented in C so it'd use the length slot. I was translating len() to obj.__len__() in my head, so I assumed Python code can access the slots directly (this makes no sense having written it down, oh well) ... because how else would mypyc messing up the slot cause len() to fail?
mypyc seems to violate the assumption that cpython makes that slots will always be defined if the dunder exists
Yeah that's the bug that issue describes :)
Sooo in theory mypyc could eliminate the slot functions OR methods and things would still work as CPython would create the necessary wrappers to fill whatever is missing?
cpython only does that for classes defined in python, so mypyc will have to manage their slot functions and the methods
Ah okay, that makes sense. Thank you!
what does it mean that the tp_getattro slot defines both __getattribute__ and __getattr__?
__add__ and __radd__ also are defined in one slot, how does that work 
On the C level, only __getattribute__ exists - the default __getattribute__ calls __getattr__. For the binary methods, the r versions aren't present in slots - instead the single pointer is used in both cases, and your code has to check the types of both parameters.
The other one with different behaviour is mp_ass_subscript/sq_ass_item - NULL is passed as the value if it's being deleted.
Objects/typeobject.c lines 8173 to 8182
```c
/* There are two slot dispatch functions for tp_getattro.
   - _Py_slot_tp_getattro() is used when __getattribute__ is overridden
     but no __getattr__ hook is present;
   - _Py_slot_tp_getattr_hook() is used when a __getattr__ hook is present.
   The code in update_one_slot() always installs _Py_slot_tp_getattr_hook();
   this detects the absence of __getattr__ and then installs the simpler
   slot if necessary. */
```
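Seen from Python, the two behaviours described above look roughly like this. It is a sketch with made-up classes (`Fallback`, `Meters`); it shows the observable behaviour only, not the C dispatch itself:

```python
class Fallback:
    present = "found normally"

    def __getattr__(self, name):
        # Only reached when the normal lookup raises AttributeError.
        return f"fallback for {name}"

obj = Fallback()
normal = obj.present
missing = obj.absent

class Meters:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        # Called after int.__add__ returns NotImplemented for `5 + Meters(2)`.
        return Meters(other + self.value)

total = 5 + Meters(2)
print(normal, missing, total.value)
```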
mp_ass_subscript
Assign, yes.
!e
from fishhook import hook, orig
@hook(int)
def __hash__(self):
return orig(self)
print(hash(100))
@warm breach :x: Your 3.11 eval job has completed with return code 139 (SIGSEGV).
100
any idea why this segfaults
@pliant tusk
that's weird
it doesn't output anything when i do py -X dev
```
C:\Users\rog>cd C:\Program Files\Python311\Lib\site-packages\fishhook
C:\Program Files\Python311\Lib\site-packages\fishhook>cd ..
C:\Program Files\Python311\Lib\site-packages>C:\Users\user\cpython-main\PCbuild\amd64\python_d.exe -X dev
Python 3.12.0a1+ (main, Dec 18 2022, 10:53:40) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from fishhook import hook, orig
>>> hook.property # just checking that it's the right fishhook
<class 'fishhook.fishhook.hook_property'>
>>> @hook(int)
... def __hash__(self):
...     return orig(self)
...
>>> print(hash(100))
100
>>> ^Z
Assertion failed: static_builtin_index_is_set(self), file C:\Users\user\cpython-main\Objects\typeobject.c, line 81
```
used the debug build of cpython that i have
Objects/typeobject.c lines 77 to 83
```c
static inline size_t
static_builtin_index_get(PyTypeObject *self)
{
    assert(static_builtin_index_is_set(self));
    /* We store a 1-based index so 0 can mean "not initialized". */
    return (size_t)self->tp_subclasses - 1;
}
```
Objects/typeobject.c lines 99 to 103
```c
static inline static_builtin_state *
static_builtin_state_get(PyInterpreterState *interp, PyTypeObject *self)
{
    return &(interp->types.builtins[static_builtin_index_get(self)]);
}
```
curious
i tried to make it error but instead it outputted this
```
Assertion failed: !PyErr_Occurred(), file C:\Users\user\cpython-main\Objects\typeobject.c, line 4193
```
https://github.com/python/cpython/blob/main/Objects/typeobject.c#L4156-L4208
specifically line 4178
where's _PyType_Lookup() used?
is the assert removed in release python
ok i removed the assert() and replaced it with a propagating logic
this is the output
```
Fatal Python error: _Py_Dealloc: Deallocator of type 'weakref.ReferenceType' overrode the current exception
Python runtime state: finalizing (tstate=0x00007ffaa6ec8360)
Current thread 0x00002524 (most recent call first):
  File "C:\Program Files\Python311\Lib\site-packages\fishhook\fishhook.py", line 204 in get_cache
  File "C:\Program Files\Python311\Lib\site-packages\fishhook\fishhook.py", line 214 in get_cache_trace
  File "C:\Program Files\Python311\Lib\site-packages\fishhook\fishhook.py", line 244 in call
  File "<stdin>", line 3 in hash
```
get_cache()?
well yk that's still not the error i did but i mean we traced it back
I guess something on finalization breaks
def get_cache(code, key):
consts = get_consts(code)
for cache in tuple_iter(tuple_getitem(consts, new_slice(None, None, -1))):
if isinstance(cache, Cache): # <----- ERROR ORIGIN
if cache.key == key:
return cache.value
else:
break # caches are injected at end of consts array
return NOT_FOUND
maybe something on finalization requires the use of int hash?
yeah it overwrites the basic long hash not just PyObject_Hash
actually what even is this type https://github.com/python/cpython/blob/main/Include/object.h#L231-L234
Include/object.h lines 231 to 234
```c
#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030c0000 // 3.12
typedef PyObject *(*vectorcallfunc)(PyObject *callable, PyObject *const *args,
                                    size_t nargsf, PyObject *kwnames);
#endif
```
PyObject *const *args
is that a constant pointer to an array?
yes
if python is optimized via python -O, then __debug__ becomes False, which kills assert (but raise AssertionError should still work, iirc)
C assert()s
one that can only be controlled by compilation
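For the Python-level half of that, the effect of -O can be reproduced without restarting the interpreter by passing `optimize` to compile(). A minimal sketch (the `run` helper is just for illustration; C-level assert() is unaffected by any of this):

```python
source = "assert False, 'checked'\nresult = 'reached'\n"

def run(optimize):
    """Compile and execute `source` at the given optimization level."""
    namespace = {}
    exec(compile(source, "<example>", "exec", optimize=optimize), namespace)
    return namespace["result"]

# optimize=0 keeps assert statements, like a normal `python` run.
try:
    run(0)
except AssertionError:
    unoptimized = "AssertionError"
else:
    unoptimized = "no error"

# optimize=1 strips them, like `python -O`.
optimized = run(1)
print(unoptimized, optimized)
```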
That is weird, but I can replace that check with cache.__class__ is Cache which shouldn't need int hash
doesn't crash in 3.12 now but 3.11 does for some reason
With that change? Or with how it is now?
with that change
Ah, add get_class = vars(object)['__class__'].__get__ on line 198 and use get_class(cache) instead of cache.__class__
That should avoid as many dunders as possible inside orig (and will be my fix as soon as I get up and test it)
still crashes 3.11
Damn ok. I'll have to do more work on it. You just did the int_hash hook and that crashes it?
yes
looks like the stuff i did lol
but without the TpFlags thingy
src/einspect/structs/py_type.py lines 32 to 33
```py
@struct
class PyTypeObject(PyVarObject[_T, None, None]):
```
here's PyTypeObject finally with everything typed
i think i lost the file with it
can do all sorts of cursed stuff now
from einspect.structs.include.object_h import TpFlags
from einspect.structs.py_type import PyTypeObject
st = PyTypeObject.from_object(int)
st.tp_flags |= TpFlags.BASE_EXC_SUBCLASS
raise 9000 from RuntimeError
I'm impressed that didn't segfault
it does with int < 256 I think
not sure why that is
related to cached smallints?
it's got to be. Maybe it writes some stuff into memory around the "exception" object, and for your big int you got lucky and it wrote into useless memory
but haven't looked at the code
#define PyException_HEAD PyObject_HEAD PyObject *dict;\
PyObject *args; PyObject *notes; PyObject *traceback;\
PyObject *context; PyObject *cause;\
char suppress_context;
wonder how the memory after the int even got interpreted as a PyBaseExceptionObject at all
seems it works even better with larger ints
raise 1000
TypeError: print_exception(): Exception expected for value, str found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test.py", line 7, in <module>
raise 2000
int: 2000
Process finished with exit code 1
raise 2**300
Traceback (most recent call last):
File "test.py", line 7, in <module>
raise 2**300
int: 2037035976334486086268445688409378161051468395192701393101146849214080221716593330731614208
Process finished with exit code 1
I guess with large ints it can just write into the int's digits
the value of 2**300 is actually 2037035976334486086268445688409378161051468393665936250636140449354381299763336706183397376
where is it printing that message from though?
(note the last digits are different from your output)
it just calls str() I think
which library is better for handling CSV's ? python's csv library or pandas ?
Python/pythonrun.c lines 1539 to 1543
```c
if (print_exception_recursive(&ctx, value) < 0) {
    PyErr_Clear();
    _PyObject_Dump(value);
    fprintf(stderr, "lost sys.stderr\n");
}
```
what does the lost sys.stderr error mean 
I think that gets printed if sys.stderr is gone so it can't print to it
pythonrun.c has to have a lot of handling for extreme edge cases
home-made raise from
from einspect.structs.include.object_h import TpFlags, reprfunc
from einspect.structs import PyTypeObject
st = PyTypeObject.from_object(tuple)
st.tp_name = b"TupleError"
st.tp_flags |= TpFlags.BASE_EXC_SUBCLASS
st.tp_repr = reprfunc(lambda self: self[1])
raise (0, "some msg", 0, 0, RuntimeError("hi"))
RuntimeError: hi
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test.py", line 9, in <module>
raise (0, "some msg", 0, 0, RuntimeError("hi"))
TupleError: some msg
Process finished with exit code 1
@rose schooner is this the error you got with that change?
(If so I know how to work around that one)
Hi everyone, do you think ishashable should be part of the inspect module?
I have added an issue but looks like people are not agreeing to add it.
can't you just
```py
try:
    hash(a)
except TypeError:
    ...
```
? Most of the `is` stuff in inspect seems related to more complex checks.
but the use-case of inspect is to provide as much info we get about a live object right?
btw, my first thought was to have an ast node named Hashable, which could be the base class of ast.Constant and ast.Tuple and other hashable nodes, or maybe to add a new type in the types module named Hashable. In the process of thinking about that, I thought maybe inspect should also have it
what about adding a new node named Hashable to ast? should I raise an issue about this?
what does this have to do with the AST?
a proxy node Hashable
proxy to ast.Constant ast.Tuple and other hashable nodes
As i need to add more check in my lib to check for all of the hashable nodes
if there was a proxy named ast.Hashable i could use that only
but hashable is a runtime property, it cannot be known from the AST
consider (a,b), if a is [], it's not hashable, but if a=b=1, it is hashable
which python api is currently giving the right answer for this? hash doesn't?
pretty much. maybe in the context of GC it would refer to a reachable object
you can check for the attr __hash__ or isinstance(..., collections.abc.Hashable) but the only way to know for sure is actually running the hash function
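To make the gap between the two checks concrete, here is a small sketch; `is_hashable` is a hypothetical helper, not something in inspect:

```python
from collections.abc import Hashable

def is_hashable(obj):
    """Return True only if hash(obj) actually succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

nested = ([],)  # a tuple (has __hash__) holding an unhashable list

looks_hashable = isinstance(nested, Hashable)  # structural check only
really_hashable = is_hashable(nested)          # runs the hash function

print(looks_hashable, really_hashable)
```

The isinstance check only sees that tuple defines `__hash__`; it cannot know that hashing will fail once the contents are visited.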
but hashable is a runtime property, it cannot be known from the AST
But I didn't understand this properly, please help me understand it with an example. btw, I am still a beginner
hashability at typing time might be possible for simple literals, but AST is a long stretch
@olive marsh what are you using the AST module for?
I am kind of agreeing with this, but I have presented a use-case. maybe the problem i am solving shouldn't be solved with ast.
finding a key inside a nested dict
that doesn't sound like something you'd use ast for
why does that need ast? just bfs or something
these have the same AST apart from the string passed to eval, yet one tuple is hashable while the other isn't
t = (eval('1'),)
t = (eval('[]'),)
yes, 
yes, I was bfs-ing on ast nodes
why doesn't python remove the reference counting if it already has a gc
I think GC is just for cyclic references?
it's a good question though why we can't have only GC
but why are you using ast at all. just bfs on your dict
I think part of it is probably historical: refcounting came first and removing it would change some observable behavior (objects would get collected later)
@rose schooner ohhhh this is an issue with python's clean up, I just need to add atexit to unhook everything on exit
I suspect it may also be less efficient, since there are many short-lived ints and strs that would easily be handled with refcounting
would gc cycles take longer without refcounting?
likely
it should work for any kind of unreachable object
I don't have valid reasoning to use ast for that rn, but I will try to come back with valid reasoning in future. I must have had something in mind when I considered ast for the use case
since way more objects would have to be processed by the GC
is atexit any more guaranteed than weakref.finalize?
it happens earlier afaik *actually, weakref.finalize uses atexit (https://stackoverflow.com/a/72638700/14732272)
The gc relies on reference counts. If you removed reference counting, you'd totally break the gc.
what
Considering all possible types of a key, i need more if else in the base case. Btw, i don't remember the exact reasoning. Maybe this one.
the GC's job is to find objects that are only being kept alive due to cyclic references, which, well, relies on reference counting
isinstance(x, collections.abc.Hashable)
im too late, sorry
I think we can. But it would make things a lot slower, because a lot of objects are created and then destroyed right after. Without refcounting we would have a lot of dead objects lying around, so GC runs would be a lot slower.
I heard somewhere that MicroPython doesn't use refcounting, to save memory
And for MicroPython that is totally ok, because the processor is fast and memory is small, so iterating over all memory doesn't take a lot of time
https://devguide.python.org/internals/garbage-collector/#identifying-reference-cycles describes the GC's algorithm, if you want to read up on it.
so its not a full standalone gc
It's a reference counting GC, not a tracing GC.
a tracing GC works by taking some set of known reachable objects, and finding everything reachable from them. That's not how CPython's GC works. CPython's GC works by walking objects reachable from all containers, and finding objects where the containers that hold references to them account for the entirety of their reference count
It would also break every single 3rd party C extension module.
right, your message made me realize that
because we'd need some mechanism to track GC roots
yep
a tracing GC starts from the roots, and finds everything reachable. CPython's GC doesn't know the roots, but it still finds things not reachable from the roots - and it does that by finding everything whose reference count is totally explained by references that aren't held by a root.
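A small CPython-specific sketch of the division of labour described above: an object with no cycles is freed by refcounting the moment its last reference goes away, while a two-object cycle stays alive until the cycle collector runs. The `Node` class and variable names are only for illustration, and automatic collection is switched off so the demo is deterministic:

```python
import gc
import weakref


class Node:
    """Plain object that supports weak references."""


gc.disable()  # keep the cycle collector from running on its own
try:
    solo = Node()
    solo_ref = weakref.ref(solo)
    del solo  # refcount drops to zero, object is freed immediately
    freed_by_refcount = solo_ref() is None

    a, b = Node(), Node()
    a.other, b.other = b, a  # reference cycle
    cycle_ref = weakref.ref(a)
    del a, b  # refcounts stay above zero because of the cycle
    alive_after_del = cycle_ref() is not None

    gc.collect()  # the cycle collector finds and frees the pair
    freed_by_gc = cycle_ref() is None
finally:
    gc.enable()

print(freed_by_refcount, alive_after_del, freed_by_gc)
```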
what does pypy do for that?
@rose schooner @warm breach found out why it crashes, inside of remove_all_subclasses inside typeobject.c it deletes weakrefs from the tp_subclasses dictionary with the int as the key, but this happens after a level of finalization has occurred, so frames have been deallocated and you get memory corruption
interesting
they don't support C extensions, do they?
if they do I guess the most likely mechanism is that they disable GC while an extension is active and/or scan memory for active pointers
but I'm just guessing
I think they use a traditional mark-and-sweep GC - but their CPython emulation layer for extension modules does expose reference counts - and I'm not sure how they bridge those two systems
https://www.pypy.org/posts/2018/09/inside-cpyext-why-emulating-cpython-c-8083064623681286567.html#maintaining-the-link-between-w-root-and-pyobject (the next section) gives some good details, actually.
- as long as the W_Root is kept alive by the GC, we want the PyObject* to live even if its refcount drops to 0;
- as long as the PyObject* has a refcount greater than 0, we want to make sure that the GC does not collect the W_Root.
actually the frame hasnt been deallocated, but it has a null f_code member? this is bizarre
seems like the answer is really that the two systems are interdependent, and each of them knows not to consider anything garbage without double checking that the other system isn't holding live references to it.
@rose schooner the frame is being cleared for some reason, which is causing the segfault
An "impossible" mystery: what does from __future__ import annotations change in a file that has no type annotations? https://github.com/nedbat/coveragepy/issues/1524#issuecomment-1375899451
Maybe the compile() call in https://github.com/nedbat/coveragepy/blob/master/coverage/parser.py#L388 gets affected by whether the future import is active?
coverage/parser.py line 388
self.code = compile(text, filename, "exec")```
though now that I think about it that doesn't make sense. how would the code within compile() even know what futures were active in the code that calls it?
Hmm, interesting.
it's also interesting because the project in the bug (twine) uses annotations....
so the code being compiled uses them.
hm there is a dont_inherit flag to compile()
ah yes that's it https://docs.python.org/3/library/functions.html#compile
with dont_inherit=False (the default), futures are inherited from the calling code
which seems weird, shouldn't the default be to only listen to futures that are present in the code you're compiling?
it does seem weird, yes. action at a distance!
Though now I wonder about coverage behaving differently on annotated code!
yes that would be the second part of the mystery
I'm not sure how you define "statements", but I suppose the future import does mean there's less executable code
I was also surprised how compile() even did this. Seems like the mechanism is that code objects have the active future imports in their co_flags, and compile() peeks into the co_flags of the currently active code object to see what flags to use.
hmm, that does seem very odd.
eval() and exec() do the same thing. I can kind of see the justification: if you use eval/exec in a file, you probably want the code to be compiled similarly to the code in the file itself
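A hedged sketch of the inheritance behaviour, checked through `co_flags`. To keep it independent of whatever file it is pasted into, the "calling code" is itself a compiled string (`outer_source`, a name made up here) that contains the future import and then calls `compile()` three ways:

```python
import __future__

FLAG = __future__.annotations.compiler_flag

# This plays the role of a file that starts with the future import.
outer_source = '''
from __future__ import annotations
inherited = compile("x = 1", "<inner>", "exec")
isolated = compile("x = 1", "<inner>", "exec", dont_inherit=True)
own_future = compile(
    "from __future__ import annotations", "<inner>", "exec", dont_inherit=True
)
'''

namespace = {}
# dont_inherit=True here so the demo does not pick up flags from this file.
exec(compile(outer_source, "<outer>", "exec", dont_inherit=True), namespace)


def has_flag(code):
    """True if the code object was compiled with the annotations future."""
    return bool(code.co_flags & FLAG)


results = {
    name: has_flag(namespace[name])
    for name in ("inherited", "isolated", "own_future")
}
print(results)
```

The third case suggests that a future import written inside the compiled source is still honoured with `dont_inherit=True`; only the flags of the calling code are dropped.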
thanks for the find...!
@feral island the docs are a little unclear: if I set dont_inherit=True, then will the __future__ in the compiled code be used?
the effect of dont_inherit=True is that it doesn't call PyEval_MergeCompilerFlags (which is what inherits flags from the calling code). It then calls _PyParser_ASTFromString to actually parse the string. I would think that still listens to future flags in the code it's parsing, but would need to read more code to verify
i'll be doing some empirical experiments too
don't feel bad! CPython is a very stable project and is very hesitant to make changes
I can understand, but I will try to implement that proxy thing.
@warm breach @rose schooner I looked into the crash caused by fishhook, it is triggered due to _PyIO_Fini being called after the interpreter is cleared (as far as I can tell normally that is safe because there shouldn't be any user code in there, but fishhook invalidates that assumption) Im looking rn to see if there are any legitimate bugs due to that assumption that do not involve fishhook
I'm sure someone's already figured it out but is it possible to modify/extend the in built python classes? Such as adding a method to pythons str class for example.
Check Objects/stringlib
so setattr of a python function after unlocking the class creates something similar to a PYFUNCTYPE callback function in the slot?
why are you not allowed to monkeypatch builtin types
One is moved there but it is unrelated ish to this crash. This crash rn is due to python being ran after significant tear down
Yea
!e
I don't think they're immutable? The same way you'd add a method to any other class would be my guess..
from types import MethodType
def reference(self):
    return self
str.reference = MethodType(reference, str)
("123".reference())
@elder blade :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 6, in <module>
003 | TypeError: cannot set 'reference' attribute of immutable type 'str'
Interesting, I had no idea they were immutable - seems weird considering you can't really create classes with slots (I don't remember if you can or not, but it was difficult)
classes like str and int still have backing dictionaries, they just have a flag set that disallows setattr and delattr. fishhook modifies that flag, along with some other internal stuff to make it mostly stable
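A small sketch of what that flag looks like from plain Python, without fishhook. The bit value `1 << 8` is taken to be `Py_TPFLAGS_IMMUTABLETYPE` as defined in CPython 3.10 and later (an assumption for other versions), and `MyStr` is just an illustrative subclass showing the supported way to add a method:

```python
# Assumed value of Py_TPFLAGS_IMMUTABLETYPE (CPython 3.10+).
IMMUTABLETYPE = 1 << 8

try:
    str.reference = lambda self: self
except TypeError as exc:
    error = exc  # built-in types reject setattr
else:
    error = None

print(error)
print(bool(str.__flags__ & IMMUTABLETYPE))


class MyStr(str):
    """Subclasses are ordinary heap types, so they stay mutable."""

    def reference(self):
        return self


print(MyStr("123").reference())
```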
!e ```py
from fishhook import *
@hook(str)
def reference(self):
    return self
print("123".reference())```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
123
Does anyone know why str.maketrans function exist? Like why pass a dictionary to get a dictionary? It seems to me that all 3 forms of the function can be passed directly to str.translate
!d str.maketrans
static str.maketrans(x[, y[, z]])```
This static method returns a translation table usable for [`str.translate()`](https://docs.python.org/3/library/stdtypes.html#str.translate "str.translate").
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters (strings of length 1) to Unicode ordinals, strings (of arbitrary lengths) or `None`. Character keys will then be converted to ordinals.
If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to `None` in the result.
to me it feels like an API that was inspired by C-style languages, not something you'd do in modern Python
to me it seems that the only form of the function that can be directly passed to str.translate would be int->str / int->int dictionaries?
str->str dictionaries do not work on str.translate, and neither does forms using two or more arguments supported by maketrans
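To make the difference concrete, here is a short sketch of what `str.maketrans` returns for each form and what happens when a str-keyed dict is handed to `str.translate` directly (variable names are illustrative):

```python
pairs = {"a": "o", "o": "a"}

# One-argument form: character keys are converted to ordinals.
table = str.maketrans(pairs)
swapped = "foobar".translate(table)

# Passing the str-keyed dict directly: lookups by ordinal all miss,
# so the string comes back unchanged.
unchanged = "foobar".translate(pairs)

# Two-argument form: ordinals mapped to ordinals.
two_arg = str.maketrans("ao", "oa")

# Three-argument form: characters of the third string map to None (deleted).
strip_marks = str.maketrans("", "", "!?")

print(table, swapped, unchanged)
print(two_arg, strip_marks)
print("what?!".translate(strip_marks))
```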
if you meant like "why doesn't str.translate supports it directly", idk
Ok, thanks guys!
Yeah I can't think of many use cases where I need to replace individual characters like that
Yeah, you either need a simple replacement or you go full regex
It's not that straightforward to e.g. swap two characters though. E.g. in "foobar", how do you easily swap the "o"s and the "a"s without translate?
when do you need to do that though?
Of course, not very often. I think I remember using it recently, though. I'll try to remember to grep all work repos for translate tomorrow.
!e
Technically possible with a regex
import re
repl = re.sub("[ao]", lambda m: {"a": "o", "o": "a"}[m[0]], "overbananomaniacal")
print(repl)
@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.
averbononamoniocol
I found this one in our code text.translate(str.maketrans("", "", string.punctuation)), removing all punctuation from a string
I would personally probably have generated a regex or done a bunch of .replace() calls
but .translate is probably more efficient
what makes the function awkward for me is that you have to call another function for this to be usable
like, it's not str.translate({"a": "o", "o": "a"})
why exactly does str.translate not allow directly passing it a dict mapping chars to chars?
unless its just how it used to work, and nobody uses it enough to justify adding more functionality
I feel like that goes back to it being a C-style API
"".join(table.get(c, c) for c in s) where table is your dictionary and s your string works and is quite neat I think.
like what you really do is passing a char table[256]
i know that's more or less literally what you do for bytes.translate, since of course that uses an entirely different format for the translation table
though at least i can see the rationale behind that, since there being only 256 different byte values means you can fit a complete translation table into a reasonably-sized amount of memory
right, I guess for str that got adapted because you don't want an array the size of all of Unicode
I don't think so, didn't it work the same for ASCII strings in python 2? (or maybe it did?)
I assume the actual reason is that the C code can just do table[char_ord] instead of having to try the char ordinal or the single character string
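A pure-Python model of that lookup, as a sketch only (the real implementation is C and has fast paths this ignores; `translate_model` is a made-up name). It indexes the table by ordinal, keeps the character when the lookup misses, and drops it when the entry is `None`:

```python
def translate_model(text, table):
    """Approximate str.translate: index the table by each character's ordinal."""
    out = []
    for ch in text:
        try:
            replacement = table[ord(ch)]
        except LookupError:
            out.append(ch)  # no entry: keep the character as-is
            continue
        if replacement is None:
            continue  # None means "delete this character"
        if isinstance(replacement, int):
            replacement = chr(replacement)
        out.append(replacement)
    return "".join(out)


table = {ord("a"): "o", ord("o"): ord("a"), ord("!"): None}
print(translate_model("foobar!", table))
print("foobar!".translate(table))
```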
the reason it's not only strings is probably some sense of proper type usage.
though it is very fast, compared to replace or regex
since it just needs to lookup a ord table
i spose
I think there should be a "convenient option" and a "fast option if you really need this and you know how to benchmark your code"
Hey internals people!
I have a question about Python's lexing of f-strings
This provides a pretty thorough overview of how python's lexing step is executed. Specifically, though, it outlines the patterns used to detect string literals, and separately it outlines how f-strings are parsed. But it makes no mention of how those two steps relate to each other, or if the process is even done in two steps
So my question is, is the lexing of f-strings done in two steps (first capture the string whilst ignoring replacement fields, then lex that string in greater detail?)
Or, are there two distinct procedures, one for format strings and one for unformatted strings?
I think I read on stackoverflow that chaining replace is a bit faster, but translate is still faster than regex and doesn't require chaining or looping. Besides, it gives the expected results: if you do '1234'.replace('1', '2').replace('2', '3') you get 3334, but with translate you get (the expected, I guess) 2334
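The difference comes from chained `replace` calls seeing each other's output, while `translate` maps every original character exactly once. A quick check of the numbers quoted above:

```python
text = "1234"

# Each replace() operates on the result of the previous one,
# so the '2' produced by the first call is rewritten by the second.
chained = text.replace("1", "2").replace("2", "3")

# translate() looks up each original character once.
translated = text.translate(str.maketrans("12", "23"))

print(chained, translated)
```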
regarding this, is there a wrapper somewhere that bunches up __add__ __radd__ or other functions like those that occupy one pointer slot but can be multiple functions?
from einspect.structs import PyTypeObject
class Foo:
    def __add__(self, other):
        return "add"
    def __radd__(self, other):
        return "radd"
t = PyTypeObject.from_object(Foo)
nb_add = t.tp_as_number.contents.nb_add
print(nb_add(1, Foo()))
>> 'radd'
print(nb_add(Foo(), 1))
>> 'add'
so here it seems nb_add points to a function that internally decides based on argument type whether to call __add__ / __radd__?
essentially is there some way to get at the actual different __add__ and __radd__ functions from the tp_as_number.nb_add pointer?
There isn't, because on the C level they're just a single function. You could access the dunder attributes if it's a Python class, but for an extension class that'd be a pointless roundtrip back to where you started.
Here's the float implementation for instance:
https://github.com/python/cpython/blob/762745a124cbc297cf2fe6f3ec9ca1840bb2e873/Objects/floatobject.c#L595
There's a macro and helper function to reduce code duplication, but effectively it just does:
def float_nb_add(v, w):
    a = convert_float(v)
    b = convert_float(w)
    return new_float(a + b)
Objects/floatobject.c line 595
float_add(PyObject *v, PyObject *w)```
hm...
but on the user class, what exactly does the nb_add point to? A custom function that decides types and then calls the appropriate python function?
There's a function defined for every slot that checks types and looks up attributes yeah.
Here's the macro that generates it for binary methods: https://github.com/python/cpython/blob/main/Objects/typeobject.c#L7785
Objects/typeobject.c line 7785
#define SLOT1BINFULL(FUNCNAME, TESTFUNC, SLOTNAME, DUNDER, RDUNDER) \```
what's up with all the \
oh do they need that to be on 1 line for the conditional define to work
It's a macro, so they need to escape the newlines so it's technically one line.
I think if you had a Python class that only overrode one of the methods in a pair, the way the other would fallback is:
operation → C slot → typeobject wrapper → dunder lookup → parent class method object → parent class slot
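As a very rough Python model of what the generated slot function does for a user class (the real `SLOT1BINFULL` macro has extra checks for subclasses and for whether the other operand uses the same slot, which this ignores; `model_nb_add` is a made-up name):

```python
class Foo:
    def __add__(self, other):
        return "add"

    def __radd__(self, other):
        return "radd"


def model_nb_add(left, right):
    """One function serving both dunders, picking by operand position."""
    if isinstance(left, Foo):
        result = type(left).__add__(left, right)
        if result is not NotImplemented:
            return result
    if isinstance(right, Foo):
        return type(right).__radd__(right, left)
    return NotImplemented


# The model agrees with what the + operator does for this class.
print(model_nb_add(Foo(), 1), model_nb_add(1, Foo()))
print(Foo() + 1, 1 + Foo())
```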
the fun C experience
hm so @pliant tusk it's not possible to conditionally restore a slot pointer of say only __radd__ while keeping __add__?
!e but how does fishhook manage to unhook __add__ while keeping the hook for __radd__ 
from fishhook import hook, unhook
@hook(int)
def __add__(self, other):
    return self - other
@hook(int)
def __radd__(self, other):
    return self + int(other)
print(int(1) + 1)
print("1" + 1)
unhook(int, "__add__")
print("1" + 1)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 0
002 | 0
003 | 2
or is it as simple as taking a copy of orig = int.__add__ 
How are these wrappers created? Is the interpreter allocating new asm code, patching it and then using it as a C function?
Can you tell me where this is happening? Im really curious, it would be useful in my project
Basically, the slot pointer doesn't change after you unhook add because radd is still defined in the int dictionary
So the slot pointer remains the generic python one
It's a benefit I get for free based on fishhooks new hook strategy
so that's just what setattr does itself after marking the mutable flag?
hm, though making a backup of int.__add__ also seems to work after directly changing tp_as_number.nb_add
not sure how that works, is int.__add__ already resolved to the function pointed to by the slot and not the slot pointer
int.__add__ holds the original value in the slot pointer and calls that address
So if you store it, then overwrite nb_add, the stored one will retain the original functionality
also changing nb_add also seems to not affect int.__add__(1, 2) or (1).__add__(2), it only affects using +
I think that's since __add__ is still the old pointer in the int.__dict__?
That makes sense
It's because the __add__ in the dict is still holding the original nb_add function pointer, not your new one
I use that in fishhook to implement orig
hm, if making a copy of the original function is enough, what is this stuff doing https://github.com/chilaxan/fishhook/blob/master/fishhook/fishhook.py#L168-L202
is that just to make sure your internal calls aren't routed to patched stuff?
All of the closure variables are to make orig as stable as possible by removing the ability to hook code inside it (which would cause a recursive error)
And the cache stuff is storing the original and holding it for orig to grab later
can fishhook currently "unimplement" builtin slot functions?
Technically, but I would not recommend it. You would need to use fishhook.force_delattr
is it any more unsafe than setting the attr?
Yea because some dunders are assumed to always exist internally
!e 
from fishhook import force_delattr
force_delattr(int, "__mul__")
a = 2
print(a * 5)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
10
Unexpected lmao
seems to work if I set the slot to a null pointer, not sure how unsafe it is
from einspect import view
from einspect.structs.include.object_h import binaryfunc
view(int).tp_as_number.contents.nb_multiply = binaryfunc()
a = 2
print(a * 5)
Traceback (most recent call last):
File "main.py", line 7, in <module>
print(a * 5)
~~^~~
TypeError: unsupported operand type(s) for *: 'int' and 'int'
That makes sense that it would work. I'm curious to find out why fishhook didn't. Probably some caching
so it seems delattr just resets the slot pointer back to original...? kind of strange
from einspect import view
with view(int).as_mutable():
    int.__mul__ = lambda a, b: a + b
print(view(int).tp_as_number.contents.nb_multiply(5, 5))
>> 10
with view(int).as_mutable():
    del int.__mul__
print(view(int).tp_as_number.contents.nb_multiply(5, 5))
>> 25
Huh that's weird
even weirder:
- delattr with mutable flag, no effect
from einspect import view
with view(int).as_mutable():
    delattr(int, "__mul__")
a = 5
print(a * 5)
>> 25
- deleting the __mul__ entry in int's __dict__, no effect
from einspect import view
del view(int.__dict__)["__mul__"]
a = 5
print(a * 5)
>> 25
- setting the __mul__ attr to some random thing first, THEN deleting the entry in int.__dict__, works somehow?
from einspect import view
with view(int).as_mutable():
    setattr(int, "__mul__", "something")
del view(int.__dict__)["__mul__"]
a = 5
print(a * 5)
>> TypeError: unsupported operand type(s) for *: 'int' and 'int'
What is "einspect"? Is it from pip?
fully typed
let's make int iterable!
For completeness you should modify your code to break typecheckers while typechecking and make them understand your code
Are there any not fixed vulnerabilities in typecheckers?
Probably caused by some odd handling of slot pointers that are not the default python one
what versions does einspect support?
3.8-3.11
probably 3.7 as well but
too many typing stuff broken I'm not even going to try
it supports all of the different versions of structs?
no
I know at least the dict one was changed, so the current one is for 3.11
I'll have to make a redirected definition for 3.8-3.10 or something 
Some stuff on the type struct changed as well afaik
Maybe you could have einspect parse a given versions headers to build the structures
yeah that could work
src/einspect/structs/py_type.py lines 39 to 48
tp_setattr: setattrfunc
# formerly known as tp_compare (Python 2) or tp_reserved (Python 3)
tp_as_async: ptr[PyAsyncMethods]
tp_repr: reprfunc
# Method suites for standard classes
tp_as_number: ptr[PyNumberMethods]
tp_as_sequence: ptr[PySequenceMethods]
tp_as_mapping: ptr[PyMappingMethods]```
I've already defined most of the types
so it would just have to convert the C type style to python type hints and it should work
this is from here for example https://github.com/python/cpython/blob/3.11/Doc/includes/typestruct.h#L11-L20
Doc/includes/typestruct.h lines 11 to 20
setattrfunc tp_setattr;
PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2)
or tp_reserved (Python 3) */
reprfunc tp_repr;
/* Method suites for standard classes */
PyNumberMethods *tp_as_number;
PySequenceMethods *tp_as_sequence;
PyMappingMethods *tp_as_mapping;```
yea but if you are gonna parse headers then you won't have to manually define any of them
hm.. like parse everything? 
maybe I can run that at build time and make different wheels for each python version
yea
and you could also include it with einspect in case the end user wants to inspect some other clib
currently have to do a bunch of these as well for compat https://github.com/ionite34/einspect/blob/main/src/einspect/structs/py_object.py#L98-L100
src/einspect/structs/py_object.py lines 98 to 100
@bind_api(python_req(Version.PY_3_10) or pythonapi["Py_NewRef"])
def NewRef(self) -> object:
"""Returns new reference of the PyObject."""```
Include/object.h lines 102 to 106
struct _object {
_PyObject_HEAD_EXTRA
Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;
};```
Include/cpython/object.h lines 141 to 142
struct _typeobject {
PyObject_VAR_HEAD```
How do I actually go about defining PyObject's ob_type as PyTypeObject with ctypes.Structure?
PyTypeObject inherits PyVarObject which inherits PyObject which references PyTypeObject 
Looks like you should be able to do: ```py
class PyTypeObject(ctypes.Structure):
    pass
class PyObject(ctypes.Structure):
    _fields_ = [..., ("ob_type", ctypes.POINTER(PyTypeObject)), ...]
PyTypeObject._fields_ = [...]
```
Declaring an empty class, creating POINTERs that refer to that type, and then defining that type's fields later seems to be how you forward declare types for ctypes.
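A self-contained sketch of that forward-declaration pattern on a simpler self-referential struct (a linked-list node, not the CPython object layout; `Node` and its field names are made up for the example):

```python
import ctypes


class Node(ctypes.Structure):
    """Declared empty first so POINTER(Node) can be used in its own fields."""


# Assign _fields_ once, after the class exists and before it is subclassed
# or instantiated.
Node._fields_ = [
    ("value", ctypes.c_int),
    ("next", ctypes.POINTER(Node)),
]

tail = Node()
tail.value = 2

head = Node()
head.value = 1
head.next = ctypes.pointer(tail)

print(head.value, head.next.contents.value)
print(bool(tail.next))  # a NULL pointer is falsy
```

This works here because nothing subclasses `Node` before `_fields_` is set, which is exactly what the `PyObject` / `PyTypeObject` inheritance chain cannot satisfy.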
hm, but I'm automatically defining _fields_ using type hints from a decorator 
!e wait also this doesn't even work? 
from ctypes import *
class PyObject(Structure):
    pass
class PyVarObject(PyObject):
    pass
class PyTypeObject(PyVarObject):
    pass
PyObject._fields_ = [
    ("ob_refcnt", c_long),
    ("ob_type", POINTER(PyTypeObject)),
]
@warm breach :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 12, in <module>
003 | AttributeError: _fields_ is final
wait since when
also why c_long instead of c_ssize_t
I mean, doesn't really matter, just for testing :p
but yeah I just want to define POINTER(PyTypeObject) within PyObject
seems impossible rn
this works ```py
from ctypes import *
class PyObject(Structure):
    _fields_ = [
        ("ob_refcnt", c_ssize_t),
        ("ob_type", c_void_p),
    ]
class PyVarObject(PyObject):
    _fields_ = [
        ("ob_size", c_ssize_t),
    ]
class PyTypeObject(PyVarObject):
    pass
PyObject._fields_[1] = ('ob_type', POINTER(PyTypeObject))
```
ye but that doesn't change the attribute descriptor
>>> PyObject.ob_type
<Field type=c_void_p, ofs=8, size=8>
just change ._fields_
maybe there's a way to signal changes but idrk
so according to SO
Structure/Union classes must get 'finalized' sooner or later, when one of these things happen:
_fields_ is set.
An instance is created.
The type is used as field of another Structure/Union.
The type is subclassed
When they are finalized, assigning fields is no longer allowed.
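The subclassing rule is the one that bites in the `PyObject` case. A minimal reproduction, assuming CPython's ctypes behaves as in the 3.11 eval output earlier in the chat (class names are illustrative):

```python
import ctypes


class Base(ctypes.Structure):
    """No _fields_ yet."""


class Derived(Base):
    """Subclassing finalizes Base, even though Base has no fields."""


try:
    Base._fields_ = [("x", ctypes.c_int)]
except AttributeError as exc:
    error = exc
else:
    error = None

print(error)
```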
just use ctypes to modify ctypes ```py
from ctypes import *
class PyObject(Structure):
    _fields_ = [
        ("ob_refcnt", c_ssize_t),
        ("ob_type", c_void_p),
    ]
class PyVarObject(PyObject):
    _fields_ = [
        ("ob_size", c_ssize_t),
    ]
class PyTypeObject(PyVarObject):
    pass
LP_PyTypeObject = POINTER(PyTypeObject)
PyObject._fields_[1] = ('ob_type', LP_PyTypeObject)
# overwrite the field type stored inside the CField descriptor (relies on CPython's internal CField layout)
py_object.from_address(id(PyObject.ob_type)+object.__basicsize__+tuple.__itemsize__*3).value = LP_PyTypeObject
```
Guess it's time for c_void_p again
@struct
class PyObject(Structure, Generic[_T, _KT, _VT]):
    ob_refcnt: int
    _ob_type: c_void_p
    @property
    def ob_type(self) -> ptr[PyTypeObject[Type[_T]]]:
        from einspect.structs.py_type import PyTypeObject
        res = cast(self._ob_type, POINTER(PyTypeObject))
        return res
    @ob_type.setter
    def ob_type(self, value: ptr[PyTypeObject[Type[_T]]]) -> None:
        self._ob_type = cast(value, c_void_p)
though honestly... might use your thing 
src/einspect/structs/py_object.py lines 25 to 30
@struct
class PyObject(Structure, Generic[_T, _KT, _VT]):
    """Defines a base PyObject Structure."""
    ob_refcnt: int
    ob_type: Annotated[ptr[PyTypeObject[Type[_T]]], c_void_p]```
src/einspect/structs/py_type.py lines 133 to 138
def _patch_py_object():
    # noinspection PyProtectedMember
    fields = PyObject._fields_
    fields[1] = ("ob_type", POINTER(PyTypeObject))
    offset = object.__basicsize__ + tuple.__itemsize__ * 3
    py_object.from_address(id(PyObject.ob_type) + offset).value = POINTER(PyTypeObject)```
`src/einspect/protocols/type_parse.py` line 44
```py
RE_ANNOTATED = re.compile(r"^(Annotated)(\[(.*)])$")```
why make a function and immediately call it
there's a comment above explaining what it's doing
is this used in other places
don't really want that stuff left in the namespace mainly
fair point
@warm breach just assign to _fields_ only once and before class is initialised
!d ctypes.Structure
class ctypes.Structure(*args, **kw)```
Abstract base class for structures in *native* byte order.
Concrete structure and union types must be created by subclassing one of these types, and at least define a [`_fields_`](https://docs.python.org/3/library/ctypes.html#ctypes.Structure._fields_ "ctypes.Structure._fields_") class variable. [`ctypes`](https://docs.python.org/3/library/ctypes.html#module-ctypes "ctypes: A foreign function library for Python.") will create [descriptor](https://docs.python.org/3/glossary.html#term-descriptor)s which allow reading and writing the fields by direct attribute accesses. These are the
yeah but I want to subclass it
see this use case ^
where PyObject has a field that refers to a pointer of PyTypeObject, which itself inherits PyVarObject, which inherits PyObject
Hmm
completely circular
!e ```py
from ctypes import *
class PyObject(Structure):
    _fields_ = [
        ("ob_refcnt", c_long),
        ("_ob_type", c_void_p)
    ]
    @property
    def ob_type(self):
        return cast(self._ob_type, POINTER(PyTypeObject))
class PyVarObject(PyObject):
    pass
class PyTypeObject(PyVarObject):
    pass
print(PyObject.from_address(id(1)).ob_type)
```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
<__main__.LP_PyTypeObject object at 0x7f000675c7a0>
@warm breach you can do this trick ^
they did do that
Ah I didn't read enough of the conversation lol
the downside is that PyObject.ob_type is incorrect now
usually you would get the CField, in this case you get a property object
also in my case I would have to do an import within the property as well to avoid being circular, slightly annoying
um
!e anyone know what is up with this
a = int.__subclasshook__
b = int.__subclasshook__
print(a, b)
print(a is b)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | <built-in method __subclasshook__ of type object at 0x7f0a4aecdae0> <built-in method __subclasshook__ of type object at 0x7f0a4aecdae0>
002 | False
!e print(type.__subclasshook__ is type.__subclasshook__) 
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
False
@native flame :white_check_mark: Your 3.11 eval job has completed with return code 0.
True
it's probably inherited from object
object.__dict__['__subclasshook__'] is a descriptor
I have two ideas:
- this function exists only in C, so a Python wrapper is created every time you use .__subclasshook__
- when you use .__subclasshook__, a bound method is created (= an object that holds the first argument for the method), so a new object is created every time
You can get similar behaviour with a new class Foo and a classmethod bar: Foo.bar will return a new bound method on every access
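A quick sketch of the second idea on CPython, with an illustrative class `Foo`: each attribute access builds a fresh bound-method object, so `is` fails while `==` and the bound `__self__` still match. This also lines up with the repr showing the address of the bound `self` (the type), not of the method object:

```python
class Foo:
    @classmethod
    def bar(cls):
        return cls


first = Foo.bar
second = Foo.bar
print(first is second)  # two distinct bound-method objects
print(first == second)  # same function bound to the same class
print(first.__self__ is second.__self__)

hook_a = int.__subclasshook__
hook_b = int.__subclasshook__
print(hook_a is hook_b, hook_a == hook_b)
print(hook_a.__self__ is int)
```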
but the memory locations are the same in the repr
https://github.com/python/cpython/blob/main/Objects/methodobject.c#L279-L289 it prints the memory location of self
Objects/methodobject.c lines 279 to 289
static PyObject *
meth_repr(PyCFunctionObject *m)
{
    if (m->m_self == NULL || PyModule_Check(m->m_self))
        return PyUnicode_FromFormat("<built-in function %s>",
                                    m->m_ml->ml_name);
    return PyUnicode_FromFormat("<built-in method %s of %s object at %p>",
                                m->m_ml->ml_name,
                                Py_TYPE(m->m_self)->tp_name,
                                m->m_self);
}```

>>> a
<built-in method __subclasshook__ of type object at 0x00007FF9BDCA5E30>
>>> f'{id(int):#018X}'
'0x00007FF9BDCA5E30'
@rose schooner
from einspect import impl, orig
@impl(int)
@property
def real(self):
    if self == 42:
        return "the answer to life, the universe, and everything"
    return orig(int).real.__get__(self)
print((5).real)
>> 5
print((42).real)
>> the answer to life, the universe, and everything
fishhook like?
maybe
mainly went with this format to be able to get autocomplete and typing as well
since I lie in the orig __new__ type hints 
class orig:
    """Proxy to access a built-in type's original implementation."""
    def __new__(cls, type_: Type[_T]) -> Type[_T]:
i should probably work on a new PyPI project as well
!pypi cereal
hm
```py
from fishhook import hook, orig
from sys import _getframe
@hook.property(int)
def real(self):
    if self == 42:
        return "the answer to life, the universe, and everything"
    return next(c.value for c in _getframe(0).f_code.co_consts if type(c).__module__ != 'builtins' and c.key == 'orig').__get__(self)
```
here's how it'd be done in `fishhook` probably
what's this
what about this 
from einspect import view, orig
view(set)["__name__"] = "not_a_set"
print(set)
>> <class 'not_a_set'>
print(orig(set).__name__)
>> set
nice
i'm gonna have to look into the source code again to see how i'd do it in fishhook
the default setattr dies trying to do that for some reason
seems like cpython bug maybe 
!e
from fishhook import force_setattr
force_setattr(set, "__name__", "not_a_set")
print(set)
@warm breach :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).
[No output]
trying to do it in the latest release of fishhook with hook.cls() doesn't work ```pycon
>>> @hook.cls(set)
... class _:
...     __name__ = 'not_a_set'
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python311\Lib\site-packages\fishhook\fishhook.py", line 443, in wrapper
    hook_var(cls, attr, value)
  File "C:\Program Files\Python311\Lib\site-packages\fishhook\fishhook.py", line 415, in hook_var
    force_setattr(cls, name, classproperty(prop))
  File "C:\Program Files\Python311\Lib\site-packages\fishhook\fishhook.py", line 145, in force_setattr
    setattr(cls, attr, value)
TypeError: can only assign string to set.__name__, not 'classproperty'
```
why does that turn into a class property 
What's y'all opinion:
Python Enhancement Proposals (PEPs)
deferred ref counting
i wanna convert a message to a list of int, i used this code and it didn't work well, some help please
Please see #how-to-get-help and open a thread in #1035199133436354600
You forgot the commas in the list
['a''b''c'] will evaluate to ['abc']
Yes i rewrite char="abcd.....
But didn't work
Should i print StripM after the returns ?
it does do properties
!e ```py
from fishhook import hook, orig
@hook.property(int)
def real(self):
if self == 42:
return "the answer to life, the universe, and everything"
return orig.real
print((5).real)
print((42).real)```
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 5
002 | the answer to life, the universe, and everything
and it changes it into a class property so that it can use the orig strategy to revert, I figured that people wouldn't be trying to use fishhook to change metaproperties like __name__
clearly incorrectly lmao
(also it dies in force_setattr because that probably tries to free the value that was in the slot originally and that string is a static string so free fails)
that has to be a CPython bug surely? though I guess setattr never planned for usage as such
I think I tried hook_property earlier, is that different to hook.property?
No it's the same thing, but it won't work for properties not stored in the class dictionary since I didn't include handling for them
wait what is orig here exactly? if you don't do the .real
It's the Orig singleton
ah hm, but the .real is a special attribute for hook.property? Or are there attribute access for normal functions as well
On normal functions you would only have attributes bound if your hook is being called from inside an attribute hook
(Orig actually just access the next level up in the case of nested hooks)
It lets wrapped hooks function seamlessly
could orig.__name__ somehow access that attribute of the bound function or such?
I guess that might make sense
It could be implemented, but as of now it does not
You can call get_cache_trace yourself to get the original hooked function if you need it
Hey internals people! I asked a question about lexers just now. I figure if any sub on this server knows about this stuff, it'd be you guys. Just thought I'd send up a flare
Basically, I've finished a little prototype and gotten myself warmed up. More than anything, I'm just trying to figure out where to go from here
And, maybe find some well built examples I can try to emulate
Include/internal/pycore_gc.h lines 25 to 26
/* True if the object is currently tracked by the GC. */
#define _PyObject_GC_IS_TRACKED(o) (_Py_AS_GC(o)->_gc_next != 0)```
No, it's reading a field in that GC header.
I'm reading the docs for venvs and I'm trying to figure out whether doing something like upgrading pip on the base python install will also cause the venv to use the upgraded version of pip. Is there anywhere I can read more about the implementation details of venvs without digging through the code?
Essentially I am a bit unclear as to what the statement A virtual environment is created on top of an existing Python installation from the docs actually means in practical terms. I understand it changes where python looks for packages, but does this extend to modules like pip as well?
a virtual environment has its own packages, including pip.
in that case, does the root install share anything with the venv? in what sense is the venv installed "on top of" the base install?
It's using the Python executable (over a symbolic link or something like that) and all that's needed to run the bare interpreter.
well... unless you do venv --copies when it uses simple copies
And if you use the --system-site-packages option, the base packages will also be available
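To make the "on top of" part a bit more concrete, here is a minimal sketch, assuming a standard CPython install where the stdlib `venv` module is usable. It creates a throwaway environment without pip and reads the `pyvenv.cfg` file that points back at the base install. The directory name `demo-env` is made up for illustration.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"  # hypothetical name
    # with_pip=False keeps this fast and avoids needing ensurepip.
    venv.create(env_dir, with_pip=False)
    cfg_text = (env_dir / "pyvenv.cfg").read_text()
    # pyvenv.cfg is a simple "key = value" file.
    settings = dict(
        line.split(" = ", 1) for line in cfg_text.splitlines() if " = " in line
    )

# "home" is the directory of the base interpreter the venv was created from.
print(settings["home"])
# "false" unless the venv was created with --system-site-packages.
print(settings["include-system-site-packages"])
```

The venv itself only carries its own site-packages plus this small config file; the standard library and the interpreter come from the `home` install.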
Ah alright, thanks, that makes sense.
actually TIL on some platforms venv will not use symlinks by default
btw where did you find the info? I tried to google it but was inundated by "how to use venvs" blog tutorials...
I feel like google results have been getting worse over time but maybe it's just my imagination.
what in particular?
I combined two pieces of information:
- a virtual environment has its own packages -- idk, from using venvs and looking at the venv folder
- pip is a regular package
for the CLI args, I did python -m venv --help
i... have strange ways of learning
I don't think that's strange at all
Maybe I just don't remember the place in the docs where I first saw it, and instead I remember how I last reinforced those pieces of information
I knew pip is just a package, but at the same time I wasn't confident all packages are treated equally because I'm not super familiar with how python packages work in depth.
Though obviously treating them the same is the best approach if you can get it to work
Premature abstraction is as bad as premature optimization.
-Luciano Ramalho
Shut the fuck up!
-Java
though I had never heard "premature abstraction" put to words before.
Not sure if this is the right place, but here it goes:
I know I can find parent classes for an instance of a class by checking __class__.__base__.__base__ etc.
Is there any way I can find children without making a function call? e.g. __subclasses__()
You could do this by adding, to the base class you're interested in, an __init_subclass__ that kept a list of all subclasses to the base class. (You could do the same thing with a metaclass.)
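A minimal sketch of the `__init_subclass__` approach described above. The attribute name `registry` is an assumption picked for illustration, and note that the list keeps strong references to every subclass.

```python
class Base:
    # Hypothetical registry attribute; filled in as subclasses are defined.
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once for every direct or indirect subclass of Base.
        Base.registry.append(cls)


class Child(Base):
    pass


class GrandChild(Child):
    pass


# Plain attribute access, no call needed at lookup time.
print(Base.registry)
# For comparison, __subclasses__() only reports direct subclasses.
print(Base.__subclasses__())
```

Unlike `__subclasses__()`, this registry also picks up indirect subclasses such as `GrandChild`, since the hook is inherited.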
til class definition is actually a function call```py
import dis
dis.dis("""
... class A:
... pass
... """)
2 0 LOAD_BUILD_CLASS
2 LOAD_CONST 0 (<code object A at 0xed52b3c8, file "<dis>", line 2>)
4 LOAD_CONST 1 ('A')
6 MAKE_FUNCTION 0
8 LOAD_CONST 1 ('A')
10 CALL_FUNCTION 2
12 STORE_NAME 0 (A)
14 LOAD_CONST 2 (None)
16 RETURN_VALUE
Disassembly of <code object A at 0xed52b3c8, file "<dis>", line 2>:
2 0 LOAD_NAME 0 (__name__)
2 STORE_NAME 1 (__module__)
4 LOAD_CONST 0 ('A')
6 STORE_NAME 2 (__qualname__)
3 8 LOAD_CONST 1 (None)
10 RETURN_VALUE
It's a function call because it's creating an instance of the metaclass. You can also create classes by directly calling the metaclass, which will inherit from type. Really the whole class syntax is (since the introduction of new-style classes back in Python 2) mostly syntactic sugar around such calls.
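A small sketch of what "directly calling the metaclass" looks like, using the three-argument form `type(name, bases, namespace)`. The names `B`, `C`, `Meta` and `describe` are only illustrative.

```python
class A:
    # Ordinary class statement.
    def describe(self):
        return f"instance of {type(self).__name__}"


def describe(self):
    return f"instance of {type(self).__name__}"


# Roughly the same thing, spelled as a call to the default metaclass.
B = type("B", (), {"describe": describe})


class Meta(type):
    """A trivial custom metaclass, here only for illustration."""


# Any metaclass can be called the same way: name, bases, namespace.
C = Meta("C", (B,), {})

print(A().describe(), B().describe(), C().describe())
print(type(A), type(B), type(C))
```

The class statement does a bit more than this (for example `__prepare__` and setting `__qualname__`), so treat the call form as a close approximation rather than an exact expansion.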
Me neither, but I really disagree with this quote. I find it difficult to over-abstract systems.
The reason you should not prematurely optimize is because it is a waste of time since your work may inevitably be lost if the smallest change is made to the code. Prematurely optimizing code makes it harder to read because your code changes from doing things the obvious way, to the fastest way, and readers of your code now need to grasp the same optimizations that you do.
This is not completely the case with abstractions. While your code tends to grow a bit and ultimately may feel more complicated, everybody wins from the abstractions you make. When the code is later changed you can inevitably take advantage of these abstractions and write simpler code. It is unlikely that you need to rip out everything like may be the case if everything is written without abstractions and tightly coupled.
Writing abstractions is a natural part of developing features. You often do not need to go out of your way or some form of a detour to write abstractions. As part of good practice you typically design around separation of concern and create these abstractions. It's easy to differentiate between an experienced developer and a new one by whether they attempt to abstract upon the work they need to do
The quote sounds fancy which makes it easy to follow as advice and some type of guideline, but ultimately I find it counterproductive
I agree yeah, imo mostly the issue is not enough abstractions rather than too much. Since abstractions take work and planning and it's a lot easier to just hard code some logic.
I do think there's such a thing as "premature abstraction." If there are aspects of your design that aren't fully fleshed out yet, then it might be hard to find the right abstraction. It might be more reasonable to plan to refactor later when you have a better handle on things. That said, I think premature optimization is a much bigger problem than premature abstraction; I suspect premature abstraction is a rarely encountered problem, whereas premature optimization happens all over the place.
"duplication is far cheaper than the wrong abstraction"
It's not really the "amount" of abstraction, but rather inevitably using the wrong abstraction when you only have one instantiation of it at the time.
i.e. you might be abstracting the wrong elements, making the code harder to change in the future, not easier.
and I think godlygeek is referring to Sandi Metz
https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction
wdym
the function call here is more like
__builtins__.__build_class__(classbody, 'A')
@uneven raptor btw are you using sys.getsizeof for pointers.py memory moves? I'd advise against it since the calculations will be mostly meaningless
like this results in an invalid size error even though both tuples have the same allocated memory
from pointers import _
t = (1, 2)
ptr = _&t
ptr <<= (1, 2, 3)
and this segfaults for some reason 
from pointers import _
t = [1, 2, 3]
ptr = _&t
ptr <<= [1, 2]
print(t)
I think there really is such a thing as too much abstraction. In the extreme it leads to https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpriseEdition
yeah, even if the abstraction is "correct" it does have a maintenance cost
I read just now
That python's lexical grammar is not regular
Does that mean it needs something beefier than a DFA to tokenize, or that the tokenization step is skipped completely?
indeed, it's not even context-free. Indentation is complicated.
the way it works is still via a DFA, but with a tiny general syntactic analyzer/turing machine strapped on top to keep track of indentation
Oh, so it's a DFA augmented with a bit of state (because it isn't the stone age)
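A rough sketch of that "bit of state": a stack of indentation widths that turns leading whitespace into INDENT/DEDENT markers. This is a simplification and not CPython's tokenizer; it only counts spaces, skips blank lines, and the function name `indentation_tokens` is made up for illustration.

```python
def indentation_tokens(source):
    """Emit INDENT/DEDENT markers around lines using a stack of widths."""
    stack = [0]  # indentation widths of the currently open blocks
    out = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            out.append("DEDENT")
        if width != stack[-1]:
            raise IndentationError("unindent does not match any outer level")
        out.append(("LINE", line.strip()))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        out.append("DEDENT")
    return out


demo = "if x:\n    y\n    if z:\n        w\nv"
for item in indentation_tokens(demo):
    print(item)
```

The stack is what a plain DFA cannot express: the number of nested levels is unbounded, so the tokenizer has to remember them somewhere outside the automaton.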
Follow up question. I find myself in a bit of a chicken-egg type situation. I'm trying to implement a DFA generator/lexer generator for my own purposes. Do I not need a parser to combobulate my lexical grammar into something the lexer generator can understand?
In other words, I need a parser to configure the lexer that I need for my parser?
is it possible for cpython to reuse the same id later for a new object after the current object is GC'd, or is it guaranteed to be unique per interpreter session?
they're reused, the ids are only unique with overlapping lifetimes
!e
for _ in range(10):
print(id([]))
@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 140273403014336
002 | 140273403014336
003 | 140273403014336
004 | 140273403014336
005 | 140273403014336
006 | 140273403014336
007 | 140273403014336
008 | 140273403014336
009 | 140273403014336
010 | 140273403014336
yeah but that is the same object?
which is fine, I'm just wondering if there's a case where some old object is GC'd and a new object gets the same id
no, that's not the same object
wait how is it not?
Conceptually, every time you do [] you create a new object
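A short sketch of the lifetime rule being described. The only guaranteed part is that two objects alive at the same time have different ids; whether a temporary's id gets reused is an implementation detail, so the last line carries no expected output.

```python
a = []
b = []

# Both lists are alive at the same time, so their ids must differ.
print(a is b, id(a) == id(b))

# Each [] below is a fresh object that dies right after id() returns,
# so the allocator is free to hand out the same address again.
# On CPython this often prints True, but nothing guarantees it.
print(id([]) == id([]))
```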
!e
import random
for _ in range(10):
print(id([random.randint(4, 10)]))
@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 140358553008768
002 | 140358553008768
003 | 140358553008768
004 | 140358553008768
005 | 140358553008768
006 | 140358553008768
007 | 140358553008768
008 | 140358553008768
009 | 140358553008768
010 | 140358553008768
here it's more obvious
I think it may be the same object because of storing them in the free list? But you'll see similar behaviour with other small objects because they'll go through the python managed memory
hm, so practically I'm trying to cache the allocated memory of an object
so I was considering using the id as a key
you can map the id to the object to keep it alive
yeah but I don't really want to keep it alive
I was going to register a weakref.finalize to remove the entry when the object gets GC'd, but most of the builtins don't support weakrefs
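A minimal sketch of that idea, assuming CPython and an object type that supports weak references. The names `size_cache`, `remember` and `Tracked` are hypothetical, and the size `48` is just a placeholder number.

```python
import gc
import weakref

size_cache = {}  # id(obj) -> cached size; hypothetical cache


class Tracked:
    """Plain Python classes support weak references by default."""


def remember(obj, size):
    key = id(obj)
    size_cache[key] = size
    # Drop the entry when obj is collected, so a reused id cannot hit stale data.
    weakref.finalize(obj, size_cache.pop, key, None)


obj = Tracked()
remember(obj, 48)  # placeholder size
print(len(size_cache))

del obj
gc.collect()  # not needed on CPython's refcounting, harmless elsewhere
print(len(size_cache))

# The catch mentioned above: many builtins cannot be weakly referenced.
try:
    weakref.ref([])
    list_supports_weakref = True
except TypeError as exc:
    list_supports_weakref = False
    print(exc)
```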
You can wrap builtins in a wrapper
wait no, that won't really work
or do you know of another way to get the allocated size of a python object ;-;
!e ```py
print([] is [])
print(id([]) == id([]))
@dusk comet :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | False
002 | True
good
keep it that way, for the sake of your own sanity
!e for i in range(10): print(id(object()))
@flat gazelle :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 140512066781536
002 | 140512066781536
003 | 140512066781536
004 | 140512066781536
005 | 140512066781536
006 | 140512066781536
007 | 140512066781536
008 | 140512066781536
009 | 140512066781536
010 | 140512066781536
what should i use instead?
both objects think they own the internal array, so they both try to manipulate it
guys is there a command to make survey in text channel?
type(obj).__basicsize__ rounded up to the nearest 16 (alignment) would be more accurate
ok
sys.getsizeof(somelist) also counts the pointed-to item array, which isn't actually in the struct
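A quick sketch comparing the two numbers. The helper name `struct_size` is made up, the 16-byte alignment is the assumption from the suggestion above, and this ignores `__itemsize__` for variable-sized objects like tuples as well as the GC header.

```python
import sys


def struct_size(obj, alignment=16):
    """__basicsize__ of the object's type, rounded up to the alignment."""
    basic = type(obj).__basicsize__
    return -(-basic // alignment) * alignment  # ceiling division


small = [1, 2]
large = list(range(1000))

# The list struct itself has a fixed size regardless of length.
print(struct_size(small), struct_size(large))

# getsizeof grows with the length because it includes the item array.
print(sys.getsizeof(small), sys.getsizeof(large))
```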
!e
from einspect import view
ls = [1, 2]
with view(ls).unsafe() as v:
v <<= [*range(32)]
print(ls)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
list <-> list copies should be (*mostly) safe, no matter the size (since the struct is always 40 bytes)
the only really unsafe thing is if you copy a non-gc header type into a type that originally had a gc header because the header free will be wrong
interesting
this will probably be done in pointers.py 3
can't really support changing behavior of moves right now
Yes, that's what that function call there is. But ultimately, __build_class__ calls the metaclass. (It has to, or else you wouldn't be able to give classes a metaclass.) It happens at https://github.com/python/cpython/blob/3.11/Python/bltinmodule.c#L209
Python/bltinmodule.c line 209
cls = PyObject_VectorcallDict(meta, margs, 3, mkw);```
Oh, maybe I'm confused as to what your TIL actually was. Yes, I guess it's not obvious that class statement gets rewritten as a function call to __build_class__. You could imagine that Python worked in some other way (and indeed, in the past, it did) where the work of __build_class__ happened internally, in something that wasn't exposed.
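One way to see the rewrite in action on CPython: temporarily wrap `builtins.__build_class__` and watch a class statement go through it. This is a sketch for poking around, not something to do in real code; `logging_build_class` and `calls` are illustrative names.

```python
import builtins

calls = []
original_build_class = builtins.__build_class__


def logging_build_class(func, name, *bases, **kwargs):
    # Record what the class statement passed in, then defer to the real thing.
    calls.append((name, bases, kwargs.get("metaclass", type)))
    return original_build_class(func, name, *bases, **kwargs)


class Meta(type):
    pass


builtins.__build_class__ = logging_build_class
try:
    class A(metaclass=Meta):
        pass
finally:
    # Always restore the original hook.
    builtins.__build_class__ = original_build_class

print(calls)
print(type(A) is Meta)
```

The class body arrives as the function `func`, and the `metaclass` keyword is forwarded as-is, which matches the point that `__build_class__` ends up calling the metaclass.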
i mean why isn't that a BUILD_CLASS instruction but a function call
The answer is really https://mail.python.org/pipermail/python-3000/2007-March/006338.html. The point is that the syntax class A(*args, **kwargs): strongly resembles a function call, so it made sense to turn it into an actual function call.
Hey internals people! Question for you
How do I parse a lexical grammar without a parsing infrastructure already in place, and without leveraging any existing (modern) string manipulation infrastructure?
I know this isn't exactly "on topic", but you all are definitely the people to ask
@deep nova what are you including in "modern string manipulation infrastructure"?
The built in regex engine, for one
and does "lexical" mean you are tokenizing, or parsing?
Replace, find, and split methods also. I'd prefer to do things in a single pass
For tokenizing.
TL;DR: I need to parse the lexical grammar to build the lexer that I need to build the parser.
And I want to put myself in the headspace of the early coder, who had no such tools
i guess you start writing a bunch of if statements with an index into the string that you increment as you go along.
So, the direct approach
Wrap my teeth around the grammars jugular, and shake until its dead kind of thing XD
@deep nova my guess is that any "indirect approach" would also count as "using modern infrastructure"
just code the lexical parser by hand
I'm not 100% sure, but
I'd guess that building a DFA of that size by hand would be... painful
it's not as bad as you think, but it is a whole days work more or less.
it's not that bad, I have handwritten a SQL tokenizer before. https://github.com/JelleZijlstra/sqltree/blob/main/sqltree/tokenizer.py
doesn't need to build a giant dfa
Hmmmmmmmmmm 
the most inefficient method is to write a bunch of functions, each of which match a particular token and will either return a Token object (if success) or False (if failure)
then a tokenizer just executes each of the functions until it finds a success
and keep going until end of source string
@deep nova
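A minimal sketch of that "one function per token" approach, with no regex, `find`, `split` or `replace`. The token kinds and the tiny operator set are assumptions picked for the demo.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    text: str


def match_number(src, pos):
    end = pos
    while end < len(src) and src[end].isdigit():
        end += 1
    return Token("NUMBER", src[pos:end]) if end > pos else False


def match_name(src, pos):
    end = pos
    while end < len(src) and (src[end].isalpha() or src[end] == "_"):
        end += 1
    return Token("NAME", src[pos:end]) if end > pos else False


def match_op(src, pos):
    return Token("OP", src[pos]) if src[pos] in "+-*/()=" else False


MATCHERS = [match_number, match_name, match_op]


def tokenize(src):
    pos = 0
    tokens = []
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        # Try each matcher in turn until one succeeds.
        for matcher in MATCHERS:
            token = matcher(src, pos)
            if token:
                tokens.append(token)
                pos += len(token.text)
                break
        else:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
    return tokens


print(tokenize("x = 40 + 2"))
```

It is slow compared to a generated DFA since every position may try every matcher, but it stays a single left-to-right pass and is easy to extend one token at a time.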
funny, i was going to share a hand-written SQL parser I did. why is SQL the hand-crafted thing!?
why didn't you use regex for lexing?
it seemed harder
I was recently bringing myself up to speed on typing PEPs. I came across this statement from the core team on the from __future__ import annotations https://mail.python.org/archives/list/python-dev@python.org/message/VIZEBX5EYMSYIJNDBF6DMUMZOCWHARSO/
with references to competing peps... does anybody know the current state of this discussion?
PEP 649 is provisionally accepted
I hadn't heard about that, that's great news. I'm happy to hear PEP 563 functionality will be removed.
When did this happen
Same. 649 opens up to a lot of interesting uses
very recently, a few weeks ago
Personally, I'm just happy my uses of annotations won't stop working in some future Python version. :D
Ah, still says draft
i do wish to see stringly annotations DEAD
hm, so with this, could a class variable annotation refer to the class itself, during checking __annotations__ from a class decorator?
I think the answer is no if you use the @decorator syntax, yes if you use the cls = decorator(cls) syntax. Because the class's name isn't defined in the globals until after the decorator runs, in the first case.
hm, is there still a way to get string annotations then?
since I assume accessing __annotations__ at that point will NameError?
or does it fall back to strings
you can always continue to use string annotations explicitly - instance: "SomeType" will still be legal
hm
also assuming this is added as from __future__ import co_annotations in 3.12, the old from __future__ import annotations will still be a thing?
for a while, but eventually deprecated.
probably deprecated in 3.12 and removed in 3.14
so we now need to choose between working with 3.7 - 3.11 with the annotations future, but possibly being deprecated, or only working with 3.12+?
the future annotations import was already annoying enough but now almost every single annotated code base will face a deprecation of this promised future import?
imo it really should just be switched over if it's compatible
explicit string annotations are legal from 3.7 onwards, so there is a way to write the code that's legal and has the same semantics for every version
right but for the last 4-5 years people have been using future annotations and implicit string hints
my point is that most code will still work in the range 3.7-3.14+, it's just the future import that will be deprecated
so if I don't want to use explicit string annotations and I want deferred eval, I have to choose between <3.12 compat or possibly breaking in 3.14
has there ever been the precedent of a future import being deprecated?
no, this will be the first.
hm, what if on 3.14 we just redirect from __future__ import annotations to co_annotations implicitly as an alias
that's not too crazy right
we'll provide some workarounds for this (e.g. passing an explicit namespace)
so code bases that never were able to switch that import still have a chance to work with the new deferred eval
it's essential that this works ergonomically because recursive dataclasses are a common pattern
interesting
my current solution is adding {cls.__name__: cls} as the locals arg to get_type_hints in the decorator https://github.com/ionite34/einspect/blob/main/src/einspect/structs/deco.py#L63-L65
src/einspect/structs/deco.py lines 63 to 65
hint_locals = {cls.__name__: cls}
try:
hints = get_type_hints(cls, None, hint_locals, include_extras=True)```
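A standalone sketch of the same workaround, not einspect's actual code: a class decorator that resolves a self-referential string annotation by passing the class in as `localns`. The decorator name `resolve_hints` and the attribute `resolved_hints` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, get_type_hints


def resolve_hints(cls):
    # At decoration time the class name is not bound in the module yet,
    # so supply it explicitly as a local name.
    hint_locals = {cls.__name__: cls}
    cls.resolved_hints = get_type_hints(cls, None, hint_locals)
    return cls


@resolve_hints
@dataclass
class Node:
    value: int
    next: Optional["Node"] = None  # explicit string annotation


print(Node.resolved_hints)
```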
..
Just curious why happy for 563 to be removed?
The whole drama with delayed type annotations and 563 and pydantic I found a bit odd
I use annotations for non-typing purposes. So I have existing code that would break if annotations would suddenly turn into strings.
Also I think it was an ugly solution.
I doubt that, I think it will stay around for much longer, perhaps until 3.11 EOL.
hm. Has that been discussed somewhere? One of the rationales given for PEP 649 is that the automatic stringification of annotations is hard for other implementations to support. Supporting it indefinitely seems to undercut the PEP.
https://discuss.python.org/t/pep-649-deferred-evaluation-of-annotations-tentatively-accepted/21331/3
I think we shouldn't emit DeprecationWarning for PEP 563, at least 3 releases. Library authors can not use PEP 649 until they drop Python 3.11 support. PEP 563 is the only efficient and convenient way to use type annotations. I think it should be syntax error. Using PEP 649 means that source code doesn't support Python 3.11. Allowing PEP 563...
hm, interesting. That does seem to undercut the PEP, if both will be supported for a very long time. Though I can see the rationale, I do think that's an unfortunate tradeoff...
I do find from __future__ import co_annotations very confusing
Yeah, deferred_annotations or something like that seems like a better name...
will there be a python 4
God I hope not
why
Remember the transition from Python 2 to 3? It wasn't fun. Don't want that again.
I wonder if the whole mess with annotations makes them wish it had been designed exclusively for static typing
It may have seemed clever allowing it to be used for various purposes but it feels like it backfired
well, it was initially developed as a generic feature
!pep 3107
I assume some stuff changed after 16 years
yeah
the thing is that static type annotations are an important enough feature to deserve their own syntax, that isn't shared with anything
i think many people, myself included, basically saw annotations that way, so from my perspective it's unfortunate to see annotations for typing weakened for these other use cases
I can't remember as it was a while ago but I remember when I was considering whether to use dataclasses, attrs or pydantic for something in my codebase, and finding pydantic's design very strange, and not liking it. The weird use of annotations was a big part of it IIRC.
I think using annotations for serialization schemas and such is pretty clever, and saves quite a lot of nasty boilerplate.
Although, as always, TypeScript offers a better solution.
well, runtime inference of type hints may still be part of static, compiled languages, like Rust
though in that case it's compile-time obviously
which is not, exactly how that works either but the main point is type hints being able to affect behavior of casts in nested functions and library code
Are there any PEPs/discussions out there about a replacement for the descriptor/classmethod decorator combo? It's something I ran into recently while looking at how to make a class level property, and I saw that it was discouraged and outright disallowed in 3.11. I've been warned heavily against using metaclasses to achieve the same result, so I was curious if there were any proposals for something that would bring the same functionality with a safer method.
src/einspect/structs/py_object.py lines 150 to 156
@bind_api(pythonapi["PyObject_GetAttr"])
def GetAttr(self, name: str) -> object:
"""Return the attribute of the PyObject."""
@bind_api(pythonapi["PyObject_SetAttr"])
def SetAttr(self, name: str, value: object) -> int:
"""Set the attribute of the PyObject."""```
I use them here for example, works fine in 3.11
the GetAttr function becomes a descriptor on the class, and when accessed at class level I return an unbound function, and when accessed in the instance I return a bound function
src/einspect/protocols/delayed_bind.py line 37
class delayed_bind(property):```
`src/einspect/protocols/delayed_bind.py` line 101
```py
def __get__(self, instance: object | None, owner_cls: type[_CT]) -> _F:```
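For the class-level property question, a minimal sketch of a hand-rolled descriptor, assuming read-only access is enough. This is a generic pattern and not einspect's `delayed_bind`; the names `classproperty` and `Config` are illustrative.

```python
class classproperty:
    """Read-only property computed from the class rather than the instance."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # instance is None for class-level access (Config.name).
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class Config:
    _name = "default"

    @classproperty
    def name(cls):
        return cls._name


print(Config.name)
print(Config().name)
```

Because `__get__` receives the owner class either way, this works without chaining `classmethod` and `property`. Setting `Config.name = ...` would simply replace the descriptor, so a writable version still needs a metaclass.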
I apologize, my understanding of this is shaky. This was the discussion I ran into while trying to figure this out
https://github.com/python/cpython/issues/89519
i would check support in 3.12, some of the more abstract decorator usages were disallowed due to difficulty to implement/maintain
hm 
(like descriptor chaining has been removed to a degree i think)
welp uh
looks like I might have other issues with 3.12
Looks like you corrupted some GC header
I'm pretty sure the structure of the header changed in 3.12
What do you mean by runtime inference of type hints?
Idk, I feel like serialization etc worked fine with attrs which uses type annotations statically only afaik
Maybe you have a specific example in mind
that's probably a wrong way of saying it but essentially allowing nested generics to infer types from hints
also what do you mean "type annotations only statically"?
for attrs?
In rust
type inference is compile time, yes
but we don't really get that in python so it sort of has to be runtime
Not really, when you run python code through mypy etc nothing is being executed
unless we also get a dynamic compile time phase in which all types are forced to be known before runtime
It's still static
mypy is a linter essentially, it doesn't interact with the resulting code
Sure
The point is that all annotations can be understood without executing anything
As opposed to annotations where that's just not possible, you have to execute code to evaluate the annotation
it's extremely unlikely python will switch to static typing, so runtime inference of type hints becomes a necessity
I think I don't understand the problem. If you want to use static type checking, just only use "pure", for lack of a better word, types, and use Pydantic-style annotations when you want to have runtime effects of the hints?
I meant parsing*
I guess the problem was that it had a big impact on some of the proposals around annotations, no?
Do you have an example?
Which proposals are you talking about?
from dataclass_factory import Factory # replace with any other similar library, e.g. Pydantic
df = Factory()
@dataclass
class Family:
uid: str
users: list[User]
@dataclass
class User:
name: str
age: int
family = df.load(some_thing.get_json(), Family)
i think clearly the whole 563 and 649 right?
re this: TypeScript does the opposite thing and lets you infer the type of the resulting object from the structure. e.g. here:
const user = (
isObj()
.field('name', isStr())
.field('age', isInt())
)
``` TS can infer that `user` is `Schema<{name: string, age: bigint}>`
IMO this makes a lot of sense
649 solves the problem cleanly though, doesn't it? I always thought 563 was ugly.
How is this an example of dynamic things showing up in type annotations?
that seems entirely static to me
I'm not entirely sure what you're talking about then 
this is exactly the kind of code I write with dataclasses, or that I could have written with attrs
Are you familiar with the whole 563 + pydantic drama?
pydantic puts things in the annotations that can only be evaluated dynamically, afair
which interacted very poorly with 563
ah, like Annotated?..
I'm not totally clear on that, I read a blog post about it, apparently there are still some nasty issues
As the author of PEP 563, I can summarize my position as follows: had PEP 563 never been implemented, it would be easy to accept PEP 649. However, in the current situation it's not that clear because I find it very important to allow writing a single codebase that works on Python 3.7 - 3.11. If we can secure this behavior, I'm +1 to accepting PE...
let's just switch to Rust btw
well, side effect of pythons type system, some types literally cannot be known before runtime
No, not really
you can choose to only put statically known things in type annotations.
?
For people who are using type annotations purely for the purposes of things like mypy, that's what you do
you can still, yeah. But it's not forced, just like any annotations
Yes, but if you're using annotations in the most common way
i.e. for the benefit of tooling
then it serves no purpose to put something dynamic in there
it was always dynamic
I like Rust a lot, but then I'm coming at it from the angle of what to write instead of C++.
if you're okay with the performance of python there's a lot of things I'd look at first.
563 made it delayed
well that was a bit tongue in cheek
but before that annotations were always evaluated at define time
ah to be fair a lot of people say that unironically
yeah, there's no argument there
563 was clearly a breaking change. but remember this convo started by you responding to my comment
I wonder if the whole mess with annotations make them wish it has been designed exclusively for static typing
also, difficult to say at this point if the most common use of annotations is for mypy and not for runtime libraries
if we're judging by amount of end users and not libraries
If type annotations had been designed exclusively for static typing then clearly they wouldn't have accepted anything dynamic, and this headache would have been avoided
Err, no, I don't think it's that difficult to say
Rust is much more complicated than Python, I don't think you can really substitute one for another
Most estimates are that something like half of professionally written python, at least, are using static annotations extensively
the tooling ecosystem has massive dependence on them
pycharm and vscode, if you don't have type annotations, the quality of everything the editor can do drops through the window
at this point, the static typing aspects of python go far, far beyond the impact of any one library or handful of libraries
Source?
I can't honestly remember
in any case everyone already lived with 563 for years now, I don't see how 649 makes mypy static checking any worse
just curious, does that not align with your experience?
are people writing non legacy codebases of python, over e.g. 10K lines, without using static type annotations?
I work in a Python-only company, there's almost no static typing used, including in new code.
(What we do use is Pydantic)
nobody lived with 563 because it wasn't accepted
hm? it was, that's how we got from future import annotations
fair enough. I did a quick google, according to the PSF, in a 2021 survey, static typing and "strict type hints" were the most desired feature
but it wasn't accepted into the 3.10 release? It's never been default behavior.
it was a tentative thing, that you had to explicitly opt into
most "statically typed" code bases you talk about use the future import, and as such use 563
Anyway, you should read the blog post i linked (or I thought I linked?), it goes through a bit of the situation
I never did
I've (mostly) read the blog post. I see that there are still some issues, but still see 649 as vastly superior to 563.
this thing?
no
(PEP-563 was accepted. The eventual making-standard of the future import was just delayed and eventually scrapped.)
did 563 break anything in any case?
.... yes
this whole convo was supposed to be about 563 breaking pydantic
which is why it didn't become "default" or whatever the technical term is
simply because it still enables dynamic functionality for things like pydantic, no?
or is there another reason why it's superior?
well, no more eval, so much faster
Beauty.
also allows class attribute access
are you sure this is accurate? I'm under the impression that 563 is faster.
"faster" isn't a precise term. What things are faster?
sounds like @warm breach is referring to the speed of evaluating annotations (e.g. get_type_hints())
PEP 563 is likely faster for importing because there is less work to do at import time
sure. but if you're mostly concerned with the static typing use case, and clearly 563 is all about that, then 563 is faster, no?
can you give an example?
the blog post by the author of 563 I linked above says there are exactly two cases where 649 helps:
- when you have a locally defined class
- when you have local imports
this is the example on the 649 pep
from __future__ import co_annotations
class C:
    def method(a: mytype):
        pass
    mytype = str
print(C.method.__annotations__)
!e whereas currently:
from __future__ import annotations
from typing import get_type_hints
class C:
    def method(a: mytype):
        pass
    mytype = str
print(get_type_hints(C.method))
@warm breach :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 9, in <module>
003 | File "/usr/local/lib/python3.11/typing.py", line 2339, in get_type_hints
004 | hints[name] = _eval_type(value, globalns, localns)
005 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
006 | File "/usr/local/lib/python3.11/typing.py", line 359, in _eval_type
007 | return t._evaluate(globalns, localns, recursive_guard)
008 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
009 | File "/usr/local/lib/python3.11/typing.py", line 854, in _evaluate
010 | eval(self.__forward_code__, globalns, localns),
011 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/ixowutapah.txt?noredirect
okay, seems like this is equivalent to "locally defined class", just phrased differently
although it's obviously more annoying
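A minimal sketch of the failure above and one possible workaround. It uses an explicit string annotation as a stand-in for what `from __future__ import annotations` stores, so that it runs without the future import. Passing the class namespace as `localns` is an assumption about a reasonable fix here, not something proposed in this conversation.

```python
from typing import get_type_hints

class C:
    # The quoted annotation mimics PEP 563: it is stored as a plain string.
    def method(a: "mytype"):
        pass
    mytype = str

# The raw annotation is still the unevaluated string.
raw = C.method.__annotations__

# get_type_hints() only looks in the function's globals by default,
# so a name defined in the class body cannot be resolved.
try:
    get_type_hints(C.method)
    failed = False
except NameError:
    failed = True

# Supplying the class namespace as locals lets the string be evaluated.
hints = get_type_hints(C.method, localns=dict(vars(C)))
print(raw, failed, hints)
```

The caller has to know which namespace the name lives in, which is the "more annoying" part.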
Include/cpython/unicodeobject.h line 158
unsigned int interned:2;
Include/cpython/unicodeobject.h line 103
unsigned int interned:1;
so apparently it's because PyUnicodeObject.interned got changed from width 2 to 1
since SSTATE_INTERNED_IMMORTAL was removed
and so all my offsets after that are wrong in 3.12
ah that'll do it
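A rough sketch of why a one-bit change in a bit-field width breaks everything after it. The two structures below are hypothetical, cut-down stand-ins for the state flags (not the real `PyUnicodeObject` layout), and exact bit placement depends on the platform's bit-field ordering.

```python
import ctypes

class StateWide(ctypes.Structure):
    # Hypothetical layout with a 2-bit "interned" field.
    _fields_ = [
        ("interned", ctypes.c_uint, 2),
        ("kind", ctypes.c_uint, 3),
    ]

class StateNarrow(ctypes.Structure):
    # Same fields, but "interned" is only 1 bit wide.
    _fields_ = [
        ("interned", ctypes.c_uint, 1),
        ("kind", ctypes.c_uint, 3),
    ]

# Write values using the narrow layout...
narrow = StateNarrow(interned=1, kind=4)

# ...then reinterpret the same bytes using the wide layout.
wide = StateWide.from_buffer_copy(bytes(narrow))

# Both structs have the same size, but "kind" now starts at a
# different bit, so the stale layout reads a different value.
print(ctypes.sizeof(StateNarrow), ctypes.sizeof(StateWide))
print(narrow.kind, wide.kind)
```

Nothing raises here: the stale layout just silently reads the wrong bits.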
that's probably not the only structure that changed
apparently it's just the unicode so far, that or I'm not testing the other stuff enough
what's the goal of enforcing type hints when there aren't any?
so besides the interned bit width, the wstr fields were removed
  @struct
  class PyUnicodeObject(PyObject):
      length: int
      hash: Annotated[int, c_int64]
-     _interned: Annotated[int, c_uint, 2]
+     _interned: Annotated[int, c_uint, 1]
      _kind: Annotated[int, c_uint, 3]
      compact: Annotated[int, c_uint, 1]
      ascii: Annotated[int, c_uint, 1]
      padding: Annotated[int, c_uint, 26]
-     wstr: POINTER(ctypes.c_wchar)
      # Fields after this do not exist if ascii
      utf8_length: int
      utf8: ctypes.c_char_p
-     wstr_length: int
      # Fields after this do not exist if compact
      data: LegacyUnion
so after that, and besides some enum deprecation warning, it seems to be working on 3.12.0a3
i would still go through and verify all of your structures to avoid edge cases (or we could work on writing a python version source parser to generate the structs at install time)
yeah we could
https://github.com/trolldbois/ctypeslib this seems to already do a bunch of similar stuff
though just parsing header files to the usual _fields_ style instead of type hints
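For reference, a small sketch of how the two styles relate: type hints of the form `Annotated[int, ctype, bit_width]` can be turned into a `_fields_` list mechanically. `fields_from_hints` and `UnicodeState` are hypothetical names made up for illustration. This is not how einspect's `@struct` or ctypeslib is actually implemented.

```python
import ctypes
from typing import Annotated, get_args, get_origin, get_type_hints

def fields_from_hints(cls):
    """Hypothetical helper: build a ctypes _fields_ list from annotations."""
    fields = []
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # Annotated[int, c_uint, 2] -> (python type, ctype, optional width)
            _, ctype, *width = get_args(hint)
            fields.append((name, ctype, *width))
        else:
            # Assumption: a bare hint is already a ctypes type.
            fields.append((name, hint))
    return fields

class UnicodeState:
    # Cut-down example using the bit widths from the diff above.
    interned: Annotated[int, ctypes.c_uint, 1]
    kind: Annotated[int, ctypes.c_uint, 3]
    compact: Annotated[int, ctypes.c_uint, 1]
    ascii: Annotated[int, ctypes.c_uint, 1]
    padding: Annotated[int, ctypes.c_uint, 26]

# Build a real ctypes.Structure subclass from the generated field list.
State = type(
    "State",
    (ctypes.Structure,),
    {"_fields_": fields_from_hints(UnicodeState)},
)
print(State._fields_)
```

Going the other way (generated `_fields_` tuples back to type hints) would be the same mapping in reverse.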
and uses clang LLVM apparently..?
I could parse the output from that I guess and generate my file, or just hook into some modified version of that
maybe