pliant tusk Jan 17, 2023, 6:08 PM

#

ah interdependencies

grave jolt Jan 17, 2023, 6:10 PM

#

headers 🥴

#

probably the feature of C that aged very poorly

warm breach Jan 17, 2023, 6:27 PM

#

wait

#

what if I make an api call and just ask GPT3 to translate it

#

👀

halcyon trail Jan 17, 2023, 6:32 PM

#

grave jolt probably _the_ feature of C that aged very poorly

forgetting a closing brace in a header in some C++ code is truly one of the great programming experiences

feral island Jan 17, 2023, 6:34 PM

#

what if Python imports also worked by literally copying in the code from the imported module

pliant tusk Jan 17, 2023, 6:35 PM

#

warm breach what if I make an api call and just ask GPT3 to translate it

oh don't give GPT3 a free rce on your users machines lmao

halcyon trail Jan 17, 2023, 6:36 PM

#

feral island what if Python imports also worked by literally copying in the code from the imp...

i'd be a very sad panda

pliant tusk Jan 17, 2023, 6:39 PM

#

halcyon trail i'd be a very sad panda

```py
import preprocess
#include "ns.py"

NS_BEGIN(foo)
x = 1
y = 2
z = 3
NS_END

print(foo)
```
i mean you can do it if you want
`preprocess.py`
```py
import sys, dis, subprocess
frame = sys._getframe()
while frame := frame.f_back:
    f_code = frame.f_code
    if f_code.co_code[frame.f_lasti] == dis.opmap['IMPORT_NAME'] \
        and f_code.co_names[f_code.co_code[frame.f_lasti + 1]] == 'preprocess':
        file = frame.f_globals['__file__']
        if file:
            processed = subprocess.run(['gcc', '-E','-x','c', file], stdout=subprocess.PIPE)
            if processed.returncode == 0:
                exec(processed.stdout.decode())
            exit(processed.returncode)
```
`ns.py`
```py
#define NS_BEGIN(name) __import__("builtins").__ns_globals__ = globals().copy(); globals().clear(); __ns_name__ = #name ;
#define NS_END __ns__=__import__("types").SimpleNamespace(**{k: v for k, v in globals().items() if k != "__ns_name__"}); __import__("builtins").__ns_globals__[__ns_name__]=__ns__; globals().clear(); globals().update(__import__("builtins").__ns_globals__)
```

#

note that this is a very bad implementation lmao

halcyon trail Jan 17, 2023, 7:06 PM

#

Sure, but do it if you want is different from import working that way

quick snow Jan 17, 2023, 7:33 PM

#

there are type hints (when using Pydantic), but we don't do static typing

sinful saddle Jan 17, 2023, 8:07 PM

#

Is there any guidance on naming conventions for dictionary keys specifically on the use of spaces?

grave jolt Jan 17, 2023, 8:10 PM

#

sinful saddle Is there any guidance on naming conventions for dictionary keys specifically on ...

What do you mean?

#

IIRC PEP 8 prescribes this:
```py
foo = {
    "fizz": 1,
    "buzz": 2,
}
```

sinful saddle Jan 17, 2023, 8:11 PM

#

I mean on the use of spaces within dictionary keys.

flat gazelle Jan 17, 2023, 8:12 PM

#

sinful saddle Is there any guidance on naming conventions for dictionary keys specifically on ...

the convention is to use a class with attributes instead. If you are using a dict, it generally means the schema is defined elsewhere and you should follow that convention.

grave jolt Jan 17, 2023, 8:12 PM

#

sinful saddle I mean on the use of spaces within dictionary keys.

example maybe?

#

I don't see why a dictionary key can't contain a space

#

but if you're storing some fixed attributes, yeah, use a class

#

(maybe a dataclasses.dataclass or see attrs, if you want a dumb record)
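
A rough sketch of the "use a class" advice, assuming a made-up `Person` record (the name and fields are illustrative only). Attribute names have to be valid identifiers, so the spaces question only comes up when converting to an external schema:

```python
from dataclasses import asdict, dataclass


@dataclass
class Person:
    # Attribute names must be valid identifiers, so no spaces here.
    first_name: str
    last_name: str


p = Person(first_name="Ada", last_name="Lovelace")
print(p.first_name)

# If an external schema really needs keys with spaces, convert at the boundary.
payload = {name.replace("_", " "): value for name, value in asdict(p).items()}
print(payload)
```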

flat gazelle Jan 17, 2023, 8:14 PM

#

I would say the python convention is more on not having spaces in keys (e.g. a TypedDict can't describe a dict with spaces in its keys afaik)

feral island Jan 17, 2023, 8:15 PM

#

flat gazelle I would say the python convention is more on not having spaces in keys (e.g. a ...

it can but you need to use the alternative syntax
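
For reference, a minimal sketch of that alternative (functional) `TypedDict` syntax. The `Movie` name and its keys are invented for the example:

```python
from typing import TypedDict

# Functional syntax: keys are plain strings, so they may contain spaces.
Movie = TypedDict("Movie", {"release year": int, "full title": str})

m: Movie = {"release year": 1999, "full title": "The Matrix"}
print(sorted(Movie.__annotations__))
```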

feral island Jan 17, 2023, 8:15 PM

#

sinful saddle Is there any guidance on naming conventions for dictionary keys specifically on ...

it depends on what your dictionary is for

#

but as other people here have said, often a class with attributes is a better choice than a dictionary with hardcoded keys

gray galleon Jan 18, 2023, 2:52 AM

#

why are True, False and None capitalized if they aren't classes

rich cradle Jan 18, 2023, 2:55 AM

#

for the former two, the PEP has some rationale: https://peps.python.org/pep-0285/#resolved-issues (second bullet, as well as assorted comments throughout the PEP)


gray galleon Jan 18, 2023, 2:59 AM

#

what, bool type was introduced that late?

#

anyways True and False is capitalized because of None
and None is capitalized because of guido i guess

warm breach Jan 18, 2023, 3:57 AM

#

pliant tusk ah interdependencies

gave up on clang, gonna parse with re instead 🥴

#

re.compile('struct _object {.*?};', re.DOTALL)
<re.Match object; span=(4049, 4146), match='struct _object {\n    _PyObject_HEAD_EXTRA\n    P>

struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
};

pliant tusk Jan 18, 2023, 3:57 AM

#

Oh, that will never work

#

Too many nested structs, or structs that contain fields built by the preprocessor
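
A small sketch of why the non-greedy regex approach breaks on nested definitions. The `struct outer` inputs are made-up test strings, not CPython source: the lazy `.*?` stops at the first `};`, which belongs to the inner struct.

```python
import re

# Same shape as the pattern in the chat, applied to a hypothetical struct name.
pattern = re.compile(r"struct outer \{(.*?)\};", re.DOTALL)

flat = "struct outer {\n    int a;\n    int b;\n};"
nested = "struct outer {\n    struct {\n        int x;\n    };\n    int b;\n};"

flat_body = pattern.search(flat).group(1)
nested_body = pattern.search(nested).group(1)

print("int b" in flat_body)    # the flat struct is captured whole
print("int b" in nested_body)  # the nested one is cut off at the inner "};"
```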

warm breach Jan 18, 2023, 4:21 AM

#

pliant tusk Oh, that will never work

trying to get that ctypeslib thing to work again, but it can't find the pyconfig.h somehow (which is in the current root path)

❯ clang2py --clang-args="-stdlib=libc++ -I -I. -IInclude" "Objects/dictobject.c"
WARNING:clangparser:'pyconfig.h' file not found (Objects/dictobject.c:12:10)
WARNING:clangparser:Source code has 1 error. Please fix.

#

do you know if I'm doing the include args wrong for clang 😔

pliant tusk Jan 18, 2023, 4:31 AM

#

You probably need to find a way to integrate ctypeslib with pythons makefile

warm breach Jan 18, 2023, 5:12 AM

#

pliant tusk Too many nested structs, or structs that contain fields built by the preprocesso...

so I've got it parsing the cpython source and my own class def

#

at least I might be able to add it to testing and see if they ever mismatch

#

probably not good enough to generate anything yet though

pliant tusk Jan 18, 2023, 5:12 AM

#

warm breach so I've got it parsing the cpython source and my own class def

Nice, lmk when it is on einspects GitHub and I'll check it out

warm breach Jan 18, 2023, 5:13 AM

#

https://paste.pythondiscord.com/buwimiyeqo

#

here's the craziness 🥴

#

goes in a tools/ folder at root

#

and the config file tools/struct_source.toml

[config]
repository = "https://github.com/python/cpython"
versions = ["3.8", "3.9", "3.10", "3.11"]

[structs.py_object.PyObject]
source = "Include/object.h"
regex = "struct _object {(.*?)};"
exclude_fields = ["_PyObject_HEAD_EXTRA"]

pliant tusk Jan 18, 2023, 5:16 AM

#

warm breach and the config file `tools/struct_source.toml` ```toml [config] repository = "ht...

You need to run the preprocessor before extracting the structs, that will clear out macros and then you are just dealing with C code

warm breach Jan 18, 2023, 5:19 AM

#

pliant tusk You need to run the preprocessor before extracting the structs, that will clear ...

I get this https://paste.pythondiscord.com/edoneqahiz from gcc -E Include/object.h -o out.h

#

I guess it's mildly cleaner to parse

pliant tusk Jan 18, 2023, 5:20 AM

#

warm breach I guess it's mildly cleaner to parse

Yea would just need to clean out comments and empty lines
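
A minimal sketch of that cleanup step, assuming the input is text produced by `gcc -E`. As far as I know the preprocessor already drops comments unless `-C` is passed, so what is left to remove is mostly linemarkers (lines starting with `#`) and blank lines. The helper name and the sample text are made up:

```python
def strip_preprocessor_noise(text: str) -> str:
    """Drop linemarkers (lines starting with '#') and blank lines."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept)


# Hand-written sample in the style of preprocessor output.
sample = """# 1 "Include/object.h"
# 1 "<built-in>"

struct _object {
    Py_ssize_t ob_refcnt;

    PyTypeObject *ob_type;
};
"""

cleaned = strip_preprocessor_noise(sample)
print(cleaned)
```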

warm breach Jan 18, 2023, 5:20 AM

#

maybe I can define some "core" types like Py_ssize_t and then recursively resolve the types myself? 🥴

pliant tusk Jan 18, 2023, 5:21 AM

#

warm breach maybe I can define some "core" types like `Py_ssize_t` and then recursively reso...

You could even parse the typedefs to detect function definitions

warm breach Jan 18, 2023, 5:24 AM

#

pliant tusk You could even parse the typedefs to detect function definitions

Yeah structure wise, the mapping should be quite simple

typedef void (*freefunc)(void *);
typedef void (*destructor)(PyObject *);
typedef PyObject *(*getattrfunc)(PyObject *, char *);
typedef PyObject *(*getattrofunc)(PyObject *, PyObject *);

#

https://github.com/ionite34/einspect/blob/main/src/einspect/structs/include/object_h.py#L70-L73

fallen slateBOT Jan 18, 2023, 5:24 AM

#

src/einspect/structs/include/object_h.py lines 70 to 73

```py
freefunc = PYFUNCTYPE(None, c_void_p)
destructor = PYFUNCTYPE(None, py_object)
getattrfunc = PYFUNCTYPE(py_object, py_object, c_char_p)
getattrofunc = PYFUNCTYPE(py_object, py_object, py_object)
```
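
A rough sketch of how the typedef-to-`PYFUNCTYPE` mapping could be automated for simple cases. The `CORE_TYPES` table, the regex and the helper names are assumptions for illustration, not einspect code, and the regex only handles plain function-pointer typedefs like the four above:

```python
import re
from ctypes import PYFUNCTYPE, c_char_p, c_void_p, py_object

# Hypothetical table of "core" C types; a real tool would need many more.
CORE_TYPES = {
    "void": None,
    "void *": c_void_p,
    "char *": c_char_p,
    "PyObject *": py_object,
}

TYPEDEF_RE = re.compile(
    r"typedef\s+(?P<ret>.+?)\(\*(?P<name>\w+)\)\((?P<args>.*?)\);"
)


def normalize(c_type: str) -> str:
    # "PyObject*" and "PyObject  *" both become "PyObject *".
    return " ".join(c_type.replace("*", " * ").split())


def parse_func_typedef(line: str):
    match = TYPEDEF_RE.search(line)
    if match is None:
        raise ValueError(f"not a function-pointer typedef: {line!r}")
    restype = CORE_TYPES[normalize(match["ret"])]
    argtypes = [CORE_TYPES[normalize(arg)] for arg in match["args"].split(",")]
    return match["name"], PYFUNCTYPE(restype, *argtypes)


name, proto = parse_func_typedef(
    "typedef PyObject *(*getattrfunc)(PyObject *, char *);"
)
print(name, proto._restype_, proto._argtypes_)
```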

pliant tusk Jan 18, 2023, 5:26 AM

#

warm breach Yeah structure wise, the mapping should be quite simple ```c typedef void (*free...

You should see if there is some static C type parser that breaks them down into just type data

sacred yew Jan 18, 2023, 5:41 AM

#

isn't https://github.com/eliben/pycparser a thing?

GitHub

GitHub - eliben/pycparser: Complete C99 parser in pure Python

:snake: Complete C99 parser in pure Python. Contribute to eliben/pycparser development by creating an account on GitHub.

warm breach Jan 18, 2023, 5:56 AM

#

sacred yew isn't https://github.com/eliben/pycparser a thing?

😔

ParseError: object.h:1:1: Directives not supported yet

sacred yew Jan 18, 2023, 6:09 AM

#

warm breach 😔 ```py ParseError: object.h:1:1: Directives not supported yet ```

i think you need to run it through the preprocessor first

warm breach Jan 18, 2023, 3:59 PM

#

!e

from ctypes import *

realloc = pythonapi["PyMem_Realloc"]
realloc.argtypes = [c_void_p, c_size_t]
realloc.restype = c_void_p

t = (1, 2)
print(id(t))

res = realloc(id(t), 64)
print(res)

fallen slateBOT Jan 18, 2023, 3:59 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140266784484928
002 | 140266784484928

warm breach Jan 18, 2023, 4:00 PM

#

!e

from ctypes import *

realloc = pythonapi["PyMem_Realloc"]
realloc.argtypes = [c_void_p, c_size_t]
realloc.restype = c_void_p

t = (1, 2)
print(id(t))

res = realloc(id(t), 80)
print(res)

fallen slateBOT Jan 18, 2023, 4:00 PM

#

@warm breach :x: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

001 | 140541912034816
002 | 140541912043472

warm breach Jan 18, 2023, 4:00 PM

#

is there a reason why PyMem_Realloc here at larger sizes segfaults? (Shouldn't it just return NULL if it can't reallocate?)

pliant tusk Jan 18, 2023, 6:00 PM

#

warm breach is there a reason why `PyMem_Realloc` here at larger sizes segfaults? (Shouldn't...

its segfaulting because realloc is freeing the old address of t, but you still have a reference to old t so when the interpreter closes it breaks

pliant tusk Jan 18, 2023, 6:00 PM

#

warm breach !e ```py from ctypes import * realloc = pythonapi["PyMem_Realloc"] realloc.argt...

this one is able to realloc in place, so it doesnt free id(t)

warm breach Jan 18, 2023, 6:01 PM

#

pliant tusk its segfaulting because realloc is freeing the old address of `t`, but you still...

the realloc frees it? firHmm

warm breach Jan 18, 2023, 6:01 PM

#

pliant tusk this one is able to realloc in place, so it doesnt free `id(t)`

isn't that realloc call always in place?

#

https://docs.python.org/3/c-api/memory.html#c.PyObject_Realloc


pliant tusk Jan 18, 2023, 6:02 PM

#

not if it cannot find enough space
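
A safer way to watch this happen, as a sketch: call realloc on a block that we allocated ourselves instead of on a live object. Whether the block actually moves depends on the allocator, so the result is printed rather than assumed:

```python
from ctypes import c_size_t, c_void_p, pythonapi

malloc = pythonapi["PyMem_Malloc"]
malloc.argtypes = [c_size_t]
malloc.restype = c_void_p

realloc = pythonapi["PyMem_Realloc"]
realloc.argtypes = [c_void_p, c_size_t]
realloc.restype = c_void_p

free = pythonapi["PyMem_Free"]
free.argtypes = [c_void_p]
free.restype = None

old = malloc(16)
new = realloc(old, 1 << 20)
if new is None:
    # NULL means the request failed and the old block is still valid.
    free(old)
    raise MemoryError("PyMem_Realloc failed")

# If the block moved, `old` has already been freed and must not be touched.
moved = new != old
print(f"moved: {moved}")
free(new)
```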

warm breach Jan 18, 2023, 6:03 PM

#

oh, hm

#

how do strings do that thing where they can try to resize safely firT

#

is that something in the stable abi

pliant tusk Jan 18, 2023, 6:04 PM

#

warm breach how do strings do that thing where they can try to resize safely <:firT:78570196...

strings like str()? afaik they never resize

feral island Jan 18, 2023, 6:04 PM

#

warm breach how do strings do that thing where they can try to resize safely <:firT:78570196...

they can only resize while they're being created I think, when there's only one reference

#

and even that API is probably somewhat unsafe

pliant tusk Jan 18, 2023, 6:06 PM

#

https://github.com/python/cpython/blob/main/Objects/bytearrayobject.c#L232-L243 here you can see that realloc will free the pointer passed in if it cannot operate in place

fallen slateBOT Jan 18, 2023, 6:06 PM

#

Objects/bytearrayobject.c lines 232 to 243

```c
else {
    sval = PyObject_Realloc(obj->ob_bytes, alloc);
    if (sval == NULL) {
        PyErr_NoMemory();
        return -1;
    }
}

obj->ob_bytes = obj->ob_start = sval;
Py_SET_SIZE(self, size);
obj->ob_alloc = alloc;
obj->ob_bytes[size] = '\0'; /* Trailing null byte */
```

warm breach Jan 18, 2023, 6:11 PM

#

feral island they can only resize while they're being created I think, when there's only one ...

I mean like this thing https://github.com/python/cpython/blob/3.11/Objects/unicodeobject.c#L11530-L11535

fallen slateBOT Jan 18, 2023, 6:11 PM

#

Objects/unicodeobject.c lines 11530 to 11535

```c
/* append inplace */
if (unicode_resize(p_left, new_len) != 0)
    goto error;

/* copy 'right' into the newly allocated area of 'left' */
_PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
```

warm breach Jan 18, 2023, 6:11 PM

#

the "in place" string append

#

https://github.com/python/cpython/blob/3.11/Objects/unicodeobject.c#L1090-L1091
https://github.com/python/cpython/blob/3.11/Objects/unicodeobject.c#L1124

feral island Jan 18, 2023, 6:14 PM

#

I guess it's an optimization that only applies if refcnt == 1 https://github.com/python/cpython/blob/3.11/Objects/unicodeobject.c#L1978

fallen slateBOT Jan 18, 2023, 6:14 PM

#

Objects/unicodeobject.c lines 1090 to 1091

```c
static int
resize_inplace(PyObject *unicode, Py_ssize_t length)
```
`Objects/unicodeobject.c` line 1124
```c
data = (PyObject *)PyObject_Realloc(data, new_size);
```

#

Objects/unicodeobject.c line 1978

```c
static int
```

warm breach Jan 18, 2023, 6:14 PM

#

this also seems to call PyObject_Realloc unconditionally pithink

#

ah

#

I guess it doesn't matter if the realloc isn't in place there

#

since it's 1 ref count and it just returns the new allocation
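
The flip side, as a small sketch: once a second reference exists, `+=` has to produce a new object, so the other name keeps seeing the old value. This part is ordinary language behaviour and does not depend on the optimisation:

```python
s = "a"
s += "b"      # built at runtime, so not a shared constant
alias = s     # second reference to the same object

s += "1"      # cannot mutate in place: `alias` still needs "ab"
print(s, alias)
```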

warm breach Jan 18, 2023, 6:25 PM

#

pliant tusk strings like `str()`? afaik they never resize

!e apparently you can trick it to mutate the string in-place if the ref count is 1 👀

from einspect.structs import PyObject

s = "a"
s += "b"
text = s

PyObject.from_object(s).DecRef()

s += "1"
s += "2"
print(s)
print(text)

PyObject.from_object(s).IncRef()

fallen slateBOT Jan 18, 2023, 6:25 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | ab12
002 | ab12

quick snow Jan 18, 2023, 6:31 PM

#

!e I'm assuming this fails?

from einspect.structs import PyObject

s = "a"
s += "b"
text = s

PyObject.from_object(s).DecRef()

s += "1"*50
s += "2"*50
print(s)
print(text)

PyObject.from_object(s).IncRef()

grave jolt Jan 18, 2023, 6:32 PM

#

wait what

fallen slateBOT Jan 18, 2023, 6:32 PM

#

@quick snow :x: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

001 | ab1111111111111111111111111111111111111111111111111122222222222222222222222222222222222222222222222222
002 | flush

feral island Jan 18, 2023, 6:32 PM

#

you are accessing random memory

grave jolt Jan 18, 2023, 6:32 PM

#

ah right

#

😳

feral island Jan 18, 2023, 6:33 PM

#

print calls f.flush(), so probably that's where the flush string comes from

#

it happened to be allocated in the same place

warm breach Jan 18, 2023, 6:37 PM

#

quick snow !e I'm assuming this fails? ```py from einspect.structs import PyObject s = "a...

I don't think that resize happened in-place since your append was too large

quick snow Jan 18, 2023, 6:37 PM

#

warm breach I don't think that resize happened in-place since your append was too large

Yes, that's what I was testing

warm breach Jan 18, 2023, 6:37 PM

#

so s += "1"*50 shadowed the original s with a new object

quick snow Jan 18, 2023, 6:38 PM

#

I'm assuming it's about memory alignment, that your version works?

warm breach Jan 18, 2023, 6:38 PM

#

and since we modified refcount of s to be 1, it got dropped

#

so later print(text) prints garbage memory

warm breach Jan 18, 2023, 6:39 PM

#

quick snow I'm assuming it's about memory alignment, that your version works?

well string += only tries to resize in place if it has enough space, so the shorter one works

#

!e

s = "a"
s += "b"
print(id(s))

s += "1"
s += "2"
print(id(s))

fallen slateBOT Jan 18, 2023, 6:39 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140050976782704
002 | 140050976782704

warm breach Jan 18, 2023, 6:39 PM

#

!e but append longer and it'll (probably) no longer be in place

s = "a"
s += "b"
print(id(s))

s += "111111111"
s += "222222222"
print(id(s))

fallen slateBOT Jan 18, 2023, 6:40 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 139980662602224
002 | 139980662606864

quick snow Jan 18, 2023, 6:40 PM

#

warm breach well string `+=` only tries to resize in place if it has enough space, so the sh...

Sure, but why is there enough space when the added string is small? Everything is aligned in groups of 8 bytes or something like that?

grave jolt Jan 18, 2023, 6:41 PM

#

!e

from einspect.structs import PyObject

s = "a"
s += "b"
text = s

PyObject.from_object(s).DecRef()

s += "3"*4_000_000
s += "55"*6_000_0
#print(s)
#print(text)
print("hi")
PyObject.from_object(s).IncRef()

warm breach Jan 18, 2023, 6:42 PM

#

quick snow Sure, but why is there enough space when the added string is small? Everything i...

it should be 16 I think

fallen slateBOT Jan 18, 2023, 6:43 PM

#

@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.

hi

pliant tusk Jan 18, 2023, 6:43 PM

#

but also due to the way that python uses memory pools you can get weird spacing around objects

warm breach Jan 18, 2023, 6:43 PM

#

!e

print("ab".__sizeof__())
print("ab12".__sizeof__())

fallen slateBOT Jan 18, 2023, 6:43 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 51
002 | 53

warm breach Jan 18, 2023, 6:43 PM

#

so "ab" had 64 bytes allocated

#

and "ab12" happens to fit fine

#

!e

s = "a"
s += "b"
print(id(s))

s += "1234567890"
s += "abc"

print(id(s))
print(s.__sizeof__())

fallen slateBOT Jan 18, 2023, 6:45 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140533952084528
002 | 140533952084528
003 | 64

warm breach Jan 18, 2023, 6:45 PM

#

so you can append all the way up to 64 and it'll resize in place
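
The arithmetic behind that, as a sketch. It assumes pymalloc rounds small requests up to a multiple of 16 bytes and hands anything above 512 bytes to the system allocator; both constants are assumptions about 64-bit CPython builds and are not read from the interpreter:

```python
ALIGNMENT = 16                 # assumed pymalloc alignment on 64-bit builds
SMALL_REQUEST_THRESHOLD = 512  # assumed cutoff; larger requests bypass pymalloc


def pymalloc_block_size(nbytes: int) -> int:
    """Round a small request up to the next multiple of ALIGNMENT."""
    if nbytes > SMALL_REQUEST_THRESHOLD:
        raise ValueError("not served by pymalloc size classes")
    return -(-nbytes // ALIGNMENT) * ALIGNMENT


for size in (51, 53, 64, 65):
    print(size, "->", pymalloc_block_size(size))
```

Under those assumptions a 51-byte string sits in a 64-byte block, so it can grow to 64 bytes in place, and 65 bytes needs the next size class.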

#

!e but one more and it can't

s = "a"
s += "b"
print(id(s))

s += "1234567890"
s += "abcd"

print(id(s))
print(s.__sizeof__())

fallen slateBOT Jan 18, 2023, 6:46 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140460387744304
002 | 140460387748880
003 | 65

feral island Jan 18, 2023, 6:46 PM

#

64 bytes? that feels like a lot of overhead for a two-char string

#

!e ```
import sys
print(sys.getsizeof("ab"))
print(sys.getsizeof("a"))
```

fallen slateBOT Jan 18, 2023, 6:47 PM

#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 51
002 | 50

feral island Jan 18, 2023, 6:48 PM

#

!e ```
import sys
a = "a"
a += "b"
print(sys.getsizeof(a))
print(sys.getsizeof("a"))
```

fallen slateBOT Jan 18, 2023, 6:48 PM

#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 51
002 | 50

warm breach Jan 18, 2023, 6:52 PM

#

feral island 64 bytes? that feels like a lot of overhead for a two-char string

well um

#

they removed the wstr field in 3.12 at least

#

so ascii compact strings are 8 bytes smaller

#

Python 3.12.0a3 (main, Jan 18 2023, 01:07:36) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> a = "a"
>>> a += "b"
>>> sys.getsizeof(a)
43
>>> sys.getsizeof("a")
42

feral island Jan 18, 2023, 6:52 PM

#

interestingly in my local 3.9 "a" was 58 bytes but "ab" was 51 bytes

thorn barn Jan 18, 2023, 9:30 PM

#

Hi everyone,
I have a question regarding memory footprint in Python. There is the possibility of creating a class with dunder slots to remove the dunder dict attribute and thereby remove some memory overhead for each instance. To take a look at this I created a toy example like:

class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

as well as a version with dunder slots and another one based on a namedtuple to compare the three cases.
Since the sys.getsizeof function only returns a simplistic value I used the getsize function from this post, which keeps track of references. https://stackoverflow.com/questions/449560/how-do-i-determine-the-size-of-an-object-in-python

Now comes the weird part that I don't understand:
I was comparing the results for different Python versions. With 3.10, the getsize function reports 236 bytes for an instance of the Point class, while the slots version takes 140 bytes.
Now with version 3.11 the normal class version drops down to 140 bytes (just like the slots version).
So I checked whether the dunder dict still exists, which it does, but after accessing the dunder dict the size that getsize returns jumps up to 436 bytes. In version 3.10 this does not happen.

Has someone an explanation for this behavior? The 3.11 release notes do not give me a hint that they have done some memory optimization in this regard.

You can find the full working example in this godbolt link where you can easily switch between the versions of Python.

https://godbolt.org/z/PsYYKxb49


feral island Jan 18, 2023, 9:30 PM

#

thorn barn Hi everyone, I have a question regarding memory footprint in Python. There is th...

the __dict__ is lazily created on access I believe

dusk comet Jan 18, 2023, 9:35 PM

#

IIRC, dict's are also allocating memory for hash table on first access

thorn barn Jan 18, 2023, 9:38 PM

#

would you say that it is still valid in python 3.11 that dunder slots will save you memory?

pliant tusk Jan 18, 2023, 9:42 PM

#

feral island the `__dict__` is lazily created on access I believe

Isn't the dict allocated on instantiation for any classes that do not define any slots?

feral island Jan 18, 2023, 9:42 PM

#

yes, unless you use __dict__ directly, which generally you shouldn't be doing

#

(that was in response to @thorn barn )

feral island Jan 18, 2023, 9:43 PM

#

pliant tusk Isn't the dict allocated on instantiation for any classes that do not define any...

not anymore I believe in 3.11 but not familiar with the details on how this works

#

https://docs.python.org/3/whatsnew/3.11.html#misc has the links

thorn barn Jan 18, 2023, 9:45 PM

#

ahh nice thats it https://github.com/python/cpython/pull/28802

GitHub

bpo-45340: Don't create object dictionaries unless actually needed ...

A "normal" Python objects is conceptually just a pair of pointers, one to the class, and one to the dictionary.
With shared keys, the dictionary is redundant as it is no more than a pair ...

pliant tusk Jan 18, 2023, 9:45 PM

#

Ahh so the objects hold a pointer to their keys and values table, and lazily create the dict that references them
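
A quick way to poke at this, as a sketch only. The class names are illustrative, and the printed sizes depend on the Python version and build, so none are claimed here. Note that `sys.getsizeof` does not follow references, unlike the recursive `getsize` helper from the Stack Overflow post:

```python
import sys


class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class SlotPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


p = Point(1, 2, 3)
sp = SlotPoint(1, 2, 3)

print("instance:", sys.getsizeof(p), "slots instance:", sys.getsizeof(sp))

# Touching __dict__ is what may force a real dict object into existence on 3.11+.
attrs = p.__dict__
print("dict object:", sys.getsizeof(attrs))
print("slots instance has __dict__:", hasattr(sp, "__dict__"))
```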

thorn barn Jan 18, 2023, 9:46 PM

#

thanks for the help!

pliant tusk Jan 18, 2023, 9:54 PM

#

feral island not anymore I believe in 3.11 but not familiar with the details on how this work...

I wonder if there would be any noticeable speedup by allowing specifying types in __slots__, like __slots__ = [('attr', int)] to let python optimize the slot into a py_ssize_t, or bytes would specify char*. I assume there would be a worry that optimizing like that would narrow the types

feral island Jan 18, 2023, 10:02 PM

#

pliant tusk I wonder if there would be any noticeable speedup by allowing specifying types i...

something like that could work, but there's a lot of problems to sort out

#

for one, an int might not fit in a py_ssize_t

#

and it might not be faster because you have to perform boxing when accessing the attributes
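
The range problem can be seen with ctypes, as a small sketch: `sys.maxsize` is the largest value a `Py_ssize_t` holds, and ctypes integer types do no overflow checking, so anything bigger silently wraps.

```python
import sys
from ctypes import c_ssize_t

fits = sys.maxsize         # largest value a Py_ssize_t can hold
too_big = sys.maxsize + 1  # still a perfectly fine Python int

round_trip_ok = c_ssize_t(fits).value == fits
wrapped = c_ssize_t(too_big).value

print(round_trip_ok)
print(wrapped == too_big)
```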

pliant tusk Jan 18, 2023, 10:02 PM

#

feral island and it might not be faster because you have to perform boxing when accessing the...

yea fair

grave jolt Jan 18, 2023, 10:03 PM

#

yeah

#

For example: you ask for foo.bar to get back a str object. Does that object have 1 reference? 2 references?

#

Is it immortal?

#

If we suppose that we don't want to copy the whole unicode string into a boxed object, if that object has 1 reference, how do you solve the += optimisation?

pliant tusk Jan 18, 2023, 10:06 PM

#

yea maybe it wouldnt be worth it

dusk comet Jan 18, 2023, 10:09 PM

#

pliant tusk I wonder if there would be any noticeable speedup by allowing specifying types i...

I think, in this case adaptive interpreter can optimize some instructions with this fields in advance

feral island Jan 18, 2023, 10:09 PM

#

yeah agree, that's where this could help

#

like if the adaptive interpreter sees 1 + foo.bar and knows that foo has a field of type int, it could potentially optimize it into just a pointer access + machine ADD instruction

dusk comet Jan 18, 2023, 10:10 PM

#

There is also a lot of possible problems

warm breach Jan 19, 2023, 9:23 AM

#

so TIL python ints are also (sometimes) mutable in 3.12 now

#

same thing as the current string append optimization

dusk comet Jan 19, 2023, 1:22 PM

#

I guess float's and complex's also can be mutable

halcyon trail Jan 19, 2023, 2:04 PM

#

I was going to say I'm surprised it's worth it, but since creating a new int would typically involve a heap allocation, that makes sense

umbral plume Jan 19, 2023, 5:58 PM

#

not too familiar with the string optimisation, but i'm guessing that means something along the lines of "big integers with a sole reference can be mutated and returned as if its a new object, rather than allocating a new integer and freeing the old one"?

halcyon trail Jan 19, 2023, 6:19 PM

#

i assume. but they don't need to be big.

#

or at least, unless by "big" you simply mean "bigger than the handful of integers that cpython always allocates space for"

#

(a few hundred, iirc)

spice pecan Jan 19, 2023, 6:48 PM

#

-5 through 256 I think

#

Most common ones
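
A sketch of how to observe the cache. This is CPython-specific behaviour, not a language guarantee; the ints are built with `int(...)` at runtime so that constant folding by the compiler does not get in the way:

```python
a = int("256")
b = int("256")
c = int("257")
d = int("257")

small_shared = a is b  # both names point at the cached 256 object
large_shared = c is d  # two separate allocations outside the cache

print(small_shared, large_shared)
```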

warm breach Jan 20, 2023, 1:53 AM

#

umbral plume not too familiar with the string optimisation, but i'm guessing that means somet...

yeah, I think the main intended use case was for range()

#

this id should stay the same in >= 3.12.0a1

for i in range(300, 500):
    i += 1
    print(id(i))

dusk comet Jan 20, 2023, 1:30 PM

#

I think in <=3.11 it will switch between two ids

#

!e ```py
for i in range(300, 310):
    i += 1
    print(id(i))
```

fallen slateBOT Jan 20, 2023, 1:31 PM

#

@dusk comet :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140413941526768
002 | 140413941526768
003 | 140413941526768
004 | 140413941526768
005 | 140413941526768
006 | 140413941526768
007 | 140413941526768
008 | 140413941526768
009 | 140413941526768
010 | 140413941526768

dusk comet Jan 20, 2023, 1:31 PM

#

!e ```py
for i in range(300, 310):
    print(id(i))
```

fallen slateBOT Jan 20, 2023, 1:31 PM

#

@dusk comet :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 139947360022800
002 | 139947360022768
003 | 139947360022800
004 | 139947360022768
005 | 139947360022800
006 | 139947360022768
007 | 139947360022800
008 | 139947360022768
009 | 139947360022800
010 | 139947360022768

quick trellis Jan 20, 2023, 1:51 PM

#

dusk comet I think in <=3.11 it will switch between two ids

why?

feral island Jan 20, 2023, 1:53 PM

#

quick trellis why?

I think the iterator creates the next int before the previous one can be deallocated

#

and because of free lists they get reallocated in the same small set of spots

quick trellis Jan 20, 2023, 2:05 PM

#

feral island and because of free lists they get reallocated in the same small set of spots

thank you i understand. so how come it's different in 3.11? what was it before?

feral island Jan 20, 2023, 5:41 PM

#

quick trellis thank you i understand. so how come it's different in 3.11? what was it before?

what I describe was how it worked before. I think 3.11 put in an optimization so the integer gets reused

dusk comet Jan 20, 2023, 8:21 PM

#

I see the same behaviour in CPython3.7.
(id(x) is very small because it is a 32-bit build)

>>> for i in range(300, 310):
...     i += 1
...     print(id(i))
...
9952576
9952576
9952576
9952576
9952576
9952576
9952576
9952576
9952576
9952576
>>> for i in range(300, 310):
...     print(id(i))
...
9952592
9952576
9952592
9952576
9952592
9952576
9952592
9952576
9952592
9952576
>>> import sys; sys.version
'3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:01:55) [MSC v.1900 32 bit (Intel)]'

#

i dont think these two examples are useful to observe mutability of ints

halcyon trail Jan 20, 2023, 11:04 PM

#

I've often been surprised at the fact that there hasn't been a mainstream language designed entirely with these semantics in mind

#

You'd get the important benefits of immutability, and the ergonomics and mostly the performance of mutability

#

Those semantics aren't ideal when you want to deliberately share mutability, or for multithreading.
but those are both a small minority of cases, so you could explicitly annotate when you wanted that

#

I've been bitten by mutability of lists and dicts quite a few times in python, if strings/ints/etc were mutable too I imagine I'd get bit much more often

warm breach Jan 21, 2023, 2:12 PM

#

this thing essentially https://github.com/python/cpython/blob/main/Objects/longobject.c#L283-L290

fallen slateBOT Jan 21, 2023, 2:12 PM

#

Objects/longobject.c lines 283 to 290

```c
// Mutate in place if there are no other references the old
// object.  This avoids an allocation in a common case.
// Since the primary use-case is iterating over ranges, which
// are typically positive, only do this optimization
// for positive integers (for now).
((PyLongObject *)old)->ob_digit[0] =
    Py_SAFE_DOWNCAST(value, Py_ssize_t, digit);
return 0;
```

warm breach Jan 21, 2023, 2:13 PM

#

python/cpython#91713

neon troutBOT Jan 21, 2023, 2:13 PM

#

GitHub

PRMerged [cpython] #91713 gh-91432: Specialize FOR_ITER

warm breach Jan 21, 2023, 3:19 PM

#

anyone know where the str.__sizeof__ implementation is

#

can't find 😔

#

~~nevermind found it now, named with one underscore unlike the other ones~~

feral island Jan 21, 2023, 4:41 PM

#

warm breach anyone know where the `str.__sizeof__` implementation is

haven't checked the code but maybe it's just on object and derived from the size fields on the type object?

warm breach Jan 21, 2023, 4:42 PM

#

feral island haven't checked the code but maybe it's just on `object` and derived from the si...

nah I found it here 😅 https://github.com/python/cpython/blob/3.11/Objects/unicodeobject.c#L14120-L14121

fallen slateBOT Jan 21, 2023, 4:42 PM

#

Objects/unicodeobject.c lines 14120 to 14121

```c
static PyObject *
unicode_sizeof_impl(PyObject *self)
```

warm breach Jan 21, 2023, 4:43 PM

#

I was searching for ___sizeof___impl since everything else was named like that

#

but unicode does _sizeof_impl for some reason

#

@pliant tusk so copilot is getting pretty good at translating cpython functions now 👀

pliant tusk Jan 21, 2023, 4:51 PM

#

Impressive

#

Can it translate the structs too?

warm breach Jan 21, 2023, 4:52 PM

#

warm breach Jan 21, 2023, 4:58 PM

#

pliant tusk Can it translate the structs too?

not really, sometimes still has weird ideas

pliant tusk Jan 21, 2023, 4:59 PM

#

That seems close right?

#

(I haven't looked at how structs are defined in einspect yet)

warm breach Jan 21, 2023, 5:01 PM

#

@struct
class SetEntry(Structure, AsRef, Generic[_T]):
    key: ptr[PyObject[_T, None, None]]
    hash: Annotated[int, Py_hash_t]  # noqa: A003

#

here's the actual one

#

to be fair my Annotated usage is pretty arbitrary so

#

it gets parsed as

Annotated[<ignored>, type]

or

Annotated[<ignored>, type, bit-width]

#

mainly due to ctypes autocasting, if you type something as c_uint32 it won't actually be that type at runtime (will be cast to int instead)
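
Roughly how an `Annotated[<ignored>, type, bit-width]` hint could be turned into ctypes fields, as a minimal sketch. `fields_from_annotations`, `FlagsSpec` and `Flags` are made-up names for illustration and this is not einspect's actual parser. It also shows the autocast: the field is declared as `c_uint32` but reads back as a plain `int`.

```python
import ctypes
from typing import Annotated, get_args, get_origin, get_type_hints


def fields_from_annotations(cls):
    """Hypothetical helper: turn Annotated hints into a ctypes _fields_ list."""
    fields = []
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # The first argument is the Python-facing type and is ignored here;
            # the metadata carries the ctypes type and an optional bit width.
            _ignored, ctype, *bit_width = get_args(hint)
            fields.append((name, ctype, *bit_width))
        else:
            fields.append((name, hint))
    return fields


class FlagsSpec:
    kind: Annotated[int, ctypes.c_uint, 3]
    ready: Annotated[int, ctypes.c_uint, 1]
    count: Annotated[int, ctypes.c_uint32]


class Flags(ctypes.Structure):
    _fields_ = fields_from_annotations(FlagsSpec)


flags = Flags(kind=5, ready=1, count=10)
print(fields_from_annotations(FlagsSpec))
print(flags.kind, flags.ready, flags.count)
print(type(flags.count))  # plain int at runtime, not c_uint32
```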

warm breach Jan 21, 2023, 7:32 PM

#

@rose schooner 👀
https://github.com/ionite34/einspect/blob/dev/src/einspect/views/view_str.py#L24

fallen slateBOT Jan 21, 2023, 7:32 PM

#

src/einspect/views/view_str.py line 24

class StrView(View[str, None, None], MutableSequence):

warm breach Jan 21, 2023, 7:33 PM

#

safe-ish so far

from einspect import view

s = "abc🦀"
v = view(s)

v[-1] = "!"
print(s)
# abc!

v[:] = "🤔12🐍"
print(s)
# 🤔12🐍

del v[1:]
print(s)
# 🤔

v.extend(["D", "E", "F"])
print(s)
# 🤔DEF

v.reverse()
print(s)
# FED🤔

v.clear()
assert s == ""

v.append("こんにちは")
print(s)
# こんにちは

quick snow Jan 21, 2023, 8:26 PM

#

warm breach safe-ish so far ```py from einspect import view s = "abc🦀" v = view(s) v[-1] ...

What happens if you start with "abc!"? And how does your last append work? Where did it get the extra space from? Memory alignment again, and it wouldn't work to append a longer string?

warm breach Jan 21, 2023, 8:28 PM

#

quick snow What happens if you start with `"abc!"`? And how does your last append work? Whe...

!e wouldn't work since the 🦀 needs more space than abc! had allocated

print("abc!".__sizeof__())
print("abc🦀".__sizeof__())

fallen slateBOT Jan 21, 2023, 8:29 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 53
002 | 92

warm breach Jan 21, 2023, 8:30 PM

#

from einspect import view

s = "abc!"
v = view(s)
v[3] = "🦀"

Traceback (most recent call last):
  File "main.py", line 5, in <module>
    v[3] = "🦀"
    ~^^^
  File "einspect/views/view_str.py", line 74, in __setitem__
    raise UnsafeError(
einspect.errors.UnsafeError: setitem required str to be resized beyond current memory allocation. Enter an unsafe context to allow this.
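
A rough way to picture that check, as a sketch only: compare what the replacement string would need against what the current object reports. `fits_in_place` is a hypothetical helper, not einspect's real logic, and it ignores allocator padding.

```python
def fits_in_place(current: str, replacement: str) -> bool:
    """Hypothetical check: would `replacement` fit in the memory `current` occupies?"""
    return replacement.__sizeof__() <= current.__sizeof__()


print(fits_in_place("abc🦀", "abc!"))        # shrinking to ASCII fits
print(fits_in_place("abc!", "abc🦀"))        # growing to 4-byte characters does not
print(fits_in_place("abc🦀", "こんにちは"))  # five 2-byte characters still fit
```

That would also explain why appending こんにちは to the emptied string works: it needs fewer bytes than the original 4-byte-per-character allocation.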

vernal loom Jan 21, 2023, 8:30 PM

#

!e ```py
print("abc!".encode('u8'))
print("abc🦀".encode('u8'))

fallen slateBOT Jan 21, 2023, 8:30 PM

#

@vernal loom :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | b'abc!'
002 | b'abc\xf0\x9f\xa6\x80'

vernal loom Jan 21, 2023, 8:30 PM

#

that's why it's longer

#

!e ```py
print(list("abc!".encode('u8')))
print(list("abc🦀".encode('u8')))

fallen slateBOT Jan 21, 2023, 8:31 PM

#

@vernal loom :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [97, 98, 99, 33]
002 | [97, 98, 99, 240, 159, 166, 128]

warm breach Jan 21, 2023, 8:31 PM

#

but otherwise, given enough space, it dynamically reallocates the PyObject between the 3 str subtypes - PyASCIIObject | PyCompactUnicodeObject | PyUnicodeObject

#

from einspect import view

s = "abc🦀"
v = view(s)
print(v)
>> StrView(<PyCompactUnicodeObject at 0x10449d920>)

v[3] = "!"
print(v)
>> StrView(<PyASCIIObject at 0x10449d920>)

vernal loom Jan 21, 2023, 8:33 PM

#

can you allocate more space in einspect?

warm breach Jan 21, 2023, 8:34 PM

#

well yes but

#

like realloc it might not be in-place

#

a str might already have other structs after it

#

not much point in doing that since python variables are essentially memory pointers, and if you move the object somewhere else the original variables now point to random memory

quick snow Jan 21, 2023, 9:33 PM

#

vernal loom !e ```py print(list("abc!".encode('u8'))) print(list("abc🦀".encode('u8'))) ```

Not exactly, Python strings aren't UTF-8 encoded. When needed (like with emoji), the entire string is stored as UCS-4, i.e. 4 bytes per character (a PyCompactUnicodeObject with kind = PyUnicode_4BYTE_KIND, going by the subtypes mentioned above).
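
You can see the fixed-width storage from Python without touching memory. A small sketch, assuming CPython (`bytes_per_char` is just an illustrative helper): the size added by one more copy of a character depends on the widest character in the string, not on its UTF-8 length.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Measure how much one extra copy of `ch` adds to a string's size."""
    return sys.getsizeof(ch * 3) - sys.getsizeof(ch * 2)


for ch in ("a", "é", "€", "🦀"):
    # columns: character, bytes per character in memory, UTF-8 length
    print(repr(ch), bytes_per_char(ch), len(ch.encode("utf-8")))
```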

raven ridge Jan 21, 2023, 10:22 PM

#

they do cache a UTF-8 representation the first time it's requested, though

rose schooner Jan 21, 2023, 10:52 PM

#

warm breach !e wouldn't work since the 🦀 needs more space than `abc!` had allocated ```py p...

why does python have to add like 24 bytes

#

ignoring padding, that is

raven ridge Jan 21, 2023, 11:11 PM

#

rose schooner why does python have to add like 24 bytes

The first can be stored in this format: https://github.com/python/cpython/blob/main/Include/cpython/unicodeobject.h#L55-L64

fallen slateBOT Jan 21, 2023, 11:11 PM

#

Include/cpython/unicodeobject.h lines 55 to 64

- compact ascii:

  * structure = PyASCIIObject
  * test: PyUnicode_IS_COMPACT_ASCII(op)
  * kind = PyUnicode_1BYTE_KIND
  * compact = 1
  * ascii = 1
  * (length is the length of the utf8)
  * (data starts just after the structure)
  * (since ASCII is decoded from UTF-8, the utf8 string are the data)

raven ridge Jan 21, 2023, 11:11 PM

#

The second needs https://github.com/python/cpython/blob/main/Include/cpython/unicodeobject.h#L66-L76 with kind = PyUnicode_4BYTE_KIND

fallen slateBOT Jan 21, 2023, 11:11 PM

#

Include/cpython/unicodeobject.h lines 66 to 76

- compact:

  * structure = PyCompactUnicodeObject
  * test: PyUnicode_IS_COMPACT(op) && !PyUnicode_IS_ASCII(op)
  * kind = PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or
    PyUnicode_4BYTE_KIND
  * compact = 1
  * ascii = 0
  * utf8 is not shared with data
  * utf8_length = 0 if utf8 is NULL
  * (data starts just after the structure)

warm breach Jan 22, 2023, 4:35 AM

#

rose schooner why does python have to add like 24 bytes

the pure ASCII strs are a lot smaller since they just use a c_char array plus some length fields (though not that small since you still need all the info bits to represent the other subtypes)

#

https://github.com/ionite34/einspect/blob/dev/src/einspect/structs/py_unicode.py#L83

fallen slateBOT Jan 22, 2023, 4:40 AM

#

src/einspect/structs/py_unicode.py line 83

or addressof(cast(obj.wstr, c_void_p)) != addressof(PyUnicode_DATA(obj))

warm breach Jan 22, 2023, 4:40 AM

#

tbh the ctypes auto cast is the most annoying thing ever

#

can't compare 2 pointers since one of them gets transformed into bytes | None

rose schooner Jan 22, 2023, 4:56 AM

#

warm breach the pure ASCII strs are a lot smaller since they just use a c_char array plus so...

string offset for ASCII/UCS-1 is 48 but anything other than that it's 72 (64-bit systems)

warm breach Jan 22, 2023, 4:57 AM

#

rose schooner string offset for ASCII/UCS-1 is 48 but anything other than that it's 72 (64-bit...

actually do you know how to get a PyUnicodeObject from a literal

#

I've only been able to get ASCII and compact

rose schooner Jan 22, 2023, 4:57 AM

#

warm breach actually do you know how to get a PyUnicodeObject from a literal

wdym?

warm breach Jan 22, 2023, 4:58 AM

#

https://github.com/python/cpython/blob/main/Include/cpython/unicodeobject.h#L78-L87

fallen slateBOT Jan 22, 2023, 4:58 AM

#

Include/cpython/unicodeobject.h lines 78 to 87

- legacy string:

  * structure = PyUnicodeObject structure
  * test: !PyUnicode_IS_COMPACT(op)
  * kind = PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or
    PyUnicode_4BYTE_KIND
  * compact = 0
  * data.any is not NULL
  * utf8 is shared and utf8_length = length with data.any if ascii = 1
  * utf8_length = 0 if utf8 is NULL

warm breach Jan 22, 2023, 4:59 AM

#

https://github.com/python/cpython/blob/main/Include/cpython/unicodeobject.h#L153-L161

fallen slateBOT Jan 22, 2023, 4:59 AM

#

Include/cpython/unicodeobject.h lines 153 to 161

typedef struct {
    PyCompactUnicodeObject _base;
    union {
        void *any;
        Py_UCS1 *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } data;                     /* Canonical, smallest-form Unicode buffer */
} PyUnicodeObject;

rose schooner Jan 22, 2023, 5:00 AM

#

i don't understand the question

#

like isn't it just .from_object() or something

#

struct.from_address(id(obj))

warm breach Jan 22, 2023, 7:26 AM

#

rose schooner i don't understand the question

like what kind of strings are actually using the PyUnicodeObject struct instead of the 2 other ones

warm breach Jan 22, 2023, 7:26 AM

#

fallen slate `Include/cpython/unicodeobject.h` lines 78 to 87 ```h - legacy string: * stru...

or I guess what they call "legacy string" here

#

essentially if the string object has compact = 1, it's PyCompactUnicodeObject, if ascii = 1, it's PyASCIIObject, if both are 0, it's the biggest subtype, PyUnicodeObject

#

but I haven't been able to get one naturally in python

rose schooner Jan 22, 2023, 7:28 AM

#

warm breach essentially if the string object has `compact = 1`, it's `PyCompactUnicodeObject...

i don't actually know

rose schooner Jan 22, 2023, 7:28 AM

#

warm breach but I haven't been able to get one naturally in python

but the 72 offset of strings when they're UCS-2 or UCS-4 strongly indicates a use of PyUnicodeObject's data field

rose schooner Jan 22, 2023, 7:33 AM

#

fallen slate `Include/cpython/unicodeobject.h` lines 153 to 161 ```h typedef struct { PyC...

@warm breach why does this happen
```pycon
>>> view("\U0010ffffabc")._pyobject.data.any
416612941823
```
why isn't it a void * pointer or something

#

well it actually is but why does it appear like this

warm breach Jan 22, 2023, 7:38 AM

#

firHmm

rose schooner Jan 22, 2023, 7:38 AM

#

and also um this
```pycon
>>> a = "\U0010ffff\0"
>>> from ctypes import c_ulong
>>> c_ulong.from_address(id(a)+72)
c_ulong(1114111)
>>> view(a)._pyobject.data.ucs4.contents
Windows fatal exception: access violation

Current thread 0x00000d4c (most recent call first):
  File "<stdin>", line 1 in <module>
```

#

oh i see why now
```pycon
>>> a = "\U0010ffff\0"
>>> from ctypes import c_ulong, POINTER
>>> POINTER(c_ulong).from_address(id(a)+72).contents
Windows fatal exception: access violation

Current thread 0x000008b8 (most recent call first):
  File "<stdin>", line 1 in <module>
```

warm breach Jan 22, 2023, 7:39 AM

#

rose schooner and also um this ```pycon >>> a = "\U0010ffff\0" >>> from ctypes import c_ulong ...

yeah it doesn't have a data

#

that was before I implemented the 3 separate subtypes for str

#

from einspect import view

a = "\U0010ffff\0"
print(view(a))
>> StrView(<PyCompactUnicodeObject at 0x100f3af10>)

#

that thing is still a PyCompactUnicodeObject, so no data field

rose schooner Jan 22, 2023, 7:40 AM

#

warm breach yeah it doesn't have a data

wdym

warm breach Jan 22, 2023, 7:40 AM

#

only PyUnicodeObject has a data field

#

PyCompactUnicodeObject is PyASCIIObject with 3 more fields

utf8_length, utf8, wstr_length

#

PyUnicodeObject is PyCompactUnicodeObject with 1 more field, data

rose schooner Jan 22, 2023, 7:42 AM

#

warm breach only `PyUnicodeObject` has a data field

yes

#

and view(a)._pyobject is a PyUnicodeObject

warm breach Jan 22, 2023, 7:43 AM

#

well that wasn't quite correct

#

turns out actually the strings dynamically may be one of 3 subtypes

#

I haven't released the version with the 3 different types yet

#

currently there's only the PyUnicodeObject struct

#

so if you access data when it actually should be a compact or ascii it accesses out of bound memory

rose schooner Jan 22, 2023, 7:44 AM

#

rose schooner and also um this ```pycon >>> a = "\U0010ffff\0" >>> from ctypes import c_ulong ...

how does manually c_ulong.from_address()'ing the thing get the data though

warm breach Jan 22, 2023, 7:47 AM

#

rose schooner how does manually `c_ulong.from_address()`'ing the thing get the data though

cuz it accessed it as c_ulong and not POINTER(c_ulong)

#

I'm not sure why accessing element 0 of the pointer is different from c_ulong but...
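
Thinking about it more, the difference is one level of indirection. A safe sketch on a buffer we own (not on a str's memory): `T.from_address(addr)` reads a `T` at `addr`, while `POINTER(T).from_address(addr)` reads a pointer at `addr` and `.contents` then follows it.

```python
import ctypes

value = ctypes.c_uint32(1114111)
addr = ctypes.addressof(value)

# Reads four bytes at `addr` and interprets them as the integer itself.
direct = ctypes.c_uint32.from_address(addr)

# A real pointer object: its own storage holds `addr`.
pointer = ctypes.pointer(value)
pointer_storage = ctypes.addressof(pointer)

# Reads a pointer-sized value at `pointer_storage`, then follows it.
via_pointer = ctypes.POINTER(ctypes.c_uint32).from_address(pointer_storage).contents

print(direct.value, via_pointer.value)

# By contrast, POINTER(c_uint32).from_address(addr).contents would treat the
# integer's own bytes (1114111) as an address and dereference that, which is
# presumably where the access violation came from. Deliberately not run here.
```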

#

!e

from ctypes import c_ulong, POINTER

a = "\U0010ffff\0"

print(POINTER(c_ulong).from_address(id(a)+72).contents)

fallen slateBOT Jan 22, 2023, 7:47 AM

#

@warm breach :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

[No output]

rose schooner Jan 22, 2023, 7:47 AM

#

warm breach or I guess what they call "legacy string" here

i think i know how

#

string subtypes

warm breach Jan 22, 2023, 7:48 AM

#

firT

rose schooner Jan 22, 2023, 7:48 AM

#

https://github.com/python/cpython/blob/main/Objects/unicodeobject.c#L1203 in PyUnicode_New

fallen slateBOT Jan 22, 2023, 7:48 AM

#

Objects/unicodeobject.c line 1203

_PyUnicode_STATE(unicode).compact = 1;

rose schooner Jan 22, 2023, 7:48 AM

#

https://github.com/python/cpython/blob/main/Objects/unicodeobject.c#L14380 in unicode_subtype_new

fallen slateBOT Jan 22, 2023, 7:48 AM

#

Objects/unicodeobject.c line 14380

_PyUnicode_STATE(self).compact = 0;

rose schooner Jan 22, 2023, 7:50 AM

#

!e ```py
from einspect.views import StrView
class Str(str):...

a=Str("\U0010ffff\0")
print(StrView(a)._pyobject.data.ucs4.contents)

fallen slateBOT Jan 22, 2023, 7:51 AM

#

@rose schooner :white_check_mark: Your 3.11 eval job has completed with return code 0.

c_uint(1114111)

rose schooner Jan 22, 2023, 7:51 AM

#

@warm breach

#

it works

warm breach Jan 22, 2023, 7:51 AM

#

hm firHmm

#

apparently my type algorithm is wrong then

#

I check ascii = 1 before compact

#

that thing is ascii = 1 but compact = 0 which is weird

#

!e

from einspect.structs import PyUnicodeObject

class Foo(str):
    ...

f = Foo()
v = PyUnicodeObject.from_object(f)
print(v.ascii)
print(v.compact)

fallen slateBOT Jan 22, 2023, 7:52 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 1
002 | 0

rose schooner Jan 22, 2023, 7:52 AM

#

warm breach I check `ascii = 1` before compact

here's an example check from the source https://github.com/python/cpython/blob/main/Objects/unicodeobject.c#L1114-L1122

fallen slateBOT Jan 22, 2023, 7:52 AM

#

Objects/unicodeobject.c lines 1114 to 1122

if (ascii->state.compact)
{
    if (ascii->state.ascii)
        data = (ascii + 1);
    else
        data = (compact + 1);
}
else
    data = unicode->data.any;

rose schooner Jan 22, 2023, 7:53 AM

#

so yeah you're supposed to check for compact first

warm breach Jan 22, 2023, 7:53 AM

#

so...

#

wtf is ascii = 1 doing there

#

bug?

rose schooner Jan 22, 2023, 7:53 AM

#

where do you check it

warm breach Jan 22, 2023, 7:53 AM

#

you can't have an ascii string that's not compact, that's not a thing

warm breach Jan 22, 2023, 7:54 AM

#

rose schooner where do you check it

https://github.com/python/cpython/blob/3.11/Include/cpython/unicodeobject.h#L195

fallen slateBOT Jan 22, 2023, 7:54 AM

#

Include/cpython/unicodeobject.h line 195

unsigned int ascii:1;

rose schooner Jan 22, 2023, 7:54 AM

#

warm breach you can't have an ascii string that's not compact, that's not a thing

i think it's still useful for some optimizations

warm breach Jan 22, 2023, 7:54 AM

#

ah yeah

#

it still gets unset if its not ascii

#

!e

from einspect.structs import PyUnicodeObject

class Foo(str):
    ...

f = Foo("🤔🤔🤔")
v = PyUnicodeObject.from_object(f)
print(v.ascii)

fallen slateBOT Jan 22, 2023, 7:55 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

0

warm breach Jan 22, 2023, 7:55 AM

#

so I guess it just doesn't matter for determining struct type

rose schooner Jan 22, 2023, 7:56 AM

#

warm breach so I guess it just doesn't matter for determining struct type

well it still does if we're talking about type(x) is str objects

warm breach Jan 22, 2023, 7:56 AM

#

rose schooner here's an example check from the source https://github.com/python/cpython/blob/m...

guess this looks fine now? firHmm

def _narrow_type(self) -> None:
    # Narrow to a more specific unicode type if possible
    if self._pyobject.compact:
        if self._pyobject.ascii:
            self._pyobject = self._pyobject.astype(PyASCIIObject)
        else:
            self._pyobject = self._pyobject.astype(PyCompactUnicodeObject)
    else:
        self._pyobject = self._pyobject.astype(PyUnicodeObject)

rose schooner Jan 22, 2023, 7:57 AM

#

warm breach guess this looks fine now? firHmm ```py def _narrow_type(...

yep

warm breach Jan 22, 2023, 7:59 AM

#

it's kind of nice now with how many types I've defined, like this function can be almost copied word for word from C source
https://github.com/ionite34/einspect/blob/dev/src/einspect/structs/py_unicode.py#L146-L172
https://github.com/python/cpython/blob/3.11/Objects/unicodeobject.c#L14120-L14149

warm breach Jan 22, 2023, 8:18 AM

#

@rose schooner proper unicode subtypes just released in https://github.com/ionite34/einspect/releases/tag/v0.4.9

GitHub

Release v0.4.9 · ionite34/einspect

Implement dynamic string structs to reflect CPython implementation:

PyASCIIObject
PyCompactUnicodeObject
PyUnicodeObject

Added some macros to py_unicode

These should exactly match their C equiva...

rose schooner Jan 22, 2023, 8:20 AM

#

warm breach @rose schooner proper unicode subtypes just released in https://github.co...

what if view() accepted subclasses

#

instead of doing this
```pycon
>>> f=Foo("\U0010ffffabc\uffff")
>>> view(f)
<stdin>:1: DeprecationWarning: Using einspect.view on objects without a concrete View subclass will be deprecated. Use einspect.views.AnyView instead.
View(<PyObject at 0x1e03e1b04e0>)
```

warm breach Jan 22, 2023, 8:21 AM

#

rose schooner what if `view()` accepted subclasses

well firHmm

#

I suppose it could

#

I'm just not sure what happens to those stuff after the builtin

#

do non-slots custom classes have a fixed struct as well?

rose schooner Jan 22, 2023, 8:23 AM

#

so as long as the custom class does not override the builtin parent class's .__new__()/.__init__() then it's probably fine

warm breach Jan 22, 2023, 8:25 AM

#

hm

warm breach Jan 22, 2023, 8:27 AM

#

warm breach do non-slots custom classes have a fixed struct as well?

actually I guess I don't have to worry about that

#

!e since python already disallows it

class Foo(int):
    __slots__ = ("_some_attr",)

fallen slateBOT Jan 22, 2023, 8:27 AM

#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 1, in <module>
003 | TypeError: nonempty __slots__ not supported for subtype of 'int'
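
As far as I can tell this restriction follows the base type's `__itemsize__`: variable-size bases reject nonempty `__slots__`, fixed-size ones accept them. A quick sketch to check that (`try_nonempty_slots` is just an illustrative helper):

```python
def try_nonempty_slots(base: type) -> str:
    """Try to subclass `base` with one slot and report what happens."""
    try:
        type("Sub", (base,), {"__slots__": ("_some_attr",)})
    except TypeError as exc:
        return str(exc)
    return "ok"


for base in (int, tuple, bytes, str, list):
    print(base.__name__, base.__itemsize__, try_nonempty_slots(base))
```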

warm breach Jan 22, 2023, 8:30 AM

#

rose schooner so as long as the custom class does not override the builtin parent class's `.__...

!e

class Foo(int):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self._some_attr = 1

print(Foo(123).__sizeof__())
print((123).__sizeof__())

fallen slateBOT Jan 22, 2023, 8:30 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 28
002 | 28

warm breach Jan 22, 2023, 8:30 AM

#

how are these both 28 though?

#

doesn't Foo need a __dict__ at least?

#

where does it store _some_attr

rose schooner Jan 22, 2023, 8:31 AM

#

warm breach doesn't `Foo` need a `__dict__` at least?

it does

#

but it's in like negative offsets from the actual object

#

or actually it's in the type

warm breach Jan 22, 2023, 8:34 AM

#

rose schooner or actually it's in the type

but that's the type dict no?

#

the instance still needs its own dict
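
What can be checked from plain Python, as a sketch on a regular CPython build: `__sizeof__` only reports the int part, `sys.getsizeof` additionally counts the garbage collector overhead for GC-tracked objects, and the dict is a separate object with its own size.

```python
import sys


class Foo(int):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self._some_attr = 1


f = Foo(123)

# __sizeof__ only describes the int part, so it matches a plain int.
print(f.__sizeof__(), (123).__sizeof__())

# sys.getsizeof adds GC overhead for objects managed by the garbage collector.
print(sys.getsizeof(f), sys.getsizeof(123))

# The attribute lives in a separate dict object with its own size.
print(f.__dict__, sys.getsizeof(f.__dict__))
```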

rose schooner Jan 22, 2023, 8:35 AM

#

oh wait actually yeah

warm breach Jan 22, 2023, 8:35 AM

#

!e

class Foo(int):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self._some_attr = 1

f = Foo(123)
print(id(f))
print(id(f.__dict__))

fallen slateBOT Jan 22, 2023, 8:35 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140502620308352
002 | 140502622763328

warm breach Jan 22, 2023, 8:35 AM

#

this is showing the actual address of the dict and not the pointer I guess

warm breach Jan 22, 2023, 8:46 AM

#

rose schooner oh wait actually yeah

ah found it

#

!e

from einspect.types import ptr
from einspect.structs import PyDictObject

class Foo(int):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self._some_attr = 1

f = Foo(123)  # sizeof = 28
end = id(f) + f.__sizeof__()
print(ptr[PyDictObject].from_address(end+4).contents.into_object())

f = Foo(2 ** 50)  # sizeof = 32
end = id(f) + f.__sizeof__()
print(ptr[PyDictObject].from_address(end).contents.into_object())

fallen slateBOT Jan 22, 2023, 8:46 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | {'_some_attr': 1}
002 | {'_some_attr': 1}

warm breach Jan 22, 2023, 8:47 AM

#

apparently it's right after the original struct, + alignment

#

not sure where this is documented though

rose schooner Jan 22, 2023, 8:59 AM

#

!e ```py
from ctypes import py_object

ALIGNMENT = tuple.__itemsize__ * 2

def pad_int(x):
    return -x//ALIGNMENT * -ALIGNMENT

class Foo(int):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self._some_attr = 1

f = Foo(123)
print(py_object.from_address(id(f) + pad_int(f.__sizeof__())).value)
```

fallen slateBOT Jan 22, 2023, 9:00 AM

#

@rose schooner :white_check_mark: Your 3.11 eval job has completed with return code 0.

{'_some_attr': 1}

rose schooner Jan 22, 2023, 9:00 AM

#

ok

near ocean Jan 22, 2023, 5:47 PM

#

what does this do?

pliant tusk Jan 22, 2023, 5:57 PM

#

warm breach tbh the ctypes auto cast is the most annoying thing ever

ctypes subtypes don't auto unwrap if that would fix your issue
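
A minimal sketch of what that looks like with struct fields (class names here are made up): a field typed with a fundamental ctypes type is converted to a Python value on access, while a field typed with a subclass of it comes back as the ctypes object, so pointers stay comparable.

```python
import ctypes


class RawU32(ctypes.c_uint32):
    """Subclass of a fundamental ctypes type; not auto-converted on access."""


class RawVoidP(ctypes.c_void_p):
    """Same idea for a pointer-sized field."""


class Plain(ctypes.Structure):
    _fields_ = [("n", ctypes.c_uint32), ("p", ctypes.c_void_p)]


class Wrapped(ctypes.Structure):
    _fields_ = [("n", RawU32), ("p", RawVoidP)]


plain = Plain(7, None)
wrapped = Wrapped(RawU32(7), RawVoidP(None))

print(type(plain.n), plain.p)            # int, and a NULL c_void_p becomes None
print(type(wrapped.n), type(wrapped.p))  # the ctypes subclasses themselves
print(wrapped.n.value, wrapped.p.value)
```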

warm breach Jan 22, 2023, 5:58 PM

#

hmm

warm breach Jan 22, 2023, 6:00 PM

#

pliant tusk ctypes subtypes don't auto unwrap if that would fix your issue

btw do you know how to get the instance dict of subtypes

#

!e

from einspect.structs import PyDictObject
from einspect.types import ptr

class Foo(str):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.x = 50

print(Foo.__dictoffset__)
f = Foo("abc")
print(f.__sizeof__())
dict_ptr = ptr[PyDictObject].from_address(id(f)+f.__sizeof__()+112)
print(dict_ptr.contents.into_object())

fallen slateBOT Jan 22, 2023, 6:00 PM

#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | -112
002 | 84
003 | Traceback (most recent call last):
004 |   File "<string>", line 13, in <module>
005 | ValueError: NULL pointer access

warm breach Jan 22, 2023, 6:00 PM

#

I'm trying to use __dictoffset__ but it doesn't work here somehow

warm breach Jan 22, 2023, 6:00 PM

#

warm breach !e ```py from einspect.types import ptr from einspect.structs import PyDictObjec...

for int subtypes it seems to be just after the struct

pliant tusk Jan 22, 2023, 6:01 PM

#

Foo.__itemsize__ * abs(view(f).size) afaik

warm breach Jan 22, 2023, 6:01 PM

#

what is itemsize even firHmm

pliant tusk Jan 22, 2023, 6:01 PM

#

it's an attribute that all classes have, but it is only non-zero on PyVarObjects

#

tuple.__itemsize__ is sizeof(c_void_p) for example
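
Quick sketch to see it, assuming CPython: for tuple, `__basicsize__` plus `__itemsize__` times the length matches `__sizeof__()`, while str, list and dict report an item size of 0.

```python
import ctypes

pointer_size = ctypes.sizeof(ctypes.c_void_p)

for tp in (tuple, bytes, int, str, list, dict):
    print(f"{tp.__name__:6} basicsize={tp.__basicsize__:3} itemsize={tp.__itemsize__}")

t = (1, 2, 3)
computed = tuple.__basicsize__ + tuple.__itemsize__ * len(t)
print(computed, t.__sizeof__())
```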

warm breach Jan 22, 2023, 6:02 PM

#

though I think for some reason str isn't even a PyVarObject 🥴

pliant tusk Jan 22, 2023, 6:02 PM

#

and you need abs because some types fiddle with the sign of ob_size

warm breach Jan 22, 2023, 6:05 PM

#

warm breach !e ```py from einspect.structs import PyDictObject from einspect.types import pt...

this thing has __itemsize__ = 0 though firHmm

pliant tusk Jan 22, 2023, 6:05 PM

#

warm breach though I think for some reason str isn't even a PyVarObject 🥴

this should work for all subtypes to get the location of the instance dict

def get_inst_dict_offset(i):
    return type(i).__basicsize__ + (type(i).__itemsize__ * view(i).size)

#

(but note that it is created lazily as of 3.11 @warm breach )

warm breach Jan 22, 2023, 6:07 PM

#

so it can be null?

pliant tusk Jan 22, 2023, 6:07 PM

#

lemme check

warm breach Jan 22, 2023, 6:07 PM

#

that's fine but I'm just trying to get the pointer which doesn't seem to work

pliant tusk Jan 22, 2023, 6:09 PM

#

warm breach that's fine but I'm just trying to get the pointer which doesn't seem to work

ah that function won't work because view doesnt accept subclasses

warm breach Jan 22, 2023, 6:09 PM

#

!e

from einspect.structs import PyDictObject
from einspect.types import ptr

class Foo(str):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.x = 50

f = Foo("abc")
print(f.__dict__)

p = ptr[PyDictObject].from_address(id(f) + Foo.__basicsize__)
print(p.contents.into_object())

fallen slateBOT Jan 22, 2023, 6:09 PM

#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | {'x': 50}
002 | Traceback (most recent call last):
003 |   File "<string>", line 13, in <module>
004 | ValueError: NULL pointer access

warm breach Jan 22, 2023, 6:10 PM

#

pliant tusk ah that function won't work because `view` doesnt accept subclasses

well I'm trying to implement that 😔

#

so firstly finding the additional dict pointer subtypes have

pliant tusk Jan 22, 2023, 6:10 PM

#

fallen slate @warm breach :x: Your 3.11 eval job has completed with return code 1. ...

weird

warm breach Jan 22, 2023, 6:11 PM

#

!e is this supposed to be 0

class Foo(str):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.x = 50

print(Foo.__itemsize__)

fallen slateBOT Jan 22, 2023, 6:11 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

0

pliant tusk Jan 22, 2023, 6:11 PM

#

!e ```py
from einspect.structs import PyDictObject
from einspect.types import ptr

class Foo(str):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.x = 50

f = Foo("abc")
print(f.__dict__)

p = ptr[PyDictObject].from_address(id(f) + Foo.__dictoffset__)
print(p.contents.into_object())
```

fallen slateBOT Jan 22, 2023, 6:11 PM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | {'x': 50}
002 | 0

warm breach Jan 22, 2023, 6:11 PM

#

o.O

pliant tusk Jan 22, 2023, 6:11 PM

#

warm breach !e is this supposed to be 0 ```py class Foo(str): def __init__(self, *args, ...

yes, str isnt a PyVarObject, it does not have a variably sized struct

warm breach Jan 22, 2023, 6:12 PM

#

I read somewhere negative dictoffset meant from end of struct somehow fml

#

been lied to 😔

pliant tusk Jan 22, 2023, 6:12 PM

#

i don't know why it is 0 tho

warm breach Jan 22, 2023, 6:13 PM

#

pliant tusk !e ```py from einspect.structs import PyDictObject from einspect.types import pt...

!e

from einspect.structs import PyDictObject
from einspect.types import ptr

class Foo(int):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.x = 50

f = Foo(123)
print(f.__dict__)
print(Foo.__dictoffset__)

p = ptr[PyDictObject].from_address(id(f) + Foo.__dictoffset__)
print(p.contents.into_object())

fallen slateBOT Jan 22, 2023, 6:13 PM

#

@warm breach :x: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

001 | {'x': 50}
002 | -8

warm breach Jan 22, 2023, 6:13 PM

#

also this doesn't work for int subclasses somehow 😔

#

also how is that offset -8 anyways, isn't that where the gc header is??

pliant tusk Jan 22, 2023, 6:15 PM

#

there is more stuff in the header now

#

it changed in 3.11

warm breach Jan 22, 2023, 6:16 PM

#

!e

from einspect.structs import PyDictObject
from einspect.types import ptr

class Foo(list):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.x = 50

f = Foo((1, 2))
print(f.__dict__)
print(Foo.__dictoffset__)

p = ptr[PyDictObject].from_address(id(f) + Foo.__dictoffset__)
print(p.contents.into_object())

fallen slateBOT Jan 22, 2023, 6:16 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | {'x': 50}
002 | -72
003 | type_

warm breach Jan 22, 2023, 6:16 PM

#

list also fails

#

tfw __dictoffset__ isn't actually dict offset 😩

#

!e

from einspect.structs import PyDictObject
from einspect.types import ptr

class Foo(list):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.x = 50

f = Foo((1, 2))
print(f.__dict__)
print(Foo.__dictoffset__)

p = ptr[PyDictObject].from_address(id(f) + f.__sizeof__() + Foo.__dictoffset__)
print(p.contents.into_object())

fallen slateBOT Jan 22, 2023, 6:18 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | {'x': 50}
002 | -72
003 | {'x': 50}

warm breach Jan 22, 2023, 6:18 PM

#

so list apparently works if you add offset after sizeof

#

but this isn't the case with int where you should ignore dictoffset and just find the first aligned byte after the struct

pliant tusk Jan 22, 2023, 6:24 PM

#

@warm breach look at this https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_dictoffset

Python documentation

Type Objects

Perhaps one of the most important structures of the Python object system is the structure that defines a new type: the PyTypeObject structure. Type objects can be handled using any of the PyObject_...

warm breach Jan 22, 2023, 6:27 PM

#

pliant tusk @warm breach look at this https://docs.python.org/3/c-api/typeobj.html#...

If the value is less than zero, it specifies the offset from the end of the instance structure.
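
The three cases can at least be told apart from Python by the sign of `__dictoffset__`. A sketch only (`describe_dictoffset` and the two classes are made up): the exact numbers vary by CPython version, and newer versions may also use a negative value as a marker for interpreter-managed dicts.

```python
class WithDict(int):
    pass


class Slotted:
    __slots__ = ()


def describe_dictoffset(tp: type) -> str:
    """Classify a type by the sign of its __dictoffset__ (tp_dictoffset)."""
    offset = tp.__dictoffset__
    if offset == 0:
        return "no instance dict"
    if offset > 0:
        return "dict pointer at a fixed offset from the object start"
    return "negative: not a plain offset from the object start"


for tp in (int, Slotted, WithDict):
    print(tp.__name__, tp.__dictoffset__, describe_dictoffset(tp))
```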

warm breach Jan 22, 2023, 6:27 PM

#

pliant tusk !e ```py from einspect.structs import PyDictObject from einspect.types import pt...

how did this work then firHmm

#

you did id(f) + Foo.__dictoffset__

pliant tusk Jan 22, 2023, 6:29 PM

#

i think that one just got lucky and found a pointer to 0

warm breach Jan 22, 2023, 6:32 PM

#

pliant tusk i think that one just got lucky and found a pointer to `0`

!e aha I got it

from einspect.structs import PyDictObject
from einspect.types import ptr

def align(size: int, alignment: int) -> int:
    return (size + alignment - 1) & ~(alignment - 1)

class Foo(str):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.x = 50

f = Foo("abc")

addr = id(f) + align(f.__sizeof__(), 8) + Foo.__dictoffset__
p = ptr[PyDictObject].from_address(addr)
print(p.contents.into_object())

fallen slateBOT Jan 22, 2023, 6:32 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

{'x': 50}

warm breach Jan 22, 2023, 6:32 PM

#

seems to be sizeof aligned to 8 bytes then dictoffset for negative
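
For reference, the two rounding helpers used in this thread give the same result for power-of-two alignments. A small comparison (function names are just for illustration):

```python
def align_mask(size: int, alignment: int) -> int:
    """Round up with a bit mask; `alignment` must be a power of two."""
    return (size + alignment - 1) & ~(alignment - 1)


def align_floordiv(size: int, alignment: int) -> int:
    """Round up via negated floor division; works for any positive alignment."""
    return -size // alignment * -alignment


for size in (28, 32, 52, 53, 84):
    print(size, align_mask(size, 8), align_floordiv(size, 8))
```

The floor-division form also handles alignments that are not powers of two, which the mask form does not.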

pliant tusk Jan 22, 2023, 6:38 PM

#

i would calculate sizeof manually using basicsize itemsize and ob_size, because some __sizeof__s are not exactly correct

#

@warm breach but you'll need a way to check if a given type has the ob_size field

warm breach Jan 22, 2023, 6:52 PM

#

pliant tusk @warm breach but you'll need a way to check if a given type has the `ob...

hm why

#

isn't calculating struct size and getting __dictoffset__ enough

pliant tusk Jan 22, 2023, 7:02 PM

#

__sizeof__ isn't always a bare struct size, sometimes it includes nested fields

warm breach Jan 22, 2023, 7:11 PM

#

pliant tusk `__sizeof__` isn't always a bare struct size, sometimes it includes nested field...

how this look firHmm

def instance_dict(self) -> ptr[PyObject[dict, Any, Any]] | None:
    """Return the instance dict of the PyObject."""
    # Get the tp_dictoffset of the type
    offset = self.ob_type.contents.tp_dictoffset
    # If 0, the type does not have a dict
    if offset == 0:
        return None
    # For > 0, start from the address of the PyObject
    if offset > 0:
        addr = self.address + offset
    # For < 0, start after the struct
    else:
        # align size to pointer size (8)
        size = align_size(self.mem_size, ctypes.sizeof(c_void_p))
        addr = self.address + size + offset
    # Return the pointer
    return POINTER(PyObject).from_address(addr)

pliant tusk Jan 22, 2023, 7:26 PM

#

afaik that should work

pliant tusk Jan 22, 2023, 7:27 PM

#

warm breach how this look <:firHmm:974893065288421416> ```py def instance_dict(self) -> ptr...

you can stress test it with gc.get_objects and just see if it crashes on any of them

warm breach Jan 22, 2023, 7:40 PM

#

pliant tusk you can stress test it with `gc.get_objects` and just see if it crashes on any of...

import gc
from einspect.structs import PyObject

for obj in gc.get_objects():
    st = PyObject.from_object(obj)
    d = st.instance_dict()
    if d is not None:
        print(repr(obj))
        print(obj.__dict__)
        pydict = d.contents.into_object()
        print(pydict)

#

seems fine mostly, but a few still seem to have a null pointer despite a non-zero tp_dictoffset, even after dict access

#

<_frozen_importlib_external.SourceFileLoader object at 0x103626a40>
{'name': 'einspect.views.view_tuple', 'path': '/Users/ionite/repos/Python/einspect/src/einspect/views/view_tuple.py'}
Traceback (most recent call last):
  File "scratch.py", line 10, in <module>
    pydict = d.contents.into_object()
             ^^^^^^^^^^
ValueError: NULL pointer access

#

some _frozen_importlib_external.SourceFileLoader thing apparently pithink

pliant tusk Jan 22, 2023, 7:48 PM

#

Odd

copper ice Jan 22, 2023, 8:28 PM

#

#

it may be stupid

#

but why is it showing none?

#

please guys thinking for a long time but

#

cant understand

#

i wanna print pink to yellow in reverse order

#

@warm breach pls help

#

srsly cant figure out whats wrong in such an easy code

warm breach Jan 22, 2023, 8:30 PM

#

copper ice but why is it showing none?

list.reverse() is in-place, which means it reverses the list and returns None. You need to use the list again separately

copper ice Jan 22, 2023, 8:31 PM

#

warm breach list.reverse() is in-place, which means it reverses the list and returns None. Y...

wdym

#

cant it be in the print function?

warm breach Jan 22, 2023, 8:31 PM

#

ls.reverse()
print(ls)

copper ice Jan 22, 2023, 8:32 PM

#

what is wrong in this

copper ice Jan 22, 2023, 8:33 PM

#

warm breach ```py ls.reverse() print(ls) ```

warm breach Jan 22, 2023, 8:33 PM

#

copper ice

reverse() returns None after mutating a list

#

so c is None

copper ice Jan 22, 2023, 8:34 PM

#

warm breach reverse() returns None after mutating a list

but then how do i reverse 1:5

warm breach Jan 22, 2023, 8:34 PM

#

assign it to something first

#

or just use the reversed() function instead, which isn't in place

#

print(list(reversed(ls[1:5])))

copper ice Jan 22, 2023, 8:35 PM

#

coming now

#

thanks man

feral cedar Jan 22, 2023, 8:35 PM

#

or use negative step

copper ice Jan 22, 2023, 8:38 PM

#

feral cedar or use negative step

yea thanks thats working too
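
To put the three suggestions side by side, here is a small sketch. The list `colors` is hypothetical, since the original list from the question is not shown here.

```python
# Hypothetical list standing in for the one from the question.
colors = ["red", "pink", "green", "blue", "yellow", "black"]

# 1. list.reverse() mutates in place and returns None, so reverse a copy first.
part = colors[1:5]
part.reverse()
print(part)

# 2. reversed() returns an iterator, so wrap it in list() to print the items.
print(list(reversed(colors[1:5])))

# 3. A negative step walks backwards: start at index 4, stop before index 0.
print(colors[4:0:-1])
```

All three print the same four items, from "yellow" back to "pink".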

spark magnet Jan 23, 2023, 1:45 AM

#

anyone know anything about Cython internals? Cython + coverage.py has long been an uneasy alliance, and I would love to get it smoothed out.

raven ridge Jan 23, 2023, 1:53 AM

#

I know a bit...

#

Cython generates normal C extension modules, so it's not particularly special. The only weird thing it does that affects profiling/tracing is this stuff: https://cython.readthedocs.io/en/stable/src/tutorial/profiling_tutorial.html - if you ask it to, it generates fake frames for Cython functions and calls the installed tracing or profiling function explicitly when entering and exiting those functions (passing those fake frames along)

#

we eventually gave up on supporting Cython functions built with profiling support in Memray. Those fake frames caused too much trouble... though I'm happy coverage.py doesn't do the same, since I do like to see coverage stats for my Cython code 😆

raven ridge Jan 23, 2023, 2:02 AM

#

spark magnet anyone know anything about Cython internals? Cython + coverage.py has long been ...

what sort of info are you looking for, in particular?

spark magnet Jan 23, 2023, 2:15 AM

#

raven ridge what sort of info are you looking for, in particular?

thanks. this change https://github.com/nedbat/coveragepy/pull/1347 made this problem: https://github.com/nedbat/coveragepy/issues/1538, and i don't know anything about why we needed the change, or whether to change it back, or what.

#

i'm not in a place to look at the code right now, i was hoping to find someone who could take a look over the next few weeks or something.

raven ridge Jan 23, 2023, 2:22 AM

#

well, that's an interesting one.

spark magnet Jan 23, 2023, 2:24 AM

#

unfortunately, i am blessed with many interesting ones 🙂

raven ridge Jan 23, 2023, 2:27 AM

#

what are the values of that dict? The PR title was "Map also empty dictionaries to file" - are the values always dicts? Did this maybe affect other falsy things?

#

is it possible that the performance difference is explained entirely by extra data being (unnecessarily) processed?

spark magnet Jan 23, 2023, 2:29 AM

#

raven ridge is it possible that the performance difference is explained entirely by extra da...

but it wasn't extra data: we were skipping empty values, and now we aren't skipping empty values. How can that matter!?

raven ridge Jan 23, 2023, 2:29 AM

#

maybe there's extra work being done now, that used to be skipped

spark magnet Jan 23, 2023, 2:30 AM

#

tbh, the caching done by cached_mapped_values got changed recently, and I borked it (maxsize=0 is different than maxsize=None!), but maybe there's still something wrong with it.
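
If that cache is built on `functools.lru_cache` (an assumption here, the coverage.py code is not shown in this chat), the difference between the two settings looks roughly like this. The function names are made up for the sketch.

```python
from functools import lru_cache

calls = {"zero": 0, "none": 0}


@lru_cache(maxsize=0)       # caching effectively disabled
def square_uncached(n):
    calls["zero"] += 1
    return n * n


@lru_cache(maxsize=None)    # unbounded cache
def square_cached(n):
    calls["none"] += 1
    return n * n


for _ in range(3):
    square_uncached(4)
    square_cached(4)

print(calls)
print(square_uncached.cache_info())
print(square_cached.cache_info())
```

With `maxsize=0` the wrapped function runs on every call, with `maxsize=None` it runs once per distinct argument.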

#

the "slowdown" issue mentions "We can notice a lot more SQL queries (both selects and inserts attempts with 0 rows) in 6.4.3, that we don't have on 6.4.2."

#

ideally, someone who knows a bit about Cython could go back to https://github.com/cython/cython/issues/3515 and see if there's a better fix.

raven ridge Jan 23, 2023, 2:35 AM

#

the Cython coverage plugin, as I understand it, is basically only concerned with 1 thing: mapping coverage reports from the generated .c files to the .pyx/.pyi files that the .c file was transpiled from

spark magnet Jan 23, 2023, 2:35 AM

#

raven ridge the Cython coverage plugin, as I understand it, is basically only concerned with...

right, that's the classic use-case for coverage plugins (originally developed for the django template plugin)

raven ridge Jan 23, 2023, 2:37 AM

#

I know a fair bit about Cython, but much less about the Cython coverage plugin in particular - I've only had to dig into it once, and I had a pretty poor understanding at the time... There were some recent fixes to it that are only applied to the current development version, and not to the stable version...

spark magnet Jan 23, 2023, 2:38 AM

#

i just commented on the original Cython issue. just getting clear reproduction instructions would help.

raven ridge Jan 23, 2023, 2:39 AM

#

https://github.com/cython/cython/pull/3831 went into the 3.x branch only...

spark magnet Jan 23, 2023, 2:41 AM

#

the rabbit hole goes deeper... 😦 Thanks for talking it through with me at least... I have to bounce.

swift imp Jan 23, 2023, 11:13 AM

#

raven ridge we eventually gave up on supporting Cython functions built with profiling suppor...

You work on memray?

quick snow Jan 23, 2023, 2:05 PM

#

Have y'all seen this? Seems like a great bunch of ideas. https://discuss.python.org/t/announce-pybi-and-posy/23021/26

Discussions on Python.org

[announce] Pybi and Posy

Great initiative! Looks very promising @njs I really like that you’ve taken a holistic view of a larger scope of the problem, but not too large to make it impractical to solve. Fwiw, I think it’s probably best to keep the discussion focussed on pybi and posy. The project is new, it’s explicitly stated in the OP that external pythons (such as a ...

thorn oasis Jan 23, 2023, 2:17 PM

#

'make' is not recognized as an internal or external command,
operable program or batch file.
can anyone help me to solve this error please

raven ridge Jan 23, 2023, 4:05 PM

#

swift imp You work on memray?

Yep, I'm one of a team of two developing it.

unkempt rock Jan 23, 2023, 5:16 PM

#

raven ridge Yep, I'm one of a team of two developing it.

Legendary status

warm breach Jan 23, 2023, 6:26 PM

#

raven ridge Yep, I'm one of a team of two developing it.

speaking of, does memray have some way to inspect the allocated memory block of an object

raven ridge Jan 23, 2023, 6:27 PM

#

No, it really doesn't even know what object owns a memory block, or even whether any object owns it. It works at a lower level than that.

#

"all" that it knows is the full call stack at which every block of memory was allocated, and when (and whether) that memory block was deallocated. (OK, and a few other, less important things. The total RSS is tracked over time, and we know the name of each thread, if one was set...)

warm breach Jan 23, 2023, 6:29 PM

#

was trying to find some way to get that info from python, didn't seem to be a stable c API for it

raven ridge Jan 23, 2023, 6:29 PM

#

warm breach was trying to find some way to get that info from python, didn't seem to be a st...

there absolutely will never be. The stable C API is designed to abstract away implementation details, and what memory is held by an arbitrary object is basically the definition of an implementation detail.

warm breach Jan 23, 2023, 6:34 PM

#

currently I'm calculating the struct size aligned to 16, plus a GC header if supported by the type, and the position of the instance dict pointer

#

not sure if that covers everything

feral island Jan 23, 2023, 6:35 PM

#

the extra stuff for variable-sized objects?

raven ridge Jan 23, 2023, 6:46 PM

#

why are you calculating this size? What do you do with it?

pulsar ridge Jan 23, 2023, 6:46 PM

#

✨ I am trying to create and store Google credentials in token.pickle from an auth code 😔 but I'm not able to create token.pickle

warm breach Jan 23, 2023, 6:58 PM

#

raven ridge why are you calculating this size? What do you do with it?

to like see if copying an object into another object will be into owned memory 🥴

raven ridge Jan 23, 2023, 6:59 PM

#

why bother?

#

just because you're copying a number of bytes <= the size of the structure doesn't mean that things are left in a sane state. Hell, you can't even copy one list into another list. Adding one safety check to a fundamentally unsafe operation doesn't make much sense to me.

warm breach Jan 23, 2023, 7:01 PM

#

!e

from einspect import view

v = view(2**60)

with v.unsafe():
    v <<= (1, 2, 3)
    
print(2**60)

fallen slateBOT Jan 23, 2023, 7:01 PM

#

@warm breach :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

[No output]

raven ridge Jan 23, 2023, 7:02 PM

#

sure, you can segfault if you don't check this. You can also segfault if you do check this. So...

warm breach Jan 23, 2023, 7:02 PM

#

raven ridge just because you're copying a number of bytes <= to the size of the structure do...

should copying a list into a list always be fine? since the struct is always 40 bytes

#

the malloced array will just stay where it was

raven ridge Jan 23, 2023, 7:02 PM

#

no, because you screw up the reference counts.

warm breach Jan 23, 2023, 7:03 PM

#

assuming we call clear() on the list about to get overwritten 🥴

raven ridge Jan 23, 2023, 7:03 PM

#

and now there's two different objects referring to the same malloc'ed array, and the first one of them to be destroyed will free it, and any later attempt to access it by the second one will be a use-after-free bug.

warm breach Jan 23, 2023, 7:09 PM

#

raven ridge and now there's two different objects referring to the same malloc'ed array, and...

!e I IncRef the source to make it stay alive here, not sure why that solves the freeing segfault really

from einspect import view

x = [1, 2]

with view(x).unsafe() as v:
    y = [*range(18)]
    v <<= y

print(x)
del y
print(x)

fallen slateBOT Jan 23, 2023, 7:09 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
002 | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

feral island Jan 23, 2023, 7:10 PM

#

"not sure why that solves the segfault" usually means you have either a memory leak or some other memory safety bug now

warm breach Jan 23, 2023, 7:10 PM

#

will objects remaining with refcounts still get freed during interpreter shutdown?

raven ridge Jan 23, 2023, 7:11 PM

#

best effort, yeah.

warm breach Jan 23, 2023, 7:11 PM

#

shouldn't the double free happen there then pithink

pliant tusk Jan 23, 2023, 7:11 PM

#

you would probably have memory leaks if this code was being called from an embedded interpreter

raven ridge Jan 23, 2023, 7:11 PM

#

If you're on Linux, try running that with export MALLOC_CHECK_=3 PYTHONMALLOC=malloc and see if you get a crash.

pliant tusk Jan 23, 2023, 7:12 PM

#

raven ridge If you're on Linux, try running that with `export MALLOC_CHECK_=3 PYTHONMALLOC=m...

do you know if macOS supports something similar?

warm breach Jan 23, 2023, 7:13 PM

#

would removing the list object from the GC linked list do anything

raven ridge Jan 23, 2023, 7:13 PM

#

no clue. But you can get something sort of similar from pymalloc itself, if you run with python -Xdev

warm breach Jan 23, 2023, 7:13 PM

#

I guess it still gets freed on 0 ref-count? And that just disables cyclic GC?

raven ridge Jan 23, 2023, 7:13 PM

#

at best it would leak memory. at worst it'd crash.

#

honestly, "corrupts memory in a manner that may eventually lead to a crash" is just about the worst sort of memory bug.

#

both because that's usually a security vulnerability, and because it can be quite annoying to track down if the crash location is far removed from where the memory corruption occurred.

warm breach Jan 23, 2023, 7:28 PM

#

raven ridge If you're on Linux, try running that with `export MALLOC_CHECK_=3 PYTHONMALLOC=m...

seems fine with those pithink

#

maybe I should valgrind it

raven ridge Jan 23, 2023, 7:29 PM

#

!e what's going on here: ```py
from einspect import view

x = [1, 2]

with view(x).unsafe() as v:
    y = [*range(18)]
    v <<= y

print(x)
y[0] = 42
print(x)

fallen slateBOT Jan 23, 2023, 7:29 PM

#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
002 | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

raven ridge Jan 23, 2023, 7:29 PM

#

why isn't the first element changed?

warm breach Jan 23, 2023, 7:30 PM

#

oh uh I make a deep copy of the source object and IncRef that 🥴

#

which I should probably stop doing

raven ridge Jan 23, 2023, 7:31 PM

#

then I'm guessing you're leaking that copy.

#

and if you stopped doing that, then I'm guessing it would be pretty easy to get this to segv.

warm breach Jan 23, 2023, 7:32 PM

#

mainly the deepcopy was incref all the members I think

#

since otherwise when y gets dropped and calls clear x might now have members pointing to non existent objects

raven ridge Jan 23, 2023, 7:33 PM

#

the array isn't a member with a reference count - you must actually be creating a copy of that array, or the first element would have changed to 42

#

https://github.com/ionite34/einspect/blob/d99bda0e93d32a1fe0cadad0aca7af6dfc215035/src/einspect/views/view_base.py#L263

fallen slateBOT Jan 23, 2023, 7:34 PM

#

src/einspect/views/view_base.py line 263

other = deepcopy(other)```

raven ridge Jan 23, 2023, 7:34 PM

#

and the fact that you're leaking that copy is the only reason this doesn't wind up with a double free.

#

you've got two lists that each think they're responsible for freeing that array. The only way that can possibly not result in a double free is if one of them never gets destroyed, and so never tries to free its array.
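
The ownership problem can be modelled in pure Python without touching real memory. This is only a toy: `ToyHeap` and `ToyList` are invented for the sketch and have nothing to do with CPython's allocator or einspect.

```python
class ToyHeap:
    """Tiny stand-in for malloc/free that detects double frees."""

    def __init__(self):
        self.live = set()
        self.next_addr = 0x1000

    def malloc(self) -> int:
        addr = self.next_addr
        self.next_addr += 0x100
        self.live.add(addr)
        return addr

    def free(self, addr: int) -> None:
        if addr not in self.live:
            raise RuntimeError(f"double free of {addr:#x}")
        self.live.remove(addr)


class ToyList:
    """Models only the part that matters here: a list owns one array."""

    def __init__(self, heap: ToyHeap):
        self.heap = heap
        self.ob_item = heap.malloc()

    def destroy(self) -> None:
        self.heap.free(self.ob_item)


heap = ToyHeap()
x = ToyList(heap)
y = ToyList(heap)

leaked = x.ob_item       # x's original array becomes unreachable
x.ob_item = y.ob_item    # what a raw struct copy from y into x amounts to

y.destroy()              # frees the shared array
try:
    x.destroy()          # the second owner frees the same array again
    double_free = False
except RuntimeError as exc:
    double_free = True
    print(exc)

print(leaked in heap.live)  # the old array was never freed
```

Skipping one of the two `destroy()` calls avoids the error, which corresponds to the leaked deepcopy keeping things alive.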

warm breach Jan 23, 2023, 7:37 PM

#

I guess a possibility would be to maintain a list of "shared array" list objects and override list type's tp_free to only free the array when it is the last reference pithink

raven ridge Jan 23, 2023, 7:38 PM

#

you'd wind up needing to do something like that for every type of object.

warm breach Jan 23, 2023, 7:38 PM

#

eh yeah doesn't seem worth it

raven ridge Jan 23, 2023, 7:39 PM

#

memcpy'ing into the struct of some arbitrary object is fundamentally unsafe. It can't be made safe, short of special casing how it behaves for each different type of object.

warm breach Jan 23, 2023, 7:40 PM

#

!e

from einspect import view, unsafe

x = [1, 2]
y = [*range(18)]

with unsafe():
    view(y).move_to(view(x))

print(x)
y[0] = 100
print(x)

fallen slateBOT Jan 23, 2023, 7:40 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
002 | [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

raven ridge Jan 23, 2023, 7:40 PM

#

making a copy of a Python object with memcpy breaks that object's invariants. On a case-by-case basis, you can fix up the copy to make the invariants hold - but there's no general way to do that.

warm breach Jan 23, 2023, 7:40 PM

#

so move_to doesn't make a deepcopy unlike move_from

#

wonder why this doesn't double free

raven ridge Jan 23, 2023, 7:41 PM

#

!e ```py
from einspect import view, unsafe

x = [1, 2]
y = [*range(18)]

with unsafe():
    view(y).move_to(view(x))

print(x)
y.clear()
print(x)

fallen slateBOT Jan 23, 2023, 7:41 PM

#

@raven ridge :x: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

warm breach Jan 23, 2023, 7:43 PM

#

raven ridge !e ```py from einspect import view, unsafe x = [1, 2] y = [*range(18)] with un...

I assume that's because the array was modified by clear but x still has a size of 17? pithink

raven ridge Jan 23, 2023, 7:44 PM

#

clear freed the array, x is still holding a pointer to it.

#

print(x) tries to access it after it's been freed, which explodes.

warm breach Jan 23, 2023, 7:45 PM

#

oh clear frees the array? pithink

#

does it allocate a new one when you append

raven ridge Jan 23, 2023, 7:45 PM

#

yeah
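
This can be observed from plain Python too, as a sketch that relies on a CPython detail: `list.__sizeof__` counts the currently allocated item array.

```python
x = [*range(18)]
empty_size = [].__sizeof__()

has_array = x.__sizeof__() > empty_size      # item array is allocated
x.clear()
freed = x.__sizeof__() == empty_size         # array is gone after clear()
x.append(1)
reallocated = x.__sizeof__() > empty_size    # append allocated a fresh one

print(has_array, freed, reallocated)
```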

pliant tusk Jan 23, 2023, 7:46 PM

#

https://github.com/python/cpython/blob/main/Objects/listobject.c#L592-L613

raven ridge Jan 23, 2023, 7:47 PM

#

ooh man, both of those comments 😄

warm breach Jan 23, 2023, 7:50 PM

#

huh

#

so a list that gets cleared has NULL ob_item but a list that gets popped to 0 doesn't...?

#

!e

from einspect.structs import PyListObject

x = [1]
x.pop()

ls = PyListObject.from_object(x)
print(ls.ob_item[0].contents.into_object())

fallen slateBOT Jan 23, 2023, 7:52 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

warm breach Jan 23, 2023, 7:52 PM

#

I guess it just leaves the object pointer pithink

raven ridge Jan 23, 2023, 7:53 PM

#

looks like popping to 0 also frees the array. https://github.com/python/cpython/blob/7b20a0f55a16b3e2d274cc478e4d04bd8a836a9f/Objects/listobject.c#L1029-L1031

fallen slateBOT Jan 23, 2023, 7:53 PM

#

Objects/listobject.c lines 1029 to 1031

if (size_after_pop == 0) {
    Py_INCREF(v);
    status = _list_clear(self);```

warm breach Jan 23, 2023, 7:54 PM

#

raven ridge looks like popping to 0 also frees the array. https://github.com/python/cpython/...

ah interesting

#

I guess not yet in 3.11 https://github.com/python/cpython/blob/3.11/Objects/listobject.c#L1032

fallen slateBOT Jan 23, 2023, 7:54 PM

#

Objects/listobject.c line 1032

list_pop_impl(PyListObject *self, Py_ssize_t index)```

pliant tusk Jan 23, 2023, 7:55 PM

#

raven ridge looks like popping to 0 also frees the array. https://github.com/python/cpython/...

ah damn that is gonna break one of my bug prep strategies

warm breach Jan 23, 2023, 7:57 PM

#

deling the last item already frees the array in 3.11 it seems

#

!e

from einspect.structs import PyListObject

x = [1]
del x[0]

ls = PyListObject.from_object(x)
print(ls.ob_item[0])

fallen slateBOT Jan 23, 2023, 7:58 PM

#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 7, in <module>
003 | ValueError: NULL pointer access

pliant tusk Jan 23, 2023, 7:59 PM

#

!e unrelated, but I still feel stuff like this is unintuitive ```py
class MyList(list):pass

l = MyList()
n = l + MyList([1])
print(type(n))```

fallen slateBOT Jan 23, 2023, 7:59 PM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class 'list'>

warm breach Jan 23, 2023, 8:02 PM

#

I guess list.__add__ just doesn't know about your subtype at all

#

and just operates on the front PyListObject part

feral island Jan 23, 2023, 8:02 PM

#

it doesn't (and can't) know how to construct objects of an arbitrary subclass
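
A sketch of why: a subclass is free to change its constructor signature, so only the subclass itself knows how to build a new instance. `Tagged` is a made-up class for illustration.

```python
class Tagged(list):
    """List subclass whose constructor needs an extra argument."""

    def __init__(self, tag, items=()):
        super().__init__(items)
        self.tag = tag

    def __add__(self, other):
        # Only the subclass knows it has to pass the tag along.
        return type(self)(self.tag, list.__add__(self, other))


a = Tagged("a", [1, 2])
print(type(list.__add__(a, [3])))   # the base implementation returns a plain list
b = a + [3]
print(type(b), b.tag, list(b))      # the override rebuilds a Tagged
```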

raven ridge Jan 23, 2023, 8:03 PM

#

there was a big change to datetime to fix that for the methods of datetime.datetime

#

to get them to return subclass instances rather than base class instances, I mean

#

though in that case, there's already an inherited classmethod for creating derived class instances.
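
A short sketch of that behaviour, assuming Python 3.8 or newer. `MyDateTime` is a made-up subclass, and `now()` is used only as one example of an inherited classmethod constructor.

```python
from datetime import datetime, timedelta


class MyDateTime(datetime):
    pass


d = MyDateTime(2023, 1, 23)
later = d + timedelta(days=1)

print(type(later))              # the subclass, not plain datetime
print(type(MyDateTime.now()))   # classmethod constructors build the subclass
```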

pliant tusk Jan 23, 2023, 8:06 PM

#

feral island it doesn't (and can't) know how to construct objects of an arbitrary subclass

it would be nice if there was a way for a developer to either signal that constructing subclasses works the same way (could check if Py_TYPE(obj)->tp_init == list->tp_init?), or provide a method to provide the arguments to construct the object

raven ridge Jan 23, 2023, 8:07 PM

#

checking tp_init isn't enough, since there's also tp_new

pliant tusk Jan 23, 2023, 8:08 PM

#

could check both, but this kind of thing would probably be better as an opt-in instead of an opt-out

#

maybe via a metaclass that sets some type flag

#

@feral island @raven ridge would either of you happen to know in what order tp_del is called in subclasses with multiple bases?

raven ridge Jan 23, 2023, 8:16 PM

#

can you inherit from two classes with different tp_del?

warm breach Jan 23, 2023, 8:17 PM

#

!e

class UserTuple(tuple): pass

print(().__sizeof__())
print(UserTuple().__sizeof__())

class UserInt(int): pass

print((0).__sizeof__())
print(UserInt().__sizeof__())

fallen slateBOT Jan 23, 2023, 8:17 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

warm breach Jan 23, 2023, 8:17 PM

#

is this a bug

#

why does int subclass sizeof not include the instance dict but tuple does

flat gazelle Jan 23, 2023, 8:51 PM

#

pliant tusk !e unrelated, but I still feel stuff like this is unintuitive ```py class MyList...

from collections import UserList
class MyList(UserList):pass

l = MyList()
n = l + MyList([1])
print(type(n))

#

UserList and UserDict have more predictable behaviour for subclassing

pliant tusk Jan 23, 2023, 8:53 PM

#

flat gazelle ```py from collections import UserList class MyList(UserList):pass l = MyList()...

yea ik, just feels like it would be unintuitive for beginners

flat gazelle Jan 23, 2023, 8:54 PM

#

ah, I see. I wonder why it was even allowed to inherit from dict in the first place

dusk comet Jan 24, 2023, 12:12 AM

#

>>> type(l + [1])
<class '__main__.MyList'>
>>> type([1] + l)
<class '__main__.MyList'>
``` this is also cool because builtin types do not care about subclasses: ```py
>>> class X(list): ...
...
>>> type(X() + X())
<class 'list'>

grave jolt Jan 24, 2023, 7:57 AM

#

raven ridge ooh man, both of those comments 😄

Nice method name 🥴
https://github.com/python/cpython/blob/main/Objects/listobject.c#L622

fallen slateBOT Jan 24, 2023, 7:57 AM

#

Objects/listobject.c line 622

list_ass_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh, PyObject *v)```

swift imp Jan 24, 2023, 11:31 AM

#

raven ridge Yep, I'm one of a team of two developing it.

Impressive my friend. I've always found your conversation inputs useful so it's no surprise

warm breach Jan 24, 2023, 12:47 PM

#

rose schooner what if `view()` accepted subclasses

so uh, for instance dicts, would it make more sense for the move target to get the same dict as the source or a copy firHmm

#

from einspect import view, unsafe

class Foo:
    pass

class Bar:
    pass

f = Foo()
b = Bar()
b.x = 100

with unsafe():
    view(f) << b

print(f.x)
>> 100

rose schooner Jan 24, 2023, 12:49 PM

#

warm breach so uh, for instance dicts, would it make more sense for the move target to get t...

it's "move", not "copy"

#

so the same dict should make more sense

warm breach Jan 24, 2023, 12:59 PM

#

rose schooner so the same dict should make more sense

so it would be 🥴

print(f.x)  # 100
b.x = "hi"
print(f.x)  # hi

rose schooner Jan 24, 2023, 12:59 PM

#

warm breach so it would be 🥴 ```py print(f.x) # 100 b.x = "hi" print(f.x) # hi ```

yep

warm breach Jan 24, 2023, 3:33 PM

#

@pliant tusk do you know how I can get this to allocate correctly on init?
https://github.com/ionite34/einspect/blob/main/src/einspect/structs/py_long.py#L15-L23

fallen slateBOT Jan 24, 2023, 3:33 PM

#

src/einspect/structs/py_long.py lines 15 to 23

@struct
class PyLongObject(PyVarObject[int, None, None]):
    """
    Defines a PyLongObject Structure.

    https://github.com/python/cpython/blob/3.11/Include/cpython/longintrepr.h#L79-L82
    """

    _ob_digit_0: ctypes.c_uint32 * 0```

warm breach Jan 24, 2023, 3:33 PM

#

since if I do PyLongObject(...) to make a struct instance it won't have enough allocation to fit my actual ob_digit array

#

hm I guess I could use _PyObject_GC_NewVar

pliant tusk Jan 24, 2023, 3:44 PM

#

warm breach hm I guess I could use `_PyObject_GC_NewVar`

Are ints garbage collected?

warm breach Jan 24, 2023, 3:45 PM

#

pliant tusk Are ints garbage collected?

hm, no

#

I guess _PyObject_NewVar then

#

PyVarObject *
_PyObject_NewVar(PyTypeObject *tp, Py_ssize_t nitems)
{
    PyVarObject *op;
    const size_t size = _PyObject_VAR_SIZE(tp, nitems);
    op = (PyVarObject *) PyObject_Malloc(size);
    if (op == NULL) {
        return (PyVarObject *)PyErr_NoMemory();
    }
    _PyObject_InitVar(op, tp, nitems);
    return op;
}

pliant tusk Jan 24, 2023, 3:46 PM

#

warm breach since if I do `PyLongObject(...)` to make a struct instance it won't have enough...

Could alloc it normally then use reallocate to size up

warm breach Jan 24, 2023, 3:48 PM

#

pliant tusk Could alloc it normally then use reallocate to size up

but wouldn't the init already be writing to unowned memory

#

from einspect.structs import PyTypeObject, PyLongObject

x = PyLongObject(
    ob_refcnt=1,
    ob_type=PyTypeObject.from_object(int).as_ref(),
    ob_size=2,
    ob_digit=[1, 1],
)

print(x.into_object())

#

this seems fine

from einspect.structs import *

obj = PyTypeObject.from_object(int).NewVar(1)
obj = obj.contents.astype(PyLongObject)
obj.ob_digit[0] = 123

print(obj.into_object())  # 123

pliant tusk Jan 24, 2023, 3:59 PM

#

warm breach but wouldn't the init already be writing to unowned memory

i meant to overload the PyVarObject init to malloc type(o).__basicsize__ + ob_size * type(o).__itemsize__

warm breach Jan 24, 2023, 4:04 PM

#

pliant tusk i meant to overload the PyVarObject init to malloc `type(o).__basicsize__ + ob_s...

ah hm, or maybe I can override PyVarObject's __new__ to use _PyObject_NewVar instead? firT

#

not sure which one seems safer

#

eh though __new__ would be annoying to type hint since I can't use Self

pliant tusk Jan 24, 2023, 4:12 PM

#

no idea, ctypes does not have a mechanism for variable sized structures
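
One possible workaround, as a sketch only: build a separate Structure class per array length. The helper `make_var_struct` is hypothetical and is not how einspect does it. The field names mirror the CPython 3.11 int layout, but nothing here is mapped onto a real int.

```python
import ctypes


def make_var_struct(n_digits: int):
    """Build a Structure class whose trailing array has n_digits entries."""

    class VarStruct(ctypes.Structure):
        _fields_ = [
            ("ob_refcnt", ctypes.c_ssize_t),
            ("ob_type", ctypes.c_void_p),
            ("ob_size", ctypes.c_ssize_t),
            ("ob_digit", ctypes.c_uint32 * n_digits),
        ]

    return VarStruct


Empty = make_var_struct(0)
Two = make_var_struct(2)

s = Two(ob_refcnt=1, ob_size=2)
s.ob_digit[0] = 1
s.ob_digit[1] = 1

# The instance owns enough memory for its own array length.
print(ctypes.sizeof(Empty), ctypes.sizeof(Two), list(s.ob_digit))
```

The cost is one class per length, so a real implementation would probably want to cache the generated classes.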

dusk comet Jan 24, 2023, 6:09 PM

#

warm breach so it would be 🥴 ```py print(f.x) # 100 b.x = "hi" print(f.x) # hi ```

.__dict__ is writable (it is copying dict pointer, not copying dict content), so you can do that without einspect

grave jolt Jan 24, 2023, 6:44 PM

#

!e

class A:
    pass

x = A()
y = A()
x.__dict__ = y.__dict__
x.foo = 42
print(y.foo)

fallen slateBOT Jan 24, 2023, 6:44 PM

#

@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.

grave jolt Jan 24, 2023, 6:44 PM

#

WTF

#

cursed

warm breach Jan 24, 2023, 6:48 PM

#

dusk comet `.__dict__` is writable (it is copying dict pointer, not copying dict content), ...

it's mainly for copying the instance dict along with everything else for subclasses of builtin types

from einspect import view, unsafe

class UserList(list):
    pass

ls = UserList()
x = UserList([1, 2, 3])

with unsafe():
    view(ls) << x

print(ls)
# [1, 2, 3]

x.foo = "bar"
print(ls.foo)
# bar

x[1] = "hi"
print(ls)
# [1, 'hi', 3]

dull oxide Jan 25, 2023, 1:49 PM

#

Best modules in python??

fickle ferry Jan 25, 2023, 3:27 PM

#

dull oxide Best modules in python??

Wrong channel, and depends on what you are doing.

warm breach Jan 25, 2023, 10:01 PM

#

what does a __dictoffset__ of -1 mean in python 3.12?

#

https://docs.python.org/3.12/c-api/typeobj.html#Py_TPFLAGS_MANAGED_DICT I'm observing that it means the class uses the new Managed Dict feature, but I don't see the -1 being documented anywhere

median umbra Jan 26, 2023, 12:04 AM

#

Can anyone help me in this GIT command. I am new to coding.

warm breach Jan 26, 2023, 12:06 AM

#

median umbra Can anyone help me in this GIT command. I am new to coding.

update to a newer version of git iirc

median umbra Jan 26, 2023, 12:06 AM

#

I downloaded git yesterday bruh.

unkempt rock Jan 26, 2023, 12:06 AM

#

run git --version

median umbra Jan 26, 2023, 12:07 AM

#

unkempt rock Jan 26, 2023, 12:07 AM

#

hmm odd that seems like a new release

median umbra Jan 26, 2023, 12:07 AM

#

yup

spark magnet Jan 26, 2023, 12:08 AM

#

median umbra yup

this isn't really the right channel. #❓｜how-to-get-help

unkempt rock Jan 26, 2023, 12:08 AM

#

maybe we could talk in unix channel

spark magnet Jan 26, 2023, 12:09 AM

#

#tools-and-devops is right

median umbra Jan 26, 2023, 12:09 AM

#

ok

deep nova Jan 26, 2023, 12:31 AM

#

Question about python's parser

#

When I look at Python's grammar, I see that there are actual function calls embedded in it

#

Is this purely notational? Does the grammar exist simply as a map of the parser, which is itself hand coded?

#

While I'm here — what kind of parser does CPython use? Are there any interesting features or implementation details I should know about?

#

Shortly, I'll be trying to build an equivalent

flat gazelle Jan 26, 2023, 12:35 AM

#

There is a lovely talk on the new PEG parser. https://youtu.be/QppWTvh7_sI

YouTube

North Bay Python

"Writing a PEG parser for fun and profit" - Guido van Rossum (North...

Parsing Expression Grammars (PEGs) are a relatively new formalism for describing grammars suitable for automatically generating efficient parsers. I've become interested in using a PEG-generated parser as an alternative to CPython's nearly 30 year old "pgen" parser generator. This poses some interesting problems. I've also come up with a neat wa...

▶ Play video

deep nova Jan 26, 2023, 1:09 AM

#

Now

#

PEG grammars are context free grammars with short-circuited alternation, right?

#

And beyond that, PEG parsers are recursive descent packrat parsers designed to support linear time parsing with infinite lookahead, at the cost of more memory consumption?
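
A toy sketch of the two ideas in question, ordered choice and memoization. This is not CPython's parser; the grammar and function names are invented for illustration.

```python
from functools import lru_cache

DIGITS = "0123456789"


def parse(text: str):
    """Toy PEG:  expr <- term '+' expr / term ;  term <- [0-9]+

    Each rule takes a position and returns the end position of the match,
    or None on failure. lru_cache memoizes results per (rule, position),
    which is the packrat part.
    """

    @lru_cache(maxsize=None)
    def term(pos: int):
        end = pos
        while end < len(text) and text[end] in DIGITS:
            end += 1
        return end if end > pos else None

    @lru_cache(maxsize=None)
    def expr(pos: int):
        # Ordered choice: try the first alternative and take it if it matches.
        left = term(pos)
        if left is not None and left < len(text) and text[left] == "+":
            rest = expr(left + 1)
            if rest is not None:
                return rest
        # Second alternative; term(pos) is answered from the memo table.
        return term(pos)

    end = expr(0)
    return end == len(text), term.cache_info().hits


print(parse("1+22+3"))
print(parse("1+"))
```

The second element of the result counts how often a `term` lookup was served from the memo table instead of being recomputed.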

deep nova Jan 26, 2023, 3:15 AM

#

flat gazelle There is a lovely talk on the new PEG parser. https://youtu.be/QppWTvh7_sI

Wow

#

What an incredible invention

deep nova Jan 26, 2023, 4:18 AM

#

Is there any reason that leading zeroes are prohibited for integer literals, but are permitted for based integers?

#

00001 # invalid
0x001 # valid

#

Is it just a quirk in the parser?

#

Also

#

In lexing and parsing integers — you might have something that looks a bit like this...

#

integer ::= dec_integer ( 'E' | 'e' ) ( dec_integer + )
          | dec_integer ( 'I' | 'i' )
          | dec_integer | hex_integer | oct_integer | bin_integer

dec_integer ::= dec_digit +
hex_integer ::= '0' ( 'X' | 'x' ) ( ( dec_digit | hex_glyph ) + )
oct_integer ::= '0' ( 'O' | 'o' )   ( oct_digit + )
bin_integer ::= '0' ( 'B' | 'b' )   ( bin_digit + )

#

This is great and all. It's very concise. But it occurs to me that a more granular approach might be more suitable. For example — lets say I allowed the 0x to be a token, and then the number that followed it to be a token. Later, during parsing, I could stitch them together into a hex integer

#

This would give me access to the number part without having to call str.split() on the literal. In a nutshell, I'm going to need to split the thing anyway, so why not just do it at the first step instead of a later step

raven ridge Jan 26, 2023, 4:31 AM

#

deep nova Is there any reason that leading zeroes are prohibited for integer literals, but...

a leading 0 in Python 2 indicated that the integer literal was base 8, like it does in C. That feature was dropped for Python 3, and so it was much safer to change it to be an error rather than to silently have the integer literals evaluate to a different value.

deep nova Jan 26, 2023, 4:32 AM

#

Safer with respect to legacy code?

raven ridge Jan 26, 2023, 4:33 AM

#

right. given the choice, it's nicer for users who are trying to upgrade their code if running old code with a new interpreter fails with an obvious error than if it seems to work but silently does something different.
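A quick way to see which spellings Python 3 accepts is to hand them to the parser. This is only a sketch; `is_valid_literal` is a helper name invented for the example.

```python
import ast


def is_valid_literal(source: str) -> bool:
    """Return True if `source` parses as a Python 3 expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


for literal in ("00001", "0x001", "0o17", "000"):
    print(literal, is_valid_literal(literal))
```

A run of zeros such as `000` is still accepted, since it cannot be mistaken for a non-zero octal value.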

#

arguably the single reason that Python 2 still exists in some places is because of the failure to do that with string literals - changing string literals from byte strings to unicode strings makes for a porting nightmare, because things fail in strange and surprising ways, potentially in locations far away from where the bug was.

boreal umbra Jan 26, 2023, 4:38 AM

#

I assume that prepending all string literals in py2 code with a b to port it to py3 is not a viable strategy?

raven ridge Jan 26, 2023, 4:39 AM

#

not really. You still need to transform them back and forth to unicode strings whenever you're passing them to most any library function, save some of the ones in os I suppose

raven ridge Jan 26, 2023, 4:42 AM

#

deep nova This would give me access to the number part without having to call `str.split()...

to nitpick a bit, you don't need to split, just slice. You know how many characters 0x or 0b are, so you just ignore that many characters from the literal when computing the value for the token.
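As a rough illustration of "ignore that many characters", the value can be computed by starting the digit loop at an offset, which avoids both `split()` and creating a sliced copy. `PREFIX_BASES`, `DIGITS`, and `integer_value` are names invented for this sketch.

```python
PREFIX_BASES = {"0x": 16, "0o": 8, "0b": 2}
DIGITS = "0123456789abcdef"


def integer_value(literal: str) -> int:
    """Compute the value of an integer literal without splitting it."""
    prefix = literal[:2].lower()
    base = PREFIX_BASES.get(prefix, 10)
    start = 2 if prefix in PREFIX_BASES else 0  # skip the known-length prefix
    value = 0
    for index in range(start, len(literal)):
        digit = DIGITS.index(literal[index].lower())  # ValueError if not a digit
        if digit >= base:
            raise ValueError(f"digit {literal[index]!r} is not valid in base {base}")
        value = value * base + digit
    return value


print(integer_value("0x001"), integer_value("0b101"), integer_value("42"))
```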

raven ridge Jan 26, 2023, 4:43 AM

#

deep nova ```bnf integer ::= dec_integer ( 'E' | 'e' ) ( dec_integer + ) | dec_i...

That first line shouldn't have a + on it, FWIW - the repetition is already handled inside of dec_integer. In case that came from a real grammar.

deep nova Jan 26, 2023, 4:44 AM

#

I'm pretty sure they'd be equivalent in the long run. Slicing might be a shade faster, but it creates a copy of the string and so is worse on memory. It's a moot point though, because I'm trying to do this with as little reliance on built in string manipulation as possible

#

C style, baby

deep nova Jan 26, 2023, 4:44 AM

#

raven ridge That first line shouldn't have a `+` on it, FWIW - the repetition is already han...

A work in progress

#

Anyway, the long and the short of the answer is that in building a new language, I can allow leading zeroes if I so choose

#

Would there be any reason to disallow it?

#

Other than the fact that it's pointless?

raven ridge Jan 26, 2023, 4:45 AM

#

deep nova I'm pretty sure they'd be equivalent in the long run. Slicing *might* be a shade...

if by "splitting" you meant str.split("x", 1), then that creates two partial strings.

#

compared to slicing, which creates one.

raven ridge Jan 26, 2023, 4:46 AM

#

deep nova Would there be any reason to disallow it?

confusion, I suppose. For some languages it means octal, for some it doesn't, so allowing it means an ambiguity for human readers.

deep nova Jan 26, 2023, 4:49 AM

#

My dudes

#

I love grammars

#

I haven't had this much fun in ages

raven ridge Jan 26, 2023, 5:03 AM

#

if you haven't seen https://devguide.python.org/internals/compiler/#compiler it's probably up your alley, @deep nova

Python Developer's Guide

Compiler Design

Abstract: In CPython, the compilation from source code to bytecode involves several steps: Tokenize the source code ( Parser/tokenizer.c), Parse the stream of tokens into an Abstract Syntax Tree ( ...

deep nova Jan 26, 2023, 5:03 AM

#

bookmarked

#

So

#

There are many ways to handle numbers

#

(There are many benefits to being a marine biologist)

#

Python doesn't do this, and it does seem a bit excessive, but in planning all of this out I do have to ask myself a few questions

#

For example — binary, octal, and hex integers are just integers. There's no reason they can't be raised using e notation

#

And there's no reason they can't be the power, either

#

And, there's no reason they can't be imaginary

#

Any thoughts?

raven ridge Jan 26, 2023, 5:13 AM

#

octal literals are rarely used, but when they are it's as a collection of 3-bit flags (like Unix mode masks).
hex literals are either used for a collection of 8-bit values (rgb or rgba, for instance), or for masks used for bitwise arithmetic, or to make special power-of-two values easier to recognize in code (0xFFFF vs 65535, etc).
binary literals are virtually never used, but if someone did use them I'd imagine it must only be for a mask used for bitwise arithmetic.

And for any of those things, raising another number to that power or raising them to some power or making them imaginary just isn't useful. If someone has chosen one of those 3 representations, they did it for a reason - and that reason likely doesn't recommend those operations.
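The typical uses described above look roughly like this. The specific mode and colour values are arbitrary examples.

```python
import stat

# Octal: each digit is one 3-bit group of permission flags.
mode = 0o754
owner, group, other = (mode >> 6) & 0o7, (mode >> 3) & 0o7, mode & 0o7
print(owner, group, other)
print(stat.filemode(stat.S_IFREG | mode))

# Hex: each pair of digits is one 8-bit channel, picked out with a mask.
color = 0xFFAA33
red, green, blue = (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF
print(red, green, blue)
```

In both cases the literal is chosen because its digits line up with bit groups, which is why exponent or imaginary suffixes would have little to offer.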

rose schooner Jan 26, 2023, 5:14 AM

#

raven ridge a leading `0` in Python 2 indicated that the integer literal was base 8, like it...

why didn't they just warn?

#

then after a few minor versions the warning would be removed

deep nova Jan 26, 2023, 5:15 AM

#

What about binary, octal, and hex floats? Same story?

raven ridge Jan 26, 2023, 5:16 AM

#

hex floats are a thing, actually - there's a representation for floats where you can specify the sign, mantissa, and exponent independently

raven ridge Jan 26, 2023, 5:17 AM

#

rose schooner why didn't they just warn?

What would make a warning better than an error? Warnings are easier to ignore. I guess one advantage is that you can collect multiple warnings in one run of the process, but 🤷‍♂️

raven ridge Jan 26, 2023, 5:18 AM

#

deep nova What about binary, octal, and hex floats? Same story?

https://docs.python.org/3/library/stdtypes.html#float.hex

deep nova Jan 26, 2023, 5:18 AM

#

I think I'll allow non-base-10 floats for now. It seems like it might be useful

#

But I'll probably prohibit other-base numbers from being bases or powers in 'e' notation, or from being imaginary

raven ridge Jan 26, 2023, 5:19 AM

#

check that link, it shows Python's syntax for hex floats. C99 allows it as well.
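For reference, the hex float notation in Python goes through `float.fromhex` and `float.hex` rather than a literal in source code. A short example:

```python
# "0x1.8p1" means 1.5 (hex 1.8) times 2**1.
value = float.fromhex("0x1.8p1")
print(value)

# float.hex() gives an exact, round-trippable representation.
text = (0.1).hex()
print(text)
print(float.fromhex(text) == 0.1)
```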

deep nova Jan 26, 2023, 5:19 AM

#

I'm going to have an imaginary type as well as a complex type

rose schooner Jan 26, 2023, 5:20 AM

#

deep nova I'm going to have an `imaginary` type as well as a `complex` type

what's the difference?

deep nova Jan 26, 2023, 5:20 AM

#

The former being any number postfixed by an i, and the latter being the sum of an imaginary and a real

rose schooner Jan 26, 2023, 5:21 AM

#

deep nova The former being any number postfixed by an `i`, and the latter being the sum of...

what would 1+1i - 1 be?

deep nova Jan 26, 2023, 5:21 AM

#

Complex has both a real and an imaginary part. The key difference being that you can write a complex number without needing to include the real part. Just shorthand, really

#

0+1i in traditional form

#

Sorry, let me rephrase — python's syntax requires specifying both the real and imaginary parts of a complex literal. I'm going to allow for omitting the real part, which will default to zero

feral island Jan 26, 2023, 5:23 AM

#

deep nova Sorry, let me rephrase — python's syntax requires specifying both the real and i...

it doesn't, 1j is a valid complex literal

deep nova Jan 26, 2023, 5:23 AM

#

Oh!

#

Well, there we go then

feral island Jan 26, 2023, 5:23 AM

#

1+1j is syntactically just a binop of two nums
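The `ast` module shows this directly: `1j` on its own is a single constant, while `1+1j` is an addition node with two constant operands.

```python
import ast

lone = ast.parse("1j", mode="eval").body
both = ast.parse("1+1j", mode="eval").body

print(ast.dump(lone))
print(ast.dump(both))
```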

deep nova Jan 26, 2023, 5:24 AM

#

Now, another question. What about allowing the imaginary postfix on other-base integers?

#

Again, largely pointless, but it's a question I have to ask myself

feral island Jan 26, 2023, 5:25 AM

#

I think it's the same as what @raven ridge said above: the contexts where you'd use non-base 10 numbers aren't contexts where complex numbers are likely to come up

#

then again I'm not sure I've ever used complex numbers in Python other than when writing tests for things that need to support the whole language

raven ridge Jan 26, 2023, 5:27 AM

#

(people keep using them in AoC for representing essentially two-tuples of integers that you can apply mathematical operations to, like doubling)
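A sketch of that trick, with a made-up `walk` helper and move alphabet: the real part is x, the imaginary part is y, and multiplying a heading by `1j` rotates it a quarter turn counterclockwise.

```python
def walk(moves, position=0j, heading=1 + 0j):
    """Follow F (forward), L (turn left), R (turn right) on a 2D grid."""
    for move in moves:
        if move == "F":
            position += heading
        elif move == "L":
            heading *= 1j    # rotate 90 degrees counterclockwise
        elif move == "R":
            heading *= -1j   # rotate 90 degrees clockwise
    return position


end = walk("FFLF")
print(end, end * 2)  # the second value doubles both coordinates at once
```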

rose schooner Jan 26, 2023, 5:27 AM

#

feral island I think it's the same as what @raven ridge said above: the contexts whe...

ok so it seems like my programming language does that fine

#

well the concept says so at least

raven ridge Jan 26, 2023, 5:29 AM

#

When it comes to language design, rather than allowing anything that pops into your head as a feature that someone might one day want to use, it's much more reasonable to look for things that people commonly want to do, and make sure that there's a succinct way to represent those things.

deep nova Jan 26, 2023, 5:29 AM

#

raven ridge (people keep using them in AoC for representing essentially two-tuples of intege...

Nice

raven ridge Jan 26, 2023, 5:30 AM

#

Take a large code base, and see if you can find any place where someone raises a hex literal to a power of 10, for instance.

#

if you find people doing that, maybe the e syntax would be a helpful shortcut for those people.

#

if no one is doing it, then you'd be building a feature that no one needs, which costs you work and maintenance effort and doesn't buy users of your language anything.

rose schooner Jan 26, 2023, 5:31 AM

#

raven ridge if no one is doing it, then you'd be building a feature that no one needs, which...

basically the u strings i put in my programming language

deep nova Jan 26, 2023, 5:35 AM

#

So

#

Tomorrow

#

I'm going to need to come back here and ask about lexing/parsing F strings

#

Another thing I'll need to ask about is how PEG parsers can automatically match parentheses, but others can't

#

I don't think I can brain any more, though

rose schooner Jan 26, 2023, 5:39 AM

#

deep nova I'm going to need to come back here and ask about lexing/parsing F strings

currently python parses the string and then converts it into a sequence of constant strings and formatting values
there's a PEP to use the new PEG parser to parse the beginning (e.g. f"), its contents, then the end (basically the ending quote)

final elk Jan 26, 2023, 6:11 AM

#

Hi

umbral plume Jan 26, 2023, 8:35 AM

#

https://docs.python.org/3/reference/lexical_analysis.html#imaginary-literals
So, according to the language specification, there are imaginary number literals, but there isn't any mention of complex number literals. At the same time though, according to dis, no addition actually has to occur when executing, because such literals get optimised during bytecode compilation.
Does this mean that, in a way, complex number literals are actually an implementation-specific thing?

grave jolt Jan 26, 2023, 8:36 AM

#

umbral plume <https://docs.python.org/3/reference/lexical_analysis.html#imaginary-literals> S...

Well, 1+5 will also be optimised

umbral plume Jan 26, 2023, 8:40 AM

#

I suppose, though complex numbers feel different to me for some reason

#

like i suppose it doesn't matter since it gets optimised when dealing with arithmetic between constants anyway, but it sorta sounds strange to say that python doesn't actually have complex literals, despite them existing for all intents and purposes

#

there's also the slightly odd behaviour of 5+3j.imag not doing what you'd expect it to if someone were under the impression complex literals were a thing (though 5+3j.real coincidentally ends up working :P)

rose schooner Jan 26, 2023, 9:18 AM

#

umbral plume there's also the slightly odd behaviour of `5+3j.imag` not doing what you'd expe...

7j+3j.real does not "work" the way it should

#

it's sort of weird

umbral plume Jan 26, 2023, 9:20 AM

#

but that to me actually reads as 7j + 3j.real, since seeing two js indicates two separate imaginary numbers, and . has higher precedence than +

#

yet the + in 3+4j doesn't feel like an actual +, it feels more like just part of the syntax, a bit like the - in 1e-10 doesn't actually invoke unary negation
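Spelled out, the precedence looks like this: attribute access binds to the imaginary literal alone, and only then does the addition happen.

```python
a = 5+3j.imag     # parsed as 5 + (3j).imag
b = (5+3j).imag   # attribute access on the whole sum
c = 5+3j.real     # parsed as 5 + (3j).real, which happens to equal (5+3j).real
d = 7j+3j.real    # parsed as 7j + (3j).real

print(a, b, c, d)
```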

flat gazelle Jan 26, 2023, 9:57 AM

#

yeah, there aren't any complex literals in the language spec, they are just optimised in cpython

warm breach Jan 26, 2023, 11:25 AM

#

flat gazelle yeah, there aren't any complex literals in the language spec, they are just opti...

!e it's not always inlined anyways, there's a size limit as well

from dis import dis

dis("2**65+5j")

fallen slateBOT Jan 26, 2023, 11:25 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 |   0           0 RESUME                   0
002 | 
003 |   1           2 LOAD_CONST               0 (2)
004 |               4 LOAD_CONST               1 (65)
005 |               6 BINARY_OP                8 (**)
006 |              10 LOAD_CONST               2 (5j)
007 |              12 BINARY_OP                0 (+)
008 |              16 RETURN_VALUE
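Another way to look at the same thing is the constants table of the compiled code object. Whether a folded complex value shows up there depends on the interpreter and version, so treat the output as an observation about one CPython build rather than language behaviour. `constants` is a helper name invented here.

```python
def constants(source):
    """Return the constants stored in the code object compiled from `source`."""
    return compile(source, "<example>", "eval").co_consts


print(constants("1+5j"))
print(constants("2**65+5j"))
```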

flat gazelle Jan 26, 2023, 11:30 AM

#

indeed

warm breach Jan 26, 2023, 2:36 PM

#

!e

import time
from ctypes import cast
from einspect import impl, ptr
from einspect.api import Py
from einspect.structs import *

@impl(list)
@classmethod
def with_capacity(cls, n: int) -> list:
    return PyListObject(
        ob_refcnt=1,
        ob_type=PyTypeObject(list).as_ref(),
        ob_size=0,
        ob_item=cast(Py.Mem.Malloc(n * 8), ptr[ptr[PyObject]]),
        allocated=n,
    ).into_object()

ls = []

s = time.perf_counter()
ls.extend(range(9_000))
print((time.perf_counter() - s) * 1000, "ms")

ls = list.with_capacity(9_000)

s = time.perf_counter()
ls.extend(range(9_000))
print((time.perf_counter() - s) * 1000, "ms")

fallen slateBOT Jan 26, 2023, 2:36 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 0.22995099425315857 ms
002 | 0.14029024168848991 ms

warm breach Jan 26, 2023, 2:37 PM

#

could we have a list.with_capacity 👀

spark magnet Jan 26, 2023, 3:41 PM

#

Anyone have an opinion on how code objects are hashed and compared? https://github.com/python/cpython/issues/101346

GitHub

The hash and equals methods of code objects are large, complex and ...

We should just compare code objects by value. Technically, they are immutable values but realistically, there is only one code object per source unit. Attempting to maintain value semantics when the...

warm breach Jan 26, 2023, 4:19 PM

#

spark magnet Anyone have an opinion on how code objects are hashed and compared? https://gith...

https://grep.app/search?q=hash((.*)%2B(__code__))&regexp=true&filter[lang][0]=Python seems like a decent number of rewrite / compilation libraries rely on the hash of __code__

grep.app | code search

Search across a half million git repos. Search by regular expression.

#

if we'd just limit the compared fields of __code__ I think __eq__ is still helpful

#

otherwise it'd be difficult to do that comparison

pliant tusk Jan 26, 2023, 4:44 PM

#

spark magnet Anyone have an opinion on how code objects are hashed and compared? https://gith...

if the __eq__ and __hash__ methods are removed, it would break any tool that uses code objects as keys in a dictionary

flat gazelle Jan 26, 2023, 4:56 PM

#

I vaguely remember running into importlib hashing a code object, but I didn't spend too much time thinking about why that happens.

feral island Jan 26, 2023, 5:20 PM

#

pliant tusk if the `__eq__` and `__hash__` methods are removed, it would break any tool that...

no, they'd just be hashed by identity
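To illustrate what "hashed by identity" means in general, here is an ordinary class that defines neither `__eq__` nor `__hash__`. `Handle` is a stand-in invented for this example and says nothing about how code objects themselves are implemented.

```python
class Handle:
    """No __eq__ or __hash__: instances compare and hash by identity."""

    def __init__(self, name):
        self.name = name


first = Handle("f")
second = Handle("f")     # same contents, different object

cache = {first: "cached result"}
print(first in cache)    # the same object is found
print(second in cache)   # an equal-looking object is not
```

Dictionaries keyed this way keep working; what changes is that two separately created objects with identical contents no longer share an entry.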

feral island Jan 26, 2023, 5:21 PM

#

warm breach https://grep.app/search?q=hash%28%28.%2A%29%2B%28__code__%29%29&regexp=true&filt...

would those libraries break though if code objects were hashed by identity instead?

#

that's probably hard to answer unfortunately, but I suspect in most cases the answer is "no"

warm breach Jan 26, 2023, 5:22 PM

#

hm... how does the current hash work?

#

does it just go through each field

#

also are there ways the CodeTypes can mutate?

deep nova Jan 26, 2023, 5:32 PM

#

Quick question about Python's escape characters

#

\N{name}

#

Matches a named character. Are the braces actually there, or are they part of the notation

#

\Ncolon or \N{colon}

feral island Jan 26, 2023, 5:35 PM

#

warm breach does it just go through each field

yes I think so, except for a few excluded fields. see https://github.com/python/cpython/issues/94155

GitHub

Specializing adaptive interpreter code object hashes are less uniqu...

In Python 3.11 and 3.12, the hash function for a PyCodeObject (code_hash()) no longer hashes the bytecode. I assume this is because the specializing adaptive interpreter can change the bytecode how...

feral island Jan 26, 2023, 5:35 PM

#

warm breach also are there ways the CodeTypes can mutate?

in 3.11 yes, because of the specializing adaptive interpreter

swift imp Jan 26, 2023, 5:55 PM

#

Stupid question...nvm

deep nova Jan 26, 2023, 5:55 PM

#

Question about tokenizing strings

raven ridge Jan 26, 2023, 7:20 PM

#

spark magnet Anyone have an opinion on how code objects are hashed and compared? https://gith...

I'm shocked to learn that they were ever treated as a value-semantic type. Hashing by identity seems to obviously be the correct thing to have done in the first place. Whether that's a safe change to make now, though, I don't have an informed opinion on.

raven ridge Jan 26, 2023, 7:21 PM

#

deep nova `\Ncolon` or `\N{colon}`

!e Easy enough to check for yourself:
```py
print("\Ncolon")
```

fallen slateBOT Jan 26, 2023, 7:21 PM

#

@raven ridge :x: Your 3.11 eval job has completed with return code 1.

001 |   File "<string>", line 1
002 |     print("\Ncolon")
003 |                    ^
004 | SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: malformed \N character escape

raven ridge Jan 26, 2023, 7:21 PM

#

they're a required part of the syntax.
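The name between the braces is a Unicode character name, the same one `unicodedata` uses:

```python
import unicodedata

print("\N{COLON}")
print(unicodedata.lookup("COLON"))
print(unicodedata.name(":"))
```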

dusk comet Jan 26, 2023, 7:38 PM

#

```py
>>> '\n{colon}'
'\n{colon}'
>>> '\N{colon}'
':'
>>> f'\N{colon}'
':'
>>> f'\N{{colon}}'
  File "<stdin>", line 1
    f'\N{{colon}}'
                  ^
SyntaxError: f-string: single '}' is not allowed
```
interesting

raven ridge Jan 26, 2023, 8:00 PM

#

looks like that's treating {colon as the name of the character to look up, then.

raven ridge Jan 26, 2023, 8:02 PM

#

umbral plume there's also the slightly odd behaviour of `5+3j.imag` not doing what you'd expe...

!e Even weirder:
```py
print(1+1j * 2)
```
If 1+1j were really a "complex literal", then that would give 2+2j instead.

fallen slateBOT Jan 26, 2023, 8:02 PM

#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

(1+2j)

raven ridge Jan 26, 2023, 8:02 PM

#

I guess that's why the repr includes parentheses, come to think of it.
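A small check of that guess: with the parentheses, the repr can be pasted into a larger expression and still mean the same value.

```python
z = 1 + 2j

print(repr(z))    # parenthesized, so it survives being embedded in an expression
print(repr(2j))   # a lone imaginary number needs no parentheses

with_parens = eval(repr(z) + " * 2")   # (1+2j) * 2
without_parens = eval("1+2j * 2")      # 1 + (2j * 2)
print(with_parens, without_parens)
```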

bright aspen Jan 26, 2023, 8:37 PM

#

What function is called when in is called, i.e if my_number in my_list? Is it possible to have a custom class for my_list to overwrite the in check?

quick snow Jan 26, 2023, 8:38 PM

#

bright aspen What function is called when `in` is called, i.e `if my_number in my_list`? Is i...

__contains__

umbral plume Jan 26, 2023, 9:57 PM

#

https://docs.python.org/3/reference/expressions.html#membership-test-operations

#

There's actually 3 different checks that are attempted one after another when you try to use in:

1. It checks if the object has a __contains__ method, and if so, attempts to use it
2. Otherwise, it checks if the object has an __iter__ method, and tries to use that to iterate through and find the item
3. If the above two checks fail, it then checks if the object has a __getitem__ method, and if so, tries indexing it by 0, 1, 2, 3, etc. until either the item is found or an exception occurs

If all 3 checks fail, then a TypeError is raised
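The fallback chain above can be seen with one small class per step. The class names are made up for the example.

```python
class WithContains:
    def __contains__(self, item):
        return item == "x"


class WithIter:
    def __iter__(self):
        return iter([1, 2, 3])


class WithGetitem:
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # ends the search
        return index * 10


class Plain:
    pass


print("x" in WithContains())   # uses __contains__
print(2 in WithIter())         # falls back to iteration
print(20 in WithGetitem())     # falls back to indexing 0, 1, 2, ...
print(5 in WithGetitem())      # IndexError ends the search, result is False

try:
    1 in Plain()
except TypeError as error:
    print("TypeError:", error)
```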

deep nova Jan 27, 2023, 12:01 AM

#

I'm at a bit of a crossroads, and I'd love some input. I'm writing my first proper lexical (tokenizer) grammar for my language, and I'm considering my options with respect to strings.

Strings are a rabbit hole. Even regular strings contain escape characters (of numerous varieties) which will need to be interpreted at some point. Raw strings require special handling. Fstrings have replacement fields which will demand, at some point, that the string be properly parsed. As such, the problem at its simplest is this: when and how to parse strings?

The way I see it, I have four options:

1. Lex strings as atomic tokens. This makes the lexer a bit more complicated, but not unmanageably so. Later, when the parser encounters the string token, have it pass the string to a secondary parser designed specifically for them. This is probably the simplest option.
2. Have the lexer, upon encountering an open quote, create a new instance of itself in a different mode (string-mode). Using a secondary dfa, break the string up into runs of normal characters and escape characters. For f-strings, when the lexer encounters a replacement field, have it create yet another lexer instance in regular-mode to tokenize the expression. Break out upon the appropriate cues.

This option is recursive, which is good. It's complicated and hacky, which is bad.

#

3. Don't lex strings at all. Instead, just have a " operator which is treated the same as an opening parenthesis. Everything after this operator is parsed as normal: characters become identifiers, symbols become operators, numbers become integers, and so on. Escape characters will be recognized as their own tokens. Whitespace will be explicitly tokenized. Eventually another " will be encountered (or not) but the lexer doesn't blink or care.

Eventually, during parsing, the parser will handle assembly of the string. This is my favorite solution, as it requires only a single pass and no secondary parser or lexer, or mode of operation. But it's still a bit ugly. This approach also places string recognition and validation upon the parser instead of the lexer, which is good because the parser is simply more powerful and can do things the lexer can't.

This option is also nice because replacement fields will be built during the outer string construction. There is also potential for arbitrarily nested strings in replacement fields, which I think is kinda cool.

4. Don't lex at all. Just use a scannerless parser which treats characters as the tokens, and builds primitives on its own. I shy away from this option because, well, I put a lot of work into my DFA algorithm and I don't want it to go to waste. It might be the most graceful solution, though.

At the end of the day, all I know is that I don't want any secondary parsers, secondary operational modes, or hacky lexer recursion. People keep telling me I'm overthinking it, but I want to do this job right without taking the lazy way out.

feral island Jan 27, 2023, 12:05 AM

#

I don't know the right answer, but I can say that (1) is how CPython currently works. It works fine for normal strings, but creates limitations for f-strings, so we're likely going to switch to (2) for f-strings.

#

(3) is an interesting idea that I haven't seen before. I feel like it would make your parser quite complex

deep nova Jan 27, 2023, 12:07 AM

#

Well, first things first

#

Sanity check. Are any of these options completely insane?

feral island Jan 27, 2023, 12:07 AM

#

I feel like you'll likely run into issues with (3) that are hard to solve. The language fundamentally works quite differently within and outside strings, so getting things like escaping to work properly will be hard

#

I'm assuming the language you're parsing is Python or close to it

deep nova Jan 27, 2023, 12:08 AM

#

You might call it a python derivative

#

So, pretty close

feral island Jan 27, 2023, 12:08 AM

#

so Python plus more syntax? You'll want to support \N{} and \U and all that

deep nova Jan 27, 2023, 12:09 AM

#

Python, now with more syntax!

#

Yeah, lots of different escape options. I'm thinking of throwing out the bytes type (I've never once used it) but I'm not ready to think about that yet

#

I suppose, then, that option 4 is the most direct and graceful. When the parser encounters an open quote it will explicitly enter into a "new environment" in that only those methods appropriate for handling characters inside strings are called. Via the call stack, open and close quotes are handled trivially. I'm almost certainly going PEG, which will make lookahead an option and that too will be useful for detecting things like unescaped backslashes

raven ridge Jan 27, 2023, 1:20 AM

#

deep nova Sanity check. Are any of these options completely insane?

option 4 strikes me as the hardest of those to implement by far. And the option most likely to have a negative effect on code quality, by making the parser do two jobs instead of one. As a thought exercise, imagine the language didn't have raw string literals: what changes would be required to add them?

deep nova Jan 27, 2023, 1:23 AM

#

raven ridge option 4 strikes me as the hardest of those to implement by far. And the option ...

Well, this is why I'm writing a parser generator. In theory, modifying the language's syntax would mean modifying the grammar, and working in significantly more abstract terms

#

In theory

#

But I understand where you're coming from. Lexers are good at lexing. Parsers are okay at lexing, but not great

raven ridge Jan 27, 2023, 1:24 AM

#

something else to think about is how these decisions will affect your ability to recognize and report syntax errors. Option 3 strikes me as making good error reporting much more difficult.

deep nova Jan 27, 2023, 1:24 AM

#

Though it's perfectly doable, the amount of code required to make a parser work on that fine of a grain is probably massive

deep nova Jan 27, 2023, 1:25 AM

#

raven ridge something else to think about is how these decisions will affect your ability to...

Indeed. The ambiguity is palpable

#

Surely there must be a solution. I'd like to walk through the logic step by step

raven ridge Jan 27, 2023, 1:27 AM

#

consider that "10e3" and 10e3 are very different things - lexing one as OPEN_QUOTE NUMBER CHARACTER NUMBER CLOSE_QUOTE might make sense, but lexing the other one as NUMBER CHARACTER NUMBER absolutely doesn't, and would make your life much harder.

deep nova Jan 27, 2023, 1:27 AM

#

Strings are, in modern programming languages, quite complex. They are composite objects which contain different types of characters, may contain expressions and, potentially, other strings. Thus they are recursive as well.

It follows from this that strings are simply irregular, and cannot be lexed to satisfaction (not without gross hacks anyway)

#

Does this check out?

raven ridge Jan 27, 2023, 1:28 AM

#

"to satisfaction" is ambiguous - to whose satisfaction?

deep nova Jan 27, 2023, 1:29 AM

#

To the satisfaction of the definition of a lexer. You can't lex an unlexable object. Trying to do so will require state and procedure that goes beyond what "a lexer" by the strict definition is capable of

raven ridge Jan 27, 2023, 1:30 AM

#

that depends on what the lexemes are.

#

option 1 - lexing it as just STRING_LITERAL and then figuring out the rest later - is still lexing it

#

that option at least requires figuring out where it starts and stops, and what characters are inside the literal

deep nova Jan 27, 2023, 1:32 AM

#

If you're willing to accept strings being un-nestable (well, strings using the same quotes) then I suppose you're right. It does severely limit what you're able to put inside the string though

raven ridge Jan 27, 2023, 1:33 AM

#

deep nova Strings are, in modern programming languages, quite complex. They are composite ...

you don't need to accept strings being unnestable. You can nest this way as well by just making the lexer hold some counters, right?

deep nova Jan 27, 2023, 1:33 AM

#

I'm not willing to ad-hoc a solution

raven ridge Jan 27, 2023, 1:34 AM

#

it's more annoying, but it's certainly not impossible for the lexer to figure out where the string ends.

deep nova Jan 27, 2023, 1:34 AM

#

I want an academically substantiated, theory based approach

raven ridge Jan 27, 2023, 1:34 AM

#

in the same way as it's not impossible for a parser to do all the lexing.

grave jolt Jan 27, 2023, 1:34 AM

#

what does python do right now? pithink

raven ridge Jan 27, 2023, 1:35 AM

#

lex the entire string literal as one token
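The standard library's `tokenize` module shows this for ordinary strings: the whole literal, escapes uninterpreted, comes back as one `STRING` token. `token_names` is a helper invented for this sketch, and newer Python versions may tokenize f-strings differently from regular strings, so the example sticks to a plain literal.

```python
import io
import tokenize


def token_names(source):
    """Return (token type name, token text) pairs for `source`."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[token.type], token.string)
        for token in tokenize.generate_tokens(readline)
    ]


source = 'x = "a\\n b" + 1\n'
tokens = token_names(source)
for name, text in tokens:
    print(name, repr(text))
```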

deep nova Jan 27, 2023, 1:35 AM

#

grave jolt Jan 27, 2023, 1:35 AM

#

julia moment

feral cedar Jan 27, 2023, 1:35 AM

#

i don't get why julia has to be so "purist"

grave jolt Jan 27, 2023, 1:36 AM

#

eh, it's not a big deal

raven ridge Jan 27, 2023, 1:36 AM

#

deep nova Strings are, in modern programming languages, quite complex. They are composite ...

I'd state this differently. String literals are quite complex, and the contents of string literals follows an entirely different grammar than is used anywhere else in the language. While it's certainly possible to define one grammar and one set of tokens that encompasses both stuff-inside-strings and stuff-outside-strings, it's not clear to me that it makes things more readable, or maintainable, or more performant, etc.

grave jolt Jan 27, 2023, 1:36 AM

#

I just reference it for the memes, no offence to julia

deep nova Jan 27, 2023, 1:37 AM

#

raven ridge I'd state this differently. String literals are quite complex, and the contents ...

If the grammar inside the string and grammar outside are fundamentally different, and, I want to maintain low-level control over construction in both environments

#

By which I mean, I want the escape characters in strings to be their own tokens, runs of unescaped characters to be their own tokens, and so on

#

Then it stands to reason I'll need two machines (or else one machine operable in two modes)

raven ridge Jan 27, 2023, 1:38 AM

#

yeah.

deep nova Jan 27, 2023, 1:40 AM

#

Fair enough. Questions abound though. What about error propagation? Toggling from inside a string to outside (potentially recursively). And, as soon as you start switching between environments you need to start carrying context from one environment to the other and back again (for example, whether the string is raw or not, or whether it uses single or double quotes)

raven ridge Jan 27, 2023, 1:40 AM

#

yes, you do. That could be one class with a stack of contexts, though.

warm breach Jan 27, 2023, 1:41 AM

#

deep nova

what if we had string division as well 👀

deep nova Jan 27, 2023, 1:41 AM

#

So, a recursive lexer

raven ridge Jan 27, 2023, 1:41 AM

#

warm breach what if we had string division as well 👀

I've heard someone seriously suggest that as a way to invoke str.split() or maybe str.partition()

deep nova Jan 27, 2023, 1:42 AM

#

raven ridge I've heard someone seriously suggest that as a way to invoke `str.split()` or ma...

Badass!!!!!

#

I'm totally stealing that

warm breach Jan 27, 2023, 1:43 AM

#

!e

from einspect import view

view(str)["__truediv__"] = str.partition

print("hello+world" / "+")

fallen slateBOT Jan 27, 2023, 1:43 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

('hello', '+', 'world')

raven ridge Jan 27, 2023, 1:44 AM

#

one major issue with trying to lex both inside and outside of a string using the same lexer is that whitespace is significant in one and not in the other. You want to tokenize 1 +5 as NUMBER("1") OPERATOR("+") NUMBER("5"), but you can't tokenize "1 +5" as BEGIN_QUOTE('"') NUMBER("1") OPERATOR("+") NUMBER("5") END_QUOTE('"') or you've lost syntactically significant whitespace.

deep nova Jan 27, 2023, 1:45 AM

#

You've stated your piece on this approach, I know, but I think it's worth mentioning again

#

In theory, one could have a single grammar which tokenizes everything from both grammars. Whether or not this is possible would depend on the nature of the differences between them. I'm not sure either of us can say for sure whether it would be impossible in a python-like language, though we can agree it would probably be difficult and potentially be quite ugly

grave jolt Jan 27, 2023, 1:46 AM

#

what if

#

a high-level language without string literals

deep nova Jan 27, 2023, 1:46 AM

#

Either case, the theory remains the same: kill them all and let the parser sort them out.

raven ridge Jan 27, 2023, 1:48 AM

#

one could have a single grammar which tokenizes everything from both grammars
You absolutely can. It seems like a lot of extra complexity to me, but it's clearly possible to lex my first example as NUMBER("1") OPERATOR("+") NUMBER("5") and my second as BEGIN_QUOTE('"') LITERAL("1 +5") END_QUOTE('"'). The question is just whether that makes implementing the parser easier or harder. My intuition is that it would be harder, but 🤷‍♂️
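
The two tokenizations described here can be checked against CPython's own tokenizer through the standard `tokenize` module. This is only a quick sketch; `kinds` is a helper name made up for the example, and it drops the bookkeeping tokens (NEWLINE, NL, ENDMARKER) for readability.

```python
import io
import tokenize


def kinds(source):
    """Return (token name, text) pairs, skipping bookkeeping tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[t.type], t.string) for t in tokens if t.type not in skip]


print(kinds('1 +5'))     # three tokens, whitespace discarded
print(kinds('"1 +5"'))   # one STRING token, whitespace preserved inside it
```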

rich cradle Jan 27, 2023, 1:48 AM

#

deep nova I'm at a bit of a crossroads, and I'd love some input. I'm writing my first prop...

my current approach in my parser is to lex f-strings not unlike parsing. it's something like (it's been a while) FStart, FText, FExprStart, FExprEnd, and FEnd. then the parser actually handles putting them together as necessary. they just turn into their own kind of ast node. it's significantly more permissive than what python currently allows, though.

#

if i wanted to support exactly what cpython does, i'd just handle it all in the lexer stage, just like it does.

#

you don't need as much "deep" context

grave jolt Jan 27, 2023, 1:49 AM

#

Why would you want such a nested f-string? it sounds like a nightmare to read for a human

rich cradle Jan 27, 2023, 1:49 AM

#

i think my impl is more like pep 701

deep nova Jan 27, 2023, 1:50 AM

#

rich cradle my current approach in my parser is to lex f-strings not unlike parsing. it's so...

I'm going to need some context. Do you lex f-string literals as big atomic chunks of characters, lex again in a second step, and then parse?

rich cradle Jan 27, 2023, 1:50 AM

#

grave jolt Why would you want such a nested f-string? it sounds like a nightmare to read fo...

i don't. but i do want f"{d["key"]}" to be valid.

deep nova Jan 27, 2023, 1:50 AM

#

rich cradle i don't. but i do want `f"{d["key"]}"` to be valid.

YESSSSSSs

#

My dude!

grave jolt Jan 27, 2023, 1:51 AM

#

I personally find f"{d['key']}" easier to parse as a human

feral island Jan 27, 2023, 1:51 AM

#

!pep 703

grave jolt Jan 27, 2023, 1:51 AM

#

f"{d[" + key + "]}" will definitely take me a while

fallen slateBOT Jan 27, 2023, 1:51 AM

#

**PEP 703 - Making the Global Interpreter Lock Optional in CPython**

Status: Draft
Python-Version: 3.12
Created: 09-Jan-2023
Type: Standards Track

feral island Jan 27, 2023, 1:51 AM

#

sorry wrong one

#

!pep 701

fallen slateBOT Jan 27, 2023, 1:52 AM

#

**PEP 701 - Syntactic formalization of f-strings**

Status: Draft
Python-Version: 3.12
Created: 15-Nov-2022
Type: Standards Track

feral island Jan 27, 2023, 1:52 AM

#

^ this is likely to make all @grave jolt 's fears come true

grave jolt Jan 27, 2023, 1:52 AM

#

o no

rich cradle Jan 27, 2023, 1:53 AM

#

deep nova I'm going to need some context. Do you lex fstrings literals as big atomic chunk...

i keep track of how nested in delimiters i am in the lexer. fstrings and their inner expressions are another part of that. there's only one lexing pass. the example i gave above would end up lexed as something like FStart, FExprStart, (just lex like normal tokens now) Ident, OpenBracket, Str, CloseBracket, (we came to a brace and aren't nested in any other delimiter pairs, we're done)FExprEnd, FEnd.
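
A minimal sketch of that single-pass idea, assuming a toy language with only double-quoted strings, identifiers and brackets. The token names follow the message above; the `lex` function and its frame layout are invented for illustration and are not the actual implementation being described. Escapes, `{{`, and other quote styles are deliberately left out.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def lex(src):
    """Single pass over src using a stack of [kind, open-delimiter depth] frames."""
    tokens = []
    stack = [["code", 0]]
    i = 0
    while i < len(src):
        kind, depth = stack[-1]
        ch = src[i]
        if kind == "fstring":
            if ch == '"':                        # closing quote of this f-string
                tokens.append(("FEnd", ch))
                stack.pop()
                i += 1
            elif ch == "{":                      # enter a replacement field
                tokens.append(("FExprStart", ch))
                stack.append(["expr", 0])
                i += 1
            else:                                # literal text up to the next quote or brace
                j = i
                while j < len(src) and src[j] not in '"{':
                    j += 1
                tokens.append(("FText", src[i:j]))
                i = j
        elif src.startswith('f"', i):            # an f-string may start inside an expression too
            tokens.append(("FStart", 'f"'))
            stack.append(["fstring", 0])
            i += 2
        elif ch == '"':                          # plain string: one atomic token
            j = src.index('"', i + 1)
            tokens.append(("Str", src[i:j + 1]))
            i = j + 1
        elif kind == "expr" and ch == "}" and depth == 0:
            tokens.append(("FExprEnd", ch))      # brace closes the field, not a nested delimiter
            stack.pop()
            i += 1
        elif ch in "([{":
            tokens.append(("OpenBracket", ch))
            stack[-1][1] += 1
            i += 1
        elif ch in ")]}":
            tokens.append(("CloseBracket", ch))
            stack[-1][1] -= 1
            i += 1
        elif ch.isspace():
            i += 1
        else:
            m = IDENT.match(src, i)
            if m is None:
                raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
            tokens.append(("Ident", m.group()))
            i = m.end()
    if len(stack) != 1:
        raise SyntaxError("unterminated f-string or replacement field")
    return tokens


print([name for name, _ in lex('f"{d["key"]}"')])
```

Because the inner `"` is seen while the top frame is an expression, it starts a new string instead of ending the outer one, which is what lets the same quote character be reused.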

raven ridge Jan 27, 2023, 1:53 AM

#

grave jolt Why would you want such a nested f-string? it sounds like a nightmare to read fo...

referential transparency for one. The rule that you can refactor
```py
greeting = "Hello, y'all"
print(greeting)
```
into
```py
print("Hello, y'all")
```
without changing the meaning makes it easier to reason about both the behavior of function calls and the meaning of variables. Currently, you can't do the same thing with f-strings, though - you can't refactor
```py
greeting = "Hello, y'all"
print(f'greeting={greeting}')
```
into
```py
print(f'greeting={"Hello, y'all"}')
```

#

people who maintain code generators also said that the arbitrary restriction on not being able to reuse quotes within different levels of an f-string makes it much harder to generate code, because you need to carry extra context down the stack
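
Whether a given interpreter accepts quote reuse can be probed without running the expression, by compiling the source text. PEP 701 was a draft at the time of this discussion and was later accepted for Python 3.12, so the second result below depends on the version in use. `compiles` is a helper name made up for this sketch.

```python
import sys


def compiles(source):
    """Return True if source is a syntactically valid expression on this interpreter."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


different_quotes = "f\"{d['key']}\""   # the text: f"{d['key']}"
same_quotes = 'f"{d["key"]}"'          # the text: f"{d["key"]}"

print(sys.version_info[:2])
print("different quotes:", compiles(different_quotes))
print("same quotes:     ", compiles(same_quotes))
```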

grave jolt Jan 27, 2023, 1:55 AM

#

I don't have a formal rebuttal but that sounds a bit contrived pithink

#

that sounds like a good argument ^

deep nova Jan 27, 2023, 1:56 AM

#

https://tenor.com/view/spacejunkie-elon-spacejunkie-hmm-gif-21468683

Tenor

raven ridge Jan 27, 2023, 1:56 AM

#

https://discuss.python.org/t/pep-701-syntactic-formalization-of-f-strings/22046 had a bunch of arguments, both for and against, if you want to see some of the discussion.

Discussions on Python.org

PEP 701 – Syntactic formalization of f-strings

Hi 👋 I am very excited to share with you a PEP that @isidentical, @lys.nikolaou and myself have been working on recently: PEP 701 - PEP 701 – Syntactic formalization of f-strings. We believe this will be a great improvement in both the maintainability of CPython and the usability of f-strings. We look forward to hear what you think about this ...

deep nova Jan 27, 2023, 1:56 AM

#

Func fact — I actually look a little bit like him

#

Same forehead

raven ridge Jan 27, 2023, 1:57 AM

#

raven ridge https://discuss.python.org/t/pep-701-syntactic-formalization-of-f-strings/22046 ...

it also had a poll; 2/3rds of respondents prefer to lift the restriction and allow arbitrary nesting.

grave jolt Jan 27, 2023, 1:58 AM

#

Referential transparency is kind of lacking in python tbh, in some places

#

like the closure gotcha
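
Assuming "the closure gotcha" refers to late binding of loop variables, a short illustration: the lambdas look up `i` when they are called, not when they are created, so replacing the name with its value at definition time changes the result.

```python
# Each lambda closes over the variable i, not over its value at creation time.
callbacks = [lambda: i for i in range(3)]
print([f() for f in callbacks])   # [2, 2, 2]

# Binding the current value as a default argument freezes it per lambda.
bound = [lambda i=i: i for i in range(3)]
print([f() for f in bound])       # [0, 1, 2]
```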

lone sun Jan 27, 2023, 1:58 AM

#

From a purist theoretical perspective the answer is (4) because there's really no such thing as lexing; there are just sequences of characters and grammar production rules and parse trees. The whole reason for introducing lexing as a separate step is for convenience. So in that sense, the very fact that you've introduced a lexer is some kind of flaw.

deep nova Jan 27, 2023, 1:58 AM

#

So, I appreciate the sentiment, but I'm not sure I'm 100% on the implementation they are proposing

raven ridge Jan 27, 2023, 1:59 AM

#

lone sun From a purist theoretical perspective the answer is (4) because there's really n...

I suppose that's true

deep nova Jan 27, 2023, 1:59 AM

#

They say they're adding new token types as well as new protocols into the grammar. Cool. They don't mention anything about how the non-replacement-field parts of the string are being lexed

grave jolt Jan 27, 2023, 1:59 AM

#

hmm, maybe lexing and parsing are just parsing, but at different levels of abstraction

#

or rather, with different 'elements'

#

(characters vs 'tokens')

deep nova Jan 27, 2023, 1:59 AM

#

The FSTRING_MIDDLE parts

lone sun Jan 27, 2023, 1:59 AM

#

We like lexers for good practical reasons. But if you think about them as a convenience, then (1), (2), and (3) are all equally acceptable. It's just a matter of what you think is convenient for the language you want to support.

#

I think that if you want to support nested f-strings, then (2) is probably the best route. While if you don't, then you can't beat the ease of coding (1).

raven ridge Jan 27, 2023, 2:01 AM

#

deep nova The FSTRING_MIDDLE parts

PEPs specify the behavior and contracts of the language, not the implementation. The existence or absence of a lexer in any particular implementation is an implementation detail that the PEP rightly doesn't address.

deep nova Jan 27, 2023, 2:01 AM

#

Ahh, I see

raven ridge Jan 27, 2023, 2:03 AM

#

the ability to make good error reports on syntax errors is often the biggest practical difference between different approaches to parsing. It's very easy to say "the character on line 25, character 4 is unacceptable", but different approaches will make it very difficult or impossible for you to explain why that character isn't allowed there.

deep nova Jan 27, 2023, 2:04 AM

#

Well, if that's the case, then option 4 is by far the best approach

#

It's the option with the fewest question marks, and the least demand for ad-hoc problem solving

#

If it's a question of "switching contexts between two lexical grammars" and propagating those contexts as well as errors correctly, well, that's a whole can of worms

lone sun Jan 27, 2023, 2:06 AM

#

I think that rather than thinking of it as "switching contexts" you could think of it as "recursing into a grammar rule with a different lexer". To me that makes it feel cleaner.

deep nova Jan 27, 2023, 2:07 AM

#

As an aside question, shouldn't that sort of reporting be shunted forward to the semantic analyzer?

lone sun Jan 27, 2023, 2:07 AM

#

Also if you keep an explicit stack of lexers then that probably helps error reporting.

deep nova Jan 27, 2023, 2:07 AM

#

In my mind, the only things a lexer and a parser should be reporting on are those things that unambiguously fall within their domain

lone sun Jan 27, 2023, 2:08 AM

#

What is in their domain is not always obvious, though.

deep nova Jan 27, 2023, 2:08 AM

#

lone sun Also if you keep an explicit stack of lexers then that probably helps error repo...

Much to think about 😐

raven ridge Jan 27, 2023, 2:08 AM

#

imagine you've got a file in a Python-like language whose first line is ff"". That's a syntax error. Discovering that error with a 1-character-at-a-time parser requires look-behind. ff is allowed as the first two characters. When you hit the ", you need to look backwards and say that a " preceded by an f is valid (an f-string), but a " preceded by 2 f's is not valid. And then you need to figure out what type of thing the two f's are (a name, I suppose) and then explain to the user that their error was putting a string literal after a name without an operator in between them.

deep nova Jan 27, 2023, 2:10 AM

#

Honestly... I'm so very torn

lone sun Jan 27, 2023, 2:10 AM

#

A bottom-up parser does all of this without even blinking. The problem is constructing the grammar for it. That grammar would have to say, "okay that first f could be part of a format string or a variable name or a from or ... oh now I saw a second f, it must be a variable name, wait the " is no good here." If you can build that grammar then reporting the location of an error is easy. (Reporting on the type of error, though, is really hard.)
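
To see how CPython itself reports the `ff""` case, the source can be handed to `compile()` and the resulting `SyntaxError` inspected. `check` is a helper name made up for this sketch; the exact message text and column vary between Python versions, so none is shown here.

```python
def check(source):
    """Compile source and describe the outcome without executing anything."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError at line {exc.lineno}, column {exc.offset}: {exc.msg}"
    return "ok"


for source in ('f""', 'ff = 1', 'ff""'):
    print(repr(source), "->", check(source))
```

The location comes for free; the message is where implementations differ in how much of the "why" they manage to explain.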

deep nova Jan 27, 2023, 2:10 AM

#

Well, hold up

#

If you want to be able to solve this problem...

#

f"this is a {"test"}"

#

You simply can't do this with a lexer — not without attaching a stack and some custom logic

raven ridge Jan 27, 2023, 2:12 AM

#

who says a lexer can't have a stack?

deep nova Jan 27, 2023, 2:13 AM

#

grumbles uncomfortably

#

I really don't want an ad-hoc solution. I want something grounded in theory.

lone sun Jan 27, 2023, 2:15 AM

#

Theory says you can't do it with a DFA. But you can with a context-free grammar (even an LL grammar). So you could just write a lexer that's not a DFA, declare yourself happy, and move on.

deep nova Jan 27, 2023, 2:15 AM

#

Well — I want to talk about option 3

#

A single grammar for both environments. What really are the differences between the two environments?

#

Outside of a string you have identifiers, numbers, <strings>, keywords, and operators

#

Implicitly, you've also got whitespace, newlines, tabs, and comments

#

Inside a string you've got all of these things as well. You'd need to be able to recognize words which didn't qualify as identifiers or number literals (maybe just default to a catch-all "blob" token). You'd need to recognize escape characters. Once you entered a replacement field you'd be back in normal territory again

raven ridge Jan 27, 2023, 2:20 AM

#

they're extremely different. Basically nothing in common.

#

the tokenization for x [ (1+2) * 3 ] and the tokenization for "x [ (1+2) * 3 ]" should almost certainly not have a single token in common.

rich cradle Jan 27, 2023, 2:22 AM

#

deep nova You simply can't do this with a lexer — not without attaching a stack and some c...

it's not uncommon for lexers to have a delimiter stack. i believe cpython does - or does something similar - to handle "ignoring" indentation within delimiter pairs.

#

i don't know about theory, but there are years of historical precedent for doing this.

deep nova Jan 27, 2023, 2:24 AM

#

raven ridge the tokenization for `x [ (1+2) * 3 ]` and the tokenization for `"x [ (1+2) * 3 ...

But there's no reason the parser can't sort that out after the fact.

"x [ (1+2) * 3 ]" becomes (QUOTE ") (IDENT x) (WS) (L_BRACK) (WS) (L_PAREN) (INT 1) (OP +) (INT 2) (R_PAREN) (WS) (STAR) (WS) (INT 3) (WS) (R_BRACK) (QUOTE ")

#

The literals are all still there. The parser can stitch a string together from these tokens quite easily

rich cradle Jan 27, 2023, 2:25 AM

#

it can, but why should it?

#

you're just going back to lexerless parsing, really

deep nova Jan 27, 2023, 2:26 AM

#

XD Yeah, pretty much

raven ridge Jan 27, 2023, 2:27 AM

#

one of the major things that a lexer is doing is figuring out where one "thing" stops and the next "thing" starts. Making the parser care about whitespace between tokens seems to largely defeat the point of lexing.

deep nova Jan 27, 2023, 2:28 AM

#

Alright, so I'll concede on that.

rich cradle Jan 27, 2023, 2:29 AM

#

you can also think of a lexer as something that turns source code into the smallest coherent units. when you're running a string through the rest of the process, you don't really care what content structure the string has (barring escapes). you just care that it's a string, and that you know what it has. there's no need to dissect it. (massive generalization here, but you get the point)

deep nova Jan 27, 2023, 2:31 AM

#

Here's what I want, and maybe you guys can tell me what I need: I want the individual components of strings to be their own tokens. I do not want to tokenize strings as atomic literals and then run through a second machine. I do want to be able to use the same quotes as bookend the string within the string's replacement fields. I want to move through the input in a single pass (or two passes, if lexing once and then parsing once)

#

Option 1 is out by definition. Option 3 is out for reasons discussed above

rich cradle Jan 27, 2023, 2:32 AM

#

f-strings in particular, yes? i know my approach works in practice (f-strings as what're ultimately delimiter pairs).

deep nova Jan 27, 2023, 2:33 AM

#

Option 2 depends on one thing: can recursive/stacked lexers handle strings which use the same quotes as enclose them within the replacement fields? Would an inner lexer, upon finding said quote, not need to escape immediately, or else perform costly lookahead?

rich cradle Jan 27, 2023, 2:33 AM

#

option 4 also works. ~~it seems like option 2 is also ruled out by virtue of "duplicating" the lexer.~~ it appears i misunderstood there.

deep nova Jan 27, 2023, 2:34 AM

#

I'm not opposed to recursive lexing so long as it doesn't require any complicated glue

raven ridge Jan 27, 2023, 2:48 AM

#

I want the individual components of strings to be their own tokens.
That implies that raw strings need to be lexed differently than regular strings, right off the bat. Because \n in one should be lexed as NEWLINE_ESCAPE or something, but not in the other.
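
A small sketch of that difference, assuming a toy string-body lexer with just two token kinds. `lex_string_body`, `TEXT` and `ESCAPE` are names invented for illustration; real raw strings still treat a backslash before a quote specially, which is ignored here.

```python
import re

ESCAPE = re.compile(r"\\.")   # a backslash followed by any one character


def lex_string_body(body, raw=False):
    """Split the characters between the quotes into TEXT and ESCAPE tokens."""
    if raw:
        # In a raw string the backslash has no escape meaning, so it is all text.
        return [("TEXT", body)] if body else []
    tokens, pos = [], 0
    for m in ESCAPE.finditer(body):
        if m.start() > pos:
            tokens.append(("TEXT", body[pos:m.start()]))
        tokens.append(("ESCAPE", m.group()))
        pos = m.end()
    if pos < len(body):
        tokens.append(("TEXT", body[pos:]))
    return tokens


body = r"line one\nline two"   # the source characters between the quotes
print(lex_string_body(body))
print(lex_string_body(body, raw=True))
```

The raw flag has to be known before the body is lexed, which is one concrete piece of context the lexer must carry from the string prefix into the string.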

lone sun Jan 27, 2023, 2:48 AM

#

deep nova I'm not opposed to recursive lexing so long as it doesn't require any complicate...

I think recursive lexing should be able to do this as long as it behaves as a coroutine. When the parser detects a nested f-string it will have to send into the lexer the fact that it needs to recurse.

deep nova Jan 27, 2023, 3:10 AM

#

Question

#

The issue with scannerless parsing is that parsers aren't very good at lexing, correct?

#

The issue that I'm facing with f strings relates to an inability to recurse and an inability to "switch contexts" as needed, correct?

#

Could I build a parser that lazy-lexed?

#

I.E. A parser/lexer combo which automatically generated a new token via DFA whenever one is requested by the parser but does not already exist?
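
One way to picture such a combo, as a rough sketch only: the lexer produces nothing up front, and the parser asks for one token at a time while naming the lexical environment it expects. `LazyLexer`, `PATTERNS`, the mode names and the token names are all assumptions made for this example, and the "parser" just collects tokens rather than building a real tree.

```python
import re

# One regex per lexical environment; the parser chooses which one applies.
PATTERNS = {
    "code": re.compile(
        r'\s*(?:(?P<FSTART>f")|(?P<STR>"[^"]*")'
        r'|(?P<IDENT>[A-Za-z_]\w*)|(?P<OP>[\[\](){}+*]))'
    ),
    "fstring": re.compile(r'(?P<FEND>")|(?P<LBRACE>\{)|(?P<FTEXT>[^"{]+)'),
}


class LazyLexer:
    def __init__(self, src):
        self.src, self.pos = src, 0

    def next_token(self, mode):
        """Lex exactly one token at the current position, in the requested mode."""
        m = PATTERNS[mode].match(self.src, self.pos)
        if m is None:
            raise SyntaxError(f"cannot lex in mode {mode!r} at offset {self.pos}")
        self.pos = m.end()
        return m.lastgroup, m.group(m.lastgroup)


def parse_fstring(lexer):
    """Called after FSTART was consumed; returns the f-string's parts."""
    parts = []
    while True:
        kind, text = lexer.next_token("fstring")
        if kind == "FEND":
            return parts
        if kind == "FTEXT":
            parts.append(("text", text))
        else:                                   # LBRACE: switch back to code mode
            parts.append(("expr", parse_expr(lexer)))


def parse_expr(lexer):
    """Collect code tokens up to the '}' that closes the replacement field."""
    tokens, depth = [], 0
    while True:
        kind, text = lexer.next_token("code")
        if kind == "FSTART":                    # nested f-string: recurse
            tokens.append(("fstring", parse_fstring(lexer)))
            continue
        if kind == "OP" and text in "([{":
            depth += 1
        elif kind == "OP" and text in ")]}":
            if text == "}" and depth == 0:
                return tokens
            depth -= 1
        tokens.append((kind, text))


lexer = LazyLexer('f"value: {d["key"]}"')
first = lexer.next_token("code")                # ("FSTART", 'f"')
parts = parse_fstring(lexer)
print(parts)
```

The recursion lives in the parser's call stack, so the lexer itself stays a stateless "match one token here" function; the price is that tokens cannot be produced ahead of the parser.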

#

Something to think about :3

steel chasm Jan 27, 2023, 6:15 AM

#

Heya - I'm writing a module and it relies on an outstanding PR for core Python (asyncio). I'm not really sure how to go about including a change to core Python in a pip install. I don't see this PR being accepted any time soon (it was submitted in 2019). I would appreciate your advice on how to proceed. https://github.com/python/cpython/pull/16429 is the PR. Is it best practice to just import the class and overwrite the method with the patch?

GitHub

bpo-37141: support multiple separators in Stream.readuntil by bmerr...

Allow Stream.readuntil to take an iterable of separators and match any
of them. The earliest match endpoint wins (which ensures that results
are dependent on the chunking) and on ties shortest sepa...

rare lantern Jan 27, 2023, 11:31 AM

#

So, I noticed recently that functools.partial is incompatible with the inspect module. I have built a workaround.

# a proper partial implementation that works with ParamSpec etc.
from functools import wraps

def partial(f, *args, **kwargs):
    return wraps(f)(lambda *a, **kw: f(*(args + a), **{**kwargs, **kw}))

#

this works because it returns a standard Python function with the attributes of the wrapped function applied using wraps. Similar functionality could be implemented in the default partial to allow this. Given that wraps and partial live in the same module, this should be a fairly simple fix

rose schooner Jan 27, 2023, 12:02 PM

#

rare lantern So, I noticed recently, the functools.partial is incompatible with the inspect m...

probably because the actual partial() is an object implemented in C, specifically from _functools

#

it's not a function

rare lantern Jan 27, 2023, 12:02 PM

#

rose schooner probably because the *actual* `partial()` is an object implemented in C, specifi...

Yeah its a class

#

So it breaks the inspect

#

cause obviously is no longer representing a function

#

the above version applies the same functionality, while preserving the function
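
To make the comparison concrete, here is a sketch that puts the built-in `functools.partial` next to the wraps-based version and looks at what `inspect` reports for each. `scale` and `wrapped_partial` are hypothetical names chosen for this example. One thing worth noticing is that `inspect.signature` follows the `__wrapped__` attribute set by `wraps`, so the wrapper reports the original function's signature even though one argument is already bound.

```python
import inspect
from functools import partial, wraps


def scale(value, factor, offset=0):
    """Hypothetical example function."""
    return value * factor + offset


def wrapped_partial(f, *args, **kwargs):
    """The wraps-based workaround from the message above, under another name."""
    return wraps(f)(lambda *a, **kw: f(*(args + a), **{**kwargs, **kw}))


builtin = partial(scale, factor=2)
custom = wrapped_partial(scale, factor=2)

print(builtin(10), custom(10))         # both end up calling scale(10, factor=2)
print(hasattr(builtin, "__name__"))    # partial objects carry no __name__ of their own
print(custom.__name__)                 # copied over from scale by wraps
print(inspect.signature(builtin))      # reflects the pre-bound factor
print(inspect.signature(custom))       # scale's own signature, via __wrapped__
```

So each variant loses something: the built-in object lacks function metadata such as `__name__`, while the wrapper's reported signature no longer shows which arguments were already supplied.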

rose schooner Jan 27, 2023, 12:04 PM

#

how does it "break"?

rose schooner Jan 27, 2023, 12:04 PM

#

rare lantern So it breaks the inspect

```py
class A:
    def __init__(self):
        x = 2

from inspect import getsource
print(getsource(A))
```
works

rare lantern Jan 27, 2023, 12:05 PM

#

you can't use things like ParamSpec etc. Anything that inspect offers for inspecting functions doesn't work for partials, because partial is a class

rose schooner Jan 27, 2023, 12:05 PM

#

rare lantern you can't use things like paramspec etc etc, anythng that inspect has to inspect...

that's only one of the factors

rare lantern Jan 27, 2023, 12:05 PM

#

so if you, say, have a pipeline that uses these details, it will break if a partial is ever included

rose schooner Jan 27, 2023, 12:05 PM

#

the other one is that it's implemented in C

rare lantern Jan 27, 2023, 12:05 PM

#

rose schooner the other one is that it's implemented in C

I know

#

but my point is

#

with the wraps decorator

#

the c object is redundant

#

because you can do it functionally, and not break things

#

(3 lines over 100 in the functools module)

rose schooner Jan 27, 2023, 12:06 PM

#

rare lantern you can't use things like paramspec etc etc, anythng that inspect has to inspect...

it does work

#

tested with the same A class above

#

i think the only problem is that it's in C

rare lantern Jan 27, 2023, 12:07 PM

#

rose schooner i think the only problem is that it's in C

I have been trying to put together a visual scripting tool that uses the inspect module to gather the function details to assemble the nodes, and there are a bunch that will raise exceptions automatically

rose schooner Jan 27, 2023, 12:07 PM

#

rose schooner tested with the same `A` class above

it seems to just get A.__init__()

#

wraps() uses partial() btw

rare lantern Jan 27, 2023, 12:08 PM

#

rose schooner it seems to just get `A.__init__()`

try it with partials

rose schooner Jan 27, 2023, 12:08 PM

#

def wraps(wrapped,
          assigned = WRAPPER_ASSIGNMENTS,
          updated = WRAPPER_UPDATES):
    """Decorator factory to apply update_wrapper() to a wrapper function

       Returns a decorator that invokes update_wrapper() with the decorated
       function as the wrapper argument and the arguments to wraps() as the
       remaining arguments. Default arguments are as for update_wrapper().
       This is a convenience function to simplify applying partial() to
       update_wrapper().
    """
    return partial(update_wrapper, wrapped=wrapped,
                   assigned=assigned, updated=updated)

#

so um

#

why use wraps() again?

rare lantern Jan 27, 2023, 12:09 PM

#

rose schooner `wraps()` uses `partial()` btw

You are missing what's actually happening there though

#

its just passing the stuff on here

rose schooner Jan 27, 2023, 12:09 PM

#

update_wrapper()?

rare lantern Jan 27, 2023, 12:11 PM

#

It will still just crash with plain partial

#

because partial itself does not apply the update wrapper functionality

#

so, when inspected

#

it does not appear as the function it is a partial of

#

wraps pulls up that data

#

and then the only solution is to try and filter and update these in whatever system you are working with

#

by checking if is a partial

#

and then doing something janky to try and update that

#

it seems update_wrapper also works here

rose schooner Jan 27, 2023, 12:14 PM

#

it's probably better to just define and use the custom partial() in user code instead of adding it to the stdlib

rare lantern Jan 27, 2023, 12:14 PM

#

so, its not like it couldn't be added to the partial

rose schooner Jan 27, 2023, 12:15 PM

#

rose schooner it's probably better to just define and use the custom `partial()` in user code ...

there's still the other features of partial() that a simple function doesn't replace

rare lantern Jan 27, 2023, 12:15 PM

#

rose schooner it's probably better to just define and use the custom `partial()` in user code ...

but shouldn't something that literally behaves like a function in this case be treatable as one in all cases?

#

because its used as such in all cases

rose schooner Jan 27, 2023, 12:15 PM

#

rose schooner there's still the other features of `partial()` that a simple function doesn't r...

for example you can pickle a partial() but you can't pickle a lambda or nested function like the wrappers above

rare lantern Jan 27, 2023, 12:15 PM

#

and can be used in place in all places

#

Oh python

rose schooner Jan 27, 2023, 12:16 PM

#

there's also the useful representation output
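
Both points can be checked quickly. Plain module-level functions do pickle (by reference to their importable name); it is lambdas and locally defined wrappers that fail, which is what a lambda-based replacement would return. The sketch uses the built-in `pow` purely as a convenient target.

```python
import pickle
from functools import partial

p = partial(pow, 2)                          # pow is importable by name, so this pickles
restored = pickle.loads(pickle.dumps(p))
print(restored(10))                          # pow(2, 10)
print(repr(p))                               # shows the wrapped function and bound args

try:
    pickle.dumps(lambda exp: pow(2, exp))    # a lambda has no importable name
except (pickle.PicklingError, AttributeError) as exc:
    print("cannot pickle lambda:", type(exc).__name__)
```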

rare lantern Jan 27, 2023, 12:16 PM

#

why are you like this lol

rare lantern Jan 27, 2023, 12:16 PM

#

rose schooner there's also the useful representation output

I mean, yes

#

but make it compatible with inspect

#

the same way funcs are

#

this way

#

best of both

rose schooner Jan 27, 2023, 12:17 PM

#

!e and a hidden feature too
```py
from functools import partial
def a(y):
    pass

def b(x, z):
    return x * z + 2

a.func = b
print(partial(a, 5)(2))
```

fallen slateBOT Jan 27, 2023, 12:17 PM

#

@rose schooner :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 9, in <module>
003 | TypeError: a() takes 0 positional arguments but 1 was given

rare lantern Jan 27, 2023, 12:17 PM

#

its such an annoying little gotcha

rose schooner Jan 27, 2023, 12:17 PM

#

ok maybe it doesn't work that way

rare lantern Jan 27, 2023, 12:18 PM

#

from functools import update_wrapper

partial = lambda f, *a, **k: update_wrapper(lambda *_a, **_k: f(*(a + _a), **{**k, **_k}), f)
partialmethod = lambda f, *a, **k: update_wrapper(lambda self, *_a, **_k: f(self, *(a + _a), **{**k, **_k}), f)

works

#

as a way to define em

rose schooner Jan 27, 2023, 12:19 PM

#

rose schooner !e and a hidden feature too ```py from functools import partial def a(y): pa...

it's in the pure python implementation in functools

rare lantern Jan 27, 2023, 12:19 PM

#

I think the update wrapper functionality could technically be applied in class

rose schooner Jan 27, 2023, 12:22 PM

#

rose schooner it's in the pure python implementation in `functools`

oh wait it's to chain partial()s effectively
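
The chaining behaviour is easy to observe on the built-in type: wrapping a partial in another partial does not nest them but merges the arguments into a single object. `add3` is a hypothetical function used only for this illustration.

```python
from functools import partial


def add3(a, b, c):
    return a + b + c


inner = partial(add3, 1)
outer = partial(inner, 2)

print(outer.func is add3)   # True: the inner partial was flattened away
print(outer.args)           # (1, 2)
print(outer(3))             # 6
```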

rare lantern Jan 27, 2023, 12:23 PM

#

My method works with that too lol

#

cause purely functional

#internals-and-peps
