#internals-and-peps


rose schooner
#

here are some improvements
```py
partial = lambda f, *a, **k: update_wrapper(lambda *_a, **_k: f(*a + _a, **k, **_k),  # star unpack precedence is quite low so we can just do this
                                            wrapped=f, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES)  # provide like in wraps()
partialmethod = lambda f, *a, **k: update_wrapper(lambda self, *_a, **_k: f(self, *a + _a, **k, **_k),
                                                  wrapped=f, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES)
```

#

the logic is mainly just lambda *_a, **_k: f(*a + _a, **k, **_k) and lambda self, /, *_a, **_k: f(self, *a + _a, **k, **_k)

rare lantern
#

I assume you are importing the constants from functools

#

ah yes

rare lantern
rose schooner
#

f(*a + _a, **k, **_k) also has 4 alternate variations:
f(*a, *_a, **k | _k)
f(*a + _a, **k | _k)
f(*a, *_a, **k | _k)
f(*a, *_a, **k, **_k)
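
The variations differ in one respect: what happens when the same keyword is both stored and passed at call time. A minimal sketch, with `partial_strict`, `partial_merging` and `greet` as made-up names for illustration: spelling the call as `**k, **_k` raises `TypeError` on a duplicate keyword, while merging the dicts first lets the call-time value win, which matches how `functools.partial` behaves.

```python
def partial_strict(f, *a, **k):
    # Two separate ** unpackings: a keyword present in both dicts is an error.
    return lambda *_a, **_k: f(*a, *_a, **k, **_k)


def partial_merging(f, *a, **k):
    # Merge first, so call-time keywords override stored ones.
    # On Python 3.9+ the merge can also be written as k | _k.
    return lambda *_a, **_k: f(*a, *_a, **{**k, **_k})


def greet(greeting, name="world"):
    return f"{greeting}, {name}"


strict = partial_strict(greet, "hi", name="a")
merging = partial_merging(greet, "hi", name="a")

print(strict())             # stored keyword is used
print(merging(name="b"))    # call-time keyword overrides the stored one
try:
    strict(name="b")
except TypeError as exc:
    print("strict variant:", exc)
```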

rose schooner
#

i don't know what any of that means but ok

fallen slateBOT
rare lantern
#

is the most basic and concise bits

warm breach
#

is there an accessible type of builtin methods

#

!e

print(type(print))
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class 'builtin_function_or_method'>
warm breach
#

nevermind found it, types.BuiltinFunctionType
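
For reference, a small check of how that name behaves on CPython. The helper `is_builtin` is only an illustrative name; `types.BuiltinFunctionType` and `types.BuiltinMethodType` are the documented aliases for the same `builtin_function_or_method` type.

```python
import types


def is_builtin(obj):
    # True for C-implemented functions and bound built-in methods.
    return isinstance(obj, types.BuiltinFunctionType)


print(type(print) is types.BuiltinFunctionType)
print(is_builtin(len), is_builtin([].append), is_builtin(lambda: 0))
```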

deep nova
#

I see a lot of conjecture about what PEG parsers do

#

But not a lot of material on how to build one

#

Anyone able to point me in the right direction?

deep nova
#

Someone on another server claims that PEG parsers are cubic with respect to memory consumption in the worst case

#

PEP 617 claims that the new parser falls within ±10% of the original parser when applied to the standard library as a benchmark

#

I'm just curious what level of hackery was required to achieve that

#

Guido has always seemed (to me) to play things fast and loose when it comes to ad-hoccing his solutions

raven ridge
#

those aren't incompatible statements. Something can be cubic in the worst case when applied to adversarial inputs, and perform very well on typical inputs

#

the worst case time complexity of a backtracking regex engine is 2**n with respect to the length of the input string, for example

#

granted that takes a particularly terrible pattern, but 🤷‍♂️

deep nova
#

That is a sound analysis that holds absolutely zero actual information

#

Just so you know XD

lone sun
#

It's worth noting that the 2**n worst case basically requires an adversarial regex, too. Regexes like \([ab]\)c*\1 can be parsed in polynomial time.

#

Even if general PEG parsers require cubic memory in the worst case, it's possible that Python's grammar doesn't trigger that kind of memory usage.

#

Or that it would require pathological input, like the whole file being nothing but f"{f"{f"{f"{f"{...}"}"}"}"}".

raven ridge
deep nova
#

Stated more explicitly as you've done, that's a bit clearer. Your original phrasing sounded a lot more like "it depends"

#

Nonetheless

raven ridge
#

Something can be cubic in the worst case when applied to adversarial inputs, and perform very well on typical inputs.
That wasn't an "it depends".

deep nova
#

My actual question was to do with the actual optimizations in the algorithm

raven ridge
#

that's begging the question. You're assuming that the algorithm must be optimized for memory usage to achieve its results.

deep nova
#

Godly, my brother

#

My friend

#

You've got to learn to give a straight answer

raven ridge
#

I don't know how much clearer I can be. Your premise is wrong.

deep nova
#

First off, Guido mentions several times in his blog posts about PEG parsing that optimizing the memory footprint is on his to do list. Moreover, the primary complaint I've seen with respect to PEG parsers throughout my research has been a nontrivial potential and actual excessive memory footprint

#

AND, even if a hideous memory footprint is possible in only adversarial conditions, it's still worth talking about and asking about. I asked "what's been done about the memory footprint in Python's parser"

You answered "why would we optimize the algorithm in the first place". While technically an answer, it neither confirms nor denies the existence of any optimizations

#

Which is what I care about

raven ridge
#

what's been done about the memory footprint in Python's parser
As far as I know, nothing, but I'm not an expert. It sounds like you've probably read up on it more than me.

My impression is that the only work done to limit the memory usage is carefully designing the grammar to try to ensure that ambiguities are quickly resolved so that look-ahead is rarely needed.

#

LWN summarizes it as just:

There is, of course, a cost for infinite lookahead, in the form of increased memory usage. But the PEP notes that the performance of the new parser is within 10% of that the old, both in terms of speed and memory. While PEG with packrat parsing consumes more memory, the new parser does not create a concrete syntax tree, as the existing parser does; instead, it directly creates the abstract syntax tree, which makes up for most of the memory consumed by the new techniques.
Which matches up to my understanding that no particular hacks were required.

deep nova
#

Rockin

raven ridge
#

I do vaguely recall that some proposed language feature was shot down on the basis that it would require too much look ahead and could cause excessive memory usage, too - though I don't recall which

deep nova
#

The only optimization I can think of thus far is that you don't need to hold the entire token stream in memory, just the parts that are currently relevant. Once you exit a subtree, basically, you can throw out the tokens it's consumed

#

But that's a battle for another day

dim shard
#

GodlyGeek

gray galleon
#

how does python peg parser handle left recursion
it's a recursive descent parser amirite

rose schooner
#

i still don't understand

#

it's like memoization or something

gray galleon
#

do they rewrite it to become non-left recursive
like
```
sum: prod ('+' prod)*
```

rose schooner
#

the generated one

gray galleon
#

and if they are memoized the parser tries the alternatives instead
i think
idk how it works exactly

#

yeah that might work

deep nova
#

@gray galleon There is a trick which relies on packrat parsing

#

It calls itself repeatedly, caching the result each time, until a recursive call fails to consume any more tokens than the previous call

#

Something like that, I haven't had a chance to crack open the code yet
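
The trick described above can be sketched roughly as follows. This is a toy illustration of the "grow the seed" idea for the single rule `expr: expr '+' NUM | NUM`, not CPython's generated parser; `parse_sum` and its token format (ints and `"+"` strings in a list) are assumptions made for the example.

```python
def parse_sum(tokens):
    """Parse the left-recursive rule  expr: expr '+' NUM | NUM."""
    memo = {}  # position -> (tree, next_position) or None

    def num(pos):
        if pos < len(tokens) and isinstance(tokens[pos], int):
            return tokens[pos], pos + 1
        return None

    def expr_body(pos):
        # Alternative 1: expr '+' NUM. The recursive call hits the memo.
        left = expr(pos)
        if left is not None:
            tree, p = left
            if p < len(tokens) and tokens[p] == "+":
                right = num(p + 1)
                if right is not None:
                    return ("+", tree, right[0]), right[1]
        # Alternative 2: NUM
        return num(pos)

    def expr(pos):
        if pos in memo:
            return memo[pos]
        memo[pos] = None  # seed: the recursive call fails the first time
        last = None
        while True:
            result = expr_body(pos)
            # Stop once a pass fails or consumes no more than the previous one.
            if result is None or (last is not None and result[1] <= last[1]):
                break
            last = result
            memo[pos] = result  # grow the seed and try again
        return last

    return expr(0)


print(parse_sum([1, "+", 2, "+", 3]))
```

Each pass re-runs the rule body with the previous result cached, so the tree grows to the left one operator at a time.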

#

My understanding is that allowing left recursion has more to do with convenience and expressiveness than anything else. It comes at a price: linear performance is sacrificed. That said, I'm toying with a PEG which handles operators using Pratt parsing. This is normally where most left recursion occurs (I think). If that's the case, then the left recursion becomes one extra tool in the toolbox which shouldn't cost too much in terms of performance if used sparingly

flat gazelle
#

why do dataclasses use __post_init__ instead of just treating __init__ defined on the decorated class as __post_init__?

warm breach
#

the defined class init would run first

#

technically you can just rewrite the user __init__ but that would be lying, the user's __init__ isn't actually __init__

#

also type checkers now think you have a zero argument __init__ when in fact it is the one synthesized by dataclass

warm breach
#

you can do init=False iirc for that

#

also I'm not sure if dataclass can easily tell if you defined an init yourself?

#

maybe it's possible to do an error for that

flat gazelle
#

'__init__' in vars(cls) works just fine for that
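
A quick sketch of that check, with `defines_own_init`, `Plain` and `Custom` as illustrative names. Note that it only answers the question if it runs before the decorator has installed its own `__init__`, since afterwards the name is always present in the class namespace.

```python
from dataclasses import dataclass


def defines_own_init(cls):
    # vars(cls) is the class's own namespace, so inherited
    # object.__init__ does not count.
    return "__init__" in vars(cls)


class Plain:
    x: int


class Custom:
    x: int

    def __init__(self, x):
        self.x = x


before = defines_own_init(Plain)             # no __init__ written by the user
after = defines_own_init(dataclass(Plain))   # dataclass has now generated one
print(before, after, defines_own_init(Custom))
```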

raven ridge
flat gazelle
#

why can I override generated methods without turning off their generation at all

warm breach
#

I suppose yeah, but imo it's very strange for __init__ to not be the real init

flat gazelle
#

though IG the whole composing idea doesn't work with things like __hash__, so that would be special

warm breach
#

a type checker now needs to figure out if you defined init=False to figure out if __init__ is real or not

raven ridge
flat gazelle
#

yeah, I just remember everyone that I have seen try dataclasses break a dataclass by defining an __init__

#

I think it's the most obvious way to do it and it should just work

warm breach
#

we already have enough special casing for dataclasses though, we need less of those not more

#

an ignored __init__ would be another thing type checkers need to handle

flat gazelle
warm breach
#

by inheritance you should always be able to override the superclass

#

type checkers still abide by that in following your init function and not the provided one

flat gazelle
#

dataclass will always need a ton of special casing, and making an API less obvious just to make tooling marginally easier seems silly to me. (you need to scroll down past dataclasses.MISSING and other useful to know about constants to even get to the __post_init__ docs, and most people will not scroll that far.)

#

IG it is easier in that you can pretend it is a superclass

raven ridge
flat gazelle
#

which seems like a somewhat silly way to implement a dataclass from a type checker standpoint

warm breach
raven ridge
#

I think it's just ease of use, really. If someone wants to define their own __repr__ for a dataclass, it seems weird to make them also call the dataclass decorator with repr=False. It's making them state their intent in two places, they have to say "don't provide a repr", and then say "use this for the repr". That seems weirder than having the dataclass infer "they've provided a repr, so I shouldn't generate my own."

flat gazelle
#

and in general, I really do not care about type checker complexity over making the language better.

#

that does make sense for everything but __init__ IG

#

which is why we have __post_init__

warm breach
#

like you're suggesting to make __post_init__ just __init__? but doesn't that clash with the provided one in name

flat gazelle
#
```py
def new_init(...):
    ...
    self.__original_init__(...)
cls.__init__, cls.__original_init__ = new_init, cls.__init__
```
raven ridge
flat gazelle
#

seems fairly straightforward to me

warm breach
flat gazelle
#

yup, so that it gets the dataclass fields

warm breach
#

that seems a lot weirder than post init

flat gazelle
#

I am willing to bet there are vastly more dataclasses with a __post_init__ than with a fully replaced __init__

raven ridge
#

and then you've lost the ability to override __init__, like you can override every other dunder

warm breach
#

whether you define it yourself or in another method, you always define one thing for that function, __post_init__

#

not an __init__ that gets rewritten into __original_init__ that would need alternative naming on further subclass interactions

#

also __post_init__ makes it clear this runs after __init__

#

__original_init__ seems to imply it runs before

flat gazelle
#

well, the user of the @dataclass should never see the name __original_init__ at all.

raven ridge
#

sure, they don't have to, you can close over the original __init__ instead.

flat gazelle
#

but yeah, fair enough, it is a bit more magic than the current solution. But I feel like dataclasses are already mostly treated as a magic box by most people

raven ridge
#

my problem with it isn't just that it's magic, it's that it's magic that would work differently for one dunder than for every other.

#

it makes things much harder to reason about when they're inconsistent.

flat gazelle
#

I would argue that __post_init__ is already treating __init__ as a special thing

#

there is no __post_eq__

warm breach
#

it's an additional slot, not changing the behavior of an existing one

raven ridge
flat gazelle
#

hmm, fair enough. IG no matter what you will have to go to stack overflow to figure out how to actually add a custom init to a dataclass

warm breach
#

I think in those cases it's easier just to use a normal class tbh

flat gazelle
#

it isn't. If you have a dozen fields, but need to also e.g. register the instance to a pool or do some extra checks, a __post_init__ makes sense. That's why it exists
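
A minimal sketch of that use case. `POOL` and `Connection` are hypothetical names, and the port check is just an example of an "extra check".

```python
from dataclasses import dataclass

POOL = []  # hypothetical registry of live instances


@dataclass
class Connection:
    host: str
    port: int = 5432

    def __post_init__(self):
        # Runs at the end of the generated __init__, after the fields are set.
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")
        POOL.append(self)


conn = Connection("db.example")
print(conn, len(POOL))
```

The generated `__init__` still handles every field assignment; only the extra behaviour is written by hand.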

#

but you will never ever figure out it exists unless you go to stack overflow

raven ridge
#

yeah - it seems reasonable to me to want everything a dataclass generates except __init__

flat gazelle
#

again, I think the case where you fully override __init__ is much rarer than the case where you just want a __post_init__

#

and the less common case is the only one with an obvious solution

warm breach
flat gazelle
#

__original_init__ was just an example

#

the name is opaque in the same way as other dataclass internals

#

you shouldn't need to touch it

warm breach
#

right but you would now have to change all dataclass internal code to check whether __original_init__ is defined

#

vs. now where you can unconditionally call __post_init__ or __init__

#

__init__ is obviously always there due to object

flat gazelle
#

what do you mean? Post init is conditionally defined, __original_init__ is conditionally defined

raven ridge
warm breach
#

but honestly down that route we probably could have used some special syntax for dataclasses if we had a full do-over

flat gazelle
#

and yet multiple people expect it to not do that.

raven ridge
#

that confuses me, honestly.

#

I don't see why they'd expect overriding __init__ to do something different than overriding __hash__ - or overriding an __init__ inherited from a superclass.

flat gazelle
#

well, the thought process is:
here is a class
I want to do something when it's initialized
let's add an __init__

raven ridge
#

and that works fine.

flat gazelle
#

and it breaks the dataclass

raven ridge
#

it doesn't break it - it just replaces its __init__

flat gazelle
#

which is the thing people use a dataclass for in the first place

raven ridge
#

now you're responsible for setting every attribute, just like you normally do in a __init__
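
To illustrate the behaviour being described, here is a small sketch with a made-up `Point` class. The hand-written `__init__` is kept, so the field assignments are no longer generated, while `__repr__` and `__eq__` still come from the decorator.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int = 0

    def __init__(self, x):
        # The user-defined __init__ wins, so every attribute
        # has to be set by hand here.
        self.x = x
        self.y = x * 2


p = Point(3)
print(p, p == Point(3))
try:
    Point(1, 2)  # the generated two-argument signature does not exist
except TypeError as exc:
    print("TypeError:", exc)
```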

flat gazelle
#

not having to type out the self.a=a

#

I think the __post_init__ is a clumsy solution

raven ridge
warm breach
#

just like how if text == "a" or "b" makes logical sense within the language but not intuitively

#

we could special case that as well, like C# does

#

though to be fair C# has || && so more leeway with what or does

flat gazelle
#

IG fair enough, not everything can be a nice API.

warm breach
#

but by now it's too small a thing to change, given the impact of possibly breaking all existing code

flat gazelle
#

yeah, of course it can't be changed now

#

but well, at least it's a point for every single 10 python tricks you didn't know video

warm breach
#

what if we just added a new keyword

struct Point:
    x: int
    y: int

    def __init__(self):
        ...
raven ridge
#

Consistency counts for an awful lot in language design

warm breach
#

speaking of, apparently ints will become mutable in some cases in 3.12+

fallen slateBOT
#

Objects/longobject.c lines 283 to 284

```c
// Mutate in place if there are no other references the old
// object.  This avoids an allocation in a common case.
```
flat gazelle
#

upon further review of similar features, __post_init__ is still better than most of what other languages offer

#

it's pretty much a raku TWEAK, and if raku couldn't come up with a better solution, IG it's really the best option

deep nova
#

How does one achieve left-associativity with an LL parser?

#

It seems as though one must resort to term := factor ('+' factor)* or some such

#

Is this the standard approach? Would this not require some kind of second pass?

flat gazelle
#

you more or less cannot, yeah

#

since you can't know how deep to nest the first operator without knowing how many are in a row

deep nova
#

So, what, you're screwed? XD

flat gazelle
#

you can reconstruct the parse tree with semantic actions IIRC

deep nova
#

As I'm reading, there seem to be three approaches:

Refactor the grammar (to eliminate left-associativity altogether???)
Use Pratt or precedence climbing
Convert recursion to iteration and post-process
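
The third approach can be sketched like this: parse `term := factor ('-' factor)*` with a loop, and fold the operands into a left-leaning tree as they arrive. `parse_left_assoc` and `evaluate` are illustrative names, and the token list of ints and operator strings is an assumed input format.

```python
def parse_left_assoc(tokens):
    """Parse  term := factor (op factor)*  into a left-nested tuple tree."""
    it = iter(tokens)
    tree = next(it)  # first factor
    for op in it:
        rhs = next(it, None)
        if rhs is None:
            raise SyntaxError(f"missing operand after {op!r}")
        # Folding as we go makes the tree lean left: ((a op b) op c).
        tree = (op, tree, rhs)
    return tree


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a + b


tree = parse_left_assoc([1, "-", 2, "-", 3])
print(tree, evaluate(tree))
```

Subtraction is used because it makes the associativity visible: the left-nested tree evaluates as (1 - 2) - 3, not 1 - (2 - 3).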

#

Or use left recursion hack available to packrat parsers, I guess

#

I'd prefer to start with the first option (I'll try them all out eventually) but I can't seem to find any instructions on how

radiant garden
#

In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on the right). For instance,

    1
    +
    2
    +
    3
  ...
halcyon trail
#

Python's dataclass is actually unusual in that it's also what's responsible for providing, let's say, reasonable init for structs that just hold data

#

Well, okay, hm, one of the other examples I had in mind fell through actually

#

But the other example was Kotlin. Kotlin has dataclasses as well but they're there for hash and equals. You get dataclass style init just by declaring fields a certain way. You can do the post init in init blocks in the class body

#

But I agree after further thought that not that many languages have a great solution here

flat gazelle
quaint copper
#

Hi! Does python expose internal lexing functions like the one parsing python strings? I am looking for something akin to ast.literal_eval but only restricted to parsing strings, erroring out on other literal types, and even maybe returning meta-information about the parsed string: quote types, literal length.. I can't find this exposed within the std, is it?

#

I can also "cheat" by recognizing either prefix in ', ", ''', """, r', r", etc. and then feed the rest of my input character by character (or quote-separated chunk by quote-separated chunk) until ast.literal_eval eventually succeeds, but this feels rather clumsy and very inefficient.

still prism
#

Hi guys, for some reason I can't use chown on a mounted drive in Linux

warm breach
#

So there's sys.intern to intern strings, is there some way to unintern strings...?

raven ridge
#

isn't that just making a copy of a string?

#

oh - I guess you mean to remove it from the table of interned strings, even while it's still alive, so that the next time sys.intern is called on an identical string, it doesn't return the original one and instead inserts the new one in the table?

#

if that's what you mean, no, I don't think so.
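
For context, a small sketch of what `sys.intern` does on the way in. The strings are built at runtime so that the compiler does not merge them as constants; the variable names are illustrative only.

```python
import sys

# Two equal strings built at runtime.
a = "".join(["inter", "ned text"])
b = "".join(["interned", " text"])

ia = sys.intern(a)
ib = sys.intern(b)

# Both calls return the single entry stored in the intern table.
print(a == b, ia is ib)
```

The `sys` module offers the function for adding to the table but, as far as I know, nothing documented for removing an entry.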

warm breach
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | INTERNED_MORTAL
002 | dog
003 | Exception ignored deletion of interned string failed:
004 | KeyError: 'dog'
warm breach
#

even though apparently it's an "ignored" error? pithink

raven ridge
#

any exception during garbage collection needs to be ignored

#

garbage collection can happen at any time on any thread - there may not even be Python frames on the stack that could have an except: block to handle it

warm breach
#

!e

from einspect import view, unsafe

s = "test123"
v = view(s)

print(v.interned.name)

with unsafe():
    other = view("dog")
    other.interned = 0
    other.move_to(v)
    
print(s)
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | INTERNED_MORTAL
002 | dog
warm breach
#

just setting "dog"'s interned field to 0 or NOT_INTERNED results in no error

#

but I'm not sure what the implication of that is

#

I guess "dog"'s intern allocation is never freed?

pliant tusk
# warm breach So there's `sys.intern` to intern strings, is there some way to unintern strings...

!e you can get a reference to the interned dictionary and possibly free them from there. ```py
from fishhook.asm import *

@hook(pythonapi.PyDict_SetDefault, restype=py_object, argtypes=[py_object]*3)
def setdefault(self, key, value):
    if key == 'MAGICVAL':
        return self
    return pythonapi.PyDict_SetDefault(self, key, value)

pythonapi.PyUnicode_InternFromString.restype = py_object
interned = pythonapi.PyUnicode_InternFromString(b'MAGICVAL')
setdefault.unhook()

print(interned)
```

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

{'type': 'type', 'AttributeError': 'AttributeError', '__qualname__': '__qualname__', 'obj': 'obj', 'update': 'update', '__dict__': '__dict__', 'getattr': 'getattr', 'setattr': 'setattr', 'hasattr': 'hasattr', '__doc__': '__doc__', '__name__': '__name__', '__module__': '__module__', 'replace': 'replace', 'old': 'old', 'new': 'new', 'sys': 'sys', 'name': 'name', '_DeadlockError': '_DeadlockError', 'waiters': 'waiters', 'count': 'count', 'owner': 'owner', 'wakeup': 'wakeup', 'lock': 'lock', 'allocate_lock': 'allocate_lock', '_thread': '_thread', 'self': 'self', 'add': 'add', 'get': 'get', '_blocking_on': '_blocking_on', 'set': 'set', 'get_ident': 'get_ident', 'seen': 'seen', 'tid': 'tid', 'me': 'me', 'release': 'release', 'acquire': 'acquire', 'has_deadlock': 'has_deadlock', 'RuntimeError': 'RuntimeError', 'id': 'id', 'format': 'format', '__repr__': '__repr__', '__init__': '__init__', '_ModuleLock': '_ModuleLock', '_DummyModuleLock': '_DummyModuleLock', '_lock': '_lock', '_name': '_name',
... (truncated - too long)

Full output: too long to upload

warm breach
#

how did you get a reference to the intern dict

#

I have been trying for the last hour

pliant tusk
#

i used fishhook.asm to hook the PyDict_SetDefault C function, then I called PyUnicode_InternFromString which calls PyDict_SetDefault(interned_dict, ...)

warm breach
#

ah hm

pliant tusk
# warm breach wait what

note that fishhook.asm only explicitly supports Intel x86 assembly right now, but i am looking into (ab)using ctypes internal cffi to perform hooks

warm breach
#

so it can hook the c functions?

pliant tusk
#

*yea, but the catch is that sometimes, the functions are inlined, and thus cannot be hooked

#

although if you wrote a patchfinder, you could use it on arbitrary addresses, and pass in that

feral island
#

or referrers, always forget which is which

pliant tusk
#

!e ```py
import gc
print([o for o in gc.get_referrers('abc') if type(o) == dict and o.get('abc') == 'abc'])
```

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

[]
warm breach
#

get_referrers doesn't seem to include the intern dict

fallen slateBOT
#

Include/internal/pycore_interp.h line 186

```c
struct _Py_interp_cached_objects cached_objects;
```
warm breach
#

which is technically accessible via the stable api PyInterpreterState_Get

#

but that struct is massive

pliant tusk
#

yea

#

!e ```py
import fishhook.asm
import inspect

print(inspect.getsource(fishhook.asm.addr))```

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | def addr(cfunc):
002 |     ptr = c_void_p.from_address(addressof(cfunc))
003 |     return ptr.value
warm breach
#

hm pithink

#

what exactly is at that address though

pliant tusk
fallen slateBOT
#

fishhook/asm.py lines 49 to 52

```py
n_ptr = addr(injected)
offset = n_ptr - o_ptr - 5
jmp = b'\xe9' + (offset & ((1 << 32) - 1)).to_bytes(4, ENDIAN)
mem[:] = jmp
```
warm breach
#

hm firHmm

pliant tusk
#

ironically, i originally wrote this code to get the interned dictionary

warm breach
#

theoretically just defining the PyInterpreterState struct and calling the stable api would be much safer

warm breach
#

hm? It's a stable pythonapi

feral island
#

the memory location at which it exists isn't stable

pliant tusk
#

there are a lot of things defined as a macro and as a function

feral island
#

haven't checked but probably the is keyword just does a pointer comparison straight in ceval.c, not a function call

#

yes
```c
case TARGET(IS_OP): {
    PyObject *right = POP();
    PyObject *left = TOP();
    int res = (left == right)^oparg;
    PyObject *b = res ? Py_True : Py_False;
```

pliant tusk
#

and it would be unstable as hell

warm breach
pliant tusk
#

it works on 64bit

#

using it on my 64bit mac rn

warm breach
#

I ran that in 3.11.0 and it segfaults pithink

pliant tusk
#

what cpu do you have?

warm breach
#

64bit ubuntu

pliant tusk
warm breach
#

AMD64

pliant tusk
#

ah

#

it only supports Intel x86 rn

warm breach
#

or, well, platform.processor() says x86_64

pliant tusk
#

I am working on adding support for more

warm breach
#

but I'm on the same python binary right pithink

#

the jump instruction is different?

feral island
#

binaries and instructions are completely different on different architectures

pliant tusk
#

not python opcodes

#

@warm breach can you compile the following?
code.asm
```
jmp $+0xff
```
with `nasm code.asm -o code` on your system? then send me the output file?
pliant tusk
#

interesting

#

it should have worked

#

was it a segfault or a bus error?

#

also when did the crash happen, did it happen after defining the hook or when the hook was triggered?

warm breach
pliant tusk
#

ok so it is crashing when the hook is triggered

#

can you open up fishhook/asm.py and add a print for offset and lmk what the offset is

warm breach
#
Fatal Python error: Segmentation fault

Current thread 0x00007f7773827280 (most recent call first):
  File "/home/ionite/repos/python/hook_test.py", line 10 in <module>
fish: Job 1, 'PYTHONDEVMODE=1 python3 hook_te…' terminated by signal SIGSEGV (Address boundary error)
warm breach
#

it was something like

45659537313787
45932713041915
#
        offset = n_ptr - o_ptr - 5
+       print(offset)
pliant tusk
#

it printed twice?

warm breach
#

no it's just different each time

pliant tusk
#

yea thats normal. But your allocations are way far apart

#

so my relative jump isnt far enough

#

maybe I should do PUSH (absolute address) + RET

#

basically, the opcode I use, e9, is a relative near jump with a 32-bit displacement

#

so in order for it to work, the offset needs to fit into a signed 32 bits (4 bytes) or less

warm breach
#

ah hm pithink

pliant tusk
#

!e py print(hex(45659537313787))

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

0x2986f0808ffb
feral island
pliant tusk
#

this is more than 4 bytes, so it cuts it short, and jumps to the wrong location
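
The truncation can be reproduced without touching any memory. The sketch below is not fishhook's code: `encode_jmp_rel32` is a hypothetical helper that builds the same 5-byte `e9` jump but refuses offsets that do not fit, and the large offset is the value printed earlier in the conversation.

```python
import struct

JMP_REL32 = 0xE9   # x86 near jump with a signed 32-bit displacement
JMP_LEN = 5        # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(src, dst):
    # The displacement is relative to the end of the jump instruction.
    offset = dst - (src + JMP_LEN)
    if not -(1 << 31) <= offset < (1 << 31):
        raise OverflowError(f"offset {offset:#x} does not fit in 32 bits")
    return bytes([JMP_REL32]) + struct.pack("<i", offset)


offset = 45659537313787                    # value reported above
truncated = offset & ((1 << 32) - 1)       # what the masking keeps
print(hex(offset), hex(truncated))
print(encode_jmp_rel32(0x1000, 0x2000).hex())
```

Masking silently drops the upper bits, so the jump lands somewhere unrelated; raising instead makes the limitation visible before the patched function is ever called.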

pliant tusk
#

and my mac should have the same crash

#

probably a quirk with ubuntu's allocator vs macos

warm breach
#

it crashes on colab as well it seems (ubuntu intel)

#

though that might be related to jupyter

pliant tusk
#

replacing the relative jump with a PUSH DWORD addr; RET; would maybe work? i think

pliant tusk
#

although it works on the bot here so idk. regardless, the crash is due to offset being larger than 4 bytes, so i need to change my jump strategy

warm breach
#

petition to add PyUnicode_GetInternDict 😔

gray galleon
#

will python have destructuring parameters

#
def f(a, [b, c], d):
warm breach
#

apparently just swapping 2 objects is quite complex in 3.12 😔

#

most object instance dicts are "VM" managed now apparently

pliant tusk
fallen slateBOT
#

src/einspect/views/view_base.py lines 320 to 321

```py
def swap(self, other):
    """Swaps data at other Viewable with this View."""
```
warm breach
#

seems to work for 3.8-3.12

#

not sure if I'm missing any logic pithink

fallen slateBOT
#

src/einspect/views/view_base.py line 341

```py
buf = ctypes.create_string_buffer(other.mem_allocated)
```
warm breach
#

is it only dropped after the function scope ends?

pliant tusk
warm breach
#

this is so cursed 🥴

from einspect import view

t = (1, 2, 3, 4, 5)

view(t).swap(print)

(1, 2, 3, 4, 5)(print)
# (1, 2, 3, 4, 5)
modest dragon
#

lovely new theme

#

looks very appealing

dusk comet
#

Wolfram Mathematica has several ways to call a function:

  1. f[x, y]
    If function takes only one argument you can also do that:
  2. f @ x
  3. x // f

This might be funny to implement in python

warm breach
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | > 100 <
002 | > 125 <
pliant tusk
#

!e ```py
from fishhook import *

@hook(type(lambda:0), name='matmul')
@hook(type(lambda:0), name='rfloordiv')
def func(self, other):
    return self(other)

def show(x):
    print('>', x, '<')

show @ 100
125 // show
```

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | > 100 <
002 | > 125 <
wind helm
#

Morning guys
Quick question I hope. The inspect.getcallargs method is marked as deprecated, but in fact is very useful when you have the function reference and you want to verify whether arguments are compatible.

The guidelines suggest using bind or bind_partial, but their implementation is different.

What could be the best workaround?
Using the functools.partialmethod?

wind helm
#

Though, unless I'm misreading, I think I can still create a signature from a callable and then invoke the two methods
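
That approach can be sketched as follows, using `inspect.signature` and `Signature.bind`. The helper names `call_args` and `accepts`, and the sample function `greet`, are made up for the example.

```python
import inspect


def call_args(func, *args, **kwargs):
    """Roughly what getcallargs returned: a name -> value mapping."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # include parameters left at their defaults
    return dict(bound.arguments)


def accepts(func, *args, **kwargs):
    """True if func could be called with these arguments."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


def greet(name, punctuation="!"):
    return f"hello {name}{punctuation}"


print(call_args(greet, "world"))
print(accepts(greet), accepts(greet, "a", "b", "c"))
```

`bind` raises `TypeError` for an incompatible call without ever invoking the function, which is the property that made `getcallargs` useful for validation.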

gray galleon
#

is there a way to create signature object out of thin air like in the raku language?
won’t be of any use, just curious

flat gazelle
#

!e
you can just call the constructor afaict, e.g.

import inspect
print(inspect.Signature([inspect.Parameter('a', inspect.Parameter.POSITIONAL_ONLY)]))
fallen slateBOT
#

@flat gazelle :white_check_mark: Your 3.11 eval job has completed with return code 0.

(a, /)
gray galleon
#

seems verbose but whatever

flat gazelle
#

you won't get the nice raku syntax, yeah

gray galleon
#

the hard part is creating a function from that signature

deep nova
#

Quick question

#

Python allows numerous forms of escape character

#

By name, by four digit unicode, by eight digit unicode, by two digit hex, or by up to three digit octal

#

Are all of these strictly necessary? I'm starting into the real meat of my lexer now, and I'm wondering what I should take pains to support

spark magnet
deep nova
#

They are a part of python

#

I'm working on building my own language. I can define whatever escape behaviour I please

spark magnet
deep nova
#

Which is why I'm here. I'm wondering if some of the escape formats in Python are vestigial or else largely irrelevant. I'm just seeking context as to the design choices Python has made so I might decide how much to emulate

spark magnet
#

octal is definitely one to skip

deep nova
#

The only times I've ever used escape sequences, other than simple things like \n or \t

#

Has been with unicode codepoints — \uXXXX

#

So I guess I'll start there 🙂

spark magnet
#

good plan

feral island
#

I find the \N named literals useful for when you're dealing with unusual unicode characters

#

but these are definitely things you can add later

deep nova
#

Indeed. I just want to be forward thinking is all 🙂

raven ridge
fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

🙂
raven ridge
#

those are so fun 🙂

deep nova
#

Boss

feral island
#

though probably a pain to put in your toy language because you need a unicode database at compile time 🙂

deep nova
#

Unicode is non-negotiable

#

Long term, anyway

raven ridge
#

you can represent every possible character using either the 8-character \U or the \N, so from that point of view, if what you care about is the minimum that's necessary, you can get by with only either one of those.

warm breach
raven ridge
#

this is about string literals

deep nova
#

😮

#

Shame

warm breach
#

we have octal unicode literals...?

raven ridge
#

The only octal escape I use in strings with any particular regularity is \0

deep nova
spark magnet
#

"\012" == "\n"

raven ridge
#

Other than \0, the octal escapes are significantly less recognizable than the single-letter escapes like \n or the hex escapes

#

like, other than \0, I don't think there's another octal escape that readers will be able to parse more easily than if it was written in another way.

#

and even \0 is annoying if you're trying to use it in a string that should also contain ASCII digits

#

since if you're trying to immediately follow the \0 with digits you need to pad it out to \000

deep nova
#

Honestly, I thought that \0 was just the null terminator

raven ridge
#

so if I were greenfielding a language, I'd probably treat \0 as a special case that's equivalent to \U00000000, and then not accept any other octal escape.

deep nova
#

Was in a category of its own

#

Rockin

raven ridge
#

at least, it's not in Python or C. There might be languages where it is. 🤷‍♂️

deep nova
#

What is the null terminator in python?

raven ridge
#

it is \0 - it's just that \0 isn't its own special category, it is an octal escape for U+0000

deep nova
#

Ahh

raven ridge
#

!e ```py
print("\U00000000" is "\0")
```

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | <string>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
002 | True
raven ridge
#

!e more interestingly, actually: print(repr("\0000"))

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

'\x000'
raven ridge
#

the input string is \000 followed by 0, the repr is \x00 followed by 0 - that shows that \0 is just a special case for how to spell \000

#

with the octal escapes, you can give up to 3 digits, and you must give 3 digits if the thing immediately after the escape sequence is a digit that shouldn't be interpreted as part of the escape.
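The padding rule can be checked directly. This is just a sketch of the behaviour described above, with made-up variable names.

```python
# Octal escapes take one to three octal digits.
print("\012" == "\n")
print("\0" == "\000" == "\x00")

# A fourth digit is not part of the escape: NUL followed by the character "0".
nul_then_zero = "\0000"
print(len(nul_then_zero), repr(nul_then_zero))

# Without padding, digits that follow are absorbed into the escape.
absorbed = "\012"   # a single character (newline)
padded = "\00012"   # NUL, then "1", then "2"
print(len(absorbed), len(padded))
```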

spark magnet
raven ridge
#

right - I meant that the character that is used as a null terminator in other languages can be written as \0 or \x00 or \U00000000 in Python. That may not have been clear.

#

it's also extremely rare to need to refer to that character in a Unicode string, though it's quite common to need to refer to it in a byte string.

fallen slateBOT
#

`Objects/unicodeobject.c` line 1233

```c
new_size = sizeof(Py_UNICODE) * ((size_t)length + 1);
```
`Objects/unicodeobject.c` lines 1261 to 1262
```c
_PyUnicode_WSTR(unicode)[0] = 0;
_PyUnicode_WSTR(unicode)[length] = 0;
```
fallen slateBOT
#

`Include/cpython/unicodeobject.h` line 205

```h
wchar_t *wstr;              /* wchar_t representation (null-terminated) */
```
`Include/cpython/unicodeobject.h` line 215
```h
char *utf8;                 /* UTF-8 representation (null-terminated) */
```
spark magnet
raven ridge
#

yeah, it is for CPython, but that's an implementation detail.

frank cradle
grave jolt
dull ember
#

hey

winged dagger
#

!d vectorize

fallen slateBOT
#

The @vectorize decorator

Numba’s vectorize allows Python functions taking scalar input arguments to be used as NumPy ufuncs. Creating a traditional NumPy ufunc is not the most straightforward process and involves writing some C code. Numba makes this easy. Using the vectorize() decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.

Using vectorize(), you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs.

The vectorize() decorator has two modes of operation:

frigid bison
#

I'm trying to use the exec statement in Python 2.7 for running a code object, but I'm getting this weird error that doesn't give any results on google.

```py
exec obj in {}
```

```
assert_builtin() takes exactly 1 argument (0 given)
```

#

any idea what it can be?

feral island
#

though the bigger question is, why are you running Python 2.7

frigid bison
#

reverse engineering purposes which forces me to use the target's python version

#

so I have to rewrite my pyarmor unpacker 🙄

frigid bison
feral island
#

maybe the code object you are running is for assert_builtin and you're not passing the args

frigid bison
#

ah I have found the issue, it's Pyarmor specific

#

thanks, I thought it was a Python error and not a library error

boreal umbra
#

How difficult is it to reverse obfuscate pyarmored code, keeping sensible variable names aside?

frigid bison
#

it's pretty easy since it's sort of a packer. It doesn't actually change the python bytecode, it just encrypts it and decrypts it at runtime.

deep nova
#

Quick question

#

Does python's lexer treat -123 as a single token, or is it lexed as an operator and a numeric literal?
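This can be checked with the standard library's `tokenize` module. It is a pure-Python tokenizer rather than the interpreter's C one, so take this as a close approximation, not proof about the internals.

```python
import io
import tokenize

# Tokenize the source text "-123" and list (token name, text) pairs.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO("-123\n").readline)
]
for name, text in tokens:
    print(name, repr(text))
```

The minus sign comes out as an `OP` token and `123` as a separate `NUMBER` token, so the sign is applied later as a unary operator instead of being part of the literal.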

sacred yew
deep nova
#

Wonderful!

#

Thanks

warm breach
#

!e

from fishhook import hook

@hook(list | str)
def __neg__(self):
    return self
fallen slateBOT
#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 3, in <module>
003 |   File "/snekbox/user_base/lib/python3.11/site-packages/fishhook/fishhook.py", line 295, in wrapper
004 |     orig_val = vars(cls).get(name, NULL)
005 |                ^^^^^^^^^
006 | TypeError: vars() argument must have __dict__ attribute
warm breach
#

@pliant tusk this segfaults on 3.10 somehow 🥴

#

did vars somehow work on unions in 3.10?

quick snow
#

!e print(vars(list | str))

fallen slateBOT
#

@quick snow :x: Your 3.10 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 1, in <module>
003 | TypeError: vars() argument must have __dict__ attribute
quick snow
#

nope

pliant tusk
#

But ig I need to add protection from it

#

Or support

warm breach
#

yeah I'm aware, just wondering why 3.10 segfaults

#

it seems the vars would stop it

pliant tusk
#

Oh I have no idea

#

I'll take a look later

warm breach
# pliant tusk Or support

I'm gonna use this I think

from collections.abc import Generator, Sequence
from types import UnionType
from typing import get_args

def _to_types(
    types_or_unions: Sequence[type | UnionType],
) -> Generator[type, None, None]:
    """Yields types from a Sequence of types or unions."""
    for t in types_or_unions:
        if isinstance(t, UnionType):
            yield from _to_types(get_args(t))
        elif isinstance(t, type):
            yield t
        else:
            raise TypeError(f"cls must be a type or Union, not {t.__class__.__name__}")
#

you might want to use __args__ instead of typing.get_args if you don't want to rely on typing code breaking due to hooks though

pliant tusk
#

I'll figure something out. Need to retain my 3.8 support

warm breach
#

well you could support *classes which can be types or unions

#

so @hook(str, int) or @hook(str | int)

#

technically 3.8 can also do @hook(Union[str, int]) (but that's pretty awkward)

pliant tusk
warm breach
pliant tusk
warm breach
pliant tusk
#

Yea true

warm breach
# pliant tusk Yea true

!e also apparently allocating PySequenceMethods for __getitem__ has an internal type check for int index?

from functools import partial
from einspect import impl

@impl(type)
def __getitem__(self, item):
    return partial(self, item)

print(map[int])
fallen slateBOT
#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 8, in <module>
003 | TypeError: sequence index must be integer, not 'type'
feral island
pliant tusk
#

Need to use mp_item

warm breach
#

I guess I'll add a flag to choose between sequence or mapping

pliant tusk
#

Fishhook just allocates all of the structs it needs to on first run

#

Less checks needed

#

And also removes the danger from subclasses

fallen slateBOT
#

`src/einspect/structs/slots_map.py` lines 136 to 141

```py
SLOTS_MAPPING: Final[dict[str, SlotsLike]] = {
    "__len__": tp_as_mapping["mp_length"],
    "__getitem__": tp_as_mapping["mp_subscript"],
    "__setitem__": tp_as_mapping["mp_ass_subscript"],
    "__delitem__": tp_as_mapping["mp_ass_subscript"],
}
```
`src/einspect/structs/slots_map.py` lines 124 to 128
```py
SLOTS_SEQUENCE: Final[dict[str, SlotsLike]] = {
    "__len__": tp_as_sequence["sq_length"],
    "__add__": tp_as_sequence["sq_concat"],
    "__mul__": tp_as_sequence["sq_repeat"],
    "__getitem__": tp_as_sequence["sq_item"],
```
warm breach
#

__getitem__ is in both so it finds the sequence one first

pliant tusk
#

Do you handle subclasses safely in einspect

warm breach
#

I guess you might want to not allocate everything?

#

not sure, I'll probably add an alloc="all" option as well

pliant tusk
warm breach
pliant tusk
#

And when you start adding hooks different places in python start to make assumptions

pliant tusk
warm breach
#

!e

from einspect import impl

@impl(int)
def abc(self):
    return self

print((5).abc())
print(True.abc())
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 5
002 | True
warm breach
#

I don't think I did anything special

pliant tusk
#

!e ```py
from fishhook import *

@hook(int)
def __getitem__(self, idx):
    return (self, idx)

print(True[0])
```

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | <string>:7: SyntaxWarning: 'bool' object is not subscriptable; perhaps you missed a comma?
002 | (True, 0)
warm breach
#

oh hm

pliant tusk
#

!e ```py
from einspect import impl

@impl(int)
def __getitem__(self, idx):
    return (self, idx)

print(True[0])
```

fallen slateBOT
#

@pliant tusk :x: Your 3.11 eval job has completed with return code 1.

001 | <string>:7: SyntaxWarning: 'bool' object is not subscriptable; perhaps you missed a comma?
002 | Traceback (most recent call last):
003 |   File "<string>", line 7, in <module>
004 | TypeError: 'bool' object is not subscriptable
warm breach
#

does that work by allocating subclass PyMethods as well

pliant tusk
warm breach
#

it setattrs after making it mutable

#

unless it's a static attribute

pliant tusk
warm breach
#

like __name__ will manually set tp_name

pliant tusk
#

you need to alloc the correct struct for all subclasses of a given type before setattr

warm breach
#

yeah the alloc is before setattr

pliant tusk
#
```py
def allocate_structs(cls):
    cls_mem = getmem(cls)
    for subcls in type(cls).__subclasses__(cls):
        allocate_structs(subcls)
    for offset, size in get_structs():
        cls_mem[offset] = cls_mem[offset] or alloc(size)
    return cls_mem
```
you need to do it for subclasses too
fallen slateBOT
#

`src/einspect/views/view_type.py` lines 112 to 118

```py
# Check if this is a slots attr
if slot := get_slot(k):
    # Allocate sub-struct if needed
    self._try_alloc(slot)

with self.as_mutable():
    self._pyobject.setattr_safe(k, value)
```
pliant tusk
warm breach
pliant tusk
warm breach
#

speaking of, is there a way to check if a type is a heaptype

#

is the HEAPTYPE flag still valid

#

or did that switch to IMMUTABLETYPE

pliant tusk
#

HEAPTYPE tells you the structure type

#

IMMUTABLETYPE controls whether it allows setattr
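One way to check both flags from Python, as a sketch: on CPython `type.__flags__` exposes `tp_flags`, and the bit positions below are copied from `Include/object.h`. They are not exported by name, so the constants and helper names here are assumptions for illustration.

```python
# Bit positions taken from CPython's Include/object.h (CPython-specific).
Py_TPFLAGS_IMMUTABLETYPE = 1 << 8  # flag added in Python 3.10
Py_TPFLAGS_HEAPTYPE = 1 << 9

def is_heap_type(cls: type) -> bool:
    """True if the type object was allocated on the heap (e.g. a class statement)."""
    return bool(cls.__flags__ & Py_TPFLAGS_HEAPTYPE)

def is_immutable_type(cls: type) -> bool:
    """True if setting attributes on the type is rejected."""
    return bool(cls.__flags__ & Py_TPFLAGS_IMMUTABLETYPE)

class Foo:
    pass

for cls in (int, Foo):
    print(cls.__name__, is_heap_type(cls), is_immutable_type(cls))
```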

warm breach
#

I guess only allocating non-HEAPTYPE subclasses would be fine?

pliant tusk
#

also you will need to add in some special handling for object

#

due to some assumptions made by the type constructor

warm breach
#

what does object do 👀

pliant tusk
#

internally when the type constructor walks backwards there are assumptions made about what a given struct having all of the tp_as_* structs allocated means

#

so much research went into making fishhook as close to stable as possible

warm breach
#

!e

from fishhook import hook, orig

@hook(int)
def __new__(self, *args):
    print("new int", args)
    return orig(int, *args)

int("2000")
fallen slateBOT
#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | new int ('2000',)
002 | Traceback (most recent call last):
003 |   File "<string>", line 8, in <module>
004 |   File "<string>", line 6, in __new__
005 |   File "/snekbox/user_base/lib/python3.11/site-packages/fishhook/fishhook.py", line 246, in __call__
006 |     return get_cache_trace('orig', getframe(1))(*args, **kwargs)
007 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
008 | TypeError: int.__new__(int) is not safe, use object.__new__()
warm breach
#

is there a way to call the original `__new__` here

#

could return object.__new__(self) like it suggests but the int will be 0

pliant tusk
#

Not without rewrapping the original pointer

#

And hooking `__new__` is very unsafe for any type and will fail in weird ways if the end user changes it even a little bit so I didn't bother

pliant tusk
warm breach
#

ah hm

#

apparently int.__new__ has the job of allocating its own array

#

I guess that's why it's not safe?

pliant tusk
#

Yea

warm breach
#

but you can still call int.__new__ normally no?

#

!e print(int.__new__(int, "123"))

fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

123
pliant tusk
#

`__new__` has weird handling

#

!e print(vars(int).__new__)

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

<built-in method __new__ of type object at 0x7fabe4a87760>
grave jolt
#

Does anyone know where I can find a list of all dependencies required to build CPython? I don't want to install them, just curious

fallen slateBOT
#

`Objects/typeobject.c` lines 7152 to 7161

```c
/* If staticbase is NULL now, it is a really weird type.
   In the spirit of backwards compatibility (?), just shut up. */
if (staticbase && staticbase->tp_new != type->tp_new) {
    PyErr_Format(PyExc_TypeError,
                 "%s.__new__(%s) is not safe, use %s.__new__()",
                 type->tp_name,
                 subtype->tp_name,
                 staticbase->tp_name);
    return NULL;
}
```
warm breach
#

due to the hooking staticbase->tp_new != type->tp_new so that triggered this

grave jolt
#

it doesn't list them

#

I mean stuff like... zlib

#

but that's the only one given

warm breach
#

I'm not sure if there's an expanded list of pure deps

grave jolt
#

like curses and sqlite

warm breach
grave jolt
#

yeah I suppose

#

like... in a Python project you have a pyproject.toml or a requirements.txt listing the requirements

#

I don't know what's the analogue for C projects

warm breach
#

so it's dynamic based on a lot of logic

#

it essentially makes a best attempt at seeing if it can build cpython with what you've got

warm breach
# pliant tusk yea

have this working now

from einspect import impl, orig

@impl(int)
def __new__(cls, *args):
    print("in new:", cls, args)
    return orig(int).__new__(cls, *args) + 100

print(int("50"))
# in new: <class 'int'> ('50',)
# 150
fallen slateBOT
#

`src/einspect/structs/py_type.py` lines 231 to 232

```py
def __call__(self, *args: tuple, **kwds: dict):
    """Implements `tp_new_wrapper` with a modified safety check."""
```
pliant tusk
#

That'll do it

warm breach
#

The original check claims to check that the most derived base that's not a heap type is this type, so I don't see why it even checks that tp_new is equal

fallen slateBOT
#

`src/einspect/structs/py_type.py` line 260

```py
if staticbase and staticbase[0] != PyTypeObject.from_object(self._type):
```
warm breach
#

I've just modified it to actually do the base type check it talked about

pliant tusk
#

A lot of the type internals make a lot of assumptions

pliant tusk
fallen slateBOT
#

`src/einspect/structs/include/object_h.py` line 85

```py
newfunc = PYFUNCTYPE(py_object, py_object, py_object, py_object)
```
pliant tusk
#

Ah that's dangerous. Try raising an exception in a hooked `__new__`

warm breach
#

how impl works is unchanged

pliant tusk
#

Ah

warm breach
#

the original error happens between the slot wrapper of __new__, which performs some sanity checks before letting you call tp_new

pliant tusk
#

I'll probably use a different strategy to fix that hook. I don't want to hard code a specific dunder

pliant tusk
warm breach
# pliant tusk What happens if you hook `int.__new__` to just print then return orig and pass ...
from einspect import impl, orig

@impl(int)
def __new__(cls, *args):
    print("in new:", cls, args)
    return orig(cls).__new__(cls, *args)

print(int("invalid"))
in new: <class 'int'> ('invalid',)
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    print(int("invalid"))
          ^^^^^^^^^^^^^^
  File "main.py", line 6, in __new__
    return orig(cls).__new__(cls, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "einspect/structs/py_type.py", line 271, in __call__
    return self._tp_new(subtype, args, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'invalid'

Process finished with exit code 1
pliant tusk
#

And self._tp_new is a PYFUNCTYPE wrapper?

warm breach
#

it's PyTypeObject(int).tp_new accessed before the hook, and then cast to its own type (PYFUNCTYPE)

#

since without the cast it will still be the same pointer attached to the ctypes Structure

pliant tusk
#

Hmm

#

I thought it might fail due to the Ignored Exception ctypes thing, guess not

warm breach
# pliant tusk Hmm

when fishhook hooks a method, does orig always resolve to the original hooked type or a subtype when called?

pliant tusk
#

orig walks up the chain by one.

#

That's how nested hooks work

warm breach
#

currently this is an infinite loop, since Foo was never hooked and does not have a cached __new__, so orig(Foo).__new__ returns itself

from einspect import impl, orig

@impl(object)
def __new__(cls, *args, **kwargs):
    print("in new:", cls, args)
    return orig(cls).__new__(cls, *args, **kwargs)

class Foo:
    ...

print(Foo())
#

not sure how best to fix this

pliant tusk
#

!e ```py
from fishhook import *

@hook(int)
@hook(int)
def __add__(self, other):
    print(self, other)
    return orig(self, other)

x = 1
1 + x
```

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 1 1
002 | 1 1
pliant tusk
#

try hooking object `__getitem__` then make a new class. (AFAIK that's one trigger of the issue)

pliant tusk
warm breach
warm breach
pliant tusk
#

If I remember correctly, any hook on object will cause that issue without the patch

warm breach
#

well segfaults if I interrupt

#

otherwise keeps going

pliant tusk
#

Try like `__matmul__`

#

That should also trigger it

warm breach
pliant tusk
#

Yea

warm breach
#

don't actually have to set the attr

pliant tusk
#

That's the same issue I ran into with fishhook

warm breach
#

does your patch fix that?

pliant tusk
#

Yea

#

It patches object and inserts a fake class at the top of the inheritance chain

warm breach
#

do you know where in the source code this happens

#

I assume something loops until the base type and then checks whether pymethods are null?

pliant tusk
#

I have it written down somewhere, I'll look for it when I get back on my laptop

pliant tusk
warm breach
#

how does the pymethod alloc affect that

pliant tusk
#

Once it finds that class it has some assumptions about it

#

I need to find the source for it, I'll ping when I find my notes

grave jolt
warm breach
#

I'm currently trying to somehow get CLion's debugger to recognize cpython as a project

#

since it doesn't use make on windows...?

warm breach
#

just 26k lines, nothing to see here

grave jolt
#

what a terrible day to have brain

pliant tusk
#

@warm breach this is where the bug happens

#

It gets base->tp_base

#

On object that is NULL

warm breach
#

oh

#

because tp_as_number is defined

pliant tusk
#

Yea

warm breach
#

oh actually theres one for everything

pliant tusk
#

So my patch adds a base for object that has no tp_as_ structs defined

flat gazelle
warm breach
#

how bad would setting object->tp_base to itself be 👀

pliant tusk
#

Recursion bug

#

I tried that lol

pliant tusk
#

But you really should try to build yourself, otherwise it will not link up to the files since it won't have debug symbols

warm breach
#

I've attached to the executable I built but it drops into disassembly instead of source :(

pliant tusk
#

Yea you need to be running a version of python built with debug symbols

warm breach
#

I think I built it with -d?

#

clion still says project not configured

pliant tusk
#

./configure with no args already builds with debug symbols (-g); add --with-pydebug for a full debug build. Then make -j4 will build using 4 parallel jobs

pliant tusk
warm breach
pliant tusk
#

And then running the python.exe that builds?

warm breach
#

yeah

fallen slateBOT
#

`PCbuild/readme.txt` line 38

```
using this configuration have "_d" added to their name:
```
warm breach
#

yeah

pliant tusk
#

Hmm

#

And you have the cpython project open in clion?

warm breach
#

if I do a ctypes.string_at(0) it drops into assembly

pliant tusk
#

Can you step up frames to get to c files?

#

Tbh I have always used my MacBook for debugging builds so I haven't ever dealt with windows weirdness

#

Is it possible that clion cannot parse the debug symbols in the windows build?

warm breach
#

maybe I should use vs

#

apparently they have some combined python / c debug feature

pliant tusk
#

Woah that looks like I should start using VS

warm breach
#

this is really cool

pliant tusk
#

visual studio or visual studio code?

warm breach
#

can be anywhere as well

#

if I make a breakpoint in PyList_New and make a list literal

#

it can break into that

pliant tusk
#

oh nice

#

thats what my clion workflow is like

deep nova
#

Could someone explain to me, in human terms, how Python handles "logical" and "physical" lines?

#

Like, does python perform a split operation at newlines when it starts lexing, and joins lines according to rules as they are lexed?

#

Or, does it separate as it lexes? Or, is the distinction between a logical line and physical line simply an explanatory tool?

feral island
#

the tokenizer generates INDENT and DEDENT tokens

rose schooner
#

the python tokenizer returns a NEWLINE token representing a single \n

raven ridge
#

seems like just an explanatory tool to me. At the grammar level, there's just statements. Sometimes a newline ends a statement, sometimes it doesn't.

feral island
#

the tokenizer also keeps track of bracket nesting, so within brackets newline tokens aren't treated as statement separators
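The points above can be seen with the standard library's `tokenize` module. This is a sketch using a made-up four-line snippet; `tokenize` reports the newline inside the parentheses as `NL` rather than `NEWLINE`.

```python
import io
import tokenize

source = (
    "if x:\n"
    "    y = (1,\n"
    "         2)\n"
    "z = 3\n"
)

# Collect (token name, text) pairs for the whole snippet.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
for name, text in tokens:
    print(f"{name:10} {text!r}")
```

Only `NEWLINE` ends a logical line. The line break after `1,` shows up as `NL`, and the block structure is carried by one `INDENT` and one `DEDENT`.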

deep nova
#

XD I've read through the parts on physical/logical lines many times, and honestly, it just confuses me more and more as I learn more about the job of a lexer

#

If it was me, I'd strip it from the docs and instead include a section on python's INDENT and DEDENT handling machinery

rose schooner
deep nova
#

Is there any reason that Python offloads this work to the lexer? It seems like a job for the parser, or even the semantics machine

#

One of those "it seemed like a good idea at the time" type things, perhaps?

feral island
#

it's probably a lot easier this way. I'm not sure what you even mean by "the semantics machine" here

deep nova
#

Semantic Analyzer

#

But I have trouble spelling "Analyzer"

feral island
#

sure, but we're talking about determining how to split the program into statements, right?

deep nova
#

Well, at the level of lexing we aren't even doing that

#

We're just trying to figure out where one thing ends and the next starts, whatever those things might be

raven ridge
#

at the level of lexing, we're breaking down the input into token streams which can be fed to the parser. That token stream needs to include indent somehow: leading whitespace is semantically significant in Python, so if the token stream didn't include it, the parser wouldn't have enough information to operate

#

every syntactically significant feature of the language needs to somehow be preserved by the lexer

deep nova
#

True, but actually counting and enforcing indentation in the lexer is quite different from recognizing leading whitespace. The latter just means recognizing a newline followed by some number of whitespace or tab characters. You could then poop out a "linebreak" token

raven ridge
# deep nova True, but actually counting and enforcing indentation in the lexer is quite diff...

it's not enough to just spit out a "linebreak" token, you also need to spit out a token that indicates how indented the next line is.
```py
def func():
    if something:
        foo()
        bar()
```
means something different than
```py
def func():
    if something:
        foo()
    bar()
```

So either you need to emit tokens for all whitespace, or at least for leading whitespace - and the token for whitespace would need to indicate what whitespace it was, in order for the parser to be able to detect increases or decreases in depth, as well as errors resulting from mismatched whitespace (like starting one line with 8 spaces and the next line with a tab)
deep nova
#

Seems doable.

#

This isn't me dumping on Python's strategy, btw

#

I'm just trying to think it all through

#

I'm conflicted about adding such functionality to my own lexer. On the one hand, the python approach works quite well, is simple, and is intuitive. On the other, I'm hesitant to employ a machine whose job is to recognize but not interpret for the purposes of interpreting

feral island
#

my intuition is that Python's approach is quite a bit simpler than what you are suggesting. I'm sure other approaches can be made to work though.

#

in your approach the parser would need to deal with whitespace virtually everywhere in expressions

deep nova
#

Not necessarily

#

On recognizing a newline, the lexer could be triggered to consume whitespace and tab characters, emitting a token for each until a non-whitespace character is encountered

#

Or, a composite token containing both the newline and the leading whitespace could be generated. At a later step, the semantic analyzer could simply look at each newline token's literal to count what's there

#

I'm also toying around with lazy lexing such that the lexer accepts contextual cues from the parser. I'm hoping it'll make life easier for parsing fstrings. I could always leverage that to toggle on/off whitespacing behaviour

flat gazelle
#

Be careful that if you do that, you now don't have a context free grammar. It is generally easier to slightly extend the Lexer to deal with indentation over the (much more complex) parser

deep nova
#

Well, its 6-of-1

#

Half a dozen of the other, if that's true. I can either add state to my lexer, at which point the language it represents is no longer regular, or otherwise sully the context-freedom of the parser

#

Though, I'm curious — why would this make the parser not context free?

#

Oh — because instead of a purely functional flowing from state to state within the grammar/parser, I'm making choices based on the contents of the token

flat gazelle
#

Because you need to remember the indentation state, and that's not possible in CFG. You can't somehow "interleave" the various indentation levels with the actual grammar, nor can you pass the indentation to e.g. the pass statement.
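The indentation state being remembered is usually just a stack of widths kept by the lexer. A minimal sketch of that idea follows. `indent_tokens` is a hypothetical helper, it only understands spaces (no tabs, brackets, or continuation lines), and it is not how CPython's tokenizer is actually written.

```python
def indent_tokens(lines):
    """Yield ("INDENT" | "DEDENT" | "LINE", text) pairs from physical lines."""
    stack = [0]  # indentation widths of the currently open blocks
    for line in lines:
        body = line.lstrip(" ")
        if not body.strip():
            continue  # blank lines never change the indentation level
        width = len(line) - len(body)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:
            raise IndentationError(
                "unindent does not match any outer indentation level"
            )
        yield ("LINE", body.rstrip("\n"))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")

for kind, text in indent_tokens(["a:\n", "    b\n", "c\n"]):
    print(kind, repr(text))
```

With the stack living in the lexer, the parser only sees `INDENT` and `DEDENT` as ordinary bracket-like tokens, which keeps the grammar itself context free.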

deep nova
#

Hmmm

#

I mean, my instincts tell me that the right place to handle indentation rules is actually in the semantic analysis. That's a contextual domain anyway.

That said, I'm just spitballing. Python's approach seems quite effective

flat gazelle
#

If you have context, it should be possible afaik, but I never tried.

#

Semantic analysis generally happens after parsing

#

So then the question is what good is your parsing if it can't even tell you what's in a while loop

#

You could parse each statement separately and reconstruct blocks afterwards probably. Which would be interesting

deep nova
#

I'm honestly not sure its even that complicated. Specific parser functions can be configured to automatically ignore whitespace tokens, others can be configured to accept it (I'm thinking decorators, here)

#

If anything, having a more detailed token stream would make accurate parsing even easier.

#

Don't listen to me though, I'm still very, very new at this. All I can say is that I intend to experiment the hell out of this

flat gazelle
#

It probably isn't all things considered. You will give up parser generators, but hand rolling a parser isn't too bad.

deep nova
#

Give up parser generators?

flat gazelle
#

A parser generator is unlikely to cope with your switching between whitespace and non-white space tokens

gritty glacier
#

how exactly does the _Printer class in cpython work
So I want to overwrite or replace the data and filenames attributes in the _Printer class in CPython, but when I try to change them nothing happens. Am I doing it wrong?
license._Printer__filenames = ["/test"] should print the contents of the test file when license is called, right? https://github.com/python/cpython/blob/main/Lib/_sitebuiltins.py

gritty glacier
#

oh

dusk comet
#

license.__init__(...)

dusk comet
gritty glacier
dusk comet
#

Hmm, indeed

#

Are you sure that your files exists?

gritty glacier
#

yes I'm pretty sure, I mean you can try it too in a Python interpreter xD it's nothing serious, just wondering why it's not working

#

I initially assumed it's because of the OSError

#

it just passes if it hits it

#

but it exists

#

the file

#

okay nvm mb

lone sun
# deep nova I mean, my instincts tell me that the right place to handle indentation rules is...

I think your instincts are misleading you on this one. The indentation rules have a semantic effect, but so does everything else in the language; it does not follow that everything should be done during the semantic analysis. I think a good rule to follow is: What do you want to do when you encounter invalid input? Suppose that someone asks you to parse:

def factorial(x):
    if x == 0:
        return 1
     else:
        return x * factorial(x-1)

Notice the incorrect indentation of the else. When the interpreter is given this input, it fails with an IndentationError immediately upon encountering the line with the else. (You can observe this by cut-and-pasting into the REPL.) It doesn't wait for the function to be completely defined.
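The same failure can be reproduced without the REPL by handing the text to `compile()`. This is a sketch; the `<example>` filename is arbitrary.

```python
source = (
    "def factorial(x):\n"
    "    if x == 0:\n"
    "        return 1\n"
    "     else:\n"  # five spaces: matches no enclosing block
    "        return x * factorial(x-1)\n"
)

error = None
try:
    compile(source, "<example>", "exec")
except IndentationError as exc:
    error = exc
    print(type(exc).__name__, "at line", exc.lineno)
    print(exc.msg)
```

The error is reported against the `else` line itself, before anything after it is considered.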

#

I think you might appreciate Python's strategy more if you look at it as a two-pass grammar. The first pass is called "lexical analysis", but while lexical analysis is conventionally described using regular expressions and finite automata, in Python it is a much more powerful step. It's a fairly simple grammar (as these things go) but it needs to remember the amount and type of indentation from line to line (and also, as discussed earlier, lexing behaves differently depending on whether we're inside an f-string). The output of Python's lexer (like all lexers) is a stream of tokens. Normally we think of this stream as being immediately consumed by the next grammar (the one conventionally called "the grammar"), but if you wanted, it could be serialized, written to a file, and read back in at a later time.

craggy ravine
#

Hi all, I had a few questions regarding the internals of the Python shell (REPL). Is there a good article, book etc. I can consult? My questions are based around compilation in the shell.

  1. What does the Python shell (REPL) use under the hood for compilation (parsing, generating byte code)?
  2. Is byte code even generated or are expressions + other items evaluated from the AST?
  3. How is context maintained? As in, if I declare a = 42 and then in the next command say b = a + 1, how does the shell ensure that the definition of a is available to b?

Additionally, I would also really appreciate a link to shell code in the cpython repository?

Thank you!
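A rough sketch related to questions 1 to 3, not the REPL's actual source: as far as I know each input is compiled to a code object (bytecode) and executed against one long-lived namespace, which is what the built-ins `compile()` and `exec()` do below. The stdlib `code` module (`code.InteractiveConsole`) wraps the same idea.

```python
# One dictionary plays the role of the interactive session's globals.
namespace = {}

for line in ["a = 42", "b = a + 1"]:
    # "single" is the compile mode meant for one interactive statement.
    code_obj = compile(line, "<stdin>", "single")
    print(type(code_obj).__name__, len(code_obj.co_code), "bytes of bytecode")
    exec(code_obj, namespace)

print(namespace["b"])
```

Because both statements run against the same dictionary, the binding for `a` is still there when the second statement looks it up.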

uneven herald
#

I have a question about the general design of python:

why did Python decide to use dunder methods like `__len__`, where other languages like Java preferred a .length() convention, for example? is there any historical/pedagogical/other evidence that motivated the choice?

warm breach
#

like iter() doesn't necessarily require an __iter__ dunder, it also has alternative forms of working like using __getitem__
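A small illustration of that fallback. `Squares` is a made-up class with no `__iter__` at all.

```python
class Squares:
    """Sequence-like object that only defines __getitem__."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # signals the end of the sequence
        return index * index

# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
squares = list(iter(Squares()))
print(squares)
```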

#

and some methods enforce their own rules, like hash() cannot ever return -1

#

!e

class Foo:
    def __hash__(self):
        return -1

f = Foo()
print(f.__hash__())
print(hash(f))
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | -1
002 | -2
warm breach
#

also having all of these be dunders means you can name your own methods more freely without clashes

#

they also denote the fundamental difference in how they are stored and accessed,

class Foo:
    def len(self):
        ...
    def __len__(self):
        ...
#

len is stored in Foo's dictionary, while __len__ is stored in the basic Foo type's slots

uneven herald
#

ok thx! the two latter were really what I was looking for

grave jolt
#

oh, yeah I see what you mean

#

If you do e.g. foo.__len__ = my_len it won't work
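A short sketch of why that assignment has no effect on `len()`. The names `Box` and `box` are illustrative only. The built-in looks the method up on the type, not on the instance.

```py
class Box:
    def __len__(self):
        return 3


box = Box()
box.__len__ = lambda: 99        # lands in the instance __dict__ only

via_builtin = len(box)          # uses type(box).__len__, ignores the instance attribute
via_attribute = box.__len__()   # ordinary attribute lookup finds the instance attribute

print(via_builtin, via_attribute)  # 3 99
```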

warm breach
grave jolt
#

whereas + itself cannot return NotImplemented

#

(right?)

warm breach
deep nova
#

You know, it occurs to me I've never actually seen Python's lexer

#

I've seen a few 3rd party ones, but never the one actually used by the interpreter

uneven herald
uneven herald
#

basically, the built-in operator is allowed to perform better control than what the underlying "implementation" is doing, for better safety and/or smarter decision ; smth that is not really possible when using idioms like .length() or .add(...)

#

hmmmmm
although, on my examples, + raised a NotImplementedError

warm breach
#

you're supposed to return NotImplemented

uneven herald
warm breach
#

not NotImplementedError

uneven herald
#

ig I miss something obvious

#

hm, debugger goes through the method though...

warm breach
raven ridge
# uneven herald ig I miss something obvious

a + b calls type(a).__add__(a, b) first. If that doesn't return NotImplemented, it evaluates to that value, otherwise it falls back to type(b).__radd__(b, a). If that returns NotImplemented an exception is raised, otherwise the addition evaluates to that value.

#

Consider subtraction: when it falls back to the second argument's type, that method needs to know that it was the second argument and not the first, because subtraction isn't commutative
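The dispatch described above can be sketched with a toy class (the name `Metres` is invented for this example). It shows why the reflected method has to be a separate one: `__rsub__` knows it is the right-hand operand.

```py
class Metres:
    def __init__(self, value):
        self.value = value

    def __sub__(self, other):
        if isinstance(other, Metres):
            return Metres(self.value - other.value)
        if isinstance(other, (int, float)):
            return Metres(self.value - other)
        return NotImplemented  # a value, not an exception: lets Python try the other operand

    def __rsub__(self, other):
        # Reached for `other - self` once type(other).__sub__ returned NotImplemented.
        if isinstance(other, (int, float)):
            return Metres(other - self.value)
        return NotImplemented


a = (Metres(10) - 3).value   # Metres.__sub__ handles it
b = (3 - Metres(10)).value   # int.__sub__ gives up, Metres.__rsub__ handles it

try:
    Metres(1) - "x"          # both sides give up, so the operator raises
except TypeError as exc:
    error = exc

print(a, b, type(error).__name__)  # 7 -7 TypeError
```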

uneven herald
uneven herald
raven ridge
#

Right, yep

uneven herald
#

thx all! learned smth quite neat today! 😄

slim drum
#

sorry for posting here but i am not really getting help in the help channel, can anyone here tell me what my mistake is? im a beginner

rose schooner
# craggy ravine Hi all, I had a few questions regarding the internals of the Python shell (REPL)...
  1. python REPL uses the same parser and compiler as python files, eval(), and exec(), but with a different mode from those types of running python
  2. yes, bytecode is generated
  3. those are globals, so they're stored in globals() which is available for each python session
    https://github.com/python/cpython/blob/main/Modules/main.c
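A rough sketch of the three answers above, not the actual REPL implementation: each input is compiled to bytecode in `"single"` mode and executed against one shared globals dictionary. The helper name `run_line` and the filename `"<sketch>"` are made up here; the standard library's `code.InteractiveConsole` provides a fuller version of the same idea.

```py
import contextlib
import io

session_globals = {"__name__": "__console__"}  # one namespace shared by every input


def run_line(line):
    # "single" mode compiles one interactive statement; the value of an
    # expression statement is handed to sys.displayhook, which prints its repr.
    code = compile(line, "<sketch>", "single")
    exec(code, session_globals)


run_line("a = 42")
run_line("b = a + 1")        # `a` is found in session_globals

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_line("b")            # echoed the way the REPL would echo it
echoed = buffer.getvalue()
print(echoed, end="")
```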

prime estuary
warm breach
#

is it safe to resurrect an object inside of its weakref.finalize?

#

!e

import weakref
from einspect.structs import PyObject

class Foo:
    pass

def main():
    f = Foo()
    weakref.finalize(f, lambda obj: print(obj.ob_refcnt), PyObject.from_object(f))

main()
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

0
warm breach
#

it seems the instance object has ob_refcnt of 0 when finalize runs

#

is Py_IncRef in general safe to use when ob_refcnt is 0?

raven ridge
warm breach
raven ridge
#

it's called directly from Py_DECREF - I think tp_dealloc calls the weakref finalizer, but I haven't spotted exactly how yet.

warm breach
#

how come resurrection is safe in __del__ though pithink

#

I thought it was around the same priority as weakref.finalize

raven ridge
#

it's impossible for a weakref finalizer to get a reference to the object to resurrect it, unless you're doing memory-unsafe stuff

raven ridge
#

Python code can't. you need C code or an FFI to do that.
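A small sketch of that point using a plain `weakref.ref` callback. The timing shown relies on CPython's reference counting; other implementations may run the callback later.

```py
import weakref


class Foo:
    pass


seen = []


def callback(ref):
    # By the time the callback runs the weak reference is already dead,
    # so calling it gives None rather than the object.
    seen.append(ref())


f = Foo()
r = weakref.ref(f, callback)
del f   # in CPython the refcount reaches zero here and the callback fires

print(seen)  # [None]
```

`weakref.finalize` is built on the same mechanism, which is why the examples in this thread had to smuggle in a raw pointer through einspect to reach the object at all.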

warm breach
#

if weakref.finalize actually runs before garbage collection theoretically that would be okay...?

fallen slateBOT
#

Modules/gc_weakref.txt lines 21 to 29

OTOH, it's OK to run Python-level code that can't access unreachable
objects, and sometimes that's necessary.  The chief example is the callback
attached to a reachable weakref W to an unreachable object O.  Since O is
going away, and W is still alive, the callback must be invoked.  Because W
is still alive, everything reachable from its callback is also reachable,
so it's also safe to invoke the callback (although that's trickier than it
sounds, since other reachable weakrefs to other unreachable objects may
still exist, and be accessible to the callback -- there are lots of painful
details like this covered in the rest of this file).
raven ridge
fallen slateBOT
#

Modules/gc_weakref.txt lines 7 to 8

Once gc has computed the set of unreachable objects, no Python-level
code can be allowed to access an unreachable object.
raven ridge
#

so, no - I think the GC's code assumes that the object cannot be resurrected by a weakref finalizer... probably

warm breach
#

!e

import weakref
from einspect.structs import PyObject

class Foo(list):
    pass

def on_del(obj: PyObject):
    print("refcount in on_del:", obj.ob_refcnt)
    # Resurrect the object
    obj.IncRef()
    obj.IncRef()
    print(obj.into_object())

f = Foo([1, 2, 3])
obj = PyObject.from_object(f)
weakref.finalize(f, on_del, obj)

del f
print(obj.into_object())
fallen slateBOT
#

@warm breach :x: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

001 | refcount in on_del: 0
002 | [1, 2, 3]
fallen slateBOT
#

Modules/gc_weakref.txt lines 133 to 138

[In 2.4/2.3.5, we first clear all weakrefs to CT objects, whether or not
 those weakrefs are themselves CT, and whether or not they have callbacks.
 The callbacks (if any) on non-CT weakrefs (if any) are invoked later,
 after all weakrefs-to-CT have been cleared.  The callbacks (if any) on CT
 weakrefs (if any) are never invoked, for the excruciating reasons
 explained here.]
warm breach
raven ridge
fallen slateBOT
#

Modules/gc_weakref.txt lines 148 to 156

So, to prevent any Python code from running while gc is invoking tp_clear()
on all the objects in cyclic trash,

[That was always wrong:  we can't stop Python code from running when gc
 is breaking cycles.  If an object with a __del__ method is not itself in
 a cycle, but is reachable only from CT, then breaking cycles will, as a
 matter of course, drop the refcount on that object to 0, and its __del__
 will run right then.  What we can and must stop is running any Python
 code that could access CT.]
warm breach
# raven ridge that seems to be saying that weakref callbacks (and presumably by extension weak...

so essentially I'm trying to attach a weakref.finalize to the __hash__ function object here, and then DecRefing it from 2 references to 1, (since originally it gained 1 reference from int owning it), and within finalize, the original int.__hash__ function is restored.

from einspect import impl

@impl(int)
def __hash__(self):
    return 128

print(hash(10))  # 128

del __hash__

print(hash(10))  # 10

The fact that I DecRef the reference owned by int is obviously unsafe, but I'm thinking that in practice, since as soon as __hash__ is GC'd I immediately replace the garbage int.__hash__ reference with a valid one, nothing should access garbage objects...?

#

though uh, currently int's PyObject_SetAttr tries to access the original attribute to DecRef it and segfaults.

#

so I'm starting to think this whole thing is not possible to do safely after all? 😔

#

yeah this seems like a crazy rabbit hole I went down, it was never going to work from that first DecRef

raven ridge
#

you mean every part of einspect that involves writing to arbitrary objects? I agree 😛

#

it doesn't even seem like those are the semantics you would want, though, honestly

#

making the behavior change based on the last reference to __hash__ being dropped just seems weird. Why not make it a context manager?

warm breach
#

looks kind of weird I guess

from einspect import impl

with impl(int) as ctx:
    @ctx
    def __hash__(self):
        return 128

    print(hash(10))  # 128

print(hash(10))  # 10
rose schooner
warm breach
#

I dunno, how else would the context manager work

rose schooner
#

i thought of automatically detecting assignments for some reason

warm breach
#

also the finalizer idea also breaks with @property decorated functions

#

since property objects are not weak-refable for some reason

raven ridge
warm breach
#

hm yeah that looks better at least pithink

feral island
#

or just with einspect.patch(int, "__hash__", int_hash):?
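A sketch of the context-manager shape suggested here, written for ordinary Python classes only. `patched` is a hypothetical helper, not an einspect API; built-in types such as `int` reject `setattr`, which is the part einspect works around. For regular classes the standard library's `unittest.mock.patch.object` does a similar job.

```py
from contextlib import contextmanager


@contextmanager
def patched(cls, name, replacement):
    """Hypothetical helper: temporarily replace cls.<name>, then restore it."""
    missing = object()
    original = cls.__dict__.get(name, missing)   # raw entry, descriptors included
    setattr(cls, name, replacement)
    try:
        yield
    finally:
        if original is missing:
            delattr(cls, name)                   # the attribute was inherited before
        else:
            setattr(cls, name, original)


class Point:
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return hash(self.x)


p = Point(10)
before = hash(p)
with patched(Point, "__hash__", lambda self: 128):
    inside = hash(p)
after = hash(p)

print(before, inside, after)  # 10 128 10
```

Restoring in `finally` means the original method comes back even if the body raises, which a finalizer tied to the function's lifetime cannot promise.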

warm breach
#

mainly I wanted the finalizer to restore the methods before interpreter shutdown though, since there are some internal calls that happen after python frames are gone

#

!e

from einspect import impl

@impl(int)
def __hash__(self):
    return 128
fallen slateBOT
#

@warm breach :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

[No output]
raven ridge
warm breach
#

like something in shutdown relies on PyLong's hash apparently

raven ridge
#

You realize that changing int's hash breaks every pre-existing dict with int keys, right?

warm breach
fallen slateBOT
#

@warm breach :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

[No output]
warm breach
#

an internal call goes to that when python frames are already gone so I assume the call accesses garbage memory

raven ridge
#

🤷‍♂️ I'm just pointing out another reason why doing what you're trying to do is fundamentally unreasonable

warm breach
#

alright so I have it not mess with ref counts now and just leave a plain weakref finalize on the function with detach=True

from einspect import impl, orig

@impl(int, detach=True)
def __hash__(self):
    print("in hash:", self)
    return orig(int).__hash__(self)
#

when the finalizer is called (at some point in interpreter shutdown) the user defined __hash__ function still has ref-count of 1, so it seems relatively safe?

warm breach
#

!e

import sys

class SomeClass:
    pass

print(sys.getrefcount(SomeClass))
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

5
warm breach
#

where do classes get 4 references from?

peak spoke
#

from get_referrers it looks like the MRO, __dict__ mappingproxy, __weakref__ descriptor and the globals

#

though I'm not sure why dict would have a reference?

warm breach
#

!e

from weakref import WeakKeyDictionary

class Foo:
    pass

d = WeakKeyDictionary()
d[Foo] = 1

del Foo

print(dict(d))
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

{<class '__main__.Foo'>: 1}
warm breach
#

currently deling a module level class doesn't remove it from weakref dicts as well, implying it's not GC'd at that stage

#

which is a bit strange

flat gazelle
#

!e

class X: pass
del X
print(object.__subclasses__()[-1])
fallen slateBOT
#

@flat gazelle :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class '__main__.X'>
flat gazelle
#

__subclasses__() keeps it forever-ish

feral island
#

!e

import gc
class X: pass
del X
gc.collect()
print(object.__subclasses__()[-1])

fallen slateBOT
#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class 'abc.ABC'>
feral island
#

__subclasses__ is a weakref. I think it doesn't get GCed because there's a cycle somewhere

flat gazelle
#

huh, interesting

flat gazelle
feral island
#

!e

import gc
class X: pass
print(gc.get_referents(X))

fallen slateBOT
#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

[{'__module__': '__main__', '__dict__': <attribute '__dict__' of 'X' objects>, '__weakref__': <attribute '__weakref__' of 'X' objects>, '__doc__': None}, (<class '__main__.X'>, <class 'object'>), (<class 'object'>,), <class 'object'>]
feral island
#

oops need the other one

#

!e

import gc
class X: pass
print(gc.get_referrers(X))

fallen slateBOT
#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

[{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'gc': <module 'gc' (built-in)>, 'X': <class '__main__.X'>}, (<class '__main__.X'>, <class 'object'>), <attribute '__dict__' of 'X' objects>, <attribute '__weakref__' of 'X' objects>]
feral island
#

I guess there's a cycle between X and X.__dict__

pliant tusk
#

!e ```py
class X: pass

print(vars(X)['__dict__'].__objclass__ is X)
print(X.__weakref__.__objclass__ is X)
``` these are the two cycles I believe

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | True
002 | True
pliant tusk
#

oh and X.__mro__
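Putting the pieces above together, a sketch that watches a class through a weak reference. It assumes CPython and that no automatic collection happens to run between the two checks.

```py
import gc
import weakref


class X:
    pass


ref = weakref.ref(X)
del X                                    # removes the module-level name only
alive_after_del = ref() is not None      # the reference cycles keep the class alive

gc.collect()                             # the cycle collector frees it
alive_after_collect = ref() is not None

print(alive_after_del, alive_after_collect)  # True False
```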

warm breach
feral island
warm breach
#

ah I see

#

without that there's no guarantee when it'll happen?

pliant tusk
deep nova
#

Does python's lexer watch for mismatched parentheses, or is that done in the parser?

lunar harbor
#

typically that is handled by the parser as the lexer just determines tokens. but I don't know about python in particular.

deep nova
#

Woot

lunar harbor
#

so pretty typical 🙂

deep nova
#

Follow up question

#

Matching parentheses is important for indentation enforcement. If you're in parens then you ignore indentation, right? How, then, can a lexer which isn't tracking parens intelligently recognize when and when not to throw an indentation error

feral island
#

though:

#

!e

)

fallen slateBOT
#

@feral island :x: Your 3.11 eval job has completed with return code 1.

001 |   File "<string>", line 1
002 |     )
003 |     ^
004 | SyntaxError: unmatched ')'
feral island
#

this kind of sounds like it comes from the lexer

#

yes Parser/tokenizer.c: return syntaxerror(tok, "unmatched '%c'", c);

deep nova
#

Awesome!

feral island
#

!e (

fallen slateBOT
#

@feral island :x: Your 3.11 eval job has completed with return code 1.

001 |   File "<string>", line 1
002 |     (
003 |     ^
004 | SyntaxError: '(' was never closed
feral island
#

that one comes from the parser though Parser/pegen_errors.c: "'%c' was never closed",

deep nova
#

The salient thing I needed know has, I think, been answered for me

#

If I intend to emulate Python's enforcement of indentation while lexing, I'll also need to track parentheses

#

In doing this, I may or may not also enforce parentheses matching
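A quick way to see that behaviour is the standard-library `tokenize` module, which follows the same rule as the interpreter's tokenizer here. `token_names` is a small helper defined only for this sketch.

```py
import io
import tokenize


def token_names(source):
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[t.type] for t in tokenize.generate_tokens(readline)]


# Same eight leading spaces on the second line in both inputs.
inside_parens = token_names("x = (1,\n        2)\n")
block = token_names("if x:\n        y\n")

print("INDENT" in inside_parens)  # False: bracket depth > 0, indentation ignored
print("INDENT" in block)          # True
```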

rich cradle
#

what you want here is probably to construct a token tree

#

and indentation can be put into that tree as a kind of delimiter

deep nova
#

O.O

#

A token tree?

#

Does the lexer recognize a \\ as a token unto itself, or does it recognize \\\n?

#

Actually — how do the lexing and parsing phases handle explicitly escaped newlines in general?

rich cradle
#

they're in strings, so they don't, at least ime

feral island
#

pretty sure escaping newlines would be handled in the lexer. I'd expect the lexer to simply not emit a token in that case, just like other whitespace within a statement

deep nova
#

Oh, not emitting a token at all is smart

#

So, what, consume the backslash, the newline, and any leading tabs or spaces, all without emitting anything?

feral island
#

yes, that's what I'd expect to happen
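A sketch that checks this expectation against the standard-library `tokenize` module (not the C tokenizer itself, though the two agree on this point as far as I know).

```py
import io
import tokenize

source = "x = 1 + \\\n        2\n"   # a backslash, a newline, then leading spaces

readline = io.StringIO(source).readline
names = [tokenize.tok_name[t.type] for t in tokenize.generate_tokens(readline)]

# The continuation produces no token of its own: there is a single NEWLINE
# for the whole logical line, and the leading spaces do not become an INDENT.
print(names)
```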

deep nova
#

As well, throw an error if anything else follows?

deep nova
#
def newline(self) -> Generator[Token, None, None]:

    yield self.token(Tokentype.NEWLINE)

    indentation = len(self.star(self.generic('\t')))  # count leading tabs

    # self.indentation is a stack of open depths; compare against its top
    while indentation > self.indentation[-1]:

        self.indentation.append(len(self.indentation))
        yield self.token(Tokentype.INDENT)

    if indentation not in self.indentation:
        raise SyntaxError('dedentation to inconsistent depth')

    while indentation < self.indentation[-1]:

        self.indentation.pop()
        yield self.token(Tokentype.DEDENT)

#

Does this about summarize it?

rich cradle
rose schooner
deep nova
#

?

rose schooner
#

so it's maybe correct

#

or equivalent

#

but there's more complicated stuff the lexer does

#

like how it detects inconsistent spacing

#
def a(x):
    if x & 1:
        print('x is odd')
        return
    \
                                 \
print('x is even')
this doesn't result in an IndentationError; rather, it does what you'd expect if you did
def a(x):
    if x & 1:
        print('x is odd')
        return
    print('x is even')
#

so yeah that's also something to handle

deep nova
#

For the time being, I'm going to prohibit leading spaces

#

Only tabs

#

That said, what "more complicated stuff" are we talking about?

feral island
#

!e

if 1:
    2
  3

fallen slateBOT
#

@feral island :x: Your 3.11 eval job has completed with return code 1.

001 |   File "<string>", line 3
002 |     3
003 |      ^
004 | IndentationError: unindent does not match any outer indentation level
deep nova
#

I'm pretty sure that's covered in my solution

#
    if indentation not in self.indentation:
        raise SyntaxError('dedentation to inconsistent depth')
rose schooner
deep nova
#

HA

#

I did not know that

rose schooner
#

it's 100 so it should be fairly impossible to reach

feral island
#

!e exec("\n".join(" " * i + "if 1:" for i in range(102)) + " " * 102 + "pass")

fallen slateBOT
#

@feral island :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 1, in <module>
003 |   File "<string>", line 101
004 |     if 1:
005 | IndentationError: too many levels of indentation
feral island
#

I did not know about this

rose schooner
#

so there's 2 numbers in the lexer to keep track of spaces and tabs in indentation

#

but since we're only dealing with tabs rn we can just keep it to one

rose schooner
deep nova
#

By "handling"

#

Do you mean that no indents/dedents should be created for empty lines?

#

No tokens at all?

deep nova
#

Let's say I'm parsing my string so I can handle the escaped characters

#

What do I do if I encounter \u123?

#

Or rather, if I fail to consume exactly 4 or 8 hex digits?

#

Error? I'm considering allowing \u to be \u0000 implicitly, so I could always spit that out and then let the numbers just be raw text

#

That might be a bit confusing for the user though
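For comparison, Python itself treats a short `\u` or `\U` escape as an error rather than padding it. A small sketch; the helper name `compiles` and the filename `"<sketch>"` are invented for this example.

```py
def compiles(literal_source):
    """Return True if the text is accepted as a Python expression."""
    try:
        compile(literal_source, "<sketch>", "eval")
    except SyntaxError:
        return False
    return True


ok_four = compiles(r'"\u0041"')         # exactly 4 hex digits
bad_three = compiles(r'"\u123"')        # truncated \uXXXX escape
bad_seven = compiles(r'"\U0001F60"')    # truncated \UXXXXXXXX escape

print(ok_four, bad_three, bad_seven)  # True False False
```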

rose schooner
#

newlines aren't generated for empty lines

#

like totally empty lines with nothing in them

#

but they are generated for lines with tabs, comments, or both

#

despite being ignored

deep nova
#

Are they completely ignored? If so, why generate them?

rose schooner
deep nova
#

What is a repl?

rose schooner
#

wait nvm

deep nova
#

XD I've heard the term, but never stopped to ask

rose schooner
#

i understood the if statements wrong

rose schooner
#

it's what this is

deep nova
#

Python has an array module!??!??!?!?!?!?!?!?!!?!!!!?!?!?!?!

rose schooner
#

so blank lines generate a NEWLINE token
lines with only spaces/tabs/comments don't

deep nova
#

Lines with only spaces/tabs/comments don't create anything, but completely empty lines produce newlines tokens without producing any indents/dedents?

#

Seems odd

#

But okay

#

It sounds to me as though, if I were to emulate this, the first thing I'd do on encountering a newline is determine if its empty

#

So...
Consume newline
Consume whitespace
Consume comment
Consume another newline

#

If all of these succeed, and nothing else is encountered, the line is empty?

rose schooner
#

unless the line before was also whitespace/comments only

deep nova
#

Hmmm

rose schooner
#

so basically

Consume whitespace
Consume? comment
Consume newline
would generate nothing
#

? means its optional

deep nova
#

Actually, wouldn't it be...

#
Consume? comment
Observe? newline  <-- if observed, create newline. Don't consume, such
                      that the above can be performed for that next newline
rose schooner
#

ok so i think i get it

#

it only matters in interactive mode/the REPL

#

otherwise it's ignored

rose schooner
#

whereas if there is stuff in the line:
Consume? whitespace
Process? indentation
Consume and generate tokens
Consume newline
Generate NEWLINE

deep nova
#

Ohhhhh man

#

I shouldn't have drunk all that coffee O.o

deep nova
#

So, I'm trying to think my way through this

#

I don't see any reason that an empty line, whether it contains nothing, whitespace, a comment, or whitespace and a comment, should produce any kind of token at all
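As a reference point, here is what the standard-library `tokenize` module emits for blank and comment-only lines. It reports them as `NL` and `COMMENT` tokens, mainly for the benefit of tools; the grammar only cares about `NEWLINE`, so a custom lexer can reasonably drop them.

```py
import io
import tokenize

source = "x = 1\n\n# just a comment\ny = 2\n"

readline = io.StringIO(source).readline
names = [tokenize.tok_name[t.type] for t in tokenize.generate_tokens(readline)]

# NEWLINE ends a logical line; NL marks a line break that ends nothing.
print(names)
```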

deep nova
#
    def newline(self) -> Generator[Token, None, None]:

        # consume leading whitespace
        whitespace = self.star(self.pipe(self.generic('\t'), self.generic(' ')))

        if self.observe() == '#' and self.advance():  # consume a comment

            while self.letter() or self.base10() or self.symbol():
                pass  # consume an optional comment

        elif (observed := self.observe()) == '\n' or not observed:
            pass  # line is empty

        elif self.parentheses:  # line is not empty, but is within parentheses
            pass

        else:

            yield self.token(Tokentype.NEWLINE)
    
            indentation = len(self.star(self.generic('\t')))  # count leading tabs
            
            if self.observe() == ' ':  # leading whitespace
                raise SyntaxError('leading whitespaces are prohibited (use tabs)')

            while self.indentation[-1] < indentation:  # create INDENT tokens

                self.indentation.append(len(self.indentation))
                yield self.token(Tokentype.INDENT)

            if indentation not in self.indentation:       # dedentation to unknown depth
                raise SyntaxError('dedentation to inconsistent depth')

            while self.indentation[-1] > indentation:  # create DEDENT tokens

                self.indentation.pop()
                yield self.token(Tokentype.DEDENT)
#

What a horrible function

rose schooner
deep nova
#

That's what that block does

#

It consumes a letter, a number, or any symbol. Newlines aren't among those, so the consumption will cease when that's the case

#
    def newline(self) -> Generator[Token, None, None]:

        indentation = len(self.star(self.pipe(self.generic('\t'))))

        match self.observe():

            case '\n' | '\r':
                pass

            case ' ':  # mixed spaces and tabs
                raise SyntaxError('leading whitespaces are prohibited (use tabs)')

            case '#':  # empty line with a comment only

                self.advance()  # consume '#'
                self.comment()  # consume comment body

            case  _ :  # non-empty line

                if indentation < self.indentation[-1] and indentation not in self.indentation:
                    raise SyntaxError('dedentation to inconsistent depth')

                self.advance()
                yield self.token(Tokentype.NEWLINE)

                yield from self.indents(indentation)
                yield from self.dedents(indentation)
#

A bit better, for sure
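For comparison, a standalone sketch of the indent-stack approach, restricted to tabs as in the drafts above. `indent_tokens` and the `("LINE", text)` placeholder token are simplifications invented here, not part of the lexer being discussed. Unlike the first draft, a jump of several tabs opens a single block, which matches how Python counts it.

```py
def indent_tokens(lines):
    """Yield (kind, text) pairs for tab-indented lines, using an indent stack."""
    stack = [0]                           # depths of the blocks currently open
    for line in lines:
        body = line.lstrip("\t")
        if not body.strip() or body.lstrip().startswith("#"):
            continue                      # blank or comment-only: no tokens at all
        depth = len(line) - len(body)     # number of leading tabs
        if depth > stack[-1]:
            stack.append(depth)           # any deeper line opens exactly one block
            yield ("INDENT", "")
        while depth < stack[-1]:
            stack.pop()
            yield ("DEDENT", "")
        if depth != stack[-1]:            # landed between two known depths
            raise IndentationError("dedent to inconsistent depth")
        yield ("LINE", body)
        yield ("NEWLINE", "")
    while len(stack) > 1:                 # close blocks still open at end of input
        stack.pop()
        yield ("DEDENT", "")


sample = ["if x:", "\tif y:", "\t\tz", "", "\t# note", "done"]
kinds = [kind for kind, _ in indent_tokens(sample)]
print(kinds)
```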

warm breach
#

@pliant tusk do you know if there's a Py_TPFLAGS_IMMUTABLE equivalent in <= 3.9 that's not Py_TPFLAGS_HEAPTYPE?

#

it seems setting HEAPTYPE to false isn't quite safe on heap types

#

like in 3.9

from fishhook import lock

class Foo:
    pass

lock(Foo)
Objects/typeobject.c:3682: type_traverse: Assertion failed: type_traverse() called on non-heap type 'Foo'
Enable tracemalloc to get the memory block allocation traceback

object address  : 000001FDC9ADD650
object refcount : 4
object type     : 00007FFEB260CC60
object type name: type
object repr     : <class 'Foo'>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: finalizing (tstate=000001FDC8E12F50)

Current thread 0x00007b68 (most recent call first):
<no Python frame>
pliant tusk