#internals-and-peps | Python | Page 10

rose schooner Jan 27, 2023, 12:24 PM

#

here are some improvements:
```py
partial = lambda f, *a, **k: update_wrapper(lambda *_a, **_k: f(*a + _a, **k, **_k),  # star unpack precedence is quite low so we can just do this
    wrapped=f, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES)  # provide like in wraps()
partialmethod = lambda f, *a, **k: update_wrapper(lambda self, *_a, **_k: f(self, *a + _a, **k, **_k),
    wrapped=f, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES)
```

#

the logic is mainly just `lambda *_a, **_k: f(*a + _a, **k, **_k)` and `lambda self, /, *_a, **_k: f(self, *a + _a, **k, **_k)`
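A runnable sketch of the snippet above, with the `functools` imports made explicit. `greet` and `Greeter` are made-up examples. Passing `WRAPPER_ASSIGNMENTS` and `WRAPPER_UPDATES` is optional, since they are already `update_wrapper`'s defaults; it just mirrors what `wraps()` does.

```python
from functools import WRAPPER_ASSIGNMENTS, WRAPPER_UPDATES, update_wrapper

# Positional args are concatenated; the two keyword dicts are unpacked side by side.
partial = lambda f, *a, **k: update_wrapper(
    lambda *_a, **_k: f(*a + _a, **k, **_k),
    wrapped=f, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES)

# Same idea, but the wrapper takes `self` first so it binds like a normal method.
partialmethod = lambda f, *a, **k: update_wrapper(
    lambda self, *_a, **_k: f(self, *a + _a, **k, **_k),
    wrapped=f, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES)

def greet(greeting, name, punct="!"):
    """Build a greeting."""
    return f"{greeting}, {name}{punct}"

hello = partial(greet, "Hello")
print(hello("world"))                 # Hello, world!
print(hello("world", punct="?"))      # Hello, world?
print(hello.__name__, hello.__doc__)  # greet Build a greeting.

class Greeter:
    def say(self, greeting, name):
        return f"{greeting}, {name}"
    hi = partialmethod(say, "Hi")     # a plain function in the class body, so it binds

print(Greeter().hi("Bob"))            # Hi, Bob
```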

rare lantern Jan 27, 2023, 12:25 PM

#

I assume you are importing the constants from functools

#

ah yes

rare lantern Jan 27, 2023, 12:26 PM

#

rose schooner here are some improvements ```py partial = lambda f, *a, **k: update_wrapper(lam...

Also, yay, someone who understands my reduced lambda code

rose schooner Jan 27, 2023, 12:26 PM

#

`f(*a + _a, **k, **_k)` also has 3 alternate variations:
`f(*a, *_a, **k | _k)`
`f(*a + _a, **k | _k)`
`f(*a, *_a, **k, **_k)`

rose schooner Jan 27, 2023, 12:26 PM

#

rare lantern Also, yay, someone who understands my reduced lambda code

perks of being an #esoteric-python developer

rare lantern Jan 27, 2023, 12:27 PM

#

rose schooner perks of being an #esoteric-python developer

Heh, you would love some of my code lol, I play with a lot of monoidal, functor and combinator logic to build stuff (built a whole lang parser using them, it's pretty neat)

rose schooner Jan 27, 2023, 12:28 PM

#

i don't know what any of that means but ok

fallen slateBOT Jan 27, 2023, 12:29 PM

#

Hey @rare lantern!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

rare lantern Jan 27, 2023, 12:30 PM

#

rose schooner i don't know what any of that means but ok

https://paste.pythondiscord.com/vemeninotu

#

has the most basic and concise bits

warm breach Jan 27, 2023, 12:40 PM

#

is there an accessible type for builtin methods?

#

!e

print(type(print))

fallen slateBOT Jan 27, 2023, 12:40 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class 'builtin_function_or_method'>

warm breach Jan 27, 2023, 12:49 PM

#

never mind, found it: `types.BuiltinFunctionType`

deep nova Jan 27, 2023, 3:07 PM

#

I see a lot of conjecture about what PEG parsers do

#

But not a lot of material on how to build one

#

Anyone able to point me in the right direction?
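For a concrete starting point, here is a minimal hand-written packrat PEG parser sketch: one method per rule, ordered choice with backtracking (save the position, reset it on failure), and a memo table keyed by (rule, position). The grammar, `lex`, `Parser` and `memoize` are all hypothetical illustrations, not CPython's generated parser.

```python
import re

def lex(text):
    # hypothetical tiny lexer: integers and the symbols + ( )
    return re.findall(r"\d+|[+()]", text)

def memoize(rule):
    """Packrat memoization: cache (rule name, start position) -> (result, end position)."""
    def wrapper(self):
        key = (rule.__name__, self.pos)
        if key not in self.memo:
            result = rule(self)
            self.memo[key] = (result, self.pos)
        result, self.pos = self.memo[key]
        return result
    return wrapper

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos, self.memo = tokens, 0, {}

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        # consume tok if it is next; on failure, do not move
        if self.peek() == tok:
            self.pos += 1
            return tok
        return None

    def number(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        return None

    @memoize
    def expr(self):
        # expr: term '+' expr | term    (ordered choice: first success wins)
        start = self.pos
        left = self.term()
        if left is not None and self.expect("+"):
            right = self.expr()
            if right is not None:
                return ("+", left, right)
        self.pos = start          # backtrack; retrying term() here is a cache hit
        return self.term()

    @memoize
    def term(self):
        # term: NUMBER | '(' expr ')'
        start = self.pos
        number = self.number()
        if number is not None:
            return number
        if self.expect("("):
            inner = self.expr()
            if inner is not None and self.expect(")"):
                return inner
        self.pos = start
        return None

def parse(text):
    parser = Parser(lex(text))
    tree = parser.expr()
    if tree is None or parser.pos != len(parser.tokens):
        raise SyntaxError(f"parse failed at token {parser.pos}")
    return tree

print(parse("1+(2+3)"))  # ('+', 1, ('+', 2, 3))
```

Note that the right-recursive `expr` rule makes `+` right-associative: `1+2+3` parses as `1+(2+3)`.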

quick snow Jan 27, 2023, 3:53 PM

#

warm breach nevermind found it, `types.BuiltinFunctionType`

IIRC many of the types "exposed" by the `types` module are also just defined there like this:
```py
FunctionType = type(lambda:0)
BuiltinFunctionType = type(print)
...
```
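A quick interactive check of that claim (this does not reproduce `types.py` itself, just confirms the identities):

```python
import types

# C-implemented functions and bound methods of C types share one type
assert type(print) is types.BuiltinFunctionType
assert type([].append) is types.BuiltinMethodType
print(types.BuiltinFunctionType is types.BuiltinMethodType)  # True

# functions defined in Python (def or lambda) are all FunctionType
assert type(lambda: 0) is types.FunctionType is types.LambdaType
```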

warm breach Jan 27, 2023, 4:17 PM

#

quick snow Iirc many of the types "exposed" by the `types` modules are also just defined th...

turned out to be pointless at least for typing, since typeshed defines builtin functions as just plain callables

deep nova Jan 28, 2023, 2:45 AM

#

Someone on another server claims that PEG parsers are cubic with respect to memory consumption in the worst case

#

PEP 617 claims that the new parser falls within ±10% of the original parser when applied to the standard library as a benchmark

#

I'm just curious what level of hackery was required to achieve that

#

Guido has always seemed (to me) to play things fast and loose when it comes to ad-hoccing his solutions

raven ridge Jan 28, 2023, 3:09 AM

#

those aren't incompatible statements. Something can be cubic in the worst case when applied to adversarial inputs, and perform very well on typical inputs

#

the worst case time complexity of a backtracking regex engine is 2**n with respect to the length of the input string, for example

#

granted that takes a particularly terrible pattern, but 🤷‍♂️

deep nova Jan 28, 2023, 4:02 AM

#

That is a sound analysis that holds absolutely zero actual information

#

Just so you know XD

lone sun Jan 28, 2023, 4:03 AM

#

It's worth noting that the 2**n worst case basically requires an adversarial regex, too. Regexes like `([ab])c*\1` can be parsed in polynomial time.

#

Even if general PEG parsers require cubic memory in the worst case, it's possible that Python's grammar doesn't trigger that kind of memory usage.

#

Or that it would require pathological input, like the whole file being nothing but f"{f"{f"{f"{f"{...}"}"}"}"}".

raven ridge Jan 28, 2023, 5:02 AM

#

deep nova That is a sound analysis that holds absolutely zero actual information

that's not true. It contained the actual information that you've mistakenly assumed that excessive memory usage for adversarial inputs ought to correlate to excessive memory usage for representative inputs.

deep nova Jan 28, 2023, 5:04 AM

#

Stated more explicitly as you've done, that's a bit clearer. Your original phrasing sounded a lot more like "it depends"

#

Nonetheless

raven ridge Jan 28, 2023, 5:04 AM

#

Something can be cubic in the worst case when applied to adversarial inputs, and perform very well on typical inputs.
That wasn't an "it depends".

deep nova Jan 28, 2023, 5:04 AM

#

My actual question was to do with the actual optimizations in the algorithm

raven ridge Jan 28, 2023, 5:05 AM

#

that's begging the question. You're assuming that the algorithm must be optimized for memory usage to achieve its results.

deep nova Jan 28, 2023, 5:05 AM

#

Godly, my brother

#

My friend

#

You've got to learn to give a straight answer

raven ridge Jan 28, 2023, 5:06 AM

#

I don't know how much clearer I can be. Your premise is wrong.

deep nova Jan 28, 2023, 5:07 AM

#

First off, Guido mentions several times in his blog posts about PEG parsing that optimizing the memory footprint is on his to-do list. Moreover, the primary complaint I've seen about PEG parsers throughout my research has been their potential (and actual) excessive memory footprint

#

AND, even if a hideous memory footprint is possible only in adversarial conditions, it's still worth talking about and asking about. I asked "what's been done about the memory footprint in Python's parser"

You answered "why would we optimize the algorithm in the first place". While technically an answer, it neither confirms nor denies the existence of any optimizations

#

Which is what I care about

raven ridge Jan 28, 2023, 5:14 AM

#

what's been done about the memory footprint in Python's parser
As far as I know, nothing, but I'm not an expert. It sounds like you've probably read up on it more than me.

My impression is that the only work done to limit the memory usage is carefully designing the grammar to try to ensure that ambiguities are quickly resolved so that look-ahead is rarely needed.

#

LWN summarizes it as just:

There is, of course, a cost for infinite lookahead, in the form of increased memory usage. But the PEP notes that the performance of the new parser is within 10% of that the old, both in terms of speed and memory. While PEG with packrat parsing consumes more memory, the new parser does not create a concrete syntax tree, as the existing parser does; instead, it directly creates the abstract syntax tree, which makes up for most of the memory consumed by the new techniques.
Which matches up to my understanding that no particular hacks were required.

deep nova Jan 28, 2023, 5:23 AM

#

Rockin

raven ridge Jan 28, 2023, 5:26 AM

#

I do vaguely recall that some proposed language feature was shot down on the basis that it would require too much lookahead and could cause excessive memory usage, too, though I don't recall which

deep nova Jan 28, 2023, 5:29 AM

#

The only optimization I can think of thus far is that you don't need to hold the entire token stream in memory, just the parts that are currently relevant. Once you exit a subtree, basically, you can throw out the tokens it's consumed

#

But that's a battle for another day

dim shard Jan 28, 2023, 6:31 AM

#

GodlyGeek

gray galleon Jan 28, 2023, 8:54 AM

#

how does python's peg parser handle left recursion
it's a recursive descent parser amirite

rose schooner Jan 28, 2023, 8:55 AM

#

gray galleon how does python peg parser handle left recursion its a recursive descent parser ...

some trick

#

i still don't understand

#

it's like memoization or something

gray galleon Jan 28, 2023, 8:55 AM

#

do they rewrite it to become non-left-recursive
like
```
sum: prod ('+' prod)*
```

rose schooner Jan 28, 2023, 8:56 AM

#

gray galleon do they rewrite it to become non-left recursive like``` sum: prod ('+' prod)* ``...

they still recurse in the parser

#

the generated one

gray galleon Jan 28, 2023, 8:59 AM

#

rose schooner they still recurse in the parser

idk, they might be postprocessed later on

gray galleon Jan 28, 2023, 9:03 AM

#

rose schooner it's liked memoization or something

left-recursive derivations are memoized or what

rose schooner Jan 28, 2023, 9:03 AM

#

gray galleon left recursive derivation are memoized or what

i think

gray galleon Jan 28, 2023, 9:04 AM

#

and if they are memoized the parser tries the alternatives instead
i think
idk how it works exactly

#

yeah that might work

deep nova Jan 28, 2023, 9:51 AM

#

@gray galleon There is a trick which relies on packrat parsing

#

It calls itself repeatedly, caching the result each time, until a recursive call fails to consume any more tokens than the previous call

#

Something like that, I haven't had a chance to crack open the code yet

#

https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1

Medium

Left-recursive PEG grammars

I’ve alluded to left-recursion as a stumbling block a few times, and it’s time to tackle it. The basic problem is that with a recursive…
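A sketch of the seed-growing trick described above, under stated assumptions: it handles only a directly left-recursive rule, and `memoize_left_rec`, `Parser` and the grammar are hypothetical names for illustration, not CPython's pegen code. The first pass plants a failing result in the memo so the left-recursive call fails; each later pass reuses the longest cached parse as the left operand, and growing stops once a pass no longer consumes more tokens than the previous one.

```python
import re

def memoize_left_rec(rule):
    """Seed-growing memoization for a directly left-recursive rule (sketch)."""
    def wrapper(self):
        start = self.pos
        key = (rule.__name__, start)
        if key in self.memo:                 # cached (or seed) result: reuse it
            result, self.pos = self.memo[key]
            return result
        # Plant a failing seed so the left-recursive call fails on the first pass.
        self.memo[key] = (None, start)
        best, best_end = None, start
        while True:
            self.pos = start
            result = rule(self)               # inner call sees the latest cached result
            end = self.pos
            if result is None or end <= best_end:
                break                         # no longer consuming more tokens: stop
            self.memo[key] = (result, end)    # cache the longer parse and grow again
            best, best_end = result, end
        self.pos = best_end
        return best
    return wrapper

class Parser:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|-", text)
        self.pos, self.memo = 0, {}

    def expect(self, tok):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == tok:
            self.pos += 1
            return tok
        return None

    def number(self):
        if self.pos < len(self.tokens) and self.tokens[self.pos].isdigit():
            self.pos += 1
            return int(self.tokens[self.pos - 1])
        return None

    @memoize_left_rec
    def expr(self):
        # expr: expr '-' NUMBER | NUMBER    (left-recursive, hence left-associative)
        start = self.pos
        left = self.expr()
        if left is not None and self.expect("-"):
            right = self.number()
            if right is not None:
                return ("-", left, right)
        self.pos = start
        return self.number()

tree = Parser("5-2-1").expr()
print(tree)  # ('-', ('-', 5, 2), 1)
```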

#

My understanding is that allowing left recursion has more to do with convenience and expressiveness than anything else. It comes at a price: linear performance is sacrificed. That said, I'm toying with a PEG which handles operators using Pratt parsing. This is normally where most left recursion occurs (I think). If that's the case, then the left recursion becomes one extra tool in the toolbox which shouldn't cost too much in terms of performance if used sparingly
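For the operator part, a Pratt-style loop (here restricted to binary operators, which makes it essentially precedence climbing) gives left-associativity without any left recursion. The binding-power table and function names below are assumptions for illustration; error handling is omitted.

```python
import re

# Hypothetical binding powers: a higher number binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}

def lex(text):
    return re.findall(r"\d+|[-+*/()]", text)

def parse_expr(tokens, pos=0, min_bp=0):
    """Parse from tokens[pos]; return (tree, position after the expression)."""
    if tokens[pos] == "(":
        left, pos = parse_expr(tokens, pos + 1)
        pos += 1                          # skip ')' (no error checking in this sketch)
    else:
        left, pos = int(tokens[pos]), pos + 1
    # Keep absorbing operators that bind more tightly than the caller's level.
    while pos < len(tokens) and tokens[pos] in BINDING_POWER:
        op = tokens[pos]
        bp = BINDING_POWER[op]
        if bp <= min_bp:                  # equal power stops here: left-associative
            break
        right, pos = parse_expr(tokens, pos + 1, bp)
        left = (op, left, right)
    return left, pos

print(parse_expr(lex("1-2-3"))[0])    # ('-', ('-', 1, 2), 3)
print(parse_expr(lex("1+2*3"))[0])    # ('+', 1, ('*', 2, 3))
print(parse_expr(lex("(1+2)*3"))[0])  # ('*', ('+', 1, 2), 3)
```

For a right-associative operator, the recursive call would pass `bp - 1` instead of `bp`, so an equal-power operator to the right keeps being absorbed.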

flat gazelle Jan 28, 2023, 7:29 PM

#

why do dataclasses use `__post_init__` instead of just treating `__init__` defined on the decorated class as `__post_init__`?

warm breach Jan 28, 2023, 8:45 PM

#

flat gazelle why do dataclasses use `__post_init__` instead of just treating `__init__` defin...

but how?

#

the defined class init would run first

#

technically you can just rewrite the user `__init__` but that would be lying, the user's `__init__` isn't actually `__init__`

#

also type checkers now think you have a zero-argument `__init__` when in fact it is the one synthesized by dataclass

flat gazelle Jan 28, 2023, 8:47 PM

#

warm breach also type checkers now think you have a zero argument `__init__` when in fact it...

type checkers already special-case dataclasses, that seems like a non-issue to me

warm breach Jan 28, 2023, 8:47 PM

#

flat gazelle type checkers already special-case dataclasses, that seems like a non-issue to m...

they special case a synthesized init, but you can override it

flat gazelle Jan 28, 2023, 8:48 PM

#

warm breach technically you can just rewrite the user `__init__` but that would be lying, th...

this one is fair, though I think it is more surprising that a dataclass just breaks if you define an `__init__` yourself. I would expect at least an error if you tell it to generate an `__init__` and also define one

warm breach Jan 28, 2023, 8:48 PM

#

you can do `init=False` iirc for that

#

also I'm not sure if dataclass can easily tell if you defined an init yourself?

#

maybe it's possible to do an error for that

flat gazelle Jan 28, 2023, 8:50 PM

#

`'__init__' in vars(cls)` works just fine for that
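A quick check of the stock behavior, followed by a sketch of the error suggested above using that `vars(cls)` test. `strict_dataclass` is a hypothetical wrapper, not part of `dataclasses`. The check has to run before `dataclass()` does, because afterwards the class dict always contains an `__init__`.

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

    def __init__(self, x, y):
        # hand-written __init__: dataclass keeps it instead of generating one
        self.x, self.y = x, y
        self.norm = abs(x) + abs(y)

p = Point(3, -4)
print(p, p.norm)  # Point(x=3, y=-4) 7   (__repr__ is still generated)

def strict_dataclass(cls=None, **kwargs):
    """Hypothetical wrapper: error out instead of silently keeping a hand-written __init__."""
    def wrap(cls):
        # checked before decoration, while vars(cls) only holds what the user wrote
        if kwargs.get("init", True) and "__init__" in vars(cls):
            raise TypeError(f"{cls.__name__} defines __init__; "
                            "use __post_init__ or pass init=False")
        return dataclass(cls, **kwargs)
    return wrap if cls is None else wrap(cls)

try:
    @strict_dataclass
    class Broken:
        x: int
        def __init__(self):
            pass
except TypeError as exc:
    print(exc)  # Broken defines __init__; use __post_init__ or pass init=False

@strict_dataclass(init=False)   # explicitly opting out is allowed
class Manual:
    x: int
    def __init__(self, x):
        self.x = x * 2

print(Manual(2))  # Manual(x=4)
```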

raven ridge Jan 28, 2023, 8:51 PM

#

flat gazelle why do dataclasses use `__post_init__` instead of just treating `__init__` defin...

Because then you couldn't override `__init__` like you can override every other generated method.

flat gazelle Jan 28, 2023, 8:51 PM

#

why can I override generated methods without turning off their generation at all

warm breach Jan 28, 2023, 8:51 PM

#

I suppose yeah, but imo it's very strange for `__init__` to not be the real init

flat gazelle Jan 28, 2023, 8:52 PM

#

though IG the whole composing idea doesn't work with things like `__hash__`, so that would be special

warm breach Jan 28, 2023, 8:52 PM

#

a type checker now needs to figure out if you defined `init=False` to figure out if `__init__` is real or not

raven ridge Jan 28, 2023, 8:52 PM

#

flat gazelle why can I override generated methods without turning off their generation at all

Convenience, simplicity, expressiveness... But either way, changing that now would be backwards incompatible, so that's never going to happen.

flat gazelle Jan 28, 2023, 8:53 PM

#

yeah, I just remember everyone that I have seen try dataclasses break a dataclass by defining an `__init__`

#

I think it's the most obvious way to do it and it should just work

warm breach Jan 28, 2023, 8:54 PM

#

we already have enough special casing for dataclasses though, we need less of those not more

#

an ignored `__init__` would be another thing type checkers need to handle

flat gazelle Jan 28, 2023, 8:54 PM

#

warm breach a type checker now needs to figure out if you defined `init=False` to figure o...

already have to do that for `kw_only`

warm breach Jan 28, 2023, 8:56 PM

#

flat gazelle already have to do that for `kw_only`

yeah, but at least it's a virtual "superclass" provided init

#

by inheritance you should always be able to override the superclass

#

#

type checkers still abide by that in following your init function and not the provided one

flat gazelle Jan 28, 2023, 8:58 PM

#

dataclass will always need a ton of special casing, and making an API less obvious just to make tooling marginally easier seems silly to me. (you need to scroll down past `dataclasses.MISSING` and other useful-to-know-about constants to even get to the `__post_init__` docs, and most people will not scroll that far.)

#

IG it is easier in that you can pretend it is a superclass

raven ridge Jan 28, 2023, 8:58 PM

#

warm breach yeah, but at least it's a virtual "superclass" provided init

what do you mean by 'virtual "superclass"'?

flat gazelle Jan 28, 2023, 8:58 PM

#

which seems like a somewhat silly way to implement a dataclass from a type checker standpoint

warm breach Jan 28, 2023, 8:59 PM

#

raven ridge what do you mean by 'virtual "superclass"'?

just, logically I guess. A clash between provided and written methods should resolve to user written ones

raven ridge Jan 28, 2023, 9:00 PM

#

I think it's just ease of use, really. If someone wants to define their own `__repr__` for a dataclass, it seems weird to make them also call the dataclass decorator with `repr=False`. It's making them state their intent in two places: they have to say "don't provide a repr", and then say "use this for the repr". That seems weirder than having the dataclass infer "they've provided a repr, so I shouldn't generate my own."

flat gazelle Jan 28, 2023, 9:00 PM

#

and in general, I really do not care about type checker complexity over making the language better.

#

that does make sense for everything but `__init__` IG

#

which is why we have `__post_init__`

warm breach Jan 28, 2023, 9:02 PM

#

flat gazelle which is why we have `__post_init__`

how would it even synthesize the init with your own init?

#

like you're suggesting to make `__post_init__` just `__init__`? but doesn't that clash with the provided one in name

flat gazelle Jan 28, 2023, 9:03 PM

#

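```py
def new_init(self, *args, **kwargs):
    ...  # generated code: assign the dataclass fields
    self.__original_init__(...)  # then run the user's own __init__
cls.__init__, cls.__original_init__ = new_init, cls.__init__
```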
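A runnable sketch of this composing idea, closing over the original `__init__` instead of storing it as an attribute (as suggested later in the thread). `dataclass_with_user_init` and `Item` are hypothetical; the user's `__init__` takes only `self`, like `__post_init__` without `InitVar`s.

```python
import functools
from dataclasses import dataclass

def dataclass_with_user_init(cls):
    """Hypothetical decorator: run a hand-written __init__ after the generated one."""
    user_init = vars(cls).get("__init__")
    if user_init is None:
        return dataclass(cls)
    del cls.__init__                 # hide it so dataclass generates its own __init__
    cls = dataclass(cls)
    generated_init = cls.__init__    # closed over; no __original_init__ attribute needed

    @functools.wraps(generated_init)  # keep the generated signature for introspection
    def __init__(self, *args, **kwargs):
        generated_init(self, *args, **kwargs)  # 1. assign the dataclass fields
        user_init(self)                        # 2. then the user's extra setup
    cls.__init__ = __init__
    return cls

@dataclass_with_user_init
class Item:
    name: str
    price: float

    def __init__(self):
        # runs after the generated __init__, so the fields already exist
        self.label = f"{self.name}: {self.price:.2f}"

item = Item("tea", 2.5)
print(item)        # Item(name='tea', price=2.5)
print(item.label)  # tea: 2.50
```

The trade-off raised in the thread is visible here: with this decorator there is no longer a way to fully replace `__init__`.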

raven ridge Jan 28, 2023, 9:03 PM

#

flat gazelle that does make sense for everything but `__init__` IG

it makes sense even for `__init__`, I'd argue. If you provide your own `__init__`, the dataclass infers "they've provided a `__init__`, so I shouldn't generate my own", just like it does for every other dunder.

flat gazelle Jan 28, 2023, 9:03 PM

#

seems fairly straightforward to me

warm breach Jan 28, 2023, 9:03 PM

#

flat gazelle ```py def new_init(...): ... self.__original_init__(...) cls.__init__, c...

so original init runs after init?

flat gazelle Jan 28, 2023, 9:04 PM

#

yup, so that it gets the dataclass fields

warm breach Jan 28, 2023, 9:04 PM

#

that seems a lot weirder than post init

flat gazelle Jan 28, 2023, 9:04 PM

#

I am willing to bet there are vastly more dataclasses with a `__post_init__` than with a fully replaced `__init__`

raven ridge Jan 28, 2023, 9:04 PM

#

and then you've lost the ability to override `__init__`, like you can override every other dunder

warm breach Jan 28, 2023, 9:04 PM

#

whether you define it yourself or in another method, you always define one thing for that function, `__post_init__`

#

not an `__init__` that gets rewritten into `__original_init__` that would need alternative naming on further subclass interactions

#

also `__post_init__` makes it clear this runs after `__init__`

#

`__original_init__` seems to imply it runs before

flat gazelle Jan 28, 2023, 9:05 PM

#

well, the user of the `@dataclass` should never see the name `__original_init__` at all.

raven ridge Jan 28, 2023, 9:05 PM

#

sure, they don't have to, you can close over the original `__init__` instead.

flat gazelle Jan 28, 2023, 9:06 PM

#

but yeah, fair enough, it is a bit more magic than the current solution. But I feel like dataclasses are already mostly treated as a magic box by most people

raven ridge Jan 28, 2023, 9:07 PM

#

my problem with it isn't just that it's magic, it's that it's magic that would work differently for one dunder than for every other.

#

it makes things much harder to reason about when they're inconsistent.

flat gazelle Jan 28, 2023, 9:07 PM

#

I would argue that `__post_init__` is already treating `__init__` as a special thing

#

there is no `__post_eq__`

warm breach Jan 28, 2023, 9:08 PM

#

flat gazelle I would argue that `__post_init__` is already treating `__init__` as a special t...

right, `__post_init__` is the special thing only for `__init__`, it doesn't make `__init__` special

#

it's an additional slot, not changing the behavior of an existing one

raven ridge Jan 28, 2023, 9:08 PM

#

flat gazelle I would argue that `__post_init__` is already treating `__init__` as a special t...

sure - but that's extending the normal behavior (by allowing the generated method to call a user-supplied one), rather than modifying the normal behavior (by having a method be generated even when the user supplied a definition)

flat gazelle Jan 28, 2023, 9:09 PM

#

hmm, fair enough. IG no matter what you will have to go to stack overflow to figure out how to actually add a custom init to a dataclass

warm breach Jan 28, 2023, 9:09 PM

#

I think in those cases it's easier just to use a normal class tbh

flat gazelle Jan 28, 2023, 9:10 PM

#

it isn't. If you have a dozen fields, but need to also e.g. register the instance to a pool or do some extra checks, a `__post_init__` makes sense. That's why it exists
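A small sketch of that use case: the generated `__init__` assigns the fields, then calls `__post_init__`, which validates and registers the instance. `Connection` and its class-level `registry` are hypothetical; the `ClassVar` annotation keeps `registry` out of the dataclass fields.

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Connection:
    host: str
    port: int = 5432
    registry: ClassVar[list] = []    # hypothetical pool; ClassVar, so not a field

    def __post_init__(self):
        # runs at the end of the generated __init__, after all fields are set
        if not 0 < self.port < 65536:
            raise ValueError(f"bad port: {self.port}")
        Connection.registry.append(self)

c = Connection("db.local")
print(c, len(Connection.registry))  # Connection(host='db.local', port=5432) 1

try:
    Connection("db.local", port=0)
except ValueError as exc:
    print(exc)                      # bad port: 0  (and nothing was registered)
```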

#

but you will never ever figure out it exists unless you go to stack overflow

raven ridge Jan 28, 2023, 9:10 PM

#

yeah - it seems reasonable to me to want everything a dataclass generates except `__init__`

flat gazelle Jan 28, 2023, 9:11 PM

#

again, I think the case where you fully override `__init__` is much rarer than the case where you just want a `__post_init__`

#

and the less common case is the only one with an obvious solution

warm breach Jan 28, 2023, 9:11 PM

#

flat gazelle again, I think the case where you fully override `__init__` is much rarer than t...

so if you don't define an `__init__` yourself and use the provided one, would `__original_init__` be defined?

flat gazelle Jan 28, 2023, 9:12 PM

#

`__original_init__` was just an example

#

the name is opaque in the same way as other dataclass internals

#

you shouldn't need to touch it

warm breach Jan 28, 2023, 9:12 PM

#

right but you would now have to change all dataclass internal code to check whether `__original_init__` is defined

#

vs. now where you can unconditionally call `__post_init__` or `__init__`

#

`__init__` is obviously always there due to `object`

flat gazelle Jan 28, 2023, 9:13 PM

#

what do you mean? Post init is conditionally defined, `__original_init__` is conditionally defined

raven ridge Jan 28, 2023, 9:13 PM

#

flat gazelle and the less common case is the only one with an obvious solution

it's obvious because it works the same as replacing any other method, either on a dataclass or inherited from a superclass.

warm breach Jan 28, 2023, 9:13 PM

#

but honestly down that route we probably could have used some special syntax for dataclasses if we had a full do-over

flat gazelle Jan 28, 2023, 9:14 PM

#

and yet multiple people expect it to not do that.

raven ridge Jan 28, 2023, 9:14 PM

#

that confuses me, honestly.

#

I don't see why they'd expect overriding `__init__` to do something different than overriding `__hash__` - or overriding an `__init__` inherited from a superclass.

flat gazelle Jan 28, 2023, 9:15 PM

#

well, the thought process is:
here is a class
I want to do something when it's initialized
let's add an `__init__`

raven ridge Jan 28, 2023, 9:15 PM

#

and that works fine.

flat gazelle Jan 28, 2023, 9:15 PM

#

and it breaks the dataclass

raven ridge Jan 28, 2023, 9:15 PM

#

it doesn't break it - it just replaces its `__init__`

flat gazelle Jan 28, 2023, 9:15 PM

#

which is the thing people use a dataclass for in the first place

raven ridge Jan 28, 2023, 9:15 PM

#

now you're responsible for setting every attribute, just like you normally do in a `__init__`

flat gazelle Jan 28, 2023, 9:16 PM

#

not having to type out the `self.a = a`

#

I think the `__post_init__` is a clumsy solution

raven ridge Jan 28, 2023, 9:16 PM

#

flat gazelle which is the thing people use a dataclass for in the first place

if that's all you wanted, you'd be using a `types.SimpleNamespace`. People use a dataclass for all of the other dunders that it generates.

warm breach Jan 28, 2023, 9:16 PM

#

flat gazelle which is the thing people use a dataclass for in the first place

isn't the point for dataclass to provide `__init__`? If you then provide your own as well it'd make sense for it to be overwritten

#

just like how `if text == "a" or "b"` makes logical sense within the language but not intuitively
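To make the comparison concrete: `==` binds tighter than `or`, so the expression is `(text == "a") or "b"`, which is always truthy because the non-empty string `"b"` is.

```python
text = "z"
print(text == "a" or "b")        # b      -> parsed as (text == "a") or "b"
print(bool(text == "a" or "b"))  # True   -> for any value of text
print(text in ("a", "b"))        # False  -> the check that was probably intended
```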

#

we could special case that as well, like C# does

#

though to be fair C# has `||` and `&&`, so there's more leeway with what `or` does

flat gazelle Jan 28, 2023, 9:20 PM

#

IG fair enough, not everything can be a nice API.

warm breach Jan 28, 2023, 9:20 PM

#

flat gazelle IG fair enough, not everything can be a nice API.

I think it's just one of those things that could have gone either way when dataclass was first created

#

but now is too small of a thing to change with the impact of possibly breaking all extended code

flat gazelle Jan 28, 2023, 9:21 PM

#

yeah, of course it can't be changed now

#

but well, at least it's a point for every single 10 python tricks you didn't know video

warm breach Jan 28, 2023, 9:22 PM

#

what if we just added a new keyword
```py
struct Point:
    x: int
    y: int

    def __init__(self):
        ...
```

raven ridge Jan 28, 2023, 9:24 PM

#

Consistency counts for an awful lot in language design

warm breach Jan 28, 2023, 9:25 PM

#

speaking of, apparently ints will become mutable in some cases in 3.12+

#

https://github.com/python/cpython/blob/main/Objects/longobject.c#L283-L284

fallen slateBOT Jan 28, 2023, 9:26 PM

#

Objects/longobject.c lines 283 to 284

```c
// Mutate in place if there are no other references the old
// object.  This avoids an allocation in a common case.
```

flat gazelle Jan 28, 2023, 9:26 PM

#

upon further review of similar features, `__post_init__` is still better than most of what other languages offer

#

it's pretty much a Raku `TWEAK`, and if Raku couldn't come up with a better solution, IG it's really the best option

deep nova Jan 28, 2023, 10:29 PM

#

How does one achieve left-associativity with an LL parser?

#

It seems as though one must resort to `term := factor ('+' factor)*` or some such

#

Is this the standard approach? Would this not require some kind of second pass?

flat gazelle Jan 28, 2023, 10:31 PM

#

you more or less cannot, yeah

#

since you can't know how deep to nest the first operator without knowing how many are in a row

deep nova Jan 28, 2023, 10:32 PM

#

So, what, you're screwed? XD

flat gazelle Jan 28, 2023, 10:42 PM

#

you can reconstruct the parse tree with semantic actions IIRC
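A sketch of that semantic-action approach for the `factor ('+' factor)*` shape: the repetition loop folds each new operand into the tree built so far, so the result is left-associative without a second pass. The function names are hypothetical and the input is assumed well-formed.

```python
import re

def parse_sum(text):
    """sum: NUMBER (('+' | '-') NUMBER)*, folding left while looping."""
    tokens = re.findall(r"\d+|[-+]", text)
    left = int(tokens[0])
    pos = 1
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op, right = tokens[pos], int(tokens[pos + 1])
        left = (op, left, right)  # the tree so far becomes the new left operand
        pos += 2
    return left

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return evaluate(left) + evaluate(right) if op == "+" else evaluate(left) - evaluate(right)

tree = parse_sum("8-3-2")
print(tree)            # ('-', ('-', 8, 3), 2)
print(evaluate(tree))  # 3
```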

deep nova Jan 28, 2023, 10:49 PM

#

As I'm reading, there seem to be three approaches:

Refactor the grammar (to eliminate left-associativity altogether???)
Use Pratt or precedence climbing
Convert recursion to iteration and post-process

#

Or use left recursion hack available to packrat parsers, I guess

#

I'd prefer to start with the first option (I'll try them all out eventually) but I can't seem to find any instructions on how

radiant garden Jan 29, 2023, 9:54 AM

#

https://en.wikipedia.org/wiki/Left_recursion has a section on how

Left recursion

In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on the right). For instance,
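A sketch of the usual textbook rewrite for direct (immediate) left recursion, the kind of transformation the linked article covers. Grammars are represented here as lists of symbol names per alternative; the representation and function name are assumptions for illustration, and indirect left recursion is not handled.

```python
def remove_immediate_left_recursion(name, alternatives):
    """Rewrite  A -> A a1 | ... | A an | b1 | ... | bm  as
         A  -> b1 A' | ... | bm A'
         A' -> a1 A' | ... | an A' | (empty)
    Each alternative is a list of symbol names; [] stands for the empty alternative."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    others = [alt for alt in alternatives if not alt or alt[0] != name]
    if not recursive:
        return {name: alternatives}   # nothing to rewrite
    tail = name + "'"
    return {
        name: [beta + [tail] for beta in others],
        tail: [alpha + [tail] for alpha in recursive] + [[]],
    }

print(remove_immediate_left_recursion("sum", [["sum", "+", "prod"], ["prod"]]))
# {'sum': [['prod', "sum'"]], "sum'": [['+', 'prod', "sum'"], []]}
```

The rewritten grammar recognizes the same strings, but its natural parse tree leans right, so left-associativity still has to be rebuilt by a semantic action.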

halcyon trail Jan 31, 2023, 4:59 AM

#

flat gazelle upon further review of similar features, `__post_init__` is still better than mo...

Fwiw, sorry for the necro, I don't think this is true, unless I misunderstood

#

Python's dataclass is actually unusual in that it's also what's responsible for providing, let's say, reasonable init for structs that just hold data

#

Well, okay, hm, one of the other examples I had in mind fell through actually

#

But the other example was Kotlin. Kotlin has dataclasses as well but they're there for hash and equals. You get dataclass style init just by declaring fields a certain way. You can do the post init in init blocks in the class body

#

But I agree after further thought that not that many languages have a great solution here

flat gazelle Jan 31, 2023, 7:43 AM

#

halcyon trail But the other example was Kotlin. Kotlin has dataclasses as well but they're the...

Kotlin init blocks run after the constructor? I assumed it would work the same as java.

quaint copper Jan 31, 2023, 9:29 AM

#

Hi! Does Python expose internal lexing functions like the one parsing Python strings? I am looking for something akin to `ast.literal_eval` but restricted to parsing strings, erroring out on other literal types, and maybe even returning meta-information about the parsed string: quote types, literal length... I can't find this exposed within the stdlib, is it?

#

Or do I need to implement this myself based on the specifications there? https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

#

I can also "cheat" by recognizing any of the prefixes `'`, `"`, `'''`, `"""`, `r'`, `r"`, etc. and then feeding the rest of my input character by character (or quote-separated chunk by quote-separated chunk) until `ast.literal_eval` eventually succeeds, but this feels rather clumsy and very inefficient.

flat gazelle Jan 31, 2023, 9:47 AM

#

quaint copper Hi! Does python expose internal lexing functions like the one parsing python str...

it may be possible to leverage the `tokenize` module to do so. If you don't need specifically Python strings, I believe `shlex` can do something similar as well.
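A sketch of the `tokenize` route, assuming the literal starts at the beginning of the input: take the first `STRING` token, split its prefix from the quoted body, and evaluate it with `ast.literal_eval`. `StringLiteral` and `parse_string_literal` are hypothetical names. f-strings are not handled (they are rejected either by `ast.literal_eval` or, on 3.12+, by the token-type check).

```python
import ast
import io
import tokenize
from dataclasses import dataclass

@dataclass
class StringLiteral:
    prefix: str    # e.g. "", "r", "rb" (case preserved)
    quote: str     # one of ', ", ''', """
    value: object  # the evaluated str or bytes
    length: int    # number of source characters in the literal

def parse_string_literal(src: str) -> StringLiteral:
    """Parse the string/bytes literal at the start of src (hypothetical helper)."""
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in (tokenize.NL, tokenize.NEWLINE, tokenize.INDENT):
            continue
        if tok.type != tokenize.STRING:
            raise ValueError(f"not a string literal: {tok.string!r}")
        text = tok.string
        # the prefix letters end where the first quote character begins
        body_start = next(i for i, c in enumerate(text) if c in "'\"")
        prefix, body = text[:body_start], text[body_start:]
        quote = body[:3] if body[:3] in ("'''", '"""') else body[0]
        return StringLiteral(prefix, quote, ast.literal_eval(text), len(text))
    raise ValueError("empty input")

lit = parse_string_literal("'''abc''' rest of the input")
print(lit)  # StringLiteral(prefix='', quote="'''", value='abc', length=9)

try:
    parse_string_literal("42")
except ValueError as exc:
    print(exc)  # not a string literal: '42'
```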

halcyon trail Jan 31, 2023, 2:22 PM

#

@flat gazelle https://pl.kotl.in/P0et_rVuy

Kotlin Playground

Kotlin Playground: Edit, Run, Share Kotlin Code Online

still prism Jan 31, 2023, 10:56 PM

#

Hi guys, for some reason I can't use chown on a mounted drive in Linux

grave jolt Jan 31, 2023, 11:03 PM

#

still prism Hi guys, for some reason I can't use chown on a mounted drive in Linux

#unix

warm breach Feb 1, 2023, 1:51 AM

#

So there's `sys.intern` to intern strings, is there some way to unintern strings...?

raven ridge Feb 1, 2023, 2:08 AM

#

isn't that just making a copy of a string?

#

oh - I guess you mean to remove it from the table of interned strings, even while it's still alive, so that the next time sys.intern is called on an identical string, it doesn't return the original one and instead inserts the new one in the table?

#

if that's what you mean, no, I don't think so.
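For reference, here is what the public API does offer: `sys.intern` returns the single canonical copy of an equal string, inserting the argument into the table if none exists yet. There is no public counterpart for removing an entry, which is why the examples that follow reach into CPython internals.

```python
import sys

parts = ["hello", "-", "world"]
a = "".join(parts)       # built at runtime, so not interned automatically
b = "".join(parts)
print(a == b, a is b)    # True False  (equal values, two separate objects)

a = sys.intern(a)        # a becomes the canonical copy in the intern table
b = sys.intern(b)        # an equal string is already there, so that object is returned
print(a is b)            # True
```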

warm breach Feb 1, 2023, 3:07 AM

#

raven ridge if that's what you mean, no, I don't think so.

!e mainly since 😔

from einspect import view, unsafe

s = "test123"
v = view(s)

print(v.interned.name)

with unsafe():
    view("dog").move_to(v)
    
print(s)

fallen slateBOT Feb 1, 2023, 3:07 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | INTERNED_MORTAL
002 | dog
003 | Exception ignored deletion of interned string failed:
004 | KeyError: 'dog'

warm breach Feb 1, 2023, 3:08 AM

#

even though apparently it's an "ignored" error?

raven ridge Feb 1, 2023, 3:08 AM

#

any exception during garbage collection needs to be ignored

#

garbage collection can happen at any time on any thread - there may not even be Python frames on the stack that could have an `except:` block to handle it

warm breach Feb 1, 2023, 3:09 AM

#

!e

from einspect import view, unsafe

s = "test123"
v = view(s)

print(v.interned.name)

with unsafe():
    other = view("dog")
    other.interned = 0
    other.move_to(v)
    
print(s)

fallen slateBOT Feb 1, 2023, 3:09 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | INTERNED_MORTAL
002 | dog

warm breach Feb 1, 2023, 3:09 AM

#

just setting "dog"'s interned field to 0 or NOT_INTERNED results in no error

#

but I'm not sure what the implication of that is

#

I guess "dog"'s intern allocation is never freed?

pliant tusk Feb 1, 2023, 5:25 AM

#

warm breach So there's `sys.intern` to intern strings, is there some way to unintern strings...

!e you can get a reference to the interned dictionary and possibly free them from there.
```py
from fishhook.asm import *

# intercept PyDict_SetDefault; when the magic key is inserted, hand back the dict itself
@hook(pythonapi.PyDict_SetDefault, restype=py_object, argtypes=[py_object]*3)
def setdefault(self, key, value):
    if key == 'MAGICVAL':
        return self
    return pythonapi.PyDict_SetDefault(self, key, value)

# interning a new string calls PyDict_SetDefault(interned_dict, ...)
pythonapi.PyUnicode_InternFromString.restype = py_object
interned = pythonapi.PyUnicode_InternFromString(b'MAGICVAL')
setdefault.unhook()

print(interned)
```

fallen slateBOT Feb 1, 2023, 5:25 AM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

{'type': 'type', 'AttributeError': 'AttributeError', '__qualname__': '__qualname__', 'obj': 'obj', 'update': 'update', '__dict__': '__dict__', 'getattr': 'getattr', 'setattr': 'setattr', 'hasattr': 'hasattr', '__doc__': '__doc__', '__name__': '__name__', '__module__': '__module__', 'replace': 'replace', 'old': 'old', 'new': 'new', 'sys': 'sys', 'name': 'name', '_DeadlockError': '_DeadlockError', 'waiters': 'waiters', 'count': 'count', 'owner': 'owner', 'wakeup': 'wakeup', 'lock': 'lock', 'allocate_lock': 'allocate_lock', '_thread': '_thread', 'self': 'self', 'add': 'add', 'get': 'get', '_blocking_on': '_blocking_on', 'set': 'set', 'get_ident': 'get_ident', 'seen': 'seen', 'tid': 'tid', 'me': 'me', 'release': 'release', 'acquire': 'acquire', 'has_deadlock': 'has_deadlock', 'RuntimeError': 'RuntimeError', 'id': 'id', 'format': 'format', '__repr__': '__repr__', '__init__': '__init__', '_ModuleLock': '_ModuleLock', '_DummyModuleLock': '_DummyModuleLock', '_lock': '_lock', '_name': '_name',
... (truncated - too long)

Full output: too long to upload

warm breach Feb 1, 2023, 5:25 AM

#

pliant tusk !e you can get a reference to the interned dictionary and possibly free them fro...

wait what

#

how did you get a reference to the intern dict

#

I have been trying for the last hour

pliant tusk Feb 1, 2023, 5:26 AM

#

i used fishhook.asm to hook the PyDict_SetDefault C function, then I called PyUnicode_InternFromString which calls PyDict_SetDefault(interned_dict, ...)

warm breach Feb 1, 2023, 5:28 AM

#

ah hm

pliant tusk Feb 1, 2023, 5:28 AM

#

warm breach wait what

note that fishhook.asm only explicitly supports Intel x86 assembly right now, but i am looking into (ab)using ctypes internal cffi to perform hooks

warm breach Feb 1, 2023, 5:28 AM

#

so it can hook the c functions?

pliant tusk Feb 1, 2023, 5:29 AM

#

yea, but the catch is that sometimes, the functions are inlined, and thus cannot be hooked

#

although if you wrote a patchfinder, you could use it on arbitrary addresses, and pass in that

feral island Feb 1, 2023, 5:30 AM

#

warm breach I have been trying for the last hour

did you try gc.get_referents?

#

or referrers, always forget which is which

pliant tusk Feb 1, 2023, 5:32 AM

#

!e ```py
import gc
print([o for o in gc.get_referrers('abc') if type(o) == dict and o.get('abc') == 'abc'])
```

fallen slateBOT Feb 1, 2023, 5:32 AM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

[]

warm breach Feb 1, 2023, 5:32 AM

#

get_referrers doesn't seem to include the intern dict

#

so it looks like it's in this entry within the _is struct https://github.com/python/cpython/blob/main/Include/internal/pycore_interp.h#L186

fallen slateBOT Feb 1, 2023, 5:33 AM

#

Include/internal/pycore_interp.h line 186

```c
struct _Py_interp_cached_objects cached_objects;
```

warm breach Feb 1, 2023, 5:33 AM

#

which is technically accessible via the stable api PyInterpreterState_Get

#

but that struct is massive

pliant tusk Feb 1, 2023, 5:33 AM

#

warm breach so it can hook the c functions?

once i add ctypes cffi to fishhook you could just use fishhook.asm

warm breach Feb 1, 2023, 5:34 AM

#

pliant tusk once i add ctypes cffi to fishhook you could just use fishhook.asm

are you able to get the address of the c function from the FuncPtr?

pliant tusk Feb 1, 2023, 5:34 AM

#

yea

#

!e ```py
import fishhook.asm
import inspect

print(inspect.getsource(fishhook.asm.addr))```

fallen slateBOT Feb 1, 2023, 5:35 AM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | def addr(cfunc):
002 |     ptr = c_void_p.from_address(addressof(cfunc))
003 |     return ptr.value

warm breach Feb 1, 2023, 5:35 AM

#

hm pithink

#

what exactly is at that address though

pliant tusk Feb 1, 2023, 5:36 AM

#

warm breach hm <:pithink:652247559909277706>

assembly

#

https://github.com/chilaxan/fishhook/blob/master/fishhook/asm.py#L49-L52 fishhook.asm modifies the assembly, adding in a relative jump

fallen slateBOT Feb 1, 2023, 5:37 AM

#

fishhook/asm.py lines 49 to 52

```py
n_ptr = addr(injected)
offset = n_ptr - o_ptr - 5
jmp = b'\xe9' + (offset & ((1 << 32) - 1)).to_bytes(4, ENDIAN)
mem[:] = jmp
```
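For reference, the 5-byte patch above is a near `jmp rel32`: opcode `E9` followed by a little-endian 32-bit displacement measured from the end of the jump instruction, which is where the `- 5` comes from. A minimal sketch of the same encoding with an explicit range check (`encode_jmp_rel32` is a hypothetical helper, not part of fishhook; it assumes x86/x86-64 and little-endian byte order):

```python
def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode a 5-byte `jmp rel32` located at address `src` that lands on `dst`."""
    offset = dst - (src + 5)  # displacement counts from the end of the 5-byte instruction
    if not -2**31 <= offset < 2**31:
        raise OverflowError(f"target is {offset:#x} bytes away; rel32 cannot reach it")
    return b"\xe9" + offset.to_bytes(4, "little", signed=True)


print(encode_jmp_rel32(0x1000, 0x2000).hex())  # e9fb0f0000 (forward jump)
print(encode_jmp_rel32(0x2000, 0x1000).hex())  # e9fbefffff (backward jump)
```

Where asm.py masks the offset with `& ((1 << 32) - 1)`, this sketch raises instead, so an out-of-range target fails loudly rather than jumping somewhere else.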

warm breach Feb 1, 2023, 5:39 AM

#

hm firHmm

pliant tusk Feb 1, 2023, 5:41 AM

#

warm breach hm <:firHmm:974893065288421416>

yes, fishhook.asm is cursed as hell

#

ironically, i originally wrote this code to get the interned dictionary

warm breach Feb 1, 2023, 5:42 AM

#

theoretically just defining the PyInterpreterState struct and calling the stable api would be much safer

warm breach Feb 1, 2023, 5:44 AM

#

pliant tusk !e you can get a reference to the interned dictionary and possibly free them fro...

could this override Py_IS and affect behavior of the is keyword then 👀

pliant tusk Feb 1, 2023, 5:45 AM

#

warm breach could this override `Py_IS` and affect behavior of the `is` keyword then 👀

technically, yes, but you would need to know what address to patch

warm breach Feb 1, 2023, 5:45 AM

#

hm? It's a stable pythonapi

feral island Feb 1, 2023, 5:45 AM

#

the memory location at which it exists isn't stable

pliant tusk Feb 1, 2023, 5:45 AM

#

there are a lot of things defined as a macro and as a function

feral island Feb 1, 2023, 5:46 AM

#

haven't checked but probably the is keyword just does a pointer comparison straight in ceval.c, not a function call

#

yes ```c
    case TARGET(IS_OP): {
        PyObject *right = POP();
        PyObject *left = TOP();
        int res = (left == right)^oparg;
        PyObject *b = res ? Py_True : Py_False;
```

pliant tusk Feb 1, 2023, 5:47 AM

#

feral island haven't checked but probably the `is` keyword just does a pointer comparison str...

you could technically patch that with fishhook.asm but you would need to know where to inject

#

and it would be unstable as hell
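A quick way to see why there is no function to hook for `is`: on CPython 3.9+ the expression compiles to the `IS_OP` instruction, and the eval loop compares the two object pointers inline, as in the ceval.c snippet above. This sketch only inspects bytecode and compares `id()` values, which in CPython are the object addresses (an implementation detail):

```python
import dis

# Compile `a is b` and list the opcodes; on CPython 3.9+ identity is a single IS_OP.
code = compile("a is b", "<demo>", "eval")
opnames = [ins.opname for ins in dis.get_instructions(code)]
print("IS_OP" in opnames)  # True on CPython 3.9+

a = [1]
b = a       # same object
c = [1]     # equal value, different object
print(a is b, a is c)                  # True False
print(id(a) == id(b), id(a) == id(c))  # True False: identity is address equality
```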

warm breach Feb 1, 2023, 5:51 AM

#

pliant tusk !e you can get a reference to the interned dictionary and possibly free them fro...

oh does this only work on 32 bit

pliant tusk Feb 1, 2023, 5:52 AM

#

it works on 64bit

#

using it on my 64bit mac rn

warm breach Feb 1, 2023, 5:52 AM

#

I ran that in 3.11.0 and it segfaults pithink

pliant tusk Feb 1, 2023, 5:52 AM

#

what cpu do you have?

warm breach Feb 1, 2023, 5:52 AM

#

64bit ubuntu

pliant tusk Feb 1, 2023, 5:52 AM

#

warm breach 64bit ubuntu

what cpu architecture?

warm breach Feb 1, 2023, 5:53 AM

#

AMD64

pliant tusk Feb 1, 2023, 5:53 AM

#

ah

#

it only supports Intel x86 rn

warm breach Feb 1, 2023, 5:53 AM

#

or, well, platform.processor() says x86_64

pliant tusk Feb 1, 2023, 5:53 AM

#

I am working on adding support for more

warm breach Feb 1, 2023, 5:54 AM

#

but I'm on the same python binary right pithink

#

the jump instruction is different?

feral island Feb 1, 2023, 5:54 AM

#

binaries and instructions are completely different on different architectures

pliant tusk Feb 1, 2023, 5:55 AM

#

warm breach the jump instruction is different?

fishhook.asm is working with the raw instructions that are sent to your cpu

#

not python opcodes

#

@warm breach can you compile the following?
code.asm
```asm
jmp $+0xff
```
with `nasm code.asm -o code` on your system? then send me the output file?

warm breach Feb 1, 2023, 6:07 AM

#

pliant tusk <@233059161401720832> can you compile the following? code.asm ```asm jmp $+0xff...

📎 code

pliant tusk Feb 1, 2023, 6:08 AM

#

interesting

#

it should have worked

#

was it a segfault or a bus error?

#

also when did the crash happen, did it happen after defining the hook or when the hook was triggered?

warm breach Feb 1, 2023, 6:12 AM

#

pliant tusk also when did the crash happen, did it happen after defining the hook or when th...

it doesn't crash if I comment out the line

```py
# interned = pythonapi.PyUnicode_InternFromString(b'MAGICVAL')
```

pliant tusk Feb 1, 2023, 6:12 AM

#

ok so it is crashing when the hook is triggered

#

can you open up fishhook/asm.py and add a print for offset and lmk what the offset is

warm breach Feb 1, 2023, 6:13 AM

#

Fatal Python error: Segmentation fault

Current thread 0x00007f7773827280 (most recent call first):
  File "/home/ionite/repos/python/hook_test.py", line 10 in <module>
fish: Job 1, 'PYTHONDEVMODE=1 python3 hook_te…' terminated by signal SIGSEGV (Address boundary error)

warm breach Feb 1, 2023, 6:16 AM

#

pliant tusk can you open up `fishhook/asm.py` and add a print for `offset` and lmk what the ...

uh

#

it was something like

45659537313787
45932713041915

#

```diff
        offset = n_ptr - o_ptr - 5
+       print(offset)
```

pliant tusk Feb 1, 2023, 6:17 AM

#

it printed twice?

warm breach Feb 1, 2023, 6:17 AM

#

no it's just different each time

#

pliant tusk Feb 1, 2023, 6:17 AM

#

yea that's normal. But your allocations are way far apart

#

so my relative jump isn't far enough

#

maybe I should do PUSH (absolute address) + RET

#

basically, the opcode I use, e9, is a relative near jump (jmp rel32)

#

so in order for it to work, the offset needs to fit in a signed 32-bit (4-byte) integer

warm breach Feb 1, 2023, 6:20 AM

#

ah hm pithink

pliant tusk Feb 1, 2023, 6:20 AM

#

!e py print(hex(45659537313787))

fallen slateBOT Feb 1, 2023, 6:20 AM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

0x2986f0808ffb

feral island Feb 1, 2023, 6:20 AM

#

pliant tusk yea thats normal. But your allocations are way far apart

ASLR in action probably?

pliant tusk Feb 1, 2023, 6:20 AM

#

this is more than 4 bytes, so it cuts it short, and jumps to the wrong location
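Plugging the printed offset into the same arithmetic shows what goes wrong. This sketch just reuses the number from the thread; the mask is the one from asm.py lines 49 to 52:

```python
offset = 45659537313787  # first offset printed above; hex(offset) == '0x2986f0808ffb'

print(-2**31 <= offset < 2**31)    # False: far outside what a signed rel32 can hold

masked = offset & ((1 << 32) - 1)  # the same mask asm.py applies
print(hex(masked))                 # 0xf0808ffb: everything above bit 31 is dropped

# The CPU reads those 4 bytes as a signed displacement
as_signed = int.from_bytes(masked.to_bytes(4, "little"), "little", signed=True)
print(as_signed)                   # -260009989: a jump roughly 260 MB backwards
```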

pliant tusk Feb 1, 2023, 6:21 AM

#

feral island ASLR in action probably?

afaik, if it was, then ctypes calls wouldn't work on @warm breach's system at all

#

and my mac should have the same crash

#

probably a quirk with ubuntu's allocator vs macos

warm breach Feb 1, 2023, 6:21 AM

#

it crashes on colab as well it seems (ubuntu intel)

#

though that might be related to jupyter

pliant tusk Feb 1, 2023, 6:22 AM

#

replacing the relative jump with a PUSH DWORD addr; RET; would maybe work? i think

pliant tusk Feb 1, 2023, 6:22 AM

#

warm breach it crashes on colab as well it seems (ubuntu intel)

yea its probably a quirk with ubuntu's allocator

#

although it works on the bot here so idk. regardless, the crash is due to offset being larger than 4 bytes, so i need to change my jump strategy
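One caveat with the PUSH + RET idea on x86-64: `push imm32` sign-extends its 32-bit immediate to 64 bits, so a single PUSH cannot carry an arbitrary 64-bit address. A commonly used alternative is a 14-byte absolute jump, `jmp qword ptr [rip+0]` followed by the 8-byte target. A sketch of the encoding (`encode_jmp_abs64` is a hypothetical helper; a real hook would also have to make sure the 14 overwritten bytes of the original function are safe to clobber):

```python
import struct


def encode_jmp_abs64(dst: int) -> bytes:
    """Encode `jmp qword ptr [rip+0]` followed by the 8-byte absolute target (14 bytes)."""
    if not 0 <= dst < 2**64:
        raise ValueError("target must fit in 64 bits")
    # FF /4 with ModRM 0x25 is an indirect jmp through a RIP-relative operand;
    # disp32 = 0 points at the qword that immediately follows this 6-byte instruction.
    return b"\xff\x25" + struct.pack("<i", 0) + struct.pack("<Q", dst)


print(encode_jmp_abs64(0x7F0012345678).hex())  # ff250000000078563412007f0000
```

The trade-off is size: 14 bytes instead of 5, but no distance limit between the patched function and the hook.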

warm breach Feb 1, 2023, 6:26 AM

#

petition to add PyUnicode_GetInternDict 😔

gray galleon Feb 1, 2023, 8:08 AM

#

will python have destructuring parameters

#

def f(a, [b, c], d):

sacred yew Feb 1, 2023, 8:17 AM

#

gray galleon will python have destructuring parameters

was removed in 2 to 3 transition I believe

#

https://peps.python.org/pep-3113/

PEP 3113 – Removal of Tuple Parameter Unpacking | peps.python.org

Python Enhancement Proposals (PEPs)
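The transition advice in PEP 3113 is to take the tuple as a single parameter and unpack it in the body. (Python 2 spelled the old syntax with parentheses, `def f(a, (b, c), d)`; the bracket form in the question was never valid.) A minimal sketch:

```python
# Python 2 only: def f(a, (b, c), d): ...
# Python 3 replacement: accept the pair as one parameter and unpack it explicitly.
def f(a, bc, d):
    b, c = bc  # raises ValueError if bc does not have exactly two items
    return (a, b, c, d)


print(f(1, [2, 3], 4))  # (1, 2, 3, 4)
```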

gray galleon Feb 1, 2023, 10:29 AM

#

warm breach it crashes on colab as well it seems (ubuntu intel)

you writing asm in python?

pliant tusk Feb 1, 2023, 3:15 PM

#

gray galleon you writing asm in python?

yea, fishhook.asm uses assembly to do very low level hooks

warm breach Feb 1, 2023, 4:23 PM

#

#

apparently just swapping 2 objects is quite complex in 3.12 😔

#

most object instance dicts are "VM" managed now apparently

pliant tusk Feb 1, 2023, 4:29 PM

#

warm breach apparently just swapping 2 objects is quite complex in 3.12 😔

to be fair object swapping like that was already very unsafe

warm breach Feb 1, 2023, 4:29 PM

#

pliant tusk to be fair object swapping like that was already very unsafe

I just merged https://github.com/ionite34/einspect/blob/main/src/einspect/views/view_base.py#L320-L321

fallen slateBOT Feb 1, 2023, 4:29 PM

#

src/einspect/views/view_base.py lines 320 to 321

```py
def swap(self, other):
    """Swaps data at other Viewable with this View."""
```

warm breach Feb 1, 2023, 4:30 PM

#

seems to work for 3.8-3.12

#

not sure if I'm missing any logic pithink

#

https://github.com/ionite34/einspect/blob/main/src/einspect/views/view_base.py#L341 not sure how the lifetime of this works

fallen slateBOT Feb 1, 2023, 4:31 PM

#

src/einspect/views/view_base.py line 341

```py
buf = ctypes.create_string_buffer(other.mem_allocated)
```

warm breach Feb 1, 2023, 4:31 PM

#

is it only dropped after the function scope ends?

pliant tusk Feb 1, 2023, 4:46 PM

#

warm breach is it only dropped after the function scope ends?

i believe so
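One way to check this empirically is to keep only a weak reference to a local and see when it dies. On CPython, a function's locals are released when its frame is cleared on return (or earlier via `del` or rebinding), so the object is freed as soon as the call ends. A plain class stands in for the ctypes buffer here, and `make_local` is a hypothetical name:

```python
import weakref


class Buffer:
    """Stand-in for the ctypes buffer (plain classes support weak references)."""


def make_local():
    buf = Buffer()            # the only strong reference lives in this frame
    ref = weakref.ref(buf)
    assert ref() is buf       # still alive while the function is running
    return ref                # hand back only the weak reference


ref = make_local()
print(ref())  # None on CPython: the reference was dropped when the frame finished
```

On implementations without reference counting (PyPy, for example) the object may survive until the next garbage collection.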

warm breach Feb 1, 2023, 4:55 PM

#

this is so cursed 🥴

from einspect import view

t = (1, 2, 3, 4, 5)

view(t).swap(print)

(1, 2, 3, 4, 5)(print)
# (1, 2, 3, 4, 5)

modest dragon Feb 1, 2023, 8:19 PM

#

lovely new theme

#

looks very appealing

dusk comet Feb 1, 2023, 9:35 PM

#

Wolfram Mathematica has several ways to call a function:

f[x, y]
If the function takes only one argument, you can also write:
f @ x
x // f

This might be funny to implement in python
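Without patching the function type, the two spellings can be sketched with a small wrapper that defines `__matmul__` (for `f @ x`) and `__rfloordiv__` (for `x // f`). `Applicable` is a hypothetical name; the `x // f` form only works when the left operand's own `__floordiv__` returns `NotImplemented` for the wrapper, as `int`'s does:

```python
class Applicable:
    """Wrap a one-argument function so `f @ x` and `x // f` both mean `f(x)`."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, arg):      # f @ x
        return self.func(arg)

    def __rfloordiv__(self, arg):   # x // f, reached after int.__floordiv__ returns NotImplemented
        return self.func(arg)


@Applicable
def show(x):
    return f"> {x} <"


print(show @ 100)   # > 100 <
print(125 // show)  # > 125 <
print(show(7))      # > 7 <
```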

onyx shadow Feb 1, 2023, 9:47 PM

#

warm breach this is so cursed 🥴 ```py from einspect import view t = (1, 2, 3, 4, 5) view...

NO

warm breach Feb 1, 2023, 9:54 PM

#

dusk comet Wolfram Mathematica have several ways to call a function: 1) f[x, y] If function...

!e

from einspect import view

fn = lambda a, b: a(b)
view(type(fn))["__matmul__"] = fn
view(type(fn))["__rfloordiv__"] = fn

def show(x):
    print(">", x, "<")
    
show @ 100
125 // show

fallen slateBOT Feb 1, 2023, 9:55 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | > 100 <
002 | > 125 <

pliant tusk Feb 1, 2023, 10:20 PM

#

!e ```py
from fishhook import *

@hook(type(lambda:0), name='__matmul__')
@hook(type(lambda:0), name='__rfloordiv__')
def func(self, other):
    return self(other)

def show(x):
    print('>', x, '<')

show @ 100
125 // show
```

fallen slateBOT Feb 1, 2023, 10:20 PM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | > 100 <
002 | > 125 <

warm breach Feb 1, 2023, 10:35 PM

#

pliant tusk !e ```py from fishhook import * @hook(type(lambda:0), name='__matmul__') @hook(...

ah the name kwarg is nice, didn't know about that

pliant tusk Feb 1, 2023, 10:52 PM

#

warm breach ah the name kwarg is nice, didn't know about that

there is also a func kwarg, so you can hook like this hook(int, '__matmul__', lambda a, b: (a, b))

wind helm Feb 2, 2023, 7:36 AM

#

Morning guys
Quick question I hope. The inspect.getcallargs method is marked as deprecated, but in fact is very useful when you have the function reference and you want to verify whether arguments are compatible.

The guidelines suggest to use the bind or bind_partial but they implementation is different.

What could be the best workaround?
Using the functools.partialmethod?

wind helm Feb 2, 2023, 7:55 AM

#

Though, unless I'm misreading, I think I can still create a signature from a callable and then invoke the two methods
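That approach works: `inspect.getcallargs` has been deprecated since Python 3.5 in favor of `Signature.bind()`. A sketch of a getcallargs-style helper (`check_call` is a hypothetical name). Two differences worth knowing: `bind()` does not fill in defaults until you call `apply_defaults()`, and for a bound method `inspect.signature()` omits `self`, whereas `getcallargs` includes it:

```python
import inspect


def check_call(func, *args, **kwargs):
    """Return the argument mapping for a call to `func`, or raise TypeError."""
    bound = inspect.signature(func).bind(*args, **kwargs)  # TypeError if incompatible
    bound.apply_defaults()  # getcallargs also reports defaulted parameters
    return dict(bound.arguments)


def f(a, b=2, *rest, c, **extra):
    pass


print(check_call(f, 1, c=3))
# {'a': 1, 'b': 2, 'rest': (), 'c': 3, 'extra': {}}
```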

gray galleon Feb 2, 2023, 9:37 AM

#

is there a way to create signature object out of thin air like in the raku language?
won’t be of any use, just curious

flat gazelle Feb 2, 2023, 9:48 AM

#

gray galleon is there a way to create signature object out of thin air like in the raku langu...

see below

#

!e
you can just call the constructor afaict, e.g.

import inspect
print(inspect.Signature([inspect.Parameter('a', inspect.Parameter.POSITIONAL_ONLY)]))

fallen slateBOT Feb 2, 2023, 9:49 AM

#

@flat gazelle :white_check_mark: Your 3.11 eval job has completed with return code 0.

(a, /)

gray galleon Feb 2, 2023, 9:49 AM

#

seems verbose but whatever

flat gazelle Feb 2, 2023, 9:49 AM

#

you won't get the nice raku syntax, yeah

gray galleon Feb 2, 2023, 9:51 AM

#

the hard part is creating a function from that signature

flat gazelle Feb 2, 2023, 9:54 AM

#

yeah, that was a non-goal of the PEP afaict. https://peps.python.org/pep-0362/ It's just for introspection, not for constructing functions

PEP 362 – Function Signature Object | peps.python.org

Python Enhancement Proposals (PEPs)
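You can still get part of the way there: `inspect.signature()` honors a `__signature__` attribute, and `Signature.bind()` can enforce the call rules at runtime. A sketch under those assumptions (`make_function` is hypothetical, and the body receives the bound arguments as a dict rather than real local variables):

```python
import inspect


def make_function(sig, body):
    """Build a callable that enforces `sig` and passes bound arguments to `body`."""
    def func(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)   # raises TypeError on a bad call
        bound.apply_defaults()
        return body(dict(bound.arguments))
    func.__signature__ = sig  # inspect.signature() reports this instead of (*args, **kwargs)
    return func


P = inspect.Parameter
sig = inspect.Signature([
    P("a", P.POSITIONAL_ONLY),
    P("b", P.POSITIONAL_OR_KEYWORD, default=2),
])
add = make_function(sig, lambda ns: ns["a"] + ns["b"])

print(inspect.signature(add))  # (a, /, b=2)
print(add(1), add(1, b=10))    # 3 11
```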

empty echo Feb 2, 2023, 11:07 AM

#

warm breach petition to add `PyUnicode_GetInternDict` 😔

Is there a PyUnicode_Intern_And_Mangle?

deep nova Feb 2, 2023, 11:00 PM

#

Quick question

#

Python allows numerous forms of escape character

#

By name, by four digit unicode, by eight digit unicode, by two digit hex, or by two digit octal

#

Are all of these strictly necessary? I'm starting into the real meat of my lexer now, and I'm wondering what I should take pains to support
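For context, every form listed above can spell the same character in a Python `str` literal; only `\N{...}` requires a Unicode name database. (In `bytes` literals, only the `\x` and octal forms are recognized.) A quick check:

```python
# Five spellings of U+0041 LATIN CAPITAL LETTER A
spellings = [
    "\x41",                        # two hex digits
    "\101",                        # octal, one to three digits
    "\u0041",                      # four hex digits
    "\U00000041",                  # eight hex digits
    "\N{LATIN CAPITAL LETTER A}",  # Unicode character name
]
print(all(s == "A" for s in spellings))  # True
```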

spark magnet Feb 2, 2023, 11:02 PM

#

deep nova Are all of these strictly necessary? I'm starting into the real meat of my lexer...

not sure what you mean by "necessary": they are part of the language

deep nova Feb 2, 2023, 11:04 PM

#

They are a part of python

#

I'm working on building my own language. I can define whatever escape behaviour I please

spark magnet Feb 2, 2023, 11:04 PM

#

deep nova They are a part of *python*

i guess i don't know what lexer you are writing

spark magnet Feb 2, 2023, 11:04 PM

#

deep nova I'm working on building my own language. I can define whatever escape behaviour ...

oh, then you get to decide

deep nova Feb 2, 2023, 11:05 PM

#

Which is why I'm here. I'm wondering if some of the escape formats in Python are vestigial or else largely irrelevant. I'm just seeking context as to the design choices Python has made so I might decide how much to emulate

spark magnet Feb 2, 2023, 11:06 PM

#

deep nova Which is why I'm here. I'm wondering if some of the escape formats in Python are...

i would say, start simple, and add more later if you want to.

#

octal is definitely one to skip

deep nova Feb 2, 2023, 11:08 PM

#

The only times I've ever used escape sequences, other than simple things like \n or \t

#

Has been with unicode codepoints — \uXXXX

#

So I guess I'll start there 🙂

spark magnet Feb 2, 2023, 11:08 PM

#

good plan

feral island Feb 2, 2023, 11:09 PM

#

deep nova Are all of these strictly necessary? I'm starting into the real meat of my lexer...

the \x ones are very nice for writing bytes literals

#

I find the \N named literals useful for when you're dealing with unusual unicode characters

#

but these are definitely things you can add later

deep nova Feb 2, 2023, 11:10 PM

#

Indeed. I just want to be forward thinking is all 🙂

raven ridge Feb 2, 2023, 11:10 PM

#

deep nova By name, by four digit unicode, by eight digit unicode, by two digit hex, or by ...

!e ```py
print("\N{Slightly Smiling Face}")
```

fallen slateBOT Feb 2, 2023, 11:10 PM

#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

🙂

raven ridge Feb 2, 2023, 11:10 PM

#

those are so fun 🙂

deep nova Feb 2, 2023, 11:11 PM

#

Boss

feral island Feb 2, 2023, 11:11 PM

#

though probably a pain to put in your toy language because you need a unicode database at compile time 🙂

deep nova Feb 2, 2023, 11:12 PM

#

Unicode is non-negotiable

#

Long term, anyway

raven ridge Feb 2, 2023, 11:12 PM

#

you can represent every possible character using either the 8-character \U or the \N, so from that point of view, if what you care about is the minimum that's necessary, you can get by with only either one of those.

warm breach Feb 2, 2023, 11:12 PM

#

spark magnet octal is definitely one to skip

octal is useful for permissions bits though

Path(...).chmod(0o777)

spark magnet Feb 2, 2023, 11:13 PM

#

warm breach octal is useful for permissions bits though ```py Path(...).chmod(0o777) ```

that's not an escape!

raven ridge Feb 2, 2023, 11:13 PM

#

this is about string literals

deep nova Feb 2, 2023, 11:13 PM

#

😮

#

Shame

warm breach Feb 2, 2023, 11:13 PM

#

we have octal unicode literals...?

raven ridge Feb 2, 2023, 11:13 PM

#

The only octal escape I use in strings with any particular regularity is \0

deep nova Feb 2, 2023, 11:13 PM

#

warm breach we have octal unicode literals...?

That's what I said

spark magnet Feb 2, 2023, 11:14 PM

#

"\012" == "\n"

raven ridge Feb 2, 2023, 11:16 PM

#

Other than \0, the octal escapes are significantly less recognizable than the single-letter escapes like \n or the hex escapes

#

like, other than \0, I don't think there's another octal escape that readers will be able to parse more easily than if it was written in another way.

#

and even \0 is annoying if you're trying to use it in a string that should also contain ASCII digits

#

since if you're trying to immediately follow the \0 with an octal digit (0-7), you need to pad it out to \000

deep nova Feb 2, 2023, 11:19 PM

#

Honestly, I thought that \0 was just the null terminator

raven ridge Feb 2, 2023, 11:19 PM

#

so if I were greenfielding a language, I'd probably treat \0 as a special case that's equivalent to \U00000000, and then not accept any other octal escape.
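Following that suggestion and the plan to start with `\uXXXX`, a minimal escape decoder for a toy language might look like this. The supported set and the rule that `\0` may not be followed by a digit are hypothetical design choices, not Python's behavior:

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def decode_escapes(body: str) -> str:
    r"""Decode a string literal body for a hypothetical toy language.

    Supported escapes: \n \t \\ \" \0 and \uXXXX. Anything else is an error.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise SyntaxError("dangling backslash at end of literal")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind == "0":
            # \0 is the only octal-style escape; forbidding a following digit
            # means "\012" can never be read two different ways.
            if i + 2 < len(body) and body[i + 2] in "0123456789":
                raise SyntaxError(r"\0 must not be followed by a digit")
            out.append("\0")
            i += 2
        elif kind == "u":
            digits = body[i + 2 : i + 6]
            if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                raise SyntaxError(r"\u needs exactly four hex digits")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            raise SyntaxError(f"unknown escape \\{kind}")
    return "".join(out)


print(repr(decode_escapes(r"caf\u00e9\tok\0")))  # 'café\tok\x00'
```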

deep nova Feb 2, 2023, 11:19 PM

#

Was in a category of its own

#

Rockin

raven ridge Feb 2, 2023, 11:19 PM

#

deep nova Honestly, I thought that `\0` was just the null terminator

it's not, but that's probably most people's mental model for it.

#

at least, it's not in Python or C. There might be languages where it is. 🤷‍♂️

deep nova Feb 2, 2023, 11:20 PM

#

What is the null terminator in python?

raven ridge Feb 2, 2023, 11:20 PM

#

it is \0 - it's just that \0 isn't its own special category, it is an octal escape for U+0000

deep nova Feb 2, 2023, 11:21 PM

#

Ahh

raven ridge Feb 2, 2023, 11:22 PM

#

!e ```py
print("\U00000000" is "\0")
```

fallen slateBOT Feb 2, 2023, 11:22 PM

#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | <string>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
002 | True

raven ridge Feb 2, 2023, 11:24 PM

#

!e more interestingly, actually: print(repr("\0000"))

fallen slateBOT Feb 2, 2023, 11:24 PM

#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

'\x000'

raven ridge Feb 2, 2023, 11:25 PM

#

the input string is \000 followed by 0, the repr is \x00 followed by 0 - that shows that \0 is just a special case for how to spell \000

#

with the octal escapes, you can give up to 3 octal digits (0-7), and you must give all 3 if the character immediately after the escape sequence is an octal digit that shouldn't be interpreted as part of the escape.
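A few Python literals that show the greedy one-to-three-digit rule (only the digits 0-7 count as octal, so an `8` or `9` ends the escape early):

```python
print(repr("\12"))    # '\n'     octal 12 is decimal 10, a newline
print(repr("\1234"))  # 'S4'     \123 is 'S', then a literal '4'
print(repr("\0000"))  # '\x000'  \000, then a literal '0'
print(repr("\08"))    # '\x008'  8 is not octal, so \0 stops after one digit
```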

spark magnet Feb 2, 2023, 11:26 PM

#

deep nova What is the null terminator in python?

there is no terminator in Python strings.

raven ridge Feb 2, 2023, 11:28 PM

#

right - I meant that the character that is used as a null terminator in other languages can be written as \0 or \x00 or \U00000000 in Python. That may not have been clear.

#

it's also extremely rare to need to refer to that character in a Unicode string, though it's quite common to need to refer to it in a byte string.

warm breach Feb 3, 2023, 12:41 AM

#

spark magnet there is no terminator in Python strings.

it may be internally null terminated though right pithink at least for cpython

#

https://github.com/python/cpython/blob/3.11/Objects/unicodeobject.c#L1233 https://github.com/python/cpython/blob/3.11/Objects/unicodeobject.c#L1261-L1262

fallen slateBOT Feb 3, 2023, 12:44 AM

#

Objects/unicodeobject.c line 1233

```c
new_size = sizeof(Py_UNICODE) * ((size_t)length + 1);
```
`Objects/unicodeobject.c` lines 1261 to 1262
```c
_PyUnicode_WSTR(unicode)[0] = 0;
_PyUnicode_WSTR(unicode)[length] = 0;```

warm breach Feb 3, 2023, 12:46 AM

#

https://github.com/python/cpython/blob/3.11/Include/cpython/unicodeobject.h#L205 https://github.com/python/cpython/blob/3.11/Include/cpython/unicodeobject.h#L215

fallen slateBOT Feb 3, 2023, 12:46 AM

#

Include/cpython/unicodeobject.h line 205

```h
wchar_t *wstr;              /* wchar_t representation (null-terminated) */
```
`Include/cpython/unicodeobject.h` line 215
```h
char *utf8;                 /* UTF-8 representation (null-terminated) */```

spark magnet Feb 3, 2023, 12:51 AM

#

warm breach it may be internally null terminated though right <:pithink:652247559909277706> ...

I guess so, but you can't tell from the outside.

raven ridge Feb 3, 2023, 12:57 AM

#

yeah, it is for CPython, but that's an implementation detail.

#

that allows https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF8 to be returned as a null-terminated const char* "for free" for ASCII strings
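A CPython-only way to poke at this from Python is to call the C API through ctypes. `c_char_p` reads up to the first NUL byte, so getting exactly the string back is consistent with the UTF-8 buffer being NUL-terminated (this is an illustration, not something to rely on in real code):

```python
import ctypes

# CPython-only: call the C API function directly through ctypes.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]
as_utf8.restype = ctypes.c_char_p  # copies bytes up to (not including) the first NUL

s = "hello"
print(as_utf8(s))  # b'hello'
```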

frank cradle Feb 3, 2023, 5:58 AM

#

https://www.lasros.com/article/category/python

grave jolt Feb 3, 2023, 7:39 AM

#

warm breach octal is useful for permissions bits though ```py Path(...).chmod(0o777) ```

literally magic numbers 😔
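If the magic numbers bother you, the `stat` module has named constants that OR together into the same modes:

```python
import stat

# 0o777: read, write and execute for user, group and others
everyone_rwx = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO
# 0o755: owner rwx, group and others r-x
exec_mode = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH

print(oct(everyone_rwx), oct(exec_mode))       # 0o777 0o755
print(stat.filemode(stat.S_IFREG | exec_mode))  # -rwxr-xr-x
```

Either value can then be passed to `Path(...).chmod(...)` in place of the literal.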

dull ember Feb 3, 2023, 12:50 PM

#

hey

winged dagger Feb 3, 2023, 5:21 PM

#

!d vectorize

fallen slateBOT Feb 3, 2023, 5:21 PM

#

vectorize

The @vectorize decorator

Numba’s vectorize allows Python functions taking scalar input arguments to be used as NumPy ufuncs. Creating a traditional NumPy ufunc is not the most straightforward process and involves writing some C code. Numba makes this easy. Using the vectorize() decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.

Using vectorize(), you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs.

The vectorize() decorator has two modes of operation:

frigid bison Feb 3, 2023, 8:19 PM

#

I'm trying to use the exec statement in Python 2.7 for running a code object, but I'm getting this weird error that doesn't give any results on google.

```py
exec obj in {}
```

```
assert_builtin() takes exactly 1 argument (0 given)
```

#

any idea what it can be?

feral island Feb 3, 2023, 8:20 PM

#

frigid bison I'm trying to use the `exec` statement in Python 2.7 for running a code object, ...

you probably have an assert_builtin() function in the code object you are running

#

though the bigger question is, why are you running Python 2.7

frigid bison Feb 3, 2023, 8:21 PM

#

reverse engineering purposes which forces me to use the target's python version

#

so I have to rewrite my pyarmor unpacker 🙄

frigid bison Feb 3, 2023, 8:22 PM

#

feral island you probably have an `assert_builtin()` function in the code object you are runn...

ah, this is possible, but it does run when run normally

feral island Feb 3, 2023, 8:23 PM

#

maybe the code object you are running is for assert_builtin and you're not passing the args

frigid bison Feb 3, 2023, 8:23 PM

#

ah I have found the issue, it's Pyarmor specific

#

https://github.com/dashingsoft/pyarmor/blob/master/src/protect_code.pt

GitHub

pyarmor/protect_code.pt at master · dashingsoft/pyarmor

A tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts. - pyarmor/protect_code.pt at master · dashingsoft/pyarmor

#

thanks, I thought it was a Python error and not a library error

boreal umbra Feb 3, 2023, 8:33 PM

#

How difficult is it to deobfuscate pyarmored code, setting aside recovering sensible variable names?

frigid bison Feb 3, 2023, 9:27 PM

#

it's pretty easy since it's sort of a packer. It doesn't actually change the python bytecode, it just encrypts it and decrypts it at runtime.

#

You can read about it here https://github.com/Svenskithesource/PyArmor-Unpacker#write-up

deep nova Feb 4, 2023, 6:18 AM

#

Quick question

#

Does python's lexer treat -123 as a single token, or is it lexed as an operator and a numeric literal?

sacred yew Feb 4, 2023, 6:20 AM

#

deep nova Does python's lexer treat `-123` as a single token, or is it lexed as an operato...

https://docs.python.org/3/reference/lexical_analysis.html#integer-literals operator and literal

Python documentation

2. Lexical analysis

A Python program is read by a parser. Input to the parser is a stream of tokens, generated by the lexical analyzer. This chapter describes how the lexical analyzer breaks a file into tokens. Python...

deep nova Feb 4, 2023, 6:20 AM

#

Wonderful!

#

Thanks
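This is easy to confirm with the `tokenize` module: `-123` comes out as an `OP` token followed by a `NUMBER` token, and the parser turns the pair into a unary minus applied to the literal (CPython folds it into the constant `-123` later, at compile time):

```python
import ast
import io
import tokenize

tokens = tokenize.generate_tokens(io.StringIO("-123\n").readline)
pairs = [(tokenize.tok_name[t.type], t.string) for t in tokens]
print(pairs[:2])  # [('OP', '-'), ('NUMBER', '123')]

# The parser builds a UnaryOp(USub) whose operand is the literal 123.
expr = ast.parse("-123", mode="eval").body
print(type(expr).__name__, type(expr.op).__name__, expr.operand.value)  # UnaryOp USub 123
```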

warm breach Feb 4, 2023, 1:26 PM

#

!e

from fishhook import hook

@hook(list | str)
def __neg__(self):
    return self

fallen slateBOT Feb 4, 2023, 1:26 PM

#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 3, in <module>
003 |   File "/snekbox/user_base/lib/python3.11/site-packages/fishhook/fishhook.py", line 295, in wrapper
004 |     orig_val = vars(cls).get(name, NULL)
005 |                ^^^^^^^^^
006 | TypeError: vars() argument must have __dict__ attribute

warm breach Feb 4, 2023, 1:27 PM

#

@pliant tusk this segfaults on 3.10 somehow 🥴

#

#

did vars somehow work on unions in 3.10?

quick snow Feb 4, 2023, 1:28 PM

#

!e print(vars(list | str))

fallen slateBOT Feb 4, 2023, 1:29 PM

#

@quick snow :x: Your 3.10 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 1, in <module>
003 | TypeError: vars() argument must have __dict__ attribute

quick snow Feb 4, 2023, 1:29 PM

#

nope

pliant tusk Feb 4, 2023, 3:04 PM

#

warm breach <@274715613115711488> this segfaults on 3.10 somehow 🥴

Fishhook doesn't support unions at all rn

#

But ig I need to add protection from it

#

Or support

warm breach Feb 4, 2023, 3:04 PM

#

yeah I'm aware just wondering why 3.10 segfaults pithink

#

it seems the vars would stop it

pliant tusk Feb 4, 2023, 3:04 PM

#

Oh I have no idea

#

I'll take a look later

warm breach Feb 4, 2023, 3:05 PM

#

pliant tusk Or support

I'm gonna use this I think

def _to_types(
    types_or_unions: Sequence[type | UnionType],
) -> Generator[type, None, None]:
    """Yields types from a Sequence of types or unions."""
    for t in types_or_unions:
        if isinstance(t, UnionType):
            yield from _to_types(get_args(t))
        elif isinstance(t, type):
            yield t
        else:
            raise TypeError(f"cls must be a type or Union, not {t.__class__.__name__}")

#

you might want to use __args__ instead of typing.get_args if you don't want to rely on typing code breaking due to hooks though

pliant tusk Feb 4, 2023, 3:07 PM

#

I'll figure something out. Need to retain my 3.8 support

warm breach Feb 4, 2023, 3:08 PM

#

well you could support *classes which can be types or unions

#

so @hook(str, int) or @hook(str | int)

#

technically 3.8 can also do @hook(Union[str, int]) (but that's pretty awkward)

pliant tusk Feb 4, 2023, 3:08 PM

#

warm breach well you could support `*classes` which can be types or unions

Yea that's what I will probably do

warm breach Feb 4, 2023, 3:10 PM

#

warm breach I'm gonna use this I think ```py def _to_types( types_or_unions: Sequence[ty...

actually 3.8 doesn't have types.UnionType 😔

pliant tusk Feb 4, 2023, 3:14 PM

#

warm breach actually 3.8 doesn't have `types.UnionType` 😔

To be fair you can already stack @hook with fishhook

warm breach Feb 4, 2023, 3:33 PM

#

pliant tusk To be fair you can already stack `@hook ` with fishhook

yeah, it just gets kind of long vertically though

pliant tusk Feb 4, 2023, 3:49 PM

#

Yea true

warm breach Feb 4, 2023, 4:06 PM

#

pliant tusk Yea true

!e also apparently allocating PySequenceMethods for __getitem__ has an internal type check for int index?

from functools import partial
from einspect import impl

@impl(type)
def __getitem__(self, item):
    return partial(self, item)

print(map[int])

fallen slateBOT Feb 4, 2023, 4:06 PM

#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 8, in <module>
003 | TypeError: sequence index must be integer, not 'type'

feral island Feb 4, 2023, 4:07 PM

#

warm breach !e also apparently allocating `PySequenceMethods` for `__getitem__` has an inter...

I think the slot takes an int argument

pliant tusk Feb 4, 2023, 4:07 PM

#

warm breach !e also apparently allocating `PySequenceMethods` for `__getitem__` has an inter...

Yea, that calls sq_item(PyObject *, Py_ssize_t)

#

Need to use mp_subscript

warm breach Feb 4, 2023, 4:08 PM

#

I guess I'll add a flag to choose between sequence or mapping

pliant tusk Feb 4, 2023, 4:08 PM

#

Fishhook just allocates all of the structs it needs to on first run

#

Less checks needed

#

And also removes the danger from subclasses

warm breach Feb 4, 2023, 4:08 PM

#

https://github.com/ionite34/einspect/blob/main/src/einspect/structs/slots_map.py#L136-L141 https://github.com/ionite34/einspect/blob/main/src/einspect/structs/slots_map.py#L124-L128

fallen slateBOT Feb 4, 2023, 4:08 PM

#

src/einspect/structs/slots_map.py lines 136 to 141

```py
SLOTS_MAPPING: Final[dict[str, SlotsLike]] = {
    "__len__": tp_as_mapping["mp_length"],
    "__getitem__": tp_as_mapping["mp_subscript"],
    "__setitem__": tp_as_mapping["mp_ass_subscript"],
    "__delitem__": tp_as_mapping["mp_ass_subscript"],
}

src/einspect/structs/slots_map.py lines 124 to 128

SLOTS_SEQUENCE: Final[dict[str, SlotsLike]] = {
    "__len__": tp_as_sequence["sq_length"],
    "__add__": tp_as_sequence["sq_concat"],
    "__mul__": tp_as_sequence["sq_repeat"],
    "__getitem__": tp_as_sequence["sq_item"],

warm breach Feb 4, 2023, 4:08 PM

#

__getitem__ is in both so it finds the sequence one first

pliant tusk Feb 4, 2023, 4:09 PM

#

warm breach https://github.com/ionite34/einspect/blob/main/src/einspect/structs/slots_map.py...

That's another reason why fishhook acts the way it does

#

Do you handle subclasses safely in einspect

warm breach Feb 4, 2023, 4:09 PM

#

I guess you might want to not allocate everything?

#

not sure, I'll probably add an alloc="all" option as well

pliant tusk Feb 4, 2023, 4:10 PM

#

warm breach I guess you might want to not allocate everything?

Regular heap classes have all of them allocated

warm breach Feb 4, 2023, 4:10 PM

#

pliant tusk Do you handle subclasses safely in einspect

like for method hooks?

pliant tusk Feb 4, 2023, 4:10 PM

#

And when you start adding hooks, different places in Python start to make assumptions

pliant tusk Feb 4, 2023, 4:10 PM

#

warm breach like for method hooks?

Like, hooking int should affect bool

warm breach Feb 4, 2023, 4:11 PM

#

pliant tusk Like hook int should affect bool

isn't that just automatic?

#

!e

from einspect import impl

@impl(int)
def abc(self):
    return self

print((5).abc())
print(True.abc())

fallen slateBOT Feb 4, 2023, 4:11 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 5
002 | True

warm breach Feb 4, 2023, 4:11 PM

#

I don't think I did anything special

pliant tusk Feb 4, 2023, 4:12 PM

#

!e ```py
from fishhook import *

@hook(int)
def __getitem__(self, idx):
    return (self, idx)

print(True[0])```

fallen slateBOT Feb 4, 2023, 4:12 PM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | <string>:7: SyntaxWarning: 'bool' object is not subscriptable; perhaps you missed a comma?
002 | (True, 0)

warm breach Feb 4, 2023, 4:12 PM

#

oh hm

pliant tusk Feb 4, 2023, 4:12 PM

#

!e ```py
from einspect import impl

@impl(int)
def __getitem__(self, idx):
    return (self, idx)

print(True[0])```

fallen slateBOT Feb 4, 2023, 4:12 PM

#

@pliant tusk :x: Your 3.11 eval job has completed with return code 1.

001 | <string>:7: SyntaxWarning: 'bool' object is not subscriptable; perhaps you missed a comma?
002 | Traceback (most recent call last):
003 |   File "<string>", line 7, in <module>
004 | TypeError: 'bool' object is not subscriptable

warm breach Feb 4, 2023, 4:14 PM

#

does that work by allocating subclass PyMethods as well

pliant tusk Feb 4, 2023, 4:14 PM

#

warm breach does that work by allocating subclass PyMethods as well

depends on if einspect is using setattr or setting slot pointers manually

warm breach Feb 4, 2023, 4:15 PM

#

it setattrs after making it mutable

#

unless it's a static attribute

pliant tusk Feb 4, 2023, 4:15 PM

#

warm breach it setattrs after making it mutable

😬 then you currently may have unexpected memory corruption

warm breach Feb 4, 2023, 4:15 PM

#

like __name__ will manually set tp_name

pliant tusk Feb 4, 2023, 4:16 PM

#

you need to alloc the correct struct for all subclasses of a given type before setattr

warm breach Feb 4, 2023, 4:16 PM

#

yeah the alloc is before setattr

pliant tusk Feb 4, 2023, 4:16 PM

#

def allocate_structs(cls):
    cls_mem = getmem(cls)
    for subcls in type(cls).__subclasses__(cls):
        allocate_structs(subcls)
    for offset, size in get_structs():
        cls_mem[offset] = cls_mem[offset] or alloc(size)
    return cls_mem
you need to do it for subclasses too
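The traversal half of `allocate_structs` can be written in plain Python. `getmem`, `alloc` and `get_structs` above are not standard APIs, so this sketch only collects the classes whose structs would need allocating. `all_subclasses` is a hypothetical helper. The example shows that `bool` turns up as a static subclass of `int`, alongside any Python-level subclasses.

```py
def all_subclasses(cls):
    """Collect every transitive subclass of cls, each listed once."""
    found = []
    stack = [cls]
    while stack:
        current = stack.pop()
        # type(current).__subclasses__(current) also works when current is a
        # metaclass such as `type`, where current.__subclasses__() would resolve
        # to the unbound type.__subclasses__ and fail for lack of an argument.
        for sub in type(current).__subclasses__(current):
            if sub not in found:
                found.append(sub)
                stack.append(sub)
    return found


class A(int):
    pass


class B(A):
    pass


subs = all_subclasses(int)
print(bool in subs, A in subs, B in subs)
```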

warm breach Feb 4, 2023, 4:16 PM

#

https://github.com/ionite34/einspect/blob/main/src/einspect/views/view_type.py#L112-L118

fallen slateBOT Feb 4, 2023, 4:16 PM

#

src/einspect/views/view_type.py lines 112 to 118

# Check if this is a slots attr
if slot := get_slot(k):
    # Allocate sub-struct if needed
    self._try_alloc(slot)

with self.as_mutable():
    self._pyobject.setattr_safe(k, value)

pliant tusk Feb 4, 2023, 4:17 PM

#

fallen slate `src/einspect/views/view_type.py` lines 112 to 118 ```py # Check if this is a sl...

@impl won't affect subclasses then

warm breach Feb 4, 2023, 4:17 PM

#

pliant tusk ```py def allocate_structs(cls): cls_mem = getmem(cls) for subcls in typ...

I guess it doesn't matter if there is a python class extending that? since heap types have all PyMethods?

pliant tusk Feb 4, 2023, 4:18 PM

#

warm breach I guess it doesn't matter if there is a python class extending that? since heap ...

heap types yea, but static stuff

warm breach Feb 4, 2023, 4:19 PM

#

speaking of, is there a way to check if a type is a heaptype

#

is the HEAPTYPE flag still valid

#

or did that switch to IMMUTABLETYPE

pliant tusk Feb 4, 2023, 4:19 PM

#

HEAPTYPE tells you the structure type

#

IMMUTABLETYPE controls whether it allows setattr
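Both flags can be read from Python through `type.__flags__`. The bit values below are copied from CPython's `Include/object.h`, and `Py_TPFLAGS_IMMUTABLETYPE` only exists from 3.10 on. As far as I know, static types get `IMMUTABLETYPE` set automatically from 3.10 (bpo-43908), while a `class` statement produces a heap type that is mutable. The `is_heaptype` / `is_immutabletype` helpers are illustrative.

```py
import sys

# Flag bits from CPython's Include/object.h (IMMUTABLETYPE is 3.10+).
Py_TPFLAGS_IMMUTABLETYPE = 1 << 8
Py_TPFLAGS_HEAPTYPE = 1 << 9


def is_heaptype(cls):
    """True if the type object was allocated on the heap (e.g. a class statement)."""
    return bool(cls.__flags__ & Py_TPFLAGS_HEAPTYPE)


def is_immutabletype(cls):
    """True if setting or deleting attributes on the type is refused."""
    return bool(cls.__flags__ & Py_TPFLAGS_IMMUTABLETYPE)


class Foo:
    pass


print(is_heaptype(int), is_immutabletype(int))
print(is_heaptype(Foo), is_immutabletype(Foo))
```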

warm breach Feb 4, 2023, 4:20 PM

#

I guess only allocating non-HEAPTYPE subclasses would be fine?

pliant tusk Feb 4, 2023, 4:21 PM

#

warm breach I guess only allocating non-HEAPTYPE subclasses would be fine?

you should alloc for all subclasses to ensure type consistency

#

also you will need to add in some special handling for object

#

due to some assumptions made by the type constructor

warm breach Feb 4, 2023, 4:22 PM

#

what does object do 👀

pliant tusk Feb 4, 2023, 4:24 PM

#

internally when the type constructor walks backwards there are assumptions made about what a given struct having all of the tp_as_* structs allocated means

#

https://github.com/chilaxan/fishhook/blob/master/fishhook/fishhook.py#L101-L136 that's the reason for this


#

so much research went into making fishhook as close to stable as possible

warm breach Feb 4, 2023, 4:27 PM

#

pliant tusk also you will need to add in some special handling for `object`

also speaking of, do you know why some __new__ can't be accessed as a dunder and says something about not being safe

#

!e

from fishhook import hook, orig

@hook(int)
def __new__(self, *args):
    print("new int", args)
    return orig(int, *args)

int("2000")

fallen slateBOT Feb 4, 2023, 4:27 PM

#

@warm breach :x: Your 3.11 eval job has completed with return code 1.

001 | new int ('2000',)
002 | Traceback (most recent call last):
003 |   File "<string>", line 8, in <module>
004 |   File "<string>", line 6, in __new__
005 |   File "/snekbox/user_base/lib/python3.11/site-packages/fishhook/fishhook.py", line 246, in __call__
006 |     return get_cache_trace('orig', getframe(1))(*args, **kwargs)
007 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
008 | TypeError: int.__new__(int) is not safe, use object.__new__()

warm breach Feb 4, 2023, 4:27 PM

#

is there a way to call the original new here

#

could return object.__new__(self) like it suggests but the int will be 0

pliant tusk Feb 4, 2023, 4:28 PM

#

Not without rewrapping the original pointer

#

And hooking __new__ is very unsafe for any type and will fail in weird ways if the end user changes it even a little bit, so I didn't bother

pliant tusk Feb 4, 2023, 4:31 PM

#

pliant tusk And hooking new is very unsafe for any type and will fail in weird ways if the e...

To be fair I'm willing to make concessions like that in exchange for fishhook being dynamic enough that it rarely needs updates, even across versions

warm breach Feb 4, 2023, 4:31 PM

#

ah hm

#

apparently int.__new__ has the job of allocating its own array

#

I guess that's why it's not safe?

pliant tusk Feb 4, 2023, 4:32 PM

#

Yea

warm breach Feb 4, 2023, 4:33 PM

#

but you can still call int.__new__ normally no?

#

!e print(int.__new__(int, "123"))

fallen slateBOT Feb 4, 2023, 4:33 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

pliant tusk Feb 4, 2023, 4:37 PM

#

__new__ has weird handling

#

!e print(vars(int).__new__)

fallen slateBOT Feb 4, 2023, 4:37 PM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

<built-in method __new__ of type object at 0x7fabe4a87760>

warm breach Feb 4, 2023, 4:40 PM

#

pliant tusk new has weird handling

ah found it

#

https://github.com/python/cpython/blob/3.11/Objects/typeobject.c#L7152-L7161

grave jolt Feb 4, 2023, 4:40 PM

#

Does anyone know where I can find a list of all dependencies required to build CPython? I don't want to install them, just curious

fallen slateBOT Feb 4, 2023, 4:40 PM

#

Objects/typeobject.c lines 7152 to 7161

/* If staticbase is NULL now, it is a really weird type.
   In the spirit of backwards compatibility (?), just shut up. */
if (staticbase && staticbase->tp_new != type->tp_new) {
    PyErr_Format(PyExc_TypeError,
                 "%s.__new__(%s) is not safe, use %s.__new__()",
                 type->tp_name,
                 subtype->tp_name,
                 staticbase->tp_name);
    return NULL;
}

warm breach Feb 4, 2023, 4:41 PM

#

due to the hook, staticbase->tp_new != type->tp_new, so that triggered this
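The same check is easy to trip from plain Python, which may help when reading the C snippet. Asking `object`'s constructor to build an `int` is rejected. `int`'s own `__new__` is accepted, both for `int` and for a subclass that doesn't override `__new__`. `MyInt` is just an illustrative subclass.

```py
class MyInt(int):
    pass


# object's tp_new must not build an int: int's tp_new differs from object's,
# so tp_new_wrapper rejects the call.
message = None
try:
    object.__new__(int)
except TypeError as exc:
    message = str(exc)
print(message)

# Going through int's own __new__ is fine, including for MyInt, which has no
# __new__ of its own and therefore inherits int's tp_new.
print(int.__new__(int, "123"))
print(type(int.__new__(MyInt, "5")).__name__)
```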

warm breach Feb 4, 2023, 4:41 PM

#

grave jolt Anyone knows where I can find a list of all dependencies required to build CPyth...

https://devguide.python.org/getting-started/setup-building/#build-dependencies

grave jolt Feb 4, 2023, 4:42 PM

#

it doesn't list them

#

I mean stuff like... zlib

#

but that's the only one given

warm breach Feb 4, 2023, 4:43 PM

#

grave jolt I mean stuff like... `zlib`

these ones?

#

I'm not sure if there's an expanded list of pure deps

pliant tusk Feb 4, 2023, 4:44 PM

#

warm breach due to the hooking `staticbase->tp_new != type->tp_new` so that triggered this

yea

grave jolt Feb 4, 2023, 4:48 PM

#

warm breach these ones?

Those are all optional modules

#

like curses and sqlite

warm breach Feb 4, 2023, 4:49 PM

#

grave jolt Those are all _optional_ modules

well, zlib is also optional

grave jolt Feb 4, 2023, 4:49 PM

#

yeah I suppose

#

like... in a Python project you have a pyproject.toml or a requirements.txt listing the requirements

#

I don't know what the analogue is for C projects

warm breach Feb 4, 2023, 4:51 PM

#

grave jolt I don't know what's the analogue for C projects

makefile I guess, but even that is auto-generated for cpython

#

https://github.com/python/cpython/blob/main/configure.ac

#

so it's dynamic based on a lot of logic

#

it essentially makes a best attempt at seeing if it can build cpython with what you've got

warm breach Feb 4, 2023, 6:01 PM

#

pliant tusk yea

have this working now

from einspect import impl, orig

@impl(int)
def __new__(cls, *args):
    print("in new:", cls, args)
    return orig(int).__new__(cls, *args) + 100

print(int("50"))
# in new: <class 'int'> ('50',)
# 150
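For comparison, you can get the same "+100" behaviour without patching `int` itself by using only supported Python: a dedicated subclass. Unlike the einspect hook, this leaves plain `int(...)` calls untouched, which is exactly the part the hook changes. `Plus100` is a hypothetical name.

```py
class Plus100(int):
    """Hypothetical subclass standing in for the patched int.__new__."""

    def __new__(cls, *args):
        print("in new:", cls, args)
        # super().__new__ goes through int's real tp_new; adding 100 to the
        # resulting instance produces a plain int, like the hook's return value.
        return super().__new__(cls, *args) + 100


print(Plus100("50"))
print(int("50"))
```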

#

had to store tp_new directly and make my own slot wrapper for it, while modifying that safety check in the original one https://github.com/ionite34/einspect/blob/dev/src/einspect/structs/py_type.py#L231-L232

fallen slateBOT Feb 4, 2023, 6:02 PM

#

src/einspect/structs/py_type.py lines 231 to 232

def __call__(self, *args: tuple, **kwds: dict):
    """Implements `tp_new_wrapper` with a modified safety check."""

pliant tusk Feb 4, 2023, 6:03 PM

#

That'll do it

warm breach Feb 4, 2023, 6:03 PM

#

The original check claims to check that the most derived base that's not a heap type is this type, so I don't see why it even checks that tp_new is equal

#

https://github.com/ionite34/einspect/blob/dev/src/einspect/structs/py_type.py#L260

fallen slateBOT Feb 4, 2023, 6:04 PM

#

src/einspect/structs/py_type.py line 260

if staticbase and staticbase[0] != PyTypeObject.from_object(self._type):

warm breach Feb 4, 2023, 6:04 PM

#

I've just modified it to actually do the base type check it talked about

pliant tusk Feb 4, 2023, 6:05 PM

#

A lot of the internal type functions make a lot of assumptions

pliant tusk Feb 4, 2023, 6:07 PM

#

warm breach had to store `tp_new` directly and make my own slot wrapper for it, while modify...

Do you insert the new wrapper with CFUNCTYPE?

warm breach Feb 4, 2023, 6:07 PM

#

pliant tusk Do you insert the new wrapper with CFUNCTYPE?

PYFUNCTYPE https://github.com/ionite34/einspect/blob/dev/src/einspect/structs/include/object_h.py#L85

fallen slateBOT Feb 4, 2023, 6:07 PM

#

src/einspect/structs/include/object_h.py line 85

newfunc = PYFUNCTYPE(py_object, py_object, py_object, py_object)

pliant tusk Feb 4, 2023, 6:08 PM

#

Ah that's dangerous. Try raising an exception in a hooked new

warm breach Feb 4, 2023, 6:09 PM

#

pliant tusk Do you insert the new wrapper with CFUNCTYPE?

it's not inserted exactly, it's just an object returned on orig(...) calls

#

how impl works is unchanged

pliant tusk Feb 4, 2023, 6:09 PM

#

Ah

warm breach Feb 4, 2023, 6:10 PM

#

the original error happens in the slot wrapper of __new__ (tp_new_wrapper), which performs some sanity checks before letting you call tp_new

pliant tusk Feb 4, 2023, 6:11 PM

#

I'll probably use a different strategy to fix that hook. I don't want to hard code a specific dunder

pliant tusk Feb 4, 2023, 6:33 PM

#

warm breach the original error happens between the slot wrapper of `__new__`, which performs...

What happens if you hook int.__new__ to just print then return orig and pass it int('invalid')

warm breach Feb 4, 2023, 6:38 PM

#

pliant tusk What happens if you hook `int.__new__` to just print then return orig and pass ...

from einspect import impl, orig

@impl(int)
def __new__(cls, *args):
    print("in new:", cls, args)
    return orig(cls).__new__(cls, *args)

print(int("invalid"))

in new: <class 'int'> ('invalid',)
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    print(int("invalid"))
          ^^^^^^^^^^^^^^
  File "main.py", line 6, in __new__
    return orig(cls).__new__(cls, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "einspect/structs/py_type.py", line 271, in __call__
    return self._tp_new(subtype, args, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'invalid'

Process finished with exit code 1

pliant tusk Feb 4, 2023, 6:40 PM

#

And self._tp_new is a PYFUNCTYPE wrapper?

warm breach Feb 4, 2023, 6:40 PM

#

it's PyTypeObject(int).tp_new accessed before the hook, and then cast to its own type (PYFUNCTYPE)

#

since without the cast it will still be the same pointer attached to the ctypes Structure

pliant tusk Feb 4, 2023, 6:41 PM

#

Hmm

#

I thought it might fail due to the Ignored Exception ctypes thing, guess not
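For reference, this is how ctypes treats PYFUNCTYPE foreign functions. Because the prototype carries the Python-API flag, ctypes checks the Python error indicator after the call and raises the pending exception. That would explain why the ValueError above propagates normally. The "Exception ignored on calling ctypes callback function" behaviour applies to Python callbacks invoked from C, not to calls made in this direction. The sketch below binds two real C-API functions through `ctypes.pythonapi` and is CPython only.

```py
import ctypes

# PYFUNCTYPE prototypes bound by (name, library). The Python-API flag means the
# GIL is held during the call and the error indicator is checked afterwards.
PyLong_FromLong = ctypes.PYFUNCTYPE(ctypes.py_object, ctypes.c_long)(
    ("PyLong_FromLong", ctypes.pythonapi)
)
PyNumber_Index = ctypes.PYFUNCTYPE(ctypes.py_object, ctypes.py_object)(
    ("PyNumber_Index", ctypes.pythonapi)
)

print(PyLong_FromLong(42))
try:
    # PyNumber_Index sets TypeError and returns NULL for a str argument.
    PyNumber_Index("not a number")
except TypeError as exc:
    print("propagated:", exc)
```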

warm breach Feb 4, 2023, 8:01 PM

#

pliant tusk Hmm

when fishhook hooks a method, does orig always resolve to the original hooked type or a subtype when called?

pliant tusk Feb 4, 2023, 8:03 PM

#

orig walks up the chain by one.

#

That's how nested hooks work
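To illustrate "orig walks up the chain by one" without touching C slots, here is a pure-Python sketch on an ordinary class. The `hook` decorator is hypothetical and is not fishhook's API. Each hook captures whatever implementation was installed before it and receives that implementation explicitly as `orig`, so nested hooks unwind one level at a time.

```py
def hook(cls, name):
    """Hypothetical helper: install func as cls.<name>, passing the previous
    implementation in as its first argument ("orig")."""
    def decorator(func):
        previous = getattr(cls, name)  # whatever was installed before this hook

        def wrapper(*args, **kwargs):
            return func(previous, *args, **kwargs)

        setattr(cls, name, wrapper)
        return wrapper
    return decorator


class Greeter:
    def greet(self):
        return "hello"


calls = []


@hook(Greeter, "greet")
def add_bang(orig, self):
    calls.append("add_bang")
    return orig(self) + "!"


@hook(Greeter, "greet")
def shout(orig, self):
    calls.append("shout")
    return orig(self).upper()


print(Greeter().greet())
print(calls)
```

The most recent hook runs first and each `orig` call steps down exactly one level. Because `previous` is captured at install time rather than looked up on the class at call time, an `orig` can never resolve back to the hook that is calling it.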

warm breach Feb 4, 2023, 8:05 PM

#

currently this is an infinite loop, since Foo was never hooked and does not have a cached __new__, so orig(Foo).__new__ returns itself

from einspect import impl, orig

@impl(object)
def __new__(cls, *args, **kwargs):
    print("in new:", cls, args)
    return orig(cls).__new__(cls, *args, **kwargs)

class Foo:
    ...

print(Foo())

#

not sure how best to fix this

pliant tusk Feb 4, 2023, 8:08 PM

#

!e ```py
from fishhook import *

@hook(int)
@hook(int)
def add(self, other):
print(self, other)
return orig(self, other)

x = 1
1 + x ```

fallen slateBOT Feb 4, 2023, 8:08 PM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 1 1
002 | 1 1

pliant tusk Feb 4, 2023, 8:09 PM

#

warm breach currently this is an infinite loop, since `Foo` was never hooked and does not ha...

You'll have other issues with that unless you have implemented your object patch

#

try hooking __getitem__ on object, then make a new class. (AFAIK that's one trigger of the issue)

pliant tusk Feb 4, 2023, 8:13 PM

#

warm breach currently this is an infinite loop, since `Foo` was never hooked and does not ha...

That's another reason why I hadn't bothered with new hooks yet, there isn't an elegant solution to orig

warm breach Feb 4, 2023, 8:15 PM

#

pliant tusk That's another reason why I hadn't bothered with new hooks yet, there isn't an e...

currently I've special cased any orig(...).__new__ calls trying to return non-builtin function types to instead return the wrapped orig(object).__new__, but that feels a bit 🥴

pliant tusk Feb 4, 2023, 8:15 PM

#

warm breach currently I've special cased any `orig(...).__new__` calls trying to return non-...

Yea exactly, not elegant lol

warm breach Feb 4, 2023, 8:16 PM

#

pliant tusk try hooking object getitem then make a new class. (AFAIK that's one trigger of t...

it's stuck in some infinite loop if I do that pithink

pliant tusk Feb 4, 2023, 8:17 PM

#

If I remember correctly, any hook on object will cause that issue without the patch

pliant tusk Feb 4, 2023, 8:17 PM

#

warm breach its stuck in some infinite loop if I do that <:pithink:652247559909277706>

Recursive segfault?

warm breach Feb 4, 2023, 8:17 PM

#

well segfaults if I interrupt

#

otherwise keeps going

pliant tusk Feb 4, 2023, 8:17 PM

#

Try like __matmul__

#

That should also trigger it

warm breach Feb 4, 2023, 8:18 PM

#

pliant tusk That should also trigger it

it seems it triggers on any PyMethod allocation for object

pliant tusk Feb 4, 2023, 8:18 PM

#

Yea

warm breach Feb 4, 2023, 8:18 PM

#

don't actually have to set the attr

pliant tusk Feb 4, 2023, 8:18 PM

#

That's the same issue I ran into with fishhook

warm breach Feb 4, 2023, 8:19 PM

#

does your patch fix that?

pliant tusk Feb 4, 2023, 8:19 PM

#

Yea

#

It patches object and inserts a fake class at the top of the inheritance chain

warm breach Feb 4, 2023, 8:21 PM

#

do you know where in the source code this happens

#

I assume something loops until the base type and then checks whether pymethods are null?

pliant tusk Feb 4, 2023, 8:21 PM

#

I have it written down somewhere, I'll look for it when I get back on my laptop

pliant tusk Feb 4, 2023, 8:23 PM

#

warm breach I assume something loops until the base type and then checks whether pymethods a...

If I remember correctly, it loops until a given class has a null tp_base

warm breach Feb 4, 2023, 8:23 PM

#

pithink how does the pymethod alloc affect that

pliant tusk Feb 4, 2023, 8:24 PM

#

Once it finds that class it has some assumptions about it

#

I need to find the source for it, I'll ping when I find my notes

grave jolt Feb 4, 2023, 8:37 PM

#

warm breach makefile I guess, but even that is auto generated for cpython

why does every C/C++ project have their own complicated build system that can be its own project...

warm breach Feb 4, 2023, 8:37 PM

#

grave jolt why does every C/C++ project has their own complicated build system that can be ...

I have no idea 😔

#

I'm currently trying to somehow get CLion's debugger to recognize cpython as a project

#

since it doesn't use make on windows...?

warm breach Feb 4, 2023, 8:39 PM

#

grave jolt why does every C/C++ project has their own complicated build system that can be ...

oh yeah if you think ./configure is complex, have a look at https://github.com/python/cpython/tree/3.11/PCbuild

#

#

just 26k lines, nothing to see here

grave jolt Feb 4, 2023, 8:41 PM

#

what a terrible day to have brain

pliant tusk Feb 4, 2023, 8:43 PM

#

https://github.com/python/cpython/blob/a89e6713c4de99d4be5a1304b134e57a24ab10ac/Objects/typeobject.c#L6264-L6293


#

@warm breach this is where the bug happens

#

It gets base->tp_base

#

On object that is NULL
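The walk described here follows `tp_base`, which Python exposes as `__base__`. A tiny sketch shows that `object` is where the chain stops. The `base_chain` helper is just for illustration.

```py
def base_chain(cls):
    """Follow __base__ (tp_base) until it is None (NULL in C)."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = cls.__base__
    return chain


print(base_chain(bool))
print(object.__base__)
```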

warm breach Feb 4, 2023, 8:44 PM

#

oh

#

because tp_as_number is defined

pliant tusk Feb 4, 2023, 8:44 PM

#

Yea

warm breach Feb 4, 2023, 8:44 PM

#

oh actually there's one for everything

pliant tusk Feb 4, 2023, 8:45 PM

#

So my patch adds a base for object that has no tp_as_* structs defined

flat gazelle Feb 4, 2023, 8:45 PM

#

grave jolt why does every C/C++ project has their own complicated build system that can be ...

because makefiles suck and nothing else is even vaguely standard

warm breach Feb 4, 2023, 8:45 PM

#

how bad would setting object->tp_base to itself be 👀

pliant tusk Feb 4, 2023, 8:46 PM

#

warm breach how bad would setting object->tp_base to itself be 👀

It would crash

#

Recursion bug

#

I tried that lol

pliant tusk Feb 4, 2023, 8:47 PM

#

warm breach I'm currently trying to somehow get CLion's debugger to recognize cpython as a p...

Should be able to just clone the repo locally, run a python process, then open the project in clion and attach the debugger to the process

#

But you really should build it yourself, otherwise it will not link up to the files since it won't have debug symbols

warm breach Feb 4, 2023, 8:47 PM

#

I've attached to the executable I built but it drops into disassembly instead of source :(

pliant tusk Feb 4, 2023, 8:48 PM

#

Yea you need to be running a version of python built with debug symbols

warm breach Feb 4, 2023, 8:48 PM

#

I think I built it with -d?

#

clion still says project not configured

pliant tusk Feb 4, 2023, 8:49 PM

#

./configure with no args should still compile with debug symbols (-g); add --with-pydebug for a full debug build. Then make -j4 will build using 4 parallel jobs

pliant tusk Feb 4, 2023, 8:49 PM

#

warm breach clion still says project not configured

Yea that's normal, since you are not trying to configure cpython for cmake

warm breach Feb 4, 2023, 8:50 PM

#

pliant tusk ./configure with no args is debug build. Then make -j4 will build using 4 thread...

I'm using PCbuild\build.bat -e -d though

pliant tusk Feb 4, 2023, 8:50 PM

#

And then running the python.exe that builds?

warm breach Feb 4, 2023, 8:50 PM

#

yeah

pliant tusk Feb 4, 2023, 8:51 PM

#

-d should make python_d.exe

#

https://github.com/python/cpython/blob/a89e6713c4de99d4be5a1304b134e57a24ab10ac/PCbuild/readme.txt#L38

fallen slateBOT Feb 4, 2023, 8:52 PM

#

PCbuild/readme.txt line 38

using this configuration have "_d" added to their name:

warm breach Feb 4, 2023, 8:52 PM

#

yeah

#

pliant tusk Feb 4, 2023, 8:52 PM

#

Hmm

#

And you have the cpython project open in clion?

warm breach Feb 4, 2023, 8:53 PM

#

if I do a ctypes.string_at(0) it drops into assembly pithink

pliant tusk Feb 4, 2023, 8:54 PM

#

Can you step up frames to get to c files?

#

Tbh I have always used my MacBook for debugging builds so I haven't ever dealt with windows weirdness

#

Is it possible that clion cannot parse the debug symbols in the windows build?

warm breach Feb 4, 2023, 8:56 PM

#

maybe I should use vs

#

https://learn.microsoft.com/en-us/visualstudio/python/debugging-mixed-mode-c-cpp-python-in-visual-studio?view=vs-2022

Mixed-mode debugging for Python - Visual Studio (Windows)

Simultaneously debug C++ and Python in Visual Studio including stepping between environments, viewing values, and evaluating expressions.

#

apparently they have some combined python / c debug feature pithink

#

pliant tusk Feb 4, 2023, 8:57 PM

#

Woah that looks like I should start using VS

warm breach Feb 4, 2023, 10:06 PM

#

pliant tusk Woah that looks like I should start using VS

oh wow

#

this is really cool

pliant tusk Feb 4, 2023, 10:09 PM

#

visual studio or visual studio code?

warm breach Feb 4, 2023, 10:13 PM

#

pliant tusk visual studio or visual studio code?

visual studio

#

#

can be anywhere as well

#

if I make a breakpoint in PyList_New and make a list literal

#

it can break into that

pliant tusk Feb 4, 2023, 10:13 PM

#

oh nice

#

that's what my clion workflow is like

deep nova Feb 5, 2023, 4:47 AM

#

Could someone explain to me, in human terms, how Python handles "logical" and "physical" lines?

#

Like, does python perform a split operation at newlines when it starts lexing, and joins lines according to rules as they are lexed?

#

Or, does it separate as it lexes? Or, is the distinction between a logical line and physical line simply an explanatory tool?

rose schooner Feb 5, 2023, 4:56 AM

#

deep nova Or, does it separate as it lexes? Or, is the distinction between a logical line ...

it just does that

feral island Feb 5, 2023, 4:57 AM

#

deep nova Could someone explain to me, in human terms, how Python handles "logical" and "p...

I don't think the lexer/parser know about logical vs. physical lines

#

the tokenizer generates INDENT and DEDENT tokens

rose schooner Feb 5, 2023, 4:57 AM

#

the python tokenizer returns a NEWLINE token representing a single \n

raven ridge Feb 5, 2023, 4:57 AM

#

seems like just an explanatory tool to me. At the grammar level, there's just statements. Sometimes a newline ends a statement, sometimes it doesn't.

feral island Feb 5, 2023, 4:58 AM

#

the tokenizer also keeps track of bracket nesting, so within brackets newline tokens aren't treated as statement separators

deep nova Feb 5, 2023, 4:58 AM

#

XD I've read through the parts on physical/logical lines many times, and honestly, it just confuses me more and more as I learn more about the job of a lexer

#

If it were me, I'd strip it from the docs and instead include a section on python's INDENT and DEDENT handling machinery

deep nova Feb 5, 2023, 4:59 AM

#

feral island the tokenizer also keeps track of bracket nesting, so within brackets newline to...

HMMMMMMMMMMMMM

rose schooner Feb 5, 2023, 4:59 AM

#

rose schooner the python tokenizer returns a `NEWLINE` token representing a single `\n`

i don't exactly know the conditions under which it doesn't do this
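The pure-Python `tokenize` module makes the distinction visible. A NEWLINE token ends a logical line. A line break inside brackets, or a blank line, produces an NL token instead, which the parser ignores. The C tokenizer used by the compiler simply skips those breaks rather than emitting NL, so this is an approximation of what the parser sees. `token_kinds` is a small illustrative helper.

```py
import io
import tokenize


def token_kinds(source):
    """Names of the token types the tokenize module produces for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# One statement split across two physical lines, a blank line, then another statement.
source = "x = (1,\n     2)\n\ny = 3\n"
for kind in token_kinds(source):
    print(kind)
```

Only the two line breaks that end a statement come out as NEWLINE. The break inside the parentheses and the blank line come out as NL, and no INDENT is produced for the whitespace before `2`.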

deep nova Feb 5, 2023, 5:00 AM

#

Is there any reason that Python offloads this work to the lexer? It seems like a job for the parser, or even the semantics machine

#

One of those "it seemed like a good idea at the time" type things, perhaps?

feral island Feb 5, 2023, 5:01 AM

#

it's probably a lot easier this way. I'm not sure what you even mean by "the semantics machine" here

deep nova Feb 5, 2023, 5:01 AM

#

Semantic Analyzer

#

But I have trouble spelling "Analyzer"

feral island Feb 5, 2023, 5:02 AM

#

sure, but we're talking about determining how to split the program into statements, right?

deep nova Feb 5, 2023, 5:03 AM

#

Well, at the level of lexing we aren't even doing that

#

We're just trying to figure out where one thing ends and the next starts, whatever those things might be

raven ridge Feb 5, 2023, 5:04 AM

#

at the level of lexing, we're breaking down the input into token streams which can be fed to the parser. That token stream needs to include indent somehow: leading whitespace is semantically significant in Python, so if the token stream didn't include it, the parser wouldn't have enough information to operate

#

every syntactically significant feature of the language needs to somehow be preserved by the lexer

deep nova Feb 5, 2023, 5:18 AM

#

True, but actually counting and enforcing indentation in the lexer is quite different from recognizing leading whitespace. The latter just means recognizing a newline followed by some number of space or tab characters. You could then poop out a "linebreak" token

raven ridge Feb 5, 2023, 6:59 AM

#

deep nova True, but actually counting and enforcing indentation in the lexer is quite diff...

it's not enough to just spit out a "linebreak" token, you also need to spit out a token that indicates how indented the next line is.
```py
def func():
    if something:
        foo()
        bar()
```
means something different than
```py
def func():
    if something:
        foo()
    bar()
```

So either you need to emit tokens for all whitespace, or at least for leading whitespace - and the token for whitespace would need to indicate what whitespace it was, in order for the parser to be able to detect increases or decreases in depth, as well as errors resulting from mismatched whitespace (like starting one line with 8 spaces and the next line with a tab)
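The usual answer, and the one the Python language reference describes, is a stack of indentation widths kept by the tokenizer. Here is a minimal sketch. `indent_tokens` is a hypothetical helper that handles spaces only, with no tabs, brackets or continuation lines. It turns two versions of `func` that differ only in `bar()`'s indentation into different token streams, and it rejects a dedent that matches no outer level.

```py
def indent_tokens(lines):
    """Turn leading spaces into INDENT/DEDENT markers using a stack of widths.

    Simplified: spaces only (no tabs), no brackets or continuation lines.
    """
    stack = [0]
    tokens = []
    for lineno, line in enumerate(lines, 1):
        body = line.lstrip(" ")
        if not body.strip():
            continue  # blank lines never change the indentation level
        width = len(line) - len(body)
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        else:
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            if width != stack[-1]:
                raise IndentationError(
                    f"line {lineno}: unindent does not match any outer indentation level"
                )
        tokens.append(body.strip())
    tokens.extend("DEDENT" for _ in stack[1:])  # close blocks still open at EOF
    return tokens


nested = ["def func():", "    if something:", "        foo()", "        bar()"]
dedented = ["def func():", "    if something:", "        foo()", "    bar()"]
print(indent_tokens(nested))
print(indent_tokens(dedented))
```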

deep nova Feb 5, 2023, 7:42 AM

#

Seems doable.

#

This isn't me dumping on Python's strategy, btw

#

I'm just trying to think it all through

#

I'm conflicted about adding such functionality to my own lexer. On the one hand, the python approach works quite well, is simple, and is intuitive. On the other, I'm hesitant to employ a machine whose job is to recognize but not interpret for the purposes of interpreting

feral island Feb 5, 2023, 7:47 AM

#

my intuition is that Python's approach is quite a bit simpler than what you are suggesting. I'm sure other approaches can be made to work though.

#

in your approach the parser would need to deal with whitespace virtually everywhere in expressions

deep nova Feb 5, 2023, 7:50 AM

#

Not necessarily

#

On recognizing a newline, the lexer could be triggered to consume space and tab characters, emitting a token for each until a non-whitespace character is encountered

#

Or, a composite token containing both the newline and the leading whitespace could be generated. At a later step, the semantic analyzer could simply look at each newline token's literal to count what's there
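The composite-token idea can be sketched with a regex lexer. The NEWLINE token carries the line break plus the following spaces and tabs in its text, and a later stage reads the width from that literal. The token names and the `lex` / `indent_width` helpers are hypothetical. Counting a tab as one column is a simplification.

```py
import re

# NEWLINE swallows the line break *and* the following spaces/tabs, so the
# leading whitespace travels inside the token's text.
TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\n[ \t]*)"
    r"|(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<OP>[():=])"
    r"|(?P<SKIP>[ \t]+)"
)


def lex(source):
    """Return (kind, text) pairs; SKIP whitespace between tokens is dropped."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def indent_width(token):
    """Later stage: measure indentation from a NEWLINE token's literal text."""
    kind, text = token
    if kind != "NEWLINE":
        raise ValueError(f"expected NEWLINE, got {kind}")
    return len(text) - 1  # drop the "\n"; a tab counts as one column here


tokens = lex("if x:\n    y\n")
print(tokens)
print(indent_width(tokens[3]))
```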

#

I'm also toying around with lazy lexing such that the lexer accepts contextual cues from the parser. I'm hoping it'll make life easier for parsing fstrings. I could always leverage that to toggle on/off whitespacing behaviour

flat gazelle Feb 5, 2023, 7:54 AM

#

Be careful: if you do that, you no longer have a context-free grammar. It is generally easier to slightly extend the lexer to deal with indentation than to extend the (much more complex) parser

deep nova Feb 5, 2023, 7:54 AM

#

Well, it's six of one,

#

half a dozen of the other, if that's true. I can either add state to my lexer, at which point the language it represents is no longer regular, or otherwise sully the context-freedom of the parser

#

Though, I'm curious — why would this make the parser not context free?

#

Oh — because instead of a purely functional flowing from state to state within the grammar/parser, I'm making choices based on the contents of the token

flat gazelle Feb 5, 2023, 7:57 AM

#

Because you need to remember the indentation state, and that's not possible in CFG. You can't somehow "interleave" the various indentation levels with the actual grammar, nor can you pass the indentation to e.g. the pass statement.

deep nova Feb 5, 2023, 7:57 AM

#

Hmmm

#

I mean, my instincts tell me that the right place to handle indentation rules is actually in the semantic analysis. That's a contextual domain anyway.

That said, I'm just spitballing. Python's approach seems quite effective

flat gazelle Feb 5, 2023, 7:58 AM

#

If you have context, it should be possible afaik, but I never tried.

#

Semantic analysis generally happens after parsing

#

So then the question is what good is your parsing if it can't even tell you what's in a while loop

#

You could parse each statement separately and reconstruct blocks afterwards probably. Which would be interesting

deep nova Feb 5, 2023, 8:02 AM

#

I'm honestly not sure it's even that complicated. Specific parser functions can be configured to automatically ignore whitespace tokens, others can be configured to accept it (I'm thinking decorators, here)
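The decorator idea can be sketched as a token cursor that skips whitespace tokens by default, plus a decorator that switches skipping off while a particular parse function runs. Everything here (`Token`, `TokenStream`, `whitespace_sensitive`, `parse_indent`, `parse_name`) is a hypothetical design and not an existing library.

```py
import functools
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    text: str


class TokenStream:
    """Token cursor that skips WS tokens unless told otherwise."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.skip_ws = True

    def peek(self):
        while (self.skip_ws and self.pos < len(self.tokens)
               and self.tokens[self.pos].kind == "WS"):
            self.pos += 1
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is not None:
            self.pos += 1
        return token


def whitespace_sensitive(parse_fn):
    """Let parse_fn see WS tokens, restoring the previous mode afterwards."""
    @functools.wraps(parse_fn)
    def wrapper(stream, *args, **kwargs):
        previous = stream.skip_ws
        stream.skip_ws = False
        try:
            return parse_fn(stream, *args, **kwargs)
        finally:
            stream.skip_ws = previous
    return wrapper


@whitespace_sensitive
def parse_indent(stream):
    token = stream.peek()
    if token is not None and token.kind == "WS":
        stream.advance()
        return len(token.text)
    return 0


def parse_name(stream):
    token = stream.advance()
    if token is None or token.kind != "NAME":
        raise SyntaxError(f"expected NAME, got {token}")
    return token.text


stream = TokenStream([Token("WS", "    "), Token("NAME", "foo"),
                      Token("WS", " "), Token("NAME", "bar")])
print(parse_indent(stream))                    # sees the leading WS token
print(parse_name(stream), parse_name(stream))  # WS between names is skipped
```

Only functions marked with the decorator pay attention to whitespace. Everything else parses as if the lexer had dropped it. The catch pointed out further down still applies: a parser generator is unlikely to support this kind of mode switching.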

#

If anything, having a more detailed token stream would make accurate parsing even easier.

#

Don't listen to me though: I'm still very, very new at this. All I can say is that I intend to experiment the hell out of this

flat gazelle Feb 5, 2023, 8:04 AM

#

It probably isn't all things considered. You will give up parser generators, but hand rolling a parser isn't too bad.

deep nova Feb 5, 2023, 8:04 AM

#

Give up parser generators?

flat gazelle Feb 5, 2023, 8:06 AM

#

A parser generator is unlikely to cope with your switching between whitespace and non-whitespace tokens

gritty glacier Feb 5, 2023, 9:09 AM

#

how exactly does the _Printer class in cpython work
So i wanna overwrite or replace the data and filenames attributes in the _Printer class in cpython, but when i try to change them nothing happens... am i doing it wrong?
license._Printer__filenames = ["/test"] this should print the contents of the test file when license is called, right? https://github.com/python/cpython/blob/main/Lib/_sitebuiltins.py

dusk comet Feb 5, 2023, 9:55 AM

#

gritty glacier how exactly does the _Printer class in cpython work So i wanna overwrite or repl...

You can call __init__ again

gritty glacier Feb 5, 2023, 9:56 AM

#

oh

dusk comet Feb 5, 2023, 9:56 AM

#

license.__init__(...)

dusk comet Feb 5, 2023, 9:57 AM

#

gritty glacier how exactly does the _Printer class in cpython work So i wanna overwrite or repl...

If you are doing this, you should call .__setup manually

gritty glacier Feb 5, 2023, 9:57 AM

#

dusk comet If you are doing this, you should call `.__setup` manually

but .__setup is being called in the __call__ method right?

dusk comet Feb 5, 2023, 9:59 AM

#

Hmm, indeed

#

Are you sure that your file exists?

gritty glacier Feb 5, 2023, 10:00 AM

#

yes im pretty sure, i mean you can try it too in a python interpreter xD it's nothing serious, just wondering why it's not working

#

i initially assumed it's because of the OSError

#

it just passes if it hits it

#

but it exists

#

the file

#

okay nvm mb
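The thread ends without a stated cause. One detail worth knowing, based on reading Lib/_sitebuiltins.py (treat it as an assumption if your CPython version differs): _Printer.__setup returns early once __lines has been filled, so changing _Printer__filenames after the text has been shown once has no visible effect until that cache is cleared. A minimal sketch using a private _Printer instance and two temporary files:

```python
import os
import tempfile

import _sitebuiltins  # CPython-internal module where _Printer is defined

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.txt")
    second = os.path.join(tmp, "second.txt")
    for path, text in ((first, "first file"), (second, "second file")):
        with open(path, "w", encoding="utf-8") as fp:
            fp.write(text)

    printer = _sitebuiltins._Printer("demo", "fallback text", files=["first.txt"], dirs=[tmp])
    shown_first = repr(printer)  # __setup reads first.txt and caches its lines

    printer._Printer__filenames = [second]
    shown_cached = repr(printer)  # __setup returns early: the cache is still filled

    printer._Printer__lines = None  # empty the cache so the file list is read again
    shown_second = repr(printer)

print(shown_first)   # first file
print(shown_cached)  # first file
print(shown_second)  # second file
```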

lone sun Feb 5, 2023, 3:32 PM

#

deep nova I mean, my instincts tell me that the right place to handle indentation rules is...

I think your instincts are misleading you on this one. The indentation rules have a semantic effect, but so does everything else in the language; it does not follow that everything should be done during the semantic analysis. I think a good rule to follow is: What do you want to do when you encounter invalid input? Suppose that someone asks you to parse:

def factorial(x):
    if x == 0:
        return 1
     else:
        return x * factorial(x-1)

Notice the incorrect indentation of the else. When the interpreter is given this input, it fails with an IndentationError immediately upon encountering the line with the else. (You can observe this by cut-and-pasting into the REPL.) It doesn't wait for the function to be completely defined.
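A minimal way to see this outside the REPL, using only the builtin compile(): just the first four lines of the example are compiled, so the function is never finished, yet the IndentationError is still raised at the else line.

```python
# Only the first four lines of the example: the function body is never completed.
partial_source = (
    "def factorial(x):\n"
    "    if x == 0:\n"
    "        return 1\n"
    "     else:\n"
)

caught = None
try:
    compile(partial_source, "<example>", "exec")
except IndentationError as exc:
    caught = exc  # raised while reading the else line; nothing after it is needed

print(type(caught).__name__, caught.lineno, caught.msg)
```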

#

I think you might appreciate Python's strategy more if you look at it as a two-pass grammar. The first pass is called "lexical analysis", but while lexical analysis is conventionally described using regular expressions and finite automata, in Python it is a much more powerful step. It's a fairly simple grammar (as these things go) but it needs to remember the amount and type of indentation from line to line (and also, as discussed earlier, lexing behaves differently depending on whether we're inside an f-string). The output of Python's lexer (like all lexers) is a stream of tokens. Normally we think of this stream as being immediately consumed by the next grammar (the one conventionally called "the grammar"), but if you wanted, it could be serialized, written to a file, and read back in at a later time.
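The stdlib tokenize module produces a comparable token stream (in older versions it is a separate pure-Python reimplementation of the interpreter's C tokenizer, so treat it as an approximation). A small sketch showing that the indentation state surfaces as INDENT/DEDENT tokens, and that the stream is plain data that can be serialized and read back:

```python
import io
import json
import token
import tokenize

source = "if x:\n    y\nz\n"

# Lex only, no parsing: indentation changes show up as INDENT/DEDENT tokens.
tokens = [
    (token.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
for name, text in tokens:
    print(f"{name:10} {text!r}")

# The stream is plain data: it can be written out and read back later.
restored = [tuple(pair) for pair in json.loads(json.dumps(tokens))]
print(restored == tokens)  # True
```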

craggy ravine Feb 5, 2023, 3:42 PM

#

Hi all, I had a few questions regarding the internals of the Python shell (REPL). Is there a good article, book etc. I can consult? My questions are based around compilation in the shell.

What does the Python shell (REPL) use under the hood for compilation (parsing, generating byte code)?
Is byte code even generated or are expressions + other items evaluated from the AST?
How is context maintained? As in, if I declare a = 42 and then in the next command say b = a + 1, how does the shell ensure that the definition of a is available to b?

Additionally, I would also really appreciate a link to the shell code in the CPython repository.

Thank you!

uneven herald Feb 5, 2023, 3:44 PM

#

I have a question about the general design of python:

why did Python decide to use dunder methods like __len__, where other languages like Java preferred a .length() convention for example? is there some historical/pedagogical/other evidence that motivated the choice?

warm breach Feb 5, 2023, 4:01 PM

#

uneven herald I have a question about the general design of python: why did Python decided to...

there's usually one more layer of abstraction between the functions and the dunders

#

like iter() doesn't necessarily require an __iter__ dunder, it also has alternative forms of working like using __getitem__
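A short illustration of the __getitem__ fallback: the class below defines no __iter__, yet iter() still works by calling __getitem__ with 0, 1, 2, ... until IndexError is raised.

```python
class Squares:
    """No __iter__ at all, only the old sequence protocol."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # signals the end of iteration
        return index * index


print(hasattr(Squares, "__iter__"))  # False
print(list(iter(Squares())))         # [0, 1, 4, 9]
```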

#

and some methods enforce their own rules, like hash() cannot ever return -1

#

!e

class Foo:
    def __hash__(self):
        return -1

f = Foo()
print(f.__hash__())
print(hash(f))

fallen slateBOT Feb 5, 2023, 4:02 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | -1
002 | -2

warm breach Feb 5, 2023, 4:03 PM

#

also having all of these be dunders means you can name your own methods more freely without clashes

#

they also denote the fundamental difference in how they are stored and accessed,

class Foo:
    def len(self):
        ...
    def __len__(self):
        ...

#

len is stored only in Foo's dictionary, while __len__ (also in the dictionary) additionally fills a slot of the underlying C-level Foo type, which is what len() actually goes through

uneven herald Feb 5, 2023, 4:15 PM

#

ok thx! the two latter ones were really what I was looking for

grave jolt Feb 5, 2023, 5:58 PM

#

warm breach `len` is stored in `Foo`'s dictionary, while `__len__` is stored in the basic `F...

that's a CPython detail though, no?

#

oh, yeah I see what you mean

#

If you do e.g. foo.__len__ = my_len it won't work

warm breach Feb 5, 2023, 6:00 PM

#

grave jolt If you do e.g. `foo.__len__ = my_len` it won't work

len(foo) calls foo.__class__.__len__(foo) essentially
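A quick check of that claim: an instance attribute named __len__ is visible to ordinary attribute access but ignored by len(), which looks the method up on the type.

```python
class Foo:
    def __len__(self):
        return 1


foo = Foo()
foo.__len__ = lambda: 99       # stored in foo.__dict__, not on the class

print(len(foo))                # 1: len() looks __len__ up on type(foo)
print(foo.__len__())           # 99: normal attribute lookup finds the instance attribute
print(type(foo).__len__(foo))  # 1: roughly what len(foo) does
```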

grave jolt Feb 5, 2023, 6:01 PM

#

uneven herald ok thx! the two later were reallx what I was looking for

some have better reasons to exist, arithmetic methods like + or < will inquire the second operand if the left one returned NotImplemented

#

whereas + itself cannot return NotImplemented

#

(right?)

warm breach Feb 5, 2023, 6:08 PM

#

grave jolt some have better reasons to exist, arithmetic methods like `+` or `<` will inqui...

which also has the nice effect of it being impossible to know for sure whether two types support + without running them

deep nova Feb 5, 2023, 6:14 PM

#

lone sun I think you might appreciate Python's strategy more if you look at it as a two-p...

Smort

#

You know, it occurs to me I've never actually seen Python's lexer

#

I've seen a few 3rd party ones, but never the one actually used by the interpreter

uneven herald Feb 5, 2023, 6:35 PM

#

warm breach there's usually one more layer of abstraction between the functions and the dund...

I just understood the impact of what you meant here.. neat!

grave jolt Feb 5, 2023, 7:02 PM

#

warm breach which also has the nice effect of it being impossible to know for sure whether t...

true Python experience

uneven herald Feb 5, 2023, 7:20 PM

#

grave jolt some have better reasons to exist, arithmetic methods like `+` or `<` will inqui...

yeah that's in the spirit of the first examples given by ionite I think

#

basically, the built-in operator is allowed to exert more control than the underlying "implementation" does, for better safety and/or smarter decisions; smth that is not really possible when using idioms like .length() or .add(...)

#

hmmmmm
although, on my examples, + raised a NotImplementedError

warm breach Feb 5, 2023, 7:30 PM

#

uneven herald hmmmmm although, on my examples, `+` raised a `NotImplementedError`

if you raise it'll always go through

#

you're supposed to return NotImplemented

uneven herald Feb 5, 2023, 7:34 PM

#

warm breach if you raise it'll always go through

wat do I do wrong:

class Ill:
  def __add__(self, another):
    return NotImplemented


class Safe(Ill):
  def __add__(self, another):
    return self


ill = Ill()
safe = Safe()

print(ill + safe) # This crashes

warm breach Feb 5, 2023, 7:36 PM

#

uneven herald wat do I do wrong: ```py class Ill: def __add__(self, another): return No...

hm? it's a TypeError

#

not NotImplementedError

uneven herald Feb 5, 2023, 7:38 PM

#

ig I miss something obvious

#

hm, debugger goes through the method though...

uneven herald Feb 5, 2023, 7:41 PM

#

grave jolt some have better reasons to exist, arithmetic methods like `+` or `<` will inqui...

are you sure?

warm breach Feb 5, 2023, 7:43 PM

#

uneven herald wat do I do wrong: ```py class Ill: def __add__(self, another): return No...

for this you have to define __radd__ on Safe

raven ridge Feb 5, 2023, 7:50 PM

#

uneven herald ig I miss something obvious

a + b calls type(a).__add__(a, b) first. If that doesn't return NotImplemented, it evaluates to that value, otherwise it falls back to type(b).__radd__(b, a). If that returns NotImplemented an exception is raised, otherwise the addition evaluates to that value.

#

Consider subtraction: when it falls back to the second argument's type, that method needs to know that it was the second argument and not the first, because subtraction isn't commutative
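A simplified pure-Python model of the dispatch just described, plus one rule the summary leaves out: if type(b) is a subclass of type(a) and overrides __radd__, that reflected method is tried first, and __radd__ is never consulted when both operands have the same type. binary_add is a hypothetical name for illustration; the real logic lives in C and handles more cases.

```python
def binary_add(a, b):
    """Simplified model of how `a + b` picks a method (ignores C-level details)."""
    ta, tb = type(a), type(b)
    add = getattr(ta, "__add__", None)
    # The reflected method is only considered when the types differ.
    radd = getattr(tb, "__radd__", None) if tb is not ta else None

    # Extra rule: a subclass on the right that overrides __radd__ goes first.
    if radd is not None and issubclass(tb, ta) and radd is not getattr(ta, "__radd__", None):
        result = radd(b, a)
        if result is not NotImplemented:
            return result
        radd = None  # already tried, don't call it twice

    for method, x, y in ((add, a, b), (radd, b, a)):
        if method is not None:
            result = method(x, y)
            if result is not NotImplemented:
                return result
    raise TypeError(
        f"unsupported operand type(s) for +: {ta.__name__!r} and {tb.__name__!r}"
    )


class Ill:
    def __add__(self, another):
        return NotImplemented


class Safe(Ill):
    def __radd__(self, another):  # the fix suggested above
        return self


ill, safe = Ill(), Safe()
print((ill + safe) is safe)           # True
print(binary_add(ill, safe) is safe)  # True
print(binary_add(1, 2.5))             # 3.5: int.__add__ gives NotImplemented, float.__radd__ answers
```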

uneven herald Feb 5, 2023, 7:52 PM

#

warm breach for this you have to define `__radd__` on `Safe`

neat! thx a lot

uneven herald Feb 5, 2023, 7:54 PM

#

raven ridge Consider subtraction: when it falls back to the second argument's type, that met...

yup indeed; didn't know about __radd__ and related, but it makes more sense like that because not every + is commutative, even in the std lib (like the ones for list or str)

raven ridge Feb 5, 2023, 7:54 PM

#

Right, yep

uneven herald Feb 5, 2023, 7:54 PM

#

thx all! learned smth quite neat today! 😄

slim drum Feb 5, 2023, 7:57 PM

#

sorry for posting here but i am not really getting help in the help channel, can anyone here tell me what my mistake is? im a beginner

Store__Recommendations_-_Google_Chrome_2_5_2023_11_54_53_AM.png

rose schooner Feb 5, 2023, 9:24 PM

#

craggy ravine Hi all, I had a few questions regarding the internals of the Python shell (REPL)...

1. python REPL uses the same parser and compiler as python files, eval(), and exec(), but with a different mode from those types of running python
2. yes, bytecode is generated
3. those are globals, so they're stored in globals() which is available for each python session
4. https://github.com/python/cpython/blob/main/Modules/main.c
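A toy sketch of the three answers above, using only builtins and not the actual REPL code: each input line is compiled in 'single' mode (the mode used for interactive statements), which yields a bytecode code object, and every line executes against one shared namespace dict standing in for __main__'s globals.

```python
# One shared dict stands in for the REPL's __main__ globals.
namespace = {"__name__": "__main__"}


def run_line(source: str) -> None:
    code = compile(source, "<stdin>", "single")  # a code object, i.e. bytecode
    exec(code, namespace)                         # every line shares one namespace


run_line("a = 42")
run_line("b = a + 1")  # a is found because both lines ran in the same dict
run_line("b")          # 'single' mode echoes expression values: prints 43
```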

craggy ravine Feb 5, 2023, 10:42 PM

#

rose schooner 1. python REPL uses the same parser and compiler as python files, `eval()`, and ...

Thanks @rose schooner !

prime estuary Feb 5, 2023, 11:32 PM

#

craggy ravine Thanks <@310263589913100288> !

You also might want to look into the code module, which is an extensible pure-Python reimplementation of the REPL loop.
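For example, code.InteractiveInterpreter exposes the same compile-and-run step through runsource(), whose return value tells the caller whether more input is needed (that is how a REPL decides to show a continuation prompt):

```python
import code

interp = code.InteractiveInterpreter(locals={})

# runsource() compiles in 'single' mode by default. It returns True when the
# input is incomplete (a REPL would show "..."), False once it has run.
needs_more = interp.runsource("if True:")
print(needs_more)  # True

interp.runsource("a = 42")
interp.runsource("b = a + 1")
print(interp.locals["b"])  # 43: both lines ran against the same locals dict
```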

warm breach Feb 6, 2023, 3:32 AM

#

is it safe to resurrect an object inside of its weakref.finalize?

#

!e

import weakref
from einspect.structs import PyObject

class Foo:
    pass

def main():
    f = Foo()
    weakref.finalize(f, lambda obj: print(obj.ob_refcnt), PyObject.from_object(f))

main()

fallen slateBOT Feb 6, 2023, 3:37 AM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

warm breach Feb 6, 2023, 3:37 AM

#

it seems the instance object has ob_refcnt of 0 when finalize runs

#

is Py_IncRef in general safe to use when ob_refcnt is 0?

raven ridge Feb 6, 2023, 4:04 AM

#

warm breach is `Py_IncRef` in general safe to use when `ob_refcnt` is 0?

no. It's certainly not safe to use after the garbage collector has collected an object and its tp_dealloc has been called.

warm breach Feb 6, 2023, 4:05 AM

#

raven ridge no. It's certainly not safe to use after the garbage collector has collected an ...

does tp_dealloc run before weakref.finalize?

raven ridge Feb 6, 2023, 4:06 AM

#

it's called directly from Py_DECREF - I think tp_dealloc calls the weakref finalizer, but I haven't spotted exactly how yet.

warm breach Feb 6, 2023, 4:07 AM

#

how come resurrection is safe in __del__ though pithink

#

I thought it was around the same priority as weakref.finalize

raven ridge Feb 6, 2023, 4:08 AM

#

it's impossible for a weakref finalizer to get a reference to the object to resurrect it, unless you're doing memory-unsafe stuff
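On the pure-Python side this is easy to observe: CPython clears an object's weak references before running their callbacks, so inside a weakref.finalize callback a plain weakref to the same object already returns None. The callback below deliberately captures only the weakref, never the object:

```python
import weakref


class Foo:
    pass


seen = []
f = Foo()
ref = weakref.ref(f)

# The callback must not capture f itself, or f would never be collected.
weakref.finalize(f, lambda: seen.append(ref()))

del f        # last strong reference gone: CPython finalizes right away
print(seen)  # [None]: the object is no longer reachable from the callback
```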

warm breach Feb 6, 2023, 4:12 AM

#

raven ridge it's impossible for a weakref finalizer to get a reference to the object to resu...

could keep the address? 🥴

raven ridge Feb 6, 2023, 4:12 AM

#

Python code can't. you need C code or an FFI to do that.

warm breach Feb 6, 2023, 4:12 AM

#

if weakref.finalize actually runs before garbage collection theoretically that would be okay...?

raven ridge Feb 6, 2023, 4:15 AM

#

https://github.com/python/cpython/blob/74d5f61ebd1cb14907bf7dae1ad9c1e676707bc5/Modules/gc_weakref.txt#L21-L29

fallen slateBOT Feb 6, 2023, 4:15 AM

#

Modules/gc_weakref.txt lines 21 to 29

OTOH, it's OK to run Python-level code that can't access unreachable
objects, and sometimes that's necessary.  The chief example is the callback
attached to a reachable weakref W to an unreachable object O.  Since O is
going away, and W is still alive, the callback must be invoked.  Because W
is still alive, everything reachable from its callback is also reachable,
so it's also safe to invoke the callback (although that's trickier than it
sounds, since other reachable weakrefs to other unreachable objects may
still exist, and be accessible to the callback -- there are lots of painful
details like this covered in the rest of this file).

raven ridge Feb 6, 2023, 4:16 AM

#

that "OTOH" is on the other hand of this: https://github.com/python/cpython/blob/74d5f61ebd1cb14907bf7dae1ad9c1e676707bc5/Modules/gc_weakref.txt#L7-L8

fallen slateBOT Feb 6, 2023, 4:16 AM

#

Modules/gc_weakref.txt lines 7 to 8

Once gc has computed the set of unreachable objects, no Python-level
code can be allowed to access an unreachable object.

raven ridge Feb 6, 2023, 4:17 AM

#

so, no - I think the GC's code assumes that the object cannot be resurrected by a weakref finalizer... probably

warm breach Feb 6, 2023, 4:22 AM

#

!e

import weakref
from einspect.structs import PyObject

class Foo(list):
    pass

def on_del(obj: PyObject):
    print("refcount in on_del:", obj.ob_refcnt)
    # Resurrect the object
    obj.IncRef()
    obj.IncRef()
    print(obj.into_object())

f = Foo([1, 2, 3])
obj = PyObject.from_object(f)
weakref.finalize(f, on_del, obj)

del f
print(obj.into_object())

fallen slateBOT Feb 6, 2023, 4:22 AM

#

@warm breach :x: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

001 | refcount in on_del: 0
002 | [1, 2, 3]

raven ridge Feb 6, 2023, 4:23 AM

#

https://github.com/python/cpython/blob/74d5f61ebd1cb14907bf7dae1ad9c1e676707bc5/Modules/gc_weakref.txt#L133-L138

fallen slateBOT Feb 6, 2023, 4:23 AM

#

Modules/gc_weakref.txt lines 133 to 138

[In 2.4/2.3.5, we first clear all weakrefs to CT objects, whether or not
 those weakrefs are themselves CT, and whether or not they have callbacks.
 The callbacks (if any) on non-CT weakrefs (if any) are invoked later,
 after all weakrefs-to-CT have been cleared.  The callbacks (if any) on CT
 weakrefs (if any) are never invoked, for the excruciating reasons
 explained here.]

warm breach Feb 6, 2023, 4:23 AM

#

raven ridge so, no - I think the GC's code assumes that the object cannot be resurrected by ...

so it seems the object is still not de-allocated when finalize's function is called, but IncRef will not stop the deallocation

raven ridge Feb 6, 2023, 4:25 AM

#

fallen slate `Modules/gc_weakref.txt` lines 133 to 138 ```txt [In 2.4/2.3.5, we first clear a...

that seems to be saying that weakref callbacks (and presumably by extension weakref finalizers) are run after the weakrefs themselves have been cleared, specifically to ensure that those weakrefs can't resurrect the object

#

also: https://github.com/python/cpython/blob/74d5f61ebd1cb14907bf7dae1ad9c1e676707bc5/Modules/gc_weakref.txt#L148-L156

fallen slateBOT Feb 6, 2023, 4:31 AM

#

Modules/gc_weakref.txt lines 148 to 156

So, to prevent any Python code from running while gc is invoking tp_clear()
on all the objects in cyclic trash,

[That was always wrong:  we can't stop Python code from running when gc
 is breaking cycles.  If an object with a __del__ method is not itself in
 a cycle, but is reachable only from CT, then breaking cycles will, as a
 matter of course, drop the refcount on that object to 0, and its __del__
 will run right then.  What we can and must stop is running any Python
 code that could access CT.]

warm breach Feb 6, 2023, 4:31 AM

#

raven ridge that seems to be saying that weakref callbacks (and presumably by extension weak...

so essentially I'm trying to attach a weakref.finalize to the __hash__ function object here, and then DecRefing it from 2 references to 1, (since originally it gained 1 reference from int owning it), and within finalize, the original int.__hash__ function is restored.

from einspect import impl

@impl(int)
def __hash__(self):
    return 128

print(hash(10))  # 128

del __hash__

print(hash(10))  # 10

The fact that I DecRef the reference owned by int is obviously unsafe, but I'm thinking that in practice, since as soon as __hash__ is GC'd I immediately replace the current garbage reference in int.__hash__ with a valid one, nothing should access garbage objects...?

#

though uh, currently int's PyObject_SetAttr tries to access the original attribute to DecRef it and segfaults.

#

so I'm starting to think this whole thing is not possible to do safely after all? 😔

#

yeah this seems like a crazy rabbit hole I went with, it was never going to work from that first DecRef

raven ridge Feb 6, 2023, 4:36 AM

#

you mean every part of einspect that involves writing to arbitrary objects? I agree 😛

#

it doesn't even seem like those are the semantics you would want, though, honestly

#

making the behavior change based on the last reference to __hash__ being dropped just seems weird. Why not make it a context manager?

warm breach Feb 6, 2023, 4:45 AM

#

raven ridge making the behavior change based on the last reference to `__hash__` being dropp...

yeah that was my first idea, then I thought it'd be a bit too verbose and went through all that 😔

#

looks kind of weird I guess

from einspect import impl

with impl(int) as ctx:
    @ctx
    def __hash__(self):
        return 128

    print(hash(10))  # 128

print(hash(10))  # 10

rose schooner Feb 6, 2023, 4:46 AM

#

warm breach looks kind of weird I guess ```py from einspect import impl with impl(int) as c...

why does it have to be decorated

warm breach Feb 6, 2023, 4:46 AM

#

I dunno, how else would the context manager work

rose schooner Feb 6, 2023, 4:46 AM

#

warm breach I dunno, how else would the context manager work

ok fair

#

i thought of automatically detecting assignments for some reason

warm breach Feb 6, 2023, 4:48 AM

#

also, the finalizer idea breaks with @property decorated functions

#

since property objects are not weak-referenceable for some reason

raven ridge Feb 6, 2023, 4:52 AM

#

warm breach I dunno, how else would the context manager work

def int_hash(self):
    return 128

with einspect.context() as ctx:
    ctx.patch(int, "__hash__", int_hash)
    print(hash(10))  # 128

print(hash(10))  # 10

warm breach Feb 6, 2023, 4:53 AM

#

hm yeah that looks better at least pithink

feral island Feb 6, 2023, 4:53 AM

#

or just with einspect.patch(int, "__hash__", int_hash):?

warm breach Feb 6, 2023, 4:54 AM

#

mainly I wanted the finalizer to restore the methods before interpreter shutdown though, since there are some internal calls that happen after python frames are gone

#

!e

from einspect import impl

@impl(int)
def __hash__(self):
    return 128

fallen slateBOT Feb 6, 2023, 4:54 AM

#

@warm breach :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

[No output]

raven ridge Feb 6, 2023, 4:54 AM

#

feral island or just `with einspect.patch(int, "__hash__", int_hash):`?

or that - depends on whether you anticipate multiple patches getting applied together or not
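A sketch of the context-manager approach for ordinary Python classes, with a hypothetical patch() helper (not einspect's API). Built-in types such as int reject setattr with a TypeError, which is why patching them needs memory-level tricks in the first place. contextlib.ExitStack covers the several-patches-at-once case:

```python
import contextlib

_MISSING = object()


@contextlib.contextmanager
def patch(cls: type, name: str, value):
    """Hypothetical helper: set cls.<name> = value for the duration of a with block."""
    original = vars(cls).get(name, _MISSING)  # only what cls itself defines, not inherited
    setattr(cls, name, value)
    try:
        yield
    finally:
        if original is _MISSING:
            delattr(cls, name)  # fall back to the inherited attribute again
        else:
            setattr(cls, name, original)


class Point:
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return hash(self.x)


p = Point(10)
with patch(Point, "__hash__", lambda self: 128):
    print(hash(p))  # 128
print(hash(p))      # 10

# Several patches applied and undone together (in reverse order on exit):
with contextlib.ExitStack() as stack:
    stack.enter_context(patch(Point, "__hash__", lambda self: 1))
    stack.enter_context(patch(Point, "__repr__", lambda self: "P"))
    print(hash(p), repr(p))  # 1 P
print(repr(p).startswith("<"))  # True: the default object repr is back
```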

warm breach Feb 6, 2023, 4:54 AM

#

like something in shutdown relies on PyLong's hash apparently

raven ridge Feb 6, 2023, 4:55 AM

#

You realize that changing int's hash breaks every pre-existing dict with int keys, right?
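A pure-Python sketch of that failure mode, using a hypothetical Key class because int itself can't be patched from Python: a dict files each key under the hash it had at insertion time, so once the hash function changes, lookups probe the wrong slot even though the key is still stored.

```python
class Key:
    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return isinstance(other, Key) and self.n == other.n

    def __hash__(self):
        return hash(self.n)


k = Key(1)
d = {k: "value"}

Key.__hash__ = lambda self: 128  # "patch" the hash after the dict was built

print(k in d)           # False: the lookup uses the new hash, the entry sits under the old one
print(list(d)[0] is k)  # True: the key is still physically in the dict
```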

warm breach Feb 6, 2023, 4:56 AM

#

raven ridge You realize that changing int's hash breaks every pre-existing dict with int key...

!e well even if we functionally don't change it

from einspect import impl, orig

@impl(int)
def __hash__(self):
    return orig(int).__hash__(self)

fallen slateBOT Feb 6, 2023, 4:56 AM

#

@warm breach :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

[No output]

warm breach Feb 6, 2023, 4:57 AM

#

an internal call goes to that when python frames are already gone so I assume the call accesses garbage memory

raven ridge Feb 6, 2023, 4:57 AM

#

🤷‍♂️ I'm just pointing out another reason why doing what you're trying to do is fundamentally unreasonable

warm breach Feb 6, 2023, 6:27 AM

#

alright so I have it not mess with ref counts now and just leave a plain weakref finalize on the function with detach=True

from einspect import impl, orig

@impl(int, detach=True)
def __hash__(self):
    print("in hash:", self)
    return orig(int).__hash__(self)

#

when the finalizer is called (at some point in interpreter shutdown) the user defined __hash__ function still has ref-count of 1, so it seems relatively safe?

warm breach Feb 6, 2023, 5:22 PM

#

!e

import sys

class SomeClass:
    pass

print(sys.getrefcount(SomeClass))

fallen slateBOT Feb 6, 2023, 5:22 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

warm breach Feb 6, 2023, 5:22 PM

#

where do classes get 4 references from?

peak spoke Feb 6, 2023, 5:27 PM

#

from get_referrers it looks like the MRO, __dict__ mappingproxy, __weakref__ descriptor and the globals

#

though I'm not sure why dict would have a reference?

warm breach Feb 6, 2023, 5:30 PM

#

!e

from weakref import WeakKeyDictionary

class Foo:
    pass

d = WeakKeyDictionary()
d[Foo] = 1

del Foo

print(dict(d))

fallen slateBOT Feb 6, 2023, 5:30 PM

#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

{<class '__main__.Foo'>: 1}

warm breach Feb 6, 2023, 5:30 PM

#

currently deling a module level class doesn't remove it from weakref dicts as well, implying it's not GC'd at that stage

#

which is a bit strange

flat gazelle Feb 6, 2023, 6:01 PM

#

!e

class X: pass
del X
print(object.__subclasses__()[-1])

fallen slateBOT Feb 6, 2023, 6:01 PM

#

@flat gazelle :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class '__main__.X'>

flat gazelle Feb 6, 2023, 6:01 PM

#

__subclasses__() keeps it forever-ish

feral island Feb 6, 2023, 6:02 PM

#

!e
import gc
class X: pass
del X
gc.collect()
print(object.__subclasses__()[-1])

fallen slateBOT Feb 6, 2023, 6:02 PM

#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class 'abc.ABC'>

feral island Feb 6, 2023, 6:02 PM

#

__subclasses__ only holds weak references. I think the class doesn't get GCed because there's a cycle somewhere

flat gazelle Feb 6, 2023, 6:03 PM

#

huh, interesting

flat gazelle Feb 6, 2023, 6:03 PM

#

peak spoke from get_referrers it looks like the MRO, `__dict__` mappingproxy, `__weakref__`...

IG that's the dict mentioned here.

feral island Feb 6, 2023, 6:03 PM

#

!e
import gc
class X: pass
print(gc.get_referents(X))

fallen slateBOT Feb 6, 2023, 6:03 PM

#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

[{'__module__': '__main__', '__dict__': <attribute '__dict__' of 'X' objects>, '__weakref__': <attribute '__weakref__' of 'X' objects>, '__doc__': None}, (<class '__main__.X'>, <class 'object'>), (<class 'object'>,), <class 'object'>]

feral island Feb 6, 2023, 6:03 PM

#

oops need the other one

#

!e
import gc
class X: pass
print(gc.get_referrers(X))

fallen slateBOT Feb 6, 2023, 6:04 PM

#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

[{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'gc': <module 'gc' (built-in)>, 'X': <class '__main__.X'>}, (<class '__main__.X'>, <class 'object'>), <attribute '__dict__' of 'X' objects>, <attribute '__weakref__' of 'X' objects>]

feral island Feb 6, 2023, 6:04 PM

#

I guess there's a cycle between X and X.__dict__

pliant tusk Feb 6, 2023, 6:17 PM

#

!e
class X:pass

print(vars(X)['__dict__'].__objclass__ is X)
print(X.__weakref__.__objclass__ is X)
these are the two cycles I believe

fallen slateBOT Feb 6, 2023, 6:17 PM

#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | True
002 | True

pliant tusk Feb 6, 2023, 6:26 PM

#

oh and X.__mro__

warm breach Feb 6, 2023, 7:00 PM

#

pliant tusk !e ```py class X:pass print(vars(X)['__dict__'].__objclass__ is X) print(X.__we...

shouldn't this be detected by cyclic GC?

feral island Feb 6, 2023, 7:00 PM

#

warm breach shouldn't this be detected by cyclic GC?

it is, that's why the class is deleted only after gc.collect

warm breach Feb 6, 2023, 7:01 PM

#

ah I see

#

without that there's no guarantee when it'll happen?

pliant tusk Feb 6, 2023, 7:07 PM

#

warm breach without that there's no guarantee when it'll happen?

yea, (you can also trigger a collection by increasing memory pressure)
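A small sketch of the same behaviour with a weak reference instead of object.__subclasses__(): after del, the class survives because it sits in reference cycles, and only a cyclic collection (explicit here) frees it. Automatic collections are triggered by allocation counts, whose thresholds gc.get_threshold() reports.

```python
import gc
import weakref


class Foo:
    pass


ref = weakref.ref(Foo)
del Foo  # only the module-level name goes away

# Foo is still alive: Foo.__mro__ and the __dict__/__weakref__ descriptors
# stored in Foo's own namespace all point back at Foo, forming cycles.
alive_after_del = ref() is not None
print(alive_after_del)  # True

gc.collect()  # the cyclic collector finds the now-unreachable cycle
alive_after_collect = ref() is not None
print(alive_after_collect)  # False

# Automatic collections are driven by allocation-count thresholds:
print(gc.get_threshold())
```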

deep nova Feb 6, 2023, 7:12 PM

#

Does python's lexer watch for mismatched parentheses, or is that done in the parser?

lunar harbor Feb 6, 2023, 7:16 PM

#

typically that is handled by the parser as the lexer just determines tokens. but I don't know about python in particular.

fickle ferry Feb 6, 2023, 7:20 PM

#

deep nova Does python's lexer watch for mismatched parentheses, or is that done in the par...

It is handled by the parser

deep nova Feb 6, 2023, 7:36 PM

#

Woot

lunar harbor Feb 6, 2023, 7:48 PM

#

so pretty typical 🙂

deep nova Feb 6, 2023, 7:57 PM

#

Follow up question

#

Matching parentheses is important for indentation enforcement. If you're in parens then you ignore indentation, right? How, then, can a lexer which isn't tracking parens intelligently recognize when and when not to throw an indentation error?

feral island Feb 6, 2023, 8:00 PM

#

deep nova Matching parentheses is important for indentation enforcement. If you're in pare...

I think it does track them, it just doesn't throw the errors when things don't match

#

though:

#

!e
)

fallen slateBOT Feb 6, 2023, 8:01 PM

#

@feral island :x: Your 3.11 eval job has completed with return code 1.

001 |   File "<string>", line 1
002 |     )
003 |     ^
004 | SyntaxError: unmatched ')'

feral island Feb 6, 2023, 8:01 PM

#

this kind of sounds like it comes from the lexer

#

yes Parser/tokenizer.c: return syntaxerror(tok, "unmatched '%c'", c);

deep nova Feb 6, 2023, 8:01 PM

#

Awesome!

feral island Feb 6, 2023, 8:02 PM

#

!e (

fallen slateBOT Feb 6, 2023, 8:02 PM

#

@feral island :x: Your 3.11 eval job has completed with return code 1.

001 |   File "<string>", line 1
002 |     (
003 |     ^
004 | SyntaxError: '(' was never closed

feral island Feb 6, 2023, 8:03 PM

#

that one comes from the parser though Parser/pegen_errors.c: "'%c' was never closed",

deep nova Feb 6, 2023, 8:05 PM

#

The salient thing I needed to know has, I think, been answered for me

#

If I intend to emulate Python's enforcement of indentation while lexing, I'll also need to track parentheses

#

In doing this, I may or may not also enforce parentheses matching
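A toy sketch of that bookkeeping, with the hypothetical name logical_lines and deliberately ignoring strings, comments, and backslashes: a bracket stack decides when a logical line ends, only the first physical line keeps its indentation, and unmatched or unclosed brackets are reported (loosely mirroring the two errors shown above).

```python
OPENERS = "([{"
CLOSERS = {")": "(", "]": "[", "}": "{"}


def logical_lines(source: str) -> list[str]:
    """Toy sketch: join physical lines that sit inside open brackets."""
    stack: list[str] = []  # brackets opened but not yet closed
    lines: list[str] = []
    current = ""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            if ch in OPENERS:
                stack.append(ch)
            elif ch in CLOSERS:
                if not stack or stack[-1] != CLOSERS[ch]:
                    raise SyntaxError(f"unmatched {ch!r} on line {lineno}")
                stack.pop()
        # The first physical line keeps its indentation (that is the one that counts);
        # continuation lines are stripped, so their indentation is ignored.
        current = line if not current else current + " " + line.strip()
        if not stack:  # brackets balanced: the logical line ends here
            lines.append(current)
            current = ""
    if stack:
        raise SyntaxError(f"{stack[-1]!r} was never closed")
    return lines


print(logical_lines("x = (1,\n    2)\n\tif y:"))  # ['x = (1, 2)', '\tif y:']
```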

rich cradle Feb 6, 2023, 9:03 PM

#

what you want here is probably to construct a token tree

#

and indentation can be put into that tree as a kind of delimiter

deep nova Feb 6, 2023, 9:08 PM

#

O.O

#

A token tree?

#

Does the lexer recognize a \\ as a token unto itself, or does it recognize \\\n?

#

Actually — how do the lexing and parsing phases handle explicitly escaped newlines in general?

rich cradle Feb 6, 2023, 9:13 PM

#

they're in strings, so they don't, at least ime

feral island Feb 6, 2023, 9:15 PM

#

pretty sure escaping newlines would be handled in the lexer. I'd expect the lexer to simply not emit a token in that case, just like other whitespace within a statement

deep nova Feb 6, 2023, 9:15 PM

#

Oh, not emitting a token at all is smart

#

So, what, consume the backslash, the newline, and any leading tabs or spaces, all without emitting anything?

feral island Feb 6, 2023, 9:16 PM

#

yes, that's what I'd expect to happen

deep nova Feb 6, 2023, 9:16 PM

#

As well, throw an error if anything else follows?
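CPython does raise in that case. A sketch using the stdlib tokenize module: a backslash-newline outside strings and comments produces no token at all (no NL, and no INDENT for the continuation line), while another character right after the backslash is a SyntaxError.

```python
import io
import token
import tokenize

# Source text: "x = 1 + \" then a newline, then "    2"
source = "x = 1 + \\\n    2\n"
names = [
    token.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(names)  # no NL or INDENT: the backslash-newline was swallowed

error = None
try:
    compile("x = 1 + \\ 2\n", "<example>", "exec")  # a space follows the backslash
except SyntaxError as exc:
    error = exc
print(error.msg if error else "no error")
```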

deep nova Feb 6, 2023, 9:37 PM

#

def newline(self) -> Generator[Token, None, None]:

    yield self.token(Tokentype.NEWLINE)

    indentation = len(self.star(self.generic('\t')))

    # self.indentation is a stack of depths (assumed to start as [0]),
    # so compare against its top element, not against the list itself
    while indentation > self.indentation[-1]:
        self.indentation.append(len(self.indentation))
        yield self.token(Tokentype.INDENT)

    if indentation not in self.indentation:
        raise SyntaxError('dedentation to inconsistent depth')

    while indentation < self.indentation[-1]:
        self.indentation.pop()
        yield self.token(Tokentype.DEDENT)

#

Does this about summarize it?

rich cradle Feb 6, 2023, 9:38 PM

#

feral island pretty sure escaping newlines would be handled in the lexer. I'd expect the lexe...

ah the python style \. forgot about that one.

rose schooner Feb 6, 2023, 10:53 PM

#

deep nova ```py def newline(self) -> Generator[Token, None, None]: yield self.token(T...

i think they all return separately

deep nova Feb 6, 2023, 10:53 PM

#

?

rose schooner Feb 6, 2023, 10:54 PM

#

so it's maybe correct

#

or equivalent

#

but there's more complicated stuff the lexer does

#

like how it detects inconsistent spacing

#

while taking a look at the tokenizer i found that #1072290334169641000

#

def a(x):
    if x & 1:
        print('x is odd')
        return
    \
                                 \
print('x is even')
this doesn't result in an IndentationError; rather, it does what you'd expect if you did
def a(x):
    if x & 1:
        print('x is odd')
        return
    print('x is even')

#

so yeah that's also something to handle

deep nova Feb 6, 2023, 11:08 PM

#

For the time being, I'm going to prohibit leading spaces

#

Only tabs

#

That said, what "more complicated stuff" are we talking about?

feral island Feb 6, 2023, 11:10 PM

#

!e
if 1:
    2
  3

fallen slateBOT Feb 6, 2023, 11:10 PM

#

@feral island :x: Your 3.11 eval job has completed with return code 1.

001 |   File "<string>", line 3
002 |     3
003 |      ^
004 | IndentationError: unindent does not match any outer indentation level

deep nova Feb 6, 2023, 11:10 PM

#

I'm pretty sure that's covered in my solution

#

    if indentation not in self.indentation:
        raise SyntaxError('dedentation to inconsistent depth')

rose schooner Feb 6, 2023, 11:14 PM

#

deep nova I'm pretty sure that's covered in my solution

there's also a maximum level of indentation

deep nova Feb 6, 2023, 11:14 PM

#

HA

#

I did not know that

rose schooner Feb 6, 2023, 11:14 PM

#

it's 100 so it should be fairly impossible to reach

feral island Feb 6, 2023, 11:15 PM

#

!e exec("\n".join(" " * i + "if 1:" for i in range(102)) + " " * 102 + "pass")

fallen slateBOT Feb 6, 2023, 11:15 PM

#

@feral island :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 1, in <module>
003 |   File "<string>", line 101
004 |     if 1:
005 | IndentationError: too many levels of indentation

feral island Feb 6, 2023, 11:16 PM

#

I did not know about this
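The limit can also be probed without hard-coding it: build sources with more and more nested `if 1:` blocks and compile each one until the tokenizer refuses. A minimal sketch, assuming CPython (other implementations may have a different limit, or none); the helper names are made up for the example:

```python
def nested_ifs(depth: int) -> str:
    """Source with `depth` nested `if 1:` blocks (one space per level), ending in `pass`."""
    lines = [" " * i + "if 1:" for i in range(depth)]
    lines.append(" " * depth + "pass")
    return "\n".join(lines) + "\n"

def first_failing_depth(limit: int = 200):
    """Return (depth, message) for the first nesting depth that fails to compile."""
    for depth in range(1, limit + 1):
        try:
            compile(nested_ifs(depth), "<nested>", "exec")
        except IndentationError as exc:
            return depth, exc.msg
    return None, None

depth, message = first_failing_depth()
print(depth, message)
```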

rose schooner Feb 6, 2023, 11:17 PM

#

so there are 2 counters in the lexer for indentation, one treating a tab as 8 columns and one treating it as 1, so it can catch inconsistent mixing of spaces and tabs

#

but since we're only dealing with tabs rn we can just keep it to one
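For reference, the two-counter trick can be sketched like this. It follows what CPython's tokenizer does as far as I know: one column count with a tab size of 8, an alternate count with a tab size of 1, and a `TabError` whenever the two disagree about how two indentation prefixes compare. The function names are made up for the example:

```python
def columns(prefix: str, tabsize: int) -> int:
    """Column reached after an indentation prefix made of spaces and tabs."""
    col = 0
    for ch in prefix:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col = (col // tabsize + 1) * tabsize  # jump to the next tab stop
        else:
            raise ValueError(f"not indentation: {ch!r}")
    return col

def compare(a: str, b: str, tabsize: int) -> int:
    """-1, 0 or 1 depending on how prefix `a` compares to prefix `b`."""
    x, y = columns(a, tabsize), columns(b, tabsize)
    return (x > y) - (x < y)

def consistent(a: str, b: str) -> bool:
    """True if the comparison gives the same answer for both tab sizes."""
    return compare(a, b, 8) == compare(a, b, 1)

print(consistent("\t", "\t    "))  # True: deeper under both tab sizes
print(consistent("\t", " " * 8))   # False: equal at size 8, shallower at size 1
```

With tabs only, both counts are just multiples of the tab count and always agree, which is why a single counter is enough for now.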

rose schooner Feb 6, 2023, 11:21 PM

#

deep nova ```py def newline(self) -> Generator[Token, None, None]: yield self.token(T...

looks good but there should be handling for blank lines and lines with only tabs, comments, or both

deep nova Feb 6, 2023, 11:31 PM

#

By "handling"

#

Do you mean that no indents/dedents should be created for empty lines?

#

No tokens at all?

rich cradle Feb 6, 2023, 11:31 PM

#

deep nova Do you mean that no indents/dedents should be created for empty lines?

generally yes

deep nova Feb 6, 2023, 11:37 PM

#

Let's say I'm parsing my string so I can handle the escaped characters

#

What do I do if I encounter \u123?

#

Or rather, if I fail to consume exactly 4 hex digits after `\u` (or 8 after `\U`)?

#

Error? I'm considering allowing \u to be \u0000 implicitly, so I could always spit that out and then let the numbers just be raw text

#

That might be a bit confusing for the user though
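For comparison, Python itself rejects this: `'\u123'` in source code is a `SyntaxError` (truncated `\uXXXX` escape). A minimal sketch of a strict decoder for a string literal's body that does the same; the function name and the small escape table are assumptions for the example, not part of the lexer being discussed:

```python
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}
HEX_DIGITS = set("0123456789abcdefABCDEF")

def decode_escapes(body: str) -> str:
    """Decode backslash escapes in a string literal body, rejecting
    truncated \\u (4 hex digits) and \\U (8 hex digits) escapes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise SyntaxError("string ends with a lone backslash")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind in ("u", "U"):
            width = 4 if kind == "u" else 8
            digits = body[i + 2 : i + 2 + width]
            if len(digits) < width or not set(digits) <= HEX_DIGITS:
                raise SyntaxError(f"truncated \\{kind} escape at offset {i}: expected {width} hex digits")
            code = int(digits, 16)
            if code > 0x10FFFF:
                raise SyntaxError(f"\\{kind} escape at offset {i} is out of range")
            out.append(chr(code))
            i += 2 + width
        else:
            raise SyntaxError(f"unknown escape \\{kind} at offset {i}")
    return "".join(out)
```

This is stricter than Python for unknown escapes (Python keeps the backslash and only warns); swap that last branch if you want Python's behaviour.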

rose schooner Feb 6, 2023, 11:54 PM

#

deep nova No tokens at all?

yep

#

newlines aren't generated for empty lines

#

like totally empty lines with nothing in them

#

but they are generated for lines with tabs, comments, or both

#

despite being ignored

deep nova Feb 7, 2023, 12:13 AM

#

Are they completely ignored? If so, why generate them?

rose schooner Feb 7, 2023, 12:14 AM

#

deep nova Are they *completely* ignored? If so, why generate them?

the REPL still counts them as part of the function body

deep nova Feb 7, 2023, 12:15 AM

#

What is a repl?

rose schooner Feb 7, 2023, 12:15 AM

#

wait nvm

deep nova Feb 7, 2023, 12:15 AM

#

XD I've heard the term, but never stopped to ask

rose schooner Feb 7, 2023, 12:15 AM

#

i understood the if statements wrong

rose schooner Feb 7, 2023, 12:15 AM

#

deep nova What is a repl?

read-evaluate-print-loop

#

it's what this is

deep nova Feb 7, 2023, 12:17 AM

#

Python has an array module!??!??!?!?!?!?!?!?!!?!!!!?!?!?!?!

rose schooner Feb 7, 2023, 12:18 AM

#

so blank lines generate a NEWLINE token
lines with only spaces/tabs/comments don't

deep nova Feb 7, 2023, 12:27 AM

#

Lines with only spaces/tabs/comments don't create anything, but completely empty lines produce NEWLINE tokens without producing any indents/dedents?

#

Seems odd

#

But okay

#

It sounds to me as though, if I were to emulate this, the first thing I'd do on encountering a newline is determine if it's empty

#

So...
Consume newline
Consume whitespace
Consume comment
Consume another newline

#

If all of these succeed, and nothing else is encountered, the line is empty?

rose schooner Feb 7, 2023, 12:30 AM

#

deep nova If all of these succeed, and nothing else is encountered, the line is empty?

the line is "empty" and only one NEWLINE is generated

#

unless the line before was also whitespace/comments only

deep nova Feb 7, 2023, 12:31 AM

#

Hmmm

rose schooner Feb 7, 2023, 12:32 AM

#

so basically
```
Consume whitespace
Consume? comment
Consume newline
```
would generate nothing

#

`?` means it's optional

deep nova Feb 7, 2023, 12:32 AM

#

rose schooner `?` means its optional

Yeah

#

Actually, wouldn't it be...

#

```
Consume whitespace
Consume? comment
Observe? newline  <-- if observed, create newline. Don't consume, such
                      that the above can be performed for that next newline
```

rose schooner Feb 7, 2023, 12:41 AM

#

ok so i think i get it

#

it only matters in interactive mode/the REPL

#

otherwise it's ignored
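The standard library's `tokenize` module makes the distinction visible, with one caveat: the C tokenizer the parser uses drops whitespace/comment-only lines outright outside interactive mode, while `tokenize` reports them as `NL` tokens (plus a `COMMENT` token), keeping `NEWLINE` for real logical lines and emitting no INDENT/DEDENT for them. A small sketch; the output should show `NEWLINE` only after `x = 1` and `y = 2`, even though the comment line is indented:

```python
import io
import tokenize

SOURCE = (
    "x = 1\n"
    "\n"                         # totally empty line
    "        # only a comment\n"  # indented comment-only line
    "y = 2\n"
)

def token_names(source: str) -> list:
    """Names of the token types `tokenize` produces for `source`."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]

print(token_names(SOURCE))
```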

rose schooner Feb 7, 2023, 12:48 AM

#

deep nova ```Consume whitespace Consume? comment Observe? newline <-- if observed, create...

so
```
Consume whitespace
Observe newline/comment
Consume? comment
Consume newline
```

#

whereas if there's stuff in the line
```
Consume? whitespace
Process? indentation
Consume and generate tokens
Consume newline
Generate NEWLINE
```

deep nova Feb 7, 2023, 12:53 AM

#

Ohhhhh man

#

I shouldn't have drunk all that coffee O.o

deep nova Feb 7, 2023, 1:09 AM

#

So, I'm trying to think my way through this

#

I don't see any reason that an empty line, whether it contains nothing, whitespace, a comment, or whitespace and a comment, should produce any kind of token at all

deep nova Feb 7, 2023, 1:41 AM

#

    def newline(self) -> Generator[Token, None, None]:

        # consume leading whitespace
        whitespace = self.star(self.pipe(self.generic('\t'), self.generic(' ')))

        if self.observe() == '#' and self.advance():  # consume a comment

            while self.letter() or self.base10() or self.symbol():
                pass  # consume an optional comment

        elif (observed := self.observe()) == '\n' or not observed:
            pass  # line is empty

        elif self.parentheses:  # line is not empty, but is within parentheses
            pass

        else:

            yield self.token(Tokentype.NEWLINE)
    
            indentation = len(self.star(self.generic('\t')))  # count leading tabs
            
            if self.observe() == ' ':  # leading whitespace
                raise SyntaxError('leading whitespaces are prohibited (use tabs)')

            while self.indentation[-1] < indentation:  # create INDENT tokens

                self.indentation.append(len(self.indentation))
                yield self.token(Tokentype.INDENT)

            if indentation not in self.indentation:       # dedentation to unknown depth
                raise SyntaxError('dedentation to inconsistent depth')

            while self.indentation[-1] > indentation:  # create DEDENT tokens

                self.indentation.pop()
                yield self.token(Tokentype.DEDENT)

#

What a horrible function

rose schooner Feb 7, 2023, 1:51 AM

#

deep nova ```py def newline(self) -> Generator[Token, None, None]: # consume ...

why don't you consume until you observe a newline here
```py
if self.observe() == '#' and self.advance():  # consume a comment

    while self.letter() or self.base10() or self.symbol():
        pass  # consume an optional comment
```

deep nova Feb 7, 2023, 1:52 AM

#

That's what that block does

#

It consumes letters, digits, or symbols. A newline isn't any of those, so consumption stops when it reaches one
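The alternative suggested here, skipping everything up to the line break instead of whitelisting character classes, can be sketched as below. It is a hypothetical standalone helper working on a plain string and an index, not the `self.observe()`/`self.advance()` API of the lexer above. One possible upside: characters such as spaces or non-ASCII text inside a comment cannot end the scan early, whatever `symbol()` happens to accept.

```python
def skip_comment(text: str, pos: int) -> int:
    """Hypothetical helper: `pos` must point at a '#'. Returns the index of
    the line break that ends the comment, or len(text) at end of input.
    The line break itself is left for the caller (observe, don't consume)."""
    if text[pos] != "#":
        raise ValueError("skip_comment must start on '#'")
    end = pos + 1
    while end < len(text) and text[end] not in "\r\n":
        end += 1
    return end

src = "x = 1  # a comment, with spaces and ünïcode ✓\ny = 2\n"
start = src.index("#")
print(repr(src[start:skip_comment(src, start)]))
```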

#

    def newline(self) -> Generator[Token, None, None]:

        indentation = len(self.star(self.pipe(self.generic('\t'))))

        match self.observe():

            case '\n' | '\r':
                pass

            case ' ':  # mixed spaces and tabs
                raise SyntaxError('leading whitespaces are prohibited (use tabs)')

            case '#':  # empty line with a comment only

                self.advance()  # consume '#'
                self.comment()  # consume comment body

            case  _ :  # non-empty line

                if indentation < self.indentation[-1] and indentation not in self.indentation:
                    raise SyntaxError('dedentation to inconsistent depth')

                self.advance()
                yield self.token(Tokentype.NEWLINE)

                yield from self.indents(indentation)
                yield from self.dedents(indentation)

#

A bit better, for sure
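Putting the rules from this thread together, here is a minimal line-based sketch of the same job, assuming tabs-only indentation as decided above. It is not the author's lexer: it works on whole lines rather than `observe()`/`advance()`, ignores brackets and continuation lines, uses plain strings as token types, and (like CPython outside interactive mode) emits nothing at all for lines that are blank or contain only whitespace and/or a comment. `MAX_INDENT` is an assumption that roughly mirrors the CPython limit of 100 mentioned earlier.

```python
MAX_INDENT = 100  # assumption: mirrors CPython's limit

def layout_tokens(source: str) -> list:
    """Return (token_type, line_number) pairs for the layout tokens only
    (NEWLINE, INDENT, DEDENT); a real lexer would emit each line's own
    tokens between its INDENT/DEDENTs and its NEWLINE."""
    stack = [0]  # open indentation depths, counted in tabs
    tokens = []
    last = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        rest = line.lstrip(" \t")
        if not rest or rest.startswith("#"):
            continue  # blank or whitespace/comment-only line: no tokens at all
        body = line.lstrip("\t")
        depth = len(line) - len(body)
        if body.startswith(" "):
            raise SyntaxError(f"line {lineno}: leading spaces are prohibited (use tabs)")
        if depth > stack[-1]:
            if len(stack) >= MAX_INDENT:
                raise IndentationError(f"line {lineno}: too many levels of indentation")
            stack.append(depth)
            tokens.append(("INDENT", lineno))  # one INDENT however far it moves
        while depth < stack[-1]:
            stack.pop()
            tokens.append(("DEDENT", lineno))
        if depth != stack[-1]:
            raise IndentationError(f"line {lineno}: dedentation to inconsistent depth")
        tokens.append(("NEWLINE", lineno))
        last = lineno
    # close any blocks still open at end of input
    tokens.extend(("DEDENT", last + 1) for _ in stack[1:])
    return tokens

print(layout_tokens("a\n\tb\n\t\tc\n\n\t# note\nd\n"))
```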

warm breach Feb 7, 2023, 6:27 AM

#

@pliant tusk do you know if there's a `Py_TPFLAGS_IMMUTABLETYPE` equivalent in <= 3.9 that's not `Py_TPFLAGS_HEAPTYPE`?

#

it seems clearing the HEAPTYPE flag on a heap type isn't quite safe

#

like in 3.9

```py
from fishhook import lock

class Foo:
    pass

lock(Foo)
```
```
Objects/typeobject.c:3682: type_traverse: Assertion failed: type_traverse() called on non-heap type 'Foo'
Enable tracemalloc to get the memory block allocation traceback

object address  : 000001FDC9ADD650
object refcount : 4
object type     : 00007FFEB260CC60
object type name: type
object repr     : <class 'Foo'>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: finalizing (tstate=000001FDC8E12F50)

Current thread 0x00007b68 (most recent call first):
<no Python frame>
```

pliant tusk Feb 7, 2023, 2:00 PM

#

warm breach <@274715613115711488> do you know if there's a `Py_TPFLAGS_IMMUTABLE` equivalent...

There isn't any equivalent. And yeah, `Py_TPFLAGS_HEAPTYPE` is dangerous to leave toggled (which is why fishhook restores the flags after it's done adding its hook)
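From Python you can at least read the flags involved through `type.__flags__` (a CPython implementation detail). A small read-only sketch; the bit values are the ones from CPython's `Include/object.h` (`Py_TPFLAGS_HEAPTYPE` is `1 << 9`, and `Py_TPFLAGS_IMMUTABLETYPE` is `1 << 8` and only exists from 3.10), so the immutable bit is only checked on 3.10+:

```python
import sys

PY_TPFLAGS_IMMUTABLETYPE = 1 << 8  # CPython 3.10+
PY_TPFLAGS_HEAPTYPE = 1 << 9

def describe(cls: type) -> dict:
    """Report which of the two flags are set on `cls` (read-only)."""
    info = {"heap type": bool(cls.__flags__ & PY_TPFLAGS_HEAPTYPE)}
    if sys.version_info >= (3, 10):
        info["immutable type"] = bool(cls.__flags__ & PY_TPFLAGS_IMMUTABLETYPE)
    return info

class Foo:
    pass

print("int:", describe(int))
print("Foo:", describe(Foo))
```

On 3.9 and earlier only the heap-type bit is there to look at, which fits the answer above.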