#internals-and-peps

feral island
#

honestly the whole thing about not using f-strings in logging mostly feels like premature optimization to me

halcyon trail
#

I mean, yeah, because we're currently in a situation where there's a trade-off between readability and optimization

#

I also just default to using f-strings for logging

deep nova
#

THIIIIIIIIIIIIS

halcyon trail
#

it's only premature optimization if you have to do extra work though or make compromises

#

if "the way" is both ergonomic and fast then it's not premature anything

grave jolt
# deep nova THIIIIIIIIIIIIS

I mean, I understand Agda using the literal letter λ for anonymous functions. But that's agda (and λ is actually good-looking and concise unlike lambda)

#

even haskell uses \x -> x

halcyon trail
#

writing logger.info { f"Hello {user}" } seems pretty nice to me

halcyon trail
# grave jolt even haskell uses `\x -> x`

I like the approach of swift and kotlin where, I imagine, they sat down on the very first day and said "okay, lambdas get the absolute best syntax in the language. Now that's done, let's look at everything else"

grave jolt
# halcyon trail writing `logger.info { f"Hello {user}" }` seems pretty nice to me

Hey, let's do it in the Enterprise Python Style, Extra Clean and Readable ™️
```py
@logger.info
def log_hello_user_greeting() -> str:
    """
    Log a greeting phrase mentioning the user's name, but
    only if the logging verbosity is set to :logging.INFO or higher.
    """
    user_to_be_greeted = user
    greeting = "Hello"
    return f"{greeting} {user_to_be_greeted}"
```

fallen slateBOT
#

Objects/descrobject.c lines 1268 to 1269

```c
/* This has no reason to be in this file except that adding new files is a
   bit of a pain */
```
feral island
#

that one makes a lot more sense than reversed in enumobject.c honestly

#

descrobject.c is for all the internal descriptors

warm breach
#

can you make a wrapper object in python?

#

or is it only for slot methods?

#

int.__add__ is a "<slot wrapper ..." though

#

this thing is supposed to be a "<method wrapper ..."?

feral island
#

__add__ is a slot

pliant tusk
#

!e print(type(1 .__add__))

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class 'method-wrapper'>
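
For reference, a small sketch of the two wrapper types in question, using only the type aliases from the standard `types` module. Looking `__add__` up on the class gives the unbound slot wrapper, and looking it up on an instance gives a method-wrapper bound to that instance. The variable names are just for illustration.

```py
import types

unbound = int.__add__   # looked up on the class: repr shows "<slot wrapper ...>"
bound = (1).__add__     # looked up on an instance: repr shows "<method-wrapper ...>"

is_slot_wrapper = isinstance(unbound, types.WrapperDescriptorType)
is_method_wrapper = isinstance(bound, types.MethodWrapperType)

# The slot wrapper is a descriptor, so binding it by hand gives a method-wrapper too.
rebound = unbound.__get__(1, int)
result = rebound(2)

print(type(unbound).__name__, type(bound).__name__, result)
```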
feral island
grave jolt
#

fuck, I pinned something

#

fixed

grave jolt
#

that's just good design, provide sensible defaults!

feral cedar
#

smfh. open to extension, closed to modification

halcyon trail
#

if we have learned anything about software engineering in the last 30 years

#

it's that inheritance is always the best

#

always

grave jolt
#

yeah

#

every human inherited stuff from their mom and dad

halcyon trail
#

it solves all problems, once and for all

feral cedar
grave jolt
#

You won't be able to modify this code because we need to book a meeting to make a change in a docstring. As that's potentially a breaking change for our documentation readers.

halcyon trail
#

I can't accept any enterprise code that doesn't include a factory

#

rejected

grave jolt
#

Isn't HelloUserGreeterLogger a log message factory?

halcyon trail
#

how are people supposed to create HelloUserGreeterLogger

#

hmm? hmmm?

grave jolt
#

There's obviously a metaclass mechanism

halcyon trail
#

do you want people to soil themselves with touching a concrete constructor

grave jolt
#

in LoggerProtocol

halcyon trail
#

😛

#

in a reddit thread recently people were levelling charges like this unironically at logging and it made me sad

#

"stinks of Java" 😦

grave jolt
# halcyon trail hmm? hmmm?
# noqa
from __future__ import annotations
from logrossmeister.utils import MetaLoggerProtocolFactoryProtocolRepositoryProtocolFactory

LogUserReturnTypeT = TypeVar("LogUserReturnTypeT", bound=None)

LOG_USER_SLEEPING_TIME_CONSTANT_SECONDS = 0.217

async def log_user(user_to_be_greeted: UserProtocol | LogUserReturnTypeT) -> LogUserReturnTypeT:
    if user_to_be_greeted is None:
        return user_to_be_greeted
    meta_logger_protocol_factory_protocol_repository =\
        await MetaLoggerProtocolFactoryProtocolRepositoryProtocolFactory.get()
    meta_logger_protocol_factory =\
        await meta_logger_protocol_factory_protocol_repository.get()
    await meta_logger = meta_logger_protocol_factory.get(HelloUserGreeterLogger)
    async with meta_logger.lock():
        await meta_logger.args.clear()
        await meta_logger.args.append_("user_to_be_greeted")
        await meta_logger.args.user_to_be_greeted = user_to_be_greeted
        loggable = await meta_logger.create_loggable(mongodb=True, async_=True, django=True)
        await loggable.log()
        await meta_logger.args.clear()
        await asyncio.sleep(LOG_USER_SLEEPING_TIME_CONSTANT_SECONDS)
halcyon trail
#

yes

grave jolt
#

now as an exercise, write a test suite for this function

halcyon trail
#

I feel you growing powerful. Now strike me down

#

and your journey to the dark side shall be complete

grave jolt
#

fixed*

grave jolt
#

(btw I'm mildly sorry for shitposting in this serious channel)

halcyon trail
#

also what's with your new icon thingie. is that a Rust reference

grave jolt
#

it's... complicated

halcyon trail
#

weird I thought I asked about your icon not a relationship status. discord are you ok

raven ridge
#

we were talking about the syntactic macros PEP the other day - it seems that this would be a reasonable use for macros in Python, actually. People want:
a) To be able to sprinkle logging code in their application without slowing it down
b) To be able to use f-strings for forming their log messages
c) To write their log statements in a way that's succinct and readable

If logging was macro-based, we'd be able to accomplish all 3, by wrapping log call arguments in an object that formats lazily automatically, so that writing `info!(logger, f"Guess what: {expensive_call()}")` gets translated automagically to
```py
logger.info(LazyLoggingFormatter(lambda: f"Guess what: {expensive_call()}"))
```

Or hell, we could just do it with a lazy f-string macro in the first place:
```py
import! lazyformat as lf
logger.info(lf!"Guess what: {expensive_call()}")
```
#

@warm breach You were asking for examples of places where the syntactic macros proposal might be useful, and I think this is a pretty reasonable one.
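
A rough sketch of what the lazy wrapper could look like without any macro support. `LazyFormat` is a hypothetical helper, not something in the `logging` module; it relies only on the fact that `logging` calls `str()` on the message object when a record is actually formatted.

```py
import io
import logging


class LazyFormat:
    """Hypothetical helper: builds the message only when str() is called."""

    def __init__(self, make_message):
        self._make_message = make_message

    def __str__(self):
        return self._make_message()


calls = []


def expensive_call():
    calls.append("called")
    return 42


stream = io.StringIO()
logger = logging.getLogger("lazy_demo")
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

# DEBUG is disabled here, so the lambda is never run.
logger.setLevel(logging.INFO)
logger.debug(LazyFormat(lambda: f"Guess what: {expensive_call()}"))
calls_when_disabled = len(calls)

# INFO is enabled, so the handler formats the record and the lambda runs once.
logger.info(LazyFormat(lambda: f"Guess what: {expensive_call()}"))
calls_when_enabled = len(calls)
output = stream.getvalue()
```

The cost is the extra `LazyFormat(lambda: ...)` noise at every call site, which is exactly the part a macro would hide.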

grave jolt
#

or you could just pass in a lambda 🙂

halcyon trail
#

I agree that since lambdas have been botched too badly to be used for this, maybe macros could do instead

#

but does it justify macros, prob not (but other people will decide that)

raven ridge
halcyon trail
#

you pass in a lambda that returns a string when evaluated

#

i'm not sure why you need a format method

#

maybe if you want to keep structured data around, I suppose?

raven ridge
halcyon trail
#

to the function

#

logger.info(lambda: f"hello {expensive_call()}")

raven ridge
#

!e ```py
import logging
logging.error(lambda: "hello")
```

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

ERROR:root:<function <lambda> at 0x7fe166d52b60>
raven ridge
#

that already does something. The change you're proposing would be backwards incompatible.

halcyon trail
#

I wasn't seriously proposing it because lambdas in python are so ugly

#

but if you're proposing a new macro, then you can just as well propose new logger functions as well

raven ridge
#

I guess maybe it could be done if you subclassed Logger...

halcyon trail
#

or a new logger type

#

logger = logging.getLazyLogger(__name__)

#

etc

raven ridge
#

yeah. that could work.

halcyon trail
#

I kinda just think none of these things are actually worth the price of admission though

raven ridge
#

less nice than the macro solution, I think, for being error prone and less succinct, but...

halcyon trail
#

(for python, and in its current state)

raven ridge
#

well, possibly. I don't think there's been any real movement on that syntactic macros PEP in a long time. I'm not sure why it came up again the other day - maybe I'm wrong and it came up here because people were discussing it elsewhere?

halcyon trail
#

if python adds macros then the universe will probably end in a Greenspun's tenth rule explosion though

raven ridge
#

I'm not really sure that it's worth the cost to add macros to Python, but I think this is an interesting example of a place where they'd allow us to do something that's quite ugly without them. Automagically wrapping some code up in a function to delay evaluation is something that macros could do, where the alternative is extra code pushed into every call site.

halcyon trail
#

I am less down on macros since the last time we discussed this, insofar as I think they work well in Rust.
macros in a dynamically typed, non-lisp just fundamentally makes me sad because if I was willing to sacrifice static typing I could already have had so much nicer macros

warm breach
halcyon trail
warm breach
#

you would get the ast of a string

halcyon trail
#

you could probably also define the macro to define a local function and pass it in

#

you'd probably want to do that in fact so you never have any artificial "one line" restriction

#

so "macros as a hack around poor lambdas" is I suppose a legitimate selling point

raven ridge
# warm breach you would get the ast of a string

I haven't thought too hard about it, but I don't see why not? You'd take the AST of that string, and you'd wrap it up in the AST of a function call to construct a type whose __format__ evaluates and returns that string

warm breach
#

oh actually I think it would? but you'd need to provide your own ast I guess

#

!e since python ast would parse it without the field

import ast

print(ast.dump(ast.parse('"Guess what: {expensive_call()}"')))
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

Module(body=[Expr(value=Constant(value='Guess what: {expensive_call()}'))], type_ignores=[])
warm breach
#

but yeah that'd be a nice use case if it works

raven ridge
#

I'm not sure whether the syntax would be nice with PEP 638 as it's proposed, but it's certainly something that macros could do in principle

warm breach
#

yeah rust has f!() that does that

#

I think it's lazy?

raven ridge
#

transforming the AST for f"Guess what: {expensive_call()}" into the AST for LazyFormat(lambda: f"Guess what: {expensive_call()}") doesn't seem like a big lift, as far as AST rewriting goes
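
Here is a minimal sketch of that rewrite using the standard `ast` module at runtime rather than PEP 638 macros (which don't exist in any released Python). `LazyFormat` and `WrapFStrings` are hypothetical names made up for this example.

```py
import ast


class LazyFormat:
    """Hypothetical wrapper: evaluates the f-string only when str() is called."""

    def __init__(self, thunk):
        self._thunk = thunk

    def __str__(self):
        return self._thunk()


class WrapFStrings(ast.NodeTransformer):
    def visit_JoinedStr(self, node):
        # Deliberately no generic_visit(): the f-string's own contents
        # (including format specs) are left untouched inside the lambda.
        no_args = ast.arguments(
            posonlyargs=[], args=[], vararg=None,
            kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[],
        )
        thunk = ast.Lambda(args=no_args, body=node)
        call = ast.Call(
            func=ast.Name(id="LazyFormat", ctx=ast.Load()),
            args=[thunk],
            keywords=[],
        )
        return ast.copy_location(call, node)


calls = []


def expensive_call():
    calls.append("called")
    return 42


source = 'message = f"Guess what: {expensive_call()}"'
tree = ast.fix_missing_locations(WrapFStrings().visit(ast.parse(source)))

namespace = {"LazyFormat": LazyFormat, "expensive_call": expensive_call}
exec(compile(tree, "<rewritten>", "exec"), namespace)

message = namespace["message"]
calls_before_str = len(calls)   # nothing evaluated yet
rendered = str(message)         # now the f-string actually runs
calls_after_str = len(calls)
```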

raven ridge
warm breach
#

then use whatever f strings inside you want

#

but that's a bit verbose for a lot of inline stuff

#

in any case I think the function call to logging will take longer than any non-lazy f string

raven ridge
#

the performance advice might be more reasonable if it weren't for the fact that arguments get evaluated eagerly anyway - so logging.debug("result: %s", some_expensive_call) saves the cost of the interpolation, but not the cost of the expensive call

warm breach
#

but I don't really see that as too common

raven ridge
#

yeah, it's much more common that the call is expensive than that str() on the result of the call is.

#

well, I dunno. big dicts are slow to stringify, I guess.
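
To make the eager-evaluation point concrete, a small sketch: the `%s` style skips the interpolation when the level is disabled, but the argument expression has already run by then. Guarding with `Logger.isEnabledFor` is the standard-library way to skip the call itself. The logger name and helper function are made up for the example.

```py
import logging

calls = []


def some_expensive_call():
    calls.append("called")
    return {"answer": 42}


logger = logging.getLogger("eager_demo")
logger.propagate = False
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.WARNING)  # DEBUG is disabled

# The argument is evaluated before logger.debug() even starts.
logger.debug("result: %s", some_expensive_call())
calls_unguarded = len(calls)

# With the guard, the expensive call is skipped entirely.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("result: %s", some_expensive_call())
calls_guarded = len(calls)
```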

grave jolt
#

!e
Speaking of templating, I have invented this hack to emulate jinja-style {% if %}s

class Yes:
    def __format__(self, spec): return spec.strip()
class No:
    def __format__(self, spec): return ""

template = """
thing.on("userLogin", (user) => {{
    {alert_sentry:
      sentry.send(`Login. ${{user.name}}`)}
    {alert_log:
      console.log(`Login. ${{user.name}}`)}
    user.confirmLogin()
}})
"""

print(template.format(alert_sentry=Yes(), alert_log=No()))
fallen slateBOT
#

@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 
002 | thing.on("userLogin", (user) => {
003 |     sentry.send(`Login. ${user.name}`)
004 |     
005 |     user.confirmLogin()
006 | })
grave jolt
#

the power is truly terrifying

lone sun
#

He doesn't actually prove that Python is non-context-free. He just says, "That makes the language context sensitive, in my opinion." I'm not going to deny him his right to an opinion, but either it is or it isn't, and he hasn't given a sufficient argument one way or the other.

However, there is a perfectly simple mathematical proof of a toy version of this. If I remember correctly, a language of the form {s^3 : s in Σ^*}, where Σ is some alphabet, is not context-free (as long as the alphabet has more than one character). (I think the 3 in the exponent is right, but it might be something else?) This is a consequence of the pumping lemma for context-free languages. It follows that similar languages, like {stsusv : s, t, u, v in Σ^*}, are also not context-free. So you can't recognize that three consecutive lines begin with the same string of whitespace using a context-free grammar.

This doesn't mean that we're using the wrong tools. The type of context-sensitivity we need is quite simple. You just need to remember what the leading whitespace of the most recent line was and update it as necessary (pushing or popping; it's a stack). And sure, in principle stacks let you do interesting computations, but in practice we're really not doing much.
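
As a sketch of how little state that takes: the function below keeps a stack of indentation widths and emits INDENT/DEDENT markers, which is roughly the idea behind the tokens Python's tokenizer produces. It is a simplification under stated assumptions (spaces only, no tabs, no line continuations), and `indentation_tokens` is a name invented for this example.

```py
def indentation_tokens(lines):
    """Turn leading whitespace into INDENT/DEDENT markers using a stack."""
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for line in lines:
        if not line.strip():
            continue  # blank lines don't affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        if width != stack[-1]:
            raise IndentationError(
                "unindent does not match any outer indentation level"
            )
        tokens.append(line.strip())
    tokens.extend("DEDENT" for _ in stack[1:])  # close blocks still open at EOF
    return tokens


example = [
    "if x:",
    "    y = 1",
    "    if z:",
    "        w = 2",
    "v = 3",
]
result = indentation_tokens(example)
```

Once the whitespace has been turned into explicit tokens like this, the rest of the grammar can treat INDENT and DEDENT like ordinary brackets.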

deep nova
flat gazelle
plush dragon
#

Genuine question: what do you think are the most essential PEPs to know for collaborating on Python code? I'm thinking PEP 8 and PEP 20 at least, since they talk about how Python coders think. Do you know any more "easter eggs" or must-know PEPs that you eat, drink, and breathe all day? I've always liked PEP 8 and PEP 20. Maybe there's one more to add to my bookmarks, if you have any.

#

Sorry if my English is a bit broken, it's my 2nd language

#

I'm still learning

feral island
#

Most PEPs aren't really relevant to normal coding, they are change proposals that were either accepted or rejected. You're usually better off reading the documentation at docs.python.org. PEP 8 and 20 are unusual in that regard

#

Most PEPs are relevant only if you are actually working on the development of the language or interested in language design

plush dragon
#

Ohh. Thank you! @feral island

flat gazelle
#

Ah, thanks

surreal sun
#

reading through PEP 638 (syntactic macros) i never rlly got the purpose of them

#

from what i'm understanding they change the AST and u can do stuff like DSLs and other cool stuff with it right?

#

but how is it even defined bc i'm not rlly understanding it in the PEP, is it just a function that changes the ast based on the ast node

gray galleon
surreal sun
#

ohh

deep nova
#

It's gonna take me a few weeks to understand this grammar of python's

#

But I'm starting to go through it. I'm still a bit curious about the distinction between a compound statement and a simple statement

#

As best I can tell — it all comes down to the semicolon?

rich cradle
#

Compound statements contain (groups of) other statements; they affect or control the execution of those other statements in some way. In general, compound statements span multiple lines, although in simple incarnations a whole compound statement may be contained in one line.
https://docs.python.org/3/reference/compound_stmts.html
A simple statement is comprised within a single logical line. Several simple statements may occur on a single line separated by semicolons.
https://docs.python.org/3/reference/simple_stmts.html

deep nova
#

So you can do things like from some_module import thing; thing.func()

deep nova
#

This makes so much more sense than the other thing I read

#

It's a compound statement because it literally is compounded from multiple other statements

#

Another question, while I'm here

#

I think I understand that this group of rules:

#
single_input: NEWLINE | simple_stmts | compound_stmt NEWLINE;
file_input: (NEWLINE | stmt)* EOF;
eval_input: testlist NEWLINE* EOF;
#

Just so I totally understand...

#

file_input: (NEWLINE | stmt)* ENDMARKER (from the actual python grammar this time, not a knockoff version)

A file consists of any number of statements and newlines, followed by an end marker. How does this relate to the "flattening of simple statements"? Is it that I collect a sequence of semi-colon-delimited simpler statements in a single pass, but in the resulting AST they should not be grouped within simple-statement collections but rather directly as children of the main File node?
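
One way to check the "flattening" from the outside is to look at what the `ast` module produces. In this sketch the two semicolon-separated statements on the first line show up as direct children of the module body, next to the statement on the second line, with no grouping node in between.

```py
import ast

tree = ast.parse("from math import pi; print(pi)\nx = 1")

# Each simple statement is its own entry in Module.body.
kinds = [type(node).__name__ for node in tree.body]
line_numbers = [node.lineno for node in tree.body]

print(kinds)         # statement node types, in source order
print(line_numbers)  # the first two share line 1
```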

#

Ahhh, here we go: file[mod_ty]: a=[statements] ENDMARKER { _PyPegen_make_module(p, a) }

#

In this one, there is no or NEWLINE clause. Does this mean that every statement will have its own rules for consuming a newline at its termination?

deep nova
#

I'm having a hard time understanding star_expressions

gray galleon
#

star expressions = star + expression

#

i think star have the same precedence as unary operators

rose schooner
rose schooner
#

precedence just below a bitwise OR expression

gray galleon
#

wait so [*3+3] is parsed as [*(3+3)]?

#

til

deep nova
#

I've never seen a star used as a unary operator

#

I've seen it used in iterable unpacking, and that's what I assumed it was

gray galleon
raven ridge
#

!e ```py
x = "foo"
print(*x + "bar")
```

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

f o o b a r
raven ridge
#

so yep, parsed as *(x + "bar")

deep nova
#

I have absolutely no idea what's happening

raven ridge
#

it pretty much has to be, right? Parsing it as (*x) + "bar" wouldn't make sense.
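
The same thing can be confirmed without running the code by asking the `ast` module how it parsed the expression. This is just an inspection sketch; the variable names are arbitrary.

```py
import ast

call = ast.parse('print(*x + "bar")', mode="eval").body
starred = call.args[0]

# The star wraps the whole addition, not just `x`.
outer = type(starred).__name__               # Starred
inner = type(starred.value).__name__         # BinOp
operator = type(starred.value.op).__name__   # Add

# Same story inside a list display: [*3+3] is [*(3+3)] to the parser.
element = ast.parse("[*3+3]", mode="eval").body.elts[0]
element_inner = type(element.value).__name__
```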

sour thistle
deep nova
#

Ohhhhh

#

So, by way of operator precedence

#

*x + "bar" becomes *(x + "bar")

sour thistle
#

pretty much

deep nova
#

So, in the grammar, is the star operator (wrt to unpacking) the lowest precedence operation in the chain and hence, referenced constantly as the go-to for any type of expression in that chain?

#

(Hoping that makes sense)

unkempt rock
#

does anyone know what this means

#

this keeps popping up randomly for me

#

after i close

sour thistle
deep nova
#

🧠

sour thistle
deep nova
#

O.O There's a more on topic channel than here to ask about python's grammar?

sour thistle
#

redd's question is not about python grammar at all

deep nova
#
assignment:
    | NAME ':' expression ['=' annotated_rhs ] 
    ...alternatives

annotated_rhs: 
    | yield_expr 
    | star_expressions

yield_expr:
    | 'yield' 'from' expression 
    | 'yield' [star_expressions] 

star_expressions:
    | star_expression (',' star_expression )+ [','] 
    | star_expression ',' 
    | star_expression

star_expression:
    | '*' bitwise_or 
    | expression

bitwise_or:
    | bitwise_or '|' bitwise_xor 
    | bitwise_xor
bitwise_xor:
    | bitwise_xor '^' bitwise_and 
    | bitwise_and
bitwise_and:
    | bitwise_and '&' shift_expr 
    | shift_expr
shift_expr:
    | shift_expr '<<' sum 
    | shift_expr '>>' sum 
    | sum

sum:
    | sum '+' term 
    | sum '-' term 
    | term
term:
    | term '*' factor 
    | term '/' factor 
    | term '//' factor 
    | term '%' factor 
    | term '@' factor 
    | factor

...and so on, all the way down to atomics
#

I just want to make sure I'm interpreting this correctly. NAME ':' expression ['=' annotated_rhs ] translates to:

Name and type-annotation (which may be an expression) optionally followed by = some-kind-of-expression

#

So I can have an "assignment" that doesn't assign anything but rather just declares, such as a: int

#

And then there could be a right hand side to it, the value of which will be some kind of expression. The top-level expression rule in this case seems to be annotated_rhs which degrades into yield or starred, etc

raven ridge
#

that all sounds right
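
A quick sketch to back that up with the `ast` module: a bare `a: int` parses to an `AnnAssign` node whose `value` is `None`, and executing it does not bind the name.

```py
import ast

bare = ast.parse("a: int").body[0]
with_value = ast.parse("a: int = 4").body[0]

bare_kind = type(bare).__name__        # AnnAssign
bare_has_value = bare.value is not None
value_kind = type(with_value.value).__name__

# Running the bare declaration leaves `a` unbound.
namespace = {}
exec("a: int", namespace)
name_was_bound = "a" in namespace
```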

deep nova
#
assignment:
('(' single_target ')' 
         | single_subscript_attribute_target) ':' expression ['=' annotated_rhs ] 

So, the left-hand side could be a single-target between parentheses OR a single_subscript_attribute_target (which I assume to be something like a.b.c or a[6].someattr). But single_target degrades directly into single_subscript_attribute_target as well as into '(' single_target ')'

#

Isn't all that a) wildly confusing and b) pointless? Wouldn't just saying single_target not cover all of this?

raven ridge
#

no, that wouldn't allow parentheses

#

!e ```py
(x): int = 4
print(x)
```

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

4
deep nova
#

I've never really seen putting objects in parentheses for assignment

#

The only use-case I can think of might be something like ```
a, (b, c), d = 1, (2, 3), 4
```

#

Is the point to allow for recursive assignment to nested targets?

raven ridge
#

Black reformats x,=4 as (x,) = 4

#

I've never really seen (x) = val, but I suppose it makes sense to allow it as a degenerate case of (x, y, z) = val

deep nova
#

The grammar showcased here is totally different from the one I see on Github

#

Is it modified for better readability, while the "real" grammar is designed for efficiency or something?

grave jolt
rose schooner
deep nova
#

I'm sure it'll become plain soon enough, but

#

Why are the top-level expressions in assignment yield_expression and starred_expression?

#

Why those two and not some others?

gray galleon
#

because they are the only expressions i think

rose schooner
#

actually it's star_expressions, not starred_expression/starred_expressions

gray galleon
#

but you shouldn't be able to do ```
a = *b
```

grave jolt
#

make it dereference b brainmon

rose schooner
#

yield_expression basically just allows assigning from a yield at the top level, like `a = yield b`, which is useful when you wanna receive values from outside code that uses .send()
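
A minimal sketch of that pattern, with a made-up generator name: the value of the `yield` expression inside the generator is whatever the caller passes to `.send()`.

```py
def running_total():
    total = 0
    while True:
        # The yield expression evaluates to the value passed to .send().
        received = yield total
        total += received


gen = running_total()
first = next(gen)       # run up to the first yield
second = gen.send(5)    # resumes with received = 5
third = gen.send(10)    # resumes with received = 10
```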

rose schooner
gray galleon
rose schooner
#

you can do a = *b,
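
A small sketch that checks both forms with the built-in `compile()`, so nothing has to be run to see which one the parser accepts. The `compiles` helper is just for this example.

```py
def compiles(source):
    """Return True if `source` is accepted by the compiler."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


bare_star = compiles("a = *b")     # rejected: a lone starred expression
with_comma = compiles("a = *b,")   # accepted: the comma makes it a tuple

namespace = {"b": [1, 2, 3]}
exec("a = *b,", namespace)
unpacked = namespace["a"]
```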

deep nova
#

That explains what those expressions are, but why are they the top level expressions (the ones which degrade into all others) of all the assignment statements?

gray galleon
rose schooner
gray galleon
rose schooner
rose schooner
#

the actual tuple rule uses parentheses

rose schooner
grave jolt
fallen slateBOT
#

@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.

42
raven ridge
deep nova
#

This is what I needed to know XD

raven ridge
#

Something needs to be at the top level, and the only thing that's special about that top level thing is that it needs to be able to match all the other things

deep nova
#

I was hoping/suspecting as much. I just wanted to check to make sure there wasn't anything particularly special or complicated about those expression categories in particular

#

Personally, I'd favor a top-level-expression rule, or simply reserve the term expression for that purpose

#

I love python, I really do, but its internals are some of the least semantic code I've seen in my life

raven ridge
deep nova
#

Nope

#

Not even close

#

I had no idea what that was supposed to mean

raven ridge
#

It's the right hand side argument of an annotated assignment

deep nova
#

assignment_rhs or simply expression would have gotten the point across much better

#

annotated_rhs might have made more sense if the rule name was annotated_assignment. In starting in on trying to understand the rule (whose name was assignment) there was no clear indication that that particular part of the rule references annotated assignment specifically. In the context of assignment as a broader set of rules, using the term annotated_rhs was quite confusing. In fact, not using separate rules for the different types of assignment is confusing af

#

That particular set of rules looks more like something a machine would have spit out after having digested and optimized a much clearer, semantically focused equivalent

raven ridge
#

I suspect it's not as easy as you'd imagine to come up with good names for each of the intermediate productions in a grammar

deep nova
#

Oh, I'm sure it's a total pain

raven ridge
#

In fact, the rule you posted above had an expression in it as well

#

The type annotation of an annotated assignment is matched by expression

deep nova
#

That proves my point though

#

Why is annotated_rhs (itself an analogue for yield_expression | star_expressions) different from just expression? What about the former is different from the latter, and what makes one the required rule for an annotation but not an assignment right-hand side?

raven ridge
#

They match different stuff

deep nova
#

My point is that what they match and why that particular entry point to the expression-fission-chain is used there should be obvious.

raven ridge
#

People would say that both the thing after the : and the thing after the = are expressions. Trying to come up with different words for "ok, this is an expression, but it's not an expression that can start with yield or *" for every one of these hundreds of productions isn't easy. Tack on to that the fact that this grammar evolved over time - it's reasonably likely that at one point expression was at the top level, and then new changes to the grammar required a new production above expression

deep nova
#

Ehhhhhhhhhhhhhhhhhhhhhhhhh

#

I mean, yeah, sure

raven ridge
#

yield was added in, what, 2.5?

deep nova
#

But the grammar is the literal definition of the language in so far as such a thing might exist. Confusion is not an option

raven ridge
#

And * unpacking for assignments was added even later than that, I think

deep nova
#

Besides, we're smart people. We've all written oodles of essays and technical documents. Python is run by a steering committee and is peer reviewed out the wazoo I'm sure. I'm not sure that "it's too hard to keep straight" is a reasonable rationale for a confusing document

raven ridge
#

I didn't say that it's too hard to keep straight, I said that the names are essentially arbitrary by virtue of the fact that grammars force you to choose way too many names, and that evolution over time accounts for cruft

raven ridge
deep nova
#

That's why I said "in so far as such a thing exists"

#

Anyway, I've got no particular loathing towards the document. I do think it could use a sprucing up, though

raven ridge
#

🤷‍♀️ go for it 🙂

deep nova
#

I mean

#

In writing my own language's grammar, that's basically what I'm doing

#

So gimme a month or two, and I'll get back to you I suppose XD

raven ridge
#

CPython is open source. If you see ways to improve on the grammar, send PRs. If the core devs agree that they're improvements, they'll get merged.

deep nova
#

Hmmmmmmm

#

Rewriting the grammar for readability does sound like a fun time...

#

And it would be a good excuse to learn the beast inside and out

#

While you're here — one quick question

#

Why is the grammar shown here in the "docs" different (very different) from the one in the actual grammar file?

raven ridge
#

Dunno

#

It might be simplified for readability, or it might be a place where docs didn't keep up with changes to implementation details

deep nova
#

Or both?

raven ridge
#

Possibly

deep nova
#

Cool. I just wanted to know if there was a method to the madness

raven ridge
#

You might check if the grammar in the docs matched the old, non PEG, grammar more closely

deep nova
#

I love the peg grammar's syntax

#

Maybe not the actual content, but the syntax is quite graceful

warm breach
#

every python variable access is already an implicit dereference

#

oh though maybe, * of ints dereferences the pyobject at that address?

#

super cursed

#

!e

from einspect.structs import PyObject
from einspect import impl

@impl(int)
def __iter__(self):
    return iter((PyObject.from_address(self).into_object(),))

x = id("hello")

print(x)
print(*x)
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 140716785662064
002 | hello
deep nova
#

WHAT!?

#

When did this happen!?

gray galleon
#

!pypi lark

fallen slateBOT
graceful pelican
#

hope this is the right channel to ask this: does python bind methods when they are accessed, or when the class is instantiated?

class Test:
  def test(self): pass

Test().test # does this bind `test` to `Test()`, or is it already bound?
dusk comet
fallen slateBOT
#

@dusk comet :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | <function Test.test at 0x7f8330ea5c60>
002 | <bound method Test.test of <__main__.Test object at 0x7f8330c643d0>>
003 | <bound method Test.test of <__main__.Test object at 0x7f8330c64410>>
graceful pelican
#

that doesn't answer my question, i guess what i'm trying to ask is does the LOAD_ATTR instruction bind the method to the receiver somehow, or does it load an already bound method?

#

i'm not familiar with cpython source code, so i thought maybe someone already familiar could answer that question or point me in the right direction

graceful pelican
prime estuary
#

It's when they're accessed yes.

#

They're "just" regular functions in the class.

graceful pelican
#

okay, thank you

prime estuary
#

All through the magic of the descriptor protocol, same way @property works.
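
A short sketch of what that looks like from Python code: the class dict holds a plain function, and the bound method is created by the function's `__get__` each time the attribute is looked up on an instance.

```py
import types


class Test:
    def test(self):
        return "called"


obj = Test()

plain = Test.__dict__["test"]         # what is actually stored on the class
stored_on_instance = "test" in vars(obj)

# Attribute access on the instance is roughly this descriptor call:
bound = plain.__get__(obj, Test)

same_target = bound == obj.test       # equal: same function, same instance
fresh_each_time = obj.test is not obj.test  # but a new object per lookup
```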

rose schooner
#

it's specialized for loading an attribute that is to be called

prime estuary
#

That's only when it's immediately called, but it's not really too important. It's just an optimisation; it shouldn't change semantics or be observable from your code.

deep nova
#

Hey guys, I'm trying to understand a nuance in Python's Grammar's syntax

#
assignment[statement] ::=
    | targets=( (t=target_list '=' {t})+ ) expr=expression {{ parse_assignment(targets, expr) }}
#

From my own grammar, trying to emulate python's.

The issue stems from needing to repeat (target_list '=') + as so, but, only wanting to actually collect the target_list node, ignoring the consumed =

#

So, if I'm interpreting this right...

#

(t=target_list '=' {t})+ says "each time I collect a target_list followed by an =, bind it to the name t and collect that (as opposed to, well, I don't really know)"

#

All the ts get collected via while loop and put into a collection, which is then bound to the name targets?

deep nova
#

Rebuilding python's parser and parser generator is exactly what I want to do. It's absolutely exhilarating, and, it'll look great on my github

#

And I'll have superpowers when I'm done

gray galleon
#

in seriousness
removing punctuations like that should be done at ast generation

#

the job of the parser
is to parse

deep nova
#

>_>

#

<_<

#

Yes, I agree. Hence the grammar

gray galleon
deep nova
#

This is a grammar of my own design, but I think I'm sticking pretty close to Python's

gray galleon
#

actually i still don't know what your problem is

deep nova
#

Python's grammar is basically a programming language unto itself

#

It has variables and function calls. It's really quite beautiful

#
```
some_rule ::=
    | a=sub_rule_1 sub_rule_2, b=sub_rule_3 {{ parse_some_rule(a, b) }}
```
This contains everything the parser generator needs to know to build the parser, including how to bind the results of calling some other rule to a name, and, how to pass the collected child nodes to the desired parsing function
#

But things are a bit weird when you're collecting one-or-more or zero-or-more of something

#
```
some_rule ::=
    | a=( some_other_rule * ) {{ parse_some_rule(a) }}
```
I *assume* this basically says "collect zero or more of `some_other_rule` and place them in a collection. Bind that collection to the name `a`, and pass that collection on". Alright, easy enough
#

What about this?

#
some_rule ::=
    | a=( ( some_other_rule '=' ) * ) {{ parse_some_rule(a) }}
#

Collect some rule zero or more times, as before. But you're also consuming a token. What, then, are the elements of a? Tuples of the form tuple[some_other_rule, Token]? Does the parser implicitly ignore the collected token? Maybe it will only place into the "result" of a repeated group items that are named ( (x=some_other_rule some_ignored_rule) * )

#

That's what I'm asking about. What I think it's doing is this: ( (x=some_other_rule some_ignored_rule {x}) *)

#

Basically, any parenthesized group can have a return statement {something} at the end. If so, the "result" or "contents" of the parenthesized group will be those items in the return statement.

#

I think.
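A rough Python sketch of the reading guessed at above: each pass through a repeated group contributes whatever the group's action returns, so the consumed '=' token is dropped unless the action mentions it. The token list and the `parse_targets` helper are hypothetical illustrations, not the code pegen actually generates.

```python
def parse_targets(tokens):
    """Mimic `a=( x=NAME '=' { x } )*` over a flat list of string tokens."""
    pos = 0
    collected = []
    while True:
        mark = pos  # remember where this repetition started
        if pos < len(tokens) and tokens[pos].isidentifier():
            x = tokens[pos]
            pos += 1
            if pos < len(tokens) and tokens[pos] == "=":
                pos += 1
                collected.append(x)  # the action `{ x }` keeps only x
                continue
        pos = mark  # this repetition failed: backtrack and stop looping
        break
    return collected, pos


targets, end = parse_targets(["a", "=", "b", "=", "c"])
print(targets, end)
```

The trailing `c` is left unconsumed because it is not followed by `=`, which is the point at which a later part of the rule would take over.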

deep nova
#

I think I get it now. yield_expr and star_expressions are not necessarily on top of the expression fission chain, but they require special handling in the context of assignment (and maybe in a few other cases)

rose schooner
grave jolt
#

🙂

grave jolt
deep nova
#

Oh I see

#

Hehe

deep nova
#

I can certainly see why a yield expression requires special handling. It has "limited usage" in that it can only appear as part of an assignment expression, or, in a yield statement (which is probably just a statement wrapper around a yield expression, but I haven't looked)

feral island
#

!e def f(): (yield x) + a((yield y))

fallen slateBOT
#

@feral island :warning: Your 3.11 eval job has completed with return code 0.

[No output]
feral island
#

this is perfectly legal

deep nova
#

Damn XD

#

This grammar is confusing!

#

Here I thought I'd figured it out

feral island
#

I haven't looked at the formal grammar but a weirdness around yield is that it sometimes needs extra parentheses; e.g., f(await y) is allowed but f(yield y) is not

#

Might have something to do with yield without an argument being legal

deep nova
#

My gut instinct is that that's an inconsistency. But I don't really know enough about it to say anything

grave jolt
#

!e

def f():
    g(yield 42069)
fallen slateBOT
#

@grave jolt :x: Your 3.11 eval job has completed with return code 1.

001 |   File "<string>", line 2
002 |     g(yield 42069)
003 |       ^^^^^
004 | SyntaxError: invalid syntax
grave jolt
#

yo wtf

feral island
grave jolt
feral island
#

!e def f(): yield - 5 print(list(f()))

fallen slateBOT
#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

[-5]
grave jolt
feral island
#

!e def f(): print((yield - 5)) print(list(f()))

fallen slateBOT
#

@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | None
002 | [-5]
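The yield experiments above can be checked without running a generator by asking `ast.parse` whether a snippet is accepted. This is only a sketch; the `parses` helper is a name made up for the example.

```python
import ast


def parses(src):
    """Return True if src is syntactically valid Python."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False


bare = parses("def f():\n    g(yield 1)")      # yield directly as an argument
wrapped = parses("def f():\n    g((yield 1))")  # yield in its own parentheses
print(bare, wrapped)

# `yield - 5` is a yield whose operand is the unary expression -5.
node = ast.parse("def f():\n    yield - 5").body[0].body[0].value
print(type(node).__name__, type(node.value).__name__)
```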
deep nova
#

I don't suppose there's a detailed, thorough write-up of python's grammar anywhere?

#

Not just the grammar itself, but a description of how and why it works?

feral island
deep nova
#

Awesome

#

This is a start, at least 🙂

#

Better question: is there anyone here who understands the grammar quite deeply from whom I could buy or borrow an hour or two

vernal girder
#

Gang

halcyon trail
#

Might want to start with a question or two instead of asking right out for an hour or two...

deep nova
#

I have zillions of questions, which I've been asking one by one

#

But I feel like it's going to take far longer, for all parties, to do it that way

halcyon trail
#

maybe, it's just a huge ask for a place like this to essentially ask someone for a commitment.
generally you just ask questions and whoever feels like answering, answers. If they enjoy the convo and want to keep answering, they'll do it. If not, they stop answering whenever.
Obviously you can still ask for an hour or two but IME in places like this, usually writing something like that results in crickets. Best of luck though.
btw I think you changed it to "buy or borrow" - now I have to ask, what rate are you offering 😛

deep nova
#

Whatever the standard rate for such a thing is, I suppose?

#

Whatever the purveyor of the knowledge thought was fair and appropriate

raven ridge
#

!rule 9 regardless

fallen slateBOT
#

9. Do not offer or ask for paid work of any kind.

deep nova
#

Fair enough :3

#

Sorry

deep nova
#

With respect to specific questions

#
    | a=('(' b=single_target ')' { b } | single_subscript_attribute_target) ':' b=expression c=['=' d=annotated_rhs { d }]
#

I don't think I understand this. Line 150 of the grammar

#

The assignment target can be either a single target in parentheses or a single_subscript_attribute_target. The latter does pretty much what you'd expect it to do. The former can either be another single_subscript_attribute_target or an identifier, or itself in parentheses

#

Looking closer, this is one of two rules for annotated assignment. It seems to allow for (a) : int = 1, ((a)) : int = 1, a.b.c : int = 1 and such. That seems very strange

raven ridge
#

why?

#

given that (a, b) = (b, a) is allowed, there's no particular reason to disallow (a) = (b)

deep nova
#

I guess I'm a bit confused about parentheses wrapped around assignment targets in the first place

#

I understand it in the case of a, (b, c), d = 1, some_iterable, 2

raven ridge
deep nova
#

Well I don't know

#

But I've never seen such a thing before, and I guess I just don't understand the usefulness, except in the case I mentioned earlier

raven ridge
#

Zig allows it... Rust allows it... Nim allows it... I'm not sure that I've ever seen a language that doesn't.

deep nova
#

That line seems to be doing one of two things: wrapping a single_target (a name, a subscripted-target of some kind, another single_target in parens) in parentheses; OR, directly supplying a subscripted-target w/o parens

#

Taken in conjunction with the rule above, which uses only a NAME token as the target, you're able to annotate either a name, a subscripted target, or either nested arbitrarily deeply within parens

#

Why it's broken into two rules, I can't see

#

All told, as best I can tell, the two rules seem to specify the lhs-cases of single-target assignment with annotation

raven ridge
deep nova
#

The first two cases here

#
# NOTE: annotated_rhs may start with 'yield'; yield_expr must start with 'yield'
assignment[stmt_ty]:
    | a=NAME ':' b=expression c=['=' d=annotated_rhs { d }] {
        CHECK_VERSION(
            stmt_ty,
            6,
            "Variable annotation syntax is",
            _PyAST_AnnAssign(CHECK(expr_ty, _PyPegen_set_expr_context(p, a, Store)), b, c, 1, EXTRA)
        ) }
    | a=('(' b=single_target ')' { b }
         | single_subscript_attribute_target) ':' b=expression c=['=' d=annotated_rhs { d }] {
        CHECK_VERSION(stmt_ty, 6, "Variable annotations syntax is", _PyAST_AnnAssign(a, b, c, 0, EXTRA)) }
    | a[asdl_expr_seq*]=(z=star_targets '=' { z })+ b=(yield_expr | star_expressions) !'=' tc=[TYPE_COMMENT] {
         _PyAST_Assign(a, b, NEW_TYPE_COMMENT(p, tc), EXTRA) }
    | a=single_target b=augassign ~ c=(yield_expr | star_expressions) {
         _PyAST_AugAssign(a, b->kind, c, EXTRA) }
    | invalid_assignment
raven ridge
#

by "first two cases" you mean this is 1:

    | a=NAME ':' b=expression c=['=' d=annotated_rhs { d }] {
        CHECK_VERSION(
            stmt_ty,
            6,
            "Variable annotation syntax is",
            _PyAST_AnnAssign(CHECK(expr_ty, _PyPegen_set_expr_context(p, a, Store)), b, c, 1, EXTRA)
        ) }

and this is 2:

    | a=('(' b=single_target ')' { b }
         | single_subscript_attribute_target) ':' b=expression c=['=' d=annotated_rhs { d }] {
        CHECK_VERSION(stmt_ty, 6, "Variable annotations syntax is", _PyAST_AnnAssign(a, b, c, 0, EXTRA)) }

?

deep nova
#

Yeah

raven ridge
#

and you're asking why those aren't merged into one case with a more complicated pattern for a?

deep nova
#

Well, I was more just remarking in passing

#

It looks to me like this rule could have been expressed as ```
| a=(NAME | '(' b=single_target ')' | single_subscript_attribute_target)
':' b=expression c=['=' d=annotated_rhs {d}]

raven ridge
#

they seem to pass different values to the _PyAST_AnnAssign - do they result in different ASTs?

deep nova
#

Oh, you're right. I have no idea what those arguments are for (I haven't gotten that far yet)

raven ridge
#

!e ```py
import ast
print(ast.dump(ast.parse("x: int = y")))
print(ast.dump(ast.parse("(x): int = y")))

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | Module(body=[AnnAssign(target=Name(id='x', ctx=Store()), annotation=Name(id='int', ctx=Load()), value=Name(id='y', ctx=Load()), simple=1)], type_ignores=[])
002 | Module(body=[AnnAssign(target=Name(id='x', ctx=Store()), annotation=Name(id='int', ctx=Load()), value=Name(id='y', ctx=Load()), simple=0)], type_ignores=[])
deep nova
#

They seem to differ in their simple argument, which leads me to suspect different handling (likely on account of one target being just an identifier, the other being a compound object)

raven ridge
#

!d ast.AnnAssign

fallen slateBOT
#

class ast.AnnAssign(target, annotation, value, simple)```
An assignment with a type annotation. `target` is a single node and can be a [`Name`](https://docs.python.org/3/library/ast.html#ast.Name "ast.Name"), a [`Attribute`](https://docs.python.org/3/library/ast.html#ast.Attribute "ast.Attribute") or a [`Subscript`](https://docs.python.org/3/library/ast.html#ast.Subscript "ast.Subscript"). `annotation` is the annotation, such as a [`Constant`](https://docs.python.org/3/library/ast.html#ast.Constant "ast.Constant") or [`Name`](https://docs.python.org/3/library/ast.html#ast.Name "ast.Name") node. `value` is a single optional node. `simple` is a boolean integer set to True for a [`Name`](https://docs.python.org/3/library/ast.html#ast.Name "ast.Name") node in `target` that do not appear in between parenthesis and are hence pure names and not expressions.
raven ridge
#

literally just a flag to tell you if it was or wasn't just a name.
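One place the flag has a visible effect: as far as I understand PEP 526, only simple (unparenthesized) names get their annotation recorded, while other targets have the annotation evaluated and then discarded. A small sketch with a made-up `Config` class:

```python
class Config:
    x: int = 1     # plain name (simple=1): annotation is recorded
    (y): int = 2   # parenthesized name (simple=0): assignment happens, annotation is not recorded


print(Config.__annotations__)
print(Config.x, Config.y)
```

Both assignments still take place; only the bookkeeping in `__annotations__` differs.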

deep nova
#

Ahhhhhhh, there it is

deep nova
deep nova
#
yield_expr[expr_ty]:
    | 'yield' 'from' a=expression { _PyAST_YieldFrom(a, EXTRA) }
    | 'yield' a=[star_expressions] { _PyAST_Yield(a, EXTRA) }
#

So, you can yield from a singular expression

#

Or, you can yield many expressions, comma separated, some of which may be starred?

#

Is there any reason for this? The behaviour I'd have expected from yield from 1, 2, 3, 4 would be to automatically convert the multiple values into a tuple, and yield them all

raven ridge
#

yield from 1, 2, 3, 4 isn't valid syntax at all.

#

possibly because it's not obvious whether it should be parsed as (yield from 1), 2, 3, 4 or as yield from (1, 2, 3, 4)

deep nova
#

Hmmmmm

#

That seems to be the consensus on the other server as well

gray galleon
#

yield from expressions?

deep nova
#

Wait, so

#

If yield from 1, 2, 3, 4 is ambiguous because it could mean (yield from 1), 2, 3, 4 or yield from (1, 2, 3, 4)

#

Why is yield 1, 2, 3, 4 not ambiguous? It could mean the same thing

deep nova
#
primary[expr_ty]:
    | a=primary '.' b=NAME { _PyAST_Attribute(a, b->v.Name.id, Load, EXTRA) }
    | a=primary b=genexp { _PyAST_Call(a, CHECK(asdl_expr_seq*, (asdl_expr_seq*)_PyPegen_singleton_seq(p, b)), NULL, EXTRA) }
    | a=primary '(' b=[arguments] ')' {
        _PyAST_Call(a,
                 (b) ? ((expr_ty) b)->v.Call.args : NULL,
                 (b) ? ((expr_ty) b)->v.Call.keywords : NULL,
                 EXTRA) }
    | a=primary '[' b=slices ']' { _PyAST_Subscript(a, b, Load, EXTRA) }
    | atom
#

What is this? | a=primary b=genexp { _PyAST_Call(a, CHECK(asdl_expr_seq*,

#

It looks like a.b.c[i for i in range(10)] or some such

gray galleon
fallen slateBOT
#

@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.

(1, 2, 3, 4)
gray galleon
#

yield takes precedence over ,

#

not sure what happened with yield from
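A quick way to see the difference between the two forms, using `compile` as a syntax check. The generator names and the `is_valid` helper are invented for the example.

```python
def pairs():
    yield 1, 2, 3, 4  # a single yield whose value is the tuple (1, 2, 3, 4)


def flat():
    yield from (1, 2, 3, 4)  # parentheses make the operand explicit


def is_valid(src):
    """Return True if src compiles as a module."""
    try:
        compile(src, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


print(list(pairs()))
print(list(flat()))
print(is_valid("def g():\n    yield from 1, 2, 3, 4"))
```

This matches the grammar quoted above: plain `yield` takes `star_expressions` (so a bare tuple is fine), while `yield from` takes a single `expression`.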

radiant garden
#

makes a little more sense when thinking of it as await

raven ridge
fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

<generator object <genexpr> at 0x7f97157d81e0>
raven ridge
#

As a special case, Python lets you pass a generator expression to a callable using one set of parentheses instead of 2, as long as it's the only argument
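For illustration, here is that special case next to the situation where it stops applying. `sum` is just a convenient callable; the `rejected` flag is a name chosen for this sketch.

```python
# Sole argument: the call's own parentheses double as the genexp's parentheses.
only_arg = sum(n * n for n in range(4))

# With a second argument the generator expression needs its own parentheses.
with_start = sum((n * n for n in range(4)), 10)

try:
    compile("sum(n * n for n in range(4), 10)", "<demo>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(only_arg, with_start, rejected)
```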

deep nova
#

Ohhhhhh, yeah, that sounds right

#

Nice

rose schooner
gray galleon
#

why does python raise an exception to indicate the end of iteration?
why don’t iterators have a .done() or .running() method to check the state of the iterator and design for loops (of any other iteration structure) around it?
raising and catching exceptions are expensive but this is just a function call plus a boolean check

dusk comet
gray galleon
#

still raising an exception ig
just that normal iteration doesn’t have to involve exceptions

dusk comet
#

There should be one-- and preferably only one --obvious way to do it.

gray galleon
feral island
fallen slateBOT
#

Objects/listobject.c line 3235

listiter_next(_PyListIterObject *it)```
feral island
#

so the caller only has to check whether the return value is NULL, and if it is, call PyErr_Occurred (a few pointer comparisons)

gray galleon
feral island
#

the general case is that 99% of the time the iterable is consumed in C code

#

instead of calling next() directly

gray galleon
#

ok

gray galleon
feral island
#

and now you have a new set of possible bugs where .done() is out of sync with whether __next__() throws

warm breach
gray galleon
#

uh no
that is the pseudocode that cpython has to implement

quick trellis
#

hey sorry to bother you, anyone experimented with building a c extension on nixos?
i can't get it to find the header files and functions

quick trellis
#

inactive

grave jolt
#

A channel won't magically become active if you don't post anything to it 🙂

#

C extensions are indeed not the hottest thing these days

quick trellis
#

oh wow neat observation

#

i did

#

lol

cursive wharf
#

@sacred yew thanks, TIL #c-extensions is a thing on this Discord server.

warm breach
#

there's not really try, the iter function just returns NULL when it's exhausted

warm breach
#

so realistically you would need next() to return StopIteration instead of raising perhaps. But that would complicate most nested code. And since 3.11, try blocks cost close to nothing when no exception is raised, so they can be cheaper than an extra conditional

elder blade
spark magnet
elder blade
#

Yeah I would think an extra Python function call is much more expensive than just setting a variable and walking back a stack (calling the function would require working with the stack anyways)

deep nova
#

Do any of y'all know how the parser's packrat-left-recursion hack works? I can't find any good non-academic articles on it

warm breach
#

should python have a mutable string class?

#

!e

from sys import getsizeof


s = "hm🤔" * 100_000

print(getsizeof(s) // 1000, "KB")

ls = list(s)
items = sum(map(getsizeof, ls))
print((items + getsizeof(ls)) // 1000, "KB")
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 1200 KB
002 | 20400 KB
warm breach
#

this list of characters of a 1.2MB str is 20.4MB

pliant tusk
warm breach
#

that wouldn't handle utf-8 chars though 😔

feral island
#

you can write your own library to provide a memory-efficient mutable Unicode string class

#

if it becomes very widely useful it can be added to the stdlib. Personally I haven't often seen a need for it
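One low-effort approximation of such a class, offered only as a sketch: keep the code points in an `array.array`, mutate in place, and convert back to `str` at the end. It assumes the `"I"` typecode is at least 4 bytes wide (true on common platforms, but not guaranteed by the language), and it works per code point, not per grapheme.

```python
from array import array

s = "hm🤔" * 1000

buf = array("I", map(ord, s))  # mutable, fixed-width storage for code points
buf[0] = ord("H")              # in-place edit, no new string allocated

result = "".join(map(chr, buf))
print(result[:3], len(result), buf.itemsize)
```

Compared with `list(s)`, this stores small integers inline and avoids one `str` object per character.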

gray galleon
warm breach
#

!e

from string import ascii_lowercase

s = "🐍🤔" * 1000

ls = list(s)

print(len(set(map(id, ls))))
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

2000
warm breach
#

is there a reason we can't intern non-ascii strings

gray galleon
#

it is not known at compile time
so it can’t intern

warm breach
#

!e

s = eval("'abc123'")
s *= 1000

ls = [s[i:i+5] for i in range(995)]
print(len(set(ls)))
print(len(set(map(id, ls))))
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 6
002 | 995
warm breach
#

hm curious, I guess we don't dynamically intern in this situation

prime estuary
#

This isn't interning, Python has singletons for ASCII characters similar to the -5 to 256 ints. We could have more singletons, but that'd mean a massive array to put them in...
Interning is only done normally for strings that are valid identifiers.

#

And it's done during compile time

warm breach
#

right yeah, but we could dynamically intern strings? as an optimization

prime estuary
#

Well it'd be a pessimisation most of the time.

warm breach
#

you'd be looking at 50 / 80 bytes base for an empty string, then the size of the string bytes for each duplication

prime estuary
#

It's only useful if you can expect that code is going to be doing comparisons or dict lookups with the string, and that there's going to be duplicates elsewhere.

warm breach
#

vs 4 bytes for a reference

prime estuary
#

But it costs a dict lookup to do the interning. So a hash calculation on top of that.

#

Wasteful in random string ops.

#

Better to leave that to the application, since that knows the uses of the string. An HTML parser for instance probably would want to intern the attribute names, while something parsing user accounts doesn't need to intern usernames...
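Application-level interning is available through `sys.intern`. A minimal sketch, with attribute-name-like strings built at runtime so the compiler cannot merge them first:

```python
import sys

parts = ["data", "-", "attr"]
a = "".join(parts)  # built at runtime
b = "".join(parts)  # equal value, separately allocated

# Equality holds; whether `a is b` is an implementation detail.
print(a == b, a is b)

ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # both names now refer to the one canonical object
```

Each `sys.intern` call pays for a hash and a table lookup, which is the cost side of the trade-off described above.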

warm breach
#

hm yeah fair

gray galleon
warm breach
gray galleon
#

again, when are generator return values useful

#

coroutines?

warm breach
gray galleon
warm breach
#

looks worse imo

warm breach
#

it makes it easy to make event loops that send and return values while yielding during some events

warm breach
#

you can argue int() can return ValueError instead of raising as well

#

but python was just built around exceptions and making an one-off change like this would be odd and affect pretty much everyone

gray galleon
#

!e ```py
print(StopIteration.mro())

fallen slateBOT
#

@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.

[<class 'StopIteration'>, <class 'Exception'>, <class 'BaseException'>, <class 'object'>]
gray galleon
#

why does it subclass Exception

sour thistle
#

!e ```py
next(iter([]))

fallen slateBOT
#

@sour thistle :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 1, in <module>
003 | StopIteration
sour thistle
#

it is also used when a generator returns something, but even then it is still an exception
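A short sketch of how the return value travels: it rides on the `StopIteration` instance as `.value`, and `yield from` unpacks it for the delegating generator. The generator names are invented for the example.

```python
def inner():
    yield 1
    return "done"  # becomes StopIteration("done")


def outer():
    result = yield from inner()  # receives inner's return value
    yield result


print(list(outer()))

# The same value, fetched by hand.
gen = inner()
next(gen)
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value
print(returned)
```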

raven ridge
fallen slateBOT
#

@raven ridge :warning: Your 3.11 eval job has completed with return code 0.

[No output]
warm breach
#

why 995 new strings with only 6 unique ones

raven ridge
#

interning would work by creating the new string, then looking up a canonical instance of that new string

gray galleon
raven ridge
#

right. There are languages that intern every string, but yeah - that's how it'd be done.

#

interning strings is an optimization that trades off increased CPU usage for decreased memory usage

warm breach
#

a = a | b -> a |= b

swift imp
#

No I understand that but I don't get the context of it

raven ridge
#

I think it's just an example of an operation where you call next twice

rich cradle
sour thistle
#

!e ```py
import unicodedata as u
char = u.lookup('SCRIPT CAPITAL P')
print(char)
exec(f'{char} = 123')
exec(f'{char}abc = 456')
print(eval(char))
exec(f'print({char}abc)')

fallen slateBOT
#

@sour thistle :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | ℘
002 | 123
003 | 456
sour thistle
#

looks like you got it right?.. though I hope for the day I see that character being used in real code base never to come
edit; not sure, the hiragana/katakana marker is not working despite being in Other_ID_Start

violet tendon
#

if anyone has python interview questions, please do send them?

gray galleon
#

this is not where to ask it

cursive wharf
rich cradle
sour thistle
rich cradle
#

yeah, the codepoints. hm, that's interesting.

#

huh. the nfkc normalizations are also within that file (at least, whatever unicodedata.normalize("NFKC", ...) gives me).

#

oh, hmmmm

#

maybe that entire file isn't supported?

#

since that's part of XID_Continue according to these tables i'm looking at, but not XID_Start

#

and it's indeed valid as the second char in an ident
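Two checks that may help when poking at this: `str.isidentifier()` for the start/continue rules, and an `exec` into a fresh namespace to see the NFKC normalization the parser applies to identifiers. The ligature example is my own choice of test character, not something from the table being discussed.

```python
import unicodedata

p = unicodedata.lookup("SCRIPT CAPITAL P")
print(p.isidentifier(), (p + "abc").isidentifier())

# U+FB01 (LATIN SMALL LIGATURE FI) normalizes to "fi" under NFKC,
# so the name stored in the namespace is the normalized spelling.
ns = {}
exec("\ufb01le = 1", ns)
print("file" in ns, unicodedata.normalize("NFKC", "\ufb01le"))
```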

rich cradle
#

does cpython have a parser testsuite somewhere?

fallen slateBOT
#

:incoming_envelope: :ok_hand: applied mute to @north prawn until <t:1676516766:f> (10 minutes) (reason: duplicates rule: sent 4 duplicated messages in 10s).

The <@&831776746206265384> have been alerted for review.

warm breach
#

@chilaxan#3116 do you know if there are places where python unconditionally assumes that some PyMethods exist on specific types?

#

not sure if making those pointers null after allocation is safe

sacred yew
#

@pliant tusk your ping failed

pliant tusk
deep nova
#

I'm a bit confused about python's PEG parser

#

How exactly does backtracking/lookahead work?

#

In the case of lookahead, wouldn't one need to memoize the current parser state, capture a boolean representing whether or not the whatever is parsed properly, and then restore the original state?

#

As well, does python's parser use a streaming lexer, or does it lex the entire source and maintain a list of the tokens?

rose schooner
rose schooner
rose schooner
fallen slateBOT
#

Parser/pegen.c lines 333 to 340

int
_PyPegen_lookahead_with_int(int positive, Token *(func)(Parser *, int), Parser *p, int arg)
{
    int mark = p->mark;
    void *res = func(p, arg);
    p->mark = mark;
    return (res != NULL) == positive;
}```
rose schooner
#

actually it's not a "state"
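A Python transliteration of the idea in that C function, as a sketch only: the whole "state" that needs saving is the integer position in the token array. `TokenCursor` and its methods are hypothetical names, and the real parser also memoizes results, which is left out here.

```python
class TokenCursor:
    def __init__(self, tokens):
        self.tokens = tokens  # the full, already-lexed token list
        self.mark = 0         # index of the next token to consume

    def expect(self, kind):
        """Consume and return the next token if it matches, else None."""
        if self.mark < len(self.tokens) and self.tokens[self.mark] == kind:
            self.mark += 1
            return kind
        return None

    def lookahead(self, positive, func, *args):
        """Run func, then rewind; report whether it matched as requested."""
        mark = self.mark
        result = func(*args)
        self.mark = mark  # restoring the position undoes any consumption
        return (result is not None) == positive


cur = TokenCursor(["yield", "from", "x"])
pos_ok = cur.lookahead(True, cur.expect, "yield")   # &'yield'
neg_ok = cur.lookahead(False, cur.expect, "from")   # !'from'
print(pos_ok, neg_ok, cur.mark)
```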

neat delta
#

I ran into an issue with scientific notation a bit ago, and did some experimenting, finding out that 1eX, which is a float, does not equal 10**X (an int) for X > 22. why 22? Is it because 1e22 is slightly less than 2**74 (~73.1), and 1e23 is above (~76.4)? If so, why 74 and not 64?

#

!e

print(10**22, f'{1e22:f}')
print(10**23, f'{1e23:f}')
fallen slateBOT
#

@neat delta :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 10000000000000000000000 10000000000000000000000.000000
002 | 100000000000000000000000 99999999999999991611392.000000
neat delta
#

the question is why it happens - that's just an example for the curious

raven ridge
#

!e ```py
print(2**53)
print(2**53 + 1)
print(2**53 + 1.0)

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 9007199254740992
002 | 9007199254740993
003 | 9007199254740992.0
feral island
#

floats can only represent integer values exactly up to some limit

raven ridge
#

namely, 2**53

#

the 53 is because 53 of the 64 bits of a float are used to represent the significand

charred pilot
#

But 1e22 is much larger than 2**53. Shouldn't this error show up before 22?

raven ridge
#

after 2**53, floats can represent every other integer.
after 2**54, they can represent every 4th integer.
after 2**55, they can represent every 8th integer.
etc.
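The spacing described above can be read off directly with `math.ulp` (available from Python 3.9), which returns the gap between a float and the next representable one. A small sketch:

```python
import math

for exp in (52, 53, 54, 55):
    print(exp, math.ulp(2.0 ** exp))

# At 1e22 the representable values are 2**21 apart and 10**22 happens to be one of them.
print(math.ulp(1e22), int(1e22) == 10**22)

# At 1e23 the spacing is 2**24 and 10**23 falls between two neighbours.
gap = 10**23 - int(1e23)
print(math.ulp(1e23), gap)
```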

rose schooner
# neat delta the question is *why* it happens - that's just an example for the curious

https://github.com/python/cpython/blob/main/Objects/floatobject.c#L501-L503
if the float has the same amount of bits as the integer, it goes to this line

>>> from math import frexp, modf
>>> frexp(1e22)[1] == len(bin(10**22))-2 # 1e22 passes this check
True
>>> frexp(1e23)[1] == len(bin(10**23))-2 # 1e23 passes this check
True
>>> _, intpart = modf(1e22)
>>> intpart, int(intpart)
(1e+22, 10000000000000000000000)
>>> _, intpart = modf(1e23)
>>> intpart, int(intpart) # here's the problem
(1e+23, 99999999999999991611392) 
fallen slateBOT
#

Objects/floatobject.c lines 501 to 503

/* v and w have the same number of bits before the radix
 * point.  Construct two ints that have the same comparison
 * outcome.```
rose schooner
#

the _ variable here is also checked but it doesn't matter in this case since it'll just be 0.0

neat delta
#

i found a very grokkable answer: the significand for 10**23, which is 5**23, is 54 bits long, and thus cannot fit into the 53-bit significand of a 64-bit float. 5**22 is only 52 bits
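That argument is easy to check with integer arithmetic: 10**x is 5**x times a power of two, so only the odd factor 5**x has to fit in the significand. A sketch (the `report` list is just a container for the example):

```python
report = []
for x in (22, 23):
    odd_part = 5 ** x                   # 10**x == 5**x * 2**x
    bits = odd_part.bit_length()
    exact = float(10 ** x) == 10 ** x   # int/float comparison is exact in Python
    report.append((x, bits, exact))
    print(x, bits, exact)
```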

raven ridge
#

!e Then here's a tidy proof of why 23 is the cutoff point: ```py
import math

def modulus(power_of_two):
    return 2**(max(power_of_two - 52, 0))

for power in range(16, 24):
    val = 10**power
    prev_power_of_two = math.floor(math.log2(val))
    difference = val - 2**prev_power_of_two
    every_nth = modulus(prev_power_of_two)
    print(f"{val=:<25d} {prev_power_of_two=} {difference=:<23d} {every_nth=:<8d} {difference % every_nth=}")

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | val=10000000000000000         prev_power_of_two=53 difference=992800745259008         every_nth=2        difference % every_nth=0
002 | val=100000000000000000        prev_power_of_two=56 difference=27942405962072064       every_nth=16       difference % every_nth=0
003 | val=1000000000000000000       prev_power_of_two=59 difference=423539247696576512      every_nth=128      difference % every_nth=0
004 | val=10000000000000000000      prev_power_of_two=63 difference=776627963145224192      every_nth=2048     difference % every_nth=0
005 | val=100000000000000000000     prev_power_of_two=66 difference=26213023705161793536    every_nth=16384    difference % every_nth=0
006 | val=1000000000000000000000    prev_power_of_two=69 difference=409704189641294348288   every_nth=131072   difference % every_nth=0
007 | val=10000000000000000000000   prev_power_of_two=73 difference=555267034260709572608   every_nth=2097152  difference % every_nth=0
008 | val=100000000000000000000000  prev_power_o
... (truncated - too long)

Full output: https://paste.pythondiscord.com/qifequfata.txt?noredirect

raven ridge
#

it's the first power of 10 where ```py
(10**x - 2**math.floor(math.log2(10**x))) % 2**(max(math.floor(math.log2(10**x)) - 52, 0)) != 0

raven ridge
fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | True
002 | True
deep nova
#

Which makes sense — I've looked at it every which way, and that's the only sensible option if you're backtracking

rich cradle
#

a separate lexing step also makes handling semantic whitespace rather nice

#

the parser, which is generally a decent bit more complex, can just deal with tokens like newline, indent, and dedent, rather than messing around with that in the parser + all the rest of the syntax
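The standard library's `tokenize` module shows the same token stream shape. It is a separate implementation from the C tokenizer the parser uses, so treat this as an illustration of the idea only.

```python
import io
import tokenize

src = "if x:\n    y = 1\nz = 2\n"

names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(src).readline)
]
print(names)
```

The block structure arrives as explicit `INDENT` and `DEDENT` tokens, so a grammar rule for a block can be written without ever counting spaces.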

gray galleon
#

is parsing done in python or in C

feral island
#

otherwise who would parse the Python code for the parser

raven ridge
#

!otn a who parses the parsers

fallen slateBOT
#

:ok_hand: Added who-parses-the-parsers to the names list.

signal river
#

/imagine ja

#

hii

#

(╯°□°)╯︵ ┻━┻

gray galleon
#

if its python

dusk comet
#

Who would compile it? To compile it you first should parse it, but you don't have a parser

elder blade
#

Well to be fair a parser is often able to parse itself

#

C language compilers are commonly written in C. You start by writing and using a different compiler, then once the initial compiler is done you can start compiling the compiler.

In the case of the very first C compiler, it was presumably written in Assembly or like Fortran

radiant garden
#

bootstrapping compilers are most useful for, well, compilers

gray galleon
radiant garden
#

you can't self host an entire interpreter since you'd still need some external VM at the core of it

#

and if that VM happens to be your CPU, then, well, congratulations on the compiler

#

the benefits for interpreted languages like Python aren't quite as major, the best you can get is the self-hosted bytecode compiler which is still fine but you'll need to use C to run it

gray galleon
radiant garden
#

Yes

gray galleon
#

but they are not smh

rose schooner
gray galleon
#

does that matter much for parsing and compiling?

gray galleon
#

using python: slow
using C: have to deal with desugared API

rose schooner
deep nova
#

Is there a reason that assignment expressions are bound more loosely than everything else? I think I'd like to try putting them at the bottom of the expression chain instead of at the top

#

You end up with situations like these...

#
if not (something := some_expression):
  ...
```or```py
if something := some_expression and something_else:
    ...
#

The second case is a bit ambiguous 😐
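How the second case is actually grouped can be read from the AST. A minimal sketch:

```python
import ast

tree = ast.parse("if something := some_expression and something_else:\n    pass")
test = tree.body[0].test

# The walrus is the outermost node; the whole `and` expression is its value.
print(type(test).__name__, type(test.value).__name__)
print(ast.dump(test.target))
```

So `something` is bound to the result of `some_expression and something_else`, not to `some_expression` alone.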

grave jolt
#

||To make walrus expressions even more cluttered||

raven ridge
#

It's the lowest in Java and C#

deep nova
#

Hehe

#

Alrighty then

rich cradle
#

does cpython have a parser test suite somewhere? i would like to make sure i'm doing this right, and don't trust the few tests i'm coming up with.

feral island
rich cradle
#

oh wow, that's really scattered. there's some badsyntax_*.py in there too. but thanks, that's a helpful start.

#

seems like this isn't exactly in a format where i can easily throw it into another parser, but it's plenty helpful nonetheless

grave jolt
#

The meaning of this would change depending on whether <tab> is 4 characters or 8

rich cradle
#

ugh, right, thanks.

grave jolt
#

tl;dr tab bad

grave jolt
#

??

#

no there's no default

#

what kind of default? who sets it?

quick snow
#

It's even stricter than it claims:

if foo():
<tab>    bar()
    <tab>bat()

Would be unambiguous, but is rejected

fallen slateBOT
#

Parser/tokenizer.c line 74

tok->tabsize = TABSIZE;```
`Parser/tokenizer.c` lines 36 to 37
```c
/* Don't ever change this -- it would break the portability of Python code */
#define TABSIZE 8```
grave jolt
#

o

quick snow
#

o_O why?

rose schooner
#

actually by the looks of the comment it's required to be 8 (at least in CPython)

grave jolt
#

TIL

#

well that's cursed

rich cradle
#

Tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces preceding the first non-blank character then determines the line’s indentation. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation.

#

the thing i linked earlier says this
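The rule quoted from the reference can be sketched with `str.expandtabs`, which pads each tab to the next multiple of the tab size. `indent_width` is a hypothetical helper that follows the prose rule; it does not reproduce the tokenizer's additional consistency check that rejects ambiguous mixes of tabs and spaces.

```python
def indent_width(line, tabsize=8):
    """Column of the first non-blank character, with tabs expanded."""
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    return len(leading.expandtabs(tabsize))


samples = ["\tbar()", "    \tbat()", "\t    baz()", "    qux()"]
widths = [indent_width(s) for s in samples]
for sample, width in zip(samples, widths):
    print(repr(sample), width)
```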

rose schooner
#

well probably not technically "bypass"

#

but it's not consistent all the time

deep nova
#

I've tried to implement it a few times

#

And I will never understand this

#

Personally, I think the easier solution would be to restrict it to only tabs, and then enforce that an indentation may only be exactly one tab

#

Though, I've heard it should actually be possible to do the indentation matching right in the parser. Either way, it's a no win scenario

grave jolt
#

I would restrict to only spaces

#

tab = poop

raven ridge
#

for one notable issue with that, it makes it very hard to copy-paste code off the internet

#

granted the arcane error message Make gives doesn't help, but if you make it impossible for people to copy-paste code out of a browser into their editor, even if your language gives a very nice error message about how lines can't start with leading spaces, it'll make things harder for your users.

gray galleon
#

tab bad space good

flat gazelle
#

A tabulator is for making tables, a space is for spacing.

sacred yew
#

space bad tab good

magic rune
#

Does anyone know why doing i & 0x1 is slower than i % 2? I thought bitwise operations should be faster. ( Btw it is faster when i tried it with numpy)

rose schooner
rose schooner
rose schooner
fallen slateBOT
#

@rose schooner You've already got a job running - please wait for it to finish!

#

@rose schooner :white_check_mark: Your 3.11 timeit job has completed with return code 0.

500 loops, best of 5: 574 usec per loop
rose schooner
#

!ti ```py
[n%2 for n in range(10000)]

fallen slateBOT
#

@rose schooner :white_check_mark: Your 3.11 timeit job has completed with return code 0.

500 loops, best of 5: 628 usec per loop
rose schooner
#

@magic rune i don't see it

magic rune
#

This is what i tried:

rose schooner
#

!ti ```py
for i in range(1_000): i & 0x1

fallen slateBOT
#

@rose schooner :white_check_mark: Your 3.11 timeit job has completed with return code 0.

5000 loops, best of 5: 48.6 usec per loop
rose schooner
#

!ti ```py
for i in range(1_000): i % 2

fallen slateBOT
#

@rose schooner :white_check_mark: Your 3.11 timeit job has completed with return code 0.

5000 loops, best of 5: 54.6 usec per loop
rose schooner
#

@magic rune still can't reproduce

#

3.11+ has improved in these areas i think

magic rune
#

!ti
```py
for i in range(230, 230 + 1000): i % 2
```

fallen slateBOT
#

@magic rune :white_check_mark: Your 3.10 timeit job has completed with return code 0.

5000 loops, best of 5: 85.6 usec per loop
magic rune
#

!ti
```py
for i in range(230, 230 + 1000): i & 0x1
```

fallen slateBOT
#

@magic rune :white_check_mark: Your 3.10 timeit job has completed with return code 0.

5000 loops, best of 5: 72.1 usec per loop
magic rune
#

@rose schooner Yeah, the modulus operator seems to be a bit slower in 3.10. Thanks for the help!
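
To repeat the comparison above outside the bot, here is a minimal sketch using the standard `timeit` module instead of `!ti`. The names `STATEMENTS` and `time_parity_ops` and the loop count are illustrative assumptions, not from the chat. The numbers depend on the interpreter version and the machine, which is the point of printing the version.

```python
import sys
import timeit

# The two statements timed in the chat, copied as-is.
STATEMENTS = {
    "i & 0x1": "for i in range(230, 230 + 1000): i & 0x1",
    "i % 2": "for i in range(230, 230 + 1000): i % 2",
}

def time_parity_ops(number=500):
    """Per-loop time in seconds for each statement on the running interpreter."""
    return {
        label: timeit.timeit(stmt, number=number) / number
        for label, stmt in STATEMENTS.items()
    }

if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for label, seconds in time_parity_ops().items():
        print(f"{label}: {seconds * 1e6:.1f} usec per loop")
```

Running the same file under two interpreters (for example 3.10 and 3.11) is the simplest way to see whether the gap is version-specific.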

unkempt rock
#

guys what does def do

boreal umbra
warm breach
#

is there a way to stop python from garbage collecting a ctypes.Structure instance

raven ridge
#

keep a reference to it? 😛

warm breach
#

so I'm allocating these pointers to PyMethods structs

#

how does it normally work in C? who keeps the reference to them

fallen slateBOT
#

Objects/longobject.c line 6247

```c
static PyNumberMethods long_as_number = {
```
warm breach
#

oh they're static? hm

#

I suppose I could just PyMem_Malloc the size of the struct instead of making a ctypes.Structure instance

#

wait no that would never get freed

raven ridge
#

they're static for static types, but they're dynamic for heap types. I'm not sure how that actually works for the heap types - I'm guessing that the type object itself holds pointers to them, and knows to free them when it is garbage collected

warm breach
#

where are python heap types even defined in c

#

PyType_New..?

fallen slateBOT
#

Objects/typeobject.c line 2757

```c
PyHeapTypeObject *et = (PyHeapTypeObject *)type;
```
warm breach
#

PyHeapTypeObject apparently

fallen slateBOT
#

Include/internal/pycore_object.h lines 357 to 360

```c
// Access macro to the members which are floating "behind" the object
static inline PyMemberDef* _PyHeapType_GET_MEMBERS(PyHeapTypeObject *etype) {
    return (PyMemberDef*)((char*)etype + Py_TYPE(etype)->tp_basicsize);
}
```
raven ridge
#

the members of a heap type are stored on the heap after the type

warm breach
#

hm

#

I guess heap types frees that itself?

#

I suppose I can make a WeakKeyDictionary with type keys and values of lists of PyMethod ctypes.Structure instances

#

so the PyMethods GC should come after the type...?

warm breach
fallen slateBOT
#

fishhook/fishhook.py line 89

```py
def getdict(cls, E=type('',(),{'__eq__':lambda s,o:o})()):
```
warm breach
#

that type thing gets the dict of a mapping proxy?

pliant tusk
#

Exploits a bug in mapping proxies to get the wrapped mapping

#

*a bug that has been explicitly marked will not fix

pliant tusk
warm breach
#

interesting 👀

warm breach
#

!e

```py
import sys
from einspect.structs import PyTypeObject

v = vars(sys)["int_info"]
t = PyTypeObject(v)
print(t.tp_name)
print(t.tp_name.decode())
```
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | b'\x14\xca\x9a;'
002 | ʚ;
warm breach
#

uh

#

what is up with this

fallen slateBOT
#

Objects/longobject.c line 6328

```c
static PyTypeObject Int_InfoType;
```
warm breach
#

oh nvm type(v) is <class 'sys.int_info'> here

warm breach
fallen slateBOT
#

src/einspect/views/view_type.py line 142

```py
PY_METHOD_STRUCTS.setdefault(object, []).append(base)
```
fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class 'object'>
pliant tusk
#

fishhook stores it as a default arg at import time

raven ridge
pliant tusk
#

i considered that an acceptable risk. I figure that libraries like fishhook and einspect are being used by users likely to modify things that are normally consistent, but that those changes would typically be facilitated by libs like fishhook or einspect (so any odd state would come after import)

raven ridge
#

🤷‍♂️ Python doesn't really have "import time" as a concept. Everything that happens in a Python program happens "at import time". Your program kicks off when the interpreter imports your code as __main__, and interpreter finalization starts as soon as __main__ is done being imported

pliant tusk
#

it does conceptually for imported libraries. things that happen in global state of the module I would argue could be considered import time execution, vs things that happen when a user uses the library which I would consider run/use time

#

for example, fishhook calls patch_object() at import time to patch the object type, vs the patches that happen passively to other types as the lib is used

raven ridge
pliant tusk
#

fair enough, I guess it is better thought of as order of execution

warm breach
#

guess not, it's just a normal name lookup

pliant tusk
#

Yea, only upper locals would get captured like that

warm breach
fallen slateBOT
#

src/einspect/views/view_type.py lines 175 to 176

```py
if obj == object:
    _patch_object_base()
```
warm breach
#

__eq__ is true for other PyObjects at the same address or python objects at the same address

warm breach
#

might be a bit too implicit I dunno

#

but I was doing a bunch of obj.address == address(object) before

#

so I just added it to eq

spring musk
#

hi>

rancid shadow
#

In search of a sco person help me in India

little marlin
#

Does anyone know why there isn't a more expressive description for the error raised when trying to create a file with the same name as an existing directory (on Windows)? Is there some limitation that prevents finding out that this is the issue, or is it just deemed unimportant?

#

(it results in a PermissionError)

warm breach
little marlin
#

Hmm, I might try and find the relevant source code later, but it's been ages since I've read/written any C

flat gazelle
#

I am pretty sure python just builds the error message it gets from windows into an exception

warm breach
#

same thing in C# as well

#

opening a directory throws UnauthorizedAccessException

little marlin
#

ah, figures that it'd be a windows issue

warm breach
#

we could technically make windows file io explicitly check for directories first

#

not sure if that would have other problems
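
One way to get a clearer message without changing how the file is opened is to check for a directory only after the open has already failed. This is a sketch under that assumption; `open_for_writing` is a hypothetical helper, not something Python or the chat provides.

```python
import os

def open_for_writing(path):
    """Open `path` for writing, with a clearer error if it is a directory."""
    try:
        return open(path, "w")
    except (PermissionError, IsADirectoryError) as exc:
        # Windows reports PermissionError here, Unix reports IsADirectoryError.
        # Checking only after the failure adds no cost to the normal path.
        if os.path.isdir(path):
            raise IsADirectoryError(
                exc.errno, "cannot open for writing, path is a directory", path
            ) from exc
        raise
```

The `isdir` call still looks at the path a second time, so the diagnosis is a best guess if something replaces the path in between.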

little marlin
#

well I feel like windows should at least know at the point where it's denying you permission to write to that address

#

maybe there's some security related reason to prevent enumeration?

warm breach
#

well the reason is you just can't open a directory in read mode

#

and how it prevents you is not giving you permission

#

you can modify other metadata attributes of a directory

little marlin
#

yeah but if I try to write a file to the same path as a directory it also just says no permission

feral island
little marlin
#

True

#

I guess it was a more reasonable assumption that it was a windows limitation than something the people working on python didn't care to implement

#

I still wonder if there's an OS reason the windows error message isn't more expressive

warm breach
#

which isn't too bad as things go I guess

little marlin
#

since if you tried to open files with all sorts of names in a location you don't have access to you could enumerate the directory structure of the drive

warm breach
#

it would be strange for python io opens to do anything else beyond actually opening the file though

fallen slateBOT
#

Modules/_io/fileio.c lines 450 to 453

```c
/* On Unix, open will succeed for directories.
   In Python, there should be no file objects referring to
   directories, so we need a check.  */
if (S_ISDIR(fdfstat.st_mode)) {
```
warm breach
#

we seem to do the same thing for unix (albeit after opening)?

feral island
#

that doesn't do a new syscall does it?

#

or rather it does (it calls fstat on the fd a few lines up), but because it's called on the open fd, not a path, it's not vulnerable to race conditions where someone overwrites the path
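
The same idea can be sketched from Python on a Unix-like system: open first, then ask the descriptor what it refers to, so the answer cannot change if the path is swapped afterwards. `fd_is_directory` and `demo` are illustrative names. On Windows, `os.open` on a directory fails, so this only applies to POSIX.

```python
import os
import stat
import tempfile

def fd_is_directory(fd):
    """Inspect the already-open descriptor rather than re-resolving a path."""
    return stat.S_ISDIR(os.fstat(fd).st_mode)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        file_path = os.path.join(tmp, "data.txt")
        with open(file_path, "w") as f:
            f.write("x")
        results = {}
        for label, path in (("file", file_path), ("dir", tmp)):
            # On Unix, a read-only os.open succeeds for directories too.
            fd = os.open(path, os.O_RDONLY)
            try:
                results[label] = fd_is_directory(fd)
            finally:
                os.close(fd)
        return results
```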

warm breach
#

hm

#

does unix guarantee a file can't be deleted when in use?

#

iirc removals are scheduled after all fds are closed but wasn't sure if that was a standard or overridable

feral island
#

a file can be deleted but the fd remains valid

#

this is a common pitfall around disk usage: sometimes your disk shows up as full but the files you can see don't account for all the used disk space
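
A small POSIX-only sketch of that behaviour: the name is removed, but the open descriptor still reads the data, and the disk space is only released once the last descriptor is closed. `read_after_unlink` is a hypothetical name. On Windows, removing an open file typically fails instead.

```python
import os
import tempfile

def read_after_unlink():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                    # the directory entry is gone
        name_exists = os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)            # the open descriptor still works
        return name_exists, data
    finally:
        os.close(fd)                       # only now can the space be reclaimed
```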

wind helm
#

Hello, hopefully this is the right channel.

I have a question about something I've never dealt with before. I've written a module with a single function that might be useful in several projects.
I want to externalise it from the project I'm working on for that reason, so as not to maintain the code in several places.

Do I necessarily need to make it a package?

I know I can import a module from a different folder via sys.path, but that still means knowing my local path, which doesn't sound very elegant.
What's the best approach?

tribal dirge
#

Hi

warm breach
wind helm
#

@warm breach so no need to go for a full package process basically

wind helm
#

Thanks. And when it comes to making updates, what will happen? Will I just run `pip install --upgrade package`?

gray galleon
#

which is probably why you can import in function bodies

rose schooner
#

if there's python code running, the interpreter is doing things

flat gazelle
#

well, you can init the python interpreter, then do some random operations without yielding control to python, then later actually use your interpreter

#

at which point there are sort of two distinct times, one is at interpreter setup time, another is at actually using the interpreter time

molten elk
# lunar harbor what <@451976922361102357> was trying to say is that import is code execution. ...

I think there is an import phase separate from execution. After all, execution is only a side effect of import.

The import phase would be either the process of sys.modules.__getitem__ or the process of working through the sys.path_hooks/sys.meta_path mechanism. Neither of these is particularly interesting in general.

I've litigated this point before in trying to characterise the distinction between “compile-time” and “run-time” in Python. Some of the same arguments may apply.

In practice, is there not a common and meaningful distinction between the execution of module-level code at something akin to a “compile-time” and the execution of everything else at “run-time”? This is a meaningful distinction in practice, despite the former not really being “compile-time” (since the Python compiler historically did only, like, three interesting things).
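
The two halves of the import phase described above can be seen in a small sketch: the first import finds, loads and executes the module body, the second is only a `sys.modules` lookup, and `importlib.reload` executes the body again. The module name and the `RUNS` counter are made up for this illustration.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "demo_import_phase_mod"  # hypothetical throwaway module

def import_twice_then_reload():
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, MODULE_NAME + ".py"), "w") as f:
            # Module-level code: counts how often the body has been executed.
            f.write("RUNS = globals().get('RUNS', 0) + 1\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            first = importlib.import_module(MODULE_NAME)   # find, load, execute
            second = importlib.import_module(MODULE_NAME)  # sys.modules lookup only
            runs_after_imports = first.RUNS
            importlib.reload(first)                        # executes the body again
            return first is second, runs_after_imports, first.RUNS
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
```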

raven ridge
# molten elk It's incorrect to say that importing and programme execution are one and the sam...

> Doesn't that presume the standard entry point?
> There are other ways into PyEval_EvalFrameEx that do not require passing through the import mechanism.

Fair enough, and that's true for the example about embedding the interpreter into another program as well. But `python foo.py` and `python -m foo`, which are the overwhelmingly common ways to run Python code, spend 100% of their time underneath an import call.

> It's incorrect to say that importing and programme execution are one and the same.

Which is exactly why I think it is correct to say that importing and program execution are one and the same, or at least so tightly coupled that it's not useful to distinguish between them.

#

> After all, when we say “import,” we are generally referring to the mechanisms surrounding import, which perform execution only as a side-effect.

I think that's distinctly not what was being referred to in the comment that I replied to when I kicked this whole conversation off, also.

molten elk
# raven ridge > After all, when we say “import,” we are generally referring to the mechanisms ...

As I understood, the original comment was referring to early-binding something from builtins.

I suppose the implication of the original comment was that there was some separate “import-time” mechanism that occurred like a pre-runtime compilation step and would, therefore, preëmpt other runtime changes.

But your comment was that the import mechanism is so closely tied to module execution, and this happens during normal execution, so there is no distinct interval of time during which only import activities occur.

raven ridge
#

right - my point was that nothing stops someone from having messed with builtins before importing the code that wants to early bind something from builtins.
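
A sketch of that point: a default argument binds whatever the builtin is at the moment the `def` runs, so early binding only protects against patches made later, not against ones made before the defining code was executed. All function names here are illustrative.

```python
import builtins

def early_bound(_len=len):       # default evaluated once, when this `def` runs
    return _len("abc")

def late_bound():
    return len("abc")            # looked up in globals/builtins on every call

def demo():
    original = builtins.len
    builtins.len = lambda obj: -1          # someone messes with builtins
    try:
        # Defined after the patch: early binding now captures the patched one.
        def bound_after_patch(_len=len):
            return _len("abc")
        return early_bound(), late_bound(), bound_after_patch()
    finally:
        builtins.len = original            # always restore the real builtin
```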

warm breach
#

instead of just interacting with python code normally that works with any interactive session like repl / jupyter

molten elk
warm breach
#

sure yeah it might be more performant but binaries can be annoying too, non stable ABI and struct attributes regularly change between python versions

#

but if you're just exploring or debugging I've found being able to access internal attributes from live python to be useful

molten elk
#

I don’t know that it will be any faster to run, but it surely should be much easier to write!

grave jolt
#

I assume you can do a lot of cursed stuff with just patching bytecode of code objects

warm breach
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

((...),)
grave jolt
#

Oof

#

How

#

Wh

#

Actually, can you do something like this without using any imports, exec or eval?

warm breach
#

!e well there's this without any imports

```py
def getdict(cls, x=type('',(),{'__eq__':lambda s,o:o})()):
    return cls.__dict__ == x

getdict(list)["wtf"] = "???"

print([].wtf)
```
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

???
warm breach
#

writing to the type dict of immutable types

#

supposedly this is not a bug (discovered by chilaxan)

prime estuary
#

It's a bug, but not something easy/performant to solve.

rose schooner
prime estuary
#

Problem is that mappingproxy's eq method forwards the call onto the internal dict, thus exposing it. But to solve that you'd need to do a whole new equality method, test it, etc...
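
The forwarding can be sketched on an ordinary dict, without touching any builtin type. This assumes the behaviour shown in the chat on CPython 3.11; `ReturnOther` and `leak_backing_mapping` are names made up for the example.

```python
from types import MappingProxyType

class ReturnOther:
    """__eq__ hands back whatever object it is compared against."""
    def __eq__(self, other):
        return other

def leak_backing_mapping(proxy):
    # The proxy forwards == to (backing mapping == x). dict declines with
    # NotImplemented, so Python falls back to x.__eq__(backing mapping).
    return proxy == ReturnOther()

backing = {"a": 1}
proxy = MappingProxyType(backing)
leaked = leak_backing_mapping(proxy)
leaked["b"] = 2    # writes through to the mapping behind the read-only proxy
```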

prime estuary
#

Yeah the problem is if both are proxy objects, it gets hairy and hard to solve without an expensive copy of either mapping.

warm breach
#

also mapping proxy requires GC due to potential recursive references

#

so regardless you can expose the linkage with get_referrers
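
A sketch of the GC route on CPython: because the proxy is tracked by the garbage collector, the link to its mapping is visible in both directions. The chat mentions `get_referrers`. `gc.get_referents` is the call that goes from the proxy to what it holds, and its use here is an addition for illustration.

```python
import gc
from types import MappingProxyType

backing = {"a": 1}
proxy = MappingProxyType(backing)

# Objects the proxy refers to: includes the wrapped mapping.
referents = gc.get_referents(proxy)

# Objects that refer to the dict: includes the proxy (and this module's globals).
referrers = gc.get_referrers(backing)

# Identity checks avoid triggering the proxy's own == forwarding.
found_mapping = any(obj is backing for obj in referents)
found_proxy = any(obj is proxy for obj in referrers)
```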

prime estuary
#

Basically, if you try hard enough you can get through protection, it's there to stop you accidentally doing the wrong thing.

warm breach
#

pretty much the only thing the interpreter actually prevents you doing is modifying types marked Py_TPFLAGS_IMMUTABLETYPE

#

which isn't something you can elect from python either, so things like frozen dataclass are easily mutable with object.__setattr__
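
For example, a frozen dataclass blocks normal attribute assignment but not a direct call to `object.__setattr__`. A minimal sketch, where `Point` is an illustrative class:

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(1, 2)

try:
    p.x = 10                      # goes through the dataclass's own __setattr__
except FrozenInstanceError:
    blocked = True
else:
    blocked = False

object.__setattr__(p, "x", 10)    # bypasses it and writes to the instance directly
```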

rose schooner
prime estuary
#

Well mapping-proxy works with any mapping, not just dicts. The problem is if both are proxies, you have to somehow implement the operator (including handling NotImplemented, the subclass exception, etc) without ever exposing either object to the other one.

grave jolt
warm breach
#

!e optionally you can do the same thing with a Structure memory view and access mapping directly

```py
from einspect import view

view(list.__dict__).mapping["wtf"] = "???"

print([].wtf)
```
fallen slateBOT
#

@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.

???
warm breach
#

not sure if more or less cursed 😔

rose schooner
# prime estuary Well mapping-proxy works with any mapping, not just dicts. The problem is if bot...

i was thinking something like this but yeah the "any mapping" thing would be a problem
```c
static PyObject *
mappingproxy_richcompare(mappingproxyobject *v, PyObject *w, int op)
{
    /* v is already guaranteed a mappingproxy */
    if (PyDict_CheckExact(w)) {
        return PyObject_RichCompare(v->mapping, w, op);
    }
    if (PyObject_TypeCheck(w, &PyDictProxy_Type)) {
        return PyObject_RichCompare(v->mapping, w->mapping, op);
    }
    return PyObject_RichCompare(v, w, op);
}
```

warm breach
#

though that's technically a behavior change

rose schooner
# warm breach might not be too crazy just to reimplement comparisons manually with GetItem

actually we might be able to do this for an "any mapping" implementation
```c
static PyObject *
mappingproxy_richcompare(mappingproxyobject *v, PyObject *w, int op)
{
    if (PyDict_Check(w)) {
        return PyDict_Type.tp_richcompare(v->mapping, w, op);
    }
    if (PyObject_TypeCheck(w, &PyDictProxy_Type)) {
        return PyObject_RichCompare(v->mapping, w->mapping, op);
    }
    return PyObject_RichCompare(v, w, op);
}
```

#

just directly use dict_richcompare

warm breach
#

doesn't that still expose mapping

rose schooner
warm breach
#

also

#

isn't that last line an infinite recursion

#

it'll end up calling this slot

rose schooner
#

actually yes

#

it should be Py_RETURN_NOTIMPLEMENTED;

rose schooner
fallen slateBOT
#

@rose schooner :white_check_mark: Your 3.11 eval job has completed with return code 0.

True
rose schooner
#

like that

#

```c
static PyObject *
mappingproxy_richcompare(mappingproxyobject *v, PyObject *w, int op)
{
    if (PyDict_Check(w)) {
        return PyDict_Type.tp_richcompare(v->mapping, w, op);
    }
    if (PyObject_TypeCheck(w, &PyDictProxy_Type)) {
        /* w is a PyObject *, so cast before reading its mapping field */
        return PyObject_RichCompare(v->mapping,
                                    ((mappingproxyobject *)w)->mapping, op);
    }
    Py_RETURN_NOTIMPLEMENTED;
}
```
also fixed
warm breach
#

guess it would be fine if accepting that comparisons will be false for any dict subtypes or other custom mappings

#

not sure how many code usages in the wild depend on mapping proxies passing through custom __eq__ for innocent reasons

#

that being said

prime estuary
#

It's documented and used with any mapping, so it has to support them.

warm breach
#

!d types.MappingProxyType

fallen slateBOT
#

```py
class types.MappingProxyType(mapping)
```
Read-only proxy of a mapping. It provides a dynamic view on the mapping’s entries, which means that when the mapping changes, the view reflects these changes.

New in version 3.3.

Changed in version 3.9: Updated to support the new union (`|`) operator from [**PEP 584**](https://peps.python.org/pep-0584/), which simply delegates to the underlying mapping.
warm breach
#

first line does kind of say "read only" here 😔

#

though I guess that just means getitem and not eq?

#

but that's kind of a weird distinction

rose schooner
warm breach
#

Mapping types already guarantee all the methods you need to compare

#

it'll just be slower since you need to call python functions

#

but otherwise just do everything dict does but with python calls
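
A rough Python sketch of what such a generic comparison could look like, using only `len()`, iteration and item lookup so that neither mapping object is ever passed to foreign code. `mappings_equal` is a hypothetical helper. Whether this is acceptable for the real `mappingproxy` was not settled in the chat, and keys and values can still run their own `__eq__`/`__hash__`.

```python
from collections.abc import Mapping
from types import MappingProxyType

def mappings_equal(a, b):
    """Compare two mappings without handing either one to the other."""
    if not (isinstance(a, Mapping) and isinstance(b, Mapping)):
        return NotImplemented
    if len(a) != len(b):
        return False
    for key in a:
        try:
            other_value = b[key]
        except KeyError:
            return False
        value = a[key]
        # Same rule dict uses: identical objects count as equal (covers NaN).
        if value is not other_value and value != other_value:
            return False
    return True

proxy_result = mappings_equal(MappingProxyType({"a": 1}), {"a": 1})
```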

rose schooner
#

oh

#

general Mapping compare

rose schooner
#

unless there are mappings that aren't dict (subclasses)

pliant tusk
#

can you direct me to implementations

#

*ones that don't rely on sandboxing the interpreter, but only restrict python itself, since sandboxed interpreters can allow ctypes-esque stuff

molten elk
# pliant tusk *ones that don't rely on sandboxing the interpreter, but only restrict python it...

The secured interpreters are not general purpose tools for public use.

They are unlikely to include parts of the standard library like ctypes or include major third parties libraries like pywin32 or numpy or cffi. They won't provide you with any way to install new packages. And, even if you could install packages, they'll probably require code-signing of shared objects and Python .py files. (They'll probably disable bytecode cache.) They'll probably disable -c and -m modes and may even force -S. They may have some additional hardening for sys.meta_path and sys.path_hooks. They might run code execution through anti-malware pattern matching. They'll probably use PEP-578 audit hooks to log everything the interpreter does.

pliant tusk
#

I am able to get full process memory r/w with no imports, no bytecode cache, and easily obfuscatable code. the only audit hooks that are called are compile and exec, I don't think the above would be enough, (still works with -S)
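
For reference, a minimal sketch of a PEP 578 audit hook that records event names, showing the `compile` and `exec` events mentioned above. `seen_events` and `record_event` are illustrative names. Note that a hook added with `sys.addaudithook` cannot be removed again in the same process.

```python
import sys

seen_events = []

def record_event(event, args):
    # Keep only the event name; a real hook would log or veto based on args.
    seen_events.append(event)

sys.addaudithook(record_event)

code = compile("1 + 1", "<demo>", "eval")   # raises the "compile" audit event
result = eval(code)                         # raises the "exec" audit event
```

A hook only observes events that the interpreter chooses to raise, so code that never triggers one is invisible to it.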

molten elk
pliant tusk
#

it does not use /proc (it works cross platform)

#

no open audit event

molten elk
pliant tusk
#

for example, with full process mem r/w you could disable audit hooks, and probably most if not all of the security harness that you are proposing, and then just use python to do your post exploitation, which would bypass all hypothetical code signing

pliant tusk
#

copy + paste ? I assumed that the purpose of a hypothetical hardened interpreter is running untrusted code

molten elk
pliant tusk
#

if the hypothetical hardened interpreter is by some way evaling untrusted code then that does not matter

#

if it isnt, then why bother using a hardened interpreter

molten elk
pliant tusk
#

whats the point then?

#

if it is closed-system -> closed-secure-interpreter -> closed-system then why bother using a secure interpreter

#

if all of your input is fully trusted then it does not matter

pliant tusk
#

(most devs running across an implementation would also likely assume that)

pliant tusk
#

but i am having trouble wrapping my head around an environment where you want to fully prevent untrusted python code, but also do not have any method for running untrusted python code. It feels redundant

molten elk
pliant tusk
#

the only hypothetical system i can think of would involve the following weird cases: the ability to drop arbitrary files with arbitrary file endings -> the ability to perform arbitrary imports

molten elk
pliant tusk
#

would they not have much higher access with console access to the BMC (and if they don't then what is the console for)

#

And if your point is that Console access would not be able to run untrusted python code in this case, then why have it (the console) at all? Presumably, it would take input and then respond "refused to run untrusted code"

#

it would not be super useful, so if that is your threat model, just remove it

#

but if you need console access as the dev or trusted user, then how can you implement that sort of hardening without also hampering the console into unusability?

#

in that case, I would focus on securing console access, not what can be done in the console

molten elk
pliant tusk
#

I'm still confused about what a real world use case for that sort of tooling would be

pliant tusk
#

I am still having trouble envisioning any of those environments where an attack would reach this point without a glaring vulnerability (which would likely be required for some feature, which these mitigations would disable)

pliant tusk
#

since Bare Metal Computing can mean either

molten elk
pliant tusk
#

ah google did not know that acronym at all

molten elk
#

The OpenBMC project is a Linux Foundation collaborative open-source project whose goal is to produce an open source implementation of the Baseboard Management Controllers (BMC) Firmware Stack. OpenBMC is a Linux distribution for BMCs meant to work across heterogeneous systems that include enterprise, high-performance computing (HPC), telecommuni...

pliant tusk
#

from what I am reading there, those are essentially embedded devices, and speed seems important. I cannot see a situation where you would want to use python

#

and if you did, you would likely use something like circuitpython which would still likely be too slow

molten elk
#

They aren't small machines.

pliant tusk
#

writing code in C or any other directly compilable language would still run vastly faster

#

also wouldn't clock cycle have more influence on speed than RAM?

flat gazelle
#

Huh, how come just wasting CPU cycles with useless computation isn't a threat?

#

Or RAM IG

molten elk
#

I suppose the things you would want to do with Python in these environments may not benefit from the relative efficiency of C over Python.

pliant tusk
#

that would make sense that the C code is faster, the python code is running thousands more lines of code and several dynamic allocations, the C code doesn't allocate any memory dynamically, but both are calling out to a C program systemctl

#

a better example would be like a hashing system or some algorithm implementation

pliant tusk
#

fair enough

pliant tusk
#

i would assume that a system like that would need to do different things on different threads simultaneously

molten elk
#

I think the idea of using Python in that particular environment is the same reason to run Linux on those devices which is the same reason to stick so much RAM and CPU into those devices. It saves a lot of money in human effort, despite being computationally wasteful.

pliant tusk
#

wouldn't that data collection require live monitoring of multiple systems?

#

imo, in reading about those systems, it seems like they would want to squeeze out efficiency

molten elk
# pliant tusk wouldn't that data collection require live monitoring of multiple systems?

So, the way I imagine it, like, you have a hypervisor host, which is the server that runs all the virtual machines for your clients. And that host is running on an actual physical machine running in some datacenter somewhere. And since you live in Palo Alto and the data center is in Nevada, you can't quite get up from your desk and walk over and power cycle the machine when it gets stuck, right? So you need an out-of-band controller connected to the physical hardware. So I would imagine, but I couldn't say for certain, that probably those devices are mostly idle.

pliant tusk
#

But what you're describing still sounds like a system that you would want to secure externally

molten elk
pliant tusk
#

Wouldn't ad hoc scripting be arbitrary python scripts?

pliant tusk
#

It just feels unnecessary to add to a system that should 100% be secured externally

#

And if it's secured externally, you can just put normal python on there. It doesn't need to be some secured version.

sour thicket
#

Hi, anyone know any cool projects to do? or any github repo with cool projects, something like that?

#

a beginner to intermediate level

warm breach
#

@feral island (3.10) do I just do PyObject *iter = _PyEval_GetBuiltinId(&PyId_iter); here, it works the same way as the new _PyEval_GetBuiltin?

```c
static PyObject *
bytearrayiter_reduce(bytesiterobject *it, PyObject *Py_UNUSED(ignored))
{
<<<<<<< HEAD
    _Py_IDENTIFIER(iter);
    if (it->it_seq != NULL) {
        return Py_BuildValue("N(O)n", _PyEval_GetBuiltinId(&PyId_iter),
                             it->it_seq, it->it_index);
    } else {
        return Py_BuildValue("N(())", _PyEval_GetBuiltinId(&PyId_iter));
=======
    PyObject *iter = _PyEval_GetBuiltin(&_Py_ID(iter));

    /* _PyEval_GetBuiltin can invoke arbitrary code,
     * call must be before access of iterator pointers.
     * see issue #101765 */

    if (it->it_seq != NULL) {
        return Py_BuildValue("N(O)n", iter, it->it_seq, it->it_index);
    } else {
        return Py_BuildValue("N(())", iter);
>>>>>>> 54dfa14c5a (gh-101765: Fix SystemError / segmentation fault in iter `__reduce__` when internal access of `builtins.__dict__` exhausts the iterator (#101769))
    }
}
```
warm breach
#

yeah tests seem fine so far python/cpython#102229 python/cpython#102228

gray galleon
#

is python string indexing an O(n) operation (n is the index)
UTF-8 is a variable length encoding so it should be the case
or is it converted into a fixed length form

#

or does it create a lookup table for each character in the string

raven ridge
gray galleon
#

so 'ab' is represented like this?
```
61 00 00 00 62 00 00 00
```

raven ridge
#

it uses the minimum integer size that will fit all of the codepoints for the array

#

so no, that's represented as 61 62

#

but if you add a codepoint with a value above 255, you'd get padding 0's on the 61 and 62

gray galleon
#

so adding a character outside bmp in a long ascii string is an expensive operation bc each character needs to be coerced from 8 bit to 32 bit

raven ridge
#

remember that Python strings are immutable, so every concatenation creates a new string, and needs to copy over every character from each of the original two strings.
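
The per-character width can be observed from Python on CPython with `sys.getsizeof`, as a rough sketch: the marginal cost of one more character is 1, 2 or 4 bytes depending on the widest code point in the string. `bytes_per_char` is a hypothetical helper, and the numbers are a CPython implementation detail, not a language guarantee.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Marginal storage cost of one more copy of `ch` in a CPython str."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

widths = {
    "ASCII U+0061": bytes_per_char("a"),
    "Latin-1 U+00E9": bytes_per_char("\u00e9"),
    "BMP U+0100": bytes_per_char("\u0100"),
    "astral U+1F600": bytes_per_char("\U0001F600"),
}

# One astral character widens every character in the new string to 4 bytes.
ascii_only = "a" * 1000
mixed = ascii_only + "\U0001F600"
size_growth = sys.getsizeof(mixed) - sys.getsizeof(ascii_only)
```

Since indexing works on fixed-width units, `s[i]` stays O(1) regardless of which width was chosen.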

lone sun
raven ridge
#

ooh, I didn't know there was a PEP for that. TIL.

subtle phoenix
#

Unicode geek here saying that this is so cool. The calculations for which length to use for storage are quick, and the savings are huge. I bow to whoever thought that up.

gray galleon
#

why are my timing results so inconsistent
sometimes the dict indexing version uses more time, sometimes it uses less

#

it is more often that tuple indexing wins, but still
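
Micro-benchmarks this small are easily swamped by noise from the rest of the system, so a common approach is to repeat the measurement and compare the fastest runs. A sketch under that assumption; the setup string and the name `timing_spread` are illustrative and not the code being timed in the chat.

```python
import timeit

SETUP = "t = tuple(range(10)); d = dict(enumerate(range(10)))"

def timing_spread(stmt, number=50_000, repeat=7):
    """Return (fastest, slowest) per-call time in seconds across `repeat` runs."""
    runs = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(runs) / number, max(runs) / number

tuple_best, tuple_worst = timing_spread("t[5]")
dict_best, dict_worst = timing_spread("d[5]")

if __name__ == "__main__":
    print(f"tuple: best {tuple_best * 1e9:.1f} ns, worst {tuple_worst * 1e9:.1f} ns")
    print(f"dict:  best {dict_best * 1e9:.1f} ns, worst {dict_worst * 1e9:.1f} ns")
```

If the best-to-worst spread within one statement is larger than the gap between the two statements, a single run cannot tell them apart.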

cyan raven
#

I'm not sure this is a good channel to ask, but I'll try.
So I have forked/cloned the cpython repository and installed the Python native development environment through Visual Studio.
How can I get the newly compiled python after the build (PCbuild/build.bat)? It doesn't seem to change, so I can't test it out.