#internals-and-peps
I mean, yeah, because we're currently in a situation where there's a trade-off between readability and optimization
I also just default to using f-strings for logging
THIIIIIIIIIIIIS
it's only premature optimization if you have to do extra work though or make compromises
if "the way" is both ergonomic and fast then it's not premature anything
I mean, I understand Agda using the literal letter λ for anonymous functions. But that's agda (and λ is actually good-looking and concise unlike lambda)
even haskell uses \x -> x
writing logger.info { f"Hello {user}" } seems pretty nice to me
I like the approach of swift and kotlin where, I imagine, they sat down on the very first day and said "okay, lambdas get the absolute best syntax in the language. Now that's done, let's look at everything else"
Hey, let's do it in the Enterprise Python Style, Extra Clean and Readable ™️
```py
@logger.info
def log_hello_user_greeting() -> str:
    """
    Log a greeting phrase mentioning the user's name, but
    only if the logging verbosity is set to :logging.INFO or higher.
    """
    user_to_be_greeted = user
    greeting = "Hello"
    return f"{greeting} {user_to_be_greeted}"
```
Objects/descrobject.c lines 1268 to 1269
```c
/* This has no reason to be in this file except that adding new files is a
   bit of a pain */
```
that one makes a lot more sense than reversed in enumobject.c honestly
descrobject.c is for all the internal descriptors
can you make a wrapper object in python?
or is it only for slot methods?
int.__add__ is a "<slot wrapper ..." though
this thing is supposed to be a "<method wrapper ..."?
__add__ is a slot
!e print(type(1 .__add__))
@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.
<class 'method-wrapper'>
Improved
you hardcoded "Hello". Rejected.
Wdym hardcoded?! I can subclass this and implement my own greeting
that's just good design, provide sensible defaults!
smfh. open to extension, closed to modification
if we have learned anything about software engineering in the last 30 years
it's that inheritance is always the best
always
it solves all problems, once and for all
damn. multiple inheritance was always the true way to do things
yeah, it's open to extension!
You won't be able to modify this code because we need to book a meeting to make a change in a docstring. As that's potentially a breaking change for our documentation readers.
Isn't HelloUserGreeterLogger a log message factory?
There's obviously a metaclass mechanism
do you want people to soil themselves with touching a concrete constructor
in LoggerProtocol
😛
in a reddit thread recently people were levelling charges like this unironically at logging and it made me sad
"stinks of Java" 😦
# noqa
from __future__ import annotations
from logrossmeister.utils import MetaLoggerProtocolFactoryProtocolRepositoryProtocolFactory
LogUserReturnTypeT = TypeVar("LogUserReturnTypeT", bound=None)
LOG_USER_SLEEPING_TIME_CONSTANT_SECONDS = 0.217
async def log_user(user_to_be_greeted: UserProtocol | LogUserReturnTypeT) -> LogUserReturnTypeT:
    if user_to_be_greeted is None:
        return user_to_be_greeted
    meta_logger_protocol_factory_protocol_repository =\
        await MetaLoggerProtocolFactoryProtocolRepositoryProtocolFactory.get()
    meta_logger_protocol_factory =\
        await meta_logger_protocol_factory_protocol_repository.get()
    await meta_logger = meta_logger_protocol_factory.get(HelloUserGreeterLogger)
    async with meta_logger.lock():
        await meta_logger.args.clear()
        await meta_logger.args.append_("user_to_be_greeted")
        await meta_logger.args.user_to_be_greeted = user_to_be_greeted
        loggable = await meta_logger.create_loggable(mongodb=True, async_=True, django=True)
        await loggable.log()
        await meta_logger.args.clear()
    await asyncio.sleep(LOG_USER_SLEEPING_TIME_CONSTANT_SECONDS)
yes
now as an exercise, write a test suite for this function
no class? 😔
I feel you growing powerful. Now strike me down
and your journey to the dark side shall be complete
(btw I'm mildly sorry for shitposting in this serious channel)
enterprise means never having to say you're sorry
also what's with your new icon thingie. is that a Rust reference
it's... complicated
weird I thought I asked about your icon not a relationship status. discord are you ok
we were talking about the syntactic macros PEP the other day - it seems that this would be a reasonable use for macros in Python, actually. People want:
a) To be able to sprinkle logging code in their application without slowing it down
b) To be able to use f-strings for forming their log messages
c) To write their log statements in a way that's succinct and readable
If logging was macro-based, we'd be able to accomplish all 3, by wrapping log call arguments in an object that formats lazily automatically, so that writing info!(logger, f"Guess what: {expensive_call()}") gets translated automagically to
```py
logger.info(LazyLoggingFormatter(lambda: f"Guess what: {expensive_call()}"))
```
Or hell, we could just do it with a lazy f-string macro in the first place:
```py
import! lazyformat as lf
logger.info(lf!"Guess what: {expensive_call()}")
```
@warm breach You were asking for examples of places where the syntactic macros proposal might be useful, and I think this is a pretty reasonable one.
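For what it's worth, the lazy wrapper half of this already works today without macros. A minimal sketch, assuming a hypothetical `LazyFormat` class (the name and the `ListHandler` helper are made up for illustration): `LogRecord.getMessage()` calls `str()` on the message object, so deferring the work to `__str__` is enough for the logging case.

```python
import logging


class LazyFormat:
    """Hypothetical helper: builds the message only when logging asks for it."""

    def __init__(self, make_message):
        self._make_message = make_message

    def __str__(self):
        # logging calls str() on the message object in LogRecord.getMessage()
        return self._make_message()


class ListHandler(logging.Handler):
    """Collects formatted messages so the demo can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


calls = []


def expensive_call():
    calls.append(1)
    return 42


logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# DEBUG is disabled, so the lambda is never invoked.
logger.debug(LazyFormat(lambda: f"Guess what: {expensive_call()}"))
calls_after_debug = len(calls)

# WARNING is enabled, so the message is built exactly once.
logger.warning(LazyFormat(lambda: f"Guess what: {expensive_call()}"))
```

What the macro would buy is only the call-site syntax: the `LazyFormat(lambda: ...)` wrapping would be written for you.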
or you could just pass in a lambda 🙂
I agree that since lambdas have been botched too badly to be used for this, maybe macros could do instead
but does it justify macros, prob not (but other people will decide that)
it's not as easy as just passing in a lambda - you need to pass an object with a __format__ method...
you pass in a lambda that returns a string when evaluated
i'm not sure why you need a format method
maybe if you want to keep structured data around, I suppose?
pass it in to what?
!e ```py
import logging
logging.error(lambda: "hello")
```
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
ERROR:root:<function <lambda> at 0x7fe166d52b60>
that already does something. The change you're proposing would be backwards incompatible.
I wasn't seriously proposing it because lambdas in python are so ugly
but if you're proposing a new macro, then you can just as well propose new logger functions as well
I guess maybe it could be done if you subclassed Logger...
yeah. that could work.
I kinda just think none of these things are actually worth the price of admission though
less nice than the macro solution, I think, for being error prone and less succinct, but...
(for python, and in its current state)
well, possibly. I don't think there's been any real movement on that syntactic macros PEP in a long time. I'm not sure why it came up again the other day - maybe I'm wrong and it came up here because people were discussing it elsewhere?
if python adds macros then the universe will probably end in a Greenspun's tenth rule explosion though
I'm not really sure that it's worth the cost to add macros to Python, but I think this is an interesting example of a place where they'd allow us to do something that's quite ugly without them. Automagically wrapping some code up in a function to delay evaluation is something that macros could do, where the alternative is extra code pushed into every call site.
I am less down on macros since the last time we discussed this, insofar as I think they work well in Rust.
macros in a dynamically typed, non-lisp just fundamentally makes me sad because if I was willing to sacrifice static typing I could already have had so much nicer macros
would this even work with macros?
Well, you can basically extend this to all the things folks would like to do with lambdas, that they dont in python because they're just too ugly
you would get the ast of a string
you could probably also define the macro to define a local function and pass it in
you'd probably want to do that in fact so you never have any artificial "one line" restriction
so "macros as a hack around poor lambdas" is I suppose a legitimate selling point
I haven't thought too hard about it, but I don't see why not? You'd take the AST of that string, and you'd wrap it up in the AST of a function call to construct a type whose __format__ evaluates and returns that string
oh actually I think it would? but you'd need to provide your own ast I guess
!e since python ast would parse it without the field
import ast
print(ast.dump(ast.parse('"Guess what: {expensive_call()}"')))
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
Module(body=[Expr(value=Constant(value='Guess what: {expensive_call()}'))], type_ignores=[])
but yeah that'd be a nice use case if it works
I'm not sure whether the syntax would be nice with PEP 638 as it's proposed, but it's certainly something that macros could do in principle
transforming the AST for f"Guess what: {expensive_call()}" into the AST for LazyFormat(lambda: f"Guess what: {expensive_call()}") doesn't seem like a big lift, as far as AST rewriting goes
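A rough sketch of that rewrite with the stdlib `ast` module, outside of any macro system. `LazyFStrings` is a made-up transformer name and `LazyFormat` is assumed to be defined wherever the rewritten code runs; this is not what PEP 638 specifies, just the AST-to-AST step.

```python
import ast


class LazyFStrings(ast.NodeTransformer):
    """Wrap every f-string in LazyFormat(lambda: <f-string>)."""

    def visit_JoinedStr(self, node):
        no_args = ast.arguments(
            posonlyargs=[], args=[], vararg=None,
            kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[],
        )
        wrapped = ast.Call(
            func=ast.Name(id="LazyFormat", ctx=ast.Load()),
            args=[ast.Lambda(args=no_args, body=node)],
            keywords=[],
        )
        # Not calling generic_visit, so nested f-strings are left alone.
        return ast.copy_location(wrapped, node)


tree = ast.parse('logger.info(f"Guess what: {expensive_call()}")')
tree = ast.fix_missing_locations(LazyFStrings().visit(tree))

rewritten = ast.unparse(tree)            # source text of the new tree
code = compile(tree, "<rewritten>", "exec")  # the new tree is still valid
print(rewritten)
```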
the argument not to do it for performance reasons is premature optimization, I think - if the string formatting will kill you, odds are that the calls to the logging methods are already too expensive.
But there's another reason, in addition to performance: interpolation failures are caught and reported, without the exception escaping from logging
you could just surround your log call with if __debug__ and that gets compiled away with -O
then use whatever f strings inside you want
but that's a bit verbose for a lot of inline stuff
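A small sketch of the `if __debug__` idea. To keep it in one process, `compile(..., optimize=1)` stands in for running the interpreter with `-O`; the `run` helper and the names inside it are only for illustration.

```python
source = """
if __debug__:
    log.append(f"debug details: {expensive_call()}")
"""


def run(optimize):
    """Execute the snippet at the given optimization level and report what ran."""
    calls = []
    log = []

    def expensive_call():
        calls.append(1)
        return "42"

    namespace = {"log": log, "expensive_call": expensive_call}
    exec(compile(source, "<demo>", "exec", optimize=optimize), namespace)
    return len(calls), log


normal = run(optimize=0)     # like plain `python`: the block runs
optimized = run(optimize=1)  # like `python -O`: the block is compiled away
```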
in any case I think the function call to logging will take longer than any non-lazy f string
the performance advice might be more reasonable if it weren't for the fact that arguments get evaluated eagerly anyway - so logging.debug("result: %s", some_expensive_call()) saves the cost of the interpolation, but not the cost of the expensive call
I think the argument here is that str(some_expensive_call()) may be very expensive
but I don't really see that as too common
yeah, it's much more common that the call is expensive than that str() on the result of the call is.
well, I dunno. big dicts are slow to stringify, I guess.
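To make the eager-evaluation point concrete, here is a minimal sketch (the logger name and `some_expensive_call` body are placeholders): %-style arguments skip the interpolation when the level is disabled, but the call itself still happens unless it is guarded with `isEnabledFor`.

```python
import logging

logger = logging.getLogger("eager_demo")
logger.setLevel(logging.WARNING)  # DEBUG messages are disabled

calls = []


def some_expensive_call():
    calls.append("called")
    return {"answer": 42}


# The argument is evaluated before logger.debug() even runs.
logger.debug("result: %s", some_expensive_call())
eager_calls = len(calls)

# Guarding the call site is the only way to skip the call itself.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("result: %s", some_expensive_call())
guarded_calls = len(calls) - eager_calls
```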
!e
Speaking of templating, I have invented this hack to emulate jinja-style {% if %}s
class Yes:
    def __format__(self, spec): return spec.strip()
class No:
    def __format__(self, spec): return ""
template = """
thing.on("userLogin", (user) => {{
{alert_sentry:
sentry.send(`Login. ${{user.name}}`)}
{alert_log:
console.log(`Login. ${{user.name}}`)}
user.confirmLogin()
}})
"""
print(template.format(alert_sentry=Yes(), alert_log=No()))
@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 |
002 | thing.on("userLogin", (user) => {
003 | sentry.send(`Login. ${user.name}`)
004 |
005 | user.confirmLogin()
006 | })
the power is truly terrifying
He doesn't actually prove that Python is non-context-free. He just says, "That makes the language context sensitive, in my opinion." I'm not going to deny him his right to an opinion, but either it is or it isn't, and he hasn't given a sufficient argument one way or the other.
However, there is a perfectly simple mathematical proof of a toy version of this. If I remember correctly, a language of the form {s^3 : s in Σ^*}, where Σ is some alphabet, is not context-free (as long as the alphabet has more than one character). (I think the 3 in the exponent is right, but it might be something else?) This is a consequence of the pumping lemma for context-free languages. It follows that similar languages, like {stsusv : s, t, u, v in Σ^*}, are also not context-free. So you can't recognize that three consecutive lines begin with the same string of whitespace using a context-free grammar.
This doesn't mean that we're using the wrong tools. The type of context-sensitivity we need is quite simple. You just need to remember what the leading whitespace of the most recent line was and update it as necessary (pushing or popping; it's a stack). And sure, in principle stacks let you do interesting computations, but in practice we're really not doing much.
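The stack described above can be sketched in a few lines. This is a toy that only handles spaces and is not CPython's tokenizer; `indent_tokens` is a made-up name. It turns leading whitespace into INDENT/DEDENT markers so that the rest of the grammar can stay context-free.

```python
def indent_tokens(lines):
    """Turn leading spaces into INDENT/DEDENT markers using a stack of widths."""
    stack = [0]
    out = []
    for line in lines:
        if not line.strip():
            continue  # blank lines do not affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            out.append("DEDENT")
        if width != stack[-1]:
            raise IndentationError("unindent does not match any outer level")
        out.append(line.strip())
    # Close any blocks still open at end of input.
    out.extend("DEDENT" for _ in stack[1:])
    return out


tokens = indent_tokens(["if a:", "    if b:", "        x", "y"])
```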
Just in case y'all are curious: https://hastebin.com/share/penerufeta.py
according to https://github.com/python/cpython/blob/3.10/Python/ceval.c#L2700-L2718, it would appear that END_ASYNC_FOR simply assumes the stack has a single exception triple and the async iterable. But https://docs.python.org/3.10/library/dis.html#opcode-END_ASYNC_FOR says that it uses 7 stack elements, which just seems odd to me. Which is true?
Genuine question, what do you think are the top most essential peps to know to code collaborate with python? I'm thinking like pep8 and pep20 at least. Since they talk about how python coders think. Do you guys know some more "easter eggs" or must know peps drink and eat and breath all day? I always like pep8/20. Maybe one more to add to my bookmark if you have any
Sorry if my English is a bit broken, it's my 2nd language
Im still learning
Most PEPs aren't really relevant to normal coding, they are change proposals that were either accepted or rejected. You're usually better off reading the documentation at docs.python.org. PEP 8 and 20 are unusual in that regard
Most PEPs are relevant only if you are actually working on the development of the language or interested in language design
Ohh. Thank you! @feral island
the other popped stack elements must be from UNWIND_EXCEPT_HANDLER()
https://github.com/python/cpython/blob/3.10/Python/ceval.c#L1456-L1475
Ah, thanks
reading through PEP 638 (syntactic macros) i never rlly got the purpose of them
from what i'm understanding they change the AST and u can do stuff like DSLs and other cool stuff with it right?
but how is it even defined bc i'm not rlly understanding it in the PEP, is it just a function that changes the ast based on the ast node
which is what it does
at compiletime
ohh
It's gonna take me a few weeks to understand this grammar of python's
But I'm starting to go through it. I'm still a bit curious about the distinction between a compound statement and a simple statement
As best I can tell — it all comes down to the semicolon?
Compound statements contain (groups of) other statements; they affect or control the execution of those other statements in some way. In general, compound statements span multiple lines, although in simple incarnations a whole compound statement may be contained in one line.
https://docs.python.org/3/reference/compound_stmts.html
A simple statement is comprised within a single logical line. Several simple statements may occur on a single line separated by semicolons.
https://docs.python.org/3/reference/simple_stmts.html
So you can do things like from some_module import thing; thing.func()
Oh
This makes so much more sense than the other thing I read
It's a compound statement because it literally is compounded from multiple other statements
Another question, while I'm here
I think I understand that this group of rules:
single_input: NEWLINE | simple_stmts | compound_stmt NEWLINE;
file_input: (NEWLINE | stmt)* EOF;
eval_input: testlist NEWLINE* EOF;
Just so I totally understand...
file_input: (NEWLINE | stmt)* ENDMARKER (from the actual python grammar this time, not a knockoff version)
A file consists of any number of statements and newlines, followed by an end marker. How does this relate to the "flattening of simple statements"? Is it that I collect a sequence of semi-colon-delimited simpler statements in a single pass, but in the resulting AST they should not be grouped within simple-statement collections but rather directly as children of the main File node?
Ahhh, here we go: file[mod_ty]: a=[statements] ENDMARKER { _PyPegen_make_module(p, a) }
In this one, there is no or NEWLINE clause. Does this mean that every statement will have its own rules for consuming a newline at its termination?
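One way to check the flattening question is to ask the `ast` module directly. In this small sketch, semicolon-separated simple statements end up as direct siblings in `Module.body`, with no grouping node around them.

```python
import ast

source = "a = 1; b = 2\nif a: pass\n"
module = ast.parse(source)

# Both assignments from the first line sit directly in Module.body,
# next to the compound `if` statement.
kinds = [type(node).__name__ for node in module.body]
same_line = module.body[0].lineno == module.body[1].lineno == 1
```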
I'm having a hard time understanding star_expressions
star expressions = star + expression
i think star have the same precedence as unary operators
no
it has different precedences depending on the context
actually maybe just one
precedence just below a bitwise OR expression
I've never seen a star used as a unary operator
I've seen it used in iterable unpacking, and that's what I assumed it was
python be like fuck consistency
!e ```py
x = "foo"
print(*x + "bar")
```
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
f o o b a r
so yep, parsed as *(x + "bar")
I have absolutely no idea what's happening
it pretty much has to be, right? Parsing it as (*x) + "bar" wouldn't make sense.
* can be used for two completely different purposes:
- multiplication, 2*3=6
- unpacking - for example, if numbers = [1, 2, 3] then doing foo(*numbers) is the same as doing foo(1, 2, 3)
pretty much
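The grouping can also be read straight off the AST. A short sketch: the star wraps the whole `x + "bar"` expression rather than just `x`.

```python
import ast

call = ast.parse('print(*x + "bar")', mode="eval").body
starred = call.args[0]

# The Starred node contains the entire addition, i.e. *(x + "bar").
is_starred = isinstance(starred, ast.Starred)
inner_is_addition = (
    isinstance(starred.value, ast.BinOp)
    and isinstance(starred.value.op, ast.Add)
)
```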
So, in the grammar, is the star operator (wrt to unpacking) the lowest precedence operation in the chain and hence, referenced constantly as the go-to for any type of expression in that chain?
(Hoping that makes sense)
does anyone knw what this means
this keeps popping up randomly for me
after i close
I'm too tired to properly determine whether or not that is technically correct
🧠
if it is about python at all: open a post in #1035199133436354600 with more details
if it isn't: you can try asking in some offtopic channel, but I'd recommend looking for another more on topic server
O.O There's a more on topic channel than here to ask about python's grammar?
redd's questions is not about python grammar at all
assignment:
| NAME ':' expression ['=' annotated_rhs ]
...alternatives
annotated_rhs:
| yield_expr
| star_expressions
yield_expr:
| 'yield' 'from' expression
| 'yield' [star_expressions]
star_expressions:
| star_expression (',' star_expression )+ [',']
| star_expression ','
| star_expression
star_expression:
| '*' bitwise_or
| expression
bitwise_or:
| bitwise_or '|' bitwise_xor
| bitwise_xor
bitwise_xor:
| bitwise_xor '^' bitwise_and
| bitwise_and
bitwise_and:
| bitwise_and '&' shift_expr
| shift_expr
shift_expr:
| shift_expr '<<' sum
| shift_expr '>>' sum
| sum
sum:
| sum '+' term
| sum '-' term
| term
term:
| term '*' factor
| term '/' factor
| term '//' factor
| term '%' factor
| term '@' factor
| factor
...and so on, all the way down to atomics
I just want to make sure I'm interpreting this correctly. NAME ':' expression ['=' annotated_rhs ] translates to:
Name and type-annotation (which may be an expression) optionally followed by = some-kind-of-expression
So I can have an "assignment" that doesn't assign anything but rather just declares, such as a: int
And then there could be a right hand side to it, the value of which will be some kind of expression. The top-level expression rule in this case seems to be annotated_rhs which degrades into yield or starred, etc
that all sounds right
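A quick sketch confirming that reading: a bare annotation parses to an `AnnAssign` node whose value is missing, and executing it does not bind the name.

```python
import ast

declaration = ast.parse("a: int").body[0]
with_value = ast.parse("a: int = 5").body[0]

# Running the bare annotation does not create a variable called `a`.
namespace = {}
exec("a: int", namespace)
name_is_bound = "a" in namespace
```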
assignment:
('(' single_target ')'
| single_subscript_attribute_target) ':' expression ['=' annotated_rhs ]
So, the left-hand side could be a single-target between parentheses OR a single_subscript_attribute_target (which I assume to be something like a.b.c or a[6].someattr). But single_target degrades directly into single_subscript_attribute_target as well as into '(' single_target ')'
Isn't all that a) wildly confusing and b) pointless? Wouldn't just saying single_target not cover all of this?
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
4
I've never really seen putting objects in parentheses for assignment
The only use-case I can think of might be something like
```
a, (b, c), d = 1, (2, 3), 4
```
Is the point to allow for recursive assignment to nested targets?
Black reformats x,=4 as (x,) = 4
I've never really seen (x) = val, but I suppose it makes sense to allow it as a degenerate case of (x, y, z) = val
The grammar showcased here is totally different from the one I see on Github
Is it modified for better readability, while the "real" grammar is designed for efficiency or something?
no touchie the pistol operator!!
it's just like not but it feels "less natural" because usually * has a high precedence
I'm sure it'll become plain soon enough, but
Why are the top-level expressions in assignment yield_expression and starred_expression?
Why those two and not some others?
because they are the only expressions i think
starred_expressions allows for stuff like a = *b, *c which makes a tuple of the elements of b and c and assigns it to a
actually it's star_expressions, not starred_expression/starred_expressions
but you shouldn't be able to do
```
a = *b
```
make it dereference b
yield_expression basically just allows for top-level expression assignment to a yield, as in a = yield b, which is useful when you wanna receive values from outside that uses .send()
nope
so whats the point of including star expressions in assignment
allows to unpack into a tuple
you can do a = *b,
That explains what those expressions are, but why are they the top level expressions (the ones which degrade into all others) of all the assignment statements?
but then thats in tuple syntax
actually nvm you can do that but it errors because the star isn't used anywhere
not toplevel assignment
which leads to this error with not much information
```pycon
a = *b
File "<stdin>", line 1
SyntaxError: can't use starred expression here
```
it's embedded in the rule
the actual tuple rule uses parentheses
by "isn't used anywhere" i actually mean "doesn't resolve to a single expression"
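Side by side, as a small sketch (the `compiles` helper is just for illustration): the bare star is rejected at compile time, while the trailing comma turns the right-hand side into a tuple display where unpacking is allowed.

```python
def compiles(source):
    """Return True if the source compiles, False on SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


b = [1, 2, 3]
a = *b,  # a tuple built by unpacking b

bare_star_ok = compiles("a = *b")
tuple_star_ok = compiles("a = *b,")
```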
!e
class ptr:
    def __init__(self, obj):
        self._obj = obj
    def __iter__(self):
        yield self._obj
    def __imul__(self, other):
        self._obj = other
        return self
px = ptr(37)
py = ptr(5)
pz = ptr(0)
(x, y) = (*px, *py)
pz *= x + y
print(*pz)
@grave jolt :white_check_mark: Your 3.11 eval job has completed with return code 0.
42
Because they're the ones with the lowest precedence
This is what I needed to know XD
Something needs to be at the top level, and the only thing that's special about that top level thing is that it needs to be able to match all the other things
I was hoping/suspecting as much. I just wanted to check to make sure there wasn't anything particularly special or complicated about those expression categories in particular
Personally, I'd favor a top-level-expression rule, or simply reserve the term expression for that purpose
I love python, I really do, but its internals are some of the least semantic code I've seen in my life
Isn't annotated_rhs a reasonable name for the stuff on the right side of an =?
It's the right hand side argument of an annotated assignment
assignment_rhs or simply expression would have gotten the point across much better
annotated_rhs might have made more sense if the rule name was annotated_assignment. In starting in on trying to understand the rule (whose name was assignment) there was no clear indication that that particular part of the rule references annotated assignment specifically. In the context of assignment as a broader set of rules, using the term annotated_rhs was quite confusing. In fact, not using separate rules for the different types of assignment is confusing af
That particular set of rules looks more like something a machine would have spit out after having digested and optimized a much clearer, semantically focused equivalent
expression is already used by the grammar to mean something else, though, so you'd need to rename that too
I suspect it's not as easy as you'd imagine to come up with good names for each of the intermediate productions in a grammar
Oh, I'm sure it's a total pain
In fact, the rule you posted above had an expression in it as well
The type annotation of an annotated assignment is matched by expression
That proves my point though
Why is annotated_rhs (itself an analogue for yield_expression | star_expressions) different from just expression? What about the former is different from the latter, and what makes one the required rule for an annotation but not for an assignment right-hand side?
They match different stuff
My point is that what they match and why that particular entry point to the expression-fission-chain is used there should be obvious.
People would say that both the thing after the : and the thing after the = are expressions. Trying to come up with different words for "ok, this is an expression, but it's not an expression that can start with yield or *" for every one of these hundreds of productions isn't easy. Tack on to that the fact that this grammar evolved over time - it's reasonably likely that at one point expression was at the top level, and then new changes to the grammar required a new production above expression
yield was added in, what, 2.5?
But the grammar is the literal definition of the language in so far as such a thing might exist. Confusion is not an option
And * unpacking for assignments was added even later than that, I think
Besides, we're smart people. We've all written oodles of essays and technical documents. Python is run by a steering committee and is peer reviewed out the wazoo I'm sure. I'm not sure that "it's too hard to keep straight" is a reasonable rationale for a confusing document
I didn't say that it's too hard to keep straight, I said that the names are essentially arbitrary by virtue of the fact that grammars force you to choose way too many names, and that evolution over time accounts for cruft
And Python doesn't have a standard, just a reference implementation. The grammar isn't the definition of the language, "what CPython does" is.
That's why I said "in so far as such a thing exists"
Anyway, I've got no particular loathing towards the document. I do think it could use a sprucing up, though
🤷♀️ go for it 🙂
I mean
In writing my own language's grammar, that's basically what I'm doing
So gimme a month or two, and I'll get back to you I suppose XD
CPython is open source. If you see ways to improve on the grammar, send PRs. If the core devs agree that they're improvements, they'll get merged.
Hmmmmmmm
Rewriting the grammar for readability does sound like a fun time...
And it would be a good excuse to learn the beast inside and out
While you're here — one quick question
Why is the grammar shown here in the "docs" different (very different) from the one in the actual grammar file?
Dunno
It might be simplified for readability, or it might be a place where docs didn't keep up with changes to implementation details
Or both?
Possibly
Cool. I just wanted to know if there was a method to the madness
You might check if the grammar in the docs matched the old, non PEG, grammar more closely
I love the peg grammar's syntax
Maybe not the actual content, but the syntax is quite graceful
can't really dereference it anymore though
every python variable access is already an implicit dereference
oh though maybe, * of ints dereferences the pyobject at that address?
super cursed
!e
from einspect.structs import PyObject
from einspect import impl
@impl(int)
def __iter__(self):
    return iter((PyObject.from_address(self).into_object(),))
x = id("hello")
print(x)
print(*x)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 140716785662064
002 | hello
btw you can use lark
!pypi lark
hope this is the right channel to ask this: does python bind methods when they are accessed, or when the class is instantiated?
class Test:
    def test(self): pass
Test().test # does this bind `test` to `Test()`, or is it already bound?
!e ```py
class Test:
    def test(self): pass
print(Test.test)
print(Test().test)
print(Test().test)
```
@dusk comet :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | <function Test.test at 0x7f8330ea5c60>
002 | <bound method Test.test of <__main__.Test object at 0x7f8330c643d0>>
003 | <bound method Test.test of <__main__.Test object at 0x7f8330c64410>>
that doesn't answer my question, i guess what i'm trying to ask is does the LOAD_ATTR instruction bind the method to the receiver somehow, or does it load an already bound method?
i'm not familiar with cpython source code, so i thought maybe someone already familiar could answer that question or point me in the right direction
this does show that the addresses are different, but it could also be cloning the bound method for some reason
okay, thank you
All through the magic of the descriptor protocol, same way @property works.
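To spell out the descriptor part, here is a minimal sketch: the class stores a plain function, the instance stores nothing, and each attribute access calls the function's `__get__` to build a fresh bound method.

```python
class Test:
    def test(self):
        pass


obj = Test()

# The class dict holds a plain function; the instance dict holds nothing.
plain_function = Test.__dict__["test"]
instance_has_it = "test" in vars(obj)

# Attribute access is roughly equivalent to calling __get__ by hand.
manual = plain_function.__get__(obj, Test)
first = obj.test
second = obj.test

# Each access creates a new bound method object wrapping the same
# function and the same instance, so they compare equal but are distinct.
distinct_objects = first is not second
equal_methods = first == second == manual
```

So the binding happens at access time, not at instantiation.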
it's actually LOAD_METHOD
it's specialized for loading an attribute that is to be called
That's only when it's immediately called, but it's not really too important - it's just an optimisation; it shouldn't change semantics or be observable from your code.
Hey guys, I'm trying to understand a nuance in Python's Grammar's syntax
assignment[statement] ::=
| targets=( (t=target_list '=' {t})+ ) expr=expression {{ parse_assignment(targets, expr) }}
From my own grammar, trying to emulate python's.
The issue stems from needing to repeat (target_list '=') + as so, but, only wanting to actually collect the target_list node, ignoring the consumed =
So, if I'm interpreting this right...
(t=target_list '=' {t})+ says "each time I collect a target_list followed by an =, bind it to the name t and collect that (as opposed, well, I don't really know)"
All the ts get collected via while loop and put into a collect, which is then bound to the name targets?
just use lark 😉
Rebuilding python's parser and parser generator is exactly what I want to do. It's absolutely exhilarating, and, it'll look great on my github
And I'll have superpowers when I'm done
in seriousness
removing punctuations like that should be done at ast generation
the job of the parser
is to parse
what grammar are you looking at
the python grammar spec doesn't look like this
This is a grammar of my own design, but I think I'm sticking pretty close to Python's
I'm basing it more off of this, which looks like a modified version designed for readability: https://docs.python.org/3/reference/simple_stmts.html#grammar-token-python-grammar-augtarget
But I'm also doing my best to be consistent with this where possible: https://github.com/python/cpython/blob/3.11/Grammar/python.gram
actually i still don't know what your problem is
Python's grammar is basically a programming language unto itself
It has variables and function calls. It's really quite beautiful
```
some_rule ::=
    | a=sub_rule_1 sub_rule_2, b=sub_rule_3 {{ parse_some_rule(a, b) }}
```
This contains everything the parser generator needs to know to build the parser, including how to bind the results of calling some other rule to a name, and, how to pass the collected child nodes to the desired parsing function
But things are a bit weird when you're collecting one-or-more or zero-or-more of something
```
some_rule ::=
    | a=( some_other_rule * ) {{ parse_some_rule(a) }}
```
I *assume* this basically says "collect zero or more of `some_other_rule` and place them in a collection. Bind that collection to the name `a`, and pass that collection on". Alright, easy enough
What about this?
some_rule ::=
| a=( ( some_other_rule '=' ) * ) {{ parse_some_rule(a) }}
Collect some rule zero or more times, as before. But you're also consuming a token. What, then, are the elements of a? Tuples of the form tuple[some_other_rule, Token]? Does the parser implicitly ignore the collected token? Maybe it will only place into the "result" of a repeated group items that are named ( (x=some_other_rule some_ignored_rule) * )
That's what I'm asking about. What I think it's doing is this: ( (x=some_other_rule some_ignored_rule {x}) *)
Basically, any parenthesized group can have a return statement {something} at the end. If so, the "result" or "contents" of the parenthesized group will be those items in the return statement.
I think.
I think I get it now. yield_expr and star_expressions are not necessarily on top of the expression fission chain, but they require special handling in the context of assignment (and maybe in a few other cases)
i am also doing that for my programming language currently put on a hiatus
https://en.wikipedia.org/wiki/Greenspun's_tenth_rule
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
🙂
I don't think I understand
well, it sounds to me like Python's grammar got a lisp inside of it
about this
I can certainly see why a yield expression requires special handling. It has "limited usage" in that it can only appear as part of an assignment expression, or in a yield statement (which is probably just a statement wrapper around a yield expression, but I haven't looked)
that's not true
!e def f(): (yield x) + a((yield y))
@feral island :warning: Your 3.11 eval job has completed with return code 0.
[No output]
this is perfectly legal
I haven't looked at the formal grammar but a weirdness around yield is that it sometimes needs extra parentheses; e.g., f(await y) is allowed but f(yield y) is not
Might have something to do with yield without an argument being legal
My gut instinct is that that's an inconsistency. But I don't really know enough about it to say anything
!e
def f():
    g(yield 42069)
@grave jolt :x: Your 3.11 eval job has completed with return code 1.
001 | File "<string>", line 2
002 | g(yield 42069)
003 | ^^^^^
004 | SyntaxError: invalid syntax
yo wtf
this is something I run into a lot because of https://github.com/quora/asynq 🙂
oh I think I kinda get it?
Like, if you have x = yield - 5, it's not clear whether it's ((yield) - 5) or (yield (-5))
!e
def f(): yield - 5
print(list(f()))
@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.
[-5]
Yeah, in this form it's alright in order to not break code written before yield expressions were a thing
!e
def f(): print((yield - 5))
print(list(f()))
@feral island :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | None
002 | [-5]
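A small sketch that checks both observations without running a generator: how `yield - 5` groups, and whether a bare `yield` is accepted as a call argument. It relies only on `ast.parse` and the built-in `compile`; the helper name `compiles` is made up for this example.

```python
import ast

# `yield - 5` parses as a yield of the unary expression -5, not (yield) - 5.
func = ast.parse("def f():\n    x = yield - 5").body[0]
yielded = func.body[0].value
print(type(yielded).__name__, type(yielded.value).__name__)

def compiles(src):
    """Return True if src is syntactically valid Python."""
    try:
        compile(src, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

bare = compiles("def f():\n    g(yield 1)")        # yield as a bare call argument
wrapped = compiles("def f():\n    g((yield 1))")   # same call, yield parenthesized
print(bare, wrapped)
```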
I don't suppose there's a detailed, thorough write-up of Python's grammar anywhere?
Not just the grammar itself, but a description of how and why it works?
there is https://devguide.python.org/internals/parser/, but it's more about how the parser works, not so much why the grammar is the way it is
Awesome
This is a start, at least 🙂
Better question: is there anyone here who understands the grammar quite deeply from whom I could buy or borrow an hour or two
Gang
Might want to start with a question or two instead of asking right out for an hour or two...
I have zillions of questions, which I've been asking one by one
But I feel like it's going to take far longer, for all parties, to do it that way
maybe, it's just a huge ask for a place like this to essentially ask someone for a commitment.
generally you just ask questions and whoever feels like answering, answers. If they enjoy the convo and want to keep answering, they'll do it. If not, they stop answering whenever.
Obviously you can still ask for an hour or two but IME in places like this, usually writing something like that results in crickets. Best of luck though.
btw I think you changed it to "buy or borrow" - now I have to ask, what rate are you offering 😛
Whatever the standard rate for such a thing is, I suppose?
Whatever the purveyor of the knowledge thought was fair and appropriate
!rule 9 regardless
With respect to specific questions
| a=('(' b=single_target ')' { b } | single_subscript_attribute_target) ':' b=expression c=['=' d=annotated_rhs { d }]
I don't think I understand this. Line 150 of the grammar
The assignment target can be either a single target in parentheses or a single_subscript_attribute_target. The latter does pretty much what you'd expect it to do. The former can either be another single_subscript_attribute_target or an identifier, or itself in parentheses
Looking closer, this is one of two rules for annotated assignment. It seems to allow for (a) : int = 1, ((a)) : int = 1, a.b.c : int = 1 and such. That seems very strange
why?
given that (a, b) = (b, a) is allowed, there's no particular reason to disallow (a) = (b)
I guess I'm a bit confused about parentheses wrapped around assignment targets in the first place
I understand it in the case of a (b, c), d = 1, some_iterable, 2
C and C++ both allow it. So does Java. What languages don't?
Well I don't know
But I've never seen such a thing before, and I guess I just don't understand the usefulness, except in the case I mentioned earlier
Zig allows it... Rust allows it... Nim allows it... I'm not sure that I've ever seen a language that doesn't.
That line seems to be doing one of two things: wrapping a single_target (a name, a subscripted-target of some kind, another single_target in parens) in parentheses; OR, directly supplying a subscripted-target w/o parens
Taken in conjunction with the rule above, which uses only a NAME token as the target, you're able to annotate either a name, a subscripted target, or either nested arbitrarily deeply within parens
Why it's broken into two rules, I can't see
All told, as best I can tell, the two rules seem to specify the lhs-cases of single-target assignment with annotation
I'm not sure I understand the question you're asking here - what two rules are you talking about? single_target and single_subscript_attribute_target?
The first two cases here
# NOTE: annotated_rhs may start with 'yield'; yield_expr must start with 'yield'
assignment[stmt_ty]:
| a=NAME ':' b=expression c=['=' d=annotated_rhs { d }] {
CHECK_VERSION(
stmt_ty,
6,
"Variable annotation syntax is",
_PyAST_AnnAssign(CHECK(expr_ty, _PyPegen_set_expr_context(p, a, Store)), b, c, 1, EXTRA)
) }
| a=('(' b=single_target ')' { b }
| single_subscript_attribute_target) ':' b=expression c=['=' d=annotated_rhs { d }] {
CHECK_VERSION(stmt_ty, 6, "Variable annotations syntax is", _PyAST_AnnAssign(a, b, c, 0, EXTRA)) }
| a[asdl_expr_seq*]=(z=star_targets '=' { z })+ b=(yield_expr | star_expressions) !'=' tc=[TYPE_COMMENT] {
_PyAST_Assign(a, b, NEW_TYPE_COMMENT(p, tc), EXTRA) }
| a=single_target b=augassign ~ c=(yield_expr | star_expressions) {
_PyAST_AugAssign(a, b->kind, c, EXTRA) }
| invalid_assignment
by "first two cases" you mean this is 1:
| a=NAME ':' b=expression c=['=' d=annotated_rhs { d }] {
CHECK_VERSION(
stmt_ty,
6,
"Variable annotation syntax is",
_PyAST_AnnAssign(CHECK(expr_ty, _PyPegen_set_expr_context(p, a, Store)), b, c, 1, EXTRA)
) }
and this is 2:
| a=('(' b=single_target ')' { b }
| single_subscript_attribute_target) ':' b=expression c=['=' d=annotated_rhs { d }] {
CHECK_VERSION(stmt_ty, 6, "Variable annotations syntax is", _PyAST_AnnAssign(a, b, c, 0, EXTRA)) }
?
Yeah
and you're asking why those aren't merged into one case with a more complicated pattern for a?
Well, I was more just remarking in passing
It looks to me like this rule could have been expressed as ```
| a=(NAME | '(' b=single_target ')' | single_subscript_attribute_target)
':' b=expression c=['=' d=annotated_rhs {d}]
they seem to pass different values to the _PyAST_AnnAssign - do they result in different ASTs?
Oh, you're right. I have no idea what those arguments are for (I haven't gotten that far yet)
!e ```py
import ast
print(ast.dump(ast.parse("x: int = y")))
print(ast.dump(ast.parse("(x): int = y")))
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | Module(body=[AnnAssign(target=Name(id='x', ctx=Store()), annotation=Name(id='int', ctx=Load()), value=Name(id='y', ctx=Load()), simple=1)], type_ignores=[])
002 | Module(body=[AnnAssign(target=Name(id='x', ctx=Store()), annotation=Name(id='int', ctx=Load()), value=Name(id='y', ctx=Load()), simple=0)], type_ignores=[])
They seem to differ in their simple argument, which leads me to suspect different handling (likely on account of one target being just an identifier, the other being a compound object)
!d ast.AnnAssign
class ast.AnnAssign(target, annotation, value, simple)```
An assignment with a type annotation. `target` is a single node and can be a [`Name`](https://docs.python.org/3/library/ast.html#ast.Name "ast.Name"), a [`Attribute`](https://docs.python.org/3/library/ast.html#ast.Attribute "ast.Attribute") or a [`Subscript`](https://docs.python.org/3/library/ast.html#ast.Subscript "ast.Subscript"). `annotation` is the annotation, such as a [`Constant`](https://docs.python.org/3/library/ast.html#ast.Constant "ast.Constant") or [`Name`](https://docs.python.org/3/library/ast.html#ast.Name "ast.Name") node. `value` is a single optional node. `simple` is a boolean integer set to True for a [`Name`](https://docs.python.org/3/library/ast.html#ast.Name "ast.Name") node in `target` that do not appear in between parenthesis and are hence pure names and not expressions.
literally just a flag to tell you if it was or wasn't just a name.
Ahhhhhhh, there it is
This, btw, is going to be super useful. Thanks for showing me
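To see which of the two grammar alternatives a given target goes through, one can read the `simple` field off the parsed node for a few target shapes. This is a minimal sketch; `simple_flag` is a hypothetical helper name.

```python
import ast

def simple_flag(src):
    """Return the `simple` field of the AnnAssign node parsed from src."""
    stmt = ast.parse(src).body[0]
    assert isinstance(stmt, ast.AnnAssign)
    return stmt.simple

sources = ["x: int = 1", "(x): int = 1", "a.b: int = 1", "a[0]: int = 1"]
flags = [simple_flag(s) for s in sources]
print(dict(zip(sources, flags)))
```

Only the bare name takes the first alternative; the parenthesized name, the attribute and the subscript all go through the second one.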
yield_expr[expr_ty]:
| 'yield' 'from' a=expression { _PyAST_YieldFrom(a, EXTRA) }
| 'yield' a=[star_expressions] { _PyAST_Yield(a, EXTRA) }
So, you can yield from a singular expression
Or, you can yield many expressions, comma separated, some of which may be starred?
Is there any reason for this? The behaviour I'd have expected from yield from 1, 2, 3, 4 would be to automatically convert the multiple values into a tuple, and yield them all
yield from 1, 2, 3, 4 isn't valid syntax at all.
possibly because it's not obvious whether it should be parsed as (yield from 1), 2, 3, 4 or as yield from (1, 2, 3, 4)
yield from expressions?
Wait, so
If yield from 1, 2, 3, 4 is ambiguous because it could mean (yield from 1), 2, 3, 4 or yield from (1, 2, 3, 4)
Why is yield 1, 2, 3, 4 not ambiguous? It could mean the same thing
primary[expr_ty]:
| a=primary '.' b=NAME { _PyAST_Attribute(a, b->v.Name.id, Load, EXTRA) }
| a=primary b=genexp { _PyAST_Call(a, CHECK(asdl_expr_seq*, (asdl_expr_seq*)_PyPegen_singleton_seq(p, b)), NULL, EXTRA) }
| a=primary '(' b=[arguments] ')' {
_PyAST_Call(a,
(b) ? ((expr_ty) b)->v.Call.args : NULL,
(b) ? ((expr_ty) b)->v.Call.keywords : NULL,
EXTRA) }
| a=primary '[' b=slices ']' { _PyAST_Subscript(a, b, Load, EXTRA) }
| atom
What is this? | a=primary b=genexp { _PyAST_Call(a, CHECK(asdl_expr_seq*,
It looks like a.b.c[i for i in range(10)] or some such
!e ```py
def a():
    b = yield 1, 2, 3, 4
print(next(a()))
@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.
(1, 2, 3, 4)
makes a little more sense when thinking of it as await
!e That might be for ```py
print(i for i in range(10))
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
<generator object <genexpr> at 0x7f97157d81e0>
As a special case, Python lets you pass a generator expression to a callable using one set of parentheses instead of 2, as long as it's the only argument
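A short sketch of that special case, using `ast` to show that the generator expression becomes the call's single positional argument, and `compile` to show that the shorthand is refused once a second argument appears:

```python
import ast

# One pair of parentheses: the genexp is the call's only positional argument.
call = ast.parse("print(i for i in range(10))", mode="eval").body
print(type(call).__name__, [type(a).__name__ for a in call.args])

total = sum(i for i in range(10))   # same as sum((i for i in range(10)))

# With a second argument the shorthand is rejected at compile time.
try:
    compile("sum(i for i in range(10), 0)", "<demo>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(total, rejected)
```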
or rather below ,
why does python raise an exception to indicate the end of iteration?
why don’t iterators have a .done() or .running() method to check the state of the iterator and design for loops (of any other iteration structure) around it?
raising and catching exceptions are expensive but this is just a function call plus a boolean check
what should happen if iterator with .done() is exhausted but you are still calling next(it)?
still raising an exception ig
just that normal iteration doesn’t have to involve exceptions
There should be one-- and preferably only one --obvious way to do it.
like this is how you would implement for ```py
it = iter(iterable)
while it.running():
    x = next(it)
    # code here
at the C level I'm not sure the exception is actually expensive. e.g. listiter_next doesn't even set the StopIteration exception, it's implicit https://github.com/python/cpython/blob/main/Objects/listobject.c#L3235
Objects/listobject.c line 3235
listiter_next(_PyListIterObject *it)```
so the caller only has to check that the return value is not NULL, then call PyErr_Occurred (a few pointer comparisons)
looks simpler than original for ```py
it = iter(iterable)
while True:
    try:
        x = next(it)
        # code here
    except StopIteration:
        break
but for the general case?
the general case is that 99% of the time the iterable is consumed in C code
instead of calling next() directly
ok
i think after .done() is introduced catching StopIteration isn’t very obvious anymore and people gravitate towards .done()
and now you have a new set of possible bugs where .done() is out of sync with whether __next__() throws
but why do you need to implement for anyways 
uh no
that is the pseudocode that cpython has to implement
according to this
hey sorry to bother you, anyone experimented with building a c extension on nixos?
i can't get it to find the header files and functions
inactive
A channel won't magically become active if you don't post anything to it 🙂
C extensions are indeed not the hottest thing these days
@sacred yew thanks, TIL #c-extensions is a thing on this Discord server.
but CPython is in C though
there's not really try, the iter function just returns NULL when it's exhausted
but essentially this wouldn't actually be valid safe code because the iterator could be exhausted after you check running(), perhaps by another thread
so realistically you would need next() to return StopIteration instead of raising, perhaps. But that would complicate most nested code. And since 3.11, `try` blocks cost almost nothing when no exception is raised
Python doesn't design their APIs around method name calls like this (all classes with a __iter__ and __next__ would now need these methods). Exceptions come quite naturally as a sentinel value here
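If the goal is only to avoid a `try`/`except` in Python-level code, the two-argument form of `next()` already covers it: it returns the default instead of raising. A minimal sketch, where `_MISSING` and `consume` are names invented for this example:

```python
_MISSING = object()   # hypothetical sentinel; unique, so it cannot clash with real items

def consume(iterable):
    """Collect items using next(it, default) instead of try/except StopIteration."""
    it = iter(iterable)
    items = []
    while (item := next(it, _MISSING)) is not _MISSING:
        items.append(item)
    return items

print(consume("abc"))
```

The iterator protocol underneath is unchanged; the exception is still what signals exhaustion, it is just handled inside `next()`.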
are you sure raising exceptions is too expensive?
Yeah I would think an extra Python function call is much more expensive than just setting a variable and walking back a stack (calling the function would require working with the stack anyways)
Do any of y'all know how the parser's packrat-left-recursion hack works? I can't find any good non-academic articles on it
should python have a mutable string class?
!e
from sys import getsizeof
s = "hm🤔" * 100_000
print(getsizeof(s) // 1000, "KB")
ls = list(s)
items = sum(map(getsizeof, ls))
print((items + getsizeof(ls)) // 1000, "KB")
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 1200 KB
002 | 20400 KB
this list of characters of a 1.2MB str is 20.4MB
You can just encode to bytes then use a bytearray
that wouldn't handle utf-8 chars though 😔
you can write your own library to provide a memory-efficient mutable Unicode string class
if it becomes very widely useful it can be added to the stdlib. Personally I haven't often seen a need for it
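One possible shape for such a class, as a rough sketch only: store the text as fixed-width UTF-32 in a `bytearray`, so every code point sits at a predictable offset and can be overwritten in place. `MutableText` is a hypothetical name, and this covers only length, item assignment and conversion back to `str`.

```python
class MutableText:
    """Hypothetical sketch: a mutable string stored as fixed-width UTF-32 code units."""
    WIDTH = 4  # bytes per code point in UTF-32

    def __init__(self, text=""):
        self._buf = bytearray(text.encode("utf-32-le"))

    def __len__(self):
        return len(self._buf) // self.WIDTH

    def __setitem__(self, index, char):
        if len(char) != 1:
            raise ValueError("expected a single character")
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        start = index * self.WIDTH
        self._buf[start:start + self.WIDTH] = char.encode("utf-32-le")

    def __str__(self):
        return self._buf.decode("utf-32-le")

text = MutableText("hm🤔")
text[0] = "🐍"
print(str(text), len(text), len(text._buf))
```

This spends 4 bytes per character regardless of content, which is far less than a list of one-character `str` objects but more than a `str` that happens to be all ASCII.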
actually seems like you can use https://docs.python.org/3.10/library/array.html with the u code?
character list and bytearray
!e
from string import ascii_lowercase
s = "🐍🤔" * 1000
ls = list(s)
print(len(set(map(id, ls))))
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
2000
is there a reason we can't intern non-ascii strings
it is not known at compile time
so it can’t intern
!e
s = eval("'abc123'")
s *= 1000
ls = [s[i:i+5] for i in range(995)]
print(len(set(ls)))
print(len(set(map(id, ls))))
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 6
002 | 995
hm curious, I guess we don't dynamically intern in this situation
This isn't interning; Python has singletons for single ASCII characters, similar to the cached small ints (-5 through 256). We could have more singletons, but that'd mean a massive array to put them in...
Interning is only done normally for strings that are valid identifiers.
And it's done during compile time
right yeah, but we could dynamically intern strings? as an optimization
Well it'd be a pessimisation most of the time.
you'd be looking at 50 / 80 bytes base for an empty string, then the size of the string bytes for each duplication
It's only useful if you can expect that code is going to be doing comparisons or dict lookups with the string, and that there's going to be duplicates elsewhere.
vs 4 bytes for a reference
But it costs a dict lookup to do the interning. So a hash calculation on top of that.
Wasteful in random string ops.
Better to leave that to the application, since that knows the uses of the string. An HTML parser for instance probably would want to intern the attribute names, while something parsing user accounts doesn't need to intern usernames...
hm yeah fair
is there any case where bubbling up StopIteration is useful?
it also carries the return value of a generator
it's also usually simpler than checking every return result of next (the alternative)
with suppress(StopIteration):
    seq.append(next(it))
    other |= next(it)
vs
temp = next(it)
if not isinstance(temp, StopIteration):
    seq.append(temp)
    temp2 = next(it)
    if not isinstance(temp2, StopIteration):
        other |= temp2
and you can just return StopIteration(somevalue)?
yeah this is what that would cause ^
looks worse imo
essentially yeah
it makes it easy to make event loops that send and return values while yielding during some events
that's a more general thing regarding exceptions I imagine
you can argue int() can return ValueError instead of raising as well
but python was just built around exceptions and making an one-off change like this would be odd and affect pretty much everyone
!e ```py
print(StopIteration.mro())
@gray galleon :white_check_mark: Your 3.11 eval job has completed with return code 0.
[<class 'StopIteration'>, <class 'Exception'>, <class 'BaseException'>, <class 'object'>]
why does it subclass Exception
because it is an exception?
!e ```py
next(iter([]))
@sour thistle :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 1, in <module>
003 | StopIteration
it is also used when a generator returns something, but even then it is still an exception
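A minimal illustration of that second role: when a generator function executes `return value`, the value travels on the `StopIteration` instance as its `.value` attribute. The generator name `countdown` is just an example.

```python
def countdown(n):
    """Yield n, n-1, ..., 1 and then return a summary string."""
    while n > 0:
        yield n
        n -= 1
    return "liftoff"

gen = countdown(2)
seen = [next(gen), next(gen)]
try:
    next(gen)
except StopIteration as stop:
    result = stop.value   # the generator's return value rides on the exception
print(seen, result)
```

This is the same mechanism `yield from` uses to hand a sub-generator's return value back to the delegating generator.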
!e Sure? ```py
import sys
sys.intern("🙂")
@raven ridge :warning: Your 3.11 eval job has completed with return code 0.
[No output]
I mean like for something like this @raven ridge
why 995 new strings with only 6 unique ones
interning would work by creating the new string, then looking up a canonical instance of that new string
how would that work
it would require hashing
which might be more expensive
right. There are languages that intern every string, but yeah - that's how it'd be done.
interning strings is an optimization that trades off increased CPU usage for decreased memory usage
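For the slicing example from earlier, opting in explicitly with `sys.intern` is enough to collapse the duplicates. A sketch, with the variable names chosen for illustration:

```python
import sys

s = "abc123" * 1000
plain = [s[i:i + 5] for i in range(995)]           # fresh object per slice
interned = [sys.intern(piece) for piece in plain]  # one canonical object per value

unique_values = len(set(plain))
unique_objects = len({id(piece) for piece in interned})
print(unique_values, unique_objects)
```

Each `sys.intern` call still has to hash the new string and look it up, which is the CPU cost being traded for the memory saving.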
What is this |= stuff
a = a | b -> a |= b
No I understand that but I don't get the context of it
I think it's just an example of an operation where you call next twice
i feel like i'm reading this wrong. https://docs.python.org/3/reference/lexical_analysis.html#identifiers
am i correct in my understanding that an identifier may start with any character in XID_Start (or an underscore), followed by any number of characters from XID_Continue?
!e ```py
import unicodedata as u
char = u.lookup('SCRIPT CAPITAL P')
print(char)
exec(f'{char} = 123')
exec(f'{char}abc = 456')
print(eval(char))
exec(f'print({char}abc)')
@sour thistle :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | ℘
002 | 123
003 | 456
looks like you got it right?.. though I hope the day I see that character used in a real code base never comes
edit; not sure, the hiragana/katakana marker is not working despite being in Other_ID_Start
if anyone has python interview questions, please do send them?
this is not where to ask it
That the Weierstrass elliptic function up there?
thanks! but uh, what's the hiragana marker?
Other_ID_Start - explicit list of characters in PropList.txt to support backwards compatibility
https://www.unicode.org/Public/14.0.0/ucd/PropList.txt
309B..309C ; Other_ID_Start # Sk [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
if you meant like what they are in the Japanese language, https://en.wikipedia.org/wiki/Dakuten_and_handakuten, but I'll assume that you just wanted to know the code points
yeah, the codepoints. hm, that's interesting.
huh. the nfkc normalizations are also within that file (at least, whatever unicodedata.normalize("NFKC", ...) gives me).
oh, hmmmm
maybe that entire file isn't supported?
since that's part of XID_Continue according to these tables i'm looking at, but not XID_Start
and it's indeed valid as the second char in an ident
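A quick way to poke at these rules from Python itself is `str.isidentifier()`, plus `exec` to see the NFKC normalization that is applied to identifiers while parsing. This sketch deliberately sticks to characters whose status is not in question here; it does not settle the U+309B case.

```python
import unicodedata

char = unicodedata.lookup("SCRIPT CAPITAL P")
checks = {
    char: char.isidentifier(),        # allowed as a first character
    "_x1": "_x1".isidentifier(),      # underscore start, digit continue
    "1x": "1x".isidentifier(),        # digits cannot start an identifier
}
print(checks)

# Identifiers are NFKC-normalized while parsing, so the ligature U+FB01
# ("fi" as a single code point) ends up bound under the plain two-letter name.
namespace = {}
exec("\ufb01 = 1", namespace)
print("fi" in namespace)
```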
does cpython have a parser testsuite somewhere?
@chilaxan#3116 do you know if there are places where python unconditionally assumes that some PyMethods exist on specific types?
not sure if making those pointers null after allocation is safe
@pliant tusk your ping failed
I think mapping proxy probably does
I'm a bit confused about python's PEG parser
How exactly does backtracking/lookahead work?
In the case of lookahead, wouldn't one need to memoize the current parser state, capture a boolean representing whether or not the whatever is parsed properly, and then restore the original state?
As well, does python's parser use a streaming lexer, or does it lex the entire source and maintain a list of the tokens?
i think it just looks ahead one token at a time
seems like a streaming lexer
so this makes it simple
https://github.com/python/cpython/blob/main/Parser/pegen.c#L333-L340
but yes they do store the current state, parse, and bring the state back again
Parser/pegen.c lines 333 to 340
int
_PyPegen_lookahead_with_int(int positive, Token *(func)(Parser *, int), Parser *p, int arg)
{
int mark = p->mark;
void *res = func(p, arg);
p->mark = mark;
return (res != NULL) == positive;
}```
actually it's not a "state"
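The same save-and-rewind idea, sketched in Python over a pre-lexed token list. `TokenParser` and its methods are invented for this example and only mirror the shape of the C function above; the point is that the only thing that has to be restored is one integer.

```python
class TokenParser:
    """Hypothetical miniature of a pegen-style parser over a pre-lexed token list."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.mark = 0          # index of the next token to consume

    def expect(self, text):
        """Consume and return the next token if it equals text, else return None."""
        if self.mark < len(self.tokens) and self.tokens[self.mark] == text:
            self.mark += 1
            return text
        return None

    def lookahead(self, positive, rule, *args):
        """Try rule, then rewind; report whether it matched (or did not, if negative)."""
        saved = self.mark
        result = rule(*args)
        self.mark = saved      # restoring one integer undoes everything consumed
        return (result is not None) == positive

p = TokenParser(["x", "=", "1"])
p.expect("x")
sees_equals = p.lookahead(True, p.expect, "=")
not_colon = p.lookahead(False, p.expect, ":")
print(sees_equals, not_colon, p.mark)
```

Backtracking out of a failed alternative works the same way: reset `mark` and try the next one.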
I ran into an issue with scientific notation a bit ago, and did some experimenting, finding out that 1eX, which is a float, does not equal 10**X (an int) for X > 22. why 22? Is it because 1e22 is slightly less than 2**74 (~73.1), and 1e23 is above (~76.4)? If so, why 74 and not 64?
!e
print(10**22, f'{1e22:f}')
print(10**23, f'{1e23:f}')
@neat delta :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 10000000000000000000000 10000000000000000000000.000000
002 | 100000000000000000000000 99999999999999991611392.000000
i guess that's your answer
the question is why it happens - that's just an example for the curious
!e ```py
print(2**53)
print(2**53 + 1)
print(2**53 + 1.0)
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 9007199254740992
002 | 9007199254740993
003 | 9007199254740992.0
floats can only represent integer values exactly up to some limit
namely, 2**53
the 53 is because a 64-bit float has 53 bits of significand precision (52 stored bits plus an implicit leading 1)
But 1e22 is much larger than 2**53. Shouldn't this error show up before 22?
after 2**53, floats can represent every other integer.
after 2**54, they can represent every 4th integer.
after 2**55, they can represent every 8th integer.
etc.
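That spacing can be read off directly with `math.ulp` (available since Python 3.9), which gives the gap between a float and the next larger one:

```python
import math

for exponent in (52, 53, 54, 55):
    # math.ulp(x) is the gap between x and the next larger float.
    print(exponent, math.ulp(2.0 ** exponent))

odd_is_lost = float(2**53 + 1) == float(2**53)   # 2**53 + 1 has no exact float
even_survives = float(2**53 + 2) == 2**53 + 2
print(odd_is_lost, even_survives)
```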
https://github.com/python/cpython/blob/main/Objects/floatobject.c#L501-L503
if the float has the same amount of bits as the integer, it goes to this line
>>> from math import frexp, modf
>>> frexp(1e22)[1] == len(bin(10**22))-2 # 1e22 passes this check
True
>>> frexp(1e23)[1] == len(bin(10**23))-2 # 1e23 passes this check
True
>>> _, intpart = modf(1e22)
>>> intpart, int(intpart)
(1e+22, 10000000000000000000000)
>>> _, intpart = modf(1e23)
>>> intpart, int(intpart) # here's the problem
(1e+23, 99999999999999991611392)
Objects/floatobject.c lines 501 to 503
/* v and w have the same number of bits before the radix
* point. Construct two ints that have the same comparison
* outcome.```
the _ variable here is also checked but it doesn't matter in this case since it'll just be 0.0
i found a very grokkable answer: 10**23 is 5**23 * 2**23, and the odd factor 5**23 is 54 bits long, so it cannot fit into the 53-bit significand of a 64-bit float. 5**22 is only 52 bits
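That argument is easy to check with integer arithmetic, since `10**n == 5**n * 2**n` and the power of two only shifts the exponent. A small sketch:

```python
for power in (22, 23):
    odd_part = 5 ** power                 # 10**power == 5**power * 2**power
    exact = float(10 ** power) == 10 ** power
    print(power, odd_part.bit_length(), exact)

nearest = int(1e23)   # integer value of the float closest to 10**23
print(nearest, 10**23 - nearest)
```

The leftover difference is itself a power of two, `2**23`, which is half the spacing between adjacent floats at that magnitude.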
If you're willing to accept
this
!e Then here's a tidy proof of why 23 is the cutoff point: ```py
import math
def modulus(power_of_two):
    return 2**(max(power_of_two - 52, 0))
for power in range(16, 24):
    val = 10**power
    prev_power_of_two = math.floor(math.log2(val))
    difference = val - 2**prev_power_of_two
    every_nth = modulus(prev_power_of_two)
    print(f"{val=:<25d} {prev_power_of_two=} {difference=:<23d} {every_nth=:<8d} {difference % every_nth=}")
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | val=10000000000000000 prev_power_of_two=53 difference=992800745259008 every_nth=2 difference % every_nth=0
002 | val=100000000000000000 prev_power_of_two=56 difference=27942405962072064 every_nth=16 difference % every_nth=0
003 | val=1000000000000000000 prev_power_of_two=59 difference=423539247696576512 every_nth=128 difference % every_nth=0
004 | val=10000000000000000000 prev_power_of_two=63 difference=776627963145224192 every_nth=2048 difference % every_nth=0
005 | val=100000000000000000000 prev_power_of_two=66 difference=26213023705161793536 every_nth=16384 difference % every_nth=0
006 | val=1000000000000000000000 prev_power_of_two=69 difference=409704189641294348288 every_nth=131072 difference % every_nth=0
007 | val=10000000000000000000000 prev_power_of_two=73 difference=555267034260709572608 every_nth=2097152 difference % every_nth=0
008 | val=100000000000000000000000 prev_power_o
... (truncated - too long)
Full output: https://paste.pythondiscord.com/qifequfata.txt?noredirect
it's the first power of 10 where ```py
(10**x - 2**math.floor(math.log2(10**x))) % 2**(max(math.floor(math.log2(10**x)) - 52, 0)) != 0
!e it's a bit trickier than that. That doesn't explain why ```py
print(100000000000000008388608 > 10**23)
print(100000000000000008388608 == 100000000000000008388608.0)
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | True
002 | True
After reading through Guido's blog posts, it looks as though the whole source code is lexed prior to parsing
Which makes sense — I've looked at it every which way, and that's the only sensible option if you're backtracking
a separate lexing step also makes handling semantic whitespace rather nice
the parser, which is generally a decent bit more complex, can just deal with tokens like newline, indent, and dedent, rather than messing around with that in the parser + all the rest of the syntax
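The stdlib `tokenize` module makes this visible. It is not the parser's internal C tokenizer, but it emits the same kinds of tokens, including the synthetic INDENT and DEDENT ones:

```python
import io
import tokenize

source = "if x:\n    y = 1\nz = 2\n"
names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(names)
```

By the time a grammar rule for a block runs, the whitespace has already been reduced to those two token types.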
is parsing done in python or in C
C
otherwise who would parse the Python code for the parser
!otn a who parses the parsers
:ok_hand: Added who-parses-the-parsers to the names list.
can't the parser code be precompiled?
if its python
Who would compile it? To compile it you first should parse it, but you dont have parser
Well to be fair a parser is often able to parse itself
C language compilers are commonly written in C. You start by writing and using a different compiler, then once the initial compiler is done you can start compiling the compiler.
In the case of the very first C compiler, it was presumably written in Assembly or like Fortran
bootstrapping compilers are most useful for, well, compilers
you compile it with the previous version
you can't self host an entire interpreter since you'd still need some external VM at the core of it
and if that VM happens to be your CPU, then, well, congratulations on the compiler
the benefits for interpreted languages like Python aren't quite as major, the best you can get is the self-hosted bytecode compiler which is still fine but you'll need to use C to run it
the parser and bytecode compiler are the most likely parts to self-host
because they produce and consume python objects (ast nodes)
Yes
but they are not smh
yes but C is faster
does that matter much for parsing and compiling?
yep
using python: slow
using C: have to deal with desugared API
too late to change cpython
despite there being criticisms about C it's still usable and the general rule for programming is that "if it works, it works"
Is there a reason that assignment expressions are bound more loosely than everything else? I think I'd like to try putting them at the bottom of the expression chain instead of at the top
You end up with situations like these...
if not (something := some_expression):
...
```or```py
if something := some_expression and something_else:
...
The second case is a bit ambiguous 😐
||To make walrus expressions even more cluttered||
Where do other languages place assignment expressions in the operator precedence hierarchy? C and C++ put it at nearly the very bottom as well (lower than or equal to everything but the comma operator)
It's the lowest in Java and C#
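To make the second case concrete: because `:=` binds more loosely than `and`, the name ends up holding the result of the whole boolean expression. A sketch using `ast` for the structure and a tiny runtime check (the values of `a` and `b` are arbitrary):

```python
import ast

test = ast.parse("if something := a and b:\n    pass").body[0].test
# The walrus is the outermost node; `a and b` is its value.
print(type(test).__name__, type(test.value).__name__)

a, b = 5, 0
if something := a and b:
    pass
print(something)   # bound to the result of `a and b`, not to `a`
```

Writing `(something := a) and b` is what it takes to capture just the first operand.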
does cpython have a parser test suite somewhere? i would like to make sure i'm doing this right, and don't trust the few tests i'm coming up with.
it's scattered in Lib/test I think, e.g test_syntax.py
oh wow, that's really scattered. there's some badsyntax_*.py in there too. but thanks, that's a helpful start.
seems like this isn't exactly in a format where i can easily throw it into another parser, but it's plenty helpful nonetheless
the reference mentions this. what does this really mean?
Indentation is rejected as inconsistent if a source file mixes tabs and spaces in a way that makes the meaning dependent on the worth of a tab in spaces; a TabError is raised in that case.
https://docs.python.org/3/reference/lexical_analysis.html#indentation
like ```py
if foo():
    if bar():
        fizz()
<tab>buzz()
The meaning of this would change depending on whether <tab> is 4 characters or 8
ugh, right, thanks.
tl;dr tab bad
it's by default 8
It's even stricter than it claims:
if foo():
<tab> bar()
<tab>bat()
Would be unambiguous, but is rejected
Parser/tokenizer.c line 74
tok->tabsize = TABSIZE;```
`Parser/tokenizer.c` lines 36 to 37
```c
/* Don't ever change this -- it would break the portability of Python code */
#define TABSIZE 8```
o
o_O why?
actually by the looks of the comment it's required to be 8 (at least in CPython)
Tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces preceding the first non-blank character then determines the line’s indentation. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation.
the thing i linked earlier says this
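The tab-stop rule from that paragraph is short enough to write down, and the rejection can be reproduced with `compile`. This is a sketch: `indent_column` is a hypothetical helper that follows the documented rule, not CPython's tokenizer code, and the source string assumes the earlier example was indented with 4 spaces, 8 spaces and then a lone tab.

```python
def indent_column(prefix, tabsize=8):
    """Column reached after the leading whitespace, with tabs jumping to tab stops."""
    col = 0
    for ch in prefix:
        if ch == "\t":
            col = (col // tabsize + 1) * tabsize   # advance to the next multiple of tabsize
        elif ch == " ":
            col += 1
        else:
            break
    return col

print(indent_column("\t"), indent_column("    \t"), indent_column("\t", tabsize=4))

# Assumed layout of the earlier example: 4 spaces, 8 spaces, then a lone tab.
src = "if foo():\n    if bar():\n        fizz()\n\tbuzz()\n"
try:
    compile(src, "<demo>", "exec")
    error = None
except TabError as exc:
    error = type(exc).__name__
print(error)
```

With a tab size of 8 the last line lines up with `fizz()`, with 4 it lines up with `if bar():`, which is the ambiguity the tokenizer refuses to guess about.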
there are some cases you can somehow bypass this ```py
def foo():
<tab> if bar():
<tab><tab><tab>baz()
well probably not technically "bypass"
but it's not consistent all the time
I've read this document over and over
I've tried to implement it a few times
And I will never understand this
Personally, I think the easier solution would be to restrict it to only tabs, and then enforce that an indentation may only be exactly one tab
Though, I've heard it should actually be possible to do the indentation matching right in the parser. Either way, it's a no win scenario
the Make language requires tabs, and that absolutely confuses the hell out of people.
for one notable issue with that, it makes it very hard to copy-paste code off the internet
https://stackoverflow.com/questions/16931770/makefile4-missing-separator-stop
https://stackoverflow.com/questions/920413/make-error-missing-separator
https://stackoverflow.com/questions/14109724/makefile-missing-separator
https://stackoverflow.com/questions/23927212/makefile2-missing-separator-stop
etc...
granted the arcane error message Make gives doesn't help, but if you make it impossible for people to copy-paste code out of a browser into their editor, even if your language gives a very nice error message about how lines can't start with leading spaces, it'll make things harder for your users.
tab bad space good
A tabulator is for making tables, a space is for spacing.
space bad tab good
Does anyone know why doing i & 0x1 is slower than i % 2? I thought bitwise operations should be faster. ( Btw it is faster when i tried it with numpy)
how small were the numbers that you tested it on?
ok so i can't seem to reproduce this
!ti ```py
[n&1 for n in range(10000)]
@rose schooner You've already got a job running - please wait for it to finish!
@rose schooner :white_check_mark: Your 3.11 timeit job has completed with return code 0.
500 loops, best of 5: 574 usec per loop
!ti ```py
[n%2 for n in range(10000)]
@rose schooner :white_check_mark: Your 3.11 timeit job has completed with return code 0.
500 loops, best of 5: 628 usec per loop
@magic rune i don't see it
This is what i tried:
what python version?
!ti ```py
for i in range(1_000): i & 0x1
@rose schooner :white_check_mark: Your 3.11 timeit job has completed with return code 0.
5000 loops, best of 5: 48.6 usec per loop
!ti ```py
for i in range(1_000): i % 2
@rose schooner :white_check_mark: Your 3.11 timeit job has completed with return code 0.
5000 loops, best of 5: 54.6 usec per loop
Oh i see. I'm using 3.10. I thought maybe it had something to do with the fact that python ints aren't of fixed size compared to numpy's, but I don't really know how they're implemented actually
i think they've added a code path to make it faster for ints with an absolute value less than 2**30 (default)
!ti ```py
for i in range(2**30, 2**30 + 1000): i % 2
@magic rune :white_check_mark: Your 3.10 timeit job has completed with return code 0.
5000 loops, best of 5: 85.6 usec per loop
!ti ```py
for i in range(2**30, 2**30 + 1000): i & 0x1
@magic rune :white_check_mark: Your 3.10 timeit job has completed with return code 0.
5000 loops, best of 5: 72.1 usec per loop
@rose schooner Yeah, the modulus operator seems to be a bit slower in 3.10. Thanks for the help!
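If anyone wants to repeat the comparison locally, here is a rough sketch using the stdlib `timeit` module. The helper name `time_parity_checks` and the loop counts are arbitrary choices, and the numbers will vary by machine and Python version, so treat it as a way to measure rather than as a result.

```python
import timeit


def time_parity_checks(start: int, count: int = 1000, repeat: int = 5, number: int = 200):
    """Return best-of-`repeat` microseconds per loop for `& 1` and `% 2`."""
    setup = f"values = range({start}, {start} + {count})"
    statements = {
        "and": "for i in values: i & 1",
        "mod": "for i in values: i % 2",
    }
    results = {}
    for label, stmt in statements.items():
        best = min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))
        results[label] = best / number * 1e6  # seconds -> microseconds per loop
    return results


small = time_parity_checks(0)        # values that fit in a single internal digit
large = time_parity_checks(2**30)    # values at or above 2**30
print("small ints:", small)
print("large ints:", large)
```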
guys what does def do
It's the keyword for defining a function
is there a way to stop python from garbage collecting a ctypes.Structure instance
keep a reference to it? 😛
so I'm allocating these pointers to PyMethods structs
how does it normally work in C
who keeps the reference to them
Objects/longobject.c line 6247
static PyNumberMethods long_as_number = {```
oh they're static? hm
I suppose I could just PyMem_Malloc the size of the struct instead of making a ctypes.Structure instance
wait no that would never get freed
they're static for static types, but they're dynamic for heap types. I'm not sure how that actually works for the heap types - I'm guessing that the type object itself holds pointers to them, and knows to free them when it is garbage collected
Objects/typeobject.c line 2757
PyHeapTypeObject *et = (PyHeapTypeObject *)type;```
PyHeapTypeObject apparently
ah, this seems to be the answer: https://github.com/python/cpython/blob/bdc93b8a3563b4a3adb25fa902c0c879ccf427f6/Include/internal/pycore_object.h#L357-L360
Include/internal/pycore_object.h lines 357 to 360
// Access macro to the members which are floating "behind" the object
static inline PyMemberDef* _PyHeapType_GET_MEMBERS(PyHeapTypeObject *etype) {
return (PyMemberDef*)((char*)etype + Py_TYPE(etype)->tp_basicsize);
}```
the members of a heap type are stored on the heap after the type
hm
I guess heap types frees that itself?
I suppose I can make a WeakKeyDictionary with type keys and values of lists of PyMethod ctypes.Structure instances
so the PyMethods GC should come after the type...?
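A rough sketch of that `WeakKeyDictionary` idea, assuming the owner is a normal heap type that can be weakly referenced. `MethodTable`, `_KEEPALIVE` and `allocate_for` are made-up names for illustration, not anything from einspect or CPython.

```python
import ctypes
import gc
import weakref


class MethodTable(ctypes.Structure):
    """Hypothetical stand-in for a PyMethods-style struct."""
    _fields_ = [("slot", ctypes.c_void_p)]


# type -> list of structs that must stay alive as long as the type does
_KEEPALIVE = weakref.WeakKeyDictionary()


def allocate_for(owner: type) -> MethodTable:
    """Create a struct and pin it to `owner` so it is not collected early."""
    table = MethodTable()
    _KEEPALIVE.setdefault(owner, []).append(table)
    return table


def demo():
    class Owner:
        pass

    table = allocate_for(Owner)
    del table                      # the registry still holds the struct
    entries_while_alive = len(_KEEPALIVE)
    del Owner
    gc.collect()                   # classes sit in reference cycles
    return entries_while_alive, len(_KEEPALIVE)


print(demo())
```

The registry only guarantees the structs are not freed before the type's entry is dropped; it says nothing about the exact order of teardown. For static builtin types the key never dies, so the entries simply live for the whole process.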
@pliant tusk btw what is even going on here with this lambda 👀 https://github.com/chilaxan/fishhook/blob/master/fishhook/fishhook.py#L89
fishhook/fishhook.py line 89
def getdict(cls, E=type('',(),{'__eq__':lambda s,o:o})()):```
that type thing gets the dict of a mapping proxy?
Exploits a bug in mapping proxies to get the wrapped mapping
*a bug that has been explicitly marked will not fix
It's caused by the order of operations: proxy->wrapped.__eq__(E) runs first and returns NotImplemented, then E.__eq__(proxy->wrapped) runs and returns whatever it is passed, in this case the wrapped mapping
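The reflected-comparison step can be seen on its own with a plain dict, without touching mapping proxies at all. A minimal sketch, where `Echo` is a made-up helper class:

```python
class Echo:
    """Hypothetical helper: __eq__ hands back whatever it is compared with."""

    def __eq__(self, other):
        return other


secret = {"key": "value"}

# dict.__eq__(secret, Echo()) returns NotImplemented because Echo is not a
# dict, so Python falls back to the reflected call Echo.__eq__(echo, secret).
leaked = secret == Echo()
print(leaked is secret)  # True
```

`==` does not coerce the result to `bool`, so the object returned by the reflected `__eq__` comes straight back to the caller.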
interesting 👀
!e
import sys
from einspect.structs import PyTypeObject
v = vars(sys)["int_info"]
t = PyTypeObject(v)
print(t.tp_name)
print(t.tp_name.decode())
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | b'\x14\xca\x9a;'
002 | ʚ;
Objects/longobject.c line 6328
static PyTypeObject Int_InfoType;```
oh nvm type(v) is <class 'sys.int_info'> here
@pliant tusk finally have object allocations working now, with your mro patch https://github.com/ionite34/einspect/blob/main/src/einspect/views/view_type.py#L114-L158
also spent an hour wondering where random segfaults coming from before realizing I didn't keep a reference to the new struct and it got GC'd https://github.com/ionite34/einspect/blob/main/src/einspect/views/view_type.py#L142
src/einspect/views/view_type.py line 142
PY_METHOD_STRUCTS.setdefault(object, []).append(base)```
nice, I would recommend caching object for use inside of the custom __base__ descriptor to prevent a user from changing object in __builtins__ which would break your check
!e If you want a way to get object without __builtins__, perhaps ```py
print((1).__class__.__bases__[-1])
@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.
<class 'object'>
Oh I was referring to just caching object its fine to retrieve it at import time imo
fishhook stores it as a default arg at import time
its fine to retrieve it at import time imo
What if the user modifies __builtins__ before importing fishhook?
i considered that an acceptable risk. I figure that libraries like fishhook and einspect are being used by users likely to modify things that are normally consistent, but that those changes would typically be facilitated by libs like fishhook or einspect (so any odd state would come after import)
🤷♂️ Python doesn't really have "import time" as a concept. Everything that happens in a Python program happens "at import time". Your program kicks off when the interpreter imports your code as __main__, and interpreter finalization starts as soon as __main__ is done being imported
it does conceptually for imported libraries. things that happen in global state of the module I would argue could be considered import time execution, vs things that happen when a user uses the library which I would consider run/use time
for example, fishhook calls patch_object() at import time to patch the object type, vs the patches that happen passively to other types as the lib is used
sure, but both of those are "at import time" with respect to the imports of other modules
fair enough, I guess it is better thought of as order of execution
huh, I thought object would get captured as a constant
guess not, it's just a normal name lookup
Yea, only upper locals would get captured like that
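To make the difference concrete, here is a small sketch contrasting a plain name lookup with a default argument that captures `object` when the `def` runs. `Impostor`, `late_bound` and `early_bound` are made-up names, and swapping `builtins.object` is only done briefly for the demo.

```python
import builtins


def late_bound():
    # `object` is resolved through globals, then builtins, on every call.
    return object


def early_bound(_object=object):
    # The default value was evaluated once, when the `def` statement ran.
    return _object


class Impostor:
    pass


original = builtins.object
builtins.object = Impostor          # simulate a user tampering with builtins
try:
    late_result = late_bound()
    early_result = early_bound()
finally:
    builtins.object = original      # always restore the real builtin

print(late_result is Impostor)      # True: the lookup saw the replacement
print(early_result is original)     # True: the default kept the original
```

The early-bound version is only safe if the `def` ran before anyone touched builtins, which is the caveat raised above.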
oh yeah I made mine not import time either, it runs on the first attempted allocation on object
src/einspect/views/view_type.py lines 175 to 176
if obj == object:
_patch_object_base()```
why not is?
obj is a PyTypeObject there
__eq__ is true for other PyObjects at the same address or python objects at the same address
ok
might be a bit too implicit I dunno
but I was doing a bunch of obj.address == address(object) before
so I just added it to eq
hi>
In search of a sco person help me in India
Does anyone know why there is not a more expressive description for the error raised when trying to create a file with the same name as an existing directory (on Windows)? Is there some limitation that prevents finding out that this is the issue, or is it just deemed unimportant?
(it results in a PermissionError)
I'm not sure windows tells you anything different when you either really lack permissions or the destination is a directory
Hmm, I might try and find the relevant source code later, but it's been ages since I've read/written any C
I am pretty sure python just builds the error message it gets from windows into an exception
ah, figures that it'd be a windows issue
we could technically make windows file io explicitly check for directories first
not sure if that would have other problems
well I feel like windows should at least know at the point where it's denying you permission to write to that address
maybe there's some security related reason to prevent enumeration?
well the reason is you just can't open a directory in read mode
and how it prevents you is not giving you permission
you can modify other metadata attributes of a directory
yeah but if I try to write a file to the same path as a directory it also just says no permission
that would be vulnerable to race conditions, no?
True
I guess it was a more reasonable assumption that it was a windows limitation than something the people working on python didn't care to implement
I still wonder if there's an OS reason the windows error message isn't more expressive
yeah, though, it would only be a race condition that changes what type of 2 errors you get
which isn't too bad as things go I guess
since if you tried to open files with all sorts of names in a location you don't have access to you could enumerate the directory structure of the drive
it would be strange for python io opens to do anything else beyond actually opening the file though
Modules/_io/fileio.c lines 450 to 453
/* On Unix, open will succeed for directories.
In Python, there should be no file objects referring to
directories, so we need a check. */
if (S_ISDIR(fdfstat.st_mode)) {```
we seem to do the same thing for unix (albeit after opening)?
that doesn't do a new syscall does it?
or rather it does (it calls fstat on the fd a few lines up), but because it's called on the open fd, not a path, it's not vulnerable to race conditions where someone overwrites the path
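The same open-then-fstat pattern can be sketched from Python with the `os` module. This is only an illustration of the idea, not what `io.FileIO` literally does; `open_regular_file` is a made-up helper, and on Windows `os.open` on a directory is expected to fail earlier with a different `OSError`.

```python
import os
import stat
import tempfile


def open_regular_file(path: str) -> int:
    """Open `path` read-only and reject directories by inspecting the fd."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # fstat works on the descriptor we already hold, so swapping the
        # path on disk after the open cannot change the answer.
        if stat.S_ISDIR(os.fstat(fd).st_mode):
            raise IsADirectoryError(f"{path!r} is a directory")
    except BaseException:
        os.close(fd)
        raise
    return fd


with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "data.txt")
    with open(file_path, "w") as f:
        f.write("hello")

    fd = open_regular_file(file_path)
    content = os.read(fd, 5)
    os.close(fd)

    try:
        open_regular_file(tmp)
        directory_error = None
    except OSError as exc:
        directory_error = type(exc).__name__

print(content, directory_error)
```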
hm
does unix guarantee a file can't be deleted when in use?
iirc removals are scheduled after all fds are closed but wasn't sure if that was a standard or overridable
a file can be deleted but the fd remains valid
this is a common pitfall around disk usage: sometimes your disk shows up as full but the files you can see don't account for all the used disk space
Hello, hopefully this is the right channel.
I've a question for something I never dealt with before. I've written a module with a single function that might be useful in several projects.
I want to externalise it from the project I'm working on for the reason given above, so I don't have to maintain the code in several places.
Do I necessarily need to go for a package?
I know I can import a module from a different folder via sys.path, but that still implies knowing my local path, which doesn't sound very elegant.
What's the best approach?
Hi
sure, sounds fine, define a pyproject.toml and you can pip install -e . it
@warm breach so no need to go for a full package process basically
Thanks. And when it comes to making updates, what will happen? Will I just launch `pip install --upgrade package`?
It is not true if you embed interpreter in another app. You can initialize interpreter, do some stuff and continue to do other things. Interpreter still exists, but it is doing nothing and all imports are finished
what @raven ridge was trying to say is that import is code execution. there is no "import phase" in python that is distinct from execution.
which is probably why you can import in function bodies
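A small sketch of that point: the `import` statement inside the function only runs when the function is called, and the module body executes at that moment. The module name `import_side_effect_demo` and its contents are invented for the demo.

```python
import importlib
import pathlib
import sys
import tempfile

MODULE_NAME = "import_side_effect_demo"  # hypothetical throwaway module


def use_module():
    # This import executes only when the function is called.
    import import_side_effect_demo
    return import_side_effect_demo.VALUE


with tempfile.TemporaryDirectory() as tmp:
    # The module body is ordinary code that runs on first import.
    pathlib.Path(tmp, MODULE_NAME + ".py").write_text("VALUE = 6 * 7\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        loaded_before = MODULE_NAME in sys.modules
        value = use_module()
        loaded_after = MODULE_NAME in sys.modules
    finally:
        # Clean up so the demo leaves no trace in the running interpreter.
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)

print(loaded_before, value, loaded_after)  # False 42 True
```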
wait what does this even mean
if there's python code running, the interpreter is doing things
well, you can init the python interpreter, then do some random operations without yielding control to python, then later actually use your interpreter
at which point there are sort of two distinct times, one is at interpreter setup time, another is at actually using the interpreter time
I think there is an import phase separate from execution. After all, execution is only a side effect of import.
The import phase would be either the process of sys.modules.__getitem__ or the process of working through the sys.path_hooks/sys.meta_path mechanism. Neither of these is particularly interesting in general.
I've litigated this point before in trying to characterise the distinction between “compile-time” and “run-time” in Python. Some of the same arguments may apply.
In practice, is there not a common and meaningful distinction between the execution of module-level code at something akin to a “compile-time” and execution of everything else at “run-time.” This is a meaningful distinction in practice, despite the former not really being “compile-time” (since the Python compiler historically did only, like, three interesting things.)
I think there is an import phase separate from execution.
There isn't, though. Like I said, when you run a Python script, the Python interpreter imports whatever module or file you tell it to, and literally as soon as that module or file finishes being imported, the interpreter starts doing its teardown and getting ready to exit.
Doesn't that presume the standard entry point?
There are other ways into PyEval_EvalFrameEx that do not require passing through the import mechanism.
For example, python -c goes through pymain_run_command which goes straight to the “very high-level embedding” PyRun_SimpleStringFlags which I don't think ever passes through the import machinery itself.
It's incorrect to say that importing and programme execution are one and the same.
After all, when we say “import,” we are generally referring to the mechanisms surrounding import, which perform execution only as a side-effect.
Doesn't that presume the standard entry point?
There are other ways into PyEval_EvalFrameEx that do not require passing through the import mechanism.
Fair enough, and that's true for the example about embedding the interpreter into another program as well. But python foo.py and python -m foo, which are the overwhelmingly common ways to run Python code, spend 100% of their time underneath an import call.
It's incorrect to say that importing and programme execution are one and the same.
Which is exactly why I think it is correct to say that importing and program execution are one and the same, or at least so tightly coupled that it's not useful to distinguish between them.
After all, when we say “import,” we are generally referring to the mechanisms surrounding import, which perform execution only as a side-effect.
I think that's distinctly not what was being referred to in the comment that I replied to when I kicked this whole conversation off, also.
this one 
As I understood, the original comment was referring to early-binding something from builtins.
I suppose the implication of the original comment was that there was some separate “import-time” mechanism that occurred like a pre-runtime compilation step and, therefore, preëmpt other runtime changes.
But your comment was that the import mechanism is so closely tied to module execution, and this happens during normal execution, so there is no distinct interval of time during which only import activities occur.
right - my point was that nothing stops someone from having messed with builtins before importing the code that wants to early bind something from builtins.
In this sense, it's meaningful to deny the presence of an import time, since the majority of module execution, as you note, occurs nested under some hierarchy of PyImport_ImportModule* calls.
I get the thrill of writing these patching/introspection/interception libraries and the gimmick of doing this at runtime, but I don't quite see the advantage of not just writing some (much simpler) C code to patch things (which would probably be even easier if you also swap out the entry point.)
Are C extensions really easier? Don't you have to compile them and then import it
instead of just interacting with python code normally that works with any interactive session like repl / jupyter
You can distribute binary packages via PyPI, so the target machine doesn’t need a compiler tool chain.
You can avoid a lot of the contortions involved with, e.g., finding things, since (the current state of) the Python C-API exposes a lot of the symbols you want.
sure yeah it might be more performant but binaries can be annoying too, non stable ABI and struct attributes regularly change between python versions
but if you're just exploring or debugging I've found being able to access internal attributes from live python to be useful
I don’t know that it will be any faster to run, but it surely should be much easier to write!
The secured interpreter environments already don’t allow ctypes, cffi, pywin32, NumPy, &c. so only true pure-Python approaches (like bytecode or /proc/self) will likely work.
That minimizes the distinction between a pure Python library that uses ctypes for patching and one which just writes a C-extension module.
I agree they're not distinct security wise, yeah
I assume you can do a lot of cursed stuff with just patching bytecode of code objects
!e don't really even need to mess with bytecode 🥴
import gc
class Cursed:
    def __length_hint__(self):
        return 1
    def __iter__(self):
        for obj in gc.get_objects():
            if isinstance(obj, tuple):
                try:
                    0 in obj
                except SystemError:
                    yield obj
                    break
self_tuple = tuple(Cursed())
print(self_tuple)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
((...),)
Oof
How
Wh
Actually, can you do something like this without using any imports, exec or eval?
!e well there's this without any imports
def getdict(cls, x=type('',(),{'__eq__':lambda s,o:o})()):
    return cls.__dict__ == x
getdict(list)["wtf"] = "???"
print([].wtf)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
???
writing to the type dict of immutable types
supposedly this is not a bug (discovered by chilaxan)
It's a bug, but not something easy/performant to solve.
i would write a TIL about comparison mms and the .__bool__() mm but i still have a 1h cooldown
Problem is that mappingproxy's eq method forwards the call onto the internal dict, thus exposing it. But to solve that you'd need to do a whole new equality method, test it, etc...
yeah it's been closed as won't fix after some failed attempts
seems to have included all of the core devs
Yeah the problem is if both are proxy objects, it gets hairy and hard to solve without an expensive copy of either mapping,
also mapping proxy requires GC due to potential recursive references
so regardless you can expose the linkage with get_referrers
Basically, if you try hard enough you can get through the protection; it's there to stop you accidentally doing the wrong thing.
pretty much the only thing the interpreter actually prevents you doing is modifying types marked Py_TPFLAGS_IMMUTABLETYPE
which isn't something you can elect from python either, so things like frozen dataclass are easily mutable with object.__setattr__
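A quick sketch of the frozen dataclass point. The `Point` class is just an example; the behaviour shown is the documented one, where `frozen=True` works by overriding `__setattr__` on the class, and the base `object.__setattr__` bypasses that override.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10                      # goes through the generated __setattr__
    blocked = False
except FrozenInstanceError:
    blocked = True

# Calling the base implementation directly skips the frozen check.
object.__setattr__(p, "x", 10)

print(blocked, p.x)  # True 10
```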
what about checking for PyDict_CheckExact() or PyMappingProxy_Check() for both operands otherwise delegating to just passing the proxy itself to the comparison
Well mapping-proxy works with any mapping, not just dicts. The problem is if both are proxies, you have to somehow implement the operator (including handling NotImplemented, the subclass exception, etc) without ever exposing either object to the other one.
ok I understand how this works, and this is 🅱️eyond cursed
!e optionally you can do the same thing with a Structure memory view and access mapping directly
from einspect import view
view(list.__dict__).mapping["wtf"] = "???"
print([].wtf)
@warm breach :white_check_mark: Your 3.11 eval job has completed with return code 0.
???
not sure if more or less cursed 😔
i was thinking something like this but yeah the "any mapping" thing would be a problem ```c
static PyObject *
mappingproxy_richcompare(mappingproxyobject *v, PyObject *w, int op)
{
    /* v is already guaranteed a mappingproxy */
    if (PyDict_CheckExact(w)) {
        return PyObject_RichCompare(v->mapping, w, op);
    }
    if (PyObject_TypeCheck(w, &PyDictProxy_Type)) {
        return PyObject_RichCompare(v->mapping, ((mappingproxyobject *)w)->mapping, op);
    }
    return PyObject_RichCompare((PyObject *)v, w, op);
}
```
might not be too crazy just to reimplement comparisons manually with GetItem
though that's technically a behavior change
actually we might be able to do this for an "any mapping" implementation ```c
static PyObject *
mappingproxy_richcompare(mappingproxyobject *v, PyObject *w, int op)
{
    if (PyDict_Check(w)) {
        return PyDict_Type.tp_richcompare(v->mapping, w, op);
    }
    if (PyObject_TypeCheck(w, &PyDictProxy_Type)) {
        return PyObject_RichCompare(v->mapping, ((mappingproxyobject *)w)->mapping, op);
    }
    return PyObject_RichCompare((PyObject *)v, w, op);
}
```
just directly use dict_richcompare
doesn't that still expose mapping
eh well this time it's like loading globals
it just directly does that from dict
!e ```py
class B(dict):
    def __eq__(self, other):
        print('here')
        return False
print(dict.__eq__({1: 2}, B({1: 2})))
```
@rose schooner :white_check_mark: Your 3.11 eval job has completed with return code 0.
True
like that
```c
static PyObject *
mappingproxy_richcompare(mappingproxyobject *v, PyObject *w, int op)
{
    if (PyDict_Check(w)) {
        return PyDict_Type.tp_richcompare(v->mapping, w, op);
    }
    if (PyObject_TypeCheck(w, &PyDictProxy_Type)) {
        return PyObject_RichCompare(v->mapping, ((mappingproxyobject *)w)->mapping, op);
    }
    Py_RETURN_NOTIMPLEMENTED;
}
``` also fixed
guess it would be fine if accepting that comparisons will be false for any dict subtypes or other custom mappings
not sure how many code usages in the wild depend on mapping proxies passing through custom __eq__ for innocent reasons
that being said
It's documented and used with any mapping, so it has to support them.
!d types.MappingProxyType
class types.MappingProxyType(mapping)```
Read-only proxy of a mapping. It provides a dynamic view on the mapping’s entries, which means that when the mapping changes, the view reflects these changes.
New in version 3.3.
Changed in version 3.9: Updated to support the new union (`|`) operator from [**PEP 584**](https://peps.python.org/pep-0584/), which simply delegates to the underlying mapping.
first line does kind of say "read only" here 😔
though I guess that just means getitem and not eq?
but that's kind of a weird distinction
implementations that don't subclass dict but need to compare have these 2 alternatives ```py
def __eq__(self: Self, other: MappingProxyType) -> bool:
    return convert_to_dict_or_dict_subtype(self) == other

# or

def __eq__(self: Self, other: MappingProxyType) -> bool:
    for key, value in other.items():
        # manually compare, early return False assumed here
        ...
    return True
```
could just make a PyMapping_RichCompare
Mapping types already guarantee all the methods you need to compare
it'll just be slower since you need to call python functions
but otherwise just do everything dict does but with python calls
i don't get how that would work
oh
general Mapping compare
wouldn't that just be PyDict_Type.tp_richcompare
unless there are mappings that aren't dict (subclasses)
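There are such mappings (`collections.ChainMap` is one in the stdlib), which is why a generic compare would have to go through the mapping protocol. A Python-level sketch of what that could look like, assuming only `len`, iteration and `__getitem__`; `mappings_equal` is a made-up name and this is not a proposed CPython patch:

```python
from collections import ChainMap
from collections.abc import Mapping
from types import MappingProxyType


def mappings_equal(left: Mapping, right: Mapping) -> bool:
    """Compare two mappings entry by entry without handing one to the other."""
    if len(left) != len(right):
        return False
    for key in left:
        try:
            other_value = right[key]
        except KeyError:
            return False
        value = left[key]
        # Same shortcut dict uses: identical objects count as equal.
        if value is not other_value and value != other_value:
            return False
    return True


proxy = MappingProxyType({"a": 1, "b": 2})
print(mappings_equal(proxy, {"a": 1, "b": 2}))            # True
print(mappings_equal(proxy, ChainMap({"a": 1}, {"b": 2})))  # True
print(mappings_equal(proxy, {"a": 1, "b": 3}))            # False
```

The individual values still see each other through `!=`, but neither mapping object is ever passed to a user-defined `__eq__`, which is the leak being discussed. As noted above, it is slower than the dict fast path.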
there are secured interpreter environments?
There are secured interpreter environments.
can you direct me to implementations
*ones that don't rely on sandboxing the interpreter, but only restrict python itself, since sandboxed interpreters can allow ctypes-esque stuff
The secured interpreters are not general purpose tools for public use.
They are unlikely to include parts of the standard library like ctypes or include major third parties libraries like pywin32 or numpy or cffi. They won't provide you with any way to install new packages. And, even if you could install packages, they'll probably require code-signing of shared objects and Python .py files. (They'll probably disable bytecode cache.) They'll probably disable -c and -m modes and may even force -S. They may have some additional hardening for sys.meta_path and sys.path_hooks. They might run code execution through anti-malware pattern matching. They'll probably use PEP-578 audit hooks to log everything the interpreter does.
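For anyone who hasn't used them, a PEP 578 audit hook is only a few lines. This sketch just records event names; a hardened deployment would presumably log or block instead. `seen_events` and `record_event` are made-up names, and note that a hook cannot be removed once added.

```python
import sys

seen_events = []


def record_event(event, args):
    # Keep the hook tiny: it runs for every audited event in the process.
    seen_events.append(event)


sys.addaudithook(record_event)

# compile() raises the "compile" event; eval() of a code object raises "exec".
code = compile("1 + 1", "<audit-demo>", "eval")
result = eval(code)

print(result, "compile" in seen_events, "exec" in seen_events)  # 2 True True
```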
I am able to get full process memory r/w with no imports, no bytecode cache, and easily obfuscatable code. the only audit hooks that are called are compile and exec, I don't think the above would be enough, (still works with -S)
I am able to get full process memory r/w with no imports, no bytecode cache, and easily obfuscatable code.
Sure, me, too…
with open('/proc/self/mem', 'r+b') as f:
    pass
But most of this stuff is hardening. The code signing is the core of it.
you can do a lot with just the functions compiled into the binary, and python has lots of ways to control code flow that do not involve running your own assembly code.
for example, with full process mem r/w you could disable audit hooks, and probably most if not all of the security harness that you are proposing, and then just use python to do your post exploitation, which would bypass all hypothetical code signing
fyi, the reason i was asking here was because I wanted to see if real-world implementations fell into the pitfalls that I assume they will fall into
How do you get your payload into the running interpreter?
copy + paste ? I assumed that the purpose of a hypothetical hardened interpreter is running untrusted code
There's no interactive console.
if the hypothetical hardened interpreter is by some way evaling untrusted code then that does not matter
if it isnt, then why bother using a hardened interpreter
The whole point is to prevent execution of code that has not already been signed.
whats the point then?
if it is closed-system -> closed-secure-interpreter -> closed-system then why bother using a secure interpreter
if all of your input is fully trusted then it does not matter
I figured the purpose of a secure interpreter would be to run untrusted python code
(most devs running across an implementation would also likely assume that)
No, securing environments that run untrusted code is usually handled differently, using lightweight VMs or containers or similar.
I am describing environments where you want to prevent the running of untrusted code.
These would be environments like BMCs or single application containers or hypervisor hosts.
yea i know that is the best way to run untrusted code
but i am having trouble wrapping my head around an environment where you want to fully prevent untrusted python code, but also do not have any method for running untrusted python code. It feels redundant
I think a good example is a BMC, which often have fully-featured software stacks these days. REST APIs and all sorts of bells and whistles.
my point is how would an attacker get to a point where they can try to run untrusted code if you do not explicitly make an endpoint for it
the only hypothetical system i can think of would involve the following weird cases: the ability to drop arbitrary files with arbitrary file endings -> the ability to perform arbitrary imports
By some means, they have managed to get console access into the BMC. You want to then prevent them from bringing a payload with them to further their access (and these payloads are often written in Python.) You can, of course, not deploy Python, but then you can't write BMC tooling that uses Python.
would they not have much higher access with console access to the BMC (and if they don't then what is the console for)
And if your point is that Console access would not be able to run untrusted python code in this case, then why have it (the console) at all? Presumably, it would take input and then respond "refused to run untrusted code"
it would not be super useful, so if that is your threat model, just remove it
but if you need console access as the dev or trusted user, then how can you implement that sort of hardening without also hampering the console into unusability?
in that case, I would focus on securing console access, not what can be done in the console
Well, there's a lot of people doing that part, too.
probably because it makes more sense than neutering the console
I'm still confused about what a real world use case for that sort of tooling would be
The audit hooks PEPs (551 and 578) go into a little bit of detail behind the motivation.
In truth, Guido asked me very similar questions when these PEPs were pending approval, but it turns out that there are environments and situations where this is useful as one among many other lines-of-defence.
It took the better part of an hour over lunch to convince him, too. The BMC or hypervisor host use-cases are probably pretty niche, but I think the single-application container use-case is broadly relevant.
I am still having trouble envisioning any of those environments where an attack would reach this point without a glaring vulnerability (which would likely be required for some feature, which these mitigations would disable)
Niche, as in, regular people don't usually care about this, but not as in “there are not billions of dollars of computing machinery doing this.”
yea i understood that
In terms of actual scope, BMC is probably the widest. After all, think about all the BMCs out in the cloud.
are you referring to a system with no OS or a non-virtual system?
since Bare Metal Computing can mean either
Baseboard Management Controllers.
ah google did not know that acronym at all
The OpenBMC project is a Linux Foundation collaborative open-source project whose goal is to produce an open source implementation of the Baseboard Management Controllers (BMC) Firmware Stack. OpenBMC is a Linux distribution for BMCs meant to work across heterogeneous systems that include enterprise, high-performance computing (HPC), telecommuni...
from what I am reading there, those are essentially embedded devices, and speed seems important. I cannot see a situation where you would want to use python
and if you did, you would likely use something like circuitpython which would still likely be too slow
These things can easily have ≥1GiB of RAM.
They aren't small machines.
writing code in C or any other directly compilable language would still run vastly faster
also wouldn't clock speed have more influence on speed than RAM?
Huh, how come just wasting CPU cycles with useless computation isn't a threat?
Or RAM IG
Hmm, I ran this with hyperfine, and I guess the C code is faster?
#include <unistd.h>
int main(int argc, char *argv[]) {
    /* execv requires a NULL-terminated argument vector */
    char *args[] = {"/usr/bin/systemctl", "reboot", "-i", NULL};
    execv(args[0], args);
}
from subprocess import run
run('/usr/bin/systemctl reboot -i'.split())
I suppose the things you would want to do with Python in these environments may not benefit from the relative efficiency of C over Python.
that would make sense that the C code is faster, the python code is running thousands more lines of code and several dynamic allocations, the C code doesn't allocate any memory dynamically, but both are calling out to a C program systemctl
a better example would be like a hashing system or some algorithm implementation
I imagine that the kind of code that someone might want to use a scripting language like Python for on a BMC or a hypervisor host might not often need one to find cycles in a doubly-linked list. Maybe dynamic programming might not even come up at all.
I genuinely wouldn't know, because I've only ever done the work on the interpreter side of that. I've not actually written the code that runs within the interpreter.
fair enough
I would imagine that python would not even be considered for this kind of thing until subinterpreters and multithreaded python works better
What do you need subinterpreters for if you're, like, power-cycling or, like, calibrating optics?
i would assume that a system like that would need to do different things on different threads simultaneously
A BMC? BMCs are management controllers. Other than maybe data collection, I think they sit mostly idle.
I think the idea of using Python in that particular environment is the same reason to run Linux on those devices which is the same reason to stick so much RAM and CPU into those devices. It saves a lot of money in human effort, despite being computationally wasteful.
wouldn't that data collection require live monitoring of multiple systems?
imo, in reading about those systems, it seems like they would want to squeeze out efficiency
So, that way I imagine it, like, you have a hypervisor host, which is the server that runs all the virtual machines for your clients. And that host is running on an actual physical machine running in some datacenter somewhere. And since you live in Palo Alto and the data center is in Nevada, you can't quite get up from your desk and walk over and power cycle the machine when it gets stuck, right? So you need an out-of-band controller connected to the physical hardware. So I would imagine, but I couldn't say for certain, that probably those devices are mostly idle.
But what you're describing still sounds like a system that you would want to secure externally
And the tasks you might want to do on those machines are monitoring, management, and remediation tasks. So the machine is probably going to do a lot of work that might require moderately ad hoc scripting. It might be really helpful to be able to run Python scripts to do those tasks. But you probably don't want those devices to be able to run arbitrary Python scripts.
Wouldn't ad hoc scripting be arbitrary python scripts?
Not necessarily, because these environments may be more heterogeneous in practice than you expect. So the axis across which these would be ad hoc is not per task but per device or per deployment cycle.
Device of model XYZ with firmware 1.2.3 in data center ABC2 needs something slightly different.
It just feels unnecessary to add to a system that should 100% be secured externally
And if it's secured externally, you can just put normal python on there. It doesn't need to be some secured version.
Hi, anyone know any cool projects to do? or any github repo with cool projects, something like that?
a beginner to intermediate level
@feral island (3.10) do I just do PyObject *iter = _PyEval_GetBuiltinId(&PyId_iter); here? Does it work the same way as the new _PyEval_GetBuiltin?
static PyObject *
bytearrayiter_reduce(bytesiterobject *it, PyObject *Py_UNUSED(ignored))
{
<<<<<<< HEAD
    _Py_IDENTIFIER(iter);
    if (it->it_seq != NULL) {
        return Py_BuildValue("N(O)n", _PyEval_GetBuiltinId(&PyId_iter),
                             it->it_seq, it->it_index);
    } else {
        return Py_BuildValue("N(())", _PyEval_GetBuiltinId(&PyId_iter));
=======
    PyObject *iter = _PyEval_GetBuiltin(&_Py_ID(iter));
    /* _PyEval_GetBuiltin can invoke arbitrary code,
     * call must be before access of iterator pointers.
     * see issue #101765 */
    if (it->it_seq != NULL) {
        return Py_BuildValue("N(O)n", iter, it->it_seq, it->it_index);
    } else {
        return Py_BuildValue("N(())", iter);
>>>>>>> 54dfa14c5a (gh-101765: Fix SystemError / segmentation fault in iter `__reduce__` when internal access of `builtins.__dict__` exhausts the iterator (#101769))
    }
}
yes I think so
yeah tests seem fine so far python/cpython#102229 python/cpython#102228
[cpython] #102229 [3.10] gh-101765: Fix SystemError / segmentation fault in iter __reduce__ when internal access of builtins.__dict__ exhausts the iterator (GH-101769)
[cpython] #102228 [3.11] gh-101765: Fix SystemError / segmentation fault in iter __reduce__ when internal access of builtins.__dict__ exhausts the iterator (GH-101769)
is python string indexing an O(n) operation (n is the index)
UTF-8 is a variable length encoding so it should be the case
or is it converted into a fixed length form
or does it create a lookup table for each character in the string
no, it's constant time
no, internally the string holds an array of codepoints, and a slice just extracts the codepoints at the given offset(s)
so 'ab' is represented like this?
```
61 00 00 00 62 00 00 00
```
it uses the minimum integer size that will fit all of the codepoints for the array
so no, that's represented as 61 62
but if you add a codepoint with a value above 255, you'd get padding 0's on the 61 and 62
so adding a character outside the BMP to a long ASCII string is an expensive operation, because each character needs to be widened from 8 bits to 32 bits?
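You can see the storage width for yourself by checking how much `sys.getsizeof` grows per extra character. This is only a rough sketch, and it assumes CPython 3.3 or newer (other implementations may store strings differently). `bytes_per_char` is a made-up helper name, not something from the stdlib.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate how many bytes CPython stores per code point for `ch`.

    Subtracting the sizes of two strings of different lengths cancels
    out the fixed object header, leaving only the per-character cost.
    """
    short = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n


# One sample each: ASCII, Latin-1, BMP beyond Latin-1, and outside the BMP.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per code point")
```

On CPython this should report 1, 1, 2 and 4 bytes for the four samples, which matches the "minimum integer size that fits" description.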
hm - no, it's not expensive. String concatenation is O(N + M), where N and M are the lengths of the two strings being concatenated - regardless of whether the characters in the string are ASCII or outside the BMP
remember that Python strings are immutable, so every concatenation creates a new string, and needs to copy over every character from each of the original two strings.
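To make that concrete, here is a small sketch (again assuming CPython) that appends one non-BMP character to an ASCII string. The variable names are just for illustration. The result is a new object stored at the wider width, while the original string is left untouched.

```python
import sys

ascii_text = "a" * 1000        # ASCII only: 1 byte per code point on CPython
emoji = "\U0001F600"           # outside the BMP: needs 4 bytes per code point

# Concatenation builds a brand-new string and copies both operands into it.
combined = ascii_text + emoji

print("ascii only:", sys.getsizeof(ascii_text), "bytes")
print("with emoji:", sys.getsizeof(combined), "bytes")
print("same object?", combined is ascii_text)
```

The copy happens for every concatenation anyway, so the widening changes how many bytes are written but not the linear shape of the cost.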
See also https://peps.python.org/pep-0393/.
ooh, I didn't know there was a PEP for that. TIL.
Unicode geek here saying that this is so cool. The calculations for which length to use for storage are quick, and the savings are huge. I bow to whoever thought that up.
why are my timing results so inconsistent
sometimes the dict indexing version uses more time, sometimes it uses less
it is more often that tuple indexing wins, but still
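One common way to cut down on that kind of noise is to repeat the measurement several times and keep the fastest run, which is what the `timeit` docs suggest. A minimal sketch, assuming the comparison is roughly "index a tuple vs. index a dict"; `as_tuple`, `as_dict` and `best_time` are hypothetical names, not the original benchmark.

```python
import timeit

# Hypothetical stand-ins for whatever is being compared.
as_tuple = tuple(range(100))
as_dict = {i: i for i in range(100)}
namespace = {"as_tuple": as_tuple, "as_dict": as_dict}


def best_time(stmt: str, repeats: int = 7, number: int = 200_000) -> float:
    """Return the fastest observed time per execution of `stmt`, in seconds.

    Slower runs mostly reflect interference from other processes, so the
    minimum is usually the most stable number to compare.
    """
    runs = timeit.repeat(stmt, globals=namespace, repeat=repeats, number=number)
    return min(runs) / number


print(f"tuple: {best_time('as_tuple[50]') * 1e9:.1f} ns per lookup")
print(f"dict:  {best_time('as_dict[50]') * 1e9:.1f} ns per lookup")
```

If the two numbers are still within a few percent of each other after this, the difference is probably too small to matter on that machine.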
I'm not sure this is a good channel to ask, but I'll try.
so I have forked/cloned the cpython repository and installed the Python native development environment through Visual Studio.
How can I get the newly recompiled python after the build (PCbuild/build.bat)? It doesn't seem to change, so I can't test it out.