#internals-and-peps
Since there are a bunch of things to clarify.
I suppose the only good option is to use a single "NAME" after the equal sign.
if it's worth anything, having x= instead of =x syntax would end up mirroring the syntax of f-strings a bit more, for consistency's sake
f(a=*, b=*) I prefer some placeholder on RHS over simply name=
i don't think you should call this directly
wait hmm
what am i even saying
so based on the format and using the x= syntax it should be
```diff
 kwarg_or_starred[KeywordOrStarred*]:
     | invalid_kwarg
-    | a=NAME '=' b=expression {
-        _PyPegen_keyword_or_starred(p, CHECK(keyword_ty, _PyAST_keyword(a->v.Name.id, b, EXTRA)), 1) }
+    | a=NAME '=' b=expression? {
+        _PyPegen_keyword_or_starred(p, CHECK(keyword_ty, _PyAST_keyword(a->v.Name.id, b ? b : a, EXTRA)), 1) }
     | a=starred_expression { _PyPegen_keyword_or_starred(p, a, 0) }
```
that's about it too
i'll implement that later
What do you mean you will implement it? 😅
Well, I talked with the guy who wants to propose the pep and asked him whether I can implement it so that I can get into CPython internals.
it's not really something that gets too involved in CPython internals
unless the understanding of the functions is included
Well obviously I can't stop you from implementing it but this is what I could find easy enough to give it a try and hopefully, I can improve during the process.
If it's easy for you, you probably can implement more complex/complicated things. 🤷♂️
mhm
good luck with the learning
Would you like to implement it then? I'm not sure what you mean.
like
i hope you can learn something about CPython internals if you try to implement it
it's fun
👍
well, I think it should be a syntactic sugar really.
f(a=a) -> f(a=)
that is it.
so the expression can be ignored -> NAME, = or smt.
I still don't quite understand how grammar actions are processed.
I'm talking about function calls between brackets such as _PyPegen_keyword_or_starred. Other than these functions are defined in the actions_helpers, I can't think of any other use cases.
@memoize
def action(self) -> Optional[str]:
    # action: "{" ~ target_atoms "}"
    mark = self._mark()
    cut = False
    if (
        (literal := self.expect("{"))
        and
        (cut := True)
        and
        (target_atoms := self.target_atoms())
        and
        (literal_1 := self.expect("}"))
    ):
        return target_atoms
    self._reset(mark)
    if cut: return None
    return None
(this is how they are recognised by the pegen_generator)
a new AST node would mean extending the Python.asdl file, right?
Parser/Python.asdl may need changes to match the grammar. Then run make regen-ast to regenerate Include/internal/pycore_ast.h and Python/Python-ast.c.
struct {
    expr_ty name;
    asdl_typeparam_seq *typeparams;
    expr_ty value;
} TypeAlias;
https://github.com/python/cpython/blob/main/Parser/Python.asdl#L28
like this.
`Parser/Python.asdl` line 28
```
| TypeAlias(expr name, type_param* type_params, expr value)
```
also:
- is this pep going to be accepted? what is opinion of core devs?
- is it possible to implement it in pure python using untouched CPython3.12 (with ctypes to call already existing api)? (i dont need reliable and robust implementation, i just wanna experiment with that idea)
i found this: https://github.com/ericsnowcurrently/cpython/blob/pep-554/Modules/_interpretersmodule.c
(src of _interpreters module)
im not sure if i can compile it as 3rd party module without recompiling cpython itself. is it possible?
https://github.com/ericsnowcurrently/interpreters uses this interface, I believe. I think that's the PoC for what's proposed for the stdlib in 3.13
the return type tells what node it returns, right?
simple_stmts[asdl_stmt_seq*]
this node exists in pycore_ast.h hmm but I can't see it in the asdl file itself.
was asdl_typeparam_seq also created from the asdl file, or have you added it manually? I don't see any asdl changes there.
hello
i came across this pdf
but find it hard to understand
Create a function called replaceString that takes in three parameters word,
search and replaceWith. The function would replace all instances of the search
parameter with the replaceWith parameter. Example:
replaceString(‘Abdulqudus’, ‘u’, ‘v’) // Should return ‘Abdvlqvdvs’
replaceString(‘javascript’, ‘a’, ‘o’) // Should return ‘jovoscript’
how long do i have to wait for someone to reply lol
Maybe it’s cus you are posting in internals and peps? Just a thought
it's defined in pycore_ast.h which is a generated file
This is not the channel to get help in. Also I'm guessing that the pdf is in JavaScript, not Python, given the examples
from the asdl file?
if so, why cant I see the changes in the pr?
This behavior is unsurprising, but it's never occurred to me that this could happen.
In [8]: stuff = {'a': {'b': 1}}
In [9]: foo = {'c': 2, **stuff}
In [11]: foo['a']['d'] = 3
In [12]: foo
Out[12]: {'c': 2, 'a': {'b': 1, 'd': 3}}
In [13]: stuff
Out[13]: {'a': {'b': 1, 'd': 3}}
yeah, everything being mutable and everything being a shared reference is IMHO a pretty rough combination
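A minimal sketch of the behaviour in the session above, with illustrative variable names: `{**stuff}` copies only the top level, so the nested dict stays shared, while `copy.deepcopy` is one way to break the sharing.

```python
import copy

stuff = {"a": {"b": 1}}

# Unpacking copies only the top-level keys; the inner dict is the same object.
shallow = {"c": 2, **stuff}
shallow["a"]["d"] = 3
print(stuff)  # the inner dict of `stuff` changed as well

# deepcopy duplicates the nested dict, so later writes stay local.
original = {"a": {"b": 1}}
independent = {"c": 2, **copy.deepcopy(original)}
independent["a"]["d"] = 3
print(original)  # unchanged
```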
I don't mind it, but I don't know any other languages nearly as well as I know Python, so for me it's the default assumption.
well, it can cause a lot of headaches very quickly. Here it's just a harmless little interpreter example but it's not hard to imagine how this can get nasty
one thing is like, whenever you write classes, in principle they can't really maintain their invariants unless you do defensive copying everywhere
Whenever I write functions that operate on json-like data, I typically start with something like this
from copy import deepcopy

type JsonType = dict[str, JsonType] | list[JsonType] | str | int | float | bool | None

def some_func(data: JsonType) -> JsonType:
    data = deepcopy(data)  # work on a private copy so the caller's data is never mutated
    ...
from typing import List

class Foo:
    def __init__(self, x: List[int]):
        self.x = [e for e in x if e != 0]  # class invariant: self.x has no 0's

l = [1,2,3]
f = Foo(l)
l.append(0) # oops
Yeah, I mean you can sprinkle in defensive copies randomly but obviously a) lots of people don't do it, and b) it's a ton of waste that actually pretty quickly adds up
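A small sketch of when the invariant actually breaks. A constructor that builds a new container is already making a defensive copy; the problem appears when it keeps the caller's list object. Both class names are hypothetical and exist only for this comparison.

```python
from typing import List, Sequence, Tuple


class SharesInput:
    """Hypothetical class that keeps the caller's list as-is."""

    def __init__(self, x: List[int]) -> None:
        assert 0 not in x  # invariant checked once, at construction
        self.x = x         # same list object as the caller's


class CopiesInput:
    """Hypothetical class that stores its own immutable copy."""

    def __init__(self, x: Sequence[int]) -> None:
        self.x: Tuple[int, ...] = tuple(e for e in x if e != 0)


l = [1, 2, 3]
shared = SharesInput(l)
copied = CopiesInput(l)
l.append(0)  # the caller mutates its list afterwards

print(0 in shared.x)  # the shared list now contains a 0
print(0 in copied.x)  # the private copy does not
```

The copy costs O(n) per construction, which is the waste mentioned above.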
more modern languages tend to be designed with a lot more awareness around controlling mutation
even relatively simple things can make a pretty big difference
do you have an example that isn't from Rust?
sure
Kotlin for example, doesn't really have a very fancy system around controlling mutation per se
but one big difference is that most of the operations "by default" use read-only APIs, and also in practice those instances are often immutable
x = [f(e) for e in old_list]  # Python: x is mutable
val x = old_list.map { f(it) }  // Kotlin: x is immutable
python has Sequence and MutableSequence but they're a lot more bolted on. People should really use Sequence and MutableSequence more but they often end up using List (or list as of whatever version) a lot more
and it's not the default in many places
Rust and C++ fwiw I wouldn't use as examples anyway; they take a very different approach but most importantly they don't default to shared references. So some of these issues don't even exist to start with.
In this example in fact, you do not need to deep copy at all
All you need to do is
type ReadJsonType = Mapping[str, ReadJsonType] | Sequence[ReadJsonType] | str | int | float | bool | None
At least - in principle
if you do this, then if you mutate data then the type checker will already complain for you
because ReadJsonType doesn't have any mutating operations at all
But you can see even here, it's not your first instinct to use Sequence instead of List. In Kotlin, everyone uses List and MutableList, and people generally default to the former; List is read-only.
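A rough illustration of the read-only annotation idea, using a hypothetical function `total_score`. The annotations are not enforced at runtime; the assumption is that a static type checker is run over the code, and it would flag the commented-out lines because `Mapping` and `Sequence` expose no mutating methods.

```python
from typing import Mapping, Sequence


def total_score(scores: Mapping[str, Sequence[int]]) -> int:
    # scores["x"] = [1]      # a type checker flags this: Mapping has no __setitem__
    # scores["a"].append(1)  # flagged as well: Sequence has no append
    return sum(sum(values) for values in scores.values())


data = {"a": [1, 2], "b": [3]}
result = total_score(data)  # a plain dict of lists is accepted as-is
print(result)
```

No copy is made, and the caller still gets a promise that `data` is left alone.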
It was pretty easy to install, but now i am facing some problems:
- I have no idea what this means:
>>> i = interpreters.create()
Error in sitecustomize; set PYTHONVERBOSE for traceback:
ImportError: Could not find a console implementation for local python version
>>> i
Interpreter(id=1, isolated=True)
^ this happens at every subinterpreter creation (error is raised in subinterpreter, not in the main interpreter). I cant find where this error is raised. I can show you the output i got with sys.settrace(print).
PYTHONVERBOSE causes even more problems:
Python 3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import interpreters
>>> i = interpreters.create()
import _frozen_importlib # frozen
import _imp # builtin
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 1534, in _install
File "<frozen importlib._bootstrap>", line 1523, in _setup
File "<frozen importlib._bootstrap>", line 1489, in _builtin_from_name
File "<frozen importlib._bootstrap>", line 942, in _load_unlocked
File "<frozen importlib._bootstrap>", line 496, in _verbose_message
RecursionError: maximum recursion depth exceeded while calling a Python object
### ^^^ this is in subinterpreter, vvv this is in the main one
ValueError: _PyImport_InitCore: failed to initialize importlib
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Programs\Python\3.12\Lib\site-packages\interpreters_3_12-0.0.1.1-py3.12-win-amd64.egg\interpreters.py", line 25, in create
id = _interpreters.create(isolated=isolated)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: interpreter creation failed
>>> i
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined. Did you mean: 'id'?
>>>
- ctypes doesnt work in subinterpreters:
>>> i.run('''
... try:
... import ctypes
... except BaseException as exc:
... print(repr(exc))
... ''')
ImportError('module _ctypes does not support loading in subinterpreters')
- some weird things happen if subinterpreter raises exception:
>>> i.run('import ctypes')
RunFailedError: script raised an uncaught exception (unable to format exception type name)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Programs\Python\3.12\Lib\site-packages\interpreters_3_12-0.0.1.1-py3.12-win-amd64.egg\interpreters.py", line 98, in run
_interpreters.run_string(self._id, src_str, channels)
MemoryError
- simple things seems to be working:
>>> i.run('print(1 + 2)')
3
The reason why I've been asking about this for a while is because there isn't any new addition in the ASDL file, yet the header files are re-generated.
in the PEP 695 implementation? that commit did change Python.asdl
I suppose action helpers can't return actual nodes, right?
it seems like the action helper is constructing the node which is returned by the rule.
hmm, I thought they were just helper functions and could not return nodes/objects.
You could also use persistent data structures
!pypi pyrsistent
Surprisingly, they're not that slow. Definitely better than defensive copies sprinkled everywhere
you can, it's just constantly extra work to deal with the fact that they're not the built in types
generally speaking, functions especially should be able to just get by with arguments that are simply things like Mapping and Sequence. They don't really have much reason to care if the argument is immutable or not; they just want to read some data from their arguments, and they want to promise the caller that they're not going to mutate it.
for classes this does not work as well, unfortunately
I think immutability gets a bit easier if you align your program better with the "data in, data out" perspective 🙂
but that's not always a thing
i think immutability by default is quite reasonably easy in principle
it's just not easy in python from a pure mechanical point of view
It's certainly doable but you have to work harder and swim more upstream
the built in data structures are used a lot as type annotations in libraries, and all the comprehensions are "built in" to the standard types
do you have any ideas why all of these happen?
any ideas on what this syntactic sugar should be called?
I mean I cant use syntactic sugar as a node name, or any other object's name.
for inspiration, maybe look up what TypeScript calls the features where { x } is equivalent to { x: x }
will do that, thank you for the idea.
I think I'll go with "shorthand arguments", similar terminology used in javascript(since es6) and Swift.
The shorthand argument names are automatically provided by Swift. The first argument can be referenced by $0, the second argument can be referenced by $1, and so on.
that sounds like a good name!
Does anybody know if CALL_FUNCTION_KW/CALL_KW are ever produced for syntax where the number of positional/keyword arguments cannot be statically analyzed?
hmm, which constructor/field represents the equal sign in the asdl file?
keyword = (identifier? arg, expr value)
I mean I can't really see: NAME, =, some expression here.
the equals sign isn't in the AST
it doesn't need to be
then I only need an identifier.
correct
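To see that the equals sign leaves no trace in the tree, here is a quick check with the standard `ast` module on the existing `f(a=a)` form. The `keyword` node only carries `arg` and `value`.

```python
import ast

# Parse an ordinary keyword call and look at the keyword node it produces.
call = ast.parse("f(a=a)", mode="eval").body
kw = call.keywords[0]

print(ast.dump(kw))  # only arg and value are shown; there is no '=' field
print(kw._fields)
```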
I've been trying to modify the Assign nodes in the AST of a Python program so that they carry annotations. For that, I'm replacing the Assign nodes with AnnAssign nodes (since those have annotations; I'm ignoring multiple targets for now). The only problem is converting the modified AST back to source code: I'm using astor.to_source, and it probably doesn't support turning Assign nodes into AnnAssign. Am I using the wrong framework for the conversion, or is there something fundamentally wrong with my approach?
Here is the log when I try to convert the ast back to source https://pastebin.com/KVGLaS34
that's not a very informative error, if I were you I would look at the surrounding code in astor to see what's going on
I'm sorry, here is my code @feral island -
```py
def visit_Assign(self, node: AST) -> Any:
    # some filler code here
    # ..
    target = node.targets[0]
    if target.id not in self.assigned_vars:
        annAssign_node = ast.AnnAssign(
            target = target,
            annotation = annotation,
        )
        self.assigned_vars.add(target.id)
        print(f"\nNew node: {ast.dump(annAssign_node)}")
        ast.copy_location(annAssign_node, node)
        ast.fix_missing_locations(annAssign_node)
        return annAssign_node
```
and here is the main -
```py
def parse_and_assign_types(program):
    tree = ast.parse(program)
    pretty_print(tree)
    modified_tree = assign_types(tree)
    return modified_tree

if __name__ == "__main__":
    with open("example.py", "r") as f:
        program = f.read()
    tree = parse_and_assign_types(program)
    pretty_print(tree)
    print(to_source(tree))
```
in astor, the assert is happening in this function -
```py
def set_precedence(value, *nodes):
    """Set the precedence (of the parent) into the children.
    """
    if isinstance(value, AST):
        value = get_op_precedence(value)
    for node in nodes:
        # print(f"\n Setting precedence for {ast.dump(node)}")
        if isinstance(node, AST):
            node._pp = value
        elif isinstance(node, list):
            set_precedence(value, *node)
        else:
            assert node is None, node
```
and this happens while visiting AnnAssign which was originally ast.Assign node -
```
Visiting AnnAssign AnnAssign(target=Name(id='result', ctx=Store()), annotation='str')
```
annotation should be an ast Node, not a string
Also, not sure what your use case is, but you may want to look into using libcst instead, which would allow you to transform code while preserving comments and other formatting
I want to assign randomly generated types to an untyped Python program for an experiment
sure, I'll have a look at libcst, thanks a lot
if you don't care about preserving comments or formatting working with the AST should be fine too
how do I convert ast to source in that case? ast.unparse doesnt seem to help
make a correct AST, e.g. with annotation=ast.Name(id="str")
oh like that, yeah makes sense, thanks alot, let me quickly plug that in
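A minimal sketch of that fix with only the standard library (`ast.unparse` needs Python 3.9 or later). The transformer name `AnnotateAssigns` and the fixed `str` annotation are placeholders for illustration, not the code from the experiment.

```python
import ast


class AnnotateAssigns(ast.NodeTransformer):
    """Hypothetical transformer: annotate single-target assignments as `str`."""

    def visit_Assign(self, node: ast.Assign) -> ast.AST:
        target = node.targets[0]
        if len(node.targets) != 1 or not isinstance(target, ast.Name):
            return node  # leave tuple targets, attributes, etc. untouched
        new_node = ast.AnnAssign(
            target=target,
            annotation=ast.Name(id="str", ctx=ast.Load()),  # an AST node, not "str"
            value=node.value,
            simple=1,  # plain-name target, so no parentheses are needed
        )
        return ast.copy_location(new_node, node)


tree = ast.parse("result = 'hello'\n")
tree = ast.fix_missing_locations(AnnotateAssigns().visit(tree))
source = ast.unparse(tree)
print(source)
```

Passing `value` and `simple` explicitly avoids relying on defaults for missing fields.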
I made annotations to Name nodes and tried to do ast.unparse on that , its expecting some posonlyargs, here is the log @feral island - https://pastebin.com/0nDtqDFD
is this an issue with my python3 (3.10.12)?
you probably constructed an AST object without passing in all the attributes. The constructor for AST objects makes it really easy to construct broken objects
u mean this? ast.fix_missing_locations
No, I mean this
```
In [177]: arg=ast.arguments(x=3)
In [178]: arg.x
Out[178]: 3
In [179]: arg.posonlyargs
Traceback (most recent call last):
  File "/main_instance_shell/jelle/venv3.9/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3397, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-179-4c22e773849e>", line 1, in <cell line: 1>
    arg.posonlyargs
AttributeError: 'arguments' object has no attribute 'posonlyargs'

AttributeError Traceback (most recent call last)
Input In [179], in <cell line: 1>()
----> 1 arg.posonlyargs
AttributeError: 'arguments' object has no attribute 'posonlyargs'
```
I'm hoping to fix that in Python 3.13: https://github.com/python/cpython/pull/105880
is there a quick workaround for now?
yeah, pass in all the arguments when you construct the object
ok makes sense
like ast.arguments(posonlyargs=[]) and all the others
sure, that was really helpful, thanks alot
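As a sketch of the workaround, this builds an `ast.arguments` with every field spelled out and uses it in a lambda so the result can be both unparsed and compiled. The lambda is only a convenient carrier for the example.

```python
import ast

# Every field of ast.arguments is passed explicitly, including the empty ones.
args = ast.arguments(
    posonlyargs=[],
    args=[ast.arg(arg="x", annotation=None)],
    vararg=None,
    kwonlyargs=[],
    kw_defaults=[],
    kwarg=None,
    defaults=[],
)
lam = ast.Lambda(args=args, body=ast.Name(id="x", ctx=ast.Load()))

expr = ast.fix_missing_locations(ast.Expression(body=lam))
identity = eval(compile(expr, "<ast>", "eval"))

print(ast.unparse(lam))
print(identity(3))
```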
if you mean functions with arguments like *v/**k then it always produces CALL_FUNCTION_EX
Correct, but in cases where it does produce CALL_FUNCTION_EX, is the arg name tuple always produced by a LOAD_CONST?
not sure
??
!ot
Please read our off-topic etiquette before participating in conversations.
How's all the speedup/parallelization of cpython going right now? First it was just subinterpreters, then faster cpython (if I'm remembering the timeline correctly) and now there's also GIL-less. Are they all getting implemented on their own? GIL-less sounds like it'd make a good mess in what faster cpython is doing, and looking at some pyperformance stats the new 3.12 has a decent portion of tests where it's slower than 3.11
just curious, does anyone know what the time complexity of using [::-1] to invert a list in python is? Would believe it's O(n) but I have no idea how python handles arrays internally.
it is O(N) indeed
it does roughly this:
- allocates new list of size N
- iterates over old list and copies pointers to new list
memory allocation is O(N) operation and iteration through entire list is also O(N)
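A short illustration of the point above: the slice builds a separate list, whereas `reversed()` returns an iterator over the existing one and copies nothing up front.

```python
items = [1, 2, 3, 4]

flipped = items[::-1]     # new list object: O(n) time and O(n) extra memory
print(flipped is items)   # False, the original list is left untouched

lazy = reversed(items)    # iterator over the same list, no copy made here
print(list(lazy))
```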
mhm, so this line:
`digit_tuple = tuple(split_digits(i))[::-1]`
is rather inefficient and I should rewrite my generator to return an ordered tuple?
idk what you mean by "ordered tuple"
and linear time complexity doesn't mean that your code is slow
if you have some performance issues - use a profiler to figure out the bottleneck, and then do optimizations
def split_digits(n: int, base=10):
    """Generator to split a number into digits

    Parameters:
    --------------
    n(int): integer to split into digits
    base(int)=10 base to split integer to

    Yields:
    --------------
    int: the digits of n, one at a time
    #! the digits are yielded least significant first !
    """
    if n == 0:
        yield 0
    while n:
        n, d = divmod(n, base)
        yield d
Currently I'm using this code to split an integer into a tuple of digits; however, the tuple comes out in reverse order and I'm reversing it with [::-1].
i don't know about any plans to parallelize cpython
mhm, I know, I'm working on hyper-optimizing my code and trying to eliminate any unnecessary O(n) operations. This O(n) operation runs for every int in a big for loop.
you are doing O(logN) operations anyway
so copying tuple of size logN is not a big deal
(N is a number you are converting to digits)
if you want to get rid of unnecessary copying, consider doing this:
- preallocate list of needed size
- assign values to it directly
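One possible reading of those two steps, as a sketch. The function name `digits_msb_first` is made up, and it assumes a non-negative `n`. It counts the digits first, preallocates the list, then fills it from the back so no reversal is needed.

```python
def digits_msb_first(n: int, base: int = 10) -> tuple:
    """Return the digits of a non-negative n, most significant first."""
    if n == 0:
        return (0,)

    # First pass: count the digits so the list can be allocated once.
    count, m = 0, n
    while m:
        m //= base
        count += 1

    # Second pass: write each digit straight into its final position.
    out = [0] * count
    for i in range(count - 1, -1, -1):
        n, out[i] = divmod(n, base)
    return tuple(out)


print(digits_msb_first(1234))
```

This trades the reversal for an extra counting pass, so only a profiler can tell whether it helps.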
mhm, then perhaps that's not the biggest waste of compute time. I'm trying to make my code run 10x faster and I thought this might have been a big contributor to the run time
this is more like #algos-and-data-structs conversation
how would I do that in the given example.
my bad, sorry
is there a way to get a list of all dunder-names that have corresponding slot in type object?
They're referring to this, I think. https://peps.python.org/pep-0703/
look at the slotdefs variable in Objects/typeobject.c
Yet another question about the python internals: Is it faster to use enumerate or to use a for loop and then just get the current element using i as the index ?
profile it
I've done some testing; at first I saw almost no difference, but my testing was dumb
```py
# Script 1
iterable = [i for i in range(10_000_000)]
target = [0 for i in range(10_000_000)]

def test():
    for idx, i in enumerate(iterable):
        target[idx] = idx + i
    return target
```
```py
# Script 2
iterable = [i for i in range(10_000_000)]
target = [0 for i in range(10_000_000)]

def test():
    for i in range(len(iterable)):
        target[i] = i + iterable[i]
    return target
```
```py
# Timing script
import timeit
print("Timing with enumerate:")
print(timeit.timeit("to_profile.test()", setup="import to_profile", number=10))
print("Timing with range(len(...)):")
print(timeit.timeit("to_profile2.test()", setup="import to_profile2", number=10))
```
range(len(...)) appears to be quite a bit faster (see below)
# Output:
Timing with enumerate:
5.525693699994008
Timing with range(len(...)):
5.302737100006198
About 4% faster
Although the difference you got with your bench is larger
because you do less in the loop
yep
I was a bit wary with optimizations there, but I guess Python doesn't really optimize this away
Yeah, there isn't much to optimize here while preserving behaviour
Okay, now it's way faster
```
Timing with enumerate:
1.8064987999969162
Timing with range(len(...)):
1.1512878000066848
```
Doing it your way
That's dumb, I don't like this result 😦
Luckily both are fast enough to not make a huge difference most of the time
In 99% of programs, you could probably just replace the enumerate with this kind of range-len loop and get a performance boost
Is there a reason Python doesn't have this kind of optimization? I can't really think of a case that would change behaviour, if the pattern matching is right
Or does Python also aim to preserve behaviour of internal state
This optimization doesn't apply in all cases, as enumerate() works on any iterable and your version works only on objects that support len() and indexing
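A small example of why the rewrite is not always valid: a generator works with `enumerate()` but has neither `len()` nor indexing. The `squares` generator is just a stand-in for any non-sequence iterable.

```python
def squares(n):
    """Generator: an iterable with no len() and no indexing."""
    for i in range(n):
        yield i * i


# enumerate() only needs the iteration protocol.
pairs = list(enumerate(squares(4)))
print(pairs)

# The range(len(...)) form cannot even start on such an object.
try:
    len(squares(4))
    supports_len = True
except TypeError:
    supports_len = False
print(supports_len)
```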
however, there might be an opportunity to optimize how enumerate() is executed
What if we restricted it to those cases?
Hm, I guess then it could fail, depending on how the object is implemented
But for inbuilt types, maybe
Good afternoon friends, could anyone help me validate some unit tests?
this is not the good channel.
I found a developed but un-documented feature in the PEG grammar described in pep 617. Would that warrant an update to the pep?
what is it?
'&&'
I can send a mock update of 617 that I made
If needed
I was looking at the whole peg grammar to try to do something funny and realized I didn't know what '&&' did and that it wasn't in pep 617
That means "if it can be parsed".
& is "Succeed if e can be parsed, without consuming any input." in pep 617
&& is nowhere in pep 617, and should probably be documented somewhere.
somewhere besides the depths of git blame
Want me to refind the commit that added it?
it's also not in https://devguide.python.org/internals/parser/
that's probably where it should be documented, the PEP is a historical document at this point
Makes sense.
How would that get updated?
thx!
Hi , what is cstack?
Python is a C binary, and when we run it, just like any other C binary, it has been compiled into machine instructions that run on a stack called the cstack. That stack is managed by the OS: the CPython process asks the OS for a stack to execute its instructions on. Am I correct, or is it a CPython-internal thing managed by CPython?
But, I saw in the code base something like frame owned by cstack then I got confused, is my above reasoning correct?
As I understand the term, yeah, it's the call stack C runs on, in contrast to the python call stack.
it's provided by default to any process.
oh sorry, I confused it with the single version of it.
have you found anything?
&& basically "forces" a token to proceed
it produces the SyntaxError: expected '<char>' message
e.g.
```pycon
>>> def a()
  File "<stdin>", line 1
    def a()
           ^
SyntaxError: expected ':'
```
how do you know? where did you find this? also, would it be possible to update the documentation?
while studying the parser source files
i've no clue about updating documentation though
https://github.com/python/cpython/blob/main/Parser/parser.c#L4518 corresponds to &&':'
https://github.com/python/cpython/blob/main/Parser/pegen.c#L396-L397 is the function used
`Parser/parser.c` line 4518
```c
(_literal_2 = _PyPegen_expect_forced_token(p, 11, ":"))  // forced_token=':'
```
`Parser/pegen.c` lines 396 to 397
```c
Token *
_PyPegen_expect_forced_token(Parser *p, int type, const char* expected) {```
isnt there something about it in the pegen generator as well?
I will do it.
I was planning to do it but that’s fine. I’ll try to find the original commit for it for you
oh feel free to do it then, I'm occupied with this new shorthand keyword argument thingy.
here’s the commit anyway for anyone interested https://github.com/python/cpython/commit/58fb156edda1a0e924a38bfed494bd06cb09c9a3
def visit_Forced(self, node: Forced) -> FunctionCall:
    call = self.generate_call(node.node)
    if call.nodetype == NodeTypes.GENERIC_TOKEN:
        val = ast.literal_eval(node.node.value)
        assert val in self.exact_tokens, f"{node.value} is not a known literal"
        type = self.exact_tokens[val]
        return FunctionCall(
            assigned_variable="_literal",
            function=f"_PyPegen_expect_forced_token",
            arguments=["p", type, f'"{val}"'],
            nodetype=NodeTypes.GENERIC_TOKEN,
            return_type="Token *",
            comment=f"forced_token='{val}'",
        )
    else:
        raise NotImplementedError(
            f"Forced tokens don't work with {call.nodetype} tokens")
https://github.com/python/cpython/blob/main/Python/symtable.c#L1600
where are these "kinds" coming from?
`Python/symtable.c` line 1600
```c
symtable_visit_stmt(struct symtable *st, stmt_ty s)
```
generated from the asdl file for the AST
hmm, you don't associate symtable_visit_type_param with anything(in the switch-case block).
but you only declare and define it.
do you know what macro?
I think I need to do the same.
VISIT_SEQ(st, type_param, s->v.FunctionDef.type_params);
so I need to check for the "kind" and call the macro, right?
So I need to expose it in Call_kind or should I just handle my node?
haven't looked exactly, but in general, the smaller change to make is probably the right one
is there any documentation on how symtable_add_def works?
I mean, I can go through the source code but, still.....
if it is a private thing, there is unlikely to be any documentation
the only chance is the developer guide
hi
Could someone help me explain what symtable_add_def is for?
https://github.com/python/cpython/blob/main/Python/symtable.c#L1357
and also why it's used when visiting nodes.
`Python/symtable.c` line 1357
```c
symtable_add_def_helper(struct symtable *st, PyObject *name, int flag, struct _symtable_entry *ste,
```
and also, why is ste decreased here?
https://github.com/python/cpython/blob/main/Python/symtable.c#L1315-L1325
Py_DECREF(ste);
`Python/symtable.c` lines 1315 to 1325
```c
    prev = st->st_cur;
    /* bpo-37757: For now, disallow *all* assignment expressions in the
     * outermost iterator expression of a comprehension, even those inside
     * a nested comprehension or a lambda expression.
     */
    if (prev) {
        ste->ste_comp_iter_expr = prev->ste_comp_iter_expr;
    }
    /* The entry is owned by the stack. Borrow it for st_cur. */
    Py_DECREF(ste);
    st->st_cur = ste;
```
the symtable's job is essentially to track everywhere a name is defined. symtable_add_def records when a name is defined
what do you mean by defined, in what sense?
like how a = 3 defines the name a
but what is symtable_add_def doing here?
I suppose I have to set this up in my visit function.
https://github.com/python/cpython/blob/main/Python/symtable.c#L96-L137
it records the definition of the name in some internal data structure
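The standard-library `symtable` module exposes the result of this pass, which gives a rough way to see what gets recorded without reading the C code. The snippet below is only an illustration from the Python side; it does not call `symtable_add_def` directly.

```python
import symtable

# `a` is bound by an assignment; `b` is only read.
table = symtable.symtable("a = 3\nprint(b)\n", "<demo>", "exec")
a = table.lookup("a")
b = table.lookup("b")

print(a.is_assigned(), a.is_referenced())
print(b.is_assigned(), b.is_referenced())
```

A name that is only used, like `b` here, is still recorded, which matches the `USE` flag discussed for shorthand keyword arguments.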
that's pretty weird code. looks like ste_new saves a reference to ste in the st_blocks dictionary, and then returns another reference
the calling code there essentially gives up its reference to st_cur, so it has to decref its reference
yes, this is what I thought, but to me, it looks more natural to increase the reference there.
it generally does, but the way reference counts are handled can be very specific to individual function calls
basically, what I have to do here is set varkeywords to 1?
no, that's if a function takes **kwargs
probably all you need to do is make your AST node have a child node of type ast.Name with kind ast.Load
the scope should be handled as a normal keyword argument passed in, I suppose.
then visit_expr will take care of putting it in the symtable
so I don't need a custom visit node symtable_visit_shorthand_keyword_arg
this struct might be wrong.
struct _shorthand_keyword_arg {
    identifier arg;
    int lineno;
    int col_offset;
    int end_lineno;
    int end_col_offset;
};
here, the arg is the name, right?
there are multiple ways to do it
what you have could work too, you just need to call symtable_add_def yourself in this case with that name
something like symtable_add_def(st, something.arg, USE, LOCATION(e))
actually that's probably better than putting an extra layer of Name node inside
My current approach is having a new field in Call_kind
Call(func=Name(id='foo', ctx=Load()), args=[Constant(value=1), Constant(value=2)], keywords=[]), shorthand_keyword_arg...)])
Then I just visit it: VISIT_SEQ(st, shorthand_keyword_arg, e->v.Call.shorthand_keyword_args);
and then record the name, enter the block, exit from the block.
if im not mistaken the block such as TypeVarBoundBlock is for only debugging purposes.
case TypeVarBoundBlock: blocktype = "TypeVarBoundBlock"; break;
typedef enum _block_type {
    FunctionBlock, ClassBlock, ModuleBlock,
    AnnotationBlock,
    TypeVarBoundBlock, TypeAliasBlock, TypeParamBlock
} _Py_block_ty;
I'm not sure if I should extend it or not.
you definitely don't need a new kind of block
that's for new kinds of scopes
what should I pass in then?
pass in where?
symtable_enter_block
oh
should I only add it to the symtable?
how does it know which scope to add it?
the current one
umm can u guys help me when your done?
sorry
static int
symtable_visit_shorthand_keyword_arg(struct symtable *st, shorthand_keyword_arg_ty skwa)
{
    if (++st->recursion_depth > st->recursion_limit) {
        PyErr_SetString(PyExc_RecursionError,
                        "maximum recursion depth exceeded during compilation");
        VISIT_QUIT(st, 0);
    }
    if (!symtable_add_def(st, skwa->arg, USE, LOCATION(skwa)))
        VISIT_QUIT(st, 0); // return decreased recursion depth and 0
    VISIT_QUIT(st, 1);
}
this is what I have.
makes sense, I don't think you need the recursion check here
oh okay, is there a practice when one should check the recursion limit?
haven't looked too hard but I feel we only need to check that depth when adding a new block
I hope I'm not breaking any rules by asking this, but it seems like this is the most recently active channel. Can someone please help me with my assignment lol... it's in python-help as "Harvard CS50 Assignment Help"
read the rules and also #python-discussion it has nothing to do with cpython internals.
now, I moved on to compile.c I think I just have to create a visit function(compiler_shorthand_keyword_args) and use the proper macros for bytecode emission.
wait I might be able to use this.
return compiler_call_helper(c, loc, 0,
e->v.Call.args,
e->v.Call.keywords);
not sure why I have to expose a custom node if I can just access it through v.
yeah somewhere around there. the compilation code for calls is quite complicated
what about this?
I don't even need a custom compiler_xx
sure
but I still need a custom one in symtable to add it to the table, right?
I wouldn't fret too much about whether or not to create a new function. You need code that works, and whether or not that requires a new function is something you decide as you look at the code
the difference between keyword and shorthand_keyword_arg is that the latter doesn't have a value
struct _keyword {
    identifier arg;
    expr_ty value;
    int lineno;
    int col_offset;
    int end_lineno;
    int end_col_offset;
};
struct _shorthand_keyword_arg {
    identifier arg;
    int lineno;
    int col_offset;
    int end_lineno;
    int end_col_offset;
};
however, in the compiler process, I can't see that value would be used.
it's only using CALL_KW.
well it needs to compile the value at some point. for your case it just needs to put a LOAD_NAME there instead
could you elaborate?
do you know by any chance where the value is accessed?
(btw) call_kw is defined like this:
#define CALL_KW 57
because I have to evaluate the value somehow from the name.
to the compiler your shorthand_keyword("X") should be equivalent to keyword("X", Name(X))
and the Name(X) part compiles to a LOAD_NAME
(or LOAD_FAST, or various other options)
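That equivalence can be tried from Python without touching the compiler. A minimal sketch: build keyword("x", Name("x")) by hand with the ast module, compile it, and then check that the load opcode really does depend on the scope. The lambda and the inside_function helper are made up for the demo.

```python
import ast
import dis

# keyword("x", Name("x")) built by hand: what f(x=) is supposed to mean.
call = ast.Call(
    func=ast.Name(id="f", ctx=ast.Load()),
    args=[],
    keywords=[ast.keyword(arg="x", value=ast.Name(id="x", ctx=ast.Load()))],
)
tree = ast.fix_missing_locations(ast.Expression(body=call))
result = eval(compile(tree, "<demo>", "eval"), {"f": lambda x: x + 1, "x": 41})
print(result)  # 42

# The same call inside a function reads x as a local instead.
def inside_function(f, x):
    return f(x=x)

module_ops = {i.opname for i in dis.get_instructions(compile("f(x=x)", "<demo>", "eval"))}
function_ops = {i.opname for i in dis.get_instructions(inside_function)}
print("LOAD_NAME" in module_ops)
print(any(op.startswith("LOAD_FAST") for op in function_ops))
```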
so is it going to load the variable which holds the value?
yes
but where can I define this, or is it just an example?
ADDOP_I(c, loc, CALL_KW, n + nelts + nkwa);
I think that represents a series of kwargs at once? Might need some work to figure out how exactly that works
Honestly it might be easier to desugar the shorthand kwargs into normal kwargs before the compiler
not sure there's a good place for that though, maybe the ast optimizer
well, I only have to figure out the value from the name, which is not a big deal??
read the rules please
and delete your post.
compiler_call_helper is just really complicated
but how would I even get the value, LOAD_NAME?
call compiler_nameop
what I was thinking about is to just predict the value from the name, and handle it as a kwarg.
that makes sense, but you will need to call compiler_nameop to load the name
hmm, but isn't that a compiler function as well? not sure how that would return the value of the name.
didn't you mention LOAD_NAME or smt similar?
it doesn't, it just emits the right instruction
but how would I get the value from there, or it doesn't even matter as long as the right instructions are emitted?
sorry if im asking stupid questions.
you don't need to get the value, you just need to get the right instruction emitted
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (f)
              4 PUSH_NULL
              6 LOAD_NAME                1 (x)
              8 LOAD_CONST               0 (('x',))
             10 CALL_KW                  1
             12 RETURN_VALUE
your job here is to make f(x=) compile to this same set of instructions
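To get the reference listing on whatever build is at hand, disassembling the explicit spelling is enough. A small sketch; the exact opcodes differ between CPython versions, so the checks below only rely on things that stay stable.

```python
import dis

code = compile("f(x=x)", "<demo>", "eval")
dis.dis(code)  # the target listing; opcode names vary by CPython version

# Version-independent facts: two names are loaded, f first and then x.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(code.co_names)
print(opnames.count("LOAD_NAME"))
```

Once the patched interpreter produces the same dis output for f(x=), the compiler side is done.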
oh okay, makes sense.
so would nameop be enough, or do I still have to emit a few stuff?
I can't really match these ADDOP_I calls with the instruction names (LOAD_NAME, PUSH_NULL) above.
yeah, you'd call compiler_nameop() and that would emit the LOAD_NAME
for normal keyword args, you'll probably see somewhere that it calls some function to visit the value of the keyword
in your case, you instead should call compiler_nameop() there directly with the name of the kwarg
but how do I know what instructions represent RETURN_VALUE | LOAD_CONST?
the RETURN_VALUE doesn't matter, that's not code you need to touch
LOAD_CONST is loading a tuple with the names of all keywords. currently it gets those just from actual keywords, you'll need to make sure it also picks up shorthand keywords when it builds that tuple
that happens in compiler_call_simple_kw_helper
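The names tuple is easy to see from Python. A quick sketch with the explicit spelling: positional arguments do not appear in it, keyword names do, in call order.

```python
code = compile("f(1, a=a, b=b)", "<demo>", "eval")

# The only tuple constant in this code object is the keyword-names tuple.
kw_names = [const for const in code.co_consts if isinstance(const, tuple)]
print(kw_names)  # [('a', 'b')]
```

If a shorthand keyword were left out of this tuple, the call would receive the value but not the name it belongs to.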
so do I only have to touch LOAD_CONST and LOAD_NAME? CALL_KW is just a macro afaik
you only have to touch compiler code. These names are names of instructions, you need to change code that emits instructions, but not code that executes them
Do you feel you understand what bytecode means and how it works?
yes, I think so.
Since a few visit macros are used (in compiler_call), I might need to expose a custom compiler function for shorthand keyword arguments.
oh basically, the value is emitted by visiting the expression again.
static int
compiler_visit_keyword(struct compiler *c, keyword_ty k)
{
    VISIT(c, expr, k->value);
    return SUCCESS;
}
Which is going to end up in expr1.
I mean that is where its using k->value
@feral island Thank you very much for helping me out, I'll continue tomorrow. 😄
isn't it simpler to make parser "emit" x=x ast node instead of emitting x= ast node and then handling it in special way further?
how would you make the parser emit an x=x node, wouldn't that be a keyword argument?
yes, that would make the implementation simpler. It feels wrong though if the AST doesn't reflect this syntax at all
yes, it would be almost impossible to see the shorthand keyword argument from the asdl.
wondering what the grammar would look like if it doesn't return a specific AST node.
With this suggestion it would simply parse x= the same as x=x. There would be no change to the ASDL at all
well, it would be easier, but strange for sure.
it sounds way easier.
i dont think it is a big change
ast anyway doesn't reflect source code, it ignores some information (comments, spaces, brackets), replaces __debug__ with True/False and removes docstrings (i believe there is an option to disable that in recent versions)
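The part about comments, spacing and brackets is easy to check. A minimal sketch comparing two spellings of the same statement:

```python
import ast

plain = ast.dump(ast.parse("x = 1 + 2"))
noisy = ast.dump(ast.parse("x = ( 1 +   2 )  # a comment"))

# Redundant parentheses, extra spaces and the comment leave no trace.
print(plain == noisy)  # True
```

Note that ast.dump leaves out line and column attributes by default; those do differ between the two.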
so not showing the fact that it was a shorthand argument is not a big deal in my opinion
just emitting x=x from x= is incredibly simple
so not a very big change really
making y optional in x=y would be somewhat harder, but it does show up in the AST
the needed files to change (excluding the auto-generated ones and docs) given the approach are as follows
x=x from x=:
Grammar/python.gram
y is optional in x=y:
Grammar/python.gram
Parser/Python.asdl
Python/ast.c
Python/ast_opt.c
Python/symtable.c
Python/compile.c
despite the number of files in the second approach the only "big" change is in Python/compile.c
I mean, if I continue with this approach, at least we emit bytecode for it and it shows up in the lower-level layers as well. Not sure whether that's an advantage, but it's definitely more customisable in the end.
hmm, not sure. What advantages would we get from emitting bytecode for it? Would that help optimise this feature in the future?
What I have so far is this.
.gram
# shorthand keyword arguments
#--------------------------
shorthand_keyword_args[asdl_shorthand_keyword_arg_seq*]: t=shorthand_keyword_seq {
    CHECK_VERSION(asdl_shorthand_keyword_arg_seq *, 13, "Shorthand keyword arguments are", t)
}
shorthand_keyword_seq[asdl_shorthand_keyword_arg_seq*]: a[asdl_shorthand_keyword_arg_seq*]=','.shorthand_keyword_arg+ [','] { a }
shorthand_keyword_arg[shorthand_keyword_arg_ty]:
    | a=NAME '=' { _PyAST_shorthand_keyword_arg(a->v.Name.id) }
.asdl
    | Call(expr func, expr* args, keyword* keywords, shorthand_keyword_arg* shorthand_keyword_args)
...
-- shorthand keyword arguments supplied to call
shorthand_keyword_arg = (identifier? arg)
                        attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)
.symtable
case Call_kind:
    ...
    VISIT_SEQ(st, shorthand_keyword_arg, e->v.Call.shorthand_keyword_args);
    ...
static int
symtable_visit_shorthand_keyword_arg(struct symtable *st, shorthand_keyword_arg_ty skwa)
{
    if (!symtable_add_def(st, skwa->arg, USE, LOCATION(skwa)))
        VISIT_QUIT(st, 0);  // return decreased recursion depth and 0
    VISIT_QUIT(st, 1);
}
and now I have to emit the same bytecodes as a keyword in compiler_call_helper
what do you guys think would be the best way of implementing this?
could you elaborate?
not sure if I made a good decision when I decided to extend the Call node.
now, I have to go through the grammar and pass in shorthand keyword arguments everywhere.
At this point, I'm not even sure what I'm doing.
well, it works.
-
so I created 2 branches. The first one doesn't expose a new AST node at all: it just changes the grammar, calls some action helpers, and handles it as a keyword argument from the beginning (see above).
-
In the second case, I'm adding a new AST node, creating a new grammar rule that returns that node, and then adding it to the symbol table. In the compiler, I pass the new node to the function that emits bytecode for keyword arguments, so a shorthand keyword argument produces the same instructions as a regular one (the two should behave identically). I haven't finished this one yet.
what do you think, which one should I go with?
first one sounds way easier than second one
it is easier to implement, maintain and understand
(i guess)
that one is literally one line 😄
2nd of course, it's an additional syntactic feature
hmm, it is more a syntactic sugar.
why wouldn't you go with the first one?
every piece of syntactic sugar needs a method of distinguishing from other code after parsing for user purposes 😄
look around the ast module for a while and you'll find that every tiny detail is exposed there
Details like the variable annotation (x): int being distinct from x: int
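That particular detail lives in the simple field of AnnAssign. A two-line check:

```python
import ast

bare = ast.parse("x: int").body[0]
parenthesized = ast.parse("(x): int").body[0]

# simple is 1 for a bare name target and 0 when the target is parenthesized.
print(bare.simple, parenthesized.simple)  # 1 0
```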
as an aside it doesnt expose quotes though does it? hows that different syntactically?
yeah not quote type for some reason
all are Constants
only indirectly by span I guess?
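"Indirectly by span" can be sketched with ast.get_source_segment: the Constant node is identical for both quote styles, but its position lets a tool slice the original text back out. The literal_source helper is made up for the example.

```python
import ast

def literal_source(source):
    """Return (parsed value, exact source text) for a single literal."""
    node = ast.parse(source, mode="eval").body
    return node.value, ast.get_source_segment(source, node)

single = literal_source("'hi'")
double = literal_source('"hi"')
print(single)
print(double)
```

Both values compare equal; only the recovered segment still carries the quote character.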
I would have achieved the same thing if I had implemented it in the compiler. If we handle this feature as a keyword argument from the beginning, everything works just fine. Again, what I'd do in the compiler is emit the same bytecode instructions, so a "shorthand keyword argument" (this new thingy) would be basically the same thing as a regular keyword argument. Why not just do that in the first place?
So it behaves as a keyword argument during the whole process.
huh
those are the files i'd've changed
for proper support yeah
?
because a new syntax feature is more than just making the compiler parse it right lol
it's not really a new syntactic feature.
I don't see why more complexity is needed.
it also just implements the feature in keyword
I don't see how, would you mind telling me what would that look like?
.
I still don't see why changing more parts of the interpreter would help.
it's changing the least amount of parts possible assuming a newly cloned repo
that is, it's only helpful when there's been no changes yet
here's the "optional value" (y is optional in x=y) approach which reflects in the AST
```pycon
>>> from ast import dump, parse
>>> from dis import dis
>>> print(dump(parse("a(b=)", mode='eval'), indent=4))
Expression(
    body=Call(
        func=Name(id='a', ctx=Load()),
        args=[],
        keywords=[
            keyword(arg='b')]))
>>> dis("a(b=)")
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (a)
              4 PUSH_NULL
              6 LOAD_NAME                1 (b)
              8 KW_NAMES                 0 (('b',))
             10 CALL                     1
             18 RETURN_VALUE
```
and here's the "substituted value" (x=x from x=) approach which doesn't reflect in the AST (that is, it's the same as if x=x is written instead of x=)
```pycon
>>> from ast import dump, parse
>>> from dis import dis
>>> print(dump(parse("a(b=)", mode='eval'), indent=4))
Expression(
    body=Call(
        func=Name(id='a', ctx=Load()),
        args=[],
        keywords=[
            keyword(
                arg='b',
                value=Name(id='b', ctx=Load()))]))
>>> dis("a(b=)")
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (a)
              4 PUSH_NULL
              6 LOAD_NAME                1 (b)
              8 KW_NAMES                 0 (('b',))
             10 CALL                     1
             18 RETURN_VALUE
```
Consider the users of ast, like Black or various linters. They probably want to see whether a shorthand keyword is used
wait, wrong reply. but the same person
Do linters check the AST node?
Yes, something like a flake8 plugin will work with the AST
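For a feel of what such a plugin does, here is a minimal sketch of an AST-based check. The RedundantKeywordFinder class is hypothetical, not taken from any real linter; it flags keywords spelled name=name. With the substituted-value approach, a checker like this could not tell f(a=) from f(a=a), because both produce the same node.

```python
import ast

class RedundantKeywordFinder(ast.NodeVisitor):
    """Hypothetical lint check: collect keywords written as name=name."""

    def __init__(self):
        self.hits = []

    def visit_keyword(self, node):
        # keyword nodes carry lineno/col_offset on Python 3.9+.
        if isinstance(node.value, ast.Name) and node.value.id == node.arg:
            self.hits.append((node.arg, node.lineno, node.col_offset))
        self.generic_visit(node)

finder = RedundantKeywordFinder()
finder.visit(ast.parse("func(a=a, b=other, c=c)"))
print(finder.hits)
```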
Would it be possible for linters to support this without the ast node
that's not really an appropriate solution for the problem
you keep using "appropriate" but don't specify what you mean by that.
how would a static type checker such as mypy go about this?
based on the current version, this is what the ast looks like.
def func(a, b):
    pass
a = 12
b = 12
func(a=, b=)
Module(body=[FunctionDef(name='func', args=arguments(posonlyargs=[], args=[arg(arg='a'), arg(arg='b')], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Pass()], decorator_list=[], type_params=[]), Assign(targets=[Name(id='a', ctx=Store())], value=Constant(value=12)), Assign(targets=[Name(id='b', ctx=Store())], value=Constant(value=12)), Expr(value=Call(func=Name(id='func', ctx=Load()), args=[], keywords=[keyword(arg='a', value=Name(id='a', ctx=Load())), keyword(arg='b', value=Name(id='b', ctx=Load()))]))], type_ignores=[])
Name(...) looks interesting ngl.
a=, keyword(arg='a', value=Name(id='a', ctx=Load()))
(this is how it represented in the ast)
would you create a new field in the Call constructor?
Call(expr func, expr* args, keyword* keywords, shorthand_keyword_arg here)
nope
if you want to i can send the github diff
well, okay.
should i send it now?
yes, please.
is this working fine?
yep
is this what you getting for the ast?
yep
this change looks more similar to https://github.com/thatbirdguythatuknownot/cpython4/compare/6eff0fd..ec72c74
I thought its represented in the ast as shorthand_keyword_arg or something similar.
it seems like a redundant temporary storage to me
yes, the second one that you sent is literally equal to what I sent before.
and in the ast, you are making the expression optional rather than creating a new constructor(node).
yep
then I don't understand where you are coming from.
I think I'm not sure what "reflects" in the AST means. Based on your understanding, this one is reflected in the AST:
keyword(arg='b')]))
but this is not
value=Name(id='b', ctx=Load()))]))
"reflects" for me basically means "the change shows up in the AST, different from just doing it normally (i.e. the AST produced by x= is different from the AST produced by x=x)"
I see. The diff that you sent is changing some code in the compiler as well. I mean in my approach I handle it as a keyword argument from the beginning so such changes are not needed.
The only disadvantage is that it isn't reflected in the AST. I suppose linters and type checkers use the AST to figure out what's going on?
not sure about linters and typecheckers since i don't work with them
fair enough. I might just wait and see how other people react, at this point, we have 2-3 solutions.
mypy doesn't care about the exact AST, it doesn't need that information. From mypy's perspective x=x and x= behave in exactly the same way
black, on the other hand, needs this information to correctly reformat your code (but they use their own ast implementation, i think)
pylint and flake8 need this information to issue warnings like "confusing name of shorthand argument" or something like that, which could happen only with shorthand args
well if the ast differs it is easier to detect I suppose. Not sure if there are different ways of getting this work without the ast change.
@rose schooner in your diff, do the shorthand keyword args produce the same bytecode as a regular keyword argument?
I just realised that in the compiler you are adding stuff to compiler_subkwargs. What does that even mean?
the main idea is to use k->arg as the value when k->value is NULL
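The same idea can be sketched in Python as an AST pass, which is also roughly what "desugaring before the compiler" would mean. FillShorthand is a made-up name, and the keyword node with value=None is built by hand here since stock CPython never produces one from source.

```python
import ast

class FillShorthand(ast.NodeTransformer):
    """Hypothetical pass: keyword(arg='b', value=None) becomes keyword(arg='b', value=Name('b'))."""

    def visit_keyword(self, node):
        self.generic_visit(node)
        # arg is None for **kwargs unpacking, which must be left alone.
        if node.value is None and node.arg is not None:
            node.value = ast.Name(id=node.arg, ctx=ast.Load())
        return node

# Hand-built stand-in for what a patched parser would give for a(b=).
call = ast.Call(
    func=ast.Name(id="a", ctx=ast.Load()),
    args=[],
    keywords=[ast.keyword(arg="b", value=None)],
)
tree = ast.fix_missing_locations(FillShorthand().visit(ast.Expression(body=call)))
result = eval(compile(tree, "<shorthand>", "eval"), {"a": lambda b: b * 2, "b": 21})
print(result)  # 42
```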
but what does compiler_subkwargs indicate?
basically the keyword arguments in a CALL_FUNCTION_EX call
e.g. a(*b, c=x+2, d=d, **e, f=f)
c, d, and f are the keyword arguments handled by compiler_subkwargs
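That call shape can be run as-is in stock Python to see what the callee receives. A small sketch with a stand-in a() that just returns its arguments; the opcode listing at the end is printed rather than checked, since names vary by version.

```python
import dis

def a(*args, **kwargs):
    return args, kwargs

b, x, d, e, f = (1, 2), 40, "d", {"extra": True}, "f"
args, kwargs = a(*b, c=x + 2, d=d, **e, f=f)
print(args)
print(kwargs)

# Unpacking forces the "extended" call path instead of the simple one.
source = "a(*b, c=x+2, d=d, **e, f=f)"
opnames = [i.opname for i in dis.get_instructions(compile(source, "<demo>", "eval"))]
print([name for name in opnames if name.startswith("CALL")])
```

The explicitly named keywords c, d and f end up merged into one dict with the contents of e, which is the work compiler_subkwargs sets up.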
Tools that do error reporting (so that includes type checkers and linters) use ast information (spans) to place their squiggly lines in the editor
so like
f(x=) can have two different errors, one where x is not defined and another where f doesn't have an x keyword param
pyright for example gives these two
and it's important to make sure they have the right span information (i know pyright rolls its own thing but that is what I have at hand)
It would also be preferable if ast.unparse(ast.parse(src)) was a no-op (or as close to a no-op as possible) if at all doable, though that's a minor thing
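What "close to a no-op" means today, as a quick sketch (ast.unparse needs Python 3.9+): already-normalized code survives the round trip, while spacing and comments are dropped.

```python
import ast

def roundtrip(src):
    return ast.unparse(ast.parse(src))

same = roundtrip("f(x=x)")
normalized = roundtrip("f( x = x )  # comment")
print(same)
print(normalized)
```

With the substituted-value approach, f(x=) would come back out of the round trip as f(x=x).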
I think your solution is better, would you mind if I use yours and extend all the documentation and tests?
sure
that's not generated
yes, that is what I thought. Although it is not clear why it has to be changed.
CALL always visits the thing without a NULL check
now that field is nullable, so we need CALL_OPT instead
so it is not going to call anything when it's a shorthand keyword arg, because the value is null.
yes
I don't have to extend ast.py nor tokenizer.py since nothing has changed related to these parts, right?
I will check that out.
Should I create a new test file for shorthand keywords or extend the current one test_keywordonlyarg.py? what do you reckon?
probably just add to the existing file
well, not sure if we need more complicated test cases 😄
def testShorthandKeywordArgs(self):
    def foo(a, b):
        return a + b
    a = 12
    b = 24
    # check the result instead of swallowing every exception
    self.assertEqual(foo(a=, b=), 36)
combine with **kwargs. check that foo(a=) throws the appropriate error if a is not defined
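A sketch of what those extra cases could look like. The helper names (supports_shorthand, kw, run_call) are made up for illustration, and the calls are kept in strings so the file still loads on an interpreter without the new syntax, where it falls back to the explicit a=a spelling.

```python
import unittest

def supports_shorthand():
    """True on an interpreter that parses f(a=); False on stock CPython."""
    try:
        compile("f(a=)", "<probe>", "eval")
    except SyntaxError:
        return False
    return True

def kw(name):
    # Spell the argument as a= when available, a=a otherwise.
    return f"{name}=" if supports_shorthand() else f"{name}={name}"

class ShorthandKeywordArgsTest(unittest.TestCase):
    def run_call(self, call_src, **names):
        namespace = {
            "foo": lambda a, b: a + b,
            "collect": lambda **kwargs: kwargs,
            **names,
        }
        return eval(call_src, namespace)

    def test_basic(self):
        src = f"foo({kw('a')}, {kw('b')})"
        self.assertEqual(self.run_call(src, a=12, b=24), 36)

    def test_mixed_with_unpacking(self):
        src = f"collect({kw('a')}, **extra, {kw('b')})"
        result = self.run_call(src, a=1, b=2, extra={"c": 3})
        self.assertEqual(result, {"a": 1, "b": 2, "c": 3})

    def test_undefined_name(self):
        # a is not defined anywhere, so evaluating the argument must fail.
        with self.assertRaises(NameError):
            self.run_call(f"foo({kw('a')}, b=1)")
```

Run it with python -m unittest on both the patched and an unpatched build; the results should match.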
but I don't have to go through the basic steps right?
def func(**kwargs):
    print(sum(kwargs.values()))
a = 12
b = 24
func(a=, b=)
I mean all these are working, properly.
more tests is generally better than fewer
I suppose I also need to extend Cpython's language reference with this new feature?
I might need to change this bit here: https://docs.python.org/3/reference/expressions.html#calls
well, I think the glossary change is enough.
definitely needs a change to the language reference too
what part though, I only found the calls section.(linked above)
at least the grammar there, for keyword_item
and the language reference should also describe how the shorthand syntax works
should that be a new title?
I don't know, that section is pretty long
and in tutorial section about functions probably
why do you care about documentation?
i thought you are doing this only as small experiment in your own fork of cpython
I'm actually implementing it for the pep.
I haven't looked at the tutorial much, not sure if it should necessarily cover all language features?
But if this feature is accepted, I suppose it will be fairly common, so probably good to cover in the tutorial
oh, that is interesting!
functions are one of the most important parts of the language, and a lot of their features are described in the tutorial
if there is another way to pass arguments - i think it should be described in tutorial
looks like there are a bunch of things to change.
I thought this was coming from the grammar directly, but no.
I wouldn't worry about the docs too much tbh
I think having a solid implementation is more important in the first stages
Would it be a good idea to write more detailed explanations about certain Cpython topics(developer guide)? For example, the symbol table could be explained more deeply and can serve as a guide for people so they can understand how to use it when needed.
More documentation is generally good, but adding such documentation also means more things to update when you change something. For rarely changed internal interfaces like the symtable, the cost (having to keep it up to date) might be greater than the benefit (the few people who need to change the symtable have it a little easier).
what I was thinking is more about explaining often-used functions and giving a fundamental understanding of the topic. Such as how to extend the symbol table, how it looks, and all the stuff that you need to figure out on your own before you add something to the language.
The page about "changing Cpython's grammar" is quite short.
Sometimes it's better to add words nearer to the code.
Hey folks, do you have a good intro to the Cpython implementation? A series of blog posts or something? Just diving into the repo seems rather overwhelming
there is a developer guide
you also can find pycon talks about cpython internals
Ah nice, I found a pycon talk by Sebastiaan Zeeff, I'll start there, then work through the developers guide. Thank you!
Are there any tools available that might show how long a thread has held on to the GIL? I've been working with async and I use the debug messages that say a task took too long. I was hoping for something similar to that but for the GIL.
do I need to add a new TypeObject to the module in the C-API?
typedef struct {
    PyObject_HEAD
} MyObject;
static PyTypeObject MyObject_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "fputs.MyObject",
    .tp_basicsize = sizeof(MyObject),
    .tp_doc = PyDoc_STR("MyObject objects"),
};
static struct PyModuleDef fputsmodule = {
    PyModuleDef_HEAD_INIT,
    "fputs",
    "Python interface for the fputs C library function2",
    -1,
    FputsMethods
};
for the shorthand arguments thing? that shouldn't require manual creation of new types
no, I'm just playing around. 😄
this is not related to shorthand keyword arguments. I'm just testing out a few things on my own.
but now you mention it, should I add shorthand keyword arguments to the C API?
or how would that work?
I don't think so, it's purely a Python-level concept
If you want users to be able to access fputs.MyObject, then yes, you need to add MyObject to the fputs module
how?
Using PyModule_AddObject - see https://docs.python.org/3/extending/newtypes_tutorial.html#the-basics
can someone plz help me with ai stuff, its for home work?
hello
hi
I always wondered where these "private" libraries are coming from.
e.g _symtable
anyone knows?
Modules/symtablemodule.c line 125
```
PyInit__symtable(void)
```
generally just search for the name in ""s
thank you, I always confuse it with some Python modules.
I was searching in the Lib folder.
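A quick way to tell from Python whether a module lives under Lib or comes from C is to look at its import spec. A small sketch; what gets printed depends on how the interpreter was built.

```python
import importlib.util
import sys

spec = importlib.util.find_spec("_symtable")
origin = str(spec.origin)

# 'built-in' means compiled into the interpreter; a .so/.pyd path means a
# compiled extension module. Either way there is no .py file to find in Lib.
print(origin)
print("_symtable" in sys.builtin_module_names)
```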
is this part still relevant to Python's grammar?
https://docs.python.org/3/reference/introduction.html#notation
I can't find those notations in the grammar file.
this is about the informal grammar in the docs, not the formal one in the code
oh okay.
no, this applies only to grammar that is described in documentation
docs use different syntax for grammar, and it is simpler than actual one
oh, Jelle already said that 😄
I was wondering what the current state is with this module: https://peps.python.org/pep-0554/#examples (multiple interpreters in the stdlib)
actually: https://discuss.python.org/t/pep-554-multiple-interpreters-in-the-stdlib/24855/31?u=hels15
is there any reason why test_ast.py doesn't contain check for ast.unparse(...)?
it works, but there are some problems (on 3.12)
there's a separate test_unparse.py
Well the caller is not tested, only the definitions such as function definition etc...
I don't understand what you mean
def test_function(self):
    node = ast.FunctionDef(
        name="f",
        args=ast.arguments(posonlyargs=[], args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]),
        body=[ast.Pass()],
        decorator_list=[],
        returns=None,
    )
    ast.fix_missing_locations(node)
    self.assertEqual(ast.unparse(node), "def f():\n    pass")
this is not
a = ...
b = ...
f(a=, b=)
It tests the function definition, and not the arguments passed in. In fact, in the module, the "caller site" is ignored.
I mean when you added type params, you only checked the definition def f[T]().
oh, you mean unparsing Call nodes is not tested. That does seem to be mostly true, though there's a test for call((yield x)). We should add more test cases covering other kinds of calls.
oh okay, thank you. Would it be fine then to cover call nodes in unparse as well? Because if we do, we should do all of them(not just shorthand keyword args).
yes, we should add more tests there
well, I think I can add a few and send a pr?
sounds good!
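A sketch of what such Call-node cases might look like, written against stock Python so it does not depend on the shorthand syntax. The class and test names are placeholders, not the ones used in CPython's test suite.

```python
import ast
import unittest

class UnparseCallTest(unittest.TestCase):
    def test_call_with_keywords(self):
        node = ast.Call(
            func=ast.Name(id="f", ctx=ast.Load()),
            args=[ast.Name(id="a", ctx=ast.Load())],
            keywords=[
                ast.keyword(arg="b", value=ast.Name(id="b", ctx=ast.Load())),
                # arg=None is how **extra is represented.
                ast.keyword(arg=None, value=ast.Name(id="extra", ctx=ast.Load())),
            ],
        )
        ast.fix_missing_locations(node)
        self.assertEqual(ast.unparse(node), "f(a, b=b, **extra)")

    def test_call_roundtrip(self):
        for src in ("f()", "f(a, *b)", "f(a=a, **b)"):
            with self.subTest(src=src):
                self.assertEqual(ast.unparse(ast.parse(src)), src)
```

A shorthand case would then be one more entry in the round-trip tuple once the parser accepts it.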
Wrong channel, #❓|how-to-get-help
man, no dontasktoask.com?
anyone wanna see code i wrote tired
no well too bad
eh nvm
This channel is for python internals
Has there been any talk about elevating pathlib.Path, like making Path a builtin?
all builtins are lowercase, so maybe path?
what would be the benefit out of curiosity
no reason to use str for paths anymore
is there a reason currently?
some people are too lazy to from pathlib import Path
Paths make so many things easier than string paths, and I want to normalize using them by default.
as a builtin, do you think there would be a possibility for dedicated syntax?
what exactly does "normalizing" mean here
I already use them everywhere
you could have a path literal, or not have to import Path
why would we need dedicated Path syntax if we don't have dedicated regex syntax?
those are the two main potential improvement I could imagine, I suppose
I'm not sure if those are super compelling though
fair enough
I will say I've been putting off putting from pathlib import Path in my ipython "startup" for.... months now 😛
I really should do it
80% of the time I start up ipython and start messing around with our code, I need to immediately start by importing Path and datetime
so I can construct the Paths and dates I need, to pass into functions and see the outputs
but that also kind of illustrates how this is a rabbit hole, why make Path a built-in, why not date, or datetime, or regex, etc
Path is technically more difficult to make a builtin as it's entirely implemented in Python, while regexes and datetimes are mostly in C already
interesting
Don't think that should be a blocking concern though; if we really wanted Path to be a builtin we could reimplement it in C or find a way to allow builtins to be written in Python
I didn't realize that was a requirement for something to be a builtin
this was my advisor's excuse for hating Python. (she's a perl user.)
It's currently a practical requirement in that all builtins are in C
interesting. Do you think proposing on python-ideas that Path be added as a builtin would be dead-on-arrival?
(I got flamed last time I suggested something, and I'm still emotionally damaged.)
out of curiosity when you say "built-in"
what do you really want
just not having to import it?
I assume it means having Path within the builtins module, which transitively means not having to import it, yeah
Since it's already somewhat built-in by being part of stdlib
It just kind of feels like a tough sell to add anything to builtins, to me
honestly, half the things there don't belong there
but will stay out of inertia
Namespaces are one honking great idea -- let's do more of those!
I've gotten mildly exasperated so many times at the IDE warnings for naming a variable id 😛
i can probably count on one hand how many times I've actually used the builtin id function
Agree there's a few builtins that really don't need to be there
I tried to argue against aiter and anext becoming builtins but no luck
what do those even do? I thought Python always had clear naming 
they call __aiter__ and __anext__.
damn I didn't even know about those dunders, I guess I will need to educate myself
ah yeah makes sense
memoryview is also... pretty sus as a builtin
honestly looking at the list of builtins I feel like 80% of the stuff here I barely use and I don't really see a reason for it to be builtin
and it's fairly arbitrary that things like abs and pow are builtin, but not say exp
I mean 80% is random exception classes 😛
lol
touche
I was just looking at built-in functions
abs and pow I can see since they call dunders
oops
I am not sure i have ever seen ascii() used
I mostly use it via !a in fstrings/format strings
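For reference, the !a conversion and the builtin give the same result:

```python
name = "caf\u00e9"  # 'café'

via_builtin = ascii(name)
via_fstring = f"{name!a}"

print(via_builtin)   # non-ASCII characters come out escaped
print(via_fstring)
print(repr(name))    # repr keeps the character as-is
```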
floor and ceil? so you can get them to work on your custom number-like type
oh wait, pow probably doesn't, I may have been wrong about that one. I think it just does the clever exponentiation by squaring thing
ah. dang the whole function forwarding to a dunder thing is awkward
no, it calls __pow__
ah
though there are complexities around the three-argument version I think
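A small sketch of the dispatch, using a toy class whose __pow__ only records how it was called:

```python
class Recorder:
    def __init__(self, value):
        self.value = value

    def __pow__(self, exponent, modulo=None):
        # Return the arguments instead of computing anything.
        return ("__pow__", self.value, exponent, modulo)

two_arg = pow(Recorder(3), 2)
operator_form = Recorder(3) ** 2
three_arg = pow(Recorder(3), 2, 5)

print(two_arg)
print(operator_form)
print(three_arg)
```

The two-argument call and the ** operator arrive identically; only the three-argument call passes a modulus.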
I think ultimately the only thing that builtins gives you is "you don't have to import it" - so I'm not sure if there should be any criterion for builtins other than "super duper commonly used"
there's no problem with something calling a dunder being in math - there's already stuff like that there
Yeah, the builtin selection is pretty strange, upon further review
and yet there is no builtin to call __index__ or __float__
float()?
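To make that concrete: float() does go through __float__, and for __index__ the nearest equivalent sits in the operator module rather than in builtins. A toy example:

```python
import operator

class Three:
    def __index__(self):
        return 3

    def __float__(self):
        return 3.0

as_index = operator.index(Three())
as_float = float(Three())
picked = ["a", "b", "c", "d"][Three()]  # indexing also uses __index__

print(as_index, as_float, picked)
```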
A few broad categories of builtins:
- data structure returning functions
- math
- itertools
- cpython internal stuff: id, other things probably
- reflective stuff: setattr, getattr, eval, exec
- type hierarchy stuff (isinstance, issubclass, etc)
- IO (print, input)
- pretending to be keywords (classmethod, staticmethod)
let me see what other categories are needed to cover basically everything
I guess str and repr don't really fit in any of those
where would you put bin() and hex()?
right, so "type conversions" I guess?
(another two I've basically never seen used)
bin, hex, str, and repr
that doesn't work super well if you don't want str accepted
bin and hex I believe are for repl usage
Also chr and ord, those plus print/input I'd put in the category of convenience for the repl yeah.
I use the python repl for converting between bases all the time
and then there's also hash
ascii() I think is a Py2 remnant?
nope
```
% ~/.pyenv/versions/2.7.18/bin/python
Python 2.7.18 (default, Oct 5 2023, 10:23:23)
[GCC Apple LLVM 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ascii("x")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ascii' is not defined
```
ascii definitely should not be a builtin, I agree
repr, hash, id, getattr, setattr are all kinda fundamental operations...
why does that matter that they're fundamental
how many times have you called the hash builtin directly
my guess is people thought early in the 2/3 transition that ascii() was something you'd want to do all the time
but it really isn't
pretty often when implementing a custom __hash__
I do think there are cases where people call repr when they want ascii
how often are you implementing a custom hash these days?
because they forget ascii exists
i mean, you'd need to be implementing a custom hash, that nonetheless uses hashes of other existing python types
but doesn't just do the obvious approach to combining them
because then a dataclass would already give you that
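The "obvious approach" next to the dataclass version, as a sketch with made-up class names:

```python
from dataclasses import dataclass

class ManualPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, ManualPoint) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # The one place the hash() builtin gets called directly.
        return hash((self.x, self.y))

@dataclass(frozen=True)
class Point:
    # frozen=True with the default eq=True generates __eq__ and __hash__.
    x: int
    y: int

manual_ok = hash(ManualPoint(1, 2)) == hash(ManualPoint(1, 2))
generated_ok = hash(Point(1, 2)) == hash(Point(1, 2))
unique = len({Point(1, 2), Point(1, 2)})
print(manual_ok, generated_ok, unique)
```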
but people also don't really put non-ascii into reprs, so it doesn't really matter
I'm saying that in relative terms, calling the hash built-in directly is very rare.
if we go back to the type that started this convo, lets take from pathlib import Path. My guess is that 80% of the builtins, including hash, are used far less than Path
there are 56 def __hash__ in our internal codebase. Likely many of these should be dataclasses but they're not
Do all of those always call hash ?
A perfect set of builtins is an impossible thing, considering how differently people use Python.
too lazy to check that, but most of them probably do
But like, improvements could be made
i don't think built-ins are "impossible". You just pick the best possible set you can. Python's current set just happens to be pretty terrible.
Yes sorry. "impossible" was the wrong word
obviously any two people will not agree precisely on what the built-ins should be. But there's still things that will be a lot more popular than others.
I mostly get annoyed over the built-ins that I basically never use, yet occupy an extremely natural variable name. that's why I used id as an example, I think it's one of the most egregious
Yeah, id is definitely the worst offender
i've never used id outside of toy pieces of python demonstrating when new objects are created or not
So always develop for the most popular? It isn't a horrible strategy but it has flaws too. Hence the added complexity. (But I do tend to agree with what you are saying)
idk what you mean by "develop by the most popular" - but this is literally just a list of functions you save people from one line importing, there's not really much depth here
popularity is quite fine
it's not like language design decisions, like whether async should work via A or B, and A is more popular, but is it necessarily better, etc. Nothing that deep 😛
open is also written in python, but it is builtin
we can do similar thing with Path - import pathlib every time and insert Path name into builtins namespace
it's not?
!e print(open, type(open))
@dusk comet :white_check_mark: Your 3.12 eval job has completed with return code 0.
<built-in function open> <class 'builtin_function_or_method'>
hmmm
it has this weird thing where it claims to be in the io module
but it's definitely written in C
yes, that confused me, sorry
ascii is the weirdest one in my opinion
I mean there are tons of them that are weird
the fact that I've been programming python at one level or another for... idk, more than a decade, and there are functions here I've never used ever in a real program 🤷♂️
id, ascii, hex, bin, hex
even if I used one of these things they would still just be one-offs in the implementation of some function
that reminds me that bin is also one of the worst offenders; I've wanted to name variables bin many times, if they were the path to a binary
yes
I appreciate your directness.
why is pow even a thing when it has syntax?
Sorry for not responding here earlier. Agree with @raven ridge that it probably wouldn't go anywhere. My default expectation is that any proposal for new syntax/builtin/very basic behavior is unlikely to pass
For builtins specifically it's not too clear what the inclusion criteria are, so who knows what people would say
(And a technical thing: instead of python-ideas the mailing list, we now have the Ideas category on discuss.python.org. It might be a little less prone to flaming.)
I guess one reason is the three-argument version?
!d pow
```py
pow(base, exp, mod=None)
```
Return *base* to the power *exp*; if *mod* is present, return *base* to the power *exp*, modulo *mod* (computed more efficiently than `pow(base, exp) % mod`). The two-argument form `pow(base, exp)` is equivalent to using the power operator: `base**exp`.
interesting
@merry bramble I believe has some traumas about three-argument pow() that he might now be typing out 🙂
what's fun is that builtins.pow and operator.pow work differently
builtins.pow accepts 3 arguments, operator.pow only accepts 2
for Reasons
it sort of makes sense though, right? operator.pow should correspond to **, but ** is only a binary operator
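A quick way to see the difference for yourself, as a minimal sketch (the numbers are arbitrary):

```python
import operator

# The built-in accepts an optional modulus and computes the modular power
# without first building the full intermediate result.
two_arg = pow(2, 10)          # same as 2 ** 10
three_arg = pow(2, 10, 1000)  # same result as (2 ** 10) % 1000
print(two_arg, three_arg)

# operator.pow mirrors the binary ** operator, so it takes exactly two arguments.
print(operator.pow(2, 10))

try:
    operator.pow(2, 10, 1000)
except TypeError as exc:
    three_arg_error = exc
    print("operator.pow rejects a third argument:", exc)
```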
Anyway, yes, look on my Works, ye Mighty, and despair: https://github.com/python/typeshed/blob/359d4c095ddb3cc7a67f418a6685e1b094cb3825/stdlib/builtins.pyi#L1672-L1741
A Path literal is an interesting idea, though. That sounds much more useful to me than just moving Path from pathlib to builtins.
p""? 👀
It's not crazy to imagine a p"..." that behaves like __import__("pathlib").Path(r"...")
I'm not sure it's a good idea, but it seems not totally unreasonable
that'd be nice tbh
Probably the biggest reason for having raw literals is paths, anyway, but now that we have a better data type for paths than str... Hm.
not regexes?
I suppose you'd use raw literals for Windows paths
p"whatever".open() would look a bit weird
True, regexes as well
Why?
Doesn't really seem any weirder than b"...".decode() to me
Well you could just use open(p"whatever") instead.
One problem with making Path a builtin is that you'd also potentially want PurePath, the two OS flavours of each of those, and also now PathBase/PurePathBase for making subclasses...
that seemed to me the best, concrete idea from the start of the discussion
Potentially, but I think Path is used far more than PurePath
you could in that particular case, but speaking personally at least, in the new (i.e. last few years) code we're writing lately, everything takes Path
so if you want to startup ipython and run some things interactively
you need paths
also yeah I'd agree that Path is used vastly more than PurePath
that said I don't really see why Path is a stronger candidate per se to get its own literal than say, dates, or datetimes. those are super common too.
i think at a language level the only things to really do are either a) accept that literals will only be for a handful of the lowest level types, and use functions for other stuff, or b) have a way to define new literals.
Time for macros I guess.
There's been talk about a way to define new literals for Python for quite a while now
I'd say literal, constant paths are much more common than literal, constant dates or timestamps, at least outside of testing code
Do we actually want a literal syntax? What's our actual goal here, saving a few characters and an import, or do we want it because constant forms can be cached?
One argument is that using Paths is a good practice, and having literals for them means the language encourages a good practice
That is a good reason yeah. They're so much nicer than os.path.
I guess that's probably fair
I still think it's pretty ad hoc to start picking and choosing new literals like that
I guess in python it's more justifiable since python has container literals. but a) while it seemed good at the time, I don't think that decision aged well, and b) the language is a lot more tied to list/dict/set I think, than it will ever be to Path.
Am I right, that the only reason for pathlib is to try to hide the insanity of handling pathnames on Windows?
No. Handling paths is annoying on every platform
i'm not even sure what is more difficult on windows than on unix, other than the "current directory per disk".
I feel like drives add significant complexity
and junctions
i don't think it rises to the level of "insanity," but idk what @random thistle was thinking of.
Only Windows has
- Single-letter drive names
- “Reserved” file names. Google tried to document how these worked, but even they managed to miss a few cases.
I don't think pathlib does anything about those, does it?
Huh, looks like it does
!d pathlib.PurePath.is_reserved
```py
PurePath.is_reserved()
```
With [`PureWindowsPath`](https://docs.python.org/3/library/pathlib.html#pathlib.PureWindowsPath), return `True` if the path is considered reserved under Windows, `False` otherwise. With [`PurePosixPath`](https://docs.python.org/3/library/pathlib.html#pathlib.PurePosixPath), `False` is always returned.
```py
>>> PureWindowsPath('nul').is_reserved()
True
>>> PurePosixPath('nul').is_reserved()
False
```
File system calls on reserved paths can fail mysteriously or have unintended effects.
TIL
the reserved names are definitely odd. I don't see how single-letter drive names is a problem, but i haven't done a lot of windows programming.
UNC paths are odd, too
On every platform, you have weirdness like that a/foo/../bar and a/bar might be different files
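A small sketch of how this shows up in the standard library. pathlib deliberately keeps ".." components, while posixpath.normpath collapses them as plain text (the path itself is made up):

```python
import posixpath
from pathlib import PurePosixPath

raw = PurePosixPath("a/foo/../bar")

# pathlib keeps "..": if a/foo is a symlink, a/foo/.. need not be a.
print(raw.parts)
print(raw == PurePosixPath("a/bar"))

# normpath collapses ".." purely textually, which can change which file
# the path refers to when symlinks are involved.
collapsed = posixpath.normpath("a/foo/../bar")
print(collapsed)
```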
Ya know... pathlib doesn't really do much to help with one of the most difficult things to handle about POSIX paths: they're not necessarily textual strings at all. They're sequences of bytes, but they need not be text.
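For what it's worth, the usual way Python copes with this is the "surrogateescape" error handler from PEP 383, which os.fsdecode and os.fsencode rely on for POSIX filenames. A minimal sketch with a made-up filename that is not valid UTF-8:

```python
# A POSIX filename is just bytes; this one is not valid UTF-8.
raw_name = b"caf\xe9.txt"

try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print("decodable as strict UTF-8:", strict_ok)

# "surrogateescape" smuggles the undecodable byte through str as a lone
# surrogate, and encoding with the same handler restores the exact bytes.
as_text = raw_name.decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")
print(ascii(as_text), round_tripped == raw_name)
```

So the bytes survive a round trip, but the str in between is not something you can safely print or display.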
heh
typing out
(typing powers is a pain IIRC)
julia has a thing called non standard string literals which lets you modify the behaviour of custom"foo bar"
these behaviors are not hard-coded into the Julia parser or compiler. Instead, they are custom behaviors provided by a general mechanism that anyone can use: prefixed string literals are parsed as calls to specially-named macros.
Nim does something similar as well
That would be nice to have, not sure about long term effects on readability though
but also python doesnt have macros like julia does so it might be out of place
there's a PEP for it
(with guido support i think)
Sounds nice, hopefully it'll get better treatment than none-aware operators
deferred since 3.8, might as well be rejected it seems
genuine question: why would path"abc.foo" be better than Path("abc.foo") ?
Depends, although it's more of a QoL thing than a really necessary change. If we're assuming it's a built-in prefix, it would eliminate the need to manually import pathlib, promoting the use of paths. If using Path is as easy as just adding p in front of a literal, there'll be little to no reason to not use real Paths in place of strings in APIs that could support both.
If we're talking "custom prefixes in general, including one for Path", then it depends on how prefix imports would be handled. Best case, for paths specifically, it would only mean that it doesn't matter if you import pathlib, from pathlib import Path or import pathlib as ..., regardless of how you imported it, just p'abc/def.ghi' works, every time. Worst case, it changes effectively nothing, just removes parens
That's what my interpretation is, anyway
i guess i keep coming back to 1) imports are not hard, and 2) we don't have regex literals. There are lots of programs that don't deal with file paths, just like there are lots of programs that don't use regexes.
but i've been on the wrong side of lots of discussions that have added new features to the language, so ¯_(ツ)_/¯
Objectively speaking, it really doesn't change much if at all, especially when using an IDE that autoimports stuff for you (vast majority of use cases)
It would be most noticeable when running something in REPL and/or short, throwaway scripts
for REPL, you can define a startup file to import tons of stuff for you.
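For reference, the standard interactive interpreter runs the file named by the PYTHONSTARTUP environment variable before the first prompt. A sketch of such a file; the filename and the choice of imports are only an illustration:

```python
# Hypothetical contents of ~/.pythonstartup.py. Point the PYTHONSTARTUP
# environment variable at this file to have the interactive interpreter
# execute it before showing the first prompt.
import datetime
import re
from pathlib import Path

# Names defined here are then available at the interactive prompt.
home = Path.home()
```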
I think path"foo" is bad for several reasons, but one showstopper is that currently the string prefixes are all single-letter flags. That pretty much makes custom string prefixes not doable anymore.
How would you combine it with f-strings? fpath"foo{bar}"? pathf"foo{bar}"? Either? pafth"foo{bar}"?
cpp also has something like this, but it works as calls to some functions at runtime
paſth"long-s"
since th = f, we should use paf"abc/def"
Must.... Stop... Procrastinating....
from * import *
someone has to make that work
considering that there are some joke imports guaranteed to give you errors, I'd argue that it already works as well as it would if it were implemented
!e from __future__ import braces
@sour thistle :x: Your 3.12 eval job has completed with return code 1.
001 | File "/home/main.py", line 1
002 | from __future__ import braces
003 | ^
004 | SyntaxError: not a chance
!e from * import *
@sour thistle :x: Your 3.12 eval job has completed with return code 1.
001 | File "/home/main.py", line 1
002 | from * import *
003 | ^
004 | SyntaxError: invalid syntax
it already gives the right error 😉
I think a humorous error might only confuse beginners who try this
my biggest peeve with the python import system is probably the fact that imports show up transitively
like, the typical thing that people write, say import foo, is the "wrong" thing 95% of the time, since technically you are saying that foo is public API of your package
it should be import foo as _foo
Even worse, if you are using, say Path as an implementation detail, you shouldn't write from pathlib import Path
but rather from pathlib import Path as _Path
yuk
eh, you can just use __all__ to exclude it
That just excludes it from * imports afaiu?
last I checked it also removes from the autocomplete suggestions
hmmm nope? weird, my autocomplete is even listing private variables. was it different for the current project vs installed dependencies or something... eh, never mind
I guess it's ultimately convention
Still I would say this is a pretty ugly way to work in most cases
The average file doesn't want to re-export anything
You shouldn't need to re-list every single publicly defined entity in __all__ to achieve that
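A small sketch of what __all__ does and does not do, using a throwaway in-memory module ("demo_mod" and "public_func" are made-up names):

```python
import sys
import types

# Source of a tiny module that imports Path as an implementation detail.
source = '''
from pathlib import Path
__all__ = ["public_func"]

def public_func():
    return Path("x")
'''
demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# A star import only copies the names listed in __all__ ...
star_namespace = {}
exec("from demo_mod import *", star_namespace)
print("Path" in star_namespace, "public_func" in star_namespace)

# ... but an explicit import of the "private" name still works.
from demo_mod import Path as leaked
print(leaked)
```

So __all__ controls star imports (and is a hint to tools), while the imported name remains an ordinary attribute of the module.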
anyone here that can code videos like this? https://www.tiktok.com/@jdg_creative
i think you meant to send that in one of the ot channels.
ye ye
Python can do token-level macros, with the help of the ast module. I used this in my wrapper for Asterisk, to implement both synchronous and asynchronous variants of all the main API classes from a common code base.
Hi everyone! I'm getting familiar with the cpython repository for the first time out of hobbyism/interest. I want to implement some syntax changes in a personal version of python (not stuff I ever expect to get integrated at a large scale).
I have a high-level sense of how the interpreter works:
parser tokenizes input code string based on the PEG grammar
then an AST is made from the tokens
then the AST is turned into bytecode
the bytecode is run in the PVM
And so to implement new grammar, I'd need to modify the first three steps. I'd need to modify the grammar file (and maybe something manual to the parser but not sure), the formation of AST nodes to allow for a new one, turning that AST into bytecode
but honestly I'm finding the repo a bit intimidating and see where I'd make some of these changes but am confused about others
Is there some reference PR I can look at where someone else implemented new syntax recently? Like maybe the := syntax or something. I can't find a clean example in the PR history but maybe I'm missing it
This is a really small one - adding shorthand keyword arguments into CPython. Feel free to take a look at it here: https://github.com/Hels15/cpython/commit/bd99a946c83f67bfc8cae3c14ef7b9fbb6f0bd6a
Also note that you shouldn't change the parser manually, and the files related to ast.
A better and more detailed one can be found here: https://github.com/python/cpython/pull/103764
to add to what @cyan raven said, this page is useful: https://devguide.python.org/developer-workflow/grammar/. Depending on the change you're making, you might not need to change the bytecode
Thanks folks -- will check these out. I appreciate it
oh and one minor correction to your initial post: you said "parser tokenizes input code string based on the PEG grammar"
in fact the PEG grammar kicks in at the next step. First the tokenizer turns raw source code into a stream of tokens, then the PEG grammar is what turns the token stream into an AST
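The same stages can be poked at from Python itself. This is a rough sketch using the stdlib tokenize, ast and dis modules, which are Python-level views of the pipeline and not the C code in Parser/ and Python/:

```python
import ast
import dis
import io
import tokenize

source = "x = 1 + 2\n"

# Step 1: the tokenizer turns raw source text into a stream of tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Step 2: the parser turns the token stream into an AST.
tree = ast.parse(source)
print(ast.dump(tree, indent=2))

# Step 3: the compiler turns the AST into a code object (bytecode).
code = compile(tree, "<example>", "exec")
dis.dis(code)

# Step 4: the interpreter executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["x"])
```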
Ahh I appreciate it; that makes sense
Other than https://devguide.python.org/developer-workflow/grammar/, is there something you recommend looking at to learn how CPython works and which files are relevant? E.g. something that says something like "First, input code is tokenized using Parser/tokenizer/string_tokenizer.c. Then, the tokens are run through a PEG parser specified by Grammar/python.gram. Then, an AST is built using {x}", and so on. Basically some way to learn how to think about which files are doing what other than by reading the source. If no such thing exists, no problem, that's probably expected
there are talks about python internals on pycon and other conferences, you might find something useful there
the https://devguide.python.org/internals/ section has a bit more too
wonderful, thank you both
I'm working on a toy change where I want to introduce the ?. syntax, where a?.b is equivalent to None if a is None else b
My sense is that I don't need to change the compiler because all the changes can be done at the AST level: when parsing tokens into the AST, I can parse ?. into AST nodes presenting None if a is None else b instead of making a new type of node
So I think this means that I need to:
- update Grammar/python.gram to introduce the syntax in the grammar
- update Grammar/Tokens to introduce the new token
- run make regen-token and make regen-pegen to regenerate the tokenizer (which I think is Parser/tokenizer.c? but not sure) as a function of those two updated files
- update Parser/parser.c to handle the new case, appending AST nodes corresponding to None if a is None else b for the new kind SafeAttribute_kind
Then, add tests and such
I understand that this approach -- where I only change the AST and not the compiler -- may mess up things like showing where syntax errors are. I'm actually not sure why this is but I assume it's because there's ambiguity about what source code led to the bytecode that's actually encountering the error when unparsing. But I can live with that for this first project.
Does this sound right to you? Is this missing anything?
Hmm I'm feeling most uncertain about the last step there -- do I update Python/ast.c instead of Parser/parser.c? Python/Python-ast.c got updated through some of the make regen-* functions I think; I see a new _PyAST_SafeAttribute(....) definition in it
It looks like I might want Parser/parser.c ultimately, but it gets generated. So would I really need to change Tools/peg_generator/pegen/c_generator.py to make sure that parser.c emits a production rule for safe_attr that appends a set of AST nodes that represent None if a is None else b instead of _PyAST_SafeAttribute, which it seems to have done by default?
don't ever update Parser/parser.c
you almost certainly don't need to update Tools/peg_generator
phew alright glad I asked the gurus, lol
also i'm pretty sure a?.b does None if a is None else a.b
for your change, if it was a real CPython change, I would recommend adding a new AST node instead of generating a fake one
however, what you suggest could work too. I don't think you should see a new _PyAST_SafeAttribute function though
so i already did this some time ago and i didn't create a new AST node although that would've probably helped
no wait
nvm
create a new AST node or not the amount of writing to do are sort of the same
Hm, so suppose I wasn't making a new node; what's the right place to turn the new token into multiple AST nodes instead of a new AST node that needs to be implemented in Python/compile.c?
If it isn't parser.c
My fear about adding a new node is that I think that means I have to touch Python/compile.c, which I understand is doing something like generating bytecode from the AST, and it looks intimidating af
Though I do understand how it could end up resulting in the same amount of writing, if this^ is accurate @rose schooner
Really appreciate the guidance btw
Python.gram, in the action for the new grammar rule you're adding
e.g. this is the rule for a if b else c
```
| a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }
```
you'd have to write a call like that in your rule but hardcode some of the arguments
I see so instead of
primary[expr_ty]:
+ | a=primary '?.' b=NAME { _PyAST_SafeAttribute(a, b->v.Name.id, Load, EXTRA) }
| a=primary '.' b=NAME { _PyAST_Attribute(a, b->v.Name.id, Load, EXTRA) }
I want to do something like (replacing just the added + line)
_PyAST_IfExp( _PyAST_Attribute(value, attr->v.Name.id, Load, EXTRA), attr, _PyAST_Constant(Py_None, NULL, EXTRA))
yeah something like that, I think the attr in the middle is wrong
it should be equivalent to a is not None, I think a Compare node of some sort
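To see which nodes are involved, here is the same desugaring built with Python's ast module instead of C grammar actions. It's only a sketch: safe_attribute is a made-up helper, and it uses the `is None` orientation (with `is not None` you would swap the two branches).

```python
import ast

def safe_attribute(value_src: str, attr: str) -> ast.Expression:
    """Build the AST for `None if <value> is None else <value>.<attr>`."""
    def value() -> ast.expr:
        # Parse a fresh copy each time so no node object is shared.
        return ast.parse(value_src, mode="eval").body

    test = ast.Compare(
        left=value(),
        ops=[ast.Is()],
        comparators=[ast.Constant(value=None)],
    )
    node = ast.IfExp(
        test=test,
        body=ast.Constant(value=None),
        orelse=ast.Attribute(value=value(), attr=attr, ctx=ast.Load()),
    )
    return ast.fix_missing_locations(ast.Expression(body=node))

code = compile(safe_attribute("a", "real"), "<safe-attr>", "eval")
print(eval(code, {"a": None}))
print(eval(code, {"a": 3}))
```

One catch with desugaring this way: the value expression appears twice, so it gets evaluated twice. A dedicated AST node with its own bytecode can avoid that.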
is there a pep about intrinsic bytecode instructions?
#esoteric-python message - example of such instruction
they are described in docs, but i thought there is also a pep
no, implementation details don't generally go through PEPs
PEP 659 about the specializing adaptive interpreter is sort of in this area
ok, thanks
I think intrinsics were initially added to free up some bytecodes used for uncommon operations, and then I really jumped on them for PEP 695 🙂
Migrated from #python-discussion, per @is.alex's recommendation:
On Python, why does the GIL's removal need to be done over several versions, aside from not causing a shock in the ecosystem? Is there "This One Thing" that needs the GIL so much that its immediate removal wreaks havoc?
there is: thousands of C-extensions on pypi
it is explained in the pep clearly, and also there is discourse thread
What does the GIL have to do with C-extensions? If anything, just the move from 3.10 to 3.11 has broken plenty of C-extensions by deprecating certain names
(Posted for the whole Steering Council.) As we’ve announced before, the Steering Council has decided to accept PEP 703 (Making the Global Interpreter Lock Optional in CPython) . We want to make it clear why, and under what expectations we’re doing so. It is clear to the Steering Council that theoretically, a no-GIL (or free-threaded) Python wo...
i guess removing gil is a lot more breaking change
such as?
yes, i think so
Python Enhancement Proposals (PEPs)
Many C extensions implicitly depend on the GIL for protecting access to data structures. They were designed under the assumption that the GIL was present to make concurrency simpler.
Is there a reason for pyright and mypy to reject this, or is it a bug?
def f(x: int | None):
    match [x]:
        case [None]: print('not here')
        case [y]: print('doubled', y*2)
It works fine if the None/y is not in a nested pattern
it is kinda bug
typechecker doesnt know type of what variable to narrow, so it doesnt narrow anything, so in second case y is still int | None
is there any reasons why you are using [x] instead of x?
In the actual example I ran into this in, yes. I simplified here
I rewrote it to just check for None in the match branch
Does someone know where the lexer entry point is in Cpython? Where the source is passed in(presumably as a string).
neat, ty
there is a family of PyRun_ functions
they call this: https://github.com/python/cpython/blob/main/Parser/pegen.c#L927
Parser/pegen.c line 927
```c
_PyPegen_run_parser_from_string(const char *str, int start_rule, PyObject *filename_ob,
```
Parser/tokenizer/string_tokenizer.c line 54
```c
decode_str(const char *input, int single, struct tok_state *tok, int preserve_crlf)
```
Parser/tokenizer/string_tokenizer.c line 112
```c
_PyTokenizer_FromString(const char *str, int exec_input, int preserve_crlf)
```
are they invoking the lexer functions as well? such as "get next token" and stuff like that?
i think tokenizer is lazy, it is like python generator
parser pulls tokens from it when needed
https://github.com/python/cpython/blob/main/Parser/pegen.c#L756C6-L756C6 - there parser stores tokenizer inside of it
Parser/pegen.c line 756
```c
p->tok = tok;
```
im too scared to go further
I'm talking about the first time when the source is fed in and the lexer makes tokens out of it.
lexer.c doesn't seem to have a main loop or smt, so I suppose those are just helper functions.
which is a bit tricky.
this looks like a constructor for tokenizer
what is this for then? https://github.com/python/cpython/blob/main/Parser/lexer/lexer.c
I'd expect the lexer to produce tokens, but it seems like CPython has a different feature for that.
Parser/lexer/lexer.c line 366
```c
tok_get_normal_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct token *token)
```
i guess that returns next token
interesting.
There doesn't seem to be a way to disable the specializing adaptive interpreter, is that right? I have a bizarre situation that would be explained by code objects being mutable.
hmm no i don't think so
If you force a function to re-specialize enough times it eventually gives up, but I don't know if there's any more explicit way to disable it...
thanks, but I've gotten to the point of writing a bug: https://github.com/python/cpython/issues/111984
Oh. That bug breaks a library of mine as well I guess. Just haven't run it yet on Python versions where it matters yet.
I'm shocked that's hashing byte codes at all... I'd have expected function hashing to only be based on identity
that is what Mark Shannon is saying elsewhere.
it only matters if code is being instrumented by sys.monitoring
And sys.monitoring only exists in 3.12+
Oh, I see. Carry on then :D
well, soon, that will be "it only matters if coverage.py is measuring the code," so it will matter 🙂
Hi guys -- thanks for your help earlier. I'm working on a project to implement the ?. syntax, such that a?.b is equivalent to None if a is None else b. This is just a personal project for me to understand python better; don't worry, I don't have any grand hopes of this getting incorporated into the language
When I run a simple test of it I see a failure:
Traceback (most recent call last):
File ".../repos/cpython/Lib/test/test_safeattr.py", line 12, in test_safe_attr
a?.b
AttributeError: 'NoneType' object has no attribute 'b'
Does anyone understand why this is happening? I would expect that my code in compiler_visit_expr1 would cause this to be evaluated as None. Here is the relevant piece of code: https://github.com/nishu-builder/cpython/pull/2/files#diff-ebc983d9f91e5bcf73500e377ac65e85863c4f77fd5b6b6caf4fcdf7c0f0b057R6285
- Visit e->v.SafeAttribute.value, which is the a in a?.b I believe. This puts the value of a at the top of the stack
- Make another copy of it and put that at the top of the stack. So now our stack looks like [a, a]
- Load NONE on to the stack. Stack: [a, a, None]
- ADDOP_I(c, LOC(e), IS_OP, 1);, which I believe pops the last two items on the stack, checks if they Is each other, and puts that result on the stack. Stack: [a, a is None]
- Pops the last value off the stack, and if it's true, jumps. Stack: [a]
- a) if a is None: end
This leaves None on the top of the stack, and nowhere do we try to access a.b as far as I can tell, so I'm not sure where the AttributeError is coming from
I think I got @rose schooner and @feral island 's thoughts earlier (for a version of this that potentially only changed the grammar), but I'm taking their advice and trying to actually make a dedicated AST node and its own bytecode generation
Posted here #1173063440609333269 message in case it's better to have the discussion there
Oh never mind I got it! I just had my comparator flipped for the a is None check. All good
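For anyone hitting the same thing: the oparg of IS_OP is an "invert" flag, so the 1 in the ADDOP_I call above means `is not`, which would be consistent with the flipped check. A small sketch to confirm it from Python (is_op_args is a made-up helper name):

```python
import dis

def is_op_args(expr: str) -> list:
    """Return the oparg of every IS_OP instruction compiled from expr."""
    code = compile(expr, "<is-op>", "eval")
    return [ins.arg for ins in dis.get_instructions(code) if ins.opname == "IS_OP"]

# oparg 0 is a plain `is`; oparg 1 inverts it into `is not`.
plain_is = is_op_args("x is y")
inverted_is = is_op_args("x is not y")
print(plain_is, inverted_is)
```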
that's a more well documented change than what i did
is this module just empowering the gc built-in library or actually used along with reference counting as a garbage collector in CPython?
https://github.com/python/cpython/blob/main/Modules/gcmodule.c
that's the implementation of the gc module
where is the garbage collector implemented in CPython then?
oh, I'm wrong - that's the implementation of the gc module, and the implementation of the garbage collector itself - they're both in that file
Modules/gcmodule.c line 1198
```c
gc_collect_main(PyThreadState *tstate, int generation,
```
The guy who implemented the module says "Reference counting is still used. The garbage collector only frees the memory that the reference counting does not (ie. reference cycles)." --
So I'm pretty sure that the gc module acts as a garbage collector over the whole cpython and not just as an independent module.
yeah, that's what I meant by my correction - it does contain the implementation of the garbage collector itself
oh okay, thank you.
is it possible to patch something in gc module and use other gc algorithm?
rc is baked into python pretty deeply, but gc works on top of that, so i think it might be possible to change algo
yes, but it's very fiddly code
nogil will change how gc works
Now I'm giving a go at the same idea but for subscripts (e.g. a?[b] is equivalent to None if a is None else a[b]. https://github.com/nishu-builder/cpython/pull/3
Though I'm realizing that this totally messes with parentheses balancing! This would mostly be fine, I think, if ?[ was one character, since lexer.c seems to make the assumption that each token is one character
Anyone have a sense of how to address this?
specially handle ?[ in the C tokenizer files or just parse ? and [ as separate tokens
also a?[b] *= 4 checking for a would be a nice feature albeit it'll take some more thinking to implement
rc?
what is the algorithm that CPython is using (for gc)? "mark-and-sweep"?
If this is the starting point which implements the whole garbage collection into CPython, why isn't it used in the codebase other than its original file?
well: https://github.com/python/cpython/blob/main/Modules/gcmodule.c#L1198
Modules/gcmodule.c line 1198
```c
gc_collect_main(PyThreadState *tstate, int generation,
```
reference counting
not just reference counting. That is why the GC module was utilized no?
It doesn't use mark-and-sweep. It uses a bespoke garbage collection algorithm that's implemented in terms of reference counts
is that the same as "tracing garbage collection"?
No
rc = refcounting
it is independent from gc, current gc relies on the fact that rc is happening
i know python uses this distinction but in general, RC wouldn't be considered "independent" from GC; it's just part of the overall GC strategy
i think about it in a different way: GC is a nice addition to RC-based memory deallocation strategy
in python 99% of all deallocations happen because rc dropped to 0
and only a tiny number of objects participate in refcycles
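A small CPython-specific sketch of that split, using weak references to observe when objects actually go away (Node is just a placeholder class):

```python
import gc
import weakref

class Node:
    pass

gc.disable()  # keep the cycle collector from running on its own

# No cycle: dropping the last reference frees the object immediately.
plain = Node()
plain_ref = weakref.ref(plain)
del plain
freed_by_refcount = plain_ref() is None
print("freed by refcounting alone:", freed_by_refcount)

# A reference cycle keeps both refcounts above zero even after `del`.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_ref = weakref.ref(a)
del a, b
alive_after_del = cycle_ref() is not None
print("cycle still alive after del:", alive_after_del)

# The cycle collector is what finally finds and frees the pair.
gc.collect()
freed_by_collector = cycle_ref() is None
print("freed after gc.collect():", freed_by_collector)

gc.enable()
```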
wondering what a bespoke garbage collection is then.
I might just go through the source and see what's happening, can't really go further
the devguide explains garbage collection in detail
The docs I linked above are the best explanation that I know of. Definitely start there before reading the source
It's not a tracing GC, because a tracing GC assumes the existence of some roots that it can trace out from, and CPython's GC doesn't
__main__ is the root that references everything else
(and there are other roots: sys, builtins, cached ints)
It does not work that way
then how does it know which part of graph should be deleted and which should not?
The doc I linked above explains that 😉
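As I understand the devguide's description, the trick is to start from the reference counts instead of from roots: subtract every reference that comes from another tracked object, and whatever still has a positive count must be referenced from outside. Below is a toy model of that idea in pure Python. The function, the object names and the refcounts are all made up for illustration, and this is not CPython's actual code.

```python
def find_unreachable(refcounts, edges):
    """Toy cycle detection driven by reference counts.

    refcounts: object name -> total reference count
    edges:     object name -> names of tracked objects it references
    """
    # Copy the refcounts, then subtract references held by tracked objects.
    gc_refs = dict(refcounts)
    for targets in edges.values():
        for target in targets:
            gc_refs[target] -= 1

    # A positive remainder means something outside the tracked set holds a
    # reference; everything those objects can reach is also alive.
    reachable = set()
    stack = [name for name, count in gc_refs.items() if count > 0]
    while stack:
        name = stack.pop()
        if name not in reachable:
            reachable.add(name)
            stack.extend(edges.get(name, ()))
    return set(refcounts) - reachable

# a <-> b is an orphaned cycle; c is held by an outside variable and refers to d.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"]}
garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```

Note that nothing here needs to know what the roots are, which is the point being made above.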
I mean I get that but it's also just good to use terms in their standard usage
garbage collection is a much broader thing than "the cycle detector for a reference counted language"
CPython tends to say that it's got two different GCs, a naive reference counting one and an optional cycle collecting one
that's a pretty weird take too, I think; if someone said a language had two different GC's I would expect that they both work correctly on their own and you choose which to use
if you only use one of python's GC's, then it... doesn't really have correctly working GC in the sense that almost anyone would expect
they're practically speaking two components of python's GC
!zen 13
Although that way may not be obvious at first unless you're Dutch.
😄
it is interesting to imagine disabling python's cycle collector and re-imagining it as a swift like, ARC language
encourage the use of finalizers!
no more context managers
You would use the cycle collector instead as a diagnostic tool to tell you when you'd accidentally created cycles
this seems like a potentially fun wacky project and I'm confident that nobody is actually trying to do this in prod (or at least I hope not)
My takeaway is that we should direct all our further questions on this to @feral island
he is the former prime minister after all
hello! all folks here are python professionals?
Sweet, did that
I originally had no grand delusions of getting my code introduced into Python, but now that I've written the changes I can't help but wonder if there's a shot
I posted this https://discuss.python.org/t/new-syntax-for-safe-attribute-and-safe-subscript-access/38643/2 and someone responded noting that this PEP exists, which has some real overlap: https://discuss.python.org/t/new-syntax-for-safe-attribute-and-safe-subscript-access/38643/2
I'd love to work on that PEP, but I'm not sure (and can't find if I scan the github PRs/etc) if this is already done or claimed. How can I find that out and claim it if it isn't done?
Are you aware of PEP 505 PEP 505 – None-aware operators | peps.python.org ? If you’re looking for an easy way to find prior discussions on this subject, that would be something to search for.
I see, it looks like PEP 505 has been deferred
Well @rose schooner or @feral island (or others; pinging you two since you both provided me help over the last few days) if either of you has interest in being the PEP Delegate for these changes, I'd really appreciate it. No pressure at all!
And is it appropriate for me to submit PRs for a Deferred PEP (if I have the understanding that it may not get merged)?
Personally, speaking as not-a-core-dev, I think it would be better to submit a PR for it than not. Sure, it'll probably get closed, but it might provide a starting point for someone else to pick this up if the PEP ever gets un-deferred
you also can mark PR as draft and explicitly state that it is an implementation for existing PEP
Sweet thanks
cool
i have no clue how to answer those questions about the PEP since i'm not a core dev nor a PEP writer but would you perhaps wanna see how i implemented it?
You mean a PEP sponsor, right? A PEP delegate is someone who the SC appoints to accept or reject a PEP for them.
But I don't think that's applicable here either, unless you want to create a new PEP.
I don't want this change in Python so I won't sponsor a PEP for it. It's true that PEP 505 exists but was deferred. I don't know if or when it will ever get revived; since the PEP is so old a new PEP for the idea might make more sense at this point.
For relative imports, does anyone have a go-to resource that explains how the leading-dot notation works, and how to run code involving relative imports with -m?
the first part is easy. if the module name between from and import starts with a leading dot, then:
- raise an error if __package__ isn't set ("attempted relative import with no known parent package")
- split __package__ apart on dots
- count how many leading dots there are in the import
- raise an error if there aren't at least that many components in the split __package__ ("attempted relative import beyond top-level package")
- remove one fewer than that many components from the end of the split __package__
- rejoin whatever's left with dots, and prepend it to whatever came after the import's leading dots with a dot between them
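The steps above can be sketched in plain Python. This is only an illustration of the described procedure, not CPython's actual import code; `resolve_relative` and its parameters are hypothetical names, where `level` is the number of leading dots and `module_tail` is whatever follows them.

```python
def resolve_relative(package, module_tail, level):
    """Resolve `from <level dots><module_tail> import ...` against `package`.

    Hypothetical helper that mirrors the steps described above.
    """
    if not package:
        raise ImportError("attempted relative import with no known parent package")
    parts = package.split(".")
    if len(parts) < level:
        raise ImportError("attempted relative import beyond top-level package")
    # Drop (level - 1) components from the end of the split package name.
    base = ".".join(parts[: len(parts) - (level - 1)])
    # Prepend what is left to whatever followed the leading dots.
    return f"{base}.{module_tail}" if module_tail else base


print(resolve_relative("foo.bar.baz", "", 1))           # foo.bar.baz
print(resolve_relative("foo.bar.baz", "a.b", 1))        # foo.bar.baz.a.b
print(resolve_relative("foo.bar.baz", "something", 3))  # foo.something
```

The standard library's `importlib.util.resolve_name(name, package)` does the same kind of resolution if you'd rather not hand-roll it.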
so if __package__ is foo.bar.baz then:
```
from . import bang  # from foo.bar.baz import bang
from .a.b import bang  # from foo.bar.baz.a.b import bang
from .. import bang  # from foo.bar import bang
from ...something import bang  # from foo.something import bang
from .... import bang  # error ("attempted relative import beyond top-level package")
```
and for part 2... If you run python -m foo.bar.baz, __package__ will be set to something different depending on whether foo.bar.baz is a package or a non-package module. If it's a package, then the __main__.py will be run and __package__ will be set to "foo.bar.baz". If it's a non-package module, then baz.py will be run and __package__ will be set to "foo.bar" -- either way, relative imports will be resolved as explained above, based on the value of __package__
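If you want to check the two `-m` cases yourself, here is a rough sketch that builds a throwaway package tree and runs it. The layout is made up for the demo: `foo/bar/baz/` is a package with a `__main__.py`, and `foo/bar/mod.py` (a hypothetical sibling) stands in for the non-package module case.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_tree(root: Path) -> None:
    """Create foo/bar/baz/ as a package plus a plain module foo/bar/mod.py."""
    baz = root / "foo" / "bar" / "baz"
    baz.mkdir(parents=True)
    for directory in (root / "foo", root / "foo" / "bar", baz):
        (directory / "__init__.py").write_text("")
    # Both entry points just report the __package__ they were run with.
    (baz / "__main__.py").write_text("print(__package__)\n")
    (root / "foo" / "bar" / "mod.py").write_text("print(__package__)\n")


def package_seen_by(target: str, root: Path) -> str:
    """Run `python -m target` from `root` and return the printed __package__."""
    env = dict(os.environ, PYTHONPATH=str(root))
    result = subprocess.run(
        [sys.executable, "-m", target],
        cwd=root, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_tree(root)
    as_package = package_seen_by("foo.bar.baz", root)  # runs baz/__main__.py
    as_module = package_seen_by("foo.bar.mod", root)   # runs mod.py

print(as_package)
print(as_module)
```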
!rule 7 ad
no cross-channel spamming either
6. Do not post unapproved advertising.
7. Keep discussions relevant to the channel topic. Each channel's description tells you the topic.
I just found a limitation with inspect that I wasn't aware of...
You can't get the sourcecode (using inspect) from a function defined within a shell session or an exec call... This will fail:
code = """
import inspect
def my_fun(n):
    return n ** 2
inspect.getsource(my_fun)
"""
exec(code)
The same if you define the function in a shell session.
It kinda makes sense, because the findsource tries to read the lines from the source file[0], which doesn't exist.
But i'm wondering if this limitation could be somehow overcome? 🤔
[0] https://github.com/python/cpython/blob/0ee2d77331f2362fcaab20cc678530b18e467e3c/Lib/inspect.py#L1130
it might be overcome by that change where they put the REPL/shell sessions in temporary files
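Separately from that change, one workaround that seems to work today is to register the source text in `linecache` under a made-up filename before exec'ing it, so `inspect` has something to read. A minimal sketch, assuming it's acceptable to write to `linecache.cache` directly (that dict is an implementation detail); `exec_with_source` and the `<exec-demo>` filename are hypothetical:

```python
import inspect
import linecache


def exec_with_source(source, namespace, filename):
    """Exec `source` while keeping its text discoverable by inspect.

    Hypothetical helper: the cache entry is (size, mtime, lines, fullname);
    an mtime of None keeps linecache.checkcache() from discarding it.
    """
    linecache.cache[filename] = (
        len(source),
        None,
        source.splitlines(keepends=True),
        filename,
    )
    # Compile with the same filename so the function's code object points at it.
    exec(compile(source, filename, "exec"), namespace)


source = "def my_fun(n):\n    return n ** 2\n"
namespace = {}
exec_with_source(source, namespace, "<exec-demo>")

recovered = inspect.getsource(namespace["my_fun"])
print(recovered)
```

Each exec'd snippet needs its own filename, otherwise a later entry overwrites the earlier one and line numbers stop matching.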
I think I just found a bug. Can anyone confirm that this is in fact unintentional?
class ConstructsNone(BaseException):
    @classmethod
    def __new__(*args, **kwargs): return None
raise Exception("Printing this exception raises an exception :3") from ConstructsNone
In Python/ceval.c, in do_raise(), when you raise an object, CPython checks if it's an exception type, and if it is, constructs it by calling it with no arguments. Then it checks to make sure that what was constructed is in fact an exception. Then it does the same thing for the exception's cause: if it's a type, it constructs the cause by calling it with no arguments. But for the cause, it doesn't check that the result of the call is in fact an exception; it just stores whatever was returned.
Then, when the interpreter goes to print the cause, it expects it to be an exception. This leads to yet another exception being raised, telling you that the cause is the wrong type.
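To make the missing check concrete, here is a rough Python-level model of how the cause could be validated. This is not the C code in do_raise() and not necessarily what the eventual patch does; `normalize_cause` is a hypothetical name, the first error message is made up for the sketch, and the class uses a plain `__new__` rather than the classmethod form to keep it short.

```python
def normalize_cause(cause):
    """Hypothetical model of preparing the `cause` in `raise exc from cause`."""
    if cause is None:
        return None
    if isinstance(cause, type) and issubclass(cause, BaseException):
        instance = cause()  # exception types are called with no arguments
        # The check described as missing: the call must yield an exception.
        if not isinstance(instance, BaseException):
            raise TypeError(
                f"calling {cause!r} returned {type(instance).__name__}, "
                "not an exception instance"
            )
        return instance
    if isinstance(cause, BaseException):
        return cause
    raise TypeError("exception causes must derive from BaseException")


class ConstructsNone(BaseException):
    def __new__(cls, *args, **kwargs):
        return None  # constructing the "exception" yields None


print(type(normalize_cause(ValueError)).__name__)
try:
    normalize_cause(ConstructsNone)
except TypeError as exc:
    print("rejected:", exc)
```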
!e
class ConstructsNone(BaseException):
    @classmethod
    def __new__(*args, **kwargs): return None
raise Exception("Printing this exception raises an exception :3") from ConstructsNone
@grave jolt :x: Your 3.12 eval job has completed with return code 1.
001 | TypeError: print_exception(): Exception expected for value, NoneType found
002 |
003 | The above exception was the direct cause of the following exception:
004 |
005 | Traceback (most recent call last):
006 | File "/home/main.py", line 5, in <module>
007 | raise Exception("Printing this exception raises an exception :3") from ConstructsNone
008 | Exception: Printing this exception raises an exception :3
That does sound like a bug
I'll submit a PR then :3
@feral island https://github.com/python/cpython/pull/112216
:3
cool find!
Fancy seeing you here ❤️
I felt the same way 😛 That plus the bug being genuinely cool made me want to reach out
Added the test, and made sure it worked with both a patched and an unpatched version.
Hi
Is this feature implemented in CPython already or not yet? https://peps.python.org/pep-0703/#reference-counting
https://peps.python.org/pep-0703/#garbage-collection-cycle-collection
I think Sam has a PR open for some refcounting changes but it's not merged yet
Could you share the pull request, please?
oh actually biased refcounting was merged, https://github.com/python/cpython/pull/110764
based refcounting
Is there anything left to do -- in terms of implementing the pep?
all the stuff in https://github.com/python/cpython/issues/108219 that's not closed yet
i.e., a lot
I would also highly appreciate it if someone could explain this issue to me:
https://github.com/python/cpython/issues/84752
Eric said this is still an active issue and needs to be resolved.
I suppose they are open to new contributions/contributors.
