#internals-and-peps
was just typing that, yes
The mypy Playground is a web service that receives a Python program with type hints, runs mypy inside a sandbox, then returns the output.
the P and R cannot belong to TimerPrinter -- they're function's type args
When you had class TimerPrinter[**P, R], it means that the entire class is generic.
When you instantiate e.g.: tp = TimerPrinter("spam"), what type is tp? I don't know, and mypy also doesn't know (it doesn't do enough bidirectional inference for that)
(btw, we also have #type-hinting)
Thanks @grave jolt, I had erred in assuming that, since it was a method, I should put the type parameters on the parent class.
Can confirm this fixes the issue at hand, thanks both!
Both are possible configurations, but they have different meaning
For example, if list had a map method, this would be the signature ```py
class list[A]:
    def map[B](self, func: Callable[[A], B], /) -> list[B]:
        ...
```
So class-level type parameters are for data held in the class, rather than (only) passing through it? Makes sense.
something like that, yes
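A small runnable sketch of that distinction, written with the pre-3.12 `TypeVar` spelling so it also runs on older interpreters. `Box` is a hypothetical container made up for illustration, not something from the thread: `A` belongs to the class because it describes the stored item, while `B` belongs only to `map` because it is decided per call.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")  # class-level: the type of the data the box holds
B = TypeVar("B")  # method-level: chosen afresh by each call to map()


class Box(Generic[A]):
    """Hypothetical one-item container used only for illustration."""

    def __init__(self, item: A) -> None:
        self.item = item

    def map(self, func: Callable[[A], B]) -> "Box[B]":
        # B is solved from func's return type at each call site.
        return Box(func(self.item))


number_box = Box(21)                        # Box[int]
doubled = number_box.map(lambda n: n * 2)   # Box[int]
label = doubled.map(str)                    # Box[str]
```

A type checker can name the type of `number_box` from the constructor argument alone, which is what was not possible when `P` and `R` sat on the class.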
hey all
why don't we have a logical "xor" in the python syntax?
not important enough?
of course you can make it work with and and or, but it's more verbose
You can also use != for booleans. Indeed it’s pretty uncommon to need, also you can’t have the short circuiting behaviour of the other logical ops, so it wouldn’t add much?
what would it output for non-booleans?
bool(a) is not bool(b), or the above, is already an option too
That’s another good point. True xor True gives False so this would have to either return bools or do ~a, would be weird and inconsistent with the other logical ops.
yeah I've been using != with parentheses
i also know about operator.xor in the stdlib
or even, return bools only in the (truthy, truthy) case 🙀
I don't think returning the function's arguments (like and and or do) makes sense for xor because it can't do any short circuiting.
Am I missing something? Yes we do. ^
>>> True ^ True
False
>>> True ^ False
True
>>> False ^ False
False
That's not logical xor
Only in the sense that it can consume more than just booleans and spit out an answer. But otherwise... why is it not sufficient?
logical xor would be False for 1 xor 2 the same way 1 and 2 is not the same as bool(1 & 2)
Yea, fair enough I guess
exactly, ^ is bitwise xor
but i mean, the C language doesn't have a logical xor either.
has logical and && and bitwise and &, logical or || and bitwise or |
but xor only has the bitwise operator ^
after some googling it seems there's no benefit in logical xor because there's no short-circuiting with the xor operation - of course, because we need to check both values in order to get the xor result
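For reference, a minimal sketch of the difference discussed above. `logical_xor` is a hypothetical helper name, not a built-in; it simply compares the truthiness of both operands, which is the `!=` trick mentioned earlier.

```python
import operator


def logical_xor(a, b):
    """Hypothetical helper: true when exactly one operand is truthy."""
    # Both operands must always be evaluated, so nothing can short-circuit.
    return bool(a) != bool(b)


bitwise = operator.xor(1, 2)     # same as 1 ^ 2; works on the integer bits
logical = logical_xor(1, 2)      # both operands are truthy
mixed = logical_xor("spam", "")  # exactly one operand is truthy
```

Here `bitwise` is 3, `logical` is `False`, and `mixed` is `True`. Unlike `and` and `or`, the helper always returns a bool rather than one of its arguments.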
I was just curious, thanks all =)
fun fact, this is deprecated but ~True is -2 lol
>>> ~True
<python-input-0>:1: DeprecationWarning: Bitwise inversion '~' on bool is deprecated and will be removed in Python 3.16. This returns the bitwise inversion of the underlying int object and is usually not what you expect from negating a bool. Use the 'not' operator for boolean negation or ~int(x) if you really want the bitwise inversion of the underlying int.
-2
>>> ~False
<python-input-1>:1: DeprecationWarning: Bitwise inversion '~' on bool is deprecated and will be removed in Python 3.16. This returns the bitwise inversion of the underlying int object and is usually not what you expect from negating a bool. Use the 'not' operator for boolean negation or ~int(x) if you really want the bitwise inversion of the underlying int.
-1
>>> import numpy as np
>>> ~np.True_
np.False_
>>> ~np.False_
np.True_
~True being truthy is a big footgun
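A short sketch of the footgun, using `~int(flag)` (the spelling the deprecation message recommends) so that it runs without the warning:

```python
flag = True

negated = not flag      # boolean negation
inverted = ~int(flag)   # bitwise inversion of the underlying int 1

# -2 is non-zero, so it still counts as true in a condition.
inverted_is_truthy = bool(inverted)
```

So `if ~flag:` takes the branch for both `True` (giving -2) and `False` (giving -1), which is rarely what was meant.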
do you think this is worth fixing? i guess not, but maybe...
>>> import os
... os.posix_spawn("/bin/echo", ["echo"], [])
...
Traceback (most recent call last):
File "<python-input-2>", line 2, in <module>
os.posix_spawn("/bin/echo", ["echo"], [])
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'keys'
it's the known PyMapping_Check caveat, it doesn't really bother that much most of the time
i guess in this case posix_spawn should be as fast as possible, so anything other than PyMapping_Check (like, checking for collections.abc.Mapping?) would likely be worse
that function is intentionally low level, even if other functions in cpython use *_Check
Might be reasonable to catch the AttributeError and raise a TypeError instead
That shouldn't make anything noticeably slower, since it's only on the error path
!e
file = open("banana.py", "w")
file.write("""
weird = '''
Multi-line string that contains a stinky surrogate character: \\udead
'''
""".lstrip())
file.close()
import banana
print(len(banana.weird))
:white_check_mark: Your 3.14 eval job has completed with return code 0.
65
!e
file = open("banana.py", "w")
file.write("""
'''
Module-level docstring that contains a stinky surrogate character: \\udead
'''
weird = "totally normal"
""".lstrip())
file.close()
import banana
print(len(banana.weird))
:x: Your 3.14 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 |   File "/home/main.py", line 11, in <module>
003 |     import banana
004 | UnicodeEncodeError: 'utf-8' codec can't encode character '\udead' in position 68: surrogates not allowed
Which of these is a bug?
Or maybe it's documented somewhere that module level docstrings cannot have surrogates?
(Where are module-level docstrings even documented?)
it seems to only happen in 3.13+, this works fine in 3.12
looks like it's this PR https://github.com/python/cpython/issues/81283
I opened an issue: https://github.com/python/cpython/issues/142411
Related, this seems to be out of date: https://docs.python.org/3/tutorial/controlflow.html#documentation-strings
The Python parser does not strip indentation from multi-line string literals in Python, so tools that process documentation have to strip indentation if desired.
Even though it looks like the example after was updated to show the indentation being removed.
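For tools that need the same result regardless of whether the compiler already stripped the indentation, a minimal sketch using `inspect.cleandoc`. The `greet` function is only an illustration.

```python
import inspect


def greet():
    """Say hello.

    The continuation lines are indented in the source file.
    """


raw = greet.__doc__               # leading spaces may or may not survive, depending on version
cleaned = inspect.cleandoc(raw)   # normalizes either form
```

`cleaned` is the same on versions that keep the indentation and on versions that remove it, so code written this way does not need to know which behaviour it is running under.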
good catch
Why does the Python CLA bot say it can "act on my behalf"? 🤨
I think that is shown for all GitHub apps, and should be read as "act on my behalf, under the constraints of the previous permissions", which in many cases isn't really acting on your behalf at all.
Not sure if that's the right channel, I stumbled across that link https://arctrix.com/nas/python/gc/, but it's very old, is it still how Python internals work?
or is there now mark and sweep?
https://github.com/zpoint/CPython-Internals/blob/master/Interpreter/gc/gc.md this seems to be more recent (although 5 years old at this point) and still only mentions a copied reference count that we decrement to detect unreachable cycles (so no mark and sweep)
is there any more recent resource validating that?
I remembered it being in the devguide, so had a look there, and saw it linked to those docs now (https://devguide.python.org/internals/garbage-collector/)
some other nice docs there that I didn't realise existed, cool
no trust me google is worthless when it comes to github
especially if you're trying to search in specific branches
ah yeah makes sense, it's not only me then
i always have trouble finding .md i remember seeing in the past
there’s been a bit of an effort to improve the dev docs recently
i unfortunately started a refactor of the devguide that is hanging out there confusing people 🙁
I wonder if someone could review this docs-only PR 👀 https://github.com/python/cpython/pull/142413
Not sure if completely removing the unindent paragraph from the tutorial is a good idea. But since the docstrings are now automatically unindented, it seems inappropriate to put these verbose rules in the tutorial (it's probably only interesting if you're doing some metaprogramming and/or making python tooling, so you're way out of the tutorial)
good point about the indentation of docstrings. For compile(), i wouldn't list all the exceptions explicitly. We generally don't in the docs, and the versionadded paragraphs cover it.
(I can put that on the PR)
I think the change is unintentional (but with very fringe impact), so we're kinda trying to find a workaround in the docs... Maybe the "ValueError" clause could be amended to say "if the source contains null bytes or <vague but definitive description of other exceptional conditions that are not syntax errors>"?
(since UnicodeDecodeError happens to be a ValueError)
oh, if it's still a ValueError, then leave it at that. I'd even take out the versionadded clause about UnicodeDecodeError.
The doc currently says
This function raises SyntaxError if the compiled source is invalid, and ValueError if the source contains null bytes.
but now ValueError can appear in some other conditions, that's what I meant
Sure, make that sentence broader
Like "if the source contains null bytes or any docstrings cannot be encoded as UTF-8"? Seems a bit too specific
i think what we need to convey is that if you're calling compile and expect it to sometimes fail, you should really catch both SyntaxError and ValueError
"raises SyntaxError if the source is invalid Python, or ValueError if it cannot be read as source"
🤔 cannot be read as source?
that feels weird, it's valid Python, but compile can't understand it?
The core issue is that it's not defined anywhere that this is invalid (even if it probably doesn't make sense for it to be valid), it's just an implementation quirk
it's not valid: it has null bytes, or surrogates.
!e
This case is different. Originally, ValueError would be raised if the source contained actual null bytes ```py
compile(b"\x00", "test.py", "exec")
``` (well... it seems to raise `SyntaxError` now? lol)
:x: Your 3.14 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 |   File "/home/main.py", line 1, in <module>
003 |     compile(b"\x00", "test.py", "exec")
004 |     ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
005 | SyntaxError: source code string cannot contain null bytes
!e And for surrogates, having a surrogate in the source code is understandably invalid (seems like this case of raising ValueError was not documented either, would be a good change too) ```py
compile("\udead", "test.py", "exec")
```
:x: Your 3.14 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 |   File "/home/main.py", line 1, in <module>
003 |     compile("\udead", "test.py", "exec")
004 |     ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
005 | UnicodeEncodeError: 'utf-8' codec can't encode character '\udead' in position 0: surrogates not allowed
!e
My issue is dealing with the case where the source is perfectly fine UTF-8 bytes (or a string that's encodable as UTF-8), but a docstring contains a representation of a surrogate code point ```py
source = """
def fn(): "\\udead"
"""
print(source.encode("utf-8")) # source is valid as utf-8
compile(source, "test.py", "exec")
``` (also it reports a position of 0 without saying where in the _source_ it is)
:x: Your 3.14 eval job has completed with return code 1.
001 | b'\ndef fn(): "\\udead"\n'
002 | Traceback (most recent call last):
003 |   File "/home/main.py", line 5, in <module>
004 |     compile(source, "test.py", "exec")
005 |     ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
006 | UnicodeEncodeError: 'utf-8' codec can't encode character '\udead' in position 0: surrogates not allowed
i don't think we have to be really specific about what exceptions get raised when.
the only reason to mention SyntaxError is that it's an unusual exception.
so just say "this function can raise SyntaxError or ValueError"?
I'm trying to see this from the perspective of a caller. I want to call this function, what exception should I potentially catch? (i.e.: what are the failure modes)
this is a good perspective. If the docs say those two exceptions, what code will you write? How would that code be different than except Exception: ?
well, if I write except Exception:, it will also catch a typo like cmopile, or passing the wrong type, like compile(42, "answer.py", "exec")
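From the caller's side, a minimal sketch of what catching both exceptions might look like. `try_compile` is a hypothetical helper, not part of any library; the point is only that a narrow `except` still lets programming mistakes such as a wrong argument type surface.

```python
def try_compile(source, filename="<cell>"):
    """Hypothetical helper: return a code object, or None if compile() rejects the source."""
    try:
        return compile(source, filename, "exec")
    except (SyntaxError, ValueError):
        # UnicodeEncodeError and UnicodeDecodeError are ValueError subclasses,
        # so the surrogate-in-docstring case lands here as well.
        return None


ok = try_compile("answer = 42")
bad_syntax = try_compile("def")
null_char = try_compile("answer = \x00")
```

`bad_syntax` and `null_char` both come back as `None`, while `try_compile(42)` still raises `TypeError`, which `except Exception:` would have swallowed.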
For some background, I found about this behaviour because of this IPython code: https://github.com/ipython/ipython/blob/b3edf5e417f209fb9407b9e2ce071aaf2e80b563/IPython/core/async_helpers.py#L149-L155
IPython/core/async_helpers.py lines 149 to 155
```py
try:
    code = compile(
        cell, "<>", "exec", flags=getattr(ast, "PyCF_ALLOW_TOP_LEVEL_AWAIT", 0x0)
    )
    return inspect.CO_COROUTINE & code.co_flags == inspect.CO_COROUTINE
except (SyntaxError, MemoryError):
    return False
```
The authors probably read the docs for compile, concluded that the cell can't contain null bytes, so there's no reason to catch ValueError
MemoryError is the odd thing there, but repls are weird.
So if I understand correctly: it would be better if the docs just said This function raises SyntaxError if the compiled source is invalid. and didn't mention any other exceptions?
and IPython should change the except block to be except Exception:?
i think mentioning syntaxerror and valueerror is fine. I'm torn because most functions don't detail all the exceptions they can raise and why.
So "this function will raise SyntaxError or ValueError if the source is invalid" or something like that?
maybe the docs should stay as is, and this new corner case should just be a fun easter egg to discover
(and the fact that null bytes do not in fact raise ValueError)
that'd be the best option, because I can just close my PR 😎
no, the PR has docstring indentation, which is good (should have been two PRs)
I think I'm slowly uncovering some kind of pandora's box
!e
```py
import ast
tree = ast.parse("banana.answer = 42")
tree.body[0].targets[0].value.id = "ban\x00ana"
c = compile(tree, "<string>", "exec")
exec(c)
``` i'm trolling python at this point
:x: Your 3.14 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 |   File "/home/main.py", line 6, in <module>
003 |     exec(c)
004 |     ~~~~^^^
005 |   File "<string>", line 1, in <module>
006 | NameError: name 'ban' is not defined
!e
This is even more fun ```py
import ast
tree = ast.parse("banana.answer = 42")
tree.body[0].targets[0].value.id = "ban\udcffana"
c = compile(tree, "<string>", "exec")
exec(c)
``` bizarrely, the variable name is decided to be ban\udcff (from checking c.co_names)
:x: Your 3.14 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 |   File "/home/main.py", line 6, in <module>
003 |     exec(c)
004 |     ~~~~^^^
005 |   File "<string>", line 1, in <module>
006 | UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 3: surrogates not allowed
so it seems like UnicodeDecodeError may appear at various places when you might not expect it
No one expects the Spanish inquisition UnicodeDecodeError
!otn a nobody-expects-the-UnicodeDecodeError
:ok_hand: Added nobody-expects-the-𝖴nicode𝖣ecode𝖤rror to the names list.
it's a UnicodeEncodeError that's happening here, not a UnicodeDecodeError (and tbf, UnicodeEncodeError is way more unexpected)
!otn d nobody-expects-the-UnicodeDecodeError
:ok_hand: Removed nobody-expects-the-𝖴nicode𝖣ecode𝖤rror from the names list.
!otn a nobody-expects-the-UnicodeEncodeError
:ok_hand: Added nobody-expects-the-𝖴nicode𝖤ncode𝖤rror to the names list.
I wonder if this should be reframed: perhaps the interpreter should catch that UnicodeEncodeError itself and raise a SyntaxError instead...
I think the behaviour is an implementation accident (i.e.: a regression), especially since an error is not raised in -OO.
But Guido said it's a feature and not a bug, so there isn't much of a hill to die on
I think it'd be reasonable to go with:
This function will raise a SyntaxError if the source is syntactically invalid. It will raise a ValueError if the source is syntactically valid but cannot be compiled.
It's weird that there are cases where syntactically valid code can't be compiled, but it is apparently the case...
fun side fact: you can put the ASCII bytes #coding:utf-16 at the start of a Python file, and Python will try to decode the file as UTF-16 and (obviously) fail
presumably with a UnicodeDecodeError in that case
no, SyntaxError
huh.... I bet that internally is catching a UnicodeDecodeError and raising a SyntaxError instead. That's even more argument in favor of catching the UnicodeEncodeError in compile and raising a SyntaxError instead
you can make that argument based just on consistency, based on that...
Yes, that's what I think should've happened. But is it worth changing?
probably, yeah
at least, I think so 🙂
it's very weird to me that the alternative is documenting that compile() sometimes fails to compile syntactically valid Python code.
that's always felt weird to me though I've never tried. I guess coding comments just only work with ASCII-compatible encodings?
the alternative I could think of is to read to the first (ASCII) newline and decode the rest according to the given encoding
but obviously that would give you a file that few other tools would be able to read
I wish we could have deprecated and removed the whole # coding: thing with the move to Python 3 (and always assume UTF-8), but I guess Python 3k was a bit too early for that..
given that compile stopped raising ValueError for null bytes in source in python 3.11 and nobody complained, maybe nobody actually cares about the precise interface of compile, and users should just assume to always catch SyntaxError and ValueError when calling compile?
what users really need to know is what exceptions it can raise that can be caused by the source string, versus which ones are not user-controlled. If the user can provide a string which provokes a ValueError, I do think that needs to be documented.
This is the latest change I'm proposing:
https://github.com/python/cpython/pull/142413#discussion_r2606990406
This function raises :exc:`SyntaxError` or :exc:`ValueError` if the compiled source is invalid.
It was already documented that ValueError can be raised if the source contained null bytes (which isn't true since python 3.11, but whatever)
It is a bit unfortunate that passing an incorrect mode (other than "eval", "exec", "single") or incorrect flags will also raise a ValueError... but that's standard library exceptions for you, I guess
Looks like in theory it can also raise OverflowError but perhaps not in any realistic situation
Like if name mangling produces a name that is over PY_SSIZE_T_MAX long
i'd say that's covered by this clause
MemoryError, too.
I think that's fine. If it prompts a discussion amongst the core devs about whether code that raises a ValueError is really syntactically valid or not, even better.
but even if not, it tells end users what they need to know
I think it's very weird to say that a certain piece of code is syntactically valid even though it's impossible to compile it or include it in a Python module, and I'd hope most core devs would agree with that position, heh
the alternative is kind of saying that the code is syntactically valid with -OO but not without, which seems way weirder to me... surely the syntax of the language doesn't depend on compiler optimization level
I'm with you there
Easy. ```py
def is_ex_parrot():
    "check if the parrot is \udead"
```
This program is `ill-formed, no diagnostic required`.
anyway... I think I'll let the core developers figure it out, I don't really care what the final behavior is
I should probably spend my time fixing the IPython crash
right, I understand the example, I'd just call it syntactically invalid. It can be parsed into an AST, but can't be compiled into a Python module (at least, not with default flags)
compiling strings will always ignore the encoding header
compiling bytes won't
there's PyCF_IGNORE_COOKIE but idk if you can pass it via compile, it may not be in the compile bitmask
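A small sketch of the str versus bytes difference described above. The source text and the deliberately wrong `latin-1` cookie are made up for the demonstration.

```python
source = '# coding: latin-1\ntext = "\xe9"\n'   # \xe9 is "é"

# str input: already decoded, so the coding line is treated as a plain comment.
from_str = {}
exec(compile(source, "<str>", "exec"), from_str)

# bytes input: the coding line decides how the bytes are decoded.
# Here the bytes are really UTF-8, but the cookie claims latin-1.
from_bytes = {}
exec(compile(source.encode("utf-8"), "<bytes>", "exec"), from_bytes)
```

`from_str["text"]` is the single character `é`, while `from_bytes["text"]` comes out as the two-character mojibake `Ã©`, because the two UTF-8 bytes were each decoded as latin-1.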
agree! though probably not worth it
i think this might be a similar case to the recent https://github.com/python/cpython/issues/142396
Yeah, that is pretty similar.
Wheelnext pep draft ♥️ https://pep-previews--4740.org.readthedocs.build/pep-0817/
Just bought my PyCon ticket
this will also be my first time in California where I leave the airport.
what you've heard is true: california outside the airports is nicer than california inside the airports.
👀
i wonder if i can abuse my club budget to buy myself a pycon ticket
call it a field trip or something
it's only $100 for students. but I guess that's still a lot if you have zero monies.
i spent it all on food 🥀 🥀
are there like discounts if we buy like a bulk amount of tickets and have a huge group come
if there is, it isn't advertized, so you'd have to ask.
the flights and hotels will certainly cost more than the conference tickets
there are travel grants available if the cost is a hardship, though - but only so many to go around.
Yeah. Looks like my company can only cover the conference ticket and my time this year (though the time is the most important part to me)
if it's not too far from the bay I could probably stay with a friend who lives there
You mean the San Francisco Bay? The conference is in southern California, so it's not realistic to commute there from the Bay Area
~7.5 hour drive, give or take 🙂
😔
if I want to test the cpython jit, do I need to build from source?
answering myself, no, I can do PYTHON_JIT=1 python3.14 -c "import sys; print(sys._jit.is_enabled())"
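The same check as a script, in a form that should also run on interpreters that lack the `sys._jit` namespace. This is a sketch: `sys._jit` is a private API, so the lookup is done defensively instead of assuming it exists.

```python
import sys

# sys._jit is a private namespace that only newer CPython builds provide,
# so look it up defensively instead of assuming it exists.
jit = getattr(sys, "_jit", None)
jit_enabled = jit.is_enabled() if jit is not None else None
print("JIT enabled:", jit_enabled)
```

A result of `None` here means the interpreter does not expose the namespace at all, which is different from the JIT being built but switched off.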
Huh that's weird I thought the compiler was off in the build system by default
Depends on distributor. On macOS and Windows it's on in the build, just off by default. On RHEL and uv it's the same as well
Enjoy! Too bad I just moved out of southern California otherwise I probably would've come
Has this PR never been backported to 3.13? 🤔 a bit confused about the label indicators
https://github.com/python/cpython/pull/134392
Looks like it was not backported
The bot will remove the needs-backport labels once the backport is done
If miss islington is having skill issue and cannot backport automatically, that means I should just create the backport PRs manually?
(unrelated PR)
or is there some bot command magic?
Looks like Łukasz wanted to backport https://github.com/python/cpython/pull/127532 first. But yes, if miss-islington cannot cleanly backport you need to make a PR using the cherry picker CLI as described in this comment https://github.com/python/cpython/pull/134392#issuecomment-2898078744
Backporting the other PR may allow miss-islington to backport the one you linked.
Closes #127529
All good to go. ConnectionAbortedError now continues instead of returning. Improves OpenBSD performance. Full writeup in the issue.
I've left InterruptedError grouped with Bl...
Issue: asyncio.create_unix_server has an off-by-one error concerning the backlog parameter #90871
I'll comment to follow up
yo it worked
I think I got cherry_picker into a weird state with ctrl+c... but the remove-section thing worked ```
$ cherry_picker f6b6a99aa5d63702b8e8101864ae08e615131702 3.14 3.13
🐍 🍒 ⛏
^C
Aborted!
$ cherry_picker --abort
🐍 🍒 ⛏
Run state cherry-picker.state=FETCHING_UPSTREAM in Git config is not known.
Perhaps it has been set by a newer version of cherry-picker. Try upgrading.
Valid states are: BACKPORT_PAUSED, UNSET. If this looks suspicious, raise an issue at https://github.com/python/cherry-picker/issues/new.
As the last resort you can reset the runtime state stored in Git config using the following command: git config --local --remove-section cherry-picker
```
Yeah cherry-picker is not atomic, you may want to reset state if it failed
is cherry_picker an alias for git cherry-pick?
No, it's
!pypi cherry-picker
so that's what they use to get bugfixes into all currently-supported python versions, using the same commits?
Usually it's done automatically by the miss-islington bot. But sometimes it can't figure it out due to merge/rebase conflicts
hi everyone .. I just started to learn python ... so that is as far as I have reached until now
in future I wanna work at backend ... so I saw many videos on YouTube that say .. u can get onto backend from python ... so any advice
@worldly jacinth You're in the wrong channel. See #❓|how-to-get-help or #python-discussion
oh .. sorry.. ok .. thank u
Where is the original message?
you can click on forwarded messages
oh, just the very last line, got it.
@spark magnet Hi It's me
Does a ThreadPoolExecutor in a with block such as this one automatically make the main thread wait for all tasks on other threads to finish before continuing execution on the main thread?
with concurrent.futures.ThreadPoolExecutor(max_workers=24) as executor:
    sid = 0
    for cr in creatures:
        futures[future_count] = executor.submit(single_iteration, sid, cr, equivalent_seconds * 240)
        print(futures[future_count])
        sid += 1
        future_count += 1
# When this point is reached, have all other threads already completed execution?
Yes. Exiting the with block behaves exactly as though shutdown() was called.
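A self-contained sketch that checks this. `slow_square` is a hypothetical stand-in for the real task, and the worker count is arbitrary.

```python
import concurrent.futures
import time


def slow_square(n):
    """Hypothetical stand-in for a real task."""
    time.sleep(0.01)
    return n * n


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(slow_square, n) for n in range(8)]

# The with block has exited, so every submitted task has already finished.
all_done = all(f.done() for f in futures)
results = [f.result() for f in futures]
```

Collecting the futures in a list, as here, also avoids the manual index bookkeeping in the snippet above.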
Thanks!
How do the finalizers run here:
import gc
class A:
    def __init__(self):
        self.other: None | B = None
        self.value = 10
    def __del__(self):
        if self.other is not None:
            print("A.__del__ sees other =", self.other.value)
class B:
    def __init__(self):
        self.other: None | A = None
        self.value = 15
    def __del__(self):
        if self.other is not None:
            print("B.__del__ sees other =", self.other.value)
a = A()
b = B()
a.other = b
b.other = a
del a
del b
gc.collect()
In the python docs (https://github.com/python/cpython/blob/main/InternalDocs/garbage_collector.md), it says "Call the finalizers (tp_finalize slot) and mark the objects as already finalized to avoid calling finalizers twice if the objects are resurrected or if other finalizers have removed the object first."
What is this notion of resurrected object? Does it apply here?
an object is resurrected if it is taken out of the unreachable reference cycle, i.e. made reachable again
since neither a nor b are made reachable to anywhere other than each other, it doesn't apply here
based on a couple of searches on the internet and reading the gc notes, at least
Ah ok I see
But then here, how does one finalizer refer the other object? Or maybe Python first runs the finalizers and only then after calling all the finalizers it frees the allocated memory?
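One way to probe both questions is a small experiment on CPython. This is a sketch, with `Node`, `survivors` and `seen` being hypothetical names: each finalizer records what it can still read from its partner, then resurrects itself by storing a reference in a global list.

```python
import gc

survivors = []   # a global list; appending to it makes an object reachable again
seen = []        # what each finalizer could still read from its partner


class Node:
    """Hypothetical class used only for this experiment."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        seen.append((self.name, self.partner.name))
        survivors.append(self)   # resurrect the object


def make_cycle():
    a, b = Node("a"), Node("b")
    a.partner, b.partner = b, a


make_cycle()
gc.collect()     # finds the unreachable cycle and runs both finalizers
resurrected = sorted(node.name for node in survivors)

survivors.clear()
gc.collect()     # the cycle is unreachable again; finalizers are not re-run
finalizer_calls = len(seen)
```

Both finalizers can read their partner, which is consistent with the collector running all finalizers before it clears or frees anything in the cycle. `finalizer_calls` stays at 2 after the second collection, matching the "already finalized" marking in the quoted document.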
are you asking because you have code using __del__, or because you are trying to understand the intricacies of Python's implementation?
I'm trying to understand the intricacies of the implementation
Out of curiosity and also because I'm interviewing these days I guess although that probably won't come up
ugh, i hope they don't ask about esoteric trivia like this
Nah for sure they won't
It's just I'm preparing by reading some general stuff about GC and then I got curious
I'm the "make sure the candidate actually knows python" person for my department, and I'd never ask about __del__
tbh, i have asked a ladder of questions that start very simple and get esoteric, but I make clear at the beginning that I expect we'll get to "I don't know", and that is fine, I'm just trying to gauge your level of knowledge, none of it is a deal-breaker.
I wasn't trying to say that'd be necessary to know for my interviews
I was genuinely curious
that's cool too, no problem.
Rust programmers re-wrote a portion of the Linux kernel in Rust. That Rust code had a crashing vulnerability in an "unsafe" chunk of code... which Linux is littered with.
More from The Lunduke Journal:
https://lunduke.com/
...what does this have to do with python?
Python is moving to use Rust instead of C
There's a proposal (not even a PEP yet) to add the ability to write accelerator modules for CPython in Rust.
Just a FYI
Rust allows you to invoke unsafe functions, which require you to carefully uphold invariants to avoid undefined behaviour (like you always have to do in C). That has never been a secret.
this isn't an accurate summary. There's a proposal to write some very small things in Rust. "Instead of" sounds like switching the entire implementation.
(Linux also issues CVEs very conservatively; any CPython issue that involves out of bounds access, data races, use after free or similar would each get a separate CVE)
see: https://social.kernel.org/notice/B1JLrtkxEBazCPQHDM
Rust is is not a "silver bullet" that can solve all security problems, but it sure helps out a lot and will cut out huge swatches of Linux kernel vulnerabilities as it gets used more widely in our codebase.
That being said, we just assigned our first CVE for some Rust code in the kernel: https://lore.kernel.org/all/2025121614-CVE-2025-68260-558d@gregkh/ where the offending issue just causes a crash, not the ability to take advantage of the memory corruption, a much better thing overall.
Note the other 159 kernel CVEs issued today for fixes in the C portion of the codebase, so as always, everyone should be upgrading to newer kernels to remain secure overall.
The Lunduke Journal has a strong anti-Rust bias, calling Rust users a "cult". They also chose to deadname and misgender me in their "reporting" on Rust for CPython...
Sounds like they should not be taken seriously.
What a weird, reactionary take. Rust prevents certain bugs by construction, in the same way as Python prevents certain bugs by construction. This code explicitly opted out of the constraints which provide those guarantees, and it has a bug - the developer's reasoning for why it's safe to bypass those constraints was incorrect. That's... not interesting. Someone could have easily implemented exactly the same bug in C instead of in Rust
Never having heard of The Lunduke Journal before, it's immediately obvious that this is clickbait
clickbait from the wrong side of the cultural tracks.
@crisp locust you might want to find better sources.
Rust is fine, but I think that comment above should be stressed to some people who are a little too into Rust
Rust is is not a "silver bullet" that can solve all security problems
Being insulted for not knowing how to read Rust code by language supremacists (The kind of developers that treat languages as "Which one is the best" instead of seeing them for what they are; Different tools for different purposes) who latched on to Rust as "The best" is unfortunately not a new thing for me
That said, that's only for extension modules. Off the top of my head (I could be wrong) I am pretty sure that they're not actually rewriting Python in Rust, especially not the Interpreter itself, that's a lot more work
Rust is is not a "silver bullet" that can solve all security problems
Certainly true! Even formally verified code can have logic errors. But rates of security issues and crashes have been shown to be a lot lower, across all projects I have seen that have adopted Rust.
The current state of the proposal is that any part of the interpreter could be re-implemented in Rust, with the caveat that a C implementation is needed until a future point when Rust could be made required.
There is however no concrete plan to make Rust required at the moment
I'm sorry this happened to you :/
It's really unfortunate how toxic people can get about programming languages
Definitely Rust does help a lot with those issues :) I don't deny that, I was more talking about people who treat Rust like it's infallible and flawless or something along those lines (Rust aside, we all have met the kind of developer who treats languages like that, I'm willing to bet those interactions most people have is usually negative)
Tribalism unfortunately is quite a thing among some developers sadly
I can't find a "Thanks" emoji to react to that message so have a thumbs up instead
Speaking of that's good to hear, slow adoption always helps people keep up instead of rushing to the next new thing!
Yep! I see this as one great experiment. I hope it is successful like Rust for Linux (or perhaps even more so), but we'll see!
Wish your proposal all the best!
TIL how horribly broken __static_attributes__ is:
self = object()
class Test:
    @staticmethod
    def test1():
        self.a = 1
    def test2(self):
        if False:
            self.b = 2
    @staticmethod
    def test3():
        del self
        self.c = 3
    @staticmethod
    def test4():
        self = object()
        self.d = 4
    def test5(this):
        this.e = 5
    @staticmethod
    def test6():
        def inner(self):
            self.f = 6
print(Test.__static_attributes__)  # ('a', 'b', 'c', 'd', 'f')
does it... literally just tell you all the attributes looked up on a name called self?
attributes assigned on anything called "self", yeah
that's so janky
Reminiscent of the magic of super() which also checks for a string match.
found a good one
(3.13+) run python | xargs and press Ctrl+C for terminal flood
i guess the new repl shouldn't even run if stdout is not a tty?
but it only checks stdin
I'm working on my pycon talk proposal (please don't flame me for waiting til the last minute). should I include code examples that I might refer to in the slides, or is that more granular than is helpful?
If they help the reviewers understand what you are going to say, i would include them. They should be easily skippable if the reviewer doesn't want to read them, right?
also, if my proposal doesn't get accepted this year, do I get any feedback? In either case, would it be rational to re-submit it next year during the mentorship period?
idk if they give feedback. I think it would be fine to re-submit it.
Well it's submitted now. Fingers crossed.
<@&831776746206265384>
!clban 1236408674763669526 scam
:incoming_envelope: :ok_hand: applied ban to @grizzled gazelle permanently.
Kek
This easter egg fills me with joy:
>>> hash(float("inf"))
314159
>>> hash(-float("inf"))
-314159
Include/cpython/pyhash.h line 19
#define PyHASH_INF 314159
Who needs math.pi when you have PI = hash(float("inf")) / 1e5?
hash(None) is set to 0xfca86420 too, though that's less interesting than pi
How do they guarantee something else doesn't make that hash
Isn't it randomized per interpreter session
The hash seed that is
Hashes are not assumed to be unique. That's impossible within 64 bits
If you have a hash table with an internal capacity of e.g. 64, it's only going to use the last 6 bits of the hash. If two values happen to have the same last 6 bits, it's called a "hash collision", and hash tables must be able to account for them. That's why dict and set use both __hash__ and __eq__
That’s why hash collisions are a thing and need to be handled right
There are special hash tables where hash collisions are impossible. That can be the case when you have a known possible set of inputs (e.g.: keywords in a programming language), so you can pre-compute a hash function in advance that you know will never lead to collisions (called a perfect hash function). But that's a special case
(obviously you can't do that for an arbitrary dict)
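To make the collision point concrete, here's a minimal sketch. The `Key` class is made up for illustration, and the last check assumes CPython, where small non-negative ints hash to themselves. Every `Key` has the same hash, yet a dict still keeps them apart because it falls back to `__eq__`:

```python
# Hypothetical illustration: every Key has the same hash, so all of them collide.
class Key:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberately terrible: forces a collision for every key

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name


d = {Key("a"): 1, Key("b"): 2}
# Same hash, but __eq__ says the keys differ, so both entries are kept.
print(len(d), d[Key("a")], d[Key("b")])

# With 64 slots only the low 6 bits pick the starting slot, so on CPython
# (where hash(1) == 1 and hash(65) == 65) these two ints start at the same slot.
print(hash(1) & 63, hash(65) & 63)
```

Lookups still work with a hash this bad, they just get slower, since every probe has to fall through to an `__eq__` comparison.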
Well, don't leave us hanging
because -1 is a signal that something went wrong or something like that?
yeah because a C function that returns ints needs to signal errors
!e
class Apple:
    def __hash__(self): return -69
class Banana:
    def __hash__(self): return -1
print(hash(Apple()))
print(hash(Banana()))
:white_check_mark: Your 3.14 eval job has completed with return code 0.
001 | -69
002 | -2
I recently came across an old blog post that claimed to have invented a super fast interpreter dispatch mechanism based on call threading. When I put it to the test, it genuinely seems like the blog wasn't lying: even though it targeted 32 bit x86 and I was testing on 64 bit x86 almost 20 years later, with so many things different, and I probably didn't implement it 100% correctly, it zooms along in test code: https://godbolt.org/z/6or8z6WPj
But the downside is the mechanism invented in the blog was incredibly ugly and messy, and I unfortunately don't know how to improve it to take advantage of newer features that we have now compared to the limitations the blog was working with in 2008
struct instruction;
using handler_t = uintptr_t(*)(instruction *, unsigned int);
struct instruction {
    handler_t handler;
    unsigned int operands;
};
instruction *pc;
unsigned int stack[12];
unsigned int *ptr = stack;
#define INCREMENT pc = instr + 1; return reinterpret_cast<uintptr_t>(instr[1].handler)
uintptr_t push(instruction *in...
A bit of a shame really; if it wasn't so ugly, I could genuinely see this being proposed to faster-cpython
Cool. CPython doesn't use switch-case though except on MSVC. On GCC, it uses computed gotos. On new Clang, you can also use a form of indirect call threading via tail calls, which achieves the same thing as what you're describing. If you're on Windows, I just merged indirect call threading for Python 3.15 for VS 2026, you might want to try it out https://fidget-spinner.github.io/posts/no-longer-sorry.html
That's awesome stuff. I lol'd at "This is of course, a hopefully accurate result.".
I also think that entire paragraph is wholesome and exactly the good-faith way we should all act when trying to make something better.
Hehe, thanks for the kind words!
Also, one note: this sent me on a 'what is the GHC calling convention' side quest, if you have a good link, might be nice to add
hope you don’t mind that I shared it on lobste.rs: https://lobste.rs/s/5mvar3/python_3_15_s_interpreter_for_windows_x86
someone shared it on hackernews so i think it's fine
oh hey it’s #1 on HN, nice christmas present!
good 'ole hacker news!
How much energy has been wasted worldwide because of a relatively unoptimized interpreter?
There’s certainly a less rude way to make that point. But that is not the way of the orange site. Lobste.rs is, unfortunately, only situationally and marginally better but I still prefer the community there.
i'm not sure there's a point to be made. We'd have to compare the energy use of a full Python development cycle to the energy use of the same software built with some other technology. We'd have to evaluate what "relatively unoptimised" means, we'd have to compare to energy use worldwide, etc, etc. Grumblers be grumbling.
I think it’s fair to say that until recently interpreter performance wasn’t getting a lot of focus and now it is. There’s no need to bring counterfactuals into it though.
imagine how much energy has been expended compiling C++ and Rust 
(ignoring for now the amount of energy expended on using LLMs)
How about how much energy is expended shipping memes around? There's so much we could pick on.
hey nedbat please can you be careful to not pick on so many things we are trying to conserve energy from message sending
I'm sure the total energy expense of generative AI is far, far greater than the extra cost of running python code compared to equivalent code in a compiled language.
that does use a lot of Python though
(but the root cause, of course, is that a lot of those models don't really need to exist in the first place)
For generative AI, the amount of execution that's happening in the pure python part is negligible
Python is just telling the NVIDIA chips what to do, and that's the overwhelming majority of the cost.
The blog compared the special call threading it did back in 2008 to token threading (The fancy name for computed goto) and found that it also outperformed computed goto
I can't comment about tail call dispatch though, since I don't know how it's properly implemented
The frustrating thing is I reached out to the author of the blog recently and he revealed that he had vastly improved the original design (Which at the time was working with the limitations of 32 bit) to be significantly faster and take advantage of all the goodies that 64 bit architectures offer, as much as it can be without resorting to handwritten assembly, but he had been hired by several companies to build interpreters for them in the meantime, so he can't share this improved design with me, only give a general overview of how the implementation works
Of course, I'm nowhere near smart enough to understand how to build such an advanced system from a description alone, so I can't work on faster Interpreters for programming language implementations that I care about
I generally don't think it's about smartness for the case of building faster interpreters. Sometimes we just don't have enough context/understanding yet. Keep on rocking!
Haha thanks! That is fair, it does depend on a lot of experience too, which helps understanding. Well, that and you have to know a ton about processors and how instruction sequences interact to yield performance!
I can't help but feel like it's a missed opportunity though, I'm thinking of what could have been if Darek's (The author of the blog) new designs were publicly available
Do you have a link to the blog post?
Sure, http://www.emulators.com/docs/nx25_nostradamus.htm but as mentioned by the author himself it's no longer the fastest dispatch possible on new processors which is why I've stopped mentioning the original code
I will caution you it is an extremely long read
Interesting. The table of function pointers and call to them was actually observed here as well https://lobste.rs/s/5mvar3/python_3_15_s_interpreter_for_windows_x86#c_3bbyse
Also I wonder if the blog post is outdated, modern architectures might behave differently than the old Core 2/Pentiums that they're benching this on.
Also I wonder if the blog post is outdated
It is, the author mentioned you can do a lot more on 64 bit these days in emails, the implementation he uses now is an improved version of the blog post's code that is significantly faster but is also the property of his employers, so isn't in the public domain
Link to the blog?
It's above, here
I want to contribute to the https://github.com/python/cpython/issues/116738 so I can help with PEP 703.. I am asking if someone would be willing to review my PR or show me how they contribute to a project, I can watch a bunch of videos but I am a hands on person. Thanks.
here is my attempt at contribution: https://github.com/python/cpython/pull/143234
i saw someone saying the GIL is getting removed. is that right? i thought it was just going to be optional. did that change course towards a complete gilectomy recently?
I think it's just optional for now no?
the plan has always been to eventually remove it completely if possible, but it will take a long time.
ohh ok i see, is there an official timeline for it somewhere?
or speculative timeline at least
i don't think there is even that. You can download Python now with the gil removed if you want to try it.
my understanding was that the current builds still contain the gil internally and will re-enable it in specific circumstances, iirc importing extension modules that arent marked for being compatible with free-threading. was/is that not the case any more? or changed from 3.13 to 3.14?
oh, yes, that could be, i don't know the internal details of what happens if the extension module isn't ok with it
You'll see this warning: "<frozen importlib._bootstrap>:491: RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module 'XYZ_MODULE', which has not declared that it can run safely without the GIL. To override this behavior and keep the GIL disabled (at your own risk), run with PYTHON_GIL=0 or -Xgil=0."
lots more detail on this here if you’re curious: https://py-free-threading.github.io/porting-extensions/
speculative at best - hopefully in 3.15, 3.16 or 3.17 we will go back to only having one build. Probably not 3.15 but hopefully the new stable ABI planned for 3.15 will help too.
I did ask about this at the LS this year and it seems the plan is to always keep the GIL optional
Wasn't this one ABI for non-ft and another ABI for ft? Or is it one ABI for both?
Details still being worked out but there will likely be an opaque PyObject ABI that lets you build extensions compatible with both builds on 3.15 and newer
will it be possible to run multiple interpreters in the same process where some have gil enabled and some dont?
and then e.g. relegate the work on non ft compatible extensions to the GIL'd subinterpreters
"Hey guys, how's it going?".
!warn 1212383120192438294 Don't advertise in this server
:incoming_envelope: :ok_hand: applied warning to @ocean crest.
i want to make web pages for my hackathon competition, and i am a beginner!
hey there, anyone knows about vibe coding?
Hello @tall raven , your message is off topic for this channel. Try asking in #python-discussion
sorry, i am new here
Welcome 💚
Be sure to read the description of each channel before using it for the first time. That helps us make sure everyone can have interesting conversations.
from now i will be taking care of it!
Is there a way to check if ThreadPoolExecutor is genuinely running threads in parallel on multiple cores
I'm using free threaded Python but it strongly seems like it's still running one thread at a time, even with -Xgil=0
If the threads are somewhat long running, you can run top and see how much CPU your process utilizes. If it's more than 100%, then it definitely is using multiple cores
(might be different on windows, where the task manager reports the percentage relative to all of your cores, not one core? not sure)
you could be contending on some shared resource even without the GIL, i suppose
if, for instance, all of your threads are trying to append results to the same list or the same dict, they'd all be contending for that list or dict's lock
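A rough way to do the `top` check from inside Python, as a sketch: compare process-wide CPU time against wall time around the pool. The `burn` and `cpu_to_wall_ratio` names are made up for this example, and the workload deliberately shares nothing between threads. A ratio well above 1 suggests several cores were busy at once; a ratio near 1 suggests the threads ran one at a time.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def burn(n):
    # Pure-Python CPU-bound loop with no shared state between threads.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cpu_to_wall_ratio(workers=4, n=200_000):
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(burn, [n] * workers))
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall, results


if __name__ == "__main__":
    # sys._is_gil_enabled() exists on 3.13+; assume the GIL is on for older versions.
    gil = getattr(sys, "_is_gil_enabled", lambda: True)()
    ratio, _ = cpu_to_wall_ratio()
    print(f"GIL enabled: {gil}, CPU/wall ratio: {ratio:.2f}")
```

If this toy workload scales but the real script doesn't, that points at contention in the script or its extensions rather than at the interpreter build.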
What does this have to do with the channel's topic?
!pban 1456527216287420570 spam
:incoming_envelope: :ok_hand: applied ban to @wild igloo permanently.
profile your code with samply, if the thread pool executors are spending a lot of time waiting to acquire locks or you have shared increfs or decrefs showing up a lot then you have a scaling problem. https://py-free-threading.github.io/profiling/
helps to have builds with debug symbols
can you share more context? you might be hitting a bug in OSS software FWIW
e.g. last week I was fixing a bunch of issues that were fallout from https://stackoverflow.com/questions/79851420/multithreading-becomes-much-slower-than-multiprocessing-in-free-threaded-python#79851420
Thanks, will try that
You mean the Python Interpreter? I built it from source, so I have the debug symbols on hand yeah
I don't know what context I can share, the script is massive. It has a couple of dependencies, numpy and pybullet
numpy built from source with debug symbols too then
and that SO link might be relevant in that case too
if pybullet isn’t pure python, that too
any context you can share will be helpful, doesn’t matter if the script is long
I could drop the script here, but it's a little over 700 lines long... Or maybe just the relevant parts that run in ThreadPoolExecutor is better?
Yeah that might be better
the whole thing in a gist, along with a profile is best
building numpy from my fast-cache branch might be interesting too
I'll see if I can get the profiler working before I make the script available since profile data along with the script is ideal, thanks!
Question about the current state of no gil; is the single threaded performance practically the same, or is it a bit worse because of needing to use atomics or things like that?
A bit worse. Fine grained locks take more work than coarse grained locks, and all of the optimizations that could make it into the with-gil version do
this came up a bit indirectly - someone posted an article comparing free-threaded python parallelism to multiprocessing. The article ended up making a relatively nuanced recommendation about which to use.
this somewhat struck me as strange initially because in principle there's never a reason for multiprocessing to be faster (or at least, practically never).
but then I kind of realized that if the single threaded performance is still worse, then you could still run into situations like that
Interesting. but should be really small, right? Like 1-2% or something like that?
I remember vaguely that earlier gil-ectomy attempts ran into this issue before and (also vaguely) recall bigger numbers for the single threading penalty being thrown around, and it seemed like there was a lot of opposition to making the change if the single threaded penalty was substantial
More than that, I think. It used to be more like 10%, though it might be less now
ah gotcha, okay, so not totally trivial
so it's very much plausible that on the right kind of workload multiprocessing would still beat multithreading
But also for nuance, single threaded performance is also improving version over version
yeah for sure, I know performance has been improving in general in python. This isn't a concern about python's evolution so much as trying to understand how relevant multiprocessing is likely to be in the future
because obviously single computer multiprocessing is extremely niche in most languages
in principle there's never a reason for multiprocessing to be faster
Hm. That might depend on whether you're using the fork spawn method, too. Fork and CoW is fast, and ends with distinct objects that can be accessed in parallel without contention - but fork is no longer the default multiprocessing spawn method, and the new default makes it more expensive to share data
AFAIU, CoW from forking a process isn't fine-grained, so as soon as you write stuff, won't it copy everything?
if you're just reading something, then you can also just read from multiple threads without contention. but that might not be the case in python as it depends on the interpreter, not sure.
when i say "in principle" I mean for a reasonably written native program basically I suppose
I believe it's normally by memory page
so it won't copy everything, but something like some MBs of data
in Python that often tends to happen quickly because of refcounts
3.14t is about as fast or a little slower in single-threaded use than 3.14
depends on OS and CPU
!cleanban @frosty wadi some sort of scam
:incoming_envelope: :ok_hand: applied ban to @frosty wadi permanently.
there was one library that claimed to be faster for single-threaded code on 3.14t than 3.14, but I can't find it
I think the latest numbers are that it's 10% slower on Linux Intel and 5% slower on macOS AArch64 on pyperformance
On AMD it's supposedly cheaper for atomics so might be less there
This is already a miracle compared to previous attempts though, which had it in the 20% range.
I don’t personally notice any difference when I switch back and forth while 3.13t was noticeably slower
Yeah 5% is practically unnoticeable if you're on macOS
Not to mention, if you're using C extensions heavily, the noticeable perf penalty is even lower
right, by page makes sense. but obviously still no win to be had here over multithreading, for a native program at least.
that is true though if you're using C extensions heavily, many of them release the GIL anyway so you might not have needed free threading to start
that’s true but amdahl’s law means the GIL always ends up as a scaling bottleneck
Hi, I’m Zakaria from Algeria, I want to learn web dev!
Hi Zakaria, this is the wrong channel for that. You should find help at #web-development . Good luck!
I haven’t really followed things closely around the GIL. Will making the garbage collector thread-safe, hence allowing unlocking the GIL, always be, by default, not activated? since having a thread-safe collector is likely to always be more expensive?
Hopefully my question makes sense (I know that today the answer is yes, I’m wondering what are the “plans” for later versions of Python)
the free-threaded build will likely be the default someday, yeah. performance isn’t really the problem at this point; the last hurdle is that free-threaded python is ABI incompatible with GIL-icious python, so the switch will require an ABI break.
I haven't heard of anyone proposing a garbage collector that doesn't require stop-the-world
It's tricky because it would likely require an ABI break and inserting barriers around increfs/decrefs
idk about "always", maybe "more likely to relative to the proportion of work it represents"
But then, there's tons of code where the time spent in python (as opposed to the C extension) is totally negligible to start with (otherwise it wouldn't be used), so the GIL isn't likely to ever be a bottleneck
obviously it just depends on the code
the GIL means there’s always going to be a nonzero amount of time running on only one thread at a time; amdahl’s law says scaling is only perfect when there’s no code running on only one execution context. It’s not theoretical either, read the intro to PEP 703.
this is weird
@thorny elm Not the place for this, please read channel descriptions before posting.
what???
Let's port Shenandoah and/or ZGC to cpython :P
me love some arm architecture
https://github.com/python/cpython/issues/108219
I was looking into physics simulations with cpython and that is when I learned about its limitations, someone said rust python can do parallelism, i only know what i have been learning for my projects.
I am not sure how to contribute to the python project so I came to this discord to learn where I can and try not to add any burdens.
thanks for any mentorship
i know i can go without the GIL now but I wanted to help make it optional and contribute to python since its the main language that i know and want to know more, so i am slithering out of my burrow and presenting myself for reality checks.
maybe I misread what you wrote. if by "ends up" you mean "if you increase the number of cores and threads on a single machine enough then eventually the GIL will be the bottleneck", then yes, that's true.
it will not end up always being the bottleneck in real software because machines only have so many cores. If you spend 1% of your time in python (i.e. a given thread only has the GIL for 1% of the time because 99% of the time it's inside a C extension that has released the GIL), and you have 20 cores and threads, then the GIL will not end up being a bottleneck in that situation
I think we agree, but in-practice very few workloads really do only spend 1% of their time outside extensions
more often you're lucky to get it down to 10% - that's just my personal experience though
and not all extensions always release the GIL
numpy doesn't, for example, for small arrays
yes, that's fair
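For a feel of the numbers, a quick Amdahl's law sketch. It treats the share of time spent holding the GIL as the serial fraction, which is a simplification, and `amdahl_speedup` is just a name picked for this example:

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when `serial_fraction` of the work can't run in parallel."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# The two cases from the discussion: 1% vs 10% of time holding the GIL, 20 cores.
for serial in (0.01, 0.10):
    print(f"{serial:.0%} serial on 20 cores -> {amdahl_speedup(serial, 20):.1f}x")
```

Under that assumption, 1% serial gives about 16.8x on 20 cores and 10% serial gives about 6.9x, so both sides have a point: the 1% case scales reasonably, the 10% case leaves most of the machine idle.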
At work, we (like a lot of finance shops) use computational graph engines a lot for modelling, and this is exactly one of the selling points of those approaches
the graph can be wired in python, but once you start telling it to execute, it just goes through the entire graph in C++ and never kicks back to python again until it's complete
numpy is not great because it kicks back into python a lot, unless you do a single massive matrix operation or something like that all at once
Has the idea of a zero-copy one-way conversion from list to tuple ever come up? With subinterpreters and free-threading it seems like something that would be nice for a lot of use-cases.
like the new take_bytes
I don't see how it could be done in a zero-copy way... tuple and list have different memory layouts
a tuple contains an array of pyobject pointers.
a list contains a pointer to an array of pyobject pointers.
the only way to turn a list into a tuple would either be to
a) make a new type of tuple that contains a pointer to an array of pyobject pointers, plus a flag indicating whether the tuple is using the traditional contains-an-array layout or the new contains-a-pointer-to-an-array layout, or
b) prepend the tuple's fixed fields at the start of the array (which isn't going to be possible, in general - there's not padding before every array we could grow into)
you could leave the tuple alone and have the list point at the tuple's storage
and then somehow (?) make it work with GC and such
that would be for converting from tuple to list, not from list to tuple
but that would indeed work for zero-copy conversions from tuple to list, assuming the ownership problem could be solved
I guess that'd force that either there are no references to the tuple or that the list would need a copy if you mutate it
but anyway, yeah, that's the opposite of what @clear hill wants
If list->tuple was a really important problem, trumping any extra memory usage, the storage of a list could work like this... ```
list: len cap ptr
69 420 |
|
V
tuple: len item0 item1 item2 ... item68 item69 ... item419
* * * * UNINIT UNINIT
basically, point to a mutable tuple
yep, that'd work
would be pretty silly
yep 😄
does it have to be a list? lists really are everywhere, but we could have a speciallist_that_turns_into_box
like an ugly caterpillar eventually turning into a colorful butterfly
In Rust, a similar-ish operation (but involving a copy) would be Vec<T>.into() -> Rc<[T]> or Vec<T>.into() -> Arc<[T]>. When you build a vector up and then put the slice directly behind a reference counted pointer to shrink it and avoid a level of indirection. Not sure what the C++ equivalent is even
Rc<[*mut PyObject]> is pretty much a CPython tuple
apparently there were 1015 pycon talk proposals
and there's how many slots?
https://pycon.blogspot.com/2024/02/pycon-us-2024-schedule-launch.html says they accepted 14% of 990 proposals.
std::vector<T> -> std::shared_ptr<T[]>
(it doesn't actually exist, largely because of allocator related subtleties probably, but it's what would be the equivalent)
this seems to suggest it's a thing (or was to become a thing?) https://stackoverflow.com/a/44950554/10295729, is that outdated
std::shared_ptr<T[]> became a thing in C++17
"it doesn't actually exist" is in reference to the conversion
there's just no way in C++ currently to get std::vector to "release" its data
so, you cannot implement a conversion like the one above yourself, and AFAIK it doesn't already exist anyway
maybe we want a tuple builder API or something like that
it's much more ergonomic (and faster!) to build a list from a list generator or by appending in-place than to build a tuple in the same way
but I think you answered my question: the memory layout issue makes this complicated
maybe frozenlist, actually
I always thought frozenlist isn't worth doing because tuple is a thing, but maybe having list.to_frozen that returns a frozenlist and takes the storage from the list is worth it because of these memory layout issues
it would be gnarly but I think you could do that in an extension too, although you wouldn't be able to have a nice list method
are we talking about from C? Because _PyTuple_Resize and PyTuple_SetItem should let you append in-place to a tuple exactly as efficiently as appending to a list. You can get amortized constant time appends for both
granted it's less ergonomic than appending to a list because you need to manage the resizes by the growth factor yourself, but it is as efficient, just at the cost of a bit more work
No, I'm talking about Python
let's say someone wants to do something like
import numpy as np
l = [i for i in range(100_000)]
arr = np.array(l)
I'm thinking about what happens if another thread has a reference to l and mutates it while the array is getting created. So we need to add some locking, which we only need to do for mutable data containers.
I want to give people an escape hatch, but right now it's kind of annoying to write Python code that creates tuples for things like this
I want to give people an escape hatch
why? shouldn't uncontended access to the critical section be very close to free?
I'd expect np.array to just do something like
PyObject *make_array(PyObject *lst)
{
    /* Declared outside: the critical section macros open and close a block scope. */
    PyObject *ret_array;
    Py_BEGIN_CRITICAL_SECTION(lst);
    ret_array = iterate_elements_and_make_array(lst);
    Py_END_CRITICAL_SECTION();
    return ret_array;
}
What if multiple threads are trying to simultaneously create an array from the same input data? You could imagine a simulation that takes a seed and a set of initial conditions and then runs that many times in a thread pool.
there's no notion of a reader-writer critical section
so concurrent reads block each other
sure, that's true - but is that actually a common use case? common enough that the obvious workaround of telling people to store a reference to tuple(the_list) and pass that to array instead wouldn't work?
or convert it to an array first
right
I guess converting to a tuple before farming out to a thread pool works, but to me it seems nicer not to copy unnecessarily
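The tuple workaround, sketched with the stdlib only (numpy left out so it runs anywhere; `simulate` is a made-up stand-in for the real work). The list is copied once up front, and every worker then reads the same immutable snapshot:

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed, initial_conditions):
    # Hypothetical stand-in for a real simulation: it only reads its input.
    return seed + sum(initial_conditions)


initial = [i for i in range(1_000)]  # mutable: another thread could change it
frozen = tuple(initial)              # one copy up front, immutable from here on

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda seed: simulate(seed, frozen), range(8)))

initial.append(-1)                   # a later mutation does not affect `frozen`
print(len(frozen), results[:3])
```

The cost is the single `tuple(initial)` copy, which is the part the zero-copy idea would like to avoid.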
fwiw, memcpy is very fast. I wouldn't worry too much about the "zero copy" aspect of what you're asking about. I'd expect that copying the array of PyObject* is pretty cheap, and the slow and expensive part is updating the refcount for each of them, incrementing it to indicate that the tuple now holds a reference to each of those objects, and then decrementing it whenever the list dies
which is to say, if this is worth optimizing, a middle ground would be something that doesn't get rid of the memcpy, but does get rid of the reference count manipulation, by having the newly created tuple steal the references from the list
right - same deal, the frozenlist could steal the references from the original list (a la a C++ move constructor)
that’s what I was getting at with a list.freeze() method
the frozenlist would steal the storage and references and the original list would be empty
copying the big-ish contiguous array should be pretty cheap, updating all of those refcounts all over the place with no locality of reference much less so
you could only do it for a uniquely referenced list
I think
but that makes sense for the builder pattern
nah, you could do it for any list - it's just lock the list's critical section, memcpy the list's array into the new tuple or frozenlist, set the list's size to 0, and then unlock the list
(and notably, numpy could provide a method that does this! it wouldn't necessarily need to be in cpython)
I think this? ```c
PyObject *list_to_tuple(PyObject *lst)
{
    PyObject *ret = 0;
    Py_BEGIN_CRITICAL_SECTION(lst);
    Py_ssize_t size = PyList_Size(lst);
    if (size >= 0) {  // otherwise an exception has been set
        ret = PyTuple_New(PyList_GET_SIZE(lst));
        if (ret) {  // otherwise an exception has been set
            for (Py_ssize_t i = 0; i < size; ++i) {
                PyTuple_SET_ITEM(ret, i, PyList_GET_ITEM(lst, i));
            }
            Py_SET_SIZE((PyVarObject*)lst, 0);
        }
    }
    Py_END_CRITICAL_SECTION();
    return ret;
}
Pretty sure that accomplishes the goal and uses only public APIs. The one part of that that's tricky is that I don't think it's documented that PyListObject is a PyVarObject... it is and always has been, but in theory that could change, and this code would break if it did
it's probably frowned upon to poke in the list's internal state like this, heh
Same but pandas does weird stuff and requires it
That may have gotten rid of the frozen list idr but tuples have specific semantics
want to help me write a PyUnicode subtype so StringDType can have a proper scalar? It’s kind of a nightmare.
it requires deep knowledge of how python stores string data
I’m kidding about that ask but also sort of not kidding 🙂
I know very little about numpy but I do know a fair bit about how CPython stores strings (and how that changes from version to version!) - I could try to help
could start here https://github.com/numpy/numpy/pull/28196
right now StringDType just returns Python strings when you access a scalar, which technically breaks numpy’s type system
^ this PR did it by implementing a numpy scalar that isn’t a PyUnicode subtype but it really needs to be a PyUnicode subtype otherwise the transition period is annoying for users
optimally, it would be a subclass of both str and np.generic, which is the base class for all the numpy scalar types
hm, so, it looks like you can subclass from both str and np.generic, which sort of surprises me - but given that you can, what's the problem?
what do you need to add to str in your subclass to make it a valid np.generic and to make it work with numpy?
and why do you need to know the layout of the data inside the str? I don't think I follow that yet...
I can see why you'd need to get at the UTF-8 bytes inside the str, but that's just a call to PyUnicode_AsUTF8AndSize
Doesn’t that make a copy?
I’m having trouble remembering all the details about why I needed to know about the internal representation, it’s been a while since I looked closely
if we don’t then great
for any str, you can get at a UTF-8 representation of that str with PyUnicode_AsUTF8AndSize, and the returned pointer has the same lifetime as the str itself
and that'll work just fine for subclasses of str
and hopefully sometime in the future Python can store the data internally as UTF-8 🙂
it does, sometimes
ascii strings are internally stored as UTF-8
since ascii is a subset of UTF-8, the one buffer is used for both the array-of-codepoints representation and the UTF8-byte-array representation
and most strings in most programs are ascii, so most of the time there's only one buffer
basically: a str internally has an array of unicode codepoint values. If every codepoint in the string is less than 256 that's an array of 1 byte numbers, otherwise if every codepoint in the string is less than 65536 it's an array of 2 byte numbers, otherwise it's an array of 4 byte numbers.
a str also has an array of utf-8 bytes. If every codepoint in the string is less than 128 that's a pointer to the array of codepoints, because ascii is a subset of UTF-8. Otherwise, it's a pointer to a separate heap allocated array owned by the str and lazily populated the first time it's needed
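The 1/2/4-byte layout described above can be observed from Python without touching the C API. This is a rough sketch, assuming a CPython build where `sys.getsizeof` reports the str header plus the code point array; `bytes_per_codepoint` is a made-up helper name, not anything from the stdlib.

```python
import sys


def bytes_per_codepoint(ch: str) -> int:
    """Estimate how many bytes CPython spends per code point for strings made of `ch`."""
    # Both strings use the same header, so the size difference is the cost of
    # exactly one extra code point in the internal array.
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)


widths = {
    "ascii (U+0061)": bytes_per_codepoint("a"),
    "latin-1 (U+00E9)": bytes_per_codepoint("\u00e9"),
    "BMP (U+20AC)": bytes_per_codepoint("\u20ac"),
    "astral (U+1F600)": bytes_per_codepoint("\U0001f600"),
}

for label, width in widths.items():
    print(f"{label}: {width} byte(s) per code point")
```

The separately allocated UTF-8 buffer is not visible here because it is only created lazily, and freshly built strings have not needed it yet.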
wasn't there talk of moving to plain utf-8 with an array of character indices
Now I just need to figure out how to do multiple inheritance in C…
Inheritance is legitimized code obfuscation
I don't think you can do it with a static type, I think you need to make a heap type, with https://docs.python.org/3/c-api/type.html#c.PyType_FromSpecWithBases
yeah, one of the big drawbacks of static types is that you cannot use multiple inheritance with them
I’d like to see a proper static type deprecation someday (with the exception of builtin types) but way too many people refuse to migrate to heap types right now
hi guys does anyone know how to open a python file in visual studio? i dont quite understand
This is the wrong channel. Ask in #python-discussion or open a help thread plz. #❓|how-to-get-help
how can I propose a change to the struct module to help with typing?
I want the API for struct unpack to add a param for the return type.
for example:
foo1, foo2, foo3 = struct.unpack("!iHi", bin, type=tuple[int, int, int]) # foo1, foo2, foo3 are automatically inferred as ints
you can suggest it here: https://discuss.python.org/c/ideas/6
though I think you can achieve the desired behavior without a language change.
you can use casting. but thats really tedious and noisy. idk how else you'd do it.
I also don't know where to ask. a pep feels overkill for this, but I didnt see another channel here to ask
this only needs a change in type checkers. The return type is implied by the format string, they could already do this if they cared to (assuming the format string is a literal)
and it's better to infer the return type from the format string than to have the user supply it separately, because otherwise they could go out of sync. You want the type checker to catch it if you update the format string and don't update the unpacking, for instance
this proposal would be bad for the same reason as having max(values, type=int) would be bad
Your syntax is:
foo1, foo2, foo3 = struct.unpack("!iHi", bin, type=tuple[int, int, int])
and casting would be:
foo1, foo2, foo3 = cast(tuple[int, int, int], struct.unpack("!iHi", bin))
wouldn't it? They don't seem that different.
I suppose you're right. that makes sense
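A small sketch of the cast approach, and of the "out of sync" risk mentioned above. The payloads are made-up example data; the point is only that a user-supplied type (via cast or a hypothetical type= parameter) is never checked against the format string.

```python
import struct
from typing import cast

# The format string and the asserted tuple type agree here.
payload = struct.pack("!iHi", -1, 2, 3)
foo1, foo2, foo3 = cast(tuple[int, int, int], struct.unpack("!iHi", payload))

# The format string was changed to end in a float, but the cast was not updated.
# A type checker still believes all three values are ints.
drifted_payload = struct.pack("!iHf", -1, 2, 3.5)
a, b, c = cast(tuple[int, int, int], struct.unpack("!iHf", drifted_payload))

print(type(foo3).__name__, type(c).__name__)
```

This is why inferring the return type from a literal format string inside the type checker would be the more robust option.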
!clban 1450903571691339871 Only here to advertise
:incoming_envelope: :ok_hand: applied ban to @split parcel permanently.
There's so many of them
This came up in a help post, but dataclasses currently has some unexpected behaviour around how it can 'inherit' default values:
from dataclasses import dataclass
class A:
a = 42
@dataclass
class B(A):
a: int
print(B()) # B(a=42)
Currently all of mypy, pyright, ty and pyrefly think this will error. I think it probably should be an error but I'm not sure if this would be considered a bug or a 'feature' - it's not documented and no tests rely on it.
the combination of dataclasses and inheritance is almost always puzzling to me
This kind of just falls through because dataclasses is using getattr(cls, a_name, MISSING) to get the default values, so it'll pick up things from any parent class
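A minimal sketch of that lookup from the outside, assuming current CPython behaviour: the getattr call below mirrors what the message above describes, and the resulting field default shows that the parent's plain class attribute was picked up.

```python
from dataclasses import MISSING, dataclass, fields


class A:
    a = 42  # plain class attribute, not a dataclass field


@dataclass
class B(A):
    a: int  # no default written here


# The same kind of lookup dataclasses performs when collecting defaults.
looked_up = getattr(B, "a", MISSING)

# The field ends up with the inherited class attribute as its default.
field_default = fields(B)[0].default

print(looked_up, field_default, B())
```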
Notably, attrs doesn't do this
This means it also picks up properties for instance
!e 🥴
from dataclasses import dataclass
class A:
def a(self): pass
@dataclass
class B(A):
a: int
print(B())
:white_check_mark: Your 3.14 eval job has completed with return code 0.
B(a=<function A.a at 0x7f6ac3f2fed0>)
You can get the same thing if you use a property if you make the dataclass slotted
If you don't make it slotted you can't create an instance because the property has no setter, if you do make it slotted the slot has replaced the property so it "works"
The type checkers all give some variant of: main.py:10: error: Missing positional argument "a" in call to "B" [call-arg] (mypy for example)
Kotlin largely bans it for a reason
Though I will say it's definitely useful at times
If you do combine dataclasses and inheritance you should probably always turn off equality and turn on kw_only
there's also __replace__ which broke type checking on generic data classes
I guess it's just the question of is this a bug to be fixed, or if there's an argument to be had over changing the behaviour or documenting it
I'm not sure if there's really anything to change. the code is a really bad idea but it's not wrong per se, and dataclasses have always been very "loose" with inheritance
Well this specific instance is clearly undesirable, but it's more to illustrate what the problem is when you have something more like the original example where someone was trying to override a property with a field
!e
from dataclasses import dataclass
@dataclass
class B:
@property
def p(self) -> int:
...
@dataclass
class C(B):
p: int
q: int
Ah the actual full error is too long and got truncated
You get this: TypeError: non-default argument 'q' follows default argument
It also leads to (further) inconsistent behaviour between slotted and unslotted classes
!e
from dataclasses import dataclass
for slotted in [True, False]:
@dataclass(slots=slotted)
class A:
answer: int = 42
@dataclass(slots=slotted)
class B(A):
answer: int
print(f"slots={slotted}")
try:
print(B())
except TypeError as e:
print(e)
print()
:white_check_mark: Your 3.14 eval job has completed with return code 0.
001 | slots=True
002 | B.__init__() missing 1 required positional argument: 'answer'
003 |
004 | slots=False
005 | B(answer=42)
I think this is also undesirable. Like, even if this worked exactly the way someone wanted, I wouldn't write it this way - just my 0.02
I mostly want it to behave the way all of the type checkers already think it behaves.
so, lets say you actually have a property in the base class, and its computed so there's no backing field
in the derived dataclass I would have a field with a different name, and use that to back the property
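Roughly what that suggestion could look like. The names Base, Derived and _p are made up for illustration; the only point is that the field and the property no longer share a name, so dataclasses never sees the property as a default.

```python
from dataclasses import dataclass


class Base:
    @property
    def p(self) -> int:
        # Computed in the base class, no backing field.
        raise NotImplementedError


@dataclass
class Derived(Base):
    _p: int  # backing field with a different name
    q: int

    @property
    def p(self) -> int:
        # Override the property and read from the backing field.
        return self._p


d = Derived(1, 2)
print(d.p, d.q)
```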
My expectation is just that this should error, but it should error when it tries to set self.p because the property is there and has no setter.
It shouldn't inconsistently appear to inherit default values based on whether there are slots
it’s probably not changeable, it looks enough like a feature that people are probably relying on it somewhere
I wouldn't propose just snap changing the behaviour - it would need to be a warning first if it was going to be changed.
I'm aware there are people who have relied on it, or tried to. See this thread about adding it to dataclass_transform - https://discuss.python.org/t/dataclass-transform-add-inherit-defaults-option/89531
However, they're assuming it's "dataclasses inherits defaults" where it's actually "dataclasses will assume any value present as a class attribute is intended as the default, unless it's a slot".
Interestingly Pydantic's "dataclasses" used to behave like Python's ones in this respect, but no longer do.
(No they still do I just missed something when testing)
I assume they're just adding things onto regular dataclasses? BaseModel doesn't behave like dataclasses (neither does attrs).
can anyone explain to me what the use of if __name__ == "__main__" is in code
!ifmain
This is a statement that is only true if the module (your source code) it appears in is being run directly, as opposed to being imported into another module. When you run your module, the __name__ special variable is automatically set to the string '__main__'. Conversely, when you import that same module into a different one, and run that, __name__ is instead set to the filename of your module minus the .py extension.
Example
# foo.py
print('spam')
if __name__ == '__main__':
print('eggs')
If you run the above module foo.py directly, both 'spam' and 'eggs' will be printed. Now consider this next example:
# bar.py
import foo
If you run this module named bar.py, it will execute the code in foo.py. First it will print 'spam', and then the if statement will fail, because __name__ will now be the string 'foo'.
Why would I do this?
- Your module is a library, but also has a special case where it can be run directly
- Your module is a library and you want to safeguard it against people running it directly (like what pip does)
- Your module is the main program, but has unit tests and the testing framework works by importing your module, and you want to avoid having your main code run during the test
not really on topic for this channel though, use #python-discussion or #1035199133436354600 the next time
See above, this is not on-topic for this channel.
Good evening; I am VERY new to programing (no edu. and failing at being self taught in a program that I MUST work within (REDCap)) does anyone know REDCap and is willing to answer a few specific questions? not even sure if this place is the right one to ask. Couldn't find a REDCap specific place to ask. Also, also I looked up what language REDCap uses basically I am told all the languages.
Any how, I am looking how to turn a value into a NUL Value.
- Can that be done
- If so how
- If wrong forum, I am sorry and am willing to accept any guidance
says I am new here and need to say "hi" so hi and stuff
if([item_6]=0,somethingsomethingnulvalue)
Hello all. I have a CPython PR up to add custom header support to the stdlib HTTP module. It's undergone several rounds of review - I've made many requested changes and the PR is up for re-review, but there's been no activity since before the Holiday season. I realize things are perhaps a little slow right now, but are there any core devs that are able to give it a look? https://github.com/python/cpython/pull/135057 . Thanks for the help!
one of the offtopic channels would be more appropriate
This function I just stumbled across in NumPy's sphinx configuration is deeply cursed: https://github.com/numpy/numpy/blob/fba740c0a0f7ba9ff70be71a506bd6dd4a7efa65/doc/source/conf.py#L20-L62
amazingly, this works on the GIL-enabled build somehow
I ran into it today because on the free-threaded build it causes a bus error
tp_new for a static type is in the data segment of the binary, isn't it?
It is, yes - but the data section is mapped into read/write memory. It's the text section that's read only
tp_new and ob_refcnt are both in the data section
I wonder why it bus errors on 3.14t then
ohhh I know
lol its PyObject layout is wrong
Ah, yeah. I should've said, I didn't realize you didn't know that
I wasn’t paying attention to that bit
but of course doing this via ctypes requires a model for PyObject
!warn 759279650828058708 We have filters for a reason. Do not post surveys here.
:incoming_envelope: :ok_hand: applied warning to @fallen tiger.
I need a person who can apply for me.
Apply what to what? And what does this have to do with #internals-and-peps ?
Test
internet connection
It worked! But please do tests and stuff in #bot-commands
Sorry, I'm just really excited! Getting international internet in Iran is so tough, I couldn't believe my VPN actually worked
stay safe
it would be really useful to hook into changes to sys.path....
What's the use case?
coverage.py tries to avoid measuring third-party code because it slows things down needlessly. But finding third-party is a bit ad-hoc, and depends on what is in sys.path. But sys.path is changed by test runners.
I think I have it all fixed now, maybe?
i keep a copy of sys.path, and when a decision has to be made, if sys.path has changed, i re-consider the world to decide where third-party code is.
are you worried about changes to the sys.path attribute itself, or changes to the list? if it's the latter, I think you can make a list type that tracks modifications and set it to sys.path
if it's the former, I think it's still possible, just more difficult: in C, you can call PyDict_AddWatcher/PyDict_Watch on sys.__dict__ and then watch modifications to path
I thought about adding a proxy object, but this is simpler, and not expensive, so I think i have what I need.
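A sketch of the snapshot approach described above. This is not coverage.py's actual code; SysPathWatcher is a hypothetical name. Because it re-reads sys.path on every check, it notices both in-place mutation of the list and rebinding of the attribute.

```python
import sys


class SysPathWatcher:
    """Remember a copy of sys.path and report when it differs from the last check."""

    def __init__(self) -> None:
        self._snapshot = list(sys.path)

    def changed(self) -> bool:
        current = list(sys.path)
        if current == self._snapshot:
            return False
        # Something changed: remember the new state so the caller can
        # recompute where third-party code lives exactly once.
        self._snapshot = current
        return True


watcher = SysPathWatcher()
first_check = watcher.changed()

# Simulate a test runner adding a directory (hypothetical path).
sys.path.append("/hypothetical/test-runner/dir")
second_check = watcher.changed()

# Undo the change so the example leaves sys.path as it found it.
sys.path.remove("/hypothetical/test-runner/dir")
```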
Hello
hi
Hi
how do you do ? where are you from
From Ethiopia 🇪🇹
Ok
so are you programmer or what ...?
Hello, this channel is about the internals of Python, not an off-topic channel.
Python
what do you mean off-topic , i'm just asking is he programmer
are you beginner or intermediate level
Use #python-discussion please for generally programming discussion.
!ot
Please read our off-topic etiquette before participating in conversations.
Or off topic for off topic.
okay sorry
I noticed that the https://www.python.org/downloads/ page got updated! nice
looks so much better now
the logos could use some love
All thanks to Hugo!
not just look, but information
Is it a virus
Hello
Hello
@civic acorn Your message was removed for violating rules 6 and 9 regarding advertising and paid work.
If Python were documented inside its source code, it would be much easier for me to explore the first-party libraries. But today, showing the signature with lsp.hover or looking at the definition with lsp.goto_definition yields next to nothing. Why?
hover on argparse.ArgumentParser().parse_args() method
goto_definition takes me to /usr/lib/python3.13/argparse.py, but there is no docstring:
It would greatly increase our maintenance burden if we had to maintain both online docs and docstrings (to the same extent).
I think that's just a pyright thing, and the stdlib stubs don't include docs
based-pyright can merge them
(I'm not using pyright but python-lsp-server)
Some modules do IIRC, although it’s more annoying to do with argument clinic.
The online docs are great, it's just a bummer that I work on an air-gapped machine, which makes accessing them much more cumbersome than the local source code.
You can download copies, although we have stopped updating them as of recent. (Edit: PDF specifically)
There is also pydoc3 -b, but that seems to show the same content as the source code does
For example only the signature:
add_mutually_exclusive_group(self, **kwargs)
It depends on the docstrings.
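A quick way to check which stdlib callables actually carry a docstring (which is all pydoc and most editor hovers have to work with). has_docstring is a made-up helper, and the results for individual argparse methods may differ between Python versions, so no output is claimed for them here.

```python
import argparse
import inspect


def has_docstring(obj) -> bool:
    """Return True if inspect can find any docstring for obj."""
    return bool(inspect.getdoc(obj))


def undocumented():
    pass


candidates = (
    argparse.ArgumentParser,
    argparse.ArgumentParser.parse_args,
    argparse.ArgumentParser.add_mutually_exclusive_group,
    undocumented,
)

for obj in candidates:
    print(f"{obj.__qualname__}: {has_docstring(obj)}")
```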
Could I build the real documentation as a local collection of html pages?
right: that method has no docstring.
yes
You can also download it here: https://docs.python.org/3/download.html
Our guide for building: https://devguide.python.org/documentation/start-documenting/#building-the-documentation
This is what I mean. 🙂
Is there any world where the awesome online doc content and the method docstring could be delivered from the same source of information. (To avoid the duplicated work @stan mentioned)
Ah ok, html can be downloaded but not pdf?
the core team is historically and philosophically opposed to using docstrings for end-user docs.
Old PDF archives are still available.
Also note that we can’t copy the online docs source directly, it uses rST markup.
Even as part of it?
I can see that the docstring and documentation perhaps should be aimed at different audiences or contain different information. Or at least would cause the author of the doc/docstring to take different considerations and actions
Is rST in this case ReStructuredText or something else?
ReStructured Text, yes.
i'm not sure what you want, "as part of it"?
Like, have a basic docstring, and include that as part of the html documentation. But I guess that would just split the documentation text into 2 parts and actually cause a lot of maintenance burden anyway
Seems like I should download the html docs and add a binding in my editor to open that.
that wouldn't solve your problem, right? You have the docstrings already.
There are no docstrings for many functions, I was hoping they are documented in the html docs (I haven't checked add_mutually_exclusive_group though)
Implementing a thing like that would be a very very big task, for little practical benefit.
right. So adding the docstrings to the HTML wouldn't help.
I was just hoping such a thing would cause *some part* of the html doc to move into the docstring, so that it appears in both places.
But anyway, as @deep dirge says, I don't think this is the way to go.
I ended up doing a mapping to directly search the python documentation for the word under my cursor:
autocmd FileType python nnoremap <expr> <leader>D '<cmd>Open https://docs.python.org/3/search.html?q='.expand('<cword>').'<cr>'
Do you know why this is/have a link discussing this?
i don't have a link. The idea is that docstrings are for the developers maintaining the stdlib. Full user docs would be too bulky and get in the way of the code.
I mean, most editors can minimize those comments
personally I would love documentation generated from docstrings but I won't fight that battle right now
wise
!res
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
But your message is almost off topic for this channel
Ask in #python-discussion
(and BTW, don't bother with the CPython internals book)
why not?
since the suggestion was a guess at what they were interested in, i was also willing to take a guess. I think perhaps 1% of new learners will be interested in the internals.
esp after "basic python"
ah. I thought maybe it was a general rebuke of the book.
no, i'm sure it's a fine book, though it's necessarily stuck at 3.9.
Have internals changed a lot since 3.9?
some important files got split and some important processes got renovated, so people coming from the book will have a harder time looking for certain things
other than that it's probably alright
i haven't read the book though
I read the book. The internals are pretty different now. The interpreter got revamped due to PEP 659, threading due to free threading, GC due to incremental GC and FT GC.
I don't recommend reading the book if you want to dive into modern cpy dev, the devguide and internaldocs are more up to date. However, it is a good snapshot of 3.9 at that point of time.
I read the book a few years before I got into cpython development. it's great for piquing interest, but yeah, wouldn't recommend for actually learning about the codebase.
imagine if that went on the back cover of the next edition as a review.
I wonder what percentage of readers become core devs. probably less than 1% or something, right? unless the book has fewer sales than I thought
Must be way less
What are thoughts on allowing set literals in type expressions to allow writing Literal[X, Y] as {X, Y}?
Basically all of the same ways that the set union operator, |, can be used in annotating types, I'd like for set literals to do the same.
So instead of a verbose
from typing import Literal
def get_fruit(fruit: Literal['apple', 'banana', 'cherry']) -> str:
return fruit
you can have
from typing import Literal
def get_fruit(fruit: {'apple', 'banana', 'cherry'}) -> str:
return fruit
My main motivation behind this is how verbose annotating instances of generics or aliasing them is.
ColorName = Literal["red", "yellow", "green", "cyan", "blue", "magenta"]
class Color[C: ColorName]:
def __init__(self, name: C) -> None: ...
Red = Color[{"red"}] # instead of Color[Literal["red"]]
Cmy = Color[{"cyan", "magenta", "yellow"}]
Only caveat is if i'm aliasing a literal, I still need to use Literal in the assignment. Otherwise, I'd be assigning a set[str] and not aliasing a type.
But in any instance of type expression, it should act as a 1 for 1 stand-in.
I'm not a huge fan. Currently for defined types a | b is Union[a, b] at runtime. This can't be the case for {'apple', 'banana', 'cherry'}, which is a valid set at runtime.
yeah, i think it couldn't and shouldn't work for assignment because of this. but in any instance of type expression
foo: {1, 2, 3} = 1
def bar(baz: {'a', 'b'}) -> {'c'}: ...
Spam = Eggs[{True}]
i'm for it
I don't think just saying it's a type expression gets you around that with how things work at runtime.
oh sorry you're proposing Literal instead of it meaning a set
which is also another argument against: it might not be obvious what it means
Yeah there's that too, initially I thought sets.
My current annoyance with annotations is that STRING annotations currently end up evaluating the annotations due to needing to check VALUE_WITH_FAKE_GLOBALS in a non-fake globals environment and I kinda wish they didn't
I'm honestly fine with one over the other. I think to add to this, tuple literals. I know people are also looking for
def coords(pos: Pos) -> (x, y):
return pos.x, pos.y
Tuples are also in that why can't we page
This would also just be generally more useful to more people since way less people use Literal when annotating in my experience
oh i didn't scroll lmfao
ye i'm so fine with annoyingly having to deal with Literal if this gets in instead. way more useful imo
With literal a more common syntax fwiw is just union operators on the string/int literals themselves
yeah, but it's unfortunately already taken by union. this would be kinda a mathy correctness alternative
that's also there https://jellezijlstra.github.io/why-cant-we#bare-literals 🙂
and the | operator has meaning with int
though of course in all cases, there's a tradeoff (broadly, it makes runtime introspection harder)
we could decide we can live with that tradeoff or find another way to support runtime introspection, like Imogen's AST introspection
If only type hints were always strings or type context existed
I think I've soured on AST runtime transformations as I can't see it not splitting things and being worse for performance
At least if they're being transformed at runtime
!clban 1341358376680034376 some kind of claude AI gift code scam
:incoming_envelope: :ok_hand: applied ban to @onyx oasis permanently.
Hi
!warn 1293650689796472892 Don't post rule-5-breaking videos. Also don't post off-topic stuff in non-offtopic channels.
:incoming_envelope: :ok_hand: applied warning to @empty sundial.
Good luck to everyone who might hear about their pycon proposal today
https://discuss.python.org/t/pep-814-add-frozendict-built-in-type/104854/121
After careful deliberation, the Python Steering Council is pleased to accept PEP 814 – Add frozendict built-in type.
A bit disappointed that we won't get a HAMT, but nice nonetheless
what's HAMT?
Hash Array Mapped Trie, see https://peps.python.org/pep-0603/
It's already part of Python as part of the contextvars implementation.
To briefly outline why I'm sad: HAMTs allow you to efficiently derive changed values by using structural sharing. So if you have
fruits1 = ham({"apple": 0, "banana": 1, "cherry": 2})
fruits2 = fruits1.replace("banana", 69)
the replace operation is O(log N) instead of O(N)
This kind of structure is present in many functional languages. Having it in the stdlib would make functional programming practical for more kinds of things in Python
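For contrast, this is the copy-based way to derive a changed immutable mapping with what the stdlib exposes today. It is not a HAMT: every update copies all N entries, which is exactly the O(N) cost that structural sharing would avoid. The replaced helper is a made-up name.

```python
from types import MappingProxyType
from typing import Mapping


def replaced(mapping: Mapping, key, value) -> Mapping:
    """Return a new read-only mapping with one key changed (O(N) copy)."""
    copy = dict(mapping)  # copies every entry, no sharing with the original
    copy[key] = value
    return MappingProxyType(copy)


fruits1 = MappingProxyType({"apple": 0, "banana": 1, "cherry": 2})
fruits2 = replaced(fruits1, "banana", 69)

print(dict(fruits1), dict(fruits2))
```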
I mean it's available if you try hard enough right? I can't remember the module name though
It's not exposed to Python land, but yeah, it is used by contextvars.
don't let the boss man tell you not to do from _testinternalcapi import hamt
Yeah that
I'm not gonna say what I searched on the internet but it didn't turn up that
Or anything python related...
much docs, wow
>>> from _testinternalcapi import hamt
>>> help(hamt)
Help on built-in function hamt in module _testinternalcapi:
hamt()
>>>
duckduckgo does sometimes give mind-bogglingly inappropriate results if you don't turn on "safe search", if that's what you mean
ooh
hamt
Reminded me that I had some help pages on classes that looked a bit like that due to weirdness with inspect.signature on some __init__ descriptors that I don't quite understand.
If you put a descriptor in place of __init__, inspect.signature will call .__get__(cls, type) and not .__get__(None, cls). I never figured out if this was a bug or I didn't understand something 🤔.
I get that it removes self for the purpose of getting a signature, but the function you'd get by doing that makes no sense.
i think posting about this on discuss.python.org would be a little more visible
Fun game, run CPython from this branch, use ./python -X lazy_imports=none -c "import module_name" - try to find a module that successfully imports. https://github.com/python/cpython/pull/144894
I do actually like finally having lazy imports - it looks like it might finally let us have dataclasses with reasonable import performance.
I also see frozendict just got merged
don't you use @dataclasses.dataclass as a decorator at the top-level of your module? How would a lazy import help there?
I mean within dataclasses
As in, lazy imports making import dataclasses faster - https://github.com/python/cpython/pull/144387
oh, i see.
But there is also the case for libraries that want to do something specific if they encounter a dataclass but don't construct one themselves
They could potentially lazy import and use is_dataclass
Might be a case for making the annotationlib import lazy inside dataclasses for that too
There was already the case that configparser was going to use dataclasses, but using them tripled the import time so it didn't in the end.
What's the plan in general for lazy imports in the stdlib?
Note that Pablo has closed this PR.
Yes, I see that now. I hadn't expected it to be merged but it was interesting just how much of the stdlib ends up in a circular import of some kind if you do so.
It would definitely be nice to remove some of the ugly in-line imports but we do need to be careful not to break the eager mode completely.
I mean that the PR to implement it was merged to main - https://github.com/python/cpython/pull/144757
Understood, was just linking to the pep approval, I'm excited for it
I guess I'm condemned to maintain this hack until the day I die 🥲 https://github.com/python/cpython/blob/6ef2578f209f230f26c41683bd8eab6ee05e013c/Lib/typing.py#L1931-L1936
Lib/typing.py lines 1931 to 1936
@functools.cache
def _lazy_load_getattr_static():
# Import getattr_static lazily so as not to slow down the import of typing.py
# Cache the result so we don't slow down _ProtocolMeta.__instancecheck__ unnecessarily
from inspect import getattr_static
return getattr_static```
Yeah I'd seen that one, typing has about 3 different hacks for lazy imports iirc
I'm not actually sure how eager mode is implemented, ideally we could have a form of import that is always lazy but slightly more awkward to use?
Huh, __lazy_import__ seems to ignore the flag - maybe that could be a work-around?
lazy import annotationlib
typing = __lazy_import__("typing")
print(globals()["annotationlib"])
print(globals()["typing"])
print(annotationlib)
print(typing)
With -Xlazy_imports=none
<module 'annotationlib' from '/home/ducksual/src/cpython/Lib/annotationlib.py'>
<lazy_import 'typing'>
<module 'annotationlib' from '/home/ducksual/src/cpython/Lib/annotationlib.py'>
<module 'typing' from '/home/ducksual/src/cpython/Lib/typing.py'>
yeah... this is the most elaborate hack because it's the most performance-sensitive one. So you really don't want to just have a local from inspect import getattr_static statement -- even with the various caches the import system has internally, it slows down _ProtocolMeta.__instancecheck__ noticeably
I don't have the bandwidth to put up a PR myself but I'd happily review one!
I'd like to know it's an intended 'feature' first - and check it always holds
Or maybe it's better to just make the PR and someone else can decide that
true, usually way easier to get a decision once there's code to look at
do lazy imports have any C hooks? Might be neat to expose in PyO3. I recently added preliminary 3.15 support.
perhaps I should have paid more attention to the PEP 810 discussion -- what's the use of -X lazy_imports=none? supporting that seems like it would kill a lot of use cases for lazy import
I wouldn't be surprised if libraries ignored support for it
it's basically a kill switch to force eager imports, mostly for debugging
yeah, but why?
mostly for debugging side effects that break when deferred, see any other use?
no, I don't, but I don't see how it'd be helpful in debugging side effects. it could basically tell you "your lazy imports break something, but I have no idea where or why!"
anyways, I'm sure people already discussed this on the forum, I won't bother
I think the use case is supposed to be for cases where consistent latency is more important, you'd rather pay the import cost and know the time for an operation will be (more) consistent?
I'd expect some libraries will ignore it. But if the stdlib can't import with it then there's no point in having it in the first place.
I'm not sure what the actual practical use case is, but the flag was included in the PEP so it should probably be at least somewhat usable.
has anyone gotten a notification about their pycon talk proposal today?
good luck, hope you get in, usually takes a bit to get through all of them
oh, I see. That, I don't know.
fair enough, not sure either but usually someone on the tracker knows for sure what's up
wow
Security for one. pip has had security issues over deferred imports.
Realistically, we'd set the global sys(?) attribute to disable lazy imports, but effectively, we can't really use lazy imports.
couldn’t pip just not use lazy imports?
oh, is the problem in your deps?
Yes
We can't control what our vendored dependencies do. Actually, I'd imagine some of our dependencies would want to use lazy imports, but 'cause pip vendors them (and has a unique security perimeter), they can't be used.
what security problem does lazy import cause that import in a function body doesn’t?
pip installs packages, what if a lazy import executes after we install a malicious package that overwrites one of pip's modules?
and for the difference between lazy imports and imports in functions, both are unsafe for us
ugh, this seems like something that could have been solved with an audit event rather than a global filter system
wait, how does pip prevent its deps from importing in functions?
IIRC this is an issue for pip because it's installing into the same folder it's running from so one pip install <malicious package> can both replace a part of pip (or vendored dependency) and subsequently trigger a code branch that imports the malicious replacement?
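A self-contained sketch of why any deferred import (lazy or inside a function) is sensitive to files written later. This is only an illustration of the mechanism, not pip's code; late_arrival_demo is a hypothetical module name and the temporary directory stands in for site-packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def deferred_use() -> str:
    # The import is resolved only when this function runs, not at startup.
    import late_arrival_demo  # hypothetical module name

    return late_arrival_demo.VALUE


with tempfile.TemporaryDirectory() as install_dir:
    sys.path.insert(0, install_dir)
    try:
        # "Install" a file after the program has already started running.
        Path(install_dir, "late_arrival_demo.py").write_text(
            "VALUE = 'written after startup'\n"
        )
        importlib.invalidate_caches()
        result = deferred_use()
    finally:
        # Clean up so the example leaves no trace in the import system.
        sys.path.remove(install_dir)
        sys.modules.pop("late_arrival_demo", None)

print(result)
```

An eager import at startup would have resolved (or failed) before the file was written, which is the property pip wants to be able to rely on.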
right, but ISTM that could have been solved by installing an audit event that prevents lazy import so pip (or its deps) don’t do it by accident
I think it relates to this issue - https://github.com/pypa/pip/issues/13079
I had zero involvement in the PEP and I also have almost zero involvement with CPython. I can't tell you why the global filter mechanism was chosen over audit events, but we do need some form of it for pip.
We don't right now, but I'd be open to linting for those and patching those out.
yeah, that seems like a CVE waiting to happen. it’d be nice if pip had a different way of avoiding the problem, like preserving sys.path or something like that
I don't really understand this.. 1. Can't packages anyways execute arbitrary code when being installed? (I guess not when installing wheels specifically?) 2. Doesn't this mean that using -X lazy_imports=none prevents wheels from executing code at install, but not from executing code when pip is run a second time?
Ah, I should just read the ticket:
For example, pip install --only-binary :all: could be used in a trusted context, before using the installed packages in an untrusted context (e.g. different stages in a build pipeline).
(It's not too late to push for a change to be made to the PEP, though it'll have to go through the SC)
The now closed PR #internals-and-peps message I brought up does demonstrate that if you patch out all of the lazy imports the stdlib at least will break. Not sure what the case is for pip's other dependencies.
I remembered in an earlier stage testing pip with the equivalent of -X lazy_imports=all and it did give a reasonable start time improvement so it's kind of sad to not be able to get that due to this issue.
That said -X lazy_imports=all also appears to be slightly broken at the moment
yeah, i’m considering bringing this up on DPO. i’m concerned that either:
- people ignore -X lazy_imports, so the flag never really works
- people view the flag as too annoying to reason about, so they continue to use lazy imports in function bodies
Under the 'all' mode it seems some __getattr__ lazy imports don't always work? ./python -X lazy_imports=all -c "from typing import Match; print(Match)"
Ah, no I see - it's broken with normal lazy imports too.
The lazy from import puts a lazy_import object in the dict, which blocks the __getattr__ from being called. I created an issue for it.
python -c "import typing; lazy from typing import Match; print(typing.Match)"
yeah that makes sense, __getattr__ won't fire if the key exists
I'm just surprised it creates a lazy object in typing at all.
I would have expected it just to create the object in __main__ and attempting to use the object would then try to do the from import.
./python -c "import typing; print(typing.__dict__.get('Match')); lazy from typing import Match; print(typing.__dict__['Match'])"
None
<lazy_import 'typing.Match'>
I'm using typing.Match as that was the import that failed when I tried to pip list with forced lazy imports.
On CPython’s main branch, PyMem_Raw apis have been updated to use mimalloc instead of the system allocator on free-threaded builds. For a ufunc-intensive benchmark of NumPy that does a lot of allocations for temporary arrays on worker threads, we see substantially improved scaling. This will benefit all projects that use the PyMem_Raw apis, which I find pretty neat.
nice, mimalloc for free-threaded builds is sick for scaling
also this all came as a result of a human asking a question on stackoverflow where they didn’t understand why multiprocessing beats multithreading
stackoverflow: it’s still a thing!
Spicy security design take: 🌶️ pip should ideally not run within the Python environment it is installing software into. it's a tool, it needs strict isolation.
How would that work for people not using virtual environments?
Conda functions like that. Conda is only supposed to be installed into the base environment.
I don't use venv or UV so idk how those work.
Wouldn't this require a separate Python installation, just to run pip, if not using virtual envs?
I have been thinking about this problem a bit, and I've been thinking about a design where python runs in venvs by default. I have some experimenting and noodling to do on the design though
I think it (pip not running in the environment it is installing into) is kind of a mildly spicy take, not quite spicy enough to make you reach for a glass of milk.
Without breaking established workflows I'm not sure how painful it would be though.
I have a TUI tool for managing runtimes and venvs, part of the design is essentially that you should not be using your python runtimes outside of a venv. You can open a REPL in a runtime, but if you want a shell with a default Python you can install things in, it has to be a venv.
I mean, I don't know what to tell you. I can tell you that there are vague plans to make pip better at installing to external environments, but who knows when that will happen. We have a --python option, but its implementation is quite simplistic.
I'm trying to get some actual resources to work on pip, but I don't need to tell anyone here how that story goes, of course.
Yeah maintainer resources are always a problem, I figured that was why just removing lazy imports was the resolution to the previous issue
FWIW, I did try to experiment with eager imports but deferred execution (where the finder/loader finds and reads the source code at import-time, but the module isn't executed until use), but I didn't observe any nontrivial improvements in startup time.
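This is not the experiment described above, but the stdlib has a closely related recipe: `importlib.util.LazyLoader` finds the module spec eagerly and postpones executing the module body until the first attribute access. A sketch, with `hypothetical_slow_mod` as a made-up module written to a temp directory:

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def lazy_load(name):
    """Find the module now, but defer executing it until first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms the lazy module; the body has not run
    return module


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "hypothetical_slow_mod.py").write_text("VALUE = 42\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = lazy_load("hypothetical_slow_mod")
            # Peek at the dict without triggering the lazy module's __getattribute__.
            before = "VALUE" in object.__getattribute__(module, "__dict__")
            value = module.VALUE  # first real access executes the module
            after = "VALUE" in object.__getattribute__(module, "__dict__")
            return before, value, after
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("hypothetical_slow_mod", None)


print(demo())
```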
So I'm not really convinced even if we had the resources to support lazy imports, it'd be worth it.
I did run some pip commands with an earlier version of the lazy imports and remember there being some significant improvement, but the current main is broken if you try to force lazy imports for pip.
significant improvement just running pip or an actual pip command?
pip list I think
hm, interesting
I saw a 10% improvement with bare pip but almost zero with pip install when I experimented with this.
Install I expect to be smaller, but that's also where I expect you'd disable lazy imports
At least without getting into reimplementing how pip works
Honestly, the performance of pip show/list isn't really a concern.
idk, it's just annoyingly unnecessarily slower as it's spending a fair chunk of its time importing modules it will never use
But that's also true of all the other packaging tools written in python
And the bigger improvements will come from faster installation metadata reading, which is blocked on either JSON distribution metadata (needs a PEP) or someone contributing a faster/C accelerated email parser in the stdlib.
I'd like to be able to give numbers but - https://github.com/python/cpython/issues/144957
I mean, I get that people generally want faster tools, but out of all pip's performance woes, this is not something we're really worried about.
There is something to be said about keeping lazy imports for non-destructive commands (and auto-completion), but we'll cross that bridge when 3.15 is near release.
It's fair enough if you don't have the maintainer bandwidth, pip isn't even the worst among the tools.
I just don't think the lazy_imports=none flag is the best or even a good tool to deal with the lazy imports issue.
well, we need something
We've dealt with the low-hanging fruit for startup time, but yeah, there are more gains to be had. I was once looking at the time spent compiling regexes at import-time, but nothing came out of it (the PRs I'd need to file looked hard to justify...)
I've mentioned this in the DPO thread, but I think what you need right now would just be a way to force all existing lazy imports to resolve. Which PEP-810 lazy imports make easier than the current condition.
sure
I have no opinion on PEP 810 TBH.
I only got involved in this thread in the first place because people were curious as to why -X lazy_imports is even a thing.
IE, before you do the install process, you just loop over every module dict, and .resolve() every lazy import type.
(and so on for all of the new modules that have been imported)
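A rough sketch of that loop, under loud assumptions: the lazy placeholder type is passed in by the caller and is assumed to expose a `.resolve()` method as mentioned above, and `StandInLazy` is a hypothetical stand-in so the sketch runs on interpreters without PEP 810. It keeps going until no unseen modules remain, since resolving one import can pull in new modules.

```python
import types


def resolve_lazy_entries(modules, lazy_type):
    """Replace every lazy placeholder in every module dict; return how many."""
    resolved = 0
    seen = set()
    while True:
        todo = [(name, mod) for name, mod in list(modules.items()) if name not in seen]
        if not todo:
            return resolved
        for name, mod in todo:
            seen.add(name)
            namespace = getattr(mod, "__dict__", None)
            if not isinstance(namespace, dict):
                continue
            for key, value in list(namespace.items()):
                if isinstance(value, lazy_type):
                    # Assumption: placeholders expose .resolve(), per the chat.
                    namespace[key] = value.resolve()
                    resolved += 1


class StandInLazy:
    """Hypothetical placeholder; resolving it registers a new module."""

    def __init__(self, registry, target_name):
        self.registry = registry
        self.target_name = target_name

    def resolve(self):
        module = types.ModuleType(self.target_name)
        self.registry[self.target_name] = module
        return module


registry = {}
first_mod = types.ModuleType("a")
first_mod.dep = StandInLazy(registry, "b")
registry["a"] = first_mod
count = resolve_lazy_entries(registry, StandInLazy)
print(count, sorted(registry))
```

On a PEP 810 build one would presumably pass `sys.modules` and whatever type the interpreter uses for its placeholders; that part is untested here.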
ok!
I'm really the wrong person to talk about this since I really don't care (I'm sure we'll care once we get around to supporting 3.15)
but I have zero desire to be involved in a standards discussion, hah
Without PEP-810 imports, the imports can be lurking in functions and classes. If everyone uses PEP-810 imports then they'll all be at module level (PEP-810 imports are only allowed at module level)
I don't think people are going to be removing function imports any time soon
pip still supports 3.9+ FWIW
Oh yeah, but the idea is that it makes things better rather than worse.
New lazy imports are easier to handle than old ones
Might take a few years though
i only vaguely had in mind something like the installed "pip" and what "python -m pip" winds up actually invoking being a pip run from a dedicated venv, passing along all of the relevant python sys config and install/env info paths for the to be installed in python. but i predict plumbing all of that could be... complicated. (i haven't looked inside the pip code box in eons).
uv happens to sidestep this given it isn't implemented in python so solved plumbing this information out of necessity.
In my mind this is similar to "native compilation" vs "cross compilation" toolchain wise. Projects naturally start out presuming native and encounter pain later when they need to move to cross compilation because the presumption of local runtime & tools & env matching execution runtime & tools & env is baked in implicitly all over the place.
I realize this is all about maintainer bandwidth in the end.
None of this is particularly hard (as uv has shown), but A) developing that plumbing and, this is the real kicker, B) managing that transition are impossible with our resources
uv was much better designed from scratch
Helps to not have all the existing workflows to break
if emma and kirill’s PEP gets accepted it’ll be very likely that email gets rewritten in rust
Hindsight lends to that
What's the significance of doing that
faster
It's that slow it requires a rewrite? Almost all stdlib could be made faster if an extension. Why is email so special
email is complicated and prone to security problems, so rewriting it in C would generally be too much of a maintenance burden. rust is, hopefully, easier to write than C
Huh. I've never looked at the module, what makes it hard to have a fast pure Python implementation?
Almost all of the metadata used for Python packaging is stored in email message format. To scan for all of the distributions installed, pip has to read METADATA from each <name>-<version>.dist-info directory in site-packages. This can be quite slow if there are enough files.
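To make the format concrete, here is a small sketch that parses a made-up METADATA payload with the stdlib email parser. The field values are invented for illustration; real tools read the file from each `.dist-info` directory.

```python
from email.parser import HeaderParser

# Invented sample in the email-header style used by METADATA files.
sample_metadata = (
    "Metadata-Version: 2.1\n"
    "Name: example-dist\n"
    "Version: 1.0\n"
    "Requires-Dist: foo\n"
    "Requires-Dist: bar>=2\n"
    "\n"
    "Long description goes in the message body.\n"
)

message = HeaderParser().parsestr(sample_metadata)
name = message["Name"]
version = message["Version"]
requires = message.get_all("Requires-Dist")  # repeated headers come back as a list

print(name, version, requires)
```

Doing this once is cheap; the cost being discussed comes from repeating it for every installed distribution.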
It's probably not that slow, but the thing is a lot of basic packaging operations involve parsing these metadata files so the overhead adds up.
@slate spire this seems like an interesting point, if true
That's not overly surprising, assuming they're sdists they can execute arbitrary code and by default pip will attempt to build from sdist.
The issue with the lazy import trick was that it would work even with --only-binary :all: which shouldn't execute arbitrary code on its own.
That's presumably true, but not really all that interesting. If it was made so that you couldn't execute malicious code at install time, they'd just switch to executing it at runtime instead - still without an import malicious_lib probably, since they could just install a .pth file.
iow this is a point that only really matters to security researchers who want to build untrusted code. It doesn't really make a practical difference for anyone else
If your goal was to install some package (which turned out to be malicious), and then run the Python from the environment you installed it into, you're still pwned
I wonder why the test is failing. https://github.com/actions/setup-python/issues/1277
There was https://github.com/python/cpython/pull/142756, but I don't think that would cause this
Actually, I guess I could see an atomic store maybe causing troubles on aarch64?
is the test that’s failing even multithreaded?
no, I don’t think so
i wish github actions were less opaque 🤷♂️
Yeah I was imagining maybe something wrong with the C11 atomics implementation in ubuntu 22.04 on ARM, maybe?
But probably something else going on
This (improper recognition of error and/or location) only seems to appear when typed in REPL, but not when run from file.
This was fixed recently by Pablo, it was an issue in traceback.py IIRC
I often wish that the default iteration behavior for a dict is key-value pairs (a la dict.items)
But for a containment check keys are more natural, and I think it would be weird if it was for key, value in dict: but if key in dict:..
I think I'd prefer that dict.__contains__ not be implemented and that it'd be required to specify keys or values, but I expect everyone who reads this to immediately hate that idea.
Yeah I hate that idea 
it's okay 🫂
This is the default in Rust... sometimes it's more convenient and sometimes it's not 😄 from having used both languages heavily, i think i have 0 preference between the two
I'm used to numpy telling you that you're an idiot if you call __bool__ on an array, and this feels similar
implicit iteration is handy, but can be confusing. I would have been ok with needing to use .keys(), .values(), or .items() in order to iterate a dict. And strings could have required .chars()
this is why Rust wins 😉
It should be because it's significantly faster than iterating keys and doing a getitem
you can still iterate over dict.items, and that's faster than what you said
Alternatively, dict shouldn't be iterable at all.
in a different language maybe but that would break even more code than making print a function did
unfortunately the case for basically any backward incompatible change to a builtin
If we were starting from scratch I'd be with @spark magnet and only allow iterating over .items(), .keys() etc., not the dict itself, but at this point that's way too disruptive a change to ever make
That's one for python 4
That's what I'm trying to say
Why I think .items should be the base iteration not the keys
Do you think if key in the_dict: should work?
I don't think base iteration and the containment check need to be similar
You're hinting at this piece from the data model page?
It is recommended that both mappings and sequences implement the __contains__() method to allow efficient use of the in operator; for mappings, in should search the mapping’s keys; for sequences, it should search through the values. It is further recommended that both mappings and sequences implement the __iter__() method to allow efficient iteration through the container; for mappings, __iter__() should iterate through the object’s keys; for sequences, it should iterate through the values.
But yes I would want containment to work like that with a mapping
If it didn't how would you check for composite keys
For every container type that I can think of, found = x in y is logically equivalent to
found = False
for e in y:
    if e == x:
        found = True
        break
and to found = x in list(y)
I expect breaking that symmetry would subtly break lots of things
Fundamentally, the in operator checks whether a collection contains a particular thing, and a for-each loop loops over all the things in a collection. It would be weird and asymmetrical for the "things" in a dict to be kv pairs for the purposes of a for-each loop, but keys for the purposes of a containment check.
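A quick sketch of that symmetry for a dict as things stand today: both the loop and the containment check treat the keys as the "things", and a key-value pair is only found when you ask the items view. The helper name is made up for illustration.

```python
def contains_by_iteration(container, x):
    """Containment spelled out as a for-each loop."""
    for element in container:
        if element == x:
            return True
    return False


d = {"a": 1, "b": 2}

key_via_in = "a" in d
key_via_loop = contains_by_iteration(d, "a")
key_via_list = "a" in list(d)

pair_in_dict = ("a", 1) in d           # pairs are not what a dict "contains"
pair_in_items = ("a", 1) in d.items()  # but the items view does contain them

print(key_via_in, key_via_loop, key_via_list, pair_in_dict, pair_in_items)
```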
yeah, you’d have to turn off __contains__ and do key in d.keys() explicitly
it probably made more sense when keys and items returned lists instead of lazy iterators
I don't think they're lazy iterators
I think so
says deprecated
Is this style of containment supported anymore? I know there's some weird __getitem__ protocol that will use __len__ and __iter__ I believe but I didn't think __contains__ used __iter__ in any way. I know you're not saying it does and more drawing a symmetry
says its moved to collections.abc
That is still supported, but indeed not what I'm talking about
Personally I think .keys() being the implementation of __iter__ is a relic of pre-Python 3, and it's because of things such as what I linked
im not a language designer and dont want to be so I digress
I think that it's quite nice from a language design perspective that x in y and for x in y are analogous - that x is the same data type in both cases. I think that if you want for x in y to loop over kv pairs if y is a dict, that x in y would also need to expect x to be a kv pair
In what sense are they lazy?
I didn't know ItemsView supported the set operations
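For anyone else who hadn't run into it, a short sketch of set operations on dict views (the two dicts are invented examples). Items views support these as long as the values are hashable.

```python
old = {"a": 1, "b": 2, "c": 3}
new = {"b": 2, "c": 30, "d": 4}

unchanged = old.items() & new.items()   # pairs present in both dicts
removed_keys = old.keys() - new.keys()  # keys only in the old dict
added_keys = new.keys() - old.keys()    # keys only in the new dict
changed_keys = {k for k in old.keys() & new.keys() if old[k] != new[k]}

print(unchanged, removed_keys, added_keys, changed_keys)
```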
Nothing about them is lazy I believe. https://docs.python.org/3/glossary.html#term-dictionary-view
I didn't know that u could access the view and update the originating dict and the views are updated
I always assumed they were throw away
The sense that they do not eagerly copy the keys/values/items from the underlying dictionary when they are constructed, and so reflect changes to the underlying dictionary made after they are constructed
The Nth key to be yielded when iterating over d.keys() is not determined until the Nth time the iterator is advanced
from what I understand, they don't copy keys/values/items into a new structure when they're created, and they do reflect changes to the underlying dictionary after they are constructed, but each nth iterate is known before the first iteration (not determined at each iteration)
especially considering that the iteration order is specified to be the insertion order.
maybe im thinking about it wrong but the KeysView has to in some way know the full sequence of keys for that RuntimeError to have occurred, right?
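A small sketch of both behaviours being discussed. As far as I understand CPython, the iterator does not need the full key sequence to raise: it remembers the dict's size when iteration starts and compares it on each step. The helper name is made up.

```python
d = {"a": 1, "b": 2}
keys = d.keys()         # a view, not a copy
d["c"] = 3
snapshot = list(keys)   # the view reflects the key added after it was created


def grow_while_iterating(mapping):
    """Add a key mid-iteration and report the resulting error, if any."""
    try:
        for key in mapping.keys():
            mapping[key + "_copy"] = mapping[key]
    except RuntimeError as exc:
        return str(exc)
    return "no error"


message = grow_while_iterating({"a": 1, "b": 2})
print(snapshot, message)
```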