#internals-and-peps
Hi
just a curiosity but is there a (practical) way to write python opcode mnemonics (à la writing assembly) and execute them on the interpreter?
and if not, is there an impractical way?
ah yes, seems like there are projects for this. answered my own question!
@rich nimbus can you say more about why you want to write bytecode directly? it changes from version to version.
There's https://github.com/brandtbucher/hax but it only works for up to 3.10
Purely curiosity, I'm learning more about how the interpreter works and I thought it would be interesting to be able to play with
ok!
Hi, I'm Arsalan from Pakistan. I'm a beginner in Python and excited to learn with you all!
Hello and welcome. Be sure to stick to the topic of each channel. Your message would be more appropriate for #python-discussion
https://memory.python.org is looking cool 👀
Analyze and compare CPython memory benchmark runs
The filtering for versions on the trends site doesn't seem to work: Despite a selected version like 3.13 it still shows 3.11, 3.12, and 3.13 builds in the graph.
I'm not seeing that behavior - do you still see it?
Either way, this is still fake data. There's a reason this hasn't been announced yet 😅
Ah, I see. Yeah, I'm pretty sure that's just because the chart is still being populated by mock data
Is this a bug? We just talked about it in #python-discussion and I decided to experiment and it doesn't work
https://peps.python.org/pep-3101/
The object.__format__ method is the simplest: It simply converts the object to a string, and then calls format again:
class object:
    def __format__(self, format_spec):
        return format(str(self), format_spec)
But with a simple custom class that defines __str__:
>>> format(Pair(1,2),"^10")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to Pair.__format__
>>> format(str(Pair(1,2)),"^10")
'  (1, 2)  '
And even with string itself passed to object's dunder:
>>> object.__format__("aaa","^10")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to str.__format__
>>> str.__format__("aaa","^10")
'   aaa    '
Tested on 3.12.11.
It would seem object is doing something to the format spec instead of just passing it like the pep says?
CC @vernal narwhal and @ornate wyvern because maybe you're curious too
from a first look
^10 in my opinion adds those spaces
notice that the first string is 10 length
the second is 9
... Yes, that's the point of center formatting. I meant that format(Pair(...), "^10") doesn't work despite what the object's dunder is supposed to do
Literally this vs the quote you originally posted in pygen
>>> format(Pair(1,2),"^10")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to Pair.__format__
>>> format(str(Pair(1,2)),"^10")
'  (1, 2)  '
If object's format is calling format(str(obj), format_spec) like pep said, those two should work the same. But one doesn't work
u mean why with str it works and without it doesn't
I think it's just that PEP 3101 is out of date, the canonical docs for object.__format__ say this:
The default implementation by the object class should be given an empty format_spec string. It delegates to __str__().
Delegates to str dunder sounds like it does what the pep says? Or does it mean it only delegates there with empty spec?
PEP's status is final
let me try as well
ah, yup:
Changed in version 3.4: The __format__ method of object itself raises a TypeError if passed any non-empty string.
PEP 3101 was written for 3.0. it's not updated for any future changes made to __format__.
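For reference, a minimal sketch of the behaviour being discussed. The Pair class here is an assumption (the original definition was not posted in the chat), and FormattablePair is a hypothetical subclass that opts back in to what PEP 3101 originally described:

```python
class Pair:
    """Assumed stand-in for the Pair class from the REPL session."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __str__(self):
        return f"({self.a}, {self.b})"


class FormattablePair(Pair):
    """Hypothetical subclass that delegates to str formatting explicitly."""

    def __format__(self, format_spec):
        return format(str(self), format_spec)


pair = Pair(1, 2)

# An empty format spec is still accepted by object.__format__.
empty_spec_result = format(pair, "")

# A non-empty spec reaches object.__format__ and is rejected (3.4 and later).
try:
    format(pair, "^10")
except TypeError as exc:
    error_message = str(exc)

# The subclass hands the spec to str.__format__, which understands "^10".
centered = format(FormattablePair(1, 2), "^10")

print(repr(empty_spec_result))
print(error_message)
print(repr(centered))
```

So a class that wants str-style format specs has to define __format__ itself; it is no longer inherited from object.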
Where's the pep defining that change?
he passed str(Pair(1,2)) and it worked
Really? I have my pronouns in 3 places in my profile, and you interact with me regularly, yet you say "he"?
Also, str works because centering is string's format stuff
Full offence, that is misgendering
i dont feel comfortable using they
Tough luck. That is misgendering on purpose and against the rules
🤷♂️
Again, where is the pep defining this change? Because that is a breaking change
there wasn't any. see https://bugs.python.org/issue7994
so whats the purpose of object's format then
seemingly we were more lenient with changes to builtins back in 3.4
yes
Shouldn't pep be edited to mention that this part of described behaviour is no longer valid? Because the rest of the pep is still important, just that part about object's format dunder is not
Also imo error message could be clearer - it says {type(obj)}.__format__ got unsupported format string - suggesting that this class has format dunder, but imo should say just object.__format__ got non-empty format_spec (like the issue says - raise when it's not empty)
nope, final PEPs are kept solely as historical documents. they're not updated for new amendments to the feature they added.
Huh, not even the header to mention which pep changed them? Although here it wasn't changed by a PEP...
if it were a PEP that changed it, then it would say Superseded-by in the PEP headers. a lot of PEPs have a note saying something like "this is a historical document, the up to date documentation is at <docs url>", but I personally don't think it's worth adding it here (on the basis that it would probably inspire copycat PRs).
Imo it's still worth adding if it's valid superseding, and editing the header was forgotten. Because as I said, the quotes from this pep were flying in #python-discussion as if it was valid and I was just like "huh, I didn't know that" and tested it and then arrived here 😄
I mean, this applies to all PEPs, especially old ones
PEP 3101 should probably get a header pointing to current str formatting docs
we do that for a number of other PEPs
Yo
Please follow the rules 😌
And refer to people with correct pronouns and just ask if you are unsure about it
Before saying anything
@lofty hornet the moderators have already responded to what you're referring to. There's no need to pile on.
Mb
Thank you for your concern for our rules. Just send a message to @summer lichen if you see something that you think we should respond to.
@balmy plank you can look at #❓|how-to-get-help if you have a question. you can't try to hire people here, though.
oh, i am sorry
I am terribly sorry.
is that a CPython implementation detail?
yes
method_cache() implementation supporting subclasses. feedback's welcome lol
https://github.com/jaraco/jaraco.functools/pull/34
At a glance, it looks like it locks on a per-closure basis instead of a per-instance basis, which pre-3.12 functools.cached_property did as well (though its lock was technically per descriptor). There was enough feedback about how much of a performance issue it was that the steering council agreed that it should be removed. Not sure it's a big deal if you don't intend for method_cache to be used for something performance-critical, but who knows.
Relevant links:
great insight! I agree I was a little bit unhappy about per-closure instead of per-instance, but I was thinking that it wouldn't matter that much with double-checked locking
I see now that I must have underestimated the problem
thanks, will think about a different approach
Quite possibly you should just not do locking. My takeaway from the locking saga with functools.cached_property was that the locking added more problems than it solves
If the worst thing that can happen is the cached method might execute multiple times in some cases, that seems fine
The various links have people’s attempts at per-instance locking as well, if you want to try that.
Could also just document that the decorated function should use a lock if a user thinks thread-safety is important for their purposes.
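A rough sketch of the "just don't lock" option, for illustration only. unlocked_method_cache and Squarer are hypothetical names, not the jaraco.functools implementation, and the sketch assumes hashable positional arguments and instances that have a __dict__:

```python
import functools


def unlocked_method_cache(method):
    """Hypothetical per-instance cache with no locking at all."""
    cache_attr = f"_cache_{method.__name__}"

    @functools.wraps(method)
    def wrapper(self, *args):
        # The cache dict lives on the instance, so state is per instance.
        cache = self.__dict__.setdefault(cache_attr, {})
        try:
            return cache[args]
        except KeyError:
            # Two threads racing here may both compute the value;
            # the last write wins, which is the accepted trade-off.
            result = cache[args] = method(self, *args)
            return result

    return wrapper


class Squarer:
    def __init__(self):
        self.calls = 0

    @unlocked_method_cache
    def square(self, x):
        self.calls += 1
        return x * x


first = Squarer()
second = Squarer()
first.square(3)
first.square(3)
second.square(3)
print(first.calls, second.calls)
```

The worst case under threads is a duplicated call, which matches the "might execute multiple times" trade-off mentioned above.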
Is this addressing https://github.com/python/cpython/issues/102618 ? This stuck in my memory from when I first encountered the footgun, so just wanted to close the loop
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
I don't know if these problems are cached_property specific, but I've been noticing lately that a teammate of mine uses the @lru_cache decorator on functions a lot, instead of just writing a simple class that would open the relevant data once and then access it.
lru_cache seems even worse because then your state is global, whereas with cached_property I would guess it's just stored in the instance of the object
and it just feels... idk. Maybe adds a tiny bit of convenience short term, but I can't help the nagging feeling that someday it's either going to surprise me with really unintuitive behavior, or it's just going to make debugging a headache
I'm curious if other people feel like @lru_cache should be used very sparingly, or think it's fine.
For me, i feel like I'd mostly only really use it when I had a bunch of code already written with a function, and wanted to speed it up without rewriting it.
I've used it (or other caching decorators) frequently. It can definitely add a layer of complexity, but caching can also solve a lot of problems
fair enough, I should maybe be more open minded then. It just feels like quite a bit of magic to save what is typically a handful of lines of code. but maybe the magic isn't an issue often enough that this matters.
I am concerned about it interfering with testing, since it is a hidden global.
If you add the state manually instead of using the decorator, you can get similar issues
Well, to be fair, if I was not using the decorator I wouldn't have a lazy global
I'd have a class, whose init function took an argument explicitly saying which data to load
And then an accessor
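To make the two styles concrete, here is a small sketch. load_dataset, Dataset and load_log are made-up names, and the "loading" is faked with a dict so the example stays self-contained:

```python
from functools import lru_cache

load_log = []  # records every (fake) load so the caching is visible


@lru_cache(maxsize=None)
def load_dataset(name):
    """Decorator style: the cache is hidden module-level state."""
    load_log.append(("function", name))
    return {"name": name}


class Dataset:
    """Explicit style: the caller decides when and what to load."""

    def __init__(self, name):
        load_log.append(("class", name))
        self._data = {"name": name}

    def get(self, key):
        return self._data[key]


load_dataset("a")
load_dataset("a")  # served from the cache, no second load
hits_after_two_calls = load_dataset.cache_info().hits

# Tests that need a clean slate have to remember to reset the hidden state.
load_dataset.cache_clear()
size_after_clear = load_dataset.cache_info().currsize

dataset = Dataset("a")
print(hits_after_two_calls, size_after_clear, dataset.get("name"), load_log)
```

With the decorator, test isolation depends on calling cache_clear(); with the class, each test can simply build its own instance.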
Hello everyone I'm not sure if this is the right place to ask but I was wondering about the use of _asyncio_future_blocking
As far as I know, it's set when the future is awaited (or yielded from same thing) and the yielded future is brought back up to __step_run_and_handle_result. I was told that "the flag is there to avoid waking up the event loop when the future is done but no one is there to await it" but I only see wake up mechanisms (run_forever looks like just a while True and run_until_complete just adds a callback to stop run_forever which it calls internally)
The only wakeup mechanism I see is on individual tasks, and tasks that receive a future from coroutines register their wakeup as a done callback on the future, but I don't see why _asyncio_future_blocking == True is needed for that, since when the task receives a non-blocking future it raises a RuntimeError anyway
I'm doing my best to understand but this is all a bit obscure to me haha
Just tried to look up all instances of that in the Lib folder, I don't see what it's used for haha
Hi, please help me. I understand Python concepts, but I'm facing difficulty while solving problems
Hello, try reading #❓|how-to-get-help . Be sure to read the entire thing.
Hello from the future. What problems did snakemake cause for you? Would you recommend something else for configuring a data processing workflow?
I think that's a little OP for my needs at the moment and meant for a different kind of data, but thank you for the info.
hey everyone, where can I find a reference for python bytecode? for learning & exploratory purposes, I know it can change between versions (on that note, could be for any version that's still supported)
the dis module docs have some info
also the InternalDocs/ folder in the cpython repo maybe, not sure how much the bytecode is covered
you'll probably be able to find various tutorials and stuff on the internet, but a lot of aspects of the bytecode change significantly between versions
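A quick way to explore alongside the dis docs is to disassemble a small function on whatever version you have installed. This is only a sketch; add_one is an arbitrary example function, and the exact opcode names printed will differ between Python versions:

```python
import dis


def add_one(x):
    return x + 1


# get_instructions() yields Instruction objects for programmatic use.
instructions = list(dis.get_instructions(add_one))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# dis.dis() prints the familiar human-readable listing.
dis.dis(add_one)
```

Running the same snippet under two interpreters side by side is an easy way to see how much the bytecode changes between releases.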
I am making a program which caches files (up to 100MB), sent over network one kilobyte at a time
How should I allocate memory for the cached file?
Should I allocate the memory for the entire file beforehand?
```
while len(data) == 1000:
    head = conn.recv(8)
    data = conn.recv(int.from_bytes(head[1:8], "big"))
    cache[i : i+len(data)] = data
    i += len(data)
```
or should I allocate memory as data is received?
```
cache = bytes(0)
while len(data) == 1000:
    head = conn.recv(8)
    data = conn.recv(int.from_bytes(head[1:8], "big"))
    cache += data
```
The first method would be the C way of doing it but I don't know how bytes objects are stored in memory, and I'm worried about memory fragmentation.
The first way looks best if you know the size in advance, except that bytes objects aren't mutable, so you'll need to use a bytearray instead.
am I correct in assuming that initializing the bytearray will take up a contiguous block of memory, whereas that may not be the case in the other method?
It'll be contiguous for both, but the first method allocates it only once, and the second reallocates on each append, so that each iteration of the loop creates a new contiguous array that's 1000 bytes larger, and then deallocates the previous contiguous array
Which can lead to fragmentation, in addition to just being generally less efficient
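The two strategies can be sketched without any networking. The function names and the fake chunks below are assumptions for illustration; the chunks stand in for the 1 kB pieces arriving over the connection:

```python
def assemble_preallocated(chunks, total_size):
    """Allocate once, then copy each chunk into its slot."""
    cache = bytearray(total_size)
    offset = 0
    for chunk in chunks:
        cache[offset:offset + len(chunk)] = chunk
        offset += len(chunk)
    return cache


def assemble_by_concatenation(chunks):
    """Build a new immutable bytes object on every iteration."""
    cache = bytes(0)
    for chunk in chunks:
        cache += chunk  # copies everything received so far
    return cache


# Three fake 1000-byte chunks with distinct contents.
chunks = [bytes([n]) * 1000 for n in range(3)]
total_size = sum(len(chunk) for chunk in chunks)

preallocated = assemble_preallocated(chunks, total_size)
concatenated = assemble_by_concatenation(chunks)
print(len(preallocated), len(concatenated), bytes(preallocated) == concatenated)
```

Both produce the same bytes; the difference is only in how many allocations and copies happen along the way.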
also - I didn't have time to bring it up earlier, but - it doesn't seem like you're handling recv() correctly here. If this is using UDP, you're missing any handling for dropped packets, so I assume that it's TCP. But if it's TCP, any recv() call can return less data than you asked for (even if the client sent the data in the chunk sizes you're expecting). You need to loop until you get the amount that you want (or an empty bytes object, indicating the connection dropped before the client sent everything you were waiting for). You need something like (untested, off the top of my head)
```py
def read_exactly(sock, n):
    buf = bytearray(n)
    read = 0
    while read != n:
        chunk = sock.recv(n - read)
        if not chunk:
            raise EOFError  # Or however you want to handle this
        buf[read : read + len(chunk)] = chunk
        read += len(chunk)
    return buf

cache = bytearray(size_of_file)
i = 0
while i != size_of_file:
    head = read_exactly(conn, 8)
    data = read_exactly(conn, int.from_bytes(head[1:8], "big"))
    cache[i : i+len(data)] = data
    i += len(data)
```
I think someone said that Astral eventually intends for uv to have a solution for dependency conflicts. Could that be done through a custom implementation of importlib, and would it be possible for uv to monkeypatch it at runtime?
I don't think the idea relies on a custom implementation of importlib, I think it just depends on a finder and loader on sys.meta_path that searches for modules in a different place depending on what module is doing the importing. Plus some way to deconflict things in sys.modules I guess
and they could register that sys.meta_path entry using a .pth file in uv-created environments, or with a custom sitecustomize.py, or something.
the hard part isn't finding multiple copies of a library at different versions, or loading the right one depending on what module is doing the importing. The hard part is handling extension modules, which get loaded as shared objects and which need to not be able to find symbols from a different version of the library (or its dependencies)
from __future__ import annotations

import dataclasses, ssl, struct, hmac, hashlib, socket, time, tempfile, secrets
from typing import Final

MAX_CHUNK: Final = 2 * 1024 * 1024
MAX_FILE: Final = 50 * 1024 * 1024
SOCK_TIMEOUT: Final = 10
TRANSFER_DEADLINE: Final = 120
VALID_CMD: Final = 0x01
HMAC_KEY: bytes = secrets.token_bytes(32)


@dataclasses.dataclass
class ProtoError(Exception):
    msg: str


def read_exactly(sock: socket.socket, n: int) -> bytes:
    sock.settimeout(SOCK_TIMEOUT)
    buf = bytearray(n)
    mv = memoryview(buf)
    read = 0
    while read < n:
        chunk = sock.recv_into(mv[read:], n - read)
        if not chunk:
            raise ProtoError("peer closed early")
        read += chunk
    return bytes(buf)


def make_tls_server_socket(bind: tuple[str, int], cert: str, key: str) -> socket.socket:
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    raw = socket.create_server(bind, reuse_port=True)
    return ctx.wrap_socket(raw, server_side=True)  # type: ignore


def receive_file(conn: socket.socket) -> bytes:
    start = time.monotonic()
    pre = read_exactly(conn, 16)
    size, n_chunks, salt = struct.unpack("!II8s", pre)
    if not (0 < size <= MAX_FILE):
        raise ProtoError("bad file size")
    if n_chunks == 0 or size // n_chunks > MAX_CHUNK:
        raise ProtoError("bad chunk meta")
    mac = hmac.new(HMAC_KEY + salt, digestmod=hashlib.blake2s)
    with tempfile.SpooledTemporaryFile(max_size=MAX_FILE) as tmp:
        for expected_idx in range(n_chunks):
            if time.monotonic() - start > TRANSFER_DEADLINE:
                raise ProtoError("transfer timeout")
            header = read_exactly(conn, 9)
            cmd, idx, length = struct.unpack("!BII", header)
            if cmd != VALID_CMD:
                raise ProtoError(f"invalid opcode {cmd:#x}")
            if idx != expected_idx:
                raise ProtoError("out-of-order / replayed chunk")
            if not (0 < length <= MAX_CHUNK) or tmp.tell() > size - length:
                raise ProtoError("chunk length invalid")
            chunk = read_exactly(conn, length)
            tmp.write(chunk)
            mac.update(header)
            mac.update(chunk)
        received = read_exactly(conn, mac.digest_size)
        if not hmac.compare_digest(mac.digest(), received):
            raise ProtoError("HMAC mismatch – tamper detected")
        tmp.seek(0)
        data = tmp.read()
    if len(data) != size:
        raise ProtoError("size mismatch after readback")
    return data
just skimming, but that looks much more reasonable
Yeah, unfortunately it's python though. There is much more value in making the same thing in go
hm? why?
IO-bound stuff should work just about equally well in Python as in Go
Python can wait for the remote just as fast as Go can, and copy bytes into a buffer almost as fast
Yeah true, Python can wait just as fast on I/O. But it reacts slower, since it’s interpreted and has to juggle the asyncio event loop and callbacks. Plus, the GIL gets in the way of real multithreading. This means if you're caching a bunch of files at once, you'd need to use multiprocessing or tools like Numba. Still, even then, Go’s goroutines are just way more flexible and scale way better.
Uvloop might make the event loop just as good, although I haven't benchmarked it
Nope it doesn't. Unfortunately the event loop and callbacks are still not as good.
I heard that on recent versions of Python uvloop no longer makes much of a difference
Why? I thought it was still really performant when used with cpython. It should mostly still be faster at I/O than regular asyncio, maybe not with SQL database queries or something specific like that. I'm not too sure
asyncio in CPython itself was optimized, while uvloop hasn't moved much over the last few years
haven't looked at benchmarks though, just something I heard discussed at work
In October 2024 they fixed compatibility issues with python 3.12.5 and improved signal handling with cpython. So I'm not sure what issues there may be or if the gap narrowed, but uvloop is pretty much just a Cython implementation of the asyncio event loop built on top of libuv
I haven't seen any benchmarks that prove it to be still not worth using
I had this exact issue with the client not receiving data, and found a similar solution and made an equivalent of read_exactly().
I have not had any such problem with the server, even after hundreds of megabytes of test data sent without issues, though if I do I will replace all recv() calls with a more reliable function.
Thanks for the concern though, and another method of handling sockets' unpredictable behavior
what is the use of socket programming??
Any thoughts, ideas or knowledge on pytest/cpython diff in usage of __test__?
A file level __test__ = False successfully stops pytest from collecting any test from that file, however when running pytest --doctest-modules on the same file, this error appears: /sw/Python/Ubunt...
so like
theoretically
if i made an implementation of python that was completely compatible with cpython
except
i had a slight implementation difference
like say
not implementing pep 456
it wouldnt be a python implementation
?
there isn't a well defined rule for what can or cannot be called 'a python implementation'
if it can pass the CPython tests suite, run most Python programs in the wild and supports pure python pypi packages it's probably fair to call it a python implementation
I think it's fair to call something a python implementation even if it fails most of those tests. I'd say that MicroPython and CircuitPython are clearly Python implementations, despite missing a giant chunk of the stdlib
Is PEP 456 just an example or do you really want to get more hash collisions in your Python code?
particle physics programming
how do you know the hashes exist if they don't collide?
schrödinger's hash
pep 798, yay
!pep 798
Did it just get accepted?
i think they mean that someone actually wrote it now
Status: draft. That means it's time to discuss and refine it before sending to the Steering Council. Discuss link is at the top of the PEP.
I have an implementation for a HybridLock to work with multiple threads and asyncio at the same time, although i now also want to allow sync access through that lock, but i am worried about the possible caveats it can have
also i do want to confirm that whenever a thread will access this lock will it block that thread (which is what i want and expect) or will it block the thread in which the lock instance has been created (which is not what i want)
what we are mostly worried about is the cancellations part, it could maybe cause unexpected behaviour
well nvm now, we figured it and yeah it worked in whatever testing we did, but yeah feel free to point out any caveats if you see them
i wonder why the stdlib doesnt provide such a lock
In the core.py podcast, they mention that the 3.15 profiler supports asyncio. I want to see how this works under the hood, but when I check Lib/profile/sample.py I don’t see anything useful + I don’t see any relevant PRs with some basic searches.
What am I missing?
Ultimately I would love for this builtin async support to work for more than asyncio, which is why I asked here. (Crossposted from a help channel because I doubt I’ll get any good responses before the question gets automatically closed)
I think it's to do with the _remote_debugging module (PEP 768) having support for iterating tasks, it might do that automatically? Also this issue: https://github.com/python/cpython/issues/91048
There was a PyCon US talk about this, actually!
asyncio in Python 3.14 introduces a new powerful feature: introspecting a running asyncio program from another OS process. This changes everything—now you can debug and profile your asyncio code in production with no performance penalty. Join us for a fun ride as we show how this magic works under the hood and how you can use it. Learn about t...
Both hilarious and informative 😃
Hm I thought that one was specifically asyncio debugging, I’ll take a look. Thanks!
one million hertz
Ah, well, yes, that video is specifically about how the asyncio call stack can be gathered. Separately from that, and not covered by the talk, the profile module is gaining a sampling mode - which I believe didn't make the cut for 3.14, but will be in 3.15. That might be the thing you're asking about
yeah I thought there was async support but couldn't find any mention in the code/in the docs and no obvious prs, so I was wondering what I was missing
I don't think it's been merged yet, but https://github.com/python/cpython/compare/main...pablogsal:cpython:sample-async seems to be the branch where @nova wraith is iterating on it
the new Lib/profile/async_pstats_collector.py there seems to be what you're interested in
thanks that's exactly what I was looking for! I wonder why there's no PR -- was this feature dropped temporarily to get the profiler in first due to some issues?
no, the other way around
the sampling profiler was added first, and this is a new feature being added to the sampling profiler, building on top of the previous work
oh I just assumed because this was adding e.g. the sampling profiler to the whats new (so I assumed w/o asyncio was just a cherry pick or something)
uh... hm. 🤷♂️ 😄
is there a PEP somewhere that defines python to be strictly interpreted and not AOT compiled to machine code?
shouldn't be, tho many PEPs implicitly make this assumption in the implementation section.
plenty of things AOT compile python to machine code
nuitka and mypyc off the top of my head
technically pypy
Technically Cython
also the distinction between an “interpreted” and “compiled” language is pretty squishy to the point where the terms aren’t very useful IMO
the CPython interpreter has a bytecode compiler, for example
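The compile step is easy to see from Python itself. A small sketch (the source string and the "<example>" filename are arbitrary choices for illustration):

```python
import dis

source = "total = 1 + 2"

# compile() runs CPython's bytecode compiler and returns a code object.
code_object = compile(source, "<example>", "exec")

# The code object can then be executed by the interpreter loop.
namespace = {}
exec(code_object, namespace)

print(type(code_object).__name__, namespace["total"])
dis.dis(code_object)
```

So even "interpreted" CPython compiles source to bytecode first and then runs that.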
funny, I was just thinking that this morning as well
Curious what makes you say "technically" here. Cython compiles Python to C code and then machine code, nothing technical about it. Is it because Cython semantics don't exactly match standard Python semantics?
It has a mode that compiles unmodified Python code in addition to the ones where you write in a .pyx file and get more syntax
recent versions of Cython also parse type annotations, which can lead to speedups just like mypyc
Some people don't like to count it because the resulting binaries still depend on the CPython runtime. I was just trying to nip that objection in the bud 🙂
Any ideas why sum over a generator would be slower than creating a list?
See benchmark here: https://github.com/astral-sh/ruff/issues/19419
Using a generator means repeatedly starting and pausing a code object. That adds overhead
I thought generators were recommended over creating lists. Is this only the case sometimes?
they'll definitely save you memory and create more concise code
it's recommended when you don't want to collect it all into one big temporary memory dump
if your argument is that the generator is faster, you should benchmark first before making the argument
so for memory efficiency, sum(1 for ...) is definitely fine, but speed efficiency is where len([1 for ...]) wins
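If you want to measure this yourself, a micro-benchmark along these lines works. This is a sketch and not the script from the linked issue; big_list, the filter e % 2, and the repeat count are assumptions, and timings will vary by machine and Python version:

```python
from timeit import timeit

big_list = list(range(10_000))


def count_with_sum():
    # Generator: no temporary list, but the generator is resumed per item.
    return sum(1 for e in big_list if e % 2)


def count_with_len():
    # List comprehension: builds a temporary list, then takes its length.
    return len([1 for e in big_list if e % 2])


def count_with_loop():
    # Plain loop: no generator and no temporary list.
    count = 0
    for e in big_list:
        if e % 2:
            count += 1
    return count


for func in (count_with_sum, count_with_len, count_with_loop):
    print(func.__name__, timeit(func, number=100))
```

All three return the same count, so the only thing being compared is time (and, for the list version, peak memory).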
Ok, so for the lint suggestion to ruff in the above link, is there one of the forms to always prefer?
And there is no case that can win both memory efficiency and speed? I mean, this is just counting 🤔
write a for loop 😄
🙃
there's also been talk in CPython of inlining the generator execution with functions like any() and sum(), that would likely make the generator win on speed
Interestingly if I add this to @brisk jay 's script in that issue it's even slower
print(
"sum generator bools",
timeit(lambda: sum(e % 2 for e in big_list), number=10_000),
)
likely because sum() has a fast path for summing just ints
bool is a subclass of int but the fast path is likely only for exact ints
e % 2?
since subclasses of ints could be doing funky business in __add__
oh you're right
oh the issue is I'm summing more things
5000 ones and 5000 zeroes instead of just 5000 ones
branchless somehow loses out over branchful in python :p
benefits(?) of an object-only language
Maybe we can crosslink the ruff issue with the cpython issue you mention @jelle? Seems like some testcase/benchmark should perhaps be added?
Actually a for loop is slower than len(list) too
ah it was done for any()/all() but not sum() https://github.com/python/cpython/issues/131738
And any/all is actually affected by creating an array/generator of booleans when filtering (ref https://github.com/astral-sh/ruff/issues/16904), I suppose because of an extra call to __bool__?
That's why in my micro-benchmark in the linked issue above I also included a len-check on an array of booleans, just in case it mattered (it made a micro difference, could just be due to not storing big numbers)
Also e % 2 is just a stand-in for a filter function. It cuts the list in half for filtering which I figured should be good enough for a quick sanity check.
Can sum be improved in similar way?
sum() is more difficult because what it does is a little more complicated
That's strictly bc of the run around that may occur between 2 arbitrary objects when doing addition operator?
Like __add__ then if that doesn't work try __radd__ ?
i believe it does some tricks for numerical stability
Oh wow
Huh, indeed:
Changed in version 3.12: Summation of floats switched to an algorithm that gives higher accuracy and better commutativity on most builds.
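A small way to see the difference, as a sketch: the exact value printed for the built-in sum depends on the Python version and build, so only the math.fsum result is stated with confidence here.

```python
import math

values = [0.1] * 10

# Built-in sum: on 3.12+ this uses the more accurate float algorithm,
# on older versions it is plain left-to-right addition.
builtin_total = sum(values)

# math.fsum tracks partial sums to return a correctly rounded result.
exact_total = math.fsum(values)

print(repr(builtin_total), repr(exact_total))
```

On older versions the two printed values can differ in the last digit; math.fsum gives 1.0 for this input.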
TIL. I knew about math.fsum but not that sum gained some of those smarts
Got a special JIT build segfaulting where it wasn't before. Bisected it to https://github.com/python/cpython/pull/136307. Does anything in that PR seem suspicious? The diff I apply to get a special build is (and it works without this patch):
index 8b7f12bf03d..329b1f615e3 100644
--- a/Include/internal/pycore_optimizer.h
+++ b/Include/internal/pycore_optimizer.h
@@ -116,12 +116,12 @@ PyAPI_FUNC(void) _Py_Executors_InvalidateCold(PyInterpreterState *interp);
// Used as the threshold to trigger executor invalidation when
// trace_run_counter is greater than this value.
-#define JIT_CLEANUP_THRESHOLD 100000
+#define JIT_CLEANUP_THRESHOLD 10000
// This is the length of the trace we project initially.
#define UOP_MAX_TRACE_LENGTH 800
-#define TRACE_STACK_SIZE 5
+#define TRACE_STACK_SIZE 10
int _Py_uop_analyze_and_optimize(_PyInterpreterFrame *frame,
_PyUOpInstruction *trace, int trace_len, int curr_stackentries,
@@ -152,7 +152,7 @@ static inline uint16_t uop_get_error_target(const _PyUOpInstruction *inst)
}
// Holds locals, stack, locals, stack ... co_consts (in that order)
-#define MAX_ABSTRACT_INTERP_SIZE 4096
+#define MAX_ABSTRACT_INTERP_SIZE 8192
#define TY_ARENA_SIZE (UOP_MAX_TRACE_LENGTH * 5)
@@ -163,7 +163,7 @@ static inline uint16_t uop_get_error_target(const _PyUOpInstruction *inst)
// progress (and inserting a new ENTER_EXECUTOR instruction). In practice, this
// is the "maximum amount of polymorphism" that an isolated trace tree can
// handle before rejoining the rest of the program.
-#define MAX_CHAIN_DEPTH 4
+#define MAX_CHAIN_DEPTH 16
/* Symbols */
/* See explanation in optimizer_symbols.c */
diff --git a/Python/optimizer.c b/Python/optimizer.c
index 8d01d605ef4..33c00169a60 100644
--- a/Python/optimizer.c
+++ b/Python/optimizer.c
@@ -456,7 +456,7 @@ BRANCH_TO_GUARD[4][2] = {
#define CONFIDENCE_RANGE 1000
-#define CONFIDENCE_CUTOFF 333
+#define CONFIDENCE_CUTOFF 100
#ifdef Py_DEBUG
#define DPRINTF(level, ...) \
I found the related change, but it makes no sense. It's just adding an identifier to Include/internal/pycore_global_strings.h and respective generated files.
That's a pretty big hint. It's likely something to do with constants or immortals in the JIT, because strings in that file are always immortal while normal strings arent always
Turns out this diff is enough to segfault main:
index 454c8dde031..c49652adc27 100644
--- a/Include/internal/pycore_backoff.h
+++ b/Include/internal/pycore_backoff.h
@@ -99,7 +99,7 @@ backoff_counter_triggers(_Py_BackoffCounter counter)
// Must be larger than ADAPTIVE_COOLDOWN_VALUE, otherwise when JIT code is
// invalidated we may construct a new trace before the bytecode has properly
// re-specialized:
-#define JUMP_BACKWARD_INITIAL_VALUE 4095
+#define JUMP_BACKWARD_INITIAL_VALUE 344
#define JUMP_BACKWARD_INITIAL_BACKOFF 12
static inline _Py_BackoffCounter
initial_jump_backoff_counter(void)
If we define JUMP_BACKWARD_INITIAL_VALUE to 345 instead of 344, no more segfault.
False alarm, 345 won't segfault during compilation, but will on normal use :/
True boundary numbers seem to be 703 for no crash, 702 for crash.
Adding a bunch of new identifiers doesn't change that number 🤔
Trying to get to the REPL is what makes the crashy build crash. python -m _pyrepl will crash, python -m random or python -m http.server will run to completion. 🤔 🤔
Crash happens on unpatched main too, patching just makes it happen faster.
Does a hashable object need to have a "correct" __eq__ implemented?
In CPython, dict lookup seems to be short-circuiting using is. Is this specified in the Python language or an implementation detail? Is CPython's behavior ok? Why?
The only required property is that objects which compare equal have the same hash value.
If a class does not define an __eq__() method it should not define a __hash__() operation either.
I think having equal objects hash to different values just results in UB, anything CPython does is fair game then.
I have objects with hash function set to id() and __eq__ raising Type Error at the moment
But I'm not confident with this code. 😅
And the short-circuiting in dict really surprised me
The requirement is that objects that compare equal should return the same hash value. This implies that they need an __eq__ method, otherwise you cannot compare them and determine whether or not they are equal. It also means that two objects that are different (but equal) need a __hash__ method that returns the same value. Using id() as a hash value returns different hashes for different objects, so the requirement is only met if distinct objects never compare equal.
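To make the identity short-circuit concrete, here is a small sketch. The class name `NoCompare` is made up for illustration. It relies on CPython's dict trying `is` before `==`, which matches the documented `x is e or x == e` rule for membership tests on built-in containers, but the exact sequence of calls is best treated as an implementation detail:

```python
class NoCompare:
    """Hypothetical class: hashable, but == always raises."""

    def __init__(self, hash_value):
        self._hash_value = hash_value

    def __hash__(self):
        return self._hash_value

    def __eq__(self, other):
        raise TypeError("NoCompare objects cannot be compared")


a = NoCompare(1)
b = NoCompare(1)  # deliberate hash collision with a

cache = {a: "value for a"}

# Same object: the identity check succeeds, so __eq__ is never called.
same_object_lookup = cache[a]

# Different object with the same hash: the dict has to fall back to __eq__.
try:
    cache[b]
    collision_raised = False
except TypeError:
    collision_raised = True

print(same_object_lookup, collision_raised)
```

So a raising `__eq__` only stays silent while every hash collision is between an object and itself.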
I just noticed 3.14 rc2 comes out aug 26th, but 25/8 is a better pi approximation 😦
smh
you mean 22/7?
better approximation, but it's a bit late for that date now
RC1 was on that day
22/7 is even better but august only has so many options...
in other news: neat! CPython is getting a built-in sampling profiler apparently. https://discuss.python.org/t/pep-799-a-dedicated-profilers-package-for-organizing-python-profiling-tool/100898
Hi everyone, This is a straightforward structural proposal: PEP 799 introduces a new profilers standard library module to house Python’s built-in profilers under a single, coherent namespace. It: Adds profilers.tracing (alias for cProfile) Adds profilers.sampling (alias for tachyon, the new sampling profiler in 3.15 using the introspec...
!pep 800 is also neat
I've recently run across this while working on a Ruff issue, and I was wondering if anyone here has some insight:
Is the interaction between f-string format specs and whether or not the string is raw documented anywhere?
What I mean is f"{A():\xFF}" vs rf"{A():\xFF}"
It looks like this changed in 3.12 with PEP 701. On 3.11 and prior, f"{A():\xFF}" is not equal to rf"{A():\xFF}", the rawness does affect the format spec (ÿ vs \xFF). On 3.12 and later, f"{A():\xFF}" is the same as rf"{A():\xFF}", both giving ÿ.
I've looked everywhere I can think of in the docs, as well as PEP 498 and PEP 701, but I can't find anything talking about this edge case. I also could not find any issues about it on the CPython github.
PS ~>uvx python@3.11 -c 'A=type("",(),{"__format__":lambda _,f:f});print(f"{A():\xFF}"==rf"{A():\xFF}")'
False
PS ~>uvx python@3.12 -c 'A=type("",(),{"__format__":lambda _,f:f});print(f"{A():\xFF}"==rf"{A():\xFF}")'
True
wouldn't be surprised if that's accidental and nobody noticed so far
probably worth opening an issue on cpython
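A more readable version of that one-liner, as a sketch. `EchoSpec` is a made-up helper whose `__format__` hands back the format spec it receives, so the two results can be compared directly. No expected output is given for the raw case because, per the messages above, it depends on the interpreter version:

```python
class EchoSpec:
    """Hypothetical helper: formatting it returns the format spec unchanged."""

    def __format__(self, format_spec):
        return format_spec


plain = f"{EchoSpec():\xFF}"   # escape in the spec is processed
raw = rf"{EchoSpec():\xFF}"    # raw prefix: version-dependent, see discussion above

# ascii() makes a single "ÿ" and a literal backslash sequence easy to tell apart.
print(ascii(plain), ascii(raw), plain == raw)
```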
if I wanted to download the CPython source and then checkout the tag for 3.13.5, that wouldn't be difficult to do would it?
no, you can also use pyenv to automate all that: pyenv install 3.13.5
if you want a build with your checkout anyway
that worked great thank you
sorry if this has been answered but I've been looking around the code and I can't figure out where different types are added as virtual base classes
this would be helpful for knowing how to make isinstance checks
you mean the ABCs like Sequence?
yeah
That's done with .register calls, I think they're mostly in _collections_abc.py. But classes can also present as subclasses of these ABCs through the __subclasshook__ which generally just checks for the existence of some methods.
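Both mechanisms can be seen in a few lines. This is a sketch with made-up classes (`Deck`, `Playlist`): `Sized` recognises any class with `__len__` through its `__subclasshook__`, while `Sequence` has no such structural check and only accepts real subclasses or registered ones:

```python
from collections.abc import Sequence, Sized


class Deck:
    """Hypothetical class with only __len__."""

    def __len__(self):
        return 52


class Playlist:
    """Hypothetical class that behaves like a sequence but inherits from nothing."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


# Structural check: Sized.__subclasshook__ looks for __len__.
deck_is_sized = issubclass(Deck, Sized)

# Sequence does no structural check, so this is False until we register.
before_register = issubclass(Playlist, Sequence)
Sequence.register(Playlist)
after_register = issubclass(Playlist, Sequence)

# Registration is "virtual": Sequence never shows up in the MRO.
in_mro = Sequence in Playlist.__mro__

print(deck_is_sized, before_register, after_register, in_mro)
```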
The source code says:
Note that the new implementation hides internal registry and caches,
previously accessible via private attributes _abc_registry,
_abc_cache, and _abc_negative_cache. There are three debugging
helper methods that can be used instead: _dump_registry,
_abc_registry_clear, and _abc_caches_clear.
That was in NEWS.txt
however I've been unable to find any actual usage of _dump_registry searching the codebase and github (I don't have particularly great technique for searching especially github)
NEWS.txt might describe the original implementation, not necessarily how things work now
I thought maybe if I read all the type hint related PEPs in order that might help some
ABCs were not originally meant for type hints, and are still a somewhat independent system
Reading PEPS can be helpful but their text doesn't necessarily reflect current behavior exactly
ah, ok, yeah they seem a bit like a shadow type system but they're fully incorporated into isinstance one way or another, it seems to me
so it seemed to at some point get added into runtime behavior, maybe for type checkers?
What is "it" here?
which one? haha
the runtime recognition of how the typing helper classes get incorporated in I guess
I'm trying to get an example
The runtime behavior came first
Type checkers don't see that; they know that e.g. list is a Sequence because in typeshed we pretend it's an actual base class
stdlib/builtins.pyi line 1084
```py
class list(MutableSequence[_T]):
```
actually, it looks like Any cannot be used with isinstance 😄
I'm no longer sure, the only thing that passed any of my checks was object
but I'm reading your GitHub ref, thank you
strange
```py
>>> l = list
>>> l.__subclasses__()
[<class '_frozen_importlib._List'>, <class 'functools._HashedSeq'>]
>>> l[0].__subclasses__()
[<class '_frozen_importlib._List'>, <class 'functools._HashedSeq'>]
>>> l[1].__subclasses__()
[<class '_frozen_importlib._List'>, <class 'functools._HashedSeq'>]
```
I think I messed it up, one moment
ok maybe not, I just still get those two
I'm probably missing something
What were you expecting?
Note that gives you subclasses, not base classes
I had something to give me the class tree for list recursively the other day and I want to say something from the email module was in it
maybe it wasn't list but something else fairly simplistic
it's only going to be in the list if you've imported that module
In a fresh 3.15 REPL I get
>>> list.__subclasses__()
[<class '_frozen_importlib._List'>, <class 'traceback.StackSummary'>]
>>> import email.message
>>> list.__subclasses__()
[<class '_frozen_importlib._List'>, <class 'traceback.StackSummary'>, <class 'email.header._Accumulator'>]
oh ok, well, at least I'm not crazy
So yes, importing email gives you another subclass of list
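A short sketch of the two directions, using a made-up `Stack` class. `__mro__` and `__bases__` look upward at base classes, while `__subclasses__()` looks downward and only knows about direct subclasses that have been created in the running process:

```python
class Stack(list):
    """Hypothetical list subclass, defined only to show up in __subclasses__()."""


# Upward: what list inherits from.
list_mro = list.__mro__

# Downward: direct subclasses that exist right now in this process.
stack_is_listed = Stack in list.__subclasses__()

# And Stack's own base classes.
stack_bases = Stack.__bases__

print(list_mro, stack_is_listed, stack_bases)
```

The rest of the `__subclasses__()` list depends on which modules have been imported so far, which is why it differs between sessions.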
how does one use abc.ABCMeta._dump_registry() ?
everything I've tried has said that the class has no member _abc_meta
>>> collections.abc.Sequence._dump_registry()
Class: collections.abc.Sequence
Inv. counter: 25
_abc_registry: {<weakref at 0x102f56040; to 'type' at 0x102945418 (memoryview)>, <weakref at 0x102f55e80; to 'type' at 0x10294ac48 (tuple)>, <weakref at 0x102f55fd0; to 'type' at 0x102948e50 (range)>, <weakref at 0x102f55ef0; to 'type' at 0x10294ec60 (str)>, <weakref at 0x102f55f60; to 'type' at 0x1029316b8 (bytes)>}
_abc_cache: set()
_abc_negative_cache: {<weakref at 0x102f55da0; to 'type' at 0x102945418 (memoryview)>}
_abc_negative_cache_version: 12
So objects that I want to forbid comparison for, can/should never be hashable?
If you do not provide an __eq__ method, then objects will be compared by identity (their "id"). This is the built-in default, and the default __hash__ is consistent with it. If you want dict keys that compare by value, then you will need to provide both an __eq__ method and a matching __hash__ method. It is up to you how they work. If you want to "forbid" (your word) an object from being a key, then set __hash__ = None in the class body; that is also what Python does automatically when a class defines __eq__ without __hash__.
It is unclear, at this point, what your objective may be. If you want to ensure that your object can never be compared and found equal, then write your __eq__ method so that it always returns False.
If you need further clarification, you will need to state what you are trying to achieve.
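For reference, a minimal sketch of how a class ends up unhashable. The class names are made up. The documented opt-out is `__hash__ = None`, and defining `__eq__` without `__hash__` has the same effect:

```python
class EqOnly:
    """Hypothetical class: defines __eq__ but not __hash__."""

    def __eq__(self, other):
        return NotImplemented


class ExplicitlyUnhashable:
    """Hypothetical class: opts out of hashing explicitly."""

    __hash__ = None


def is_usable_as_key(obj):
    """Return True if obj can be used as a dict key."""
    try:
        {obj: None}
    except TypeError:
        return False
    return True


# Defining __eq__ alone implicitly sets __hash__ to None.
implicit_hash = EqOnly.__hash__

results = (
    is_usable_as_key(object()),               # default identity-based hash
    is_usable_as_key(EqOnly()),
    is_usable_as_key(ExplicitlyUnhashable()),
)
print(implicit_hash, results)
```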
I want to avoid two objects of my class to be compared using equal operator like: a == b. So I did
raise TypeError("..")
I did this because there is no logical reason to compare them, but there have already been mistakes where these objects were compared and there is an actual risk of incorrect logic.
However, some of the use cases are as keys in memoize caches. If I just remove the memoization, I see 3x runtime increase (e.g. 20 seconds -> 1 minute)
This is old messy code I'm looking at. Trying to figure out what is worth doing, and how.
If it is your code and you do not want to compare two objects, then do not write code that compares them. If you want two objects of your class to always compare "not equal" then return False.
(if they are used as keys: as long as all hash conflicts are due to the objects being the same object, the identity check will "short circuit" and never call __eq__. But I believe this is an implementation detail of CPython as I haven't found any documentation about it. Also I don't want to rely on this behavior anyway 😅)
"My code" this is code written by 50 people
I don't want them to always compare unequal. I want them to not be comparable.
Then why all the talk of id values and hash values? If you cannot compare two objects, then you cannot determine that you have found it, whether it is in a hash table or elsewhere.
You can with id(). It's not nice, but code "works out". I don't want to do it but all of these questions spurred from lack of detailed knowledge about dict.
And getting the last puzzle pieces was difficult due to dict short-circuiting with an is (id()) check.
The problem is I can't just forbid comparing these objects, because there is too much code relying on comparing them in different places of the code. Some are straight up comparisons and easy to deal with. Some are as keys in memoize caches or as items in sets - not easy to deal with.
It would be interesting to know WHY you do not want two objects to be able to be compared. In the meantime I would try something like this:
def __eq__(self, other):
    raise TypeError(f'Cannot compare objects of type {type(self)}')
is pip shared between python versions on windows?
I guess my real question is, if so, is the cache per-version or shared?
That's what I have already 🙂
The underlying code that a lot of other stuff is implemented on top of is very badly designed.
It's a graph structure (mostly tree with some sibling pointers). Each node is constructed lazily. And (very!) unfortunately a new object might be constructed for a node that has already been constructed. Suddenly we have two objects representing the same node. Should they compare equal or not? I want to forbid comparison to force the user code to understand this issue.
The ideal thing would be to fix the underlying structure to never have this situation appear.
nope
per version
It seems that you have answered your own question. Disguising one problem as another does not solve the problem. You know what you need to do ...
What can be done to clean up the mess is not always the most important thing. Sometimes looking for alternative fixes is worth it. But yes I learnt some parts about the code trying to fix it. Now I won't have time to continue this path 🙃
@knotty pond your message was removed for surveying, which is not allowed.
Oh super sorry.
Not sure where else to float this - but the scoping of sys.remote_exec seems a little confusing...
The docs say that the injected code is "executed by the target processes' main thread", which I (maybe wrongly) assumed would mean in the main thread's global namespace
But currently the code is executed with an empty global and local namespace: https://github.com/python/cpython/blob/c5cebe1b5afa58fc0ee95153a4b1905229dce7dc/Python/ceval_gil.c#L1198-L1217
Maybe a clarification in the docs? Or maybe I just need a worked example of importing and interacting with existing objects
Anyway, seemed minor to open an issue about and couldn't figure out where this might go on Discuss, so just tossing it out here
There's no such thing as "the main thread's global namespace". Each module has its own global namespace. The injected code gets its own global namespace, just like it would if it were in a file that you pulled in with import
Can you explain what you're trying to do? What objects are you trying to interact with?
Thank you! Just looking to examine/modify variables in existing/imported modules at runtime:
# foo.py - run a loop
import time
x = 0
while True:
    x += 1
    print(x)
    time.sleep(1)
# inject_me.py
x = -9999
# main.py
import sys
sys.remote_exec('pid-of-foo.py', 'inject_me.py')
I wish we called it "module scope" and not "global scope"
for this specific example, assuming you had done python foo.py to run the process, you'd want inject_me.py to do ```py
import __main__
__main__.x = -9999
```
if you'd instead done import foo, you'd want to instead do ```py
import foo
foo.x = -9999
```
Ahhhh thank you very much
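The namespace behaviour can be tried without attaching to another process. The sketch below does not call `sys.remote_exec` at all. It imitates the situation with `exec()` and a fresh globals dict, and `foo_demo` is a made-up module name standing in for the target's module:

```python
import sys
import types

# Stand-in for a module that already exists in the target process.
target = types.ModuleType("foo_demo")  # hypothetical module name
target.x = 0
sys.modules["foo_demo"] = target

injected_source = """
x = -9999          # only binds x in the injected script's own namespace
import foo_demo
foo_demo.x = 42    # rebinds the attribute on the existing module object
"""

# The injected script gets its own, initially almost empty, global namespace.
injected_globals = {"__name__": "injected_demo"}
exec(injected_source, injected_globals)

print(target.x, injected_globals["x"])
```

The bare assignment never touches the module. Only the attribute assignment through the imported module object does.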
what does this even mean
Do you understand what a module is, as an object in the code that you import?
python-gdb.py is super useful. is there a way that people use to get it automatically sourced every time they debug python?
anyone have the foggiest clue what’s happening here? https://github.com/pandas-dev/pandas/pull/61950#issuecomment-3166257178
somehow python 3.14 doesn’t catch a ValueError that 3.13 does
hm. very foggy guess... what do you see if you do: ```py
import pandas as pd
obj1 = pd.DataFrame({'0': [1, 1, 1, 1], '1': [1, 1, 1, 1]})
obj2 = pd.DataFrame({'0': [1, 1, 1, 1], '1': [1, 1, 1, 1]})
try:
    obj1 and obj2
except BaseException as exc:
    print(f"{exc = }")
    print(f"{type(exc) = }")
    print(f"{type(exc) is ValueError = }")
```
oh i think I’m hitting the bug that needed a bytecode tweak for rc2…
ah, indeed, it does look like that
At HRT, we’ve found that centralizing our codebase facilitates cross-team collaboration and rapid deployment of new projects. Therefore, the majority of our software development takes place in a monorepo, and our Python ecosystem is set up such that internal modules are importable everywhere. Unfortunately, the convenience of this arrangement ...
Great name
!pep 802
new pep just dropped
{/}
people seem to really dislike it, unfortunately
I wrote the PEP as I’m sympathetic to the idea and I think it’s worth exploring how/if we can resolve the syntax hole for sets, which have sometimes felt a little unloved by the core language. Unsurprisingly, people have lots of thoughts!
_ _ _ _ _ _ _ _ _ _ _ _
| 🚲🚲🚲🚲 🏍️ |
I’m unsure what this indicates? The peloton?
it's a bike shed
Of course, foolish of me
should I have coloured it differently to make it clearer?
I did think the Tour de France finished several weeks ago
What is the empty frozenset literal going to look like? f{/}?
that's pretty unique syntax
Ideally, of course, the shed would be a time machine and/or magic wand and we would ‘fix’ all the choices from the past. The argument has been made that the key distinction of dictionaries are the presence of a colon (:), so the notation ought be {:} for empty maps and {} for empty sets. This, though, will never happen.
(At least in the Python language; it’s fun to design your own language when you’ve anywhere between half an hour and ten years of downtime)
You presume the existence of a frozenset literal…
a time machine is already built into the shed
Fabulous, now we can properly focus on colour schemes
Give me fronzenset littlerals
Pls
write the PEP
i’ll join you on the frozendict PEP - or dict.freeze()
Wasn't frozenmap and frozen dict both rejected?
Or frozenmap never made it to the steering council
pydis gang needs to rise up in support
there’s both apparently, PEP 416 (frozendict, rejected) and PEP 603 (collections.frozenmap, draft)
From memory there were/are questions over what a frozen mapping ought be. I think 603 effectively proposed exposing HAMT, whereas 416 was ‘dict but immutable’. Take this with a grain of salt though, I haven’t read those ones in a while
Hi I can sell my last script in python dm me for more information !!
I haven't read through either PEP, but conceptually, nobody uses frozenset, so why expand that mistake to dictionaries?
Conceptually?
Lots of people use it
I use loads of frozen sets
Always hard to make statements like this though, it really depends on what kinds of Python code you see
for multithreaded applications, if you know it’s frozen the thread safety story is trivial
Useful to have the guarantee it won’t change (hashability is nice but I don’t really use it)
yeah, i'm generalizing. i've personally never seen a frozenset in a codebase, other than CPython
As far as I'm concerned nobody uses complex or Decimal either. But I'm sure there are domains where they are useful and common.
i’ve used frozendicts fwiw
dict.freeze is appealing to me too, if it returns a frozen dict
You can still get the underlying dict from a MPT ;-)
but that doesn’t actually freeze the underlying dict though
sssh
wouldn't it make more sense to just revive PEP 351 if freezability is the desire
there’s also the project verona folks and the deep immutability stuff too, but that was received even less warmly than {\}
This is HAMT vs ‘immutable’ though — I think the argument is that the former has certain nice properties
lol one downside of {\} is you need to escape the backslash to type it in markdown
You can use it today from one of the _testinternalcapi modules
it's {/} 🙂
It’s a forwards slash
Backslash isn’t currently a legal token in Python, forwards slash is
PEP 803: backslashes!
One of the first things I did with Python was generate the Mandelbrot set, complex was useful for that. I suppose probably useful in sciency stuff that’s not at the NumPy level?
my frustration with frozensets is that they have to wrap another container. it's like if you had to do tuple([...]) to make a tuple. still useful, just generally not worth the effort.
that, and we've kind of slept on frozensets internally. the empty frozenset() is not an immortal singleton, unlike all other immutable types
Sounds like a pr is needed
Oh boy, a 1MB MRE isn't quite ideal
TIL warnings.warn grew a skip_file_prefixes kwarg that is strictly better than the old stacklevel kwarg
A contextmanager would be better in case of wrapper libraries
Another sys.remote_exec question - is it possible to get remote execution to happen 'sooner' while the program is awaiting on asyncio.sleep() or on select?
I.e. if i have a script running:
import asyncio
import os
async def _a():
    # Print own PID and a counter for convenience
    x = 0
    while True:
        x += 1
        print(f"{os.getpid(): <8} {x: > 8}")
        await asyncio.sleep(10)
def a():
    asyncio.run(_a())
if __name__ == "__main__":
    a()
and I do sys.remote_exec on that process, is there a way to avoid waiting for the end of the asyncio.sleep for the remote execution to happen?
no, there's not. PEP 768 is all about waiting until the interpreter hits a safe place within the eval loop before trying to execute arbitrary code, and unfortunately that does mean waiting at least until the next iteration of the eval loop.
if you control the code of the process you're attaching to, you can do tricky things like register a no-op signal handler using loop.add_signal_handler, though, and then send that signal after sys.remote_exec returns to trigger the event loop to run the signal handler, which will cause the eval loop to run, which will cause the script injected by sys.remote_exec to run
though I imagine that if you controlled the code of the process you're attaching to, you wouldn't be using sys.remote_exec in the first place, heh
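A rough sketch of the signal-handler trick, under a few assumptions: it is Unix-only, it must run in the main thread, and `SIGUSR1` is an arbitrary choice. The handler records a timestamp only so the wake-up is observable; a real one could be a no-op. The signal is sent from a timer thread in the same process here, standing in for whatever sends it after `sys.remote_exec` returns:

```python
import asyncio
import os
import signal
import threading
import time


async def sleeper():
    loop = asyncio.get_running_loop()
    wake_times = []

    # Stand-in for the "no-op" handler: it just notes when it ran.
    loop.add_signal_handler(
        signal.SIGUSR1, lambda: wake_times.append(time.monotonic())
    )

    start = time.monotonic()
    # Simulates an outside party sending the signal shortly after attaching.
    timer = threading.Timer(0.2, os.kill, args=(os.getpid(), signal.SIGUSR1))
    timer.start()

    await asyncio.sleep(1.0)  # the long wait we want to interrupt
    timer.join()
    loop.remove_signal_handler(signal.SIGUSR1)
    return start, wake_times


start, wake_times = asyncio.run(sleeper())
handler_delay = wake_times[0] - start
print(f"handler ran after {handler_delay:.2f}s, well before the sleep ended")
```

The sleep itself still runs to completion. What matters is that the event loop thread executes Python code in the middle of it, which gives the interpreter a chance to run a pending injected script.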
That might still do me some good - I wouldn't be able to do anything about the first invocation of remote_exec, but I could use that invocation to set up the no-op handler and make future interactions more 'responsive'
yeah, that's definitely true, but it's not the approach I'd take. I'd instead set up a completely separate side channel with your first remote_exec, and then use that side channel for every future command you want to run
that's what we do in PDB's new remote mode, for instance - the PDB client spawns a TCP server, then runs a sys.remote_exec command telling the remote process being attached to that it should establish a connection to that server and accept PDB commands that come over it
Oh yeah I've seen - I've been into the guts of _PdbClient and _PdbServer this week. But you still have the same issue with debugging async apps - if the remote app is waiting on a selector (or just generally awaiting), you don't get a response back from the Server until the selector times out/returns.
Even after pdb is attached
well that's just down to the particular side-channel you're using
like, if you inject a separate thread, then nothing that's happening in the event loop thread will cause your thread to stop (short of acquiring the GIL and never releasing it, at least)
Huh, that's an interesting thought... and from a thread in the same process, I think it should be easy to throw signals at the main thread to spin the eval loop, with something like pthread_kill - I was running into some issues with a Python distributable that didn't seem to build in pidfd_send_signal
pidfd is still relatively new, there's definitely platforms out in the wild that don't support it yet
but anyway, the best overall structure is definitely using sys.remote_exec for one-time setup of some sort of side channel, and then using that side channel from there on out
Makes sense to me!
I may come back with some more questions - I’m working on essentially a TUI wrapper for remote pdb to add some quality of life features. I’m trying to keep it compatible with stdlib pdb so that existing processes can be attached to, so long as they’re running the same Python version
I'm the person who contributed remote pdb, so you can feel free to ping me here if you've got more questions 🙂
Isnt ur name Matt, I thought Pablo did the remote pdb, he gave that presentation on it at Pittsburgh PyCon
Oh
Wow okay, just read the pep. NEVER MIND
Pablo and Ivona (who gave the PyCon presentation together) are both teammates of mine. PyCon would only allow a max of two presenters per talk, and I don't enjoy giving talks anywhere near as much as Pablo does 😄
the 3 of us worked together on PEP 768 and the C portion of sys.remote_exec, but remote PDB in particular was almost entirely me. Pablo did an early prototype of it, but the design that he had couldn't handle a lot of PDB's features, and I don't think a single line of it survived to the final version

that early prototype was just done using 2 FIFOs, having everything typed in the client be sent directly to the remote process, and everything written by the remote process be printed by the client. That's simple and elegant, but it has lots of subtle problems - how does tab completion work? how does line editing work? how does the remote know whether syntax highlighting should be enabled? how does the remote know how wide the client's terminal is and where to wrap listings?
really, the inability to square that with tab completion is what killed it. Lots of other things could be papered over, but the approach where the client sends complete lines to the server over a fifo is just totally incompatible with tab completion
Ah
pdb has been getting a lot of love lately
remote access, and Tian fixed a long standing issue from reading commands from a .pdbrc that I just love
I still don't know how one develops pdb, I tried and it was difficult to walk through the code with pdb. Like debugging pdb with itself
print()
Yeah Ive developed a habit of never print debugging bc I could always use pdb
and it felt like I was doing something wrong
its the one time its appropriate I suppose
!otn a who debugs the debuggers
:ok_hand: Added who-debugs-the-debuggers to the names list.
I have no idea how @spark magnet manages to develop coverage.py, honestly. The few times I've tried to make changes there, I've immediately gotten frustrated with the inability to use PDB and coverage together. I think he's said that he has tons of conditionally enabled print statements just left in the library for debugging things
debugging anything that uses sys.settrace is just a nightmare
I do suggest people use the new .pdbrc commands feature though for programmatic debugging
Its nice
this is accurate
except it's not literal print statements, it's debugging code, enablable in a few ways
and btw, it's a pain in the ass debugging coverage problems sometimes
and you say that as someone who knows how things work, heh. it is super frustrating to need to debug unfamiliar code and be unable to use a debugger
for example: https://github.com/pytest-dev/pytest-cov/issues/708
Summary Running a test suite involving Celery task workers, the new coverage.py patch = subprocess feature correctly measures the workers. But with the same coverage settings, using pytest-cov does...
lol, you believe i know how coverage works 😄
https://github.com/nedbat/coveragepy/commit/ce732fd36b9edaba9932027328c157d6f565e4e1 is the patch that makes things start working?
oh thank god 😆
pytest-cov has a .pth file - is that what's changing the behavior, I wonder?
I wonder if the failure still reproduces if you mv $VIRTUAL_ENV/lib/python*/site-packages/pytest-cov.pth{,.bak}
i want to talk about this, but am on a work thing
I don't remember why I thought it sucked, but I felt like it solved a problem that I didn't have, and didn't solve the problem that I had. I didn't need or want a special syntax with Python source blocks in it, basically just swapping out shell for Python in a Makefile. I wanted (and still want) a general-purpose DAG-based task runner for which "success" doesn't have to mean emitting a file to disk. Imagine Airflow but without a server and scheduler and all that: just operators, DAGs, runs thereof, and a database to track state (maybe just a directory of files like Git). That tool still doesn't exist to my knowledge in 2025. But Dbt + DVC can get pretty close.
Snakemake is fine if all you want is "Make but for data processing in Python"
that doesn't fix it. I don't understand what my after_fork_in_child is doing that isn't already being done by other mechanisms, but it's only with pytest-cov that the problem appears.
and this is one of those problems that debuggers aren't good at: processes forking, starting, stopping, and the interactions that happen at those times.
could I get a debugger to step into a .pth file? i wouldn't try.
https://discuss.python.org/t/make-typing-namedtuple-classes-final/102502
what do you think?
This proposal aims to make typing.NamedTuple classes final—disallowing subclassing—in order to eliminate long-standing inconsistencies, surprising runtime behavior, and broken inheritance semantics. Consider the following example: from typing import NamedTuple class Point2D(NamedTuple): x: int y: int class Point3D(Point2D): z...
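The kind of surprise the proposal refers to can be reproduced in a few lines. This is my own sketch built from the class names in the preview above, not necessarily the exact example from the linked post:

```python
from typing import NamedTuple


class Point2D(NamedTuple):
    x: int
    y: int


class Point3D(Point2D):
    z: int  # just an annotation on an ordinary subclass, not a new tuple field


fields = Point3D._fields

# The constructor is still the two-argument one inherited from Point2D.
try:
    Point3D(1, 2, 3)
    accepts_three_args = True
except TypeError:
    accepts_three_args = False

print(fields, accepts_three_args)
```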
I can't tell: is it proposed to change the behavior of collections.namedtuple as well, or is the proposal to only change typing.NamedTuple?
I have definitely seen code in the wild - and probably written some myself - that does ```py
class MyClass(collections.namedtuple("_MyClass", "foo bar baz")):
    """
    Some docstring
    """
```
That shows up in quite a few places in the stdlib - https://paste.pythondiscord.com/KWMQ
the docs for collections.namedtuple even give an example of doing this
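A sketch of that subclassing pattern, loosely following the shape of the example in the `collections.namedtuple` docs. The `Point` class and its method are illustrative. `__slots__ = ()` keeps instances as lightweight as the plain namedtuple by preventing a per-instance `__dict__`:

```python
from collections import namedtuple


class Point(namedtuple("Point", "x y")):
    """A 2D point with a docstring and an extra method."""

    __slots__ = ()  # no per-instance __dict__

    def norm_squared(self):
        return self.x ** 2 + self.y ** 2


p = Point(3, 4)
print(Point.__doc__, p.norm_squared(), hasattr(p, "__dict__"))
```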
I feel like people go pretty far to be able to attach docstrings everywhere that python should make better support for them.
Between class variables, module constants, and that namedtuple example, makes me think there should be better support.
one standardized and officially endorsed docstring convention would be great IMO
like what Rust has
I was in a conversation with nedbat and Guido at pycon two years ago. Guido said that docstrings were supposed to be for maintainers of code and that they're now overused.
Though "docstrings are for maintainers" seems to conflict with the help function being a thing.
It's common for libraries to generate autodocumentation from docstrings
It's also very convenient to read the documentation for the thing I'm using in my editor, either by hovering or going-to-definition.
Otherwise I'd have to open a search engine, find the documentation site for the library I'm using, and find the thing on that website
we should have done a better job capturing what was discussed
I can see that. Given you can run with them stripped out.
Ironically that would break a lot of modern libraries
it's not about running with them stripped out. For the core devs, it's about the UX of working in the code. Having the full docs in docstrings makes it more difficult to get to the code.
I was more replying that the docstrings were originally for devs only and not use of end-user
Which I think makes sense given u can strip them out to save memory
Not that I have any reason to not believe guido
I agree that overly long docstrings with tons of examples aren’t great - ideally it would be more ergonomic to crosslink between API docs and narrative docs. In practice teaching everyone to use sphinx at that level of sophistication just to write a docstring is a big ask.
I think it's possible that since he was speaking off the cuff, he was misstating his own opinion when he said "docstrings are for maintainers". I remember people in the conversation were talking about sphinx, and he said he thinks only documenting each module, level, and class individually doesn't actually document how the software is intended to be used holistically, and that he thinks code examples in docstrings are annoying.
only typing.NamedTuple
i'll reply to the comments in the thread soon, they're very right to call out some existing & valid use cases. it may not be worth it
Writing a humongous discuss.python.org post about the JIT fuzzer I'm working on. If anyone wants to preview it and give feedback on the draft, it's at https://gist.github.com/devdanzin/a000ca0bae1794e43336bc134382dfe0.
It's good! I was a little concerned about a file named coverage.py, but what else would you call it? 😄
Thank you for taking a look!
my codes and project done during Angela yu's course - Daniel110976/100-day-of-code
Instead of sending screenshots, I can just send this, and anyone can correct me on my code, as I'm a beginner and would love to learn
I am writing a library for parsing python bytecode. I resolve the extended args so that it's easier to work with but this also means I have to regenerate them if I want to convert the bytecode back to a bytearray that python can understand. I wrote integration tests that "rewrite" the whole standard lib and check if I have a 1:1 input and output after parsing and writing it back using my lib. During this process I found an interesting edge case in a file in Python 3.13 and I wondered if anyone might have an idea why python generates this bytecode. See below for what I found.
TLDR:
Python 3.13 generates this piece of bytecode:
```
1344 EXTENDED_ARG 1
1346 JUMP_BACKWARD 256 (to 838)
```
my library outputs this:
```
1344 JUMP_BACKWARD 255 (to 838)
```
it's fairly obvious that these mean exactly the same thing. So my question is: why does Python generate this useless extended arg? It looks a bit like a chicken-and-egg problem: the extended arg is the reason for its own existence. Without it, the jump distance fits in one byte (255); with it, the jump is one code unit longer, so the oparg becomes 256 and needs an extended arg. Does anyone know why it would generate such bytecode?
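The arithmetic behind the two listings can be checked by hand. This is a sketch, not CPython's code: the helper names are made up, and the single inline cache entry after `JUMP_BACKWARD` is an assumption inferred from the offsets in the listing (it is what makes both variants land on 838):

```python
CODE_UNIT_BYTES = 2            # every instruction and cache entry is one 2-byte unit
JUMP_BACKWARD_CACHE_UNITS = 1  # assumption, inferred from the offsets in the listing


def combine_oparg(extended_args, low_byte):
    """Fold EXTENDED_ARG prefixes (most significant first) into the final oparg."""
    oparg = 0
    for ext in extended_args:
        oparg = (oparg << 8) | ext
    return (oparg << 8) | low_byte


def jump_backward_target(instr_offset, oparg):
    """Target = offset of the next instruction minus oparg code units."""
    next_instr = instr_offset + CODE_UNIT_BYTES * (1 + JUMP_BACKWARD_CACHE_UNITS)
    return next_instr - CODE_UNIT_BYTES * oparg


# CPython's output: EXTENDED_ARG 1 at 1344, JUMP_BACKWARD with low byte 0 at 1346.
cpython_oparg = combine_oparg([1], 0)
cpython_target = jump_backward_target(1346, cpython_oparg)

# The library's output: a single JUMP_BACKWARD 255 at 1344.
library_target = jump_backward_target(1344, 255)

print(cpython_oparg, cpython_target, library_target)
```

Both forms reach the same target; the prefix just moves the jump one code unit further away and costs one more unit of distance.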
Probably an accident of how the compiler generates the bytecode. Could be a nice small improvement to fix it.
it's such a weird "bug" though, what kind of logic would cause this to happen? And more so how would we find that logic to fix it 😅
Maybe an off-by-one?
Or some combination of optimizations to the bytecode that cause us to generate suboptimal code
yeah I think the latter is more likely the case
I have seen that happen before while writing this lib
it might also be because the jump target (838) is itself a jump with an extended arg
so perhaps the way that's calculated afterwards causes this bug
Python/assemble.c line 730
```c
/* XXX: This is an awful hack that could hurt performance, but
```
I made my library replicate the bug but it seems like python doesn't do it consistently everywhere so there's no way to get a perfect 1:1 output without preserving the original extended args
QR code tool is done
What do men truly want?
Please go to an off-topic channel for this.
My bad g
nice!
hey! do you have a github for it? I would be interested in trying it out
UIs
Hi, yes it's on GitHub: https://github.com/Svenskithesource/pyc-editor
The relevant code is in here:
https://github.com/Svenskithesource/pyc-editor/blob/main/src%2Fv313%2Fext_instructions.rs#L617
src/v313/ext_instructions.rs line 617
```
pub fn to_instructions(&self) -> Instructions {
```
You can run the integration tests to try it and see the error
thanks I gave it a star, will take a look after work 🙂
I'm not sure what you're trying to find/fix though. I think it's an issue with python, not my library
Hi does anyone know efficient way to define TabularCPD for the real data which has observed nodes. Doesn't contain latent nodes data. Using pgmpy or pomegranate. Please
This channel is for discussing internals of CPython, I’d take your question to another channel
how does one go about finding the implementation for operators?
I want to see how dunders are called
do you have an example?
you mean the implementation of an operator for a type, or how binary operators execute in the interpreter?
interpreter
This talk by Armin Ronacher has some pretty useful information: How Python was Shaped by leaky Internals
I think it's complicated nowadays, with the adaptive interpreter and new opcodes
Objects/abstract.c should have the generic functions for most operators, i think
just search the operator itself and maybe it's there
oh, yeah that's a complicated thing
main function should be PyObject_GetAttr() in Objects/object.c i think
and that handles __getattribute__ vs __getattr__?
the operator itself
the mechanics of distinguishing those are a bit complicated
generally the default slot function/implementation is located in Objects/typeobject.c
that's what I'm interested in, what you say is complicated
not the default impl necessarily
Objects/typeobject.c line 10565
```
/* There are two slot dispatch functions for tp_getattro.
```
tl;dr _Py_slot_tp_getattro() when __getattribute__() is overridden and __getattr__() doesn't exist, _Py_slot_tp_getattr_hook() otherwise
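From the Python side, the observable behavior is that `__getattribute__` runs on every lookup and `__getattr__` only runs as a fallback when the normal lookup raises AttributeError. A small sketch (the `Logged` class and its `calls` list are just illustrative names):

```python
class Logged:
    calls = []  # records every name that goes through __getattribute__

    def __init__(self):
        self.present = 1

    def __getattribute__(self, name):
        # Runs for every attribute access on an instance.
        Logged.calls.append(name)
        return super().__getattribute__(name)

    def __getattr__(self, name):
        # Only reached when __getattribute__ raised AttributeError.
        return f"fallback:{name}"


obj = Logged()
found = obj.present
missing = obj.missing
print(found, missing)
print(Logged.calls)
```

Both lookups pass through `__getattribute__`, but only the failed one reaches `__getattr__`.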
this book shows how to implement a new dunder method in deep detail, going through the parser and runtime https://realpython.com/products/cpython-internals-book/
err, a new operator, along with a dunder method
All of the material is completely up to date for Python 3.9, the latest version of the Python programming language.
Seems like this page hasn't been updated in ages
eh, 3.9 isn’t that old
That's true, it's not even EOL yet
updating a book like this is a lot of work! just thought I’d share even though it’s a little old because it fit your ask almost perfectly
Yeah the book is perfectly fine
too old for the current state of the cpython repo's main branch, but most of the object implementation part should be fine i guess
I doubt this stuff has changed much and even if it has the gist will be right
i mean it's just a few file migrations and that, i guess
might be fine
hello and welcome to our wonderful python server. This channel is for in-depth discussion about the python language. try #python-discussion
We are same
same here
please don't post the same thing in a bunch of channels. please make sure that all your messages are on-topic for the one that you post in.
Ask in #python-discussion plz!
sure, thanks for informing
hello everyone, I've been really interested in performance improvements, and I'd love to contribute to the cpython project.
https://github.com/python/cpython/issues/138453
is this something I could try contributing to?
cpython welcomes contributions. If you think you can do it, just try it.
The cpython core team members are very nice, but be sure to value their time. Make sure your PR works, etc.
@fast elbow how do you make a fast game on python ?
Ask in #python-discussion plz
Wrong channel
Ask in #python-discussion
https://discuss.python.org/t/support-suppressing-generator-context-managers/103615
https://discuss.python.org/t/special-semantics-for-generator-context-managers/103616
Hey y’all, type checkers complain about this: from contextlib import suppress def foo(x: int, y: int) -> float: with suppress(ZeroDivisionError): return x / y and that’s great, that’s exactly what is expected 🙂 Any calls to foo() with y=0 return None and the type checker will pass as soon as we correct the return type o...
from collections.abc import Generator from contextlib import contextmanager @contextmanager def oopsie() -> Generator[None]: yield yield with oopsie(): ... This will raise a RuntimeError: generator didn't stop. Do you think we could detect that with type checkers? I think it shouldn’t be hard to cover 99% of cases, just countin...
hi
Trying to implement __annotate__ for dataclasses' generated __init__ method does make me wish there was an option to get all PEP-649/749 annotations in an unevaluated format.
My understanding of PEP-749 was that VALUE_WITH_FAKE_GLOBALS was added so you could raise a NotImplementedError if you did not support it or another format. However this doesn't appear to work currently, because call_annotate_function calls the function with VALUE_WITH_FAKE_GLOBALS in an environment with fake globals where NotImplementedError doesn't exist and neither does Format.
I'm creating an issue for this but it was fairly confusing until I realised what was going on. format == Format.VALUE evaluates as True even though format is Format.VALUE_WITH_FAKE_GLOBALS because Format.VALUE is a _Stringifier so the whole expression is converted to a _Stringifier.
looking at https://github.com/python/cpython/blame/d0c9943869bb143df445229444224930330ac0f3/Lib/linecache.py, i know this is theoretical, but wouldn't it be sensible to add an upper bound for this?
oh, although that's keyed by filename
in natural uses this can't grow bigger than all python files imported i guess... shouldn't be a concern
Python/compile.c lines 1047 to 1048
```
        case YIELD_VALUE:
            return 0;
```
why does the stack_effect function in 3.10 say this opcode has no impact on the stack?
YIELD_VALUE
Pops TOS and yields it from a generator.
It should be -1 no?
yield expr is an expression, it evaluates to the value sent back
(through the send method of a generator)
If the result of the yield expression is unused, then that value is dropped, similarly to how e.g. the result of a call can be unused
!e
import dis
@dis.dis
def f():
    x = yield 42
    yield x
:white_check_mark: Your 3.13 eval job has completed with return code 0.
001 | 3 RETURN_GENERATOR
002 | POP_TOP
003 | L1: RESUME 0
004 |
005 | 5 LOAD_CONST 1 (42)
006 | YIELD_VALUE 0
007 | RESUME 5
008 | STORE_FAST 0 (x)
009 |
010 | 6 LOAD_FAST 0 (x)
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/DQ6KN6FWKMHXXC6DXBADIEHZOM
ah okay so 42 stays on the stack
42 is popped from the stack, then whatever was sent is put onto the stack
gen = f()
print(gen.send(None)) # 42, equivalent to next(gen)
print(gen.send(69)) # 69
print(gen.send(None)) # raises StopIteration, the generator is exhausted
I hate unit testing so much
That's not really what this channel is about, but if you go to #unit-testing and ask how to make your unit tests better with pytest (you have to show your code), I think you'll find it can be kind of fun
Understood, but try testing random numbers and you will experience the pain
Ah okay I see, didn't know generators work in both ways
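A slightly bigger sketch of the two-way behavior, in case it helps: each `yield` hands a value out and then evaluates to whatever the caller passes to `send()`. The `running_total` name is just for illustration.

```python
def running_total():
    total = 0
    while True:
        # Yield the current total out, then receive the next number in.
        received = yield total
        if received is not None:
            total += received


gen = running_total()
first = next(gen)      # runs up to the first yield
second = gen.send(5)   # the paused yield evaluates to 5
third = gen.send(10)   # the paused yield evaluates to 10
print(first, second, third)
```

One value goes out and one comes back in at every `yield`, which matches the net stack effect of 0 discussed above.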
is there a tool for cpython that figures out what re-setup is needed between any 2 revisions so that when hunting regressions i can only put a repro and make git bisect do the rest, without writing a sophisticated script?
sometimes between revisions there are no changes to the C/Tool part so make doesn't have to be run
sometimes it's only make -j -s, sometimes make regen-cases, sometimes perhaps make regen-all.
it largely depends on which was the previous revision, if there was no cleaning.
oh
although i can just narrow down the bisection to those revisions that touch certain files...
and then assume that a certain degree of recompilation is always needed
sometimes i don't know whether i'm hunting a python regression or a cpython regression though. but that should be easy to spot in the first place. it's a "depends case-by-case"
I think the easiest approach is to identify the area of the regression and then narrow the set of bisected revisions down to those that are known to need certain steps for a full rebuild
but there can be changes in between that change other parts as well that next revisions depend upon... i think?
oh, right. i don't need make regen-cases, such things are included in commits.
Hows python3.13 without Gil for you guys
pretty difficult when underwater
Gil doesn't help when above water, which is where I keep all my stuff
I want to add more examples here, but maybe you’ll find something interesting on https://py-free-threading.github.io/examples/
this time something very general :-) https://discuss.python.org/t/common-syntax-mistakes-new-language-learners-and-python/103838
The other day I’ve seen a person write: counter =+ 1 in some convoluted code and be unable to find the problem. I’ve also seen this so many times: all = ( "foo", "bar" "biz", ) where you’d expect "foo", "bar", "biz" in the tuple. But you obviously would get "foo", "barbiz" because of implicit string concatenation. An...
in 3.7, PEP 563 introduced the from __future__ import annotations directive, which turns all annotations into strings. This directive is now considered deprecated and it is expected to be removed in a future version of Python. However, this removal will not happen until after Python 3.13, the last version of Python without deferred evaluation of annotations, reaches its end of life in 2029. In Python 3.14, the behavior of code using from __future__ import annotations is unchanged.
This is directly at odds with the messaging about how __future__ is considered stable and a feature will NEVER be deleted?
(please ping reply)
Where do you see that messaging? Features are deleted all the time (just search for "removals" in the "What's new" pages of various Python versions).
It's true that the docs claim that "No feature description will ever be deleted from __future__", but I think that refers to future imports that have become the default never being removed (e.g. generators, with_statements). annotations is a bit of a special case, because it's the only future that has never (and will never) become the default. annotations will (probably) be removed eventually, but that's a long time in the future (the earliest possible date would be autumn 2029). As for the feature itself: it's just not needed anymore, since forward references are now possible without the import.
Hm
we won't delete the feature description from the __future__ module but the feature won't become the default
but the feature will be removed, causing __future__.annotations imports to error?
The current plan is for it to raise a SyntaxError https://peps.python.org/pep-0749/#specification
we could make it a no-op eventually instead, can discuss that again in a few years
I suppose I need to think about if removing existing future annotations would break code.....
it should generally not in 3.14
As in, if I have a file with that feature and it no longer has that feature, will that file continue to work
I think it depends on if the third parties support it
you won't want to remove the from __future__ import annotations until you drop support for Python 3.13 and lower. Once you do, it should be safe to drop that import.
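To see what the import actually changes, here is a rough sketch that compiles the same source with the future flag switched on via `compile()` (so it works from any position in a file). `NotDefinedYet` is a deliberately undefined name used only for illustration.

```python
import __future__

SOURCE = '''
def f(x: int) -> NotDefinedYet:
    return x
'''

# Compile as if the module started with "from __future__ import annotations".
flags = __future__.annotations.compiler_flag
namespace = {}
exec(compile(SOURCE, "<demo>", "exec", flags=flags), namespace)

annotations = namespace["f"].__annotations__
print(annotations)
```

With the flag set, the annotations are stored as plain strings, so the undefined name never gets evaluated. Code that inspects `__annotations__` and expects strings is the kind of thing worth testing before dropping the import.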
huh. Come to think of it - I wonder if there should be a PYTHON_IGNORE_FROM_FUTURE_IMPORT_ANNOTATIONS environment variable that allows people to test their code with the new semantics without needing to modify the code...
that'd let people test proactively to make sure that it won't cause them any problems to drop the import when Python 3.13 hits end of life
fwiw i'm very happy that inspect.get_annotations was made into an alias
this means that we probably support python 3.14 annotations without changes
I suggested adding some mechanism for globally turning off futures but it wasn't very popular
for turning off futures in general, I don't think there's any need. For the specific case of turning off the one future feature that's going to go away without ever becoming the default, there is a pretty compelling need
we're telling people right now that they ought to be able to drop from __future__ import annotations when they drop support for 3.13, but if 3.13 isn't EOL until ~5 years from now, surely we don't want the first time that they test without from __future__ import annotations to be when they try dropping it from their code in 5 years...
imagine they find a problem at that point - when 3.14 is already out of bug fix support. Not a great situation to find themselves in...
or anyone in tbh
could be bad for the ecosystem and then we have another abi3 on our hands
I guess the one redeeming thing is that string annotations aren't going away, so if someone does find a problem when they remove from __future__ import annotations in 5 years, they can probably fix it by just adding quotes
I guess maybe that's enough of a reason to not worry about it
France?
Françias?
If you're running a single-file script, you can use the -x option 🥴
Why doesn't IndexError for built-in sequences say what the index and the length were? That seems kinda useful
maybe it was missed? we added that sort of thing for a number of exceptions recently I believe
!d IndexError
```
exception IndexError
```
Raised when a sequence subscript is out of range. (Slice indices are silently truncated to fall in the allowed range; if an index is not an integer, [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) is raised.)
Is string interning skipped only for dynamically created strings, or also for very large strings? Or is there some other case?
I think it only occurs by default for names and module constants
What do you mean names?
It is happening for any string I make except when the string is made dynamically
I tried large strings and they were interned too, so idk what the limit is or when a string does not get interned
function names, method names, class names, variable names, etc
Apart from those any string you make are also interned
x = "hello"
y = "hello"
id(x) == id(y) returns true
yep - "hello" is a module constant in that case
Yeah so a string made inside a function will also be interned right?
Yeah so any string literal of any size is interned right? And dynamically generated strings like "h" * 100 isnt
yes
Okay thnx
that's all implementation details, though
don't write code that relies on it 🙂
I don't think there's any guarantee that anything will ever be interned
Also for integers the range is -5 to 256. Was this chosen in light of ASCII fitting within that range?
Wanted to know that only, i am not relying on it, just confirming my knowledge of internals
Except if you manually intern it
yep, except then 🙂
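A quick sketch of the manual case with `sys.intern`, which is the one thing you can rely on. The strings are built at runtime so that they are not module constants.

```python
import sys

parts = ["inter", "nals"]
a = "".join(parts)  # built at runtime, so not a compile-time constant
b = "".join(parts)

print(a == b)  # equal by value
print(a is b)  # implementation detail, usually False on CPython

ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # interned strings with equal values are the same object
```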
?
I doubt it had any relation to ascii, but I'm not sure. I think it was just a small range of commonly used integers
Hm well 256 is the range for 8 bits or 1 byte so each ascii character can be almost represented with those cached
As for -5 it looks like a safe lower bound since mostly just -1 is used for array indexing and all
sure, but I don't think it's common to write Python code that refers to ASCII values for characters
🤔 but internally ascii is just gonna convert to numbers so maybe for that?
Or is it direct binary conversion
those internal numbers won't be Python int objects, they'll be C integers
Correct
so they wouldn't benefit from the small integer cache
they'd be an entirely different type of thing that's never cached
Fair
i think moreso anything to do with byte strings
e.g. iteration or indexing yields an int in byte range
Hm PyLongObject pointers
that's true in Python 3, but the small integer cache predates Python 3
and in Python 2 iterating over a byte string gave byte strings
and likewise for indexing a byte string
so be it
Also just one more confirmation
If you do
x = 10
x = 12
Setting x to 12 uses new memory and not the one 10 was in and the 10 is still there in memory where it was and it will be garbage collected
because of the small integer cache, it will not be garbage collected
that cache owns references to interned copies of -5 through 256
Yea i mean except for that range
Yea yea i was talking about in general except for this range
Sorry the x = 10 example was wrong
Just consider it x = 2000 and then x = 3000
So 3000 will not be stored where 2000 was right? A new allocation will happen and 2000 will be gc'd later right?
for some period of time, both things exist
it's easier to see this if you think of an object with an explicit constructor, instead of an integer. Imagine you've got:
```py
foo = MyClass()
foo = MyClass()
```
After the first line executes, there's a MyClass instance in memory, and foo refers to it. Then the second call to MyClass() executes, at which point there are two MyClass instances in memory, one that's referred to by foo and one that's a temporary. Then the foo = part executes, and foo now refers to the new MyClass instance that up until now was a temporary, and now nothing is referring to the original MyClass instance
it could never reuse the same memory because, while the second line is in the middle of executing, both objects need to exist simultaneously
Correct so a new allocation happens and the old one will be gcd
It wont overwrite the same memory
Makes sense
Well maybe someone could argue that you check what is being set and if there is a value already in memory for that variable just overwrite it or something but then the problem comes to maybe more space is required than the already existing memory so new allocation is the best we can do
think about:
```py
foo = [1, 2, 3]
bar = foo
foo = [4, 5]
```
At the end of that, bar refers to [1, 2, 3] and foo refers to [4, 5]. If the new value was somehow written to the original object, it would cause bar to refer to the wrong thing.
also, think about
```py
import random

class MyClass:
    def __init__(self, name):
        self.name = name
        if random.random() < 0.05:
            raise RuntimeError("unlucky")
```
5% of the time, that constructor fails. If you do
```py
foo = MyClass("name1")
foo = MyClass("name2")
```
it can't change the .name attribute of the first MyClass instance, because there's still a chance that the constructor will fail, and so the second foo = won't happen, even though the second MyClass() call did happen
Correct, makes sense
10 will not be garbage collected also because it is a constant and constants are stored in code
this is also mostly the reason your "hello" example worked earlier
all constants in the same scope are stored in the code object's constant tuple
and all code objects, except the module code object, are referenced by a parent code object
therefore until the module code object dies (or until an exported function's code object dies), all constants within the scope of the code are alive and not GCed
def f():
    x = "hello"
    x = "bye"
    return x
print(f()) # bye
in basic terms, "hello" isn't GC'ed for the entirety of this code's execution
when it is assigned to x, a reference to it simply gets pulled (copied) from the function's code object's constant tuple
and when it gets replaced with "bye", its number of references just gets decreased by one
until the end of that print() call, when the code is exiting, "hello" (and all other constants within the code) are alive and not GC'ed, since they are stored in the code object and the code object needs to be alive for the code to execute
also applies to both 2000 and 3000 in your other question as long as it's literally just 2000 and 3000 in the code
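You can check this yourself by looking at the constants tuple of the function's code object. A small sketch, relying on CPython behavior (an implementation detail, not a language guarantee):

```python
def f():
    x = "hello"
    x = "bye"
    return x


consts = f.__code__.co_consts
print(consts)
# Both literals stay referenced by the code object for as long as f exists,
# even though "hello" is no longer bound to x after the second assignment.
print("hello" in consts, "bye" in consts)
```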
Ya i am aware of that and by 10 being gc'd i meant after the function scope is over only, obviously a number like 10 isnt still gc'd since its cached by the interpreter for interning reasons but for any other number it would be gc'd only when the function scope is over
The only thing i wanted to confirm was that new memory is made since obviously we have no guarantee of the old memory being that big to store this new stuff and also what godlygeek said, so memory allocation happens everytime
Well, now, that depends on what you mean by "memory allocation". A new object is made each time, but that doesn't necessarily mean more memory is allocated - it can reuse previously allocated but now unused memory
Correct what i mean basically is it will use some other place, basically any unused memory
But doesnt the allocator automatically look for already allocated free space
things might or might not get reallocated in the same place. There's no guarantees about that and you shouldn't rely on any such behavior
and it might not even make it to the allocator. Some dead objects can be reused (the "free lists")
any other number, as long as it's a constant, doesn't get gc'ed as long as the module or function object (literally the callable function) is alive though
unless you mean "gc'd" as in "reference decremented" and not completely destructed/freed
I am breaking my head over this. When python 3.11 calculates the max stack size of the following function it calculates a max stack of 8. I don't get how it reaches 8, if I manually calculate it I get 7 which happens at index 110. Can someone check what I'm doing wrong?
https://paste.pythondiscord.com/UDVQ
I am recompiling the bytecode and my tool also reaches 7, so I think this is a bug in cpython maybe
or I'm just missing something
If I compile this specific function with 3.13 I do get a max stacksize of 7
so I think it's a bug they fixed along the way
!e ```py
def f():
    return id(3000)  # 3000 is semantically temporary here
old = f()
temp = []  # prove there's no reallocations of 3000 by taking up memory in a "free space" after calling f()
new = f()
print(old == new)
```
:white_check_mark: Your 3.13 eval job has completed with return code 0.
True
Yeah i mean after the function is dead the stuff will be gc'd
Not relying on anything, just knowing the internals 
What I thought was: when you try to do an allocation, the allocator first looks for free space that was already allocated to the process, and only if there isn't any does it request new space from the OS
yes. But, depending on the data type, it may not make a request to the allocator at all
it may just have some already partially constructed instances of that type sitting around for reuse
that's an optimization that the interpreter plays for types that are very frequently constructed and destroyed

does anyone have a fix for this? I am using socket.getfqdn and it is returning the IPv6 address, hence socket.gethostbyname_ex fails. In another project my db is trying to connect and the same issue is being raised on macOS
Mac OS Version: 26
Python Version: 3.13
https://mystb.in/892042e1d5021df322, here is the error i get when connecting to the db
this may not be the correct channel to post this in, but this isnt getting attention anywhere else idk why, not in python-help neither the original cpython issue is getting any attention, its open since 2018
The cpython issue has gotten attention in the past, but it's hard to tackle because the issue is not reproducible for the devs that tried to. It might help figuring out whether there's some commonality for the pooled group of people that had the same issue.
If anyone is willing to teach me, I need to learn about how all of these work
I mean i tried on 20 macbook airs from around where i live everyone has this error
Only 1 macbook doesnt have this error which is of my friend from germany
Tbh i dont think its hard to find a macbook where this happens
Also this may be very wrong but i think its a combination of macbook and your wifi provider which causes this
That's the issue in my opinion. The reporters believe it's trivial to reproduce, the developers believe it's trivial to not reproduce, and no progress happens. Figuring out what's similar among everyone you know that has the issue and contrasting that to your friend who doesn't might be our best bet to diagnose this.
Ugh i mean if this is hitting someone connecting to a database then this should be fixed
Also I don't know how, but the psutil lib figures out the private IP correctly while this socket approach fails
It should be fixed, but it needs reproducing to be fixed and so far nobody figured a way to reproduce other than "it happens in these systems" which unfortunately doesn't include the developers' systems.
Ugh
Oh boy, we hit a segfault in the fuzzer (as opposed to in the fuzzing script), I'm never going to be able to reproduce it 😭
AddressSanitizer:DEADLYSIGNAL
=================================================================
==393769==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000020 (pc 0x55a51dacc61e bp 0x7ffd6e813260 sp 0x7ffd6e8131b0 T0)
==393769==The signal is caused by a READ memory access.
==393769==Hint: address points to the zero page.
#0 0x55a51dacc61e in Py_INCREF Include/refcount.h:279
#1 0x55a51dacc61e in _Py_NewRef Include/refcount.h:527
#2 0x55a51dacc61e in _PySet_NextEntryRef Objects/setobject.c:2817
#3 0x7fe2d6f4ceb3 in save_set Modules/_pickle.c:3498
#4 0x7fe2d6f40713 in save Modules/_pickle.c:4415
#5 0x7fe2d6f43cf9 in batch_dict_exact Modules/_pickle.c:3355
#6 0x7fe2d6f45c5a in save_dict Modules/_pickle.c:3417
#7 0x7fe2d6f406f6 in save Modules/_pickle.c:4411
#8 0x7fe2d6f4366c in batch_dict_exact Modules/_pickle.c:3333
#9 0x7fe2d6f45c5a in save_dict Modules/_pickle.c:3417
#10 0x7fe2d6f406f6 in save Modules/_pickle.c:4411
#11 0x7fe2d6f43cf9 in batch_dict_exact Modules/_pickle.c:3355
#12 0x7fe2d6f45c5a in save_dict Modules/_pickle.c:3417
#13 0x7fe2d6f406f6 in save Modules/_pickle.c:4411
#14 0x7fe2d6f43cf9 in batch_dict_exact Modules/_pickle.c:3355
#15 0x7fe2d6f45c5a in save_dict Modules/_pickle.c:3417
#16 0x7fe2d6f406f6 in save Modules/_pickle.c:4411
#17 0x7fe2d6f43cf9 in batch_dict_exact Modules/_pickle.c:3355
#18 0x7fe2d6f45c5a in save_dict Modules/_pickle.c:3417
#19 0x7fe2d6f406f6 in save Modules/_pickle.c:4411
#20 0x7fe2d6f412d5 in dump Modules/_pickle.c:4611
#21 0x7fe2d6f41df2 in _pickle_dump_impl Modules/_pickle.c:7744
#22 0x7fe2d6f42388 in _pickle_dump Modules/clinic/_pickle.c.h:724
Python/generated_cases.c.h:2361
SUMMARY: AddressSanitizer: SEGV Include/refcount.h:279 in Py_INCREF
==393769==ABORTING
walking on landmines
this is a cursed stack trace
increfing an invalid object inside of pickle? wha?
any idea why Python would calculate a max stack size of 1 for this function?
is the minimum 1 for some reason?
This is Python 3.13 btw
it comes from here
load 1 constant and return it, probably
no, RETURN_CONST has no stack usage
otherwise it might just naturally have a minimum max stack size of 1
that's the whole point of the opcode really, because returning None was so common that they made a special opcode for it
special opcode for returning constants*, but yea
maybe it's just that minimum 1 stack size
what method is used to calculate it?
why do you care about the stack size? the answer to your questions might very well be that it's not important for CPython's purposes to be very precise
if I understand correctly, if we underestimate stack size, things explode, but if we overestimate we just waste some memory
yes correct, but I'm writing a library that allows for bytecode modification so I have to recalculate the stacksize and I was doing some integration tests where I noticed this discrepancy between my "algorithm" and what cpython outputs
so I wondered if there was any reason for this behaviour, so that I myself don't underestimate the stack size
Python/flowgraph.c line 757
```
calculate_stackdepth(cfg_builder *g)
```
sorry wrong link
Objects/codeobject.c lines 503 to 505
```
    if (con->stacksize == 0) {
        con->stacksize = 1;
    }
```
legend, thanks
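For reference, the computed value is visible from Python as `co_stacksize`. A minimal sketch (the exact numbers vary by version, so it only prints them; the function names are illustrative):

```python
def returns_none():
    return None


def adds(a, b):
    return a + b


# co_stacksize is the stack depth CPython reserves for the frame.
trivial_size = returns_none.__code__.co_stacksize
binary_size = adds.__code__.co_stacksize
print(trivial_size, binary_size)
```

Given the clamp in the snippet above, a recalculated value of 0 should be bumped to 1 to match what CPython stores, and overestimating elsewhere only costs a little memory.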
When python v2.3 came out, the MRO was switched from depth-first, left-to-right resolution of the class precedence list, to C3 linearization. What problems exist with the former?
I believe sometimes a class could appear after its base class in the MRO
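A rough sketch of that problem on a diamond hierarchy. `naive_dfs_order` is a made-up helper that imitates a depth-first, left-to-right walk (keeping the first occurrence of each class); it is only an approximation of the old rule for illustration, not the historical implementation.

```python
class O: pass
class A(O): pass
class B(O): pass
class C(A, B): pass


def naive_dfs_order(cls):
    """Hypothetical helper: depth-first, left-to-right, first occurrence wins."""
    order = []

    def visit(c):
        if c not in order:
            order.append(c)
        for base in c.__bases__:
            visit(base)

    visit(cls)
    return order


naive = [c.__name__ for c in naive_dfs_order(C)]
c3 = [c.__name__ for c in C.__mro__]
print(naive)  # O comes before B, even though B subclasses O
print(c3)     # C3 keeps every class ahead of its bases
```

In the naive order, an override defined in B would be shadowed by O's version, which is the sort of surprise C3 linearization rules out.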
are there any other good resources on python internals apart from this book which is paid
https://realpython.com/products/cpython-internals-book/
InternalDocs/ has docs on several topics
you can ask questions in here
sure but going into stuff one by one and how they were built and all in an order in a book are much better
I agree, FWIW I own that book and it’s pretty good
there are also some really good cpython internals blog posts i found once…
i haven’t actually looked at that since 2023 so no idea if it’s gone downhill
also you’re 90% of the way there if you just always check for errors in your C code 😝
yeah i did go through the sample and all the topics labeled, i find it pretty good too, i am fairly familiar with the internals but i wanted to go through the book or something organized for teaching purposes only, so that i can teach stuff in order.
for me running
CPPFLAGS="-I$(brew --prefix zlib)/include" \
LDFLAGS="-L$(brew --prefix zlib)/lib" \
./configure --with-openssl=$(brew --prefix openssl) --with-pydebug
gives me this
configure: error: Unexpected output of 'arch' on OSX
# class A has a method `bar`
a1 = A()
a2 = A()
a1.bar # x
a1.bar # y
a2.bar # z
This creates a method three times. How much machinery is shared between x and y, and between x and z? Are there any optimizations?
CC @ornate wyvern
i'm not sure what u're asking
it does create the method 3 times, thats true
if a1.bar caches bar, the next time it'd use the same bound method, not create a new one
as for a2, it'd be the same, create on the first call, use the cached one next
@boreal umbra
you're assuming that's how it works.
no, its not how it works
i've implemented cache for get
the way i implemented it works like this
I'm asking how it really works, in the internals-and-peps channel.
how it really works is that it creates a new method on every lookup
I know, and I'm asking what optimizations or shared machinery there is.
oh alright, that i dont know, others can maybe answer that
if anyone knows something, i'd appreciate a ping
I believe it creates a new bound method object every time. However, usually you'd of course do a1.bar() or similar, and in that case, the specializing adaptive interpreter creates specialized code that optimizes out the creation of the bound method object.
>>> class A:
...     def bar(self): pass
...
>>> def f():
...     x = A()
...     x.bar()
...
>>> for _ in range(100):
...     f()
...
>>> import dis
>>> dis.dis(f, adaptive=True)
1 RESUME_CHECK 0
2 LOAD_GLOBAL_MODULE 1 (A + NULL)
CALL_NON_PY_GENERAL 0
STORE_FAST 0 (x)
3 LOAD_FAST_BORROW 0 (x)
LOAD_ATTR_METHOD_WITH_VALUES 3 (bar + NULL|self)
CALL_PY_EXACT_ARGS 0
POP_TOP
LOAD_CONST 0 (None)
RETURN_VALUE
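As for what is shared between the three lookups in the original question, here is a small sketch you can run. Each lookup builds a fresh bound method object, but they all wrap the single function object stored on the class.

```python
class A:
    def bar(self):
        return "bar"


a1 = A()
a2 = A()
x = a1.bar
y = a1.bar
z = a2.bar

print(x is y)  # distinct bound method objects
print(x == y)  # equal: same function, same instance
print(x == z)  # not equal: different instance
print(x.__func__ is z.__func__ is A.__dict__["bar"])  # one shared function
```

So the bound method itself is just a thin (function, instance) pair; the code, defaults and everything else live on the shared function.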
maybe look at Pyenv instead? It automates Python builds and makes it easy to switch between them.
it’s what I use with brew on my Mac for all the work I’ve done in the past year and a half adding free-threaded support across the ecosystem
i’m pretty sure method objects are on a freelist too, so the creation of them isn’t that expensive
ya well, i was following the book
makes sense! I skipped the “building python” bits because I’d already figured out pyenv, I’ve used it for more than ten years at this point
fair
how do you build locally with pyenv?
unlike uv, pyenv will download sources and build on your machine
at least on linux
i mean i cloned the cpython repo how do i compile it using pyenv or build
https://devguide.python.org/ has instructions for that
it has that ./configure thing which gives me an error
configure: error: Unexpected output of 'arch' on OSX
that's not something I've seen before, might be worth reporting as a bug and trying to figure out what's special about your setup
lots of people definitely are compiling successfully on osx with those instructions
so there must be something slightly unusual about your setup
What version are you compiling?
i searched and there are reports about apple silicon chips not being able to compile versions around 3.9
3.9 the one in the book so that i can follow it
oh yeah that might predate mac aarch64 support
this book
which commit are you using?
mmm so what do i do
probably best to just compile the latest version, things will be a little different but probably you should be able to get things to match up
3.9.0b1 is the version to be precise
hm the book works on the old parser while now python has the peg parser
you could set up a linux VM or docker image
oh apple silicon support was in 3.9.1 https://docs.python.org/3/whatsnew/3.9.html#macos-11-0-big-sur-and-apple-silicon-mac-support
so you should just be able to check out a more recent commit on the 3.9 branch, things won't be that different
i use orbstack for docker on macs, it’s pretty neat
fair when was the parser added i think 3.9.1 shouldnt be much different
ah that will also work too
that would have been in a different major version, I think 3.10
patch versions usually don't change a lot of things
correct
(i think they're supposed to be backwards compatible)
yeah 3.9.1 it is then, ill try
||if we ignore the 4300 incident||
do you care a lot about the parsing details? otherwise I’d probably still use the main branch of cpython
or the 3.14 branch
i dont care, just wanted to follow the book since it takes in account the old parser
C is not memory safe, so for security reasons we rewrote CPython in Rust for Python 3.15.2
otherwise i already have the new version compiled
I don’t know how much in that book is out of date, but you’d probably learn something finding out where it is and why
and you’d come out the other side having a better handle on the current codebase
thats fair
but it was more for showing something to others, so i needed to be compatible, for myself i have the newer version compiled
hm 3.9.1 worked thnx a bunch
hello guys , there's this python exercice i really don't understand , i didn't understand the third instruction https://colab.research.google.com/drive/1icNch9D4CwKQj_qp93f2NIHLLdxljvB6?usp=sharing#scrollTo=fWX651L_Xj9j
Best to ask on a #❓|how-to-get-help channel
!rule 6 9
6. Do not post unapproved advertising.
9. Do not offer or ask for paid work of any kind.
HI 👋
I want to read Fluent Python. Do you have any opinions about it? Is it a good book?
I have made some simple projects in Python, like a stock price downloader that converts data into CSV or JSON. I also made some videos using Manim
So basically, I’m still a beginner.
#python-discussion is a better channel to ask! More people and more beginners who are learning than this channel.
I was not a fan of the previous PEP, this one seems fine. I'm not sure I like lazy from foo import bar over from foo lazy import bar, but eh..
I also don't understand the point of the syntax restrictions, presumably this could also work inside functions etc.
Anyone interested in giving https://github.com/python/cpython/issues/63497 a look? Seems interesting to me.
Why do you need it in functions?
This would probably complicate access to local variables
Right now it's just a lookup in an array
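A rough way to see the "lookup in an array" part from Python, assuming a plain function with one local: the compiler records locals in `co_varnames`, and the bytecode refers to them by index through the `*_FAST*` instructions. Exact opcode names differ between versions, so this only filters on the name:

```python
import dis

def f():
    x = 1
    return x

# Local variable names are fixed at compile time, in slot order.
print(f.__code__.co_varnames)

# Instructions touching locals address them by slot index, not by name lookup.
fast_ops = [ins.opname for ins in dis.get_instructions(f) if "FAST" in ins.opname]
print(fast_ops)
```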
(and also there isn't a big use case for it)
e.g.: star imports are not allowed inside a function as well
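Quick way to check the star import restriction without writing a file, just compiling a source string (the `<example>` filename is arbitrary):

```python
source = "def f():\n    from os import *\n"

try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    # The compiler rejects this before any code runs.
    error = exc
    print("SyntaxError:", exc.msg)
else:
    error = None
    print("compiled without error")
```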
TIL
but also thank you whoever decided that
* imports: let’s not
I still need to fix one spot in NumPy that does that and shouldn’t
It provides that nostalgic feeling of "which one of these damn header files this comes from?"
I don't need it, I just saw no reason for the restriction.
...well, nostalgic to C developers, I am just getting acquainted with C
before numpy 2.0, the entire public API was set up using * imports and that meant that we leaked all kinds of implementation details into the API and also as a bonus made it impossible to find where things are defined.
well, not impossible
yeah, I frequently look at library code on github and it's frustrating
just harder for no reason besides laziness
I don't see how it would, but I'm not too familiar with the details to contradict you.
But couldn't you just put the placeholder objects in the locals array and resolve them the same way they get resolved when it's a normal namespace?
sounds a lot more complicated than indexing into a C array
hm, I guess it could be specialized into a different instruction
like it would be specialized for globals
but it would be extra work and extra code, and I'm assuming there isn't a usecase for lazy imports within a function
what benefit would there be in having a lazy import inside a function and not at the module level?
I guess (almost) none, yeah.
Yeah I mean fair
If you want the import to be done conditionally, then it won't run if the function is not called
But can't you just put the import in the conditional?
wdym
maybe you have a closure or something like that: ```py
def f():
    lazy import foo
    def g():
        foo.bar()
    do_something(g)
``` but in that case, just put lazy import foo at the module level
yeah, lazy imports lets us abandon all the delayed import hacks we’ve accumulated like this
I think this is the biggest reason to like lazy imports.
Having all imports that a file might use defined at the top is very nice.
It can sometimes be annoying to search a 10k+ LOC file for all the imports it randomly does. (Not hard if I am explicitly looking for it, but it is extra mental space I would rather not have to take up)
https://peps.python.org/pep-0810/#syntax-restrictions
Yeah I wonder how this would work given __import__ exists in importlib
Will there be a specific importlib util for proper lazy loading?
Will I be able to use importlib.import_module(..., lazy=True)
Partially initialized modules....
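For what exists today (independent of PEP 810), the stdlib already has importlib.util.LazyLoader. This is roughly the recipe from the importlib docs; `lazy_import` is just a helper name, and it is not how the PEP's `lazy` keyword is implemented:

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose body only executes on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers the real execution
    return module

json = lazy_import("json")
# The attribute access below is what actually triggers loading the module.
print(json.dumps({"a": 1}))
```

It only covers `import x` style usage and eagerly fails if the module can't be found, so it behaves differently from what the PEP proposes.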
A module may contain a __lazy_modules__ attribute, which is a sequence of fully qualified module names (strings) to make potentially lazy (as if the lazy keyword was used). This attribute is checked on each import statement to determine whether the import should be made potentially lazy. When a module is made lazy this way, from-imports using that module are also lazy, but not necessarily imports of sub-modules.
The normal (non-lazy) import statement will check the global lazy imports flag. If it is “enabled”, all imports are potentially lazy (except for imports that can’t be lazy, as mentioned above.)
Will this raise an error if defined after the first import statement? I think it should. Otherwise a user might be confused as to why it's not working.
When the module is first reified, it’s removed from sys.lazy_modules (even if there are still other unreified lazy references to it).
And then
If reification fails (e.g., due to an ImportError), the exception is enhanced with chaining to show both where the lazy import was defined and where it was first accessed (even though it propagates from the code that triggered reification). This provides clear debugging information:
If accepted, this raises a small question: what error would the following code raise, and where?
```py
# main.py
lazy import json
import util
data = json.load(...)  # reified here
util.dump_json(data)
```
```py
# util.py
lazy from json import dumsp
def dump_json(...):
    dumsp(...)
```
having lots of fun testing the proposal out
Cynically, now we just accumulate additional hacks 🙂
I would expect the error to occur on the call to dump_json with an import error the same way that
import json
from json import dumsp
raises the error on the from import
Downside is that supporting old python means I have to maintain all of my hacks anyway
I wonder how much effort it would take to create a third-party “backport” of PEP 810. I’d guess it’s possible with PEP 523 hooks, but perhaps infeasible.
Could probably do it with a custom coding and a context manager. Parse it all with ast and rewrite the module as a class, implementing the lazy import as a descriptor
Though I'm not sure how we would make globals() return the unreified instance
import hooks?
FWIW, I’ve experimented with a mix of those ideas to somewhat emulate PEP 690 behavior. Fun but imperfect: https://github.com/Sachaa-Thanasius/defer-imports
Idk if the same general approach would be enough to emulate 810.
… looking at the reference implementation, maybe not, lol. Benching the idea for now.
There are some use cases where we defer importing (import inside a function) because of optional features that depend on packages that may or may not be installed; would that fail fast as a lazy import? Or does the proposed lazy import accept everything and defer module not found till later?
It defers everything until you actually access the imported stuff
So if you do: ```py
lazy import json
print(json)
```
The actual module is loaded on first use of that name.
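If you do want the fail-fast check for an optional dependency without importing it, one option that works today is importlib.util.find_spec. A small sketch, where `has_module` and the missing package name are made up for the example:

```python
import importlib.util

def has_module(name):
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None

print(has_module("json"))
print(has_module("surely_not_an_installed_package_xyz"))
```

For dotted names find_spec imports the parent package first, so this is only import-free for top-level modules.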
Oh that's good. I wonder if this would lead to deprecating importing outside global scope, curious if there's any valid use cases left
idk, ```py
def try_load_windowsthing(self):
    try:
        import windowsthing
    except ModuleNotFoundError:
        pass
    else:
        self.set_thing(windowsthing)
```
maybe you're fiddling with sys.path before loading this function
huh, apparently not? https://discuss.python.org/t/pep-810-explicit-lazy-imports/104131/58
i'm confused now
oh, reading the globals doesn't count as using the object
it does, it's just globals()["json"] that would get you the proxy thingy
yeah, I realized
maybe the PEP could specify more precisely what it means to "use" the imported name
idk
oh wait, globals() are mentioned explicitly in the Q&A of the PEP
I feel they put some things in the FAQ section that maybe should have gone in the spec section
but shrug
I am just bad at reading
I get the rationale of "external introspection should generally be unaware of the change but internal introspection should be aware of it, as that's the code that's most able to change" but I too think it'd be better to put it somewhere other than the faq
would you find this small option useful?
❯ ./python -c 'import sys; sys.version_info'
❯ ./python -pc 'import sys; sys.version_info'
sys.version_info(major=3, minor=15, micro=0, releaselevel='alpha', serial=0)
❯ ./python --help | rg -m 1 -- '-p'
-p : print the result of the program passed in as string (use with -c)
(the interface can be a little different, i'm just asking about the idea)
yo can u. make a bot using ios on python
I like the idea, very iPython-esque
I like idea that I could run a notebook cell via command line, without having to add in a trailing print statement.
was about to ask how this worked but after implementing a repl recently no yeah that's exactly how I would implement it I think
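For reference, the usual REPL-style trick looks something like this in pure Python. This is only a guess at the general approach, not the actual patch behind the proposed flag, and `run_and_print` is a made-up name:

```python
import ast

def run_and_print(source):
    """Run source; if the last statement is an expression, print its value."""
    tree = ast.parse(source)
    namespace = {}
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so it can be evaluated separately.
        last = ast.Expression(body=tree.body.pop().value)
        exec(compile(tree, "<string>", "exec"), namespace)
        result = eval(compile(last, "<string>", "eval"), namespace)
        if result is not None:
            print(repr(result))
        return result
    exec(compile(tree, "<string>", "exec"), namespace)
    return None

run_and_print("x = 2; x * 21")
```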
sigh guess I'm doing a small refactor to my eval command shortly
!e 3.14
import annotationlib
def f(x: int): ...
print(annotationlib.get_annotations(f, format=annotationlib.Format.STRING))
print(f.__annotate__(annotationlib.Format.STRING))
:x: Your 3.14 pre-release eval job has completed with return code 1.
001 | {'x': 'int'}
002 | Traceback (most recent call last):
003 |   File "/home/main.py", line 5, in <module>
004 |     print(f.__annotate__(annotationlib.Format.STRING))
005 |           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
006 |   File "/home/main.py", line 3, in __annotate__
007 |     def f(x: int): ...
008 | NotImplementedError
why is the second one not implemented?
Because the implementation is in annotationlib.call_annotate_function
You're not intended to call the __annotate__ functions directly
otherwise I'd have had to write STRING in raw bytecode which would have been exciting
Nice spam
thought experiment: what if global del were a compound keyword that deleted the actual object from memory and set all its remaining references to None
though come to think of it, I doubt there's a mechanism to get all the references to an object that isn't traversing the whole reference graph
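The closest existing thing I know of is gc.get_referrers, and it does work by scanning everything the collector tracks, so it fits the "traversing the whole graph" description. Tiny sketch with made-up names:

```python
import gc

class Node:
    pass

target = Node()
holder = [target]  # a container that references the object

# Scans all gc-tracked containers looking for ones that refer to target.
referrers = gc.get_referrers(target)
found_holder = any(r is holder for r in referrers)
print(found_holder)
```

It only sees referrers that are tracked by the garbage collector, so it is a debugging aid rather than a complete list of references.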
with a name like that it might as well leave the pyobject pointers dangling 😆
That's just how it could be done without introducing a new keyword. But soft keywords are popular now.
If python has to check before accessing any object that it's still there, every time it follows a reference, that kills the idea
How would referencing a stale pointer be handled?
Ah yes, static in C
You'd get None
So almost the same effect as obj.__class__ = type(None)
that doesn't work
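Easy to confirm with an ordinary class (`Thing` is just a placeholder); CPython refuses the assignment:

```python
class Thing:
    pass

obj = Thing()

try:
    obj.__class__ = type(None)  # try to turn the instance into a NoneType
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
else:
    error = None
    print("assignment succeeded")
```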
Theoretical effect
I guess 
We'll need to introduce a borrow checker to typing
that's not even enough