#internals-and-peps
CC @grave jolt, since we discussed this earlier
what would be the best way to hash an "arbitrary object" that has the same value between programs? i was told that using hash() is a bad idea, because the hash is not guaranteed to be the same (which is true, but it's not exactly my problem -- as in, if a user implements an odd __hash__, there's nothing i can really do about that).
i ended up writing a function that looks like this:
def _hash(self, value: Hashable, size: int) -> tuple[int, int]:
    if isinstance(value, str):
        # String hashes are not retained between programs
        hashed_str = int(
            hashlib.sha1(value.encode("utf-8")).hexdigest(), 16
        )
        return hashed_str, hashed_str % size
    hashed = hash(value)
    index = (hashed & 0x7FFFFFFF) % size
    return hashed, index
is there something inherently wrong with this?
i'm opposed to writing my own stable_hash protocol that's guaranteed to always be the same, because what's the point if there's already an existing __hash__? is it really that common for objects to have different hashes between interpreters?
Hashes for some objects are randomized, meaning they will be different from one run of the interpreter to the next.
$ python -c 'print(hash(("x",)))'
4734399606021899668
$ python -c 'print(hash(("x",)))'
7577783400188628811
$ python -c 'print(hash(("x",)))'
8033416648646465101
Your comment indicates you're aware of that for strings, but checking for strings only at the top level isn't enough, because many objects compute their hash by combining the hash values of the objects they contain
ok, so i could special case some collections to deal with the strings inside them, is that reasonable?
no
what if it's a dataclass
you essentially end up having to know the internal structure of every object you're trying to hash
i’m unaware of how they hash, are they randomized?
they combine the hashes of the values in the dataclass
also, the hashes of some objects (e.g., types) are based on the memory address, so they also won't stay the same across runs
ah. i thought about setting PYTHONHASHSEED, but that only works on interpreter startup
FWIW, i’m technically not doing this for “any object,” just types that are serializable by pydantic (which doesn’t include types, i think)
why not do something like serialize to JSON and then take the md5 of the JSON?
that's not particularly fast, but it is stable
that’s probably what ill end up going with, it’s just nice to support user-defined __hash__ methods
Forces you to use a session.
(httpx has a sync interface which doesn't, so for quick interactive requests you type less)
PyPI's astunparse?
!pip astunparse
Be aware if you do this, you have to be very careful as it's very easy for python objects that are "equal" to serialize to different json
serialization/deserialization protocols aren't generally interested in making a guarantee that "equal objects serialize to the exact same thing". They're interested in the guarantee that the value is preserved when it round trips.
Sets don't care about order at all or guarantee anything, dicts have some guarantees around ordering but they are still "equal" if their ordering is different.
when you turn these things into json, you'll potentially get differently-ordered json objects/arrays, and thus different md5
Mm, true.
that’s a very good point. maybe i could do some extra check to force the JSON to have a certain order?
well....
it gets a little bit tricky
for dicts, the keys have to be strings, so you could force them to be in key sorted order - that's relatively easy
and having the same key twice is pretty questionable anyway so you don't really have to worry about ties
the issue is lists
to sort lists you'll need to define a sorting order over json values, which is... annoying
well.... but then, you won't want to always sort the lists. just sometimes.
so it gets pretty messy. you'll basically need to define your own json serialization.
json.dumps() has a sort_keys=True option, for what it's worth
e.g you would want lists to simply go into a json array in the same order - but sets you would need to perform sorting
Agree that that doesn't fix all problems though
yeah, sort_keys will fix the dict issue
(which is the easier issue)
the real headache is the json arrays
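To make the ordering point concrete, here is a minimal sketch of a canonical JSON digest. The helper names (`canonical_json`, `stable_digest`, `_encode_unordered`) are made up for illustration, and SHA-256 is swapped in for the md5 mentioned above. It assumes dict keys are all strings and that sets only contain JSON-encodable values; lists keep their order, sets get a forced one.

```python
import hashlib
import json


def _encode_unordered(obj):
    """Fallback hook: json.dumps calls this for objects it cannot encode itself."""
    if isinstance(obj, (set, frozenset)):
        # Sets have no order, so impose one: sort by each element's canonical JSON text.
        return sorted(obj, key=canonical_json)
    raise TypeError(f"cannot canonicalize {type(obj).__name__}")


def canonical_json(value) -> str:
    # sort_keys fixes dict ordering; compact separators remove whitespace variation.
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), default=_encode_unordered
    )


def stable_digest(value) -> str:
    # Hypothetical helper: hex digest of the canonical JSON text.
    return hashlib.sha256(canonical_json(value).encode("utf-8")).hexdigest()
```

This still does not make every pair of "equal" objects hash the same: `1`, `1.0` and `True` compare equal in Python but serialize to different JSON text, so they get different digests.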
honestly, the best solution seems to be messing with PYTHONHASHSEED
that removes the string randomization from all objects
What is python ?
5883826190306
$ python -c 'print(hash(None))'
5905208968162
(and that's not affected by PYTHONHASHSEED)
is None affected by PYTHONHASHSEED?
damn it
i would have assumed the None hash was just zero
Why no one program on the 1 and 0 ?
That's the thing: you should not assume things about hash 🙂
not disagreeing, but there's not really much else i can do, is there?
Every program project 40% stealing 40% ai generated 10% eating 10 % actual work
you really just need a stable hash function and a stable conversion from whatever object to bytes. trying to just json and pythons __hash__ is probably the wrong call
More seriously it just doesn't seem like hash() is the right tool for what you want
You'll have to define your own hashing mechanism
probably. i lose potential support for any objects that support __hash__ though
Right, but as we've been discussing, support for such objects is likely to create bugs for you
I can't understand this chat
Because there's a good chance their __hash__ depends on hashing a string or None or something else with an unstable hash
What is hash ?
You can define a custom_hash for types like list, dict, int, str, None etc., and for compound types you can iterate over the fields in a predefined order. Like:
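import hashlib

def custom_hash(obj) -> bytes:
    # One-byte type tags keep e.g. 0 and None from colliding.
    if isinstance(obj, int):
        # signed, variable-width encoding so any int fits
        size = (obj.bit_length() + 8) // 8
        return b"i" + obj.to_bytes(size, byteorder='little', signed=True)
    elif isinstance(obj, str):
        return b"s" + hashlib.md5(obj.encode("utf-8")).digest()
    elif obj is None:
        return b"n"
    elif isinstance(obj, (list, tuple)):
        # hash each item to a fixed width so item boundaries stay unambiguous
        parts = [hashlib.md5(custom_hash(x)).digest() for x in obj]
        return b"l" + hashlib.md5(b"".join(parts)).digest()
    # ... handle dict, set, etc. here
    else:
        # assumes a pydantic-style object exposing __fields__
        field_names = sorted(obj.__fields__)
        values = [getattr(obj, k) for k in field_names]
        return custom_hash([type(obj).__name__, *values])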
Grinding is hard on this one
For a similar case, I have:
return xxhash.xxh64_digest(msgspec.msgpack.encode(payload), seed=0)
which is limited to types msgspec knows how to encode to msgpack (it will do so recursively), and reliant on 2 libraries (xxhash and msgspec), but you can do basically anything similar.
theoretically speaking (i'm pretty much convinced by now to not use hash()), could one set PYTHONHASHSEED, and then monkeypatch None.__hash__ to return 0 or something?
How to code guys ?
depends on the value of "theoretically". You'd need to use very bad hacks, like using ctypes to modify NoneType's methods
yeah, that's what i mean
I mean that's a "global" thing so I really don't suggest doing that
you would have to change it back afterwards
what's actually the thing you want to do here
this, probably
pretty sure the effect remains for the lifetime of the process?
associate some kind of shorter, representative string, to an arbitrary python value?
not "representative" in the sense of debug information, but something like a UUID or hash or whatever
actually, i take that back, i don't think you would have to set anything. from some experimenting, setting PYTHONHASHSEED does nothing once the interpreter has started. would setting PYTHONHASHSEED, and then spinning up a subinterpreter work? it shouldn't affect the current proc
I haven't checked the code but I'd assume it's read only at process startup
If so, a subinterpreter wouldn't be enough
a process pool should work
Python/initconfig.c line 1509
config_init_hash_seed(PyConfig *config)
create a process pool of size one with PYTHONHASHSEED set appropriately
and don't use fork to create the processes
why?
you know more than me here, is config_init_hash_seed called at process startup, or interpreter startup?
either way, why take the chance
not sure, sorry
i'm just curious at this point
i traced the top level function to PyConfig_Read, what about that?
I got that far too but I don't know. Haven't looked at this part of the interpreter much
Forked processes would not re-execute the Python startup code, so they won't read the value of the env var
it does not :(
import os
import _xxsubinterpreters as _interpreters
print(hash("123"))
os.environ["PYTHONHASHSEED"] = "0"
interp = _interpreters.create()
_interpreters.run_string(interp, "print(hash('123'))") # prints the same thing
that makes sense. Note that changing the environment after process startup is inherently thread-unsafe
@uneven raptor i think I got it, fwiw
got what? runtime modification of PYTHONHASHSEED?
with ProcessPoolExecutor(max_workers=1, mp_context=multiprocessing.get_context("spawn")) as ppe:
    print(ppe.submit(hash, "123").result())
if you put this in your program after the os.environ call (and obviously add the needed imports)
you should see different values
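A simpler way to check the same "read at interpreter startup" behaviour, without a process pool, is to launch a fresh interpreter with the variable set in its environment. This is only a sketch: `hash_in_fresh_interpreter` is a made-up helper, and it assumes `sys.executable` points at a working Python.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text: str, seed: str = "0") -> int:
    """Hypothetical helper: compute hash(text) in a new process started with PYTHONHASHSEED=seed."""
    # Copy the current environment and set the seed before the child starts,
    # because the child only reads the variable during its own startup.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# With a fixed seed, repeated runs agree with each other.
print(hash_in_fresh_interpreter("123") == hash_in_fresh_interpreter("123"))
```

Starting a whole interpreter per call is even more expensive than the pool approach, so this is only useful for experimenting, not for a hot path.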
You can spawn the PPE once at top level, and just have a convenience function that takes the ppe and the object to be hashed and computes the hash. so the overhead won't be too bad
unfortunately that's a very expensive operation just for calling hash()
how many hashes are you calling? Keep in mind that you just do this for the "top" level hash you need to compute
takes about 50 ms
though most of that time is simply waiting to get back the message, the CPU time is more like 1ms
well, this would be in a function that gets called quite a bit
yeah. It's better not to mess with this stuff anyway. It really depends on just how arbitrary of a python object you want this to work on.
if you're dealing with reasonably constrained set of objects, then it's not that bad
So, is there any fundamental reason PyREPL doesn't support command history in Windows, or would it be OK to add?
I've submitted a PR to add history support for PyREPL in Windows. Does anyone want to give it a try?
is it possible for cpython to be built without _socket? types.CapsuleType relies on that
did you try?
no, i looked through configure's options and didn't see anything
did i miss some blatant option somewhere? 😅
printf '*disabled*\n_socket\n' > Modules/Setup.local
./configure --with-pydebug && make -j
interesting. is there a better option for exposing a CapsuleType?
I don't think we need to support that sort of configuration
You're right, there is nothing in our documentation that mentions this feature. IIRC, there is only a mention in the changelog, which is quite difficult to find.
You can do it if you really want to but we shouldn't need to cater to that sort of thing in the rest of the implementation
sure, i'm just speculating. i saw that in types and was wondering if it would be problematic
oh.. we do have a types.CapsuleType.... I thought it wasn't exposed in the types module..
it's pretty recent
yeah, it was added 10 months ago
it does, but types doesn't rely on its existence
CapsuleType is defined as type(_socket.CAPI). what happens if _socket isn't built?
>>> import types
>>> types.CapsuleType()
Traceback (most recent call last):
File "<python-input-1>", line 1, in <module>
types.CapsuleType()
^^^^^^^^^^^^^^^^^
File "/home/eclips4/programming-languages/cpython/Lib/types.py", line 336, in __getattr__
import _socket
ModuleNotFoundError: No module named '_socket'
as expected
so it does rely on it
hmm yea wait
yes it is
oh
it doesn't break import of the module, though
Lib/types.py lines 334 to 338
def __getattr__(name):
    if name == 'CapsuleType':
        import _socket
        return type(_socket.CAPI)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
The capsules are ... it's a complex thing.
it's a dynamic get-attribute
so if _socket doesn't exist, it won't break the entire thing
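The same behaviour can be reproduced with a throwaway module object. This is a sketch with invented names (`demo`, `Lazy`, `_module_that_is_missing`), not the actual `types` source: the module-level `__getattr__` only runs when an attribute is missing, so the failing import is deferred until someone asks for that one name.

```python
import types

# Hypothetical stand-in for a module like Lib/types.py.
demo = types.ModuleType("demo")


def _demo_getattr(name):
    if name == "Lazy":
        # Hypothetical optional extension module that is not installed.
        import _module_that_is_missing
        return _module_that_is_missing
    raise AttributeError(f"module 'demo' has no attribute {name!r}")


# A module-level __getattr__ is looked up in the module's namespace.
demo.__getattr__ = _demo_getattr
demo.eager = 1

print(demo.eager)  # ordinary attributes are unaffected
try:
    demo.Lazy
except ModuleNotFoundError as exc:
    print("only fails on access:", exc)
```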
are all extension modules optional? or just ones like _socket
depends on what you mean by "optional"
IIRC _datetime has a capsule, if that's more stable than _socket it might be worth moving to that
probably most of them
i think there's no difference between _datetime and _socket
there are certain extension modules in the stdlib that rely on the presence of third-party dependencies that may or may not be present (e.g., tkinter, gdbm)
those are really optional, and it's not unusual to encounter a system that lacks some of them
then there are those that are always built by default, but you could massage the build system to remove them if you try hard enough, such as _socket. I don't think those are "optional" in a meaningful sense.
And then there are modules that are really built deeply into the interpreter, such as sys. If you tried hard enough I guess you could build CPython without sys, but it would be a lot of work.
ah -- i was wondering if _socket was similar to _tkinter, in the sense that it relies on an external dependency
I think PyPy doesn't have a _socket module. And ideally the pure-Python parts of the stdlib should be written so that they work out of the box for all Python implementations, not just CPython. (That's impossible to achieve fully, but we do the best we can.)
It does have a _socket module, but it lacks _socket.CAPI in 3.10, while it's present in CPython 3.10. But it does have a notion of capsules.
does pypy even have capsules in the first place?
pypy supports most of the CPython C API
what does os._exit do, specifically? the source seems to call os__exit_impl, but i wasn’t able to find the definition of that function
call _exit()
Modules/posixmodule.c lines 6684 to 6690
static PyObject *
os__exit_impl(PyObject *module, int status)
/*[clinic end generated code: output=116e52d9c2260d54 input=5e6d57556b0c4a62]*/
{
    _exit(status);
    return NULL; /* Make gcc -Wall happy */
}
oh, there it is. gh search wasn’t bringing it up
this internal module also covers nt for some reason
despite being named posixmodule.c
Docs for compile() and ast.parse():
This function raises SyntaxError if the compiled source is invalid, and ValueError if the source contains null bytes.
Behavior since 3.12 (it does raise ValueError in 3.11):
>>> compile("\x00", "lambda.txt", "exec")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SyntaxError: source code string cannot contain null bytes
Does it look like a valid doc issue I should file, or am I doing something wrong?
Good catch, definitely something needs fixing. Possibly it should be changed back to raise ValueError for compatibility. Would be good to bisect to see why it was changed
Wouldn't that break code that has adapted to the 3.12 and greater behavior? I was thinking about updating the docs, but if you think a behavior change is warranted I can run a bisect.
It would, but that change is undocumented and it's still early in the life of 3.12. It might still be better to keep the change since it's been released and nobody appears to have complained, though.
Ah, it's a docs issue, the change is in whatsnew for 3.12:
https://github.com/python/cpython/blob/d1a1bca1f0550a4715f1bf32b1586caa7bc4487b/Doc/whatsnew/3.12.rst?plain=1#L600-L602
Doc/whatsnew/3.12.rst?plain=1 lines 600 to 602
* :func:`ast.parse` now raises :exc:`SyntaxError` instead of :exc:`ValueError`
  when parsing source code containing null bytes. (Contributed by Pablo Galindo
  in :gh:`96670`.)
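For code that has to run on both sides of that change, a sketch like the following catches either exception. `null_byte_error` is a made-up helper name. `SyntaxError` is not a subclass of `ValueError`, so catching only one of them misses the other version's behaviour.

```python
def null_byte_error(source: str = "\x00") -> str:
    """Hypothetical helper: name of the exception compile() raises for this source."""
    try:
        compile(source, "<demo>", "exec")
    except (SyntaxError, ValueError) as exc:
        # Per the discussion above: SyntaxError on 3.12+, ValueError on 3.11.
        return type(exc).__name__
    return "no error"


print(null_byte_error())
```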
If you send a PR, request review from me and I'll look at it tonight
Thanks, I pinged you: https://github.com/python/cpython/pull/122462
Update docs for compile() and ast.parse() because they raise SyntaxError instead of ValueError for null bytes since #97594.
Issue: gh-122461
📚 Documentation preview 📚: https://cpython-preview...
a PR of mine has one of the docs jobs failing due to check-warnings.py missing. is that my fault, or is there something wrong with CI?
hm we may have been changing that recently, I think there's some security project. Maybe try merging in main into your PR branch
looks like that did it, thanks!
Hey, not sure if this is the right channel to ask. But I noticed that IDLE (for python 3.12.4) very often freezes, like the Not Responding thing on windows, when massive amount of data is being printed, for example after running dis.dis(...) with a very large code object which has a lot of bytecodes. Is it supposed to be like that?
Who know how to make minecraft cheats
That's quite offtopic here. Maybe start in #python-discussion and be aware of #rules against helping with malicious stuff.
Thx
Hmm, a ChatGPT generated PR.
There are just so many ways of exiting the new REPL with errors... This one looks like a "then don't do it", but I will report it in case someone wants to make pyrepl bulletproof...
import builtins
builtins.__import__ = lambda x, y, z, a, b: None
# -> Exception ignored in the internal traceback machinery, exits interpreter
lambda x: None exits with a different error. Both "work" in basic REPL as in it doesn't exit (but imports are obviously borked).
if something is deprecated on 3.14, is that something that should be reflected on typeshed? or does typeshed only add types for versions that are past the feature freeze
we sometimes tend to wait until the feature freeze because things can change still
I'd probably accept a PR marking new deprecations though
ok, good to know
thanks for merging 🎉
Why is Python3.13 so much slower than Python3.12.4?
what makes you say that?
Performance tests
#1268631691002380369 message
cool, but off-topic
also rule 6
Thanks, I am new here. I am so sorry.
@shy mango It seems you found a real regression in performance in python.org mac builds, thank you for your investigation and for filing the issue 🙂
I'm glad to be able to help. I had always imagined that the Python repo would have so many issues and that if I ever reported anything it would just get pushed to the bottom and never viewed. But it seems that I was wrong.
Knowing this, I will likely be quicker to report issues I find in the future.
Thank you for listening and helping out.
I have 48 hours to train a model before the VM where it's running goes down for maintenance, so I asked ChatGPT to make a context manager that breaks out of the context after a fixed amount of time (to save the model in its then-present state right before the VM goes offline). The (human-refactored) solution is here: https://paste.pythondiscord.com/UEOA
I was surprised that it worked for my test cases. It depends on signal.SIGALRM. Though I wonder in what ways this approach sets one up for failure.
You will need to thread for your use case as time.sleep is blocking
Or rather processes since your training will be blocking and CPU intensive
did your test cases actually call the library that you'll be using for training? Whether or not that works depends on whether the code that gets interrupted by the SIGALRM is actually prepared to handle that exception by finishing what it's doing and immediately propagating the TimeoutException upwards. time.sleep() is prepared to do that, but your ML library might not be
I'd generally expect that to play pretty poorly with C extension modules that drop into native code for long periods of time. They'd need to poll PyErr_CheckSignals to make it work.
Nope. (And I'm also not actually using the context manager.)
well, then - be warned that whether or not this can actually interrupt running code depends on what that code is doing
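For reference, a SIGALRM-based timeout context manager usually looks something like the sketch below. This is not the code from the paste, just an illustration with assumed names (`time_limit`, `TimeoutException`). It only works on Unix and only in the main thread, and the exception is raised only when the interpreter gets a chance to run the Python-level signal handler.

```python
import signal
import time
from contextlib import contextmanager


class TimeoutException(Exception):
    """Raised inside the with-block when the time limit expires."""


@contextmanager
def time_limit(seconds: float):
    def _on_alarm(signum, frame):
        raise TimeoutException(f"timed out after {seconds} seconds")

    # Install the handler and arm a one-shot real-time timer.
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always disarm the timer and restore the old handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


interrupted = False
try:
    with time_limit(0.2):
        while True:
            time.sleep(0.01)  # time.sleep() is interruptible by signals
except TimeoutException:
    interrupted = True
print(interrupted)
```

The loop here is interruptible because `time.sleep()` returns to the interpreter promptly; a C extension that stays in native code without checking for signals would delay the exception until it returns.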
Is there any news on that lock file pep that Brett's been rewriting?
!pep 665 this one?
oh wait, i didn't see the supersede, oops
!pep 751 looks like you're looking for this
Actually @boreal umbra: a reasonable proxy for whether or not this will interrupt whatever's running is whether or not a ctrl-c can interrupt that thing and raise a KeyboardInterrupt. It should work in almost exactly the same set of cases
is there a reason there's a Core and Builtins as well as a Core_and_Builtins directory for the NEWS entries
the former has way more entries than the latter
same goes for C API and C_API
We're transitioning from one to the other
the ones with spaces will eventually go away
yes, the newest release does
Mayhaps
there's a new draft yeah
with 2 types of locking this time!
yup, i've already upgraded my blurb
Hello Everyone,I am Planning To DSA in PYTHON if Any body Interested let 's Connect!
DSA = Data Science & AI ?
Data Structures and Algorithms
Wrong channel
Having a file called _pyrepl.py in current directory makes the interpreter silently unable to start (or rather: it automatically exits). Having a file called runpy.py also makes the interpreter unable to start, but prints a helpful text about there being a file with that name shadowing the one from the stdlib. If we add an import from _pyrepl to main.c, having a file called _pyrepl.py will also display the helpful message about shadowing. Would that be worth it?
This only affects the new REPL.
Could not access _pyrepl.__PYREPL_MARKER
AttributeError: module '_pyrepl' has no attribute '__PYREPL_MARKER' (consider renaming '~\PycharmProjects\cpython\_pyrepl.py' since it has the same name as the standard library module named '_pyrepl' and the import system gives it precedence)
why are you trying to kill new REPL so hard?
I get anxious thinking it will break by itself and be considered a bad idea. I love the new REPL, so I'd like it to be robust.
I find what you're doing really cool
I was also kind of scared by how it was introduced without a non-default period (I know it was in PyPy before, but still)
Felt like it might end up a lot less robust and people are not gonna find many bugs before release
Really appreciate how you're testing this so thoroughly!
Wow, thanks, that means a lot, I'll try even harder to find issues now 😄
just curious, is there actually a reason to use this new REPL over ipython, other than not having to or being able to install ipython?
I think that's the main reason. I hear that's a big one for educators though; installing ipython is a lot more steps for students to get up and running than just installing Python
I see. isn't it trivial to install ipython using pip? (I don't really use pip)
trivial if you already know what you're doing 🙂
the biggest disadvantage might be that IPython has quite a few dependencies, which might conflict with your own application's needs
on top of the obvious one that, if you need to install IPython into every venv, the user experience is much worse
i suppose so. I have no idea what setup educators are using; I guess maybe if they're just depending on the system python and not installing anything, and it saves them having to ask sysadmins to install one package, or something
there might not be a sysadmin; e.g. if you're teaching students and they all bring their own laptops
I agree this could be an issue in theory, though I've never actually heard someone claim this. I'm not sure I understand the second point really, you'd just make ipython one of the packages in your dependencies.txt or however you're doing it, wouldn't you?
that assumes the existence of a requirements.txt
beginners are vanishingly unlikely to have one
beginners mostly create a venv and pip install what they need into it one thing at a time, as they discover a new library that they want to use
I feel like in that scenario it should actually be easier, as the harder part is actually installing python, isn't it? using pip should be as simple as pip install ipython or maybe sudo pip install ipython - but then, that's what I'm asking. I've barely used pip in years.
but beginners probably aren't going to use venv at all, let alone multiple venvs?
they're often forced to - it's impossible to install system-wide on lots of systems these days
I don't have personal experience with this but I imagine the more commands people have to run, the more opportunities there are for people to get confused
but basically what I'm hearing is - I should continue to tell people to use ipython, and to try to get ipython installed if they have even a smattering of experience
but the same point applies even with system-wide deployments. Better defaults are nice because people don't need to know that there's something better out there that they could instead be using and take the time to install it
if they are just using python for a 3-4 month course then obviously it doesn't matter as much
this makes me wonder just how bad ipython's dependencies are, would be nice if it was just packaged with python.
I think that in a Python class you're not even getting to explaining pip and what are packages until a pretty advanced part
I do think IPython is nicer than pyrepl, FWIW
While a REPL is probably day 1
but pyrepl is much better than the old default REPL
IPython is amazing!
no that's a different thing
IDLE is a text editor
It was just a wild moment for me when I heard about the default repl getting syntax highlighting and block support a few months ago and I did a bit of a double take
the old default REPL didn't have a name that I know of. It's just called the "basic repl" now that we need a name for it to distinguish it from the new pyrepl
ipython has been stable and widely used and had those features for around a decade
the old default repl is what you get if you just run "python3" in a shell
btw on servers with thin docker images you probably won't have IPython
yeah. IPython is great, but better defaults benefit everyone
it seems like IDLE has an interpreter built-in though, it's not just a text editor
IDE, then, if you like, I guess
well, this is an exaggeration tbh :-). For sure, it will benefit some people. It hasn't affected me in the last 10 years, and I suspect it will not benefit me in the future. But I'm happy someone will benefit!
I think the interpreter has IDLE built into it 😛
i was just confused when he said IDLE was a text editor because I could have sworn I had a distant memory of a window with IDLE written on it, and an interpreter, and I did a google search to make sure I wasn't having a senior moment
but yes, it's meant to be an IDE, not just a text editor
idk about probably, I suspect that it's so little space that if you like ipython there isn't really much reason not to just deploy it
I'm actually curious now to see how much space ipython + dependencies actually uses
i wonder if ipython installs all the notebook stuff too by default, or if you can just get the ipython interpreter by itself
it does install the notebook stuff by default
So, I don't have pip handy, but with micromamba, an environment with just python is 180M. When I install ipython, it goes to 217M
I'm not sure if it does, at least on mamba/conda, all the notebook stuff has moved to the "jupyter" moniker
ooh, indeed - I think I'm wrong about that, and the notebook stuff got split out
here's what it installed:
Installing collected packages: wcwidth, pure-eval, ptyprocess, traitlets, six, pygments, prompt-toolkit, pexpect, parso, executing, decorator, matplotlib-inline, jedi, asttokens, stack-data, IPython
+ pickleshare 0.7.5 py_1003 conda-forge Cached
+ decorator 5.1.1 pyhd8ed1ab_0 conda-forge Cached
+ exceptiongroup 1.2.2 pyhd8ed1ab_0 conda-forge 20kB
+ pygments 2.18.0 pyhd8ed1ab_0 conda-forge Cached
+ traitlets 5.14.3 pyhd8ed1ab_0 conda-forge Cached
+ typing_extensions 4.12.2 pyha770c72_0 conda-forge 40kB
+ executing 2.0.1 pyhd8ed1ab_0 conda-forge Cached
+ pure_eval 0.2.3 pyhd8ed1ab_0 conda-forge 17kB
+ wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge Cached
+ ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge Cached
+ parso 0.8.4 pyhd8ed1ab_0 conda-forge Cached
+ six 1.16.0 pyh6c4a22f_0 conda-forge Cached
+ matplotlib-inline 0.1.7 pyhd8ed1ab_0 conda-forge Cached
+ prompt-toolkit 3.0.47 pyha770c72_0 conda-forge 271kB
+ pexpect 4.9.0 pyhd8ed1ab_0 conda-forge Cached
+ jedi 0.19.1 pyhd8ed1ab_0 conda-forge Cached
+ asttokens 2.4.1 pyhd8ed1ab_0 conda-forge Cached
+ stack_data 0.6.2 pyhd8ed1ab_0 conda-forge Cached
+ ipython 8.26.0 pyh707e725_0 conda-forge 599kB
not quite a 1:1 match
i wonder why it's different
but yeah, I do think the "weight"/space reason isn't much of a reason not to install ipython, at least now with jupyter split out - it will add a trivial amount to the size of your venv/docker/etc
50 MB, it seems - small, but not trivial
On a raspberry pi or something like that not trivial
On a typical server probably trivial
(it was 37 M for me and in a real project it would be even less, as some things would be amortized by other dependencies)
fwiw, pyrepl is adapted from pypy's repl
it's not a brand new repl being built from scratch for CPython, it's an existing one being incorporated
not that small tbh
Alpine itself is just 50MB or so
a Python install is ~100 MB, if IPython is ~50 MB (du says 48.5 MB in the fresh venv I just installed it into) that's a ~50% size increase for an app with no other dependencies
very big!
obviously you're not gonna be running that many REPLs on your servers anyway
but still
all the more reason why it's not great to be paying the cost for a heavier REPL by default
I think this is pretty theoretical, the vast majority of people I talk to have far bigger deployments than that
But yes, along with students, people for whom < 50 megs on a deployment is make or break are another major beneficiary here
But most people can easily have access to ipython everywhere if they choose to
when we talk about every install of the interpreter getting larger by X%, that has quite a broad impact. A server that could have hosted N apps can now only host 2/3N
at the extremes, granted - but there are a lot of apps that have few or no dependencies outside the stdlib
I don't think being limited by disk space is a common scenario 🤷‍♂️
Also if you are really that disk space constrained, and deploying that many environments, you should use something that can reuse storage
Well, it does depend on who you talk to... The Python community is very very broad
at the level of infrastructure providers it's a huge one, AFAIU
Conda/mamba use hard links extensively so that N environments will not take up even close to Nx as much memory
That doesn't work with containers though, right?
Not sure
But like, you can't simultaneously care so much about disk space that this is a big deal but also pick such an inefficient solution to begin with, it seems to me
I don't know, I think if you look at solutions offered by cloud providers it's not uncommon to see an app deployed on 100+ slim containers
Anyway, I'm certainly willing to bet this is quite niche - I've yet to encounter someone who actually said they wanted ipython on a server, considered adding it, but felt they couldn't because of disk constraints
Are you my first? 😛
Everyone else I talked to said they just didn't bother, or didn't know what ipython was, or already had it everywhere
Again, the python community is very very vast
I'll take that as a no
The people I talk to are different than who you talk to
I have talked to people that really cared about container sizes
And I have been on containers where I wished I had IPython
That's not what I asked, but good to know!
I mean, it's just not a discussion I really have
I don't know what answers I'd get
not sure where you asked something
Look for the question mark I guess
Oh
Yeah me personally I wouldn't mind the size
on apps I work on
I would add IPython to all of them probably if I got around to it
<@&831776746206265384> joined to advertise, they put this same message in 3 channels
What are the chances that type statements could be extended to support keyword arguments?
type Vector = list[float, size=x]
This example doesn't seem very useful, but for the types that DS/AI people often use, it would be helpful to encode promises like what columns a given dataframe would have or the number of dimensions an array would have.
!pep 637
I don't think this should have anything to do with type statements, but I think a case could be made for revisiting that PEP
Yeah, I remember that PEP. I liked it at the time, but I think limiting the new behavior to only type statements would address the concerns that the council had.
I don't think so. The right-hand side of a type statement is just an expression
limiting it to the type statement would complicate the type statement further because currently ^
I see
I think it's more compelling now than it was when it was rejected, but I'm personally not a fan of keyword arguments in indexing
I think PEP 696 fits nicely with this syntax, it's nice to be able to name defaulted type parameters
!pep 696
how bad would a change such as this be for parsing efficiency? (where keyed_getitem is some imagined new pattern)
type_alias:
| "type" NAME [type_params] '=' (expression | keyed_getitem)
that's fine as far as the parser goes, but I don't think it would make for a good user experience. For example, you could write type X = list[int, a=3] but not type X = list[int, a=3] | set[int, a=3]
I see. and when it's just "type" NAME [type_params] '=' expression, nothing extra needs to be done to make it recursive.
Is there a way of setting an exception's __cause__ without having to raise with a from or is that the only mechanism that sets it?
You mean other than manually setting it?
!e
i = IndexError("oh no")
z = ZeroDivisionError("no!")
z.__cause__ = i
raise z
:x: Your 3.12 eval job has completed with return code 1.
001 | IndexError: oh no
002 |
003 | The above exception was the direct cause of the following exception:
004 |
005 | Traceback (most recent call last):
006 | File "/home/main.py", line 4, in <module>
007 | raise z
008 | ZeroDivisionError: no!
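For comparison, a minimal sketch (the two helper names are made up for illustration) checking that assigning `__cause__` by hand leaves the exception in the same state as `raise ... from ...`, including `__suppress_context__` being switched on:

```python
def chained_via_from() -> BaseException:
    """Chain with the usual `raise ... from ...` syntax."""
    try:
        raise ZeroDivisionError("no!") from IndexError("oh no")
    except ZeroDivisionError as exc:
        return exc


def chained_via_attribute() -> BaseException:
    """Chain by assigning __cause__ before raising."""
    z = ZeroDivisionError("no!")
    z.__cause__ = IndexError("oh no")  # the setter also sets __suppress_context__
    try:
        raise z
    except ZeroDivisionError as exc:
        return exc


for exc in (chained_via_from(), chained_via_attribute()):
    print(type(exc.__cause__).__name__, exc.__suppress_context__)
```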
Emily Morehouse, speaking as a Steering Council member added that the Steering Council has requested an informational PEP on the new REPL. "Hearing concerns about how [the new REPL] might be rolled out... it sounds like we might need something that's more compatible and an easier rollout", leaving the final discussions to the 3.13 release manager, Thomas Wouters. Carol replied that she believes "we could do it in documentation".
Does anybody know whether the final plan is to create a PEP or just documentation?
i had a conversation with someone the other day regarding the object structure, and it seems ob_refcnt_split is undocumented. is that something that should be?
Possibly in internal documentation, definitely not publicly
i.e. it should be documented for people who want to hack on CPython, not as something to rely on for users of the C API
Yeah, I was looking for something that doesn't use the dunder, since manipulating them directly is usually a bad idea, but that seems to be working fine so far
guys I am still at the beginning of Python, am I in the right channel or what
i still need guidance
You can ask questions here https://discord.com/channels/267624335836053506/267624335836053506 or make a thread in https://discord.com/channels/267624335836053506/1035199133436354600
I've been looking at this issue the last few days https://github.com/python/typeshed/issues/6347
I think no small amount of the problem is the byzantine implementation of lru_cache.
Does it need to be a bunch of nested functions and closured variables?
I'd like to simplify it to a plain-old class.
I don't think changes to the runtime implementation of lru_cache can affect how typeshed describes it. If you make a change to the runtime that makes the stubs materially different, you've probably made a backwards-incompatible change to the runtime.
i'm thoroughly impressed with nogil, i've been stress testing some of my extensions that use threads and it's been able to run them without any changes
You have c extensions that use threads?
yeah, via C11's thrd_t
Earlier I mentioned my "p-string" idea, where p"some/path" is equivalent to pathlib.Path("some/path"). And this idea is a non-starter because pathlib isn't implemented in C, and pathlib depends on several stdlib modules that also aren't implemented in C.
Though I wonder: would it be possible for the presence of a p-string in the code to trigger the importing of pathlib? does import do this with importlib?
you can import modules from the C API, yeah
it's possible yes but there is no existing precedent for it in the language core
well, I suppose defining a generic does implicitly import typing
how expensive is typing to import relative to something like pathlib and its dependencies?
I don't know, not sure it really matters for this question
relatedly are you aware of
!pep 750
I am not--thank you for bringing it to my attention
oh I was thinking exactly p-strings when I saw this pep
import pathlib
from typing import Decoded
def p(path: Decoded) -> pathlib.Path:
    return pathlib.Path(path.raw)
print(p"some/path")
something like that?
what about p'C:/Users/{username}/blabla'? your tag function does not accept that
that makes me think that a helper to produce a string with all inline fields evaluated will be used pretty often
I don't believe it should accept that
of course it should probably have a good error message to deal with it tho
from ... import make_string
def p(*parts: Decoded | ?) -> Path:
    return Path(make_string(parts))
the PEP says:
Tag functions accept prepared arguments and return a string:
does it mean that returning Path will raise an exception?
or it is a typo
that's gotta be a typo or an error
they even show examples of not returning a string
do you want to do interpolation or just want the raw {whatever} in there?
i want it to behave just like an f-string with tag applied on top of it
perhaps the PEP should introduce a function which takes the received arguments from tagging and uses them as an f-string would
and another that would return just a raw content from it
real raw content or applying escape sequences?
well, i guess none of it makes sense
you always can do p(''), p(r''), p(f'') to be precise instead of p''
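A minimal sketch of that approach, assuming `p` is just a local alias for a pathlib class. `PurePosixPath` is used here only so the result is the same on every OS, and `username` is a made-up variable:

```python
from pathlib import PurePosixPath

# Hypothetical alias: a plain callable instead of a string prefix.
p = PurePosixPath

username = "alice"

plain = p("some/path")                # ordinary literal
raw = p(r"some\{path}")               # raw literal: backslash and braces kept as-is
formatted = p(f"/home/{username}/x")  # f-string: interpolated before p() sees it

print(plain, raw, formatted)
```

The caller picks the literal kind explicitly, so `p` never has to guess whether interpolation or escape processing was wanted.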
true
tags should be used only if you intend to do something special with interpolation parts
i think the main reason people would want p-strings is to be able to use them without explicitly importing pathlib, because otherwise it's just mildly shorter syntax
like you would be able to just use them without any hassle whatsoever
just realized I don't think it's possible to write a 100% accurate raw_str tag
because of the = feature of f-strings
and also p'a{x:{fmt}}b'
tru
from .tags import p
path = p"some/path"
from pathlib import Path
path = Path("some/path")
I'm not really convinced that the former is really an improvement over the latter
The thread mentions the idea of stdlib including some pre-defined tags and that would make it much more appealing to me
There was a thread about wanting pathlib.Path.realpath() which basically did .expanduser().resolve(), I can see that getting added as a tag function in stdlib as a cool little use of this pep
the Path pre tag was proposed as its own separate PEP i believe, maybe this is a workaround to make it more general and easier to digest
yeah, i think PEP 750 is the better solution
That means having p pre-defined in builtins? That sounds terrible.
i can't seem to find the original PEP however
was it a discourse thread instead of a PEP?
i seem to remember it being an actual PEP, i could be wrong though
i don't remember ever seeing a PEP for that
something about the no space function call is weird to me. i liked steve's suggestion of "have an i-string" and then you do regex(i"my_escaped_{word}") or whatever. solves the dotted name / namespace problem, is more minimal syntax, avoids weird things that beginners may run into like print"asdf", etc
I was thinking more of an import like from stdlibtags import regexstring
And with slightly more descriptive names
I also like the i-string suggestion fwiw
callable juxtaposition is definitely a learnability issue waiting to manifest, it sounds fantastic for #esoteric-python though
The more I think about it, the more appealing this sounds to me
Will be extra fun with soft keywords. Like match"foo":
This works in current implementation:
def greet(*args):
    """Uppercase and add exclamation."""
    salutation = args[0].upper()
    return f"{salutation}!"
print(greet"Hello")
__builtins__.__dict__["raise"] = greet
raise"Well that's novel"
Outputs (in the playground):
HELLO!
"WELL THAT'S NOVEL!"
greet"Hello" is lazily evaluated, right? Is it similar to returning a partial of greet(...)? I guess I need to read the full PEP.
I really don't get the tags PEP and it frustrates me, like I'm not seeing the power or how it can help with making a DSL (the only DSLs I'm really familiar with are Jinja or Jenkins declarative pipeline syntax). I don't see the power in it or any benefits
think of them like user-defined f strings. instead of adding more string prefixes (such as p, as suggested above) to the interpreter itself, the user can make them
First, the format_spec can be arbitrarily nested:
mytag'{x:{a{b{c}}}}'
im not sure what this is supposed to mean
f'{x:{a{b{c}}}}' is invalid syntax currently
No Implicit String Concatenation
Implicit tag string concatenation isn’t supported, which is unlike other string literals. The expectation is that triple quoting is sufficient. If implicit string concatenation is supported, results from tag evaluations would need to support the + operator with __add__ and __radd__.
this doesnt really make sense
'a' "b" does not perform any addition, it is just a way to write 'ab'
so i dont see a reason for tag'a' tag'b' to not be equivalent to tag'ab'
With other modifiers, e.g. r and f: r"hm\n{foo}" f"\nbar{baz}" means r"hm\n{foo}" + f"\nbar{baz}", not rf"hm\n{foo}\nbar{baz}"
So if you wanted to make implicit string concatenation work, it would be (tag"a" + tag"b"). But a tagged string doesn't have to return a string, which makes implicit concatenation on them kinda nonsensical
Like, if nd"0 1 2, {x} 4 5" makes a numpy array, should nd"0 1 2" nd"{x} 4 5" produce np.array([0, 1, 2]) + np.array([x, 4, 5])?
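A small check of the mixed-prefix case, runnable on current Python (the variable names are placeholders): each literal keeps its own prefix, and the pieces are only joined afterwards.

```python
foo, baz = "FOO", "BAZ"

# Adjacent literals with different prefixes: each part follows its own rules.
mixed = r"hm\n{foo}" f"\nbar{baz}"

# Same thing written as two separately evaluated strings joined with +.
explicit = r"hm\n{foo}" + f"\nbar{baz}"

# The raw part keeps its backslash and braces; only the f part interpolates.
print(repr(mixed))
print(mixed == explicit)
```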
it would be extra awkward because path strings won't be concatenable
i see
there is no problem in treating tag'a' tag'b' as tag'ab', but tag1'a' tag2'b' cannot be implicitly concatenated because tags are different
And what if the tags are different?
Hm, maybe it could make sense to do this. But we'll need to see what people will actually use these for
is there a real usecase for implicit string concatenation?
int"1" + int"2" works
When the string literal is too long to fit in one line, and you don't want to add a bunch of +s or introduce whitespace
I don't really like this feature, because it's an easy footgun
```py
things = [
    "foo",
    "bar",
    "baz,"
    "fizz",
    "buzz",
    "final item"
    "wait, another one"
]
```
i realized that tag'a' tag'b' -> tag'ab' can produce results that make little sense
consider np'1 2' == [1, 2] and np'3 4' == [3, 4]
it would be weird to have np'1 2' np'3 4' == np'1 23 4' == [1, 23, 4]
yep
Perhaps the tag could decide what to do with string interpolation. It might make sense for some tags (like HTML) but not for others (like numpy)
use multiline strings then
or introduce whitespace
i hate python sometimes
I worry about it being too powerful actually. Decimal literals? Fixed integer sizes? Random syntax? Calling functions with quotes? Everything becomes possible. Very fun to write, but will it be easy to read and understand?
there is a workaround for the case with same tag:
```py
np'a{x}b' -> np('a', lambda: x, 'b')
np'c{y}d' -> np('c', lambda: y, 'd')
np'a{x}b' np'c{y}d' -> np('a', lambda: x, 'bc', lambda: y, 'd')
       ^     ^                            ^^^^ bad
np'a{x}b' np'c{y}d' -> np('a', lambda: x, 'b', 'c', lambda: y, 'd')
       ^     ^                            ^^^^^^^^ ok
```
!pypi custom-literals
Ok that makes more sense
calling PEP 750 "Tag Strings For Writing Domain-Specific Languages" just seems very strange to me. To the extent that this allows creating DSLs, it allows creating rigid, strange DSLs that bear little resemblance to other languages and that do a poor job of allowing users to express themselves
From skimming it I got the impression that it's supposed to make it easier to work with DSLs, not create them
Like they give examples of SQL and templates in Jinja (not sure if the second one is a DSL?)
So the use of "writing" here is strange
PEP 501 seems much more reasonable to me, at a quick skim
unrelated, but is it possible that PEP 556 is revived with the introduction of nogil?
!pep 501
I'm not convinced that is what they mean, because PEP 501 i-strings meet that need. They say that "The authors of [PEP 750] consider tag strings as a generalization of the updated work in PEP 501", and the only innovation I see in PEP 750 is that it lets you do foo"{x}" instead of foo(i"{x}") - surely that is the DSL they're talking about
Hello my am
and the cost of that is that it'll be impossible to add new string prefixes in the future. They note in https://peps.python.org/pep-0750/#valid-tag-names that any existing string prefix must be an invalid tag name, but they don't acknowledge that this implies that introducing any new string prefixes in the future would be backwards-incompatible, as they might conflict with user-defined tag names
try asking in #python-discussion, @little robin
Oh that's interesting problem, as well as combinations of existing tags (rf, etc).
why are existing string prefixes case insensitive? why do URFB'' prefixes exist at all?
(i vaguely remember that some of them had a little different behaviour somewhere)
and given that we've introduced new string prefixes repeatedly (r, then u, then b, then f) it seems like a really bad bet to say that we'll never need any new one again once we have tag strings
i was under the impression that the point of it was so that they wouldn't have to add new ones in the future
theoretically, couldn't they add new string prefixes in a backwards compatible manner by just using the new one if it's not in the namespace?
In the current way, you can combine them (r, f, rf, etc). How would that be compatible in the new way?
oh, i didn't see that
even setting that aside, this absolutely doesn't imply that we'll never need a new one in the future. The existing string prefixes change the way that the string is parsed. If we didn't already have raw strings, you wouldn't be able to define an r that behaves like r"a\b" does today using only the tools that PEP 750 tag strings would give you
doesn't it provide raw string contents?
yeah, I'm wrong - I now see that the proposed Decoded does give you access to the raw string, which would be enough to let you do it. Not well - you'd wind up parsing it at runtime instead of compile time - but at least it's possible
ah, no - I was right the first time, based on
mytag'{expr=}' is parsed to being the same as mytag'expr={expr}'
that means that you wouldn't be able to implement an r tag because you wouldn't be able to distinguish r"{x=}" from r"x={x}"
actually, even deeper than that - r"{" is valid today, but if I'm reading the PEP right, sometag"{" would be syntactically invalid, so that's another reason why it wouldn't be possible to define r as a tag function
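The same ambiguity can be seen with today's f-strings, which is where the `=` expansion comes from. A quick sketch (`x` is a placeholder):

```python
x = 3

# The debug specifier becomes literal text plus a field, so by the time the
# string is evaluated the two spellings are indistinguishable.
# (f"{x=}" uses repr(); for an int that matches str().)
debug_form = f"{x=}"
spelled_out = f"x={x}"
print(debug_form == spelled_out)

# A raw string, by contrast, holds a lone brace or "{x=}" untouched.
lone_brace = r"{"
untouched = r"{x=}"
print(lone_brace, untouched)
```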
Is this solvable by requiring the new string functions have some prefix, like xgreet"blah"? Not pretty, just thinking out loud
yes, or even mandating a minimum length
if all existing prefixes are one or two characters, we're probably safe reserving 1 or 2 character prefixes for the language and letting user-defined identifiers be 3+ characters
so p is out of the question?
I think it should be. And, if we had p, note you'd have trouble representing a filename containing { - you'd need to escape the { as {{ or use a regular string literal and call Path explicitly
from the decoded strings section, this snippet makes no sense:
decoded = raw.encode("utf-8").decode("unicode-escape")
if decoded == raw:
decoded = raw
what's the point of this if?
I think the idea is that otherwise it would take 2x the memory
I think that's replacing two distinct but equal strings with 2 references to the same string
not something i would normally think about in python code, but interesting
i'm guessing it was translated to python from c code
yeah. if I'm right that it's just an optimization, I'm surprised that they bothered to illustrate it, rather than dropping that from their translation...
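A sketch of what that `if` buys, wrapped in a made-up helper (`decode_escapes` is not a name from the PEP): when decoding changes nothing, the original object is reused instead of keeping an equal copy alive.

```python
def decode_escapes(raw: str) -> str:
    """Process backslash escapes; reuse `raw` itself if nothing changed."""
    decoded = raw.encode("utf-8").decode("unicode-escape")
    if decoded == raw:
        # Equal content: drop the copy and hand back the original object.
        decoded = raw
    return decoded


no_escapes = "plain text"
with_escape = r"a\nb"

print(decode_escapes(no_escapes) is no_escapes)
print(repr(decode_escapes(with_escape)))
```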
i'm curious about what error they'll pick for the disallowed tag names
SyntaxError?
or, how will they actually implement it? both of these are valid pieces of code, it would be a breaking change to make it error in the future
def f(*args):
    ...
print(f"whatever")
I'd expect so
i'd personally be much more comfortable with a SyntaxWarning, for this reason
I'm not sure what you're saying is (or would be) a breaking change
how are they going to specifically disallow use of those names as tags?
something in the tokenizer or grammar, doesn't seem too difficult
this is in my opinion a stretch of a generalization
i'm just curious about how they'll differentiate f"hello world" as a user trying to use an fstring, or if they're trying to use a tag called f
the PEP defines that - it's an f string
there's no differentiating, it's just always an f-string
that seems like it could cause some odd problems for beginners wondering why their code is ignoring their tag
I would expect beginners to never define their own tags
fair enough
this section mentions raising an error
technically a breaking change
ah, i got confused from the prefixes listed above
honestly, I don't think they even need to specify this
none of those can be used as names for the callable, so of course none of them can be used for the tag name
yeah, it's just confusing
(they may be specifying this since it's relevant at the level of the grammar, but it's not relevant at the level of a Python programmer using the feature)
will there be any nicer errors for trying to use functions that are not tags? e.g. print"hi"
I saw someone show an example where they did builtins.__dict__["raise"] = some_callable and then raise"foo" worked in the prototype
(i've only skimmed through the PEP, FWIW)
it probably should not work
how did that get past the parser? raise"foo" is invalid syntax
!e raise"a"
:x: Your 3.12 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 1, in <module>
003 | raise"a"
004 | TypeError: exceptions must derive from BaseException
I do like that with this, we have a nicer solution for lazy eval log messages
TIL
I think in the prototype it doesn't tokenize to a keyword, it tokenizes to a tag
i was under the impression that a space was needed between raise and the exception
oh yeah, because it's a literal, righttt
who says print is not a tag?
That's a good observation, probably worth bringing up on Discourse. The PEP's backwards compatibility section is rather thin https://peps.python.org/pep-0750/#backwards-compatibility
it's true that return"x" is currently valid syntax and the PEP (at least in the current version) will change what it means
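This can be checked on current Python with `ast`, without running the statements. A minimal sketch:

```python
import ast

# A keyword directly followed by a string literal already parses today.
module = ast.parse('def f():\n    return"x"\n\nraise"a"\n')

func, raise_stmt = module.body
return_stmt = func.body[0]

# Both are ordinary statements whose operand is a string constant.
print(type(return_stmt).__name__, type(raise_stmt).__name__)
```

So any rule that reads `name"..."` as a tag call has to carve out keywords explicitly, or it changes the meaning of code that is valid now.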
ok, bad example 😛. what about something more strict, like max"hello"
less glibly: given that we define what a tag function is entirely by the interface it conforms to, given that print() does conform to that interface, it seems to me that it is a tag
max conforms to the interface, too, as long as no placeholders are given - right?
we're slowly reverting back to python 2's print statement
!e print(max('foo'))
:white_check_mark: Your 3.12 eval job has completed with return code 0.
o
struggling to pick a builtin that doesn't support str
no placeholders = 0 or 1 string piece
max will fail with 0 args
upd: max will fail with 1 arg too, because it would be a DecodedConcrete
so you can't make max work as a tag
abs"olutely"
so it just throws a TypeError, interesting
honestly, I think this is a very good argument for why structural subtyping isn't good enough for this, and an ABC would be desirable
i guess that makes sense
users will get bad error messages if we infer tag-ness instead of declaring it
Maybe a decorator, like @tagtools.tag
!pypi tags
!pypi tagtools
you are killing pypi libs 💀
Yes, the users of this package from 2010 will be in utter shambles
@types.tag or something would be fine
i think types would be a weird place to put it
is that like stdlib.h in C? If we don't know where to put it, put it in types 🙂
it never had any release apparently
@ast.tag
from my understanding, types is used for getting native types (e.g. coroutine) at runtime
hm maybe pypi is broken for me
direct link works but then if I click on one of the tabs it goes blank
seems to fit well enough to me - it is a type. Honestly, I think it'd be fine to even make it an actual type and force library authors to inherit their tag callables from it
could go in string as well, come to think of it
class sql(string.tag):
!d string
Source code: Lib/string.py
this is how we might acquire the second and potentially third users of the string module 😛
one of the less-loved stdlib modules
at that point, just add it as a method of str like @str.tag
wait, it has stuff like string.ascii_lower. nevermind
string.Template is genuinely useful and should be used more
class string.Template(template)
The constructor takes a single argument which is the template string.
why does it use the $ syntax instead of {}?
I think it predates any use of {} for placeholders in Python
and $ has been used for placeholders in lots of other languages
haven't we had that since like, the early 2000s?
!pep 3101
!pep 292
^ this one introduced string.Template
actually - requiring that the tag callables derive from some base class would address a lot of my concerns with PEP 750. It addresses the concern about random functions accidentally working as tags, and of bad error messages when trying to use a function that doesn't support the interface as a tag, and it allows an extension point for tags to opt into different behavior in the future. Imagine in the future we discover a need to suppress the {x=} -> x={x} expansion for some tags - the class could just have a def __tag_debug_string_expansion__(self): return False, which would allow a backwards-compatible way of changing how tagged strings are parsed in the future
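A rough sketch of that idea in plain Python. Everything here is hypothetical: `Tag`, `apply_tag`, and the hook name are invented for illustration and are not part of PEP 750 or the stdlib, and `apply_tag` only stands in for what the interpreter would do for `tag"..."` syntax.

```python
from abc import ABC, abstractmethod


class Tag(ABC):
    """Hypothetical base class that tag callables would have to inherit from."""

    def __tag_debug_string_expansion__(self) -> bool:
        # Hypothetical opt-out hook; subclasses could override it later.
        return True

    @abstractmethod
    def __call__(self, *parts: object) -> object:
        ...


class Shout(Tag):
    def __call__(self, *parts: object) -> str:
        return "".join(str(part) for part in parts).upper() + "!"


def apply_tag(tag: object, *parts: object) -> object:
    """Stand-in for the interpreter: refuse anything that is not a declared tag."""
    if not isinstance(tag, Tag):
        raise TypeError(f"{tag!r} is not a tag (it does not derive from Tag)")
    return tag(*parts)


print(apply_tag(Shout(), "hello"))
```

With a declared base class, something like `print"hi"` could fail with a clear message instead of whatever error the callable happens to raise.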
oh, i didn't realize string was that old
be the discourse message you want to see in the world 😄
ok, done
some bikeshedding: __tag_uses_debug_expansion__ is too long
what about like, __tagexpand__
or really, since this is in an ABC, just __expands__ or something like that
it could be any name at all, totally not worth bikeshedding on at this stage
it could even be a __tagflags__
i thought the whole point of the name "bikeshedding" was that you do it instead of important things 😉
sure. I don't plan to engage, though 😉
flags as in like, TAG_EXPANDS | TAG_SOMETHING_ELSE?
yeah.
"tag flags" sounds cool
not too big a fan of that in python, is there a precedent for that in the stdlib?
the important part of this idea is that it gives a way for a tag callable to declare some attributes about itself that the interpreter could inspect. The specific mechanism for how it would do that isn't something we'd need to settle just now
i'm more of an advocate for a decorator rather than an ABC, but yeah, i like the general idea
even the specific attributes it might declare about itself aren't something we'd need to settle right now
maybe both of the examples I gave are bad ideas and we don't want to implement them - but I can virtually guarantee that there will eventually be something where we need some strings to be parsed differently than others.
r strings and u strings and f strings and b strings are all parsed differently, and it seems unreasonable to bet that tag strings will be the last new type of parsing we'll ever need for the stuff in the quotes
all classes have cls.__flags__ but it is only used by C code
also, code objects have .co_flags
and compile supports flags kwarg
!d compile
compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)
Compile the *source* into a code or AST object. Code objects can be executed by [`exec()`](https://docs.python.org/3/library/functions.html#exec) or [`eval()`](https://docs.python.org/3/library/functions.html#eval). *source* can either be a normal string, a byte string, or an AST object. Refer to the [`ast`](https://docs.python.org/3/library/ast.html#module-ast) module documentation for information on how to work with AST objects.
The *filename* argument should give the file from which the code was read; pass some recognizable value if it wasn’t read from a file (`'<string>'` is commonly used).
some flags are probably involved in buffer protocol
rephrasing the question: is there any precedent for flags in python code
the re module, for instance
I don't remember any 🙂
sys.setdlopenflags()
damn it
some filesystem/socket/mmap/... stuff uses flags
anyway, the important part of the idea is that it provides an extension point. We don't have to define how we'd use that extension point yet, it's useful to have even if we don't yet have any need for it, since we might have a future need for it
yeah but that's all thin wrappers over C code. are there pure-python APIs that use flags as their choice of configuration (FWIW, i think re counts, as godly mentioned)
i do hope the PEP authors take your idea into consideration, it fixes most of the issues i'm seeing in the thread
is it reasonable to create new namespace for tags only?
there are currently 3 global namespaces: globals themselves, __builtins__ and __annotations__
I suggest making __tags__ namespace so that tag"foo" looks into __tags__['tag']
how will something be added to said namespace?
you can search for any use of enum.IntFlag
!d enum.IntFlag is that really a thing
class enum.IntFlag
*IntFlag* is the same as *Flag*, but its members are also integers and can be used anywhere that an integer can be used...
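For a concrete stdlib case, the `re` flags mentioned earlier are an `IntFlag`. A small sketch (the pattern and sample text are arbitrary):

```python
import enum
import re

combined = re.MULTILINE | re.DOTALL

# The re flags are members of re.RegexFlag, which is an enum.IntFlag.
print(isinstance(re.MULTILINE, enum.IntFlag), isinstance(combined, re.RegexFlag))

text = "a\nb\nc"
# MULTILINE lets ^ match after a newline; DOTALL lets . match the newline.
with_flags = re.search(r"^b.c$", text, combined)
without_flags = re.search(r"^b.c$", text)
print(with_flags is not None, without_flags is None)
```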
how dare they
flags are, like, super duper common, my friend 😄
keep them in C where they belong
from myhtml import html_tag, html
from ... import add_tag
add_tag(html=html_tag)
add_tag(html_tag, name='html') # or this
html() # that is a class and this line creates an instance
html'fubar' # that is a use of tag
Why not follow the normal lookup rules?
i find a dataclass or namedtuple works much better for configuration (or just plain old kwargs, depending on the case) in python. what libraries (that aren't just C wrappers) use flags?
normal lookup rules are not normal anymore
so adding 4th namespace wouldn't change it much
and there are 2 more namespaces that I didn't mention : locals and nonlocals
and they all behave differently
With normal lookup rules, you can compose tags or decorate them:
```py
def some_function():
    # ...
    debug_html = debug_tag(html, "some_function")
    return debug_html"<div>{foo}</div>"
```
the "mode" argument for open()
Or like
```py
def some_function():
    # ...
    html_ = html.override(ascii_only=True)
    return html_"<div>{foo}</div>"
```
that's a string, i'm not sure that counts
why wouldn't it count? it's multiple discrete pieces of information packed into a single field
well, that argument is based on C code anyway, which didn't fit my criteria
it's not based on C code, open is implemented in Python
but even if it wasn't, there's bz2.open and gzip.open and shelve.open etc
open in C also uses string of flags, iirc
i thought it was equivalent to the second parameter of fopen in C
Everything is based on C code eventually
it's not, there's Python-specific flags
idk about everything. but in any case, that doesn't mean you couldn't have a nicer API - python's subprocess.run for example is vastly nicer than any C API it's eventually delegating to
open() is implemented in C (not sure it matters for this argument though)
I would like to think that open in python is how it is simply because it's quite old - if open's API were being designed today then I'd like to hope it would take an enum (but maybe I'm delusional)
Agree, I don't think the open() interface is a great example to follow
or maybe even a dataclass with multiple bools, idk
i'm still not an advocate for designing an API to use flags instead of a dataclass or whatever these days, but yes you win, python has flags in the stdlib
fwiw I agree with you that flags suck
it's a very C thing. Even in C++, if you wanted to achieve the same underlying efficiency, you'd do it in a more type safe way.
from what i'm seeing, many of the results are in libraries that are FFIs
Yes, especially for IntFlag
I believe they are used, but I'm not sure I see a good reason to use an enum.IntFlag in pure python (i.e. no wrapping of C or eventual system calls)
maybe binary serialization into another language (but again that's not really "pure python" anymore, exactly)
this whole discussion is weird and unnecessary. The relevant idea is that there should be some way for the tag callable to tell the interpreter how it wants to be called. There are infinitely many contracts that would allow for that; it's weird to get stuck on one like this
bikeshedding :D
I see it as just a handy way to express a set of literal options. Something like
```py
class Flags(enum.Flag):
    MULTILINE = "multiline"
    DOTALL = "dotall"
    VERBOSE = "verbose"
#<=>
frozenset[Literal["multiline", "dotall", "verbose"]]
```
But yeah, if you need to add an option that's not a bool, you need to bolt it to the side
what's the actual benefit of this, I don't really understand
you can just write Flags.MULTILINE | Flags.DOTALL
?
the benefit of what?
I just don't really see why this is better than a normal enum and {Flags.MULTILINE, Flags.DOTALL}, that's all
in C, creating an actual hashset for something like this is an insane amount of work
So obviously you're going to use a bitset
it's also far faster and you often care about speed, avoiding heap allocations, etc
So even in C++, stuff like this does get used, though often an attempt is made to wrap it up more nicely.
there's no real reason outside of that to use it that I know of, so basically almost no reasoning applicable to python
i'm not too sure about PEP 750's choice to allow any return type, it's odd that you could do things like foo"hello" == 42
@grave jolt another nasty thing from a type perspective is that Flags.MULTILINE, and x = Flags.MULTILINE | Flags.DOTALL, have the same type
so now Flags.MULTILINE in x feels awfully weird because you're checking if a T is in a T
str 👀
actually... maybe that is also weird
it just ends up being weird no matter what, with flags.
In C++, ideally, if I wanted to do "flags", I would try to have a separate type for the enum, and for the enum set
but really it's only something I'd do for performance
{Flags.MULTILINE, Flags.DOTALL} is a set[T] and so later you're doing a membership check of T in a Set[T] - life is simple and makes sense
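A side-by-side sketch of the two shapes being compared. The class names are placeholders, and `enum.auto()` is used for the flag values:

```python
import enum


class Flags(enum.Flag):
    MULTILINE = enum.auto()
    DOTALL = enum.auto()
    VERBOSE = enum.auto()


class Option(enum.Enum):
    MULTILINE = "multiline"
    DOTALL = "dotall"
    VERBOSE = "verbose"


# Flag style: a single member and a combination share one type.
x = Flags.MULTILINE | Flags.DOTALL
print(type(x) is type(Flags.MULTILINE), Flags.MULTILINE in x)

# Set style: the element type and the container type are distinct.
y = {Option.MULTILINE, Option.DOTALL}
print(type(y).__name__, Option.MULTILINE in y)
```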
enum.IntFlag is extensible to flags that are not known in advance but might still be supported (by virtue of being threaded through to a different layer that knows what to do with them)
this is more about python flags than i ever needed to know 😄
usually the whole concept with enums is to restrict things intentionally. But in any case, even if you wanted to do that, nothing is actually stopping you from putting different values into a set in python
I don't see what IntFlag's advantage in that regard is
🤷♂️ all of the things that are nicer about enums than just global variables, I suppose
What, you don't like callable strings?
def lam(*args):
    return lambda: args[0]
print(lam"Hi"())
yeah... not great
however, it opens the door to lots of black magic shenanigans on pypi
python-ideas will have a field day with tags
I'm not sure why not? Your argument that instead of using Flags.FOO | Flags.BAR you could use {Flags.FOO, Flags.BAR}, which is true. But by extension to that same argument, we don't need enum at all, you can just do {module.FOO, module.BAR}. The things that you get from enum are a way to check that all the things in the set are valid (for some definition of valid), a nice repr, type safety, etc
When you start putting things from outside the IntFlag into your set though, you already lost that ability 🤷♂️
maybe it's easier if you actually show the IntFlag code that you envision - then we could see if there's a way to write it without IntFlag that's equally nice
nah, you still have some type safety (at least it's an int) and you still have a nice repr (it just uses a constant for the value that wasn't known up front)
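A small sketch of that behaviour, with a made-up `Perm` flag set: a bit that was never declared is still carried along as a `Perm` value.

```python
import enum


class Perm(enum.IntFlag):  # hypothetical flag set for illustration
    READ = 1
    WRITE = 2


# 8 is not a declared member, but IntFlag keeps it instead of rejecting it.
value = Perm.READ | 8

print(isinstance(value, Perm), isinstance(value, int), int(value))
print(Perm.READ in value, Perm.WRITE in value)
print(repr(value))  # exact repr text differs between Python versions
```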
then you can just use ints 🙂
You can have one enum with int values, pass around sets of ints that could potentially have ints from outside that enum - that another "layer" in the codebase knows about
you can even put a union type in your set
of course you can, at the cost of the nice repr, and the ability to do membership checks by name
Set[NormalEnum | int]
Seems exactly the same to me?
the difference is that here, we're explicit about the fact that some values will be from inside the "known" enum, and some values will not be.
You can insert a comment or other supplementary material in the middle of a call now ```diff
 @motivational # turns the function into a tag returning a callable
 def pick_polling_strategy(con_pool, expected_size, expected_count):
     ...
 def handle_something():
-    poll = pick_polling_strategy(config.pool, expected_size=2**20, expected_count=50)
+    poll = pick_polling_strategy "It is the rule in war, if ten times the enemy's strength, surround them; if five times, attack them; if double, be able to divide them; if equal, engage them; if fewer, defend against them; if weaker, be able to avoid them." (config.pool, expected_size=2**20, expected_count=50)
```
ok. That's implicit with an IntEnum
With IntFlag, it's just always going to be implicit whether or not that's the case
yes, exactly
finally, inline comments 🧑🔬
so IntFlag is just throwing away type safety which is basically always going to be useful somewhere 🤷♂️
foo/*comment*/(args)
foo"comment"(args)
MyIntFlag is a single type that is overloaded to mean NormalEnum, Set[NormalEnum], and Set[NormalEnum | int] - hard to consider that a win
that's exactly what it's for, though
I don't like using stuff I don't understand
and I don't understand fully how enum magic works, so I don't like using enum module
it's an improvement upon bitsets. It's useful for doing bitsetty things in a more readable and safer way
I don't know how it matters what its intention is - I don't think it serves any purpose usefully - that was the whole discussion 🤷♂️ .
I don't see the point repeating "but that's the intention!"
Yes - and if you need an underlying data representation that is a bitset, then it's fine.
Speaking of PEP 750. Will any expression be allowed inside of {}? Like, can you do hmm"{some.thing + 420} is what i want to print"?
I said that from the get go - but that very rarely comes up outside of wrapping C code and things similar to that
i think so, it's an extension of f strings
Will await be banned inside of {} then?
for x in rg"0..10":
    print(x)
(that does sound like a sensible option)
do fstrings not support await?
or actually, it probably can't, because of lazy evaluation
They do, because they're transformed at compile time. But with the proposed interface, it would not be possible to handle hmm"aaa {await something()} bbb"
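For reference, a small sketch showing that `await` inside an f-string replacement field already works today, because the expression is compiled inline in the enclosing coroutine. The names `something` and `main` are made up for the example:

```python
import asyncio


async def something():
    return 42


async def main():
    # The await runs right here, inside the coroutine, before formatting.
    return f"aaa {await something()} bbb"


result = asyncio.run(main())
print(result)
```

A tag that receives lazily evaluated interpolations has no enclosing coroutine to suspend at that point, which is the difficulty being raised.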
worth bringing up on discourse, i don't think they mention that
Similarly, (yield) would be disallowed
for x in rg"1,2,...,{n}":
    print(x)
this is exactly why tags should be required to return strings
I like it soo much 🥰
imagine doing this ```py
for x in rg"({a},{b}]": print(x)
```
meh, too many brackets of all sorts...
now that i think of it... maybe PEP 750 makes golfers too strong
The beauty is that you can parse whatever syntax you want, including comments or multiline ranges, just yield from range(start, end, step).
not really
only if there will be a stdlib module with many tags
making your own tag is probably too many chars
depends on what it is, i guess
Actually, I kinda agree with concerns from Eric Traut. The template can evaluate the arguments in any order (or at a later point in time), which can be surprising
I do like the explicit lazy marker. But not lambda: 💀
That would make them far less useful. In fact, you'd lose practically all of their advantages
taking the sql"" example, for instance, you'd be losing the ability to use that SQL statement with APIs that take the parameters as objects, losing out on the ability to use prepared statements or execution plan caching, etc
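A rough sketch of why returning something other than a string matters for the sql"" case. PEP 750 syntax is not available, so this emulates a tag with an ordinary function; `Param` and `sql` are hypothetical helpers invented for the example, not part of any proposal or library:

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for an interpolated value."""
    value: object


def sql(*parts):
    """Build (query_text, params) instead of a formatted string."""
    text, params = [], []
    for part in parts:
        if isinstance(part, Param):
            text.append("?")          # placeholder; the value travels separately
            params.append(part.value)
        else:
            text.append(part)
    return "".join(text), tuple(params)


name = "Robert'); DROP TABLE students;--"
query, params = sql("SELECT ", Param(name), " AS name")

conn = sqlite3.connect(":memory:")
row = conn.execute(query, params).fetchone()
print(query)
print(row)
```

Because the value never becomes part of the query text, the driver can bind it as a parameter; a tag forced to return `str` could not do that.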
I worry about the idea of these being called "tagged strings" when they return anything. You cannot safely iterate over a tagged string because it might have returned itertools.count. They are actually a special function call protocol that may be used to construct anything, with one such uses being advanced string operations. I'd rather call it the "tag protocol" or something.
b"foo" doesn't produce a str
re'foobar' == re.compile(r'foobar')
as opposed to e'foobar', which doesn't preserve backslashes 👍
clearly that calls eval()
Finally. Backtick strings
But it returns something based on whatever is between quotes. You can put names and arbitrary objects in the global namespace with a tag. Seems to be a significant difference to me.
that's true of an f-string, too
!e f"{globals().__setitem__('x', 'y')}"; print(x)
:white_check_mark: Your 3.12 eval job has completed with return code 0.
y
that formats with =1 as the format string, right?
yes
Maybe allowing arbitrary expressions inside f-strings was a mistake. I've seen some terrible stuff
(a comprehension inside of {} is "terrible stuff" in my book)
format specifiers were enough to stump Pablo, lol. https://github.com/python/cpython/issues/121130#issuecomment-2197120529
Thanks! Maybe I'm not aware of just how powerful strings already are? 🤔
is f'foo{",".join(...)}bar' considered "terrible stuff" ? i wrote it several times
f'A {f"{B} {C}":10} D'
I mean, ultimately it's subjective
for me it makes it harder to separate the template text from what's being inserted into it
my brain has very little RAM
The current PEP750 reference implementation does something interesting with yield ```pycon
>>> def inspekt(*args):
...     rv = []
...     for arg in args:
...         if isinstance(arg, Decoded):
...             print("decoded", arg)
...         else:
...             v = arg.getvalue()
...             rv.append(v)
...             print("interp", repr(v))
...     return rv
...
>>> x, y, z = inspekt"foo{42}bar{yield 5}baz{yield 6}"
decoded foo
interp 42
decoded bar
interp <generator object <interpolation> at 0x7d211cbe8670>
decoded baz
interp <generator object <interpolation> at 0x7d211cbe8880>
>>> next(y)
5
>>> next(y)
Traceback (most recent call last):
  File "<python-input-31>", line 1, in <module>
    next(y)
    ~~~~^^^
StopIteration
>>> next(z)
6
>>> next(z)
Traceback (most recent call last):
  File "<python-input-33>", line 1, in <module>
    next(z)
    ~~~~^^^
StopIteration
```
I think at some point it literally translated it into lambda: yield 5, not sure if it still does that, but that seems to be what you're seeing
yeah, fair. i'm just worried that adding tags will encourage some shenanigans like rg from above
Can someone pls help on how to go about this using python
First thing that came to my mind was to create a table containing dictionaries that describe that table. These dictionaries will correspond to a simple hash (in this case i just concatenated the X Y coordinates into a string)
example:
coord_table = {
    "873": {
        "x": 87,
        "y": 3,
        "c": "□"  # special character
    }
}
x, y = 87, 3
hash = str(x) + str(y)
print(coord_table[hash])
!e
coord_table = {
    "873": {
        "x": 87,
        "y": 3,
        "c": "□"  # special character
    }
}
x, y = 87, 3
hash = str(x) + str(y)
print(coord_table[hash])
:white_check_mark: Your 3.12 eval job has completed with return code 0.
{'x': 87, 'y': 3, 'c': '□'}
me when the 1, 11 coordinate overwrites the 11, 1 coordinate
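To spell out that collision: concatenating the digits gives the same key for (1, 11) and (11, 1). One common way around it, sketched here with made-up cell contents, is to use the `(x, y)` tuple itself as the dictionary key:

```python
def string_key(x, y):
    # The approach from the message above: concatenate the digits.
    return str(x) + str(y)


# Both coordinates produce "111", so one entry would overwrite the other.
collides = string_key(1, 11) == string_key(11, 1)
print(collides)

# Tuples are hashable and keep the two coordinates distinct.
coord_table = {}
coord_table[(1, 11)] = {"x": 1, "y": 11, "c": "□"}
coord_table[(11, 1)] = {"x": 11, "y": 1, "c": "■"}

print(len(coord_table))
print(coord_table[(11, 1)]["c"])
```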
How do i repeat ?
yo what are some mobile alternatives for pycharm
Hello there everyone
code
print("hello")
how does typeshed indicate a soft deprecation? @typing.deprecated?
we haven't explicitly discussed it
good to know. it might be difficult to get people to migrate without their IDE telling them so
it gets cached the first time, so the second import does little work
import is roughly implemented as ```
try:
    return sys.modules[module_name]
except KeyError:
    mod = actually_import(module_name)
    sys.modules[module_name] = mod
    return mod
```
yes
from module_name import Y, Z is roughly implemented as ```py
try:
    mod = sys.modules[module_name]
except KeyError:
    mod = sys.modules[module_name] = actually_import(module_name)
Y = mod.Y
Z = mod.Z
del mod
```
(instead of being bound and then unbound, the name mod is just never bound, but this is roughly the idea)
Note it also looks for sys.modules[f"{module_name}.{Y}"] though
oh, true 🙂
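The caching described above is easy to observe from Python. A small check, using `json` only because it is a convenient stdlib module:

```python
import importlib
import sys

import json

# After the first import, the module object lives in sys.modules.
first = sys.modules["json"]

# A second import statement just looks the module up again.
import json as json_again

# importlib.import_module goes through the same cache.
via_importlib = importlib.import_module("json")

print(first is json, json_again is json, via_importlib is json)
```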
does caching apply to the C APIs for it? e.g. PyImport_Import
I believe so but read the docs and/or the code
Python/import.c line 3881
```c
PyImport_Import(PyObject *module_name)
```
or, if it does, it doesn't skip the list creation and whatnot
lol it literally calls builtins.__import__
the caching is inside of that
but there's enough other stuff going on here that adding your own caching on top of PyImport_Import is likely worthwhile
i figured
looking closer, it doesn't look like any of the C APIs for import do caching
Where is the definitive source of truth in the standard and/or stdlib on how to escape strings? Both fstrings and normal strings.
"escape strings"?
while looking at the import implementation, i came across PyImport_ImportModuleNoBlock, which is deprecated and scheduled for removal in 3.15 -- it's part of the stable ABI, wouldn't that break forward compatibility?
i don't think anyone has used it since 3.3, i'm just curious 😄
hey guys, I'm running into an issue where my tests for a cpython PR are failing on some specific environments, wondering if anyone can have a look. I've already posted on #1035199133436354600 so I'm linking the whole thing here: #1274855244643172424 message
you're calling the zipapp main with:
args = [str(source), '--include-pattern', r'.*\.py', '--exclude-pattern', r'.*z.*']
where source is a subdirectory of self.tmpdir - what happens if self.tmpdir contains a z in its name?
I'm not positive that this is the bug, but at a glance, it seems like if the temp directory has a z in its name (or one of its parent directory's names), the test would fail with exactly the symptoms that you're seeing - every file would be excluded
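A sketch of the suspected failure mode using only `re`. How the PR actually applies its patterns is not shown here, so matching against the full path versus the path relative to the source directory is an assumption, and the temp directory name is invented:

```python
import re
from pathlib import PurePosixPath

exclude = re.compile(r".*z.*")

# Hypothetical temp directory; note the "z" in "zipapp_test_abc".
root = PurePosixPath("/tmp/zipapp_test_abc/source")
files = [root / "main.py", root / "pkg" / "util.py"]

# Matching against the full path: the directory name alone triggers the exclude.
excluded_by_full_path = [f for f in files if exclude.fullmatch(str(f))]

# Matching against the path relative to the source root avoids that.
excluded_by_relative = [
    f for f in files if exclude.fullmatch(f.relative_to(root).as_posix())
]

print(len(excluded_by_full_path), len(excluded_by_relative))
```

Since temp directory names are random, a test like this would only fail on runs where a "z" happens to appear somewhere in the path.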
!python
Anyone a ESRGAN expert here?
That's more a question for #data-science-and-ml , but also helps if you just ask the question (there)
OMG, that must be it, no wonder it seems to fail randomly, thank you @raven ridge
I commented on the PR, too, in case you missed it here 🙂
what about this
Urgent Help
Who can help me?
If you have a Python question, you should see #❓|how-to-get-help and ask in #1035199133436354600. Make sure to post all the details you have
thank for your support
a conversation in pydis piqued my interest, is it possible to remove the recursion limit without modifying the core?
I think it depends a lot on exactly what you mean by "remove the recursion limit"
you will eventually overflow C stack, and it will lead to a crash
not necessarily - in modern Python versions, it's possible to call Python functions forever without overflowing the C stack
def f():
    try: f()
    except: f()
    finally: f()
f()
recursively calling infinitely will never raise a RecursionError
would something like sys.setrecursionlimit(-1) work?
You can try: ```py
import sys
sys.setrecursionlimit(2**31-1)
def a():
    a()
a()
```
well, that’s a lot, but it’s not totally infinite
it doesn't run out of stack space, though, it runs out of heap space
what if you had a beefy computer that could hold all the frames in memory at once. it would still raise a RecursionError, right?
so there’s no sure-fire way to remove the limit on all systems?
sure there is - delete the code that imposes a limit
delete the code in the core?
yeah
that’s not exactly portable now is it 😛
I don't know what you mean by "portable" here
in practice, if you never call C functions and only call Python functions, you can recurse 2 billion calls deep in current CPython versions. You will run out of heap memory before you successfully push 2 billion call frames
portable as in, you can run it on any python interpreter
if "any Python interpreter" includes Python 3.10 and earlier, you will overflow the C stack and crash the process, even if you never call any C functions
i know, i avoid recursion anyway. i was just wondering if it was something that cpython had a way to do
it'd take hundreds of gigs of memory to be able to hold 2 billion call frames, though
the limit you'll hit, in practice, isn't the recursion limit, it's the amount of memory on the system
with that being said, could one manually deallocate frames that you know you’ll never see again?
no... you need to be able to return to those frames
oh well
back of the napkin math, 2 billion call frames would take at least 176 GB of heap memory just for the _PyInterpreterFrame structs
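The napkin math, written out. The 88 bytes per `_PyInterpreterFrame` is not stated in the message; it is the figure implied by 176 GB divided by 2 billion frames, and it ignores locals and everything else a frame references:

```python
frames = 2_000_000_000

# Assumption: per-frame struct size implied by the estimate above.
bytes_per_frame = 88

total_gb = frames * bytes_per_frame / 10**9
print(total_gb)
```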
don’t some psychos have like 256 gb ram these days
sure you can get a machine with tons of memory and get a little further
doesn't make any real difference to the answer
yeah, fair. it's a fun exercise though!
the interesting fact here is that there's no longer any reason why there must be a recursion limit because Python frames no longer take up space on the C stack
but, when a Python frame calls into a callable that's implemented in C, that adds an extra frame to the C stack.
i think it's helpful for debugging, a RecursionError is nicer than seeing a segmentation fault
you won't get a segmentation fault from the code I shared above - try it
it prevents errors in a friendly way
I would prefer to get recursion error after 1000 calls, instead of python consuming 50gb out of 16gb physical memory
oh, it just runs infinitely, and then linux kills the process eventually. i would prefer a segfault!
linux kills the process because your machine runs out of memory
right, that's not exactly nice for debugging
windows just slows down a lot, because CPU is busy compressing/decompressing memory to/from swap
it'll eventually die even on Windows. You don't have unlimited swap
yes, I think that's why we didn't just remove the recursion limit in 3.11
a quick(ish) RecursionError is much better for users than eating all your memory
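A small sketch of both points from this thread: `sys.setrecursionlimit` rejects non-positive values, so `-1` does not mean "unlimited", and with the limit in place runaway recursion ends in a catchable `RecursionError`. The helper `depth` is made up for the example and uses whatever limit is currently set:

```python
import sys

# Negative limits are rejected rather than treated as "no limit".
try:
    sys.setrecursionlimit(-1)
    rejected = False
except ValueError:
    rejected = True


def depth(n=0):
    """Recurse until the interpreter refuses, then report how deep we got."""
    try:
        return depth(n + 1)
    except RecursionError:
        return n


limit = sys.getrecursionlimit()
reached = depth()
print(rejected, reached < limit)
```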
that, and the fact that you can still overflow the C stack when calling stuff implemented in C
if the recursion limit were removed entirely, whether or not you get stack overflows would depend on whether or not the stuff you're calling is implemented in Python. You'd need to know implementation details of a lot of stuff in order to reason about your program's correctness
it gives me a pretty segfault on 3.9, FWIW
yep - any version before 3.11 will overflow the stack and segfault, any version from 3.11 on won't
def foo(x):
    sorted([...], key=foo)
foo(...)
(foo calls sorted, sorted calls foo, and so on...)
will the C stack grow with each recursion level?
am I understanding it correctly that this will not grow the C stack: def foo(): foo() ?
why isn't faulthandler enabled by default? it doesn't affect runtime performance
yes, and yes
technically, this also depends on whether a custom frame evaluation function has been installed. If so, even calls from a Python function into a Python function take up extra call frames on the C stack.
I'm not aware of anything interesting that actually makes use of custom frame eval functions, but the API for them exists, and if something were to use them, it would disable the optimization that allows Python functions to call into Python functions without consuming stack space (yet another good reason for keeping the recursion limit)
i didn’t even know that existed
If you want to test stuff... AWS offers instances with up to 32TB of RAM
that sounds cheap!
$407.68 per hour apparently... though it does include almost 900 CPU cores, they don't offer 1 CPU with 32 TB of RAM
the 3TB one is more reasonable at $20/h
if someone wants to spend $20, I'd be curious whether 2 billion Python frames takes more or less than 3 TB of memory 😄
If you can figure it out in 15 minutes, it's only $5 🙂
would be an interesting stress test for cpython
1 cpu with 32 TB would be somewhat comical
Spot instances are a lot less, but it’s a lot harder to get access to them (the more powerful ones anyways)
Will it not also be disabled when a trace/profile function is installed?
quick Google search suggests that typical ram bandwidth is not much more than 50 GB/s
32 TB / 50 GB/s = 640 s = 10.7 minutes just to write these frames to memory
that's if the 32TB were all in one stick, which is impossible - i think in theory you need to also divide by the number of channels. but of course they can't even be on one motherboard, so who knows how many channels are there.
I don't think it matters whether they're all on one stick or not... The writes would all have to be serial rather than parallel, regardless, just by nature of this being a stack
ah, that's true
Guys im making a database but i forgot what did the cursor do
Ask in #databases
Oo thankuu
Question: AFAIK you can't change methods of built-in classes such as str, and I want to do just that (specifically I want to print every declared string when running an arbitrary program), so I am thinking of rebuilding the Python library with that change and using that executable instead. How can I proceed with doing that? Or is there a better solution that will allow me to change built-in methods in Python?
Is that really your only option? Did you consider creating a type that inherits from the str-like type designed for user inheritance, and changing the functionality there?
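A sketch of the subclassing route, assuming the "str-like type designed for user inheritance" means `collections.UserString`. `LoggedStr` and the `created` list are invented for the example. Note that this only sees strings built through the subclass, not string literals or strings created elsewhere in the interpreter:

```python
from collections import UserString

created = []


class LoggedStr(UserString):
    def __init__(self, seq):
        super().__init__(seq)
        # Record every string that passes through this class.
        created.append(self.data)


greeting = LoggedStr("hello") + " world"   # __add__ builds a new LoggedStr
upper = greeting.upper()                   # so does upper()

print(created)
```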
You might be able to do it with fishhook or a similar library, but first consider how much of a nightmare it would be to print something every time a string is created anywhere in the interpreter. For one, you will likely hit the problem that printing something itself requires making a string.
can you say more about "print every declared string"? Tell us the larger problem, and how printing the strings will help.
!e ```py
from fishhook.asm import get_interned_strings_dict
from fishhook import hook, orig
import sys
interned = get_interned_strings_dict()
oldnames = interned.copy()
def audithook(*args):
    for key in interned:
        if key not in oldnames:
            print('[audithook] new string:', key)
            oldnames[key] = key
for method in ['__add__', '__mul__', '__getitem__']:
    @hook(str, name=method)
    def strhook(self, *args, method=method):
        ret = orig(self, *args)
        if type(ret) is str and ret not in oldnames:
            oldnames[ret] = ret
            print(f'[str.{method}] new string', ret)
        return ret
sys.addaudithook(audithook)
eval("'newname'")
```
:white_check_mark: Your 3.12 eval job has completed with return code 0.
[audithook] new string: newname
This doesn't print declared strings normally (not in eval), so it might not work for my case, and ofc I don't quite understand what that hooking technique/library you used is, so I might be wrong.
if strings are declared in the same script then they are generated and stored before the script actually runs
if you want it to work for declared strings then you need to import your code after the hooks have been added
This idea came to me when doing some reverse eng, and for -let's say- obfuscated python code, this would come in handy when debugging it.
audithooks are builtin to python now, they get called by certain C functions as it evaluates code.
Got u!
!d sys.addaudithook
```py
sys.addaudithook(hook)
```
Append the callable *hook* to the list of active auditing hooks for the current (sub)interpreter.
When an auditing event is raised through the [`sys.audit()`](https://docs.python.org/3/library/sys.html#sys.audit) function, each hook will be called in the order it was added with the event name and the tuple of arguments. Native hooks added by [`PySys_AddAuditHook()`](https://docs.python.org/3/c-api/sys.html#c.PySys_AddAuditHook) are called first, followed by hooks added in the current (sub)interpreter. Hooks can then log the event, raise an exception to abort the operation, or terminate the process entirely.
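A minimal sketch of the mechanism in that doc entry, using only the standard library. The event name `example.ping` and the `seen` list are made up; real hooks receive many built-in events too, so this one filters by name:

```python
import sys

seen = []


def hook(event, args):
    # Hooks are called for every audit event; only record our own.
    if event == "example.ping":
        seen.append(args)


sys.addaudithook(hook)

# Raise a custom event; the hook gets the name and a tuple of arguments.
sys.audit("example.ping", 1, "two")

print(seen)
```

Audit hooks cannot be removed once added, so this is best tried in a throwaway interpreter.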
and fishhook is a library i wrote that allows for hooking C class methods (eg: str.__getitem__) and hooking raw C functions on supported platforms
can these too be printed using another way (like the idea that i suggested before)?
You can print all declared strings that made it into the interned strings dictionary by making oldnames initialized to an empty dict
You can also work with the interned strings dict directly using the function exposed by fishhook.asm
can you elaborate -am not that familiar with python internals-?
Python holds references to strings that are considered interned, and will reuse those references
Those references are stored in a dictionary that is normally not accessible
Fishhook.asm has an example that grabs the interned strings
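The interned-strings table itself is not reachable from plain Python, but `sys.intern` shows the reuse being described. A small sketch that builds the strings at runtime so the compiler cannot share them as constants:

```python
import sys

# Two equal strings built separately at runtime.
a = sys.intern("".join(["new", "name"]))
b = sys.intern("".join(["new", "name"]))

# A third one that is never passed to sys.intern.
c = "".join(["new", "name"])

print(a is b)   # interning hands back the same object for equal strings
print(c == a)   # equal in value either way
```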
so are the strings that are declared afterwards in the same script included in that dictionary too?
from fishhook.asm import get_interned_strings_dict
from fishhook import hook, orig
import sys
interned = get_interned_strings_dict()
oldnames = interned.copy()
def audithook(*args):
    for key in interned:
        if key not in oldnames:
            print('[audithook] new string:', key)
            oldnames[key] = key
for method in ['__add__', '__mul__', '__getitem__']:
    @hook(str, name=method)
    def strhook(self, *args, method=method):
        ret = orig(self, *args)
        if type(ret) is str and ret not in oldnames:
            oldnames[ret] = ret
            print(f'[str.{method}] new string', ret)
        return ret
sys.addaudithook(audithook)
"newname" # <---- this one ?
Yes in most cases
@crude anvil can you say more about what you mean by "declared" strings? String literals? Or the computed values of strings at runtime?
in particular, Python doesn't have variable declarations as such.
string literals and computed values of strings at runtime.
I think @pliant tusk answer is comprehensive for now
another possibility if you need it is to observe Python execution with trace functions or sys.monitoring.
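A small sketch of the trace-function option, using `sys.settrace` since it exists on all current versions (`sys.monitoring` is 3.12+ and has a different API). The `greet` function and the `calls` list are made up for the example:

```python
import sys

calls = []


def tracer(frame, event, arg):
    # "call" fires whenever a new Python-level frame starts executing.
    if event == "call":
        calls.append(frame.f_code.co_name)
    return None  # no per-line tracing needed for this sketch


def greet():
    return "hi"


sys.settrace(tracer)
try:
    greet()
finally:
    sys.settrace(None)  # always uninstall the tracer

print(calls)
```

From inside the tracer you also have access to `frame.f_locals`, which is where runtime string values of an obfuscated program would show up.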
I feel like for debugging/reversing at that level you would be better suited with a c level debugger
If it's really so obfuscated
Oh sys.monitoring is a good idea, haven't done much with it yet since it was added
cool, I'll check that out!
it's changing as we speak also: https://github.com/python/cpython/pull/122564
(because after 18 months of telling him, mark finally listened)
Oh that co_branches method for code objects will simplify my bytecode decompiler/recompiler
Since right now I walk the bytecode to find branches
i walk the ast
Makes sense, better than walking the bytecode
i started by looking at bytecode, which is how I realized that bytecode can be very complicated
Yea it's tricky, but my code is intended to run after the functions have been compiled so I don't have ast at that point
So it'll be nice to take advantage of a method that gets the info I need before the source is discarded
thanks for mentioning co_branches, i need to see how to make use of that. maybe I can scrap all the ast code.
woa
that's nice
overall, it's been a lot of work to adapt coverage.py to sys.monitoring, and it's not done yet.
i'm right now running tests on a +621 -1000 branch
Seems like it'll simply it once sys.monitoring is fully implemented
"simplify": yes
Definitely a lot to change tho
but it will be a few years before 3.14 is the minimum python version
the five year EOL is a killer for libraries
if it was really a problem, i could just bump the minimum version. People on 3.8, 3.9 would continue to use the coverage.py version that shipped on them.
yup. it's just a little unfortunate that libraries have to wait so long to get new features
in this case, the benefit is to the user of the library (low-overhead coverage measurement)
Wrong channel. Use #career-advice
sorry
got me thinking about running coverage.py in production😆
they spammed it in other channels too
How to juyprer notebook
In this Python Tutorial, we will be learning how to install, setup, and use Jupyter Notebooks. Jupyter Notebooks have become very popular in the last few years, and for good reason. They allow you to create and share documents that contain live code, equations, visualizations and markdown text. This can all be run from directly in the browser. I...
Thank you
PEP 318 quotes Guido as such:
[…] – with no new syntax, the magicness of a function like this is extremely high:
Using functions with “action-at-a-distance” through sys.settraceback may be okay for an obscure feature […] The widely held view here is that decorators need to be added as a syntactic feature to avoid the problems with the postfix notation used in 2.2 and 2.3. […]
What is “the postfix notation used in 2.2 and 2.3”?
I'm guessing that they mean func = decorator(func)?
also side note: idk what you're reading a PEP from the early 2000s for, but just keep in mind that old PEPs effectively are historical documents and not always representative of how the language works nowadays
According to https://mail.python.org/pipermail/python-dev/2004-September/048518.html
I personally feel that prefix decorators are a huge improvement over the "f = staticmethod(f)" style of decorating.
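For comparison, the two styles side by side, assuming the guess above is right that "postfix notation" means rebinding the name after the function body. The classes `Old` and `New` are just for illustration:

```python
class Old:
    def add(a, b):
        return a + b
    # Pre-2.4 style: the decoration comes after the function body.
    add = staticmethod(add)


class New:
    # PEP 318 style: the decoration is visible before the body.
    @staticmethod
    def add(a, b):
        return a + b


print(Old.add(1, 2), New.add(1, 2))
```

Both spellings produce the same result; the decorator syntax only moves the information to where a reader sees it first.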
I think etrotta might be correct
In other news, it's kinda shocking how much stuff was added in each Python release back then.
Python 2.2 added
- new-style classes
- multiple inheritance
- descriptors
- iterators
- generators (as an experimental feature)
then Python 2.3 added
- generators (stabilized)
- set
- logging, csv
- bool (yeah)
- import hooks
wasn't bool (infamously) added in 2.1.1?
but yeah, new stuff was being added at a much higher rate then
oh I see, they added True/False globals in 2.2.1 but not the bool type
what’s the difference internally between PyObject_Malloc and PyMem_Malloc? the docs don’t specify
remember last week when I argued that the C API docs aren't great and you asked why I thought so? 😛
yes, you win 🙄
I think this is about the three "domains" in https://docs.python.org/3/c-api/memory.html#allocator-domains ?
i saw that, but both object and mem say they “take from the python private heap.” are there two of them?
no, both of them allocate using pymalloc
so they’re just “optimized differently”?
if I'm reading the code right they point to the same underlying allocator function by default
I think the answer is that the only reason both exist is to provide 2 different customization points, in case someone wants to use a different allocator for Python objects than for other stuff
yes, I think you can use https://docs.python.org/3/c-api/memory.html#c.PyMem_SetAllocator to make one or the other behave differently
yeah
in principle, the reason you might want to use a different allocator for objects than for not-objects is that objects tend to be small and very consistently sized, while other stuff can be much larger and have much less predictable sizes
I wonder how significant the overhead is from allocations through a function pointer; Python tends to allocate small objects at a pretty high rate
I guess that's part of why we have freelists for some types
but pymalloc handles that reasonably well all on its own: it checks the size of the allocation up front, and then delegates to malloc (or, I guess to PyMem_RawMalloc) for large allocations
hello
you mean the cost of the indirection itself? I'd guess that's far, far lower than the cost of the allocator - not free, but...
yes, it's a few extra CPU instructions before you get to the actual allocator
my understanding is that the freelists are mostly useful because they avoid calling the allocator, and they avoid needing to run part of the tp_new for some types
it can be cheaper to re-initialize an object retrieved from a freelist than to initialize a chunk of uninitialized memory
i think the docs should clarify that 😄
I agree 🙂
There’s actually a stronger (and super significant) difference on the new free-threaded builds: the “object” domain must be used for all objects… and only for objects: https://docs.python.org/3.13/howto/free-threading-extensions.html#memory-allocation-apis