#internals-and-peps
the easiest option would be to add a breakpoint() call there, and run the process under gdb, and then just hit ctrl-c at the pdb prompt to drop into a gdb prompt
ended up SIGINT'ing it
yes, exactly
with signal
oh - sure, that works
that's exactly what doing breakpoint() and hitting ctrl-c at the prompt would do, too, though
oh i see
but yes, you could os.kill(os.getpid(), signal.SIGINT) to get the same effect
nice i see ok, ty
what for?
is it just me, or is building --with-address-sanitizer extremely slow? or did my build hang heheh
strace reports that it hangs and loops the following at furious speeds:
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x74708f9d4d88} ---
write(2, "AddressSanitizer", 16) = 16
write(2, ":DEADLYSIGNAL\n", 14) = 14
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x618fafc50e90} ---
write(2, "AddressSanitizer", 16) = 16
write(2, ":DEADLYSIGNAL\n", 14) = 14
anyone got suggestions
DEADLYSIGNAL is also slightly intimidating ngl
getting this across multiple python versions
tried multiple docker base images
so, what would be some issues with .pop accepting slice objects?
It would be strange for the return type to change depending on the argument that was passed
That's exactly what happens with indexing tho
Yes, but I think that would be more surprising for a function
Also, how would it handle subclassing? If you subclass list, would o.pop(slice(0, 10)) return an instance of list or of the subclass?
I mean, that's the exact same problem everything else has too, UserList can deal with it.
slicing a subclass of list returns a list
So popping from a subclass of list with a slice should return a list
It seems fine-ish to me from a technical standpoint, but I don't think the symmetry with list indexing really makes sense without the : sugar
I think that, in the world with type annotations, we've moved away from adding functions whose return types can only be described using @overload because they depend on the input types
I'll say, from my experience with Pandas and with libraries that implement container types that return additional but separate container instances on certain operations, that this is a tough problem
As a result Pandas has a whole bunch of stuff you gotta implement so it knows how to repack your data after operations, when you make your own subclass
I also think that list is too crucial a container for this to be supported
it'll add too much overhead in the general case
What's wrong with just [original.pop(idx) for idx in range(10)]?
Popping changes the length of the list, so that won't get the correct items. Also the index may go out of range
ah yeah, great point
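A small sketch of the problem just described, using two hypothetical helpers: each pop() shifts the remaining items left, so the indices from range(n) drift and can run off the end. Popping index 0 repeatedly returns the right items, but shifts the whole tail on every call, so it is quadratic in the worst case.

```python
def pop_first_n_buggy(lst: list, n: int) -> list:
    # Index i refers to the *current* list, which has already shrunk by i items
    return [lst.pop(i) for i in range(n)]


def pop_first_n(lst: list, n: int) -> list:
    # Always pop the front: correct items, but each pop(0) shifts the whole tail
    return [lst.pop(0) for _ in range(n)]


a = list(range(10))
print(pop_first_n_buggy(a, 3), a)  # [0, 2, 4] [1, 3, 5, 6, 7, 8, 9]

b = list(range(10))
print(pop_first_n(b, 3), b)  # [0, 1, 2] [3, 4, 5, 6, 7, 8, 9]

try:
    pop_first_n_buggy(list(range(5)), 5)
except IndexError as exc:
    # The drifting index runs past the end of the shrinking list
    print("IndexError:", exc)
```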
alright, here's a crazier idea:
lst.pop[start:end:step]
👀
Lock this man up
make pop an instance of Pop the way loc is an instance for dataframes?
sth like that ig
Or just allow slice literals in more spots
alr, I can sense this potentially causing issues for backwards compat, soo, to be in line with the slice notation we get when using [], how about sth like .pop_slice[start:end:step]?
yeah, but that's a far more radical change
why would it be backwards incompatible?
the new class just needs to implement __call__ and __getitem__
I can't think of any particularly obscure reasons right now, but like...
And __get__ to be able to bind it
why a property?
and the property handles that, instead of making Pop a descriptor
User-defined functions are all descriptors
Because then __get__ just always returns itself, whether accessed from the class or an instance, but realistically you'd probably want a different Pop instance on each attribute access
wait, why?
That doesn't make sense
well, that's breaking backwards compat
like x = [1,2,3]; print(id(x.pop) != id(x.pop)) would return True is the correct way to do it
imo
I think this can just be solved using composition really: you pass the list to the Pop object in __init__, and then plain attribute access via list.pop returns that composite Pop
Yeah, I'm saying every time you access [1,2,3].pop it instantiates a new Pop, passing it the list instance
wouldn't the ids of 2 different instances be different?
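A minimal sketch of the composition design discussed above, under stated assumptions: since the built-in list can't be changed from pure Python, it uses a hypothetical list subclass PopList whose pop property returns a fresh, hypothetical _SlicePop object on every access. That object implements __call__ for the usual pop() and __getitem__ for pop[start:stop:step]. On the id question: comparing id() of two temporaries can give equal values in CPython, because the first object may be freed before the second is created at the same address, so `a is not b` with both objects alive is the reliable check.

```python
class _SlicePop:
    """Hypothetical object returned by PopList.pop: callable like pop(), indexable like pop[a:b]."""

    def __init__(self, owner: list):
        self._owner = owner  # composition: remember which list to pop from

    def __call__(self, index: int = -1):
        # A plain call keeps the ordinary list.pop behaviour
        return list.pop(self._owner, index)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            return self(key)  # pop[i] behaves like pop(i)
        popped = self._owner[key]  # slicing a list subclass returns a plain list
        del self._owner[key]
        return popped


class PopList(list):
    """Hypothetical list subclass; the real list type can't be patched like this from Python."""

    @property
    def pop(self) -> _SlicePop:
        # A fresh helper on every access, much like bound methods are created per access
        return _SlicePop(self)


x = PopList([1, 2, 3, 4, 5])
print(x.pop[0:2], x)  # [1, 2] [3, 4, 5]
print(x.pop(), x)     # 5 [3, 4]
```

Returning a plain list from the slice form mirrors what slicing a list subclass already does.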
!e You can already do this: ```py
s = slice(1, 3)
l1 = [1, 2, 3, 4]
l2 = l1[s]
del l1[s]
print(f"{l1=} {l2=}")
@raven ridge :white_check_mark: Your 3.12 eval job has completed with return code 0.
l1=[1, 4] l2=[2, 3]
but that's not an expression
Having .pop() accept slices doesn't seem to add much, since you can always just slice to get the items into a new list and then del them from the original list
Sure, if you need an expression you can define a function that does this
def pop_slice(lst, slc):
    ret = lst[slc]
    del lst[slc]
    return ret
that adds extra overhead... no, but like is there any harm in adding this new shiny feature?
It's more stuff in a language already fairly full of stuff
It's quite hard to justify adding something just to avoid writing two lines
I mean, yeah, more code to maintain... is it gonna add thaaat much more code though? (well, yes, because writing a Python class in C is not quite as simple as in Python)
alright, ig my dreams are crushed here 
pandas/core/indexing.py line 304
def loc(self) -> _LocIndexer:```
I'd think most discussions here are just for fun
Not really
is multiprocessing.connection doing something about creating new processes? why is it in multiprocessing?
for example here https://stackoverflow.com/a/63547025/964478 the server listening for the client messages does not create any processes, right?
It's used if you need more elaborate IPC when doing multiprocessing. It wraps the OS-specific IPC mechanism.
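A minimal sketch of multiprocessing.connection on its own, to show that Listener and Client don't create any processes: they wrap a socket (or named pipe) and exchange pickled objects. A thread stands in for the other side only to keep the example self-contained; the 127.0.0.1 address, port 0 (let the OS choose a free port) and the authkey value are arbitrary choices for this sketch.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"example-only"  # shared secret; any bytes value, chosen for this sketch


def client(address, replies: list) -> None:
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(["ping", 1, 2, 3])  # any picklable object
        replies.append(conn.recv())


def main() -> list:
    replies: list = []
    # Port 0 lets the OS pick a free port; listener.address reports the real one
    with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
        t = threading.Thread(target=client, args=(listener.address, replies))
        t.start()
        with listener.accept() as conn:
            conn.send(("pong", conn.recv()))
        t.join()
    return replies


print(main())  # [('pong', ['ping', 1, 2, 3])]
```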
!e ```py
from fishhook import *
@hook(type(list().pop))
def __getitem__(self, slc):
    if not (isinstance(self.__self__, list) and self.__name__ == 'pop'):
        return orig(self, slc)
    S = self.__self__[slc]
    del self.__self__[slc]
    return S
x = [1,2,3,4,5]
p = x.pop[0:2]
print(p, x)```
@pliant tusk :white_check_mark: Your 3.12 eval job has completed with return code 0.
[1, 2] [3, 4, 5]
if I just wanted to make it possible in my codebase in some way, I'd probably do the reasonable thing and subclass list or use a function; I just wanted it to be part of standard Python
print ("hello world")
please adhere to the channel topics
also please use a proper IDE or code editor
wdym
Discussion on the use cases, implementation and future of the Python programming language including PEPs, advanced language concepts, new releases, the standard library, and the overall design of the language.
this is the topic of this channel.
If you have a Python question, see #❓|how-to-get-help
or just ask
will logging have pep8-compliant aliases
no
Have a discussion link on that? I’ve often wondered why it doesn’t
history, and avoiding two names for the same thing
there have been many discussions on this
Yes, I’m sure there have been, which is why I asked for a link to the discussion instead of asking to rehash. I wasn’t sure if there’s a canonical pep or something similar that attempted this.
I use the logging module a lot, and it's really annoying to remember all the attributes and the right casing for each. Some are camelCase like funcName, some are a single lowercase word like levelname, and some are snake_case like stack_info. I don't really mind which case wins, but I'd prefer it to be consistent.
There have been more, it feels like this comes up every 3 months or so
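To make the naming inconsistency concrete, a small sketch: format-string attributes such as levelname and funcName, the snake_case stack_info keyword, and camelCase methods such as getLogger and setLevel all show up in one short function. The logger name "naming-demo" is arbitrary.

```python
import io
import logging


def log_once() -> str:
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # LogRecord attributes used in format strings: levelname (one word), funcName (camelCase)
    handler.setFormatter(logging.Formatter("%(levelname)s %(funcName)s %(message)s"))
    logger = logging.getLogger("naming-demo")  # camelCase module-level function
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # ...while this keyword argument (and the matching record attribute) is snake_case
    logger.info("hello", stack_info=False)
    logger.removeHandler(handler)
    return stream.getvalue()


print(log_once(), end="")  # INFO log_once hello
```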
where are the python standard libraries located?
If you're using CPython, here: https://github.com/python/cpython in Lib/ and Modules/
thanks im just using the typical python installed by the installer from the website
that's CPython yeah
ok thanks
I'm trying to make my own portable version of Python
I don't have a Modules folder
does that mean they are just in the Lib folder?
it should be site-packages for third party
Some libraries are native modules (written in C), some are Python, some are hybrid
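A rough way to see which kind a given module is, as a sketch: built-in modules are listed in sys.builtin_module_names, pure-Python modules have a .py (or .pyc) __file__, and extension modules point at a shared library. Results for modules such as math differ between builds and platforms, and "hybrid" modules like json are Python code that optionally uses a C accelerator (_json).

```python
import importlib
import sys


def describe_module(name: str) -> str:
    """Rough classification of where a module's code lives."""
    if name in sys.builtin_module_names:
        # Compiled directly into the interpreter; there is no file on disk
        return "built into the interpreter binary"
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None) or ""
    if path.endswith((".py", ".pyc")):
        return f"pure Python: {path}"
    # e.g. a .so / .pyd shared library (on POSIX builds usually under lib-dynload)
    return f"compiled extension: {path}"


for name in ("sys", "json", "math"):
    print(name, "->", describe_module(name))
```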
right. welp pygame seems to work fine which is all i needed really. it is for distributing my games
nice, how are you going to distribute it?
probably GitHub. I'll make a "packager" using something like PySimpleGUI; you then just specify which packages you need. I haven't started working on my game yet, well, not much of it. I want a solid way to distribute it before making it
Prior art worth considering if you're not in the business of reinventing the wheel: PyInstaller, cx_Freeze, auto-py-to-exe, py2exe
the problem with PyInstaller is that I didn't really like how it packaged projects. I made a small game once before, but it refused to launch because it couldn't find the assets folder; I had to move the folder from where it had been placed automatically
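A common cause of the missing-assets problem is resolving paths relative to the current working directory. A minimal sketch, assuming the assets are bundled with PyInstaller's --add-data option (or the datas list in a .spec file): PyInstaller sets sys._MEIPASS inside a bundled app, so paths can be built from that, falling back to the script's own folder when running from source. The asset name is hypothetical.

```python
import sys
from pathlib import Path


def resource_path(relative: str) -> Path:
    """Locate a bundled data file both from source and from a PyInstaller build."""
    # PyInstaller sets sys._MEIPASS to the directory holding the bundled files;
    # when running from source it is absent, so fall back to this file's folder
    base = getattr(sys, "_MEIPASS", None)
    if base is None:
        base = Path(__file__).resolve().parent
    return Path(base) / relative


print(resource_path("assets/player.png"))  # "assets/player.png" is a hypothetical asset
```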
Lib/ntpath.py line 225
# join(head, tail) == p holds.```
or is it just a generalized assumption?
does anyone know what the "platform independent libraries" are?
for context ```
Could not find platform independent libraries <prefix>
when running my portable version of python
i think i might be missing some folder from the usual python install
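For context on that warning: sys.prefix is documented as the place where the platform-independent Python files (the pure-Python standard library) are installed, and sys.exec_prefix as the place for platform-dependent ones. CPython looks for the stdlib at startup using a landmark file such as os.py, and the PYTHONHOME environment variable can point a relocated copy at the right prefix. A small diagnostic sketch to run with a working interpreter from the same layout, to compare where it expects those files (stdlib_report is a hypothetical name):

```python
import os
import sys
import sysconfig


def stdlib_report() -> dict:
    """Where this interpreter thinks its pure-Python ('platform independent') stdlib lives."""
    stdlib = sysconfig.get_path("stdlib")
    return {
        "sys.prefix": sys.prefix,            # platform-independent files (pure-Python stdlib)
        "sys.exec_prefix": sys.exec_prefix,  # platform-dependent files (compiled extensions)
        "stdlib dir": stdlib,
        "os.py present": os.path.exists(os.path.join(stdlib, "os.py")),
    }


for key, value in stdlib_report().items():
    print(f"{key}: {value}")
```

On Windows, the official embeddable package is built for relocatable use and may be worth comparing against.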
Not sure how relevant this channel even is for questions about hypothetical Python features
what about a .= augmented . operator?
instead of sth like
string = string.replace("this", "that")
you could do
string .= replace("this", "that")
basically just syntactical sugar
like, no __igetattr__ special method or anything like that
I think it makes reading harder because "replace" lookup scope is not clear
yeah, this looks very weird
what about this?
string = $.replace("this", "that")
``` `$` is sugar for the lhs of an assignment
creating implicit vars is always bad since it may break lots of code
what would happen in this example? x[print('hi')][print('world')] .= a. If this is equivalent to x[print('hi')][print('world')] = x[print('hi')][print('world')].a, then hi and world would each be printed twice
i dont think this is a good idea
this might be a better alternative:
x[print('hi')][print('world')] .= a
###
lhs = x[print('hi')]
#     ^^^^^^^^^^^^^^ side-effect here
key = print('world')
lhs[key] = lhs[key].a
#^^^^^^^ this executes __setitem__
#          ^^^^^^^^ this executes __getitem__
x.a.b .= c
###
lhs = x.a
#     ^^^ this might contain a side-effect, but it is executed exactly once
lhs.b = lhs.b.c
#^^^^ __setattr__ - if there is a side-effect, this is fine
#       ^^^^^ __getattr__ - if there is a side-effect, this is fine too
#            ^^ __getattr__ - desired side-effect
Yes, I say it's a bad idea to use the .= or the alias $
alr, what if the compiler simply does code duplication from lhs to rhs and just converts it to a regular assignment, so during interpretation, it will print twice
yeah, $ suffers from the same problems as .= because it is not clear what code actually will be executed
and the fact that x in x=y and y=x behaves differently makes it even worse
obj.compute_very_expensive_stuff().a .= b
# ->
obj.compute_very_expensive_stuff().a = obj.compute_very_expensive_stuff().a.b
``` not good
idk, I think it's fine
you'd "save" the output of the expensive function in a temp variable anyway, so it would make sense to also do it when using .=
or it would evaluate only once always...
hmmm
obj.compute_very_expensive_stuff().a .= b - this looks like expensive stuff happens only once, but actually it happens twice, and second one is hidden behind the syntax
Python is usually pretty explicit and only does what you ask for, and I don't think .= syntax fits well into the language
mmm, maybe, I'd still love to have it, at least so it can be used for simple enough stuff... and evaluate only once
that just reduces typing a little bit, I think it's already well solved by autocomplete tools of IDEs
I don't think it would be evaluated twice unless you expand
Below I use the + op as example:
class Value:
    def __init__(self, value: int):
        self.value = value
class Foo:
    def compute_something_expensive(self) -> Value:
        print("Computing something expensive...")
        return Value(0)
foo = Foo()
foo.compute_something_expensive().value += 1
print("Done!")
Computing something expensive...
Done!
now expanding like you did
foo = Foo()
foo.compute_something_expensive().value = foo.compute_something_expensive().value + 1
print("Done!")
Computing something expensive...
Computing something expensive...
Done!
!e ```py
class Value:
    def __init__(self, a):
        self.a = a
class OtherValue:
    def __init__(self, b):
        self.b = b
class Foo:
    def compute_something_expensive(self):
        print("Computing something expensive...")
        return Value(OtherValue(0))
foo = Foo()
temp.a = (temp := foo.compute_something_expensive()).a.b
print("Done!")
@faint river :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | Computing something expensive...
002 | Done!
as illustrated above it only happens twice when expanded
otherwise augmented assignments wouldn't have a use other than shortening code
and we know that just shortening code isn't a great argument for adding things
...which seems to be the entire argument for .= so far
idk if it was covered above, but .= is significantly different than other augmented operators. In s .= replace("old", "new"), how is replace evaluated?
hmm yes that was mentioned
it is just as different as . is to other operators
it is markedly different (other operators evaluate their arguments before the actual operation)
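To make the difference concrete, a sketch with a hypothetical helper dot_assign that emulates `target.attr .= method(*args)` with evaluate-once semantics. With the other augmented operators the right-hand side is an ordinary expression evaluated first; here the method name is looked up on the current value of the attribute, not in any scope. It only covers attribute targets; for a bare name like string .= replace(...) the equivalent is simply string = string.replace(...).

```python
def dot_assign(target, attr: str, method: str, *args, **kwargs):
    """Hypothetical emulation of `target.attr .= method(*args, **kwargs)`.

    `target` is an ordinary argument, so the caller's expression for it runs once.
    """
    current = getattr(target, attr)                     # read target.attr once
    result = getattr(current, method)(*args, **kwargs)  # `method` is looked up on the value
    setattr(target, attr, result)                       # write target.attr once
    return result


class Box:
    def __init__(self, text: str):
        self.text = text


calls = []
box = Box("this and this")


def expensive() -> Box:
    calls.append("called")
    return box


dot_assign(expensive(), "text", "replace", "this", "that")
print(box.text, len(calls))  # that and that 1
```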
why is ... not an alias for None but its own object?
because it is meant to be used as a value, in slicing/indexing syntax
its usage in typestubs came after
like lst[...]?
I recommend exploring how it is used in NumPy
so is the binary operator @
the matmul operator was added in 3.5 specifically because of NumPy. No builtins use it.
is the ellipsis motivated by numpy
or does numpy just conveniently use it
not sure on that one
oh yeah, ... is used in Callable for type hinting, when the argument types don't matter
but idk since when that has existed
imagine adding a feature just because of a third party library
can’t just use object?
no, because ... in Callable args represents any amount as well
it was motivated by typing i think
is there a PEP(s) for Ellipsis?
actually no it got added in with slices
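A sketch (no NumPy needed) of what the interpreter passes for `...` in subscripts, using a hypothetical Recorder class whose __getitem__ just returns its key, plus the Callable[..., T] usage mentioned above. It also shows one practical reason `...` can't simply be None: it is a separate, truthy singleton.

```python
from collections.abc import Callable


class Recorder:
    """Hypothetical container that returns whatever key it was indexed with."""

    def __getitem__(self, key):
        return key


r = Recorder()
print(r[...])       # Ellipsis
print(r[1:2, ...])  # (slice(1, 2, None), Ellipsis)

# `...` is a distinct, truthy singleton, unlike the falsy None
print(... is Ellipsis, bool(...), bool(None))  # True True False


def call_with(func: Callable[..., int], *args) -> int:
    # In a Callable annotation, `...` means "any argument list"
    return func(*args)


print(call_with(max, 3, 7))  # 7
```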
i c
how to flatten this two-dimensional set? f = {{1,4},{2,5},{8,10},{6,7}}
Trick question: sets can't contain sets
But assuming the inner sets are frozen, set().union(*f) should do it (the unbound set.union(*f) raises TypeError when its first argument is a frozenset).
Still getting the error TypeError: unhashable type: 'set'
Sets can only contain hashable items. Since sets are mutable (they can be changed), they aren't hashable.
If you make the inner sets a frozenset, it will work.
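A sketch with frozensets as the inner elements. Calling the unbound set.union with a frozenset as its first argument raises TypeError, so union is called on an empty set instead; itertools.chain is an alternative.

```python
from itertools import chain

f = {frozenset({1, 4}), frozenset({2, 5}), frozenset({8, 10}), frozenset({6, 7})}

# Call union on a real (empty) set: the unbound set.union needs a set, not a frozenset, first
flat = set().union(*f)
print(sorted(flat))  # [1, 2, 4, 5, 6, 7, 8, 10]

# Equivalent without union
flat2 = set(chain.from_iterable(f))
print(flat == flat2)  # True
```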
Just had microsoft interview, they gave the above set and asked to merge intervals
my approach for intervals was right but the typeerror flopped my interview
@quick snow any idea how to flatten it if we face this situation again?
this is the answer
also, it is not good channel for that kind of questions
use #python-discussion #algos-and-data-structs #1035199133436354600 instead
You cannot flatten it if it cannot exist
Were you allowed to use third-party libs? pandas.Interval could handle that ezpz
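For the interval-merging part, a standard-library-only sketch, assuming each two-element set {a, b} encodes the closed interval [min, max] (sets are unordered, so the endpoints are recovered with min and max). The inputs are kept in a list, which is fine since only a set of sets is disallowed. Intervals are sorted by start, and each one is merged into the previous result when it overlaps.

```python
def merge_intervals(intervals):
    """Merge overlapping closed intervals given as (start, end) pairs."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):  # sort by start (then end)
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend it if needed
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


raw = [{1, 4}, {2, 5}, {8, 10}, {6, 7}]     # a list of plain sets is allowed
intervals = [(min(s), max(s)) for s in raw]  # recover the endpoints from each unordered set
print(merge_intervals(intervals))  # [(1, 5), (6, 7), (8, 10)]
```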
Has anyone been successful in building Python 3.13a05 with the experimental jit enabled? I tried on Ubuntu 22.04LTS and it died looking for clang-16.
(apologies in advance for cross-posting, but this got lost in #python-discussion)
Is it considered part of the language spec that, for some non-immortal object named x, where x is the only reference to the object: del x will cause the object to be deleted before proceeding to the next statement?
from what I understand, that is the behavior in cpython, but I'm not sure if it's specified per se.
I don't think it is part of the language spec, and even in CPython, while this is probably true for 99% of typical code and typical classes, there can be corner cases (if __del__ is implemented and must be run, for instance). For the 99%, I expect CPython's reference counting kicks in and does what you describe. And it is very easy to overlook the creation of new references to an object (was it used as an argument to a function wrapped with lru_cache? There is another reference.). So that might be a more relevant question to ask yourself - is the refcount to x really just 1?
It's not guaranteed by the spec
Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.
no, it is not specified
and it might not be true in other implementation
but it was like that in cpython for ages, so there is a lot of code out there that relies on that
If you are looking for some kind of guaranteed cleanup, that is what context managers are for.
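A minimal sketch of that, with a hypothetical Resource class: contextlib.closing calls close() when the with block exits, so cleanup happens at a predictable point on any implementation, whether or not it uses reference counting.

```python
from contextlib import closing


class Resource:
    """Hypothetical resource that records when it is opened and closed."""

    def __init__(self, log: list):
        self.log = log
        self.log.append("open")

    def close(self) -> None:
        self.log.append("close")


def use_resource() -> list:
    log = []
    with closing(Resource(log)):
        log.append("use")
    # close() has already run here, regardless of how the implementation collects garbage
    return log


print(use_resource())  # ['open', 'use', 'close']
```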
My question is answered ✅ 
this is possible iff implementation does RC (ref counting), which is not the case for most of them
pypy doesn't do RC, iirc
micropython doesn't do RC
jython, brython probably too
(and here the docs mean any kind of garbage collection, even the one happening without the garbage collector in CPython)
now people can focus on this question instead
(didn't want it to get buried)
Installation instructions here: https://github.com/python/cpython/blob/main/Tools/jit/README.md#installing-llvm
Let me know if you have any issues.
now that PEP 668 is widely implemented, what's your approach for quick scripting? I used to use Python to run HTTP requests with the requests package, but now it seems I have to create a venv for at most 5 lines of code. What's the expected approach?
you can make a utility venv that you use for little things, and just install everything into it, the way you used to install things into your system python
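A minimal sketch of that utility venv using the standard venv module; scratch_python is a hypothetical helper and the location is up to you. Create it once, install into it with `<interpreter> -m pip install requests`, and run quick scripts with that same interpreter.

```python
import sys
import venv
from pathlib import Path


def scratch_python(env_dir: Path, with_pip: bool = True) -> Path:
    """Create the utility venv on first use and return the path of its interpreter."""
    if not (env_dir / "pyvenv.cfg").exists():
        venv.create(env_dir, with_pip=with_pip)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

For example, scratch_python(Path.home() / ".venvs" / "scratch") (an arbitrary location) returns the interpreter to use for both the pip install and the quick scripts.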
Here is what I want to say about Python: I think it greatly helps people make programs and take a good run at programming.
I feel skeptical about selling, and being a programmer is further out of reach every day for most, but some keep pushing the envelope, and that's brilliant.
I am all for open source software and for helping people with issues in daily life; with the amount of stuff we have to do on our own now, we can still do better, and I love that.
That's my opinion and I could be wrong, but I just want to say it because I love people who do good things, especially when I see myself reflected in their projects, though that's just a small bit of the projects I am unable to create. Sometimes I just want to get people excited about an idea, and that's where I feel the folks at home bring joy unbeknownst to me, or something like that
love liam
Looks like it built successfully, thanks! My Verilog parser runs nominally faster, but nothing earth-shaking.
Yeah, it’s not much faster yet. Right now we’re just making sure it works and shaking out some of the issues with building, memory use, debugging, etc.
Let me know if you want a performance suite - pyparsing does a decent job, but parse performance has always been a weak spot.
I’d love to see that. We’ve got a C parser (Python implementation) in our benchmark suite, but always looking for more real-world code.
What sort of a speedup are you seeing vs. 3.12?
Or vs 3.13 (no JIT)
I'm actually just getting dressed for an event, so I need to break off else I'll be late. I'll post some numbers later this evening.
Is there a way to detect the presence of the JIT, via the sys or platform modules?
Nope, sorry.
Do you mean like “is it on”, or “am I literally in jitted code right now”?
"Am I running Python with the JIT enabled?"
If the former, the JIT is always enabled if you built with the flag.
But no way to tell between different builds.
Yes. But in my tests, I run many Python builds and versions.
So it would be nice if in the test output I could log if JIT'ing or not (like I currently log the Python version, and the version of the package under test).
Yeah, sorry. I forget if the configure options or compiler flags end up in sysconfig somewhere…
"Am I running a Python built with the enable JIT flag?" might be a better wording of my question.
This looks promising:
>>> pprint(sysconfig.get_config_vars()["PY_CORE_CFLAGS"])
('-fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -D_Py_JIT '
'-fno-semantic-interposition -flto -fuse-linker-plugin -ffat-lto-objects '
'-flto-partition=none -g -std=c11 -Wextra -Wno-unused-parameter '
'-Wno-missing-field-initializers -Wstrict-prototypes '
'-Werror=implicit-function-declaration -fvisibility=hidden -fprofile-use '
'-fprofile-correction -I./Include/internal -I./Include/internal/mimalloc -I. '
'-I./Include -DPy_BUILD_CORE')
I can search for -D_Py_JIT. The leading _ implies this might change in the future, but for now it will help me tell whether I'm running the Py3.13 build with the JIT enabled or not.
You may be able to find the configure options and search for --enable-experimental-jit too. Of course, I’m not sure if either of these work with Windows…
Yes, there is a CONFIG_ARGS item in that same dict, and it has the configure tag.
Right now I'm only testing on Ubuntu, and I doubt I'll do any of this in my Windows environment.
Thanks for the pointer, I've never used the sysconfig module before.
As for my tests, they may not be very good benchmarks - I'm not really seeing any significant difference across 3.12, 3.13, and 3.13-JIT, for both my littletable and pyparsing test suites.
>>> '--enable-experimental-jit' in shlex.split(sysconfig.get_config_vars()["CONFIG_ARGS"])
True
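Combining the two checks above into a reusable helper, as a sketch: both config vars are build-time details that may be missing (for example on Windows) or change between releases, so treat the result as a heuristic about how the interpreter was built, not about whether JIT code is currently running. Handling an `=value` form of the option is a defensive assumption, and the helper names are hypothetical.

```python
import shlex
import sysconfig


def configure_jit_option(config_args: str):
    """Return the value given to --enable-experimental-jit in CONFIG_ARGS, or None if absent."""
    for arg in shlex.split(config_args):
        if arg == "--enable-experimental-jit":
            return "yes"
        if arg.startswith("--enable-experimental-jit="):
            return arg.split("=", 1)[1]
    return None


def jit_build_info() -> dict:
    """Heuristic: was this interpreter *built* with the experimental JIT?"""
    # Build-time config vars; may be None on some platforms
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    cflags = sysconfig.get_config_var("PY_CORE_CFLAGS") or ""
    option = configure_jit_option(config_args)
    return {
        "configure option": option,
        "configured with JIT": option not in (None, "no"),
        "-D_Py_JIT in cflags": "-D_Py_JIT" in cflags.split(),
    }


print(jit_build_info())
```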
Nope, no third-party libraries. They asked me to code without them.
!rule 6 4 @pulsar island don't advertise on this server
4. Use English to the best of your ability. Be polite if someone speaks English imperfectly.
6. Do not post unapproved advertising.
See above, this is not an ad board.
sorry for that
hello !
is it possible to detect type aliases like ```py
typed_asyncio_queue = asyncio.Queue[tuple[str, str]]
Not using the AST walker: Syntactically this is no different from any other indexing operation. You can check if the value is an instance of types.GenericAlias though.
in order to do that I would have to actually execute the code though
huh, mypy's stubgen annotates that as ```
typed_asyncio_queue: Incomplete
Yes, static type checking is fundamentally impossible to do correctly and completely in Python.
it's interesting how the mypy static checker grabs it correctly though
(type alias) typed_asyncio_queue: type[Queue[tuple[str, str]]]
yet its stubgen doesn't write it correctly
I'm guessing it knows that asyncio.Queue is generic-able somehow; perhaps this is hardcoded?
mypy/typeshed/stdlib/asyncio/queues.pyi line 22
class Queue(Generic[_T], _LoopBoundMixin): # noqa: Y059```
nice, I'm writing a stub generator using only the AST
but i guess not everything can be parsed correctly with it.
regex might work for this actually
This is somewhat misleading in context. None of the static type checkers require evaluating user code (that would make them not static type checkers). It just requires following the spec, and looking at stubs for modules that use those (such as the standard library)
There are non-expressible concepts in Python's type system, but that's a separate discussion
parsing and traversing the AST, while resolving types is sufficient
I'm not implying they evaluate user code. They do make a few assumptions though which makes them either incorrect or incomplete for edge cases. (E.g. I could have re-assigned asyncio.Queue in user code in a way that type checkers don't notice.)
You told a user it can't be done by walking the AST, and led with "yes, ..." when someone said that would mean they needed to execute code.
Can't be done by walking the AST: Purely on the AST level, you can't distinguish a type alias like this from any other item access.
My "Yes" was in the context of doing isinstance(the_thing, types.GenericAlias).
And they would need to execute code in order to be always correct with the assessment.
so maybe i should be regexing assignments and then doing the isinstance?
No. Static analysis should not require a runtime isinstance.
you can do this as pointed out here #internals-and-peps message
so how would you try to parse that type alias?
You should make the same assumptions as mypy does, trust in the stubs etc. and accept that you're not 100% correct and complete.
you're as correct as the type system supports if you do that, and it is constantly improving. When static analysis tools have a full view of an application and there are no reachable code paths that use untypable constructs, then short of pathological things like intentionally replacing something with a mismatched type using a debugger at runtime (something static analysis can't predict), you have good information.
#bot-commands message
out of something like this?
Yes, you'd then use the ast to resolve types.
but then how do i differentiate it from a normal assignment?
If it's a module-level assignment of a type to an identifier, it can be treated as a type alias. At runtime, this is also just normal aliasing. Prior to 3.10/3.12, this was the only way type aliases like this could be created
in 3.12+, there's the dedicated type statement syntax
There's also the ability to explicitly declare this by annotating it as a TypeAlias. (3.10+ or typing_extensions)
you can read more about it in the specification here. https://typing.readthedocs.io/en/latest/spec/aliases.html
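A minimal AST-only sketch of the three cases above (no code execution), with hypothetical helper names: the type statement (ast.TypeAlias, 3.12+) and an explicit TypeAlias annotation are unambiguous, while a bare `X = Something[...]` assignment is only a candidate. As the output shows, `first = items[0]` is flagged too, which is why confirming an implicit alias needs type resolution against stubs, the way mypy does it.

```python
import ast


def _is_typealias_annotation(node: ast.expr) -> bool:
    # Matches `TypeAlias` as well as `typing.TypeAlias` / `typing_extensions.TypeAlias`
    return (isinstance(node, ast.Name) and node.id == "TypeAlias") or (
        isinstance(node, ast.Attribute) and node.attr == "TypeAlias"
    )


def find_type_aliases(source: str) -> dict[str, str]:
    """Classify module-level type-alias candidates purely from the AST."""
    aliases: dict[str, str] = {}
    for node in ast.parse(source).body:  # module level only
        if hasattr(ast, "TypeAlias") and isinstance(node, ast.TypeAlias):
            aliases[node.name.id] = "type statement"  # `type X = ...` (3.12+)
        elif (
            isinstance(node, ast.AnnAssign)
            and isinstance(node.target, ast.Name)
            and _is_typealias_annotation(node.annotation)
        ):
            aliases[node.target.id] = "explicit TypeAlias"  # `X: TypeAlias = ...`
        elif (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Subscript)
        ):
            # `X = Something[...]`: only a candidate until the subscripted name is resolved
            aliases[node.targets[0].id] = "implicit (heuristic)"
    return aliases


SOURCE = """
import asyncio
from typing import TypeAlias
typed_asyncio_queue = asyncio.Queue[tuple[str, str]]
Pair: TypeAlias = tuple[int, int]
first = items[0]
"""
print(find_type_aliases(SOURCE))
# {'typed_asyncio_queue': 'implicit (heuristic)', 'Pair': 'explicit TypeAlias', 'first': 'implicit (heuristic)'}
```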
thank you, i'll play around with it
Pyanalyze has reasonably approachable code demonstrating how to walk the AST while handling type information: https://github.com/quora/pyanalyze/blob/master/pyanalyze/ast_annotator.py If you're looking for examples, but the specification should be your guide as an implementor (There's also a conformance test suite)
currently i believe i'm making a lot of wrong assumptions, but the output is technically correct
i was initially just going to use mypy's stubgen, but it's very coupled with mypy itself
What's the best channel to talk about package managers? Packaging and distribution seems more from the perspective of someone who wants to distribute a package than someone who wants to setup an environment
possibly the PyPA server might be better: https://discord.gg/qUf7SHNM
#tools-and-devops also fits (and this question better fits #community-meta)
uh, I mean, that Discord has a total of 625 people 🙂
I'll check out tools and devops, thanks
and some of those literally work on the package managers themselves
I know, but my main purpose is trying to figure out the problems that people actually have, usage errors, a lot of which comes from people who are less experienced and so on. Some of it may also just be perception.
I will probably try asking there because "why not", but my point is that someone there could easily have some extremely technical issue with micromamba, that is totally valid, but not really representative at all of the reason why it's not more widely used.
Really I just want to understand what exactly ordinary python devs mean when they complain about the state of python package management.
I feel like I'm in many convos/interactions where it's taken as a given that python's package management situation is bad, and I want to understand why people feel that way, exactly.
my main purpose is trying to figure out the problems that people actually have, usage errors
the very same problems maintainers get barraged with via issues, dms, emails etc?
I would think package maintainers would be acutely aware of the issues most end users have with their tools
well, anyhow, I will ask there, and see what shakes out, if anything 🙂
since we're in this already do you have any thoughts?
I wonder if there's a way to find the import names of all pip-installed modules (programmatically)
importlib should be able to find all importable modules
i want to actually import them basically
basically, a pip install name isn't necessarily the name it's imported with
my end goal is to basically pip freeze and then write their importable names to a main.py script if that makes sense
you want the importlib.metadata probably then
You can do something like this to get a list of importable top-level modules provided from a distribution that's been pip-installed:
import importlib.metadata
import inspect
def get_pkgs_associated_with_requirement(req_name: str) -> list[str]:
    dist = importlib.metadata.distribution(req_name)
    toplevel_txt_contents = dist.read_text("top_level.txt")
    if toplevel_txt_contents is None:
        if dist.files is None:
            raise RuntimeError(f"Can't find the packages associated with requirement {req_name!r}")
        maybe_modules = [f.parts[0] if len(f.parts) > 1 else inspect.getmodulename(f) for f in dist.files]
        return [name for name in maybe_modules if name is not None and "." not in name]
    else:
        return toplevel_txt_contents.split()
See https://docs.python.org/3/library/importlib.metadata.html#mapping-import-to-distribution-packages for going the other way
thank you, i think i can use this along with pkg_resources
!e ```py
import importlib.metadata
import inspect
import pkg_resources
def get_pkgs_associated_with_requirement(req_name: str) -> list[str]:
    dist = importlib.metadata.distribution(req_name)
    toplevel_txt_contents = dist.read_text("top_level.txt")
    if toplevel_txt_contents is None:
        if dist.files is None:
            raise RuntimeError(f"Can't find the packages associated with requirement {req_name!r}")
        maybe_modules = [f.parts[0] if len(f.parts) > 1 else inspect.getmodulename(f) for f in dist.files]
        return [name for name in maybe_modules if name is not None and "." not in name]
    else:
        return toplevel_txt_contents.split()
installed = pkg_resources.working_set
results = set()
for pkg in installed:
    results.update(get_pkgs_associated_with_requirement(pkg.project_name))
print(results)```
@jade raven :x: Your 3.12 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 3, in <module>
003 | import pkg_resources
004 | ModuleNotFoundError: No module named 'pkg_resources'
ah
It's best to avoid pkg_resources where possible; it's deprecated and has very bad performance characteristics
It should be possible in nearly all cases to do what you want with importlib.metadata, but it's sometimes annoyingly hard to find out what the equivalent idiom is...
results = set()
for installed_package in importlib.metadata.distributions():
    results.update(get_pkgs_associated_with_requirement(installed_package.metadata["Name"]))
print(results)``` learned something new, thanks!
i don't quite like how it includes pycache however
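One way to drop the __pycache__ entries (and other non-module names such as *.dist-info folders or `..` for scripts installed outside site-packages) is to keep only candidates that are valid identifiers and explicitly exclude __pycache__, as in this variation on the function above. import_names is a hypothetical name; it takes a Distribution object directly, which avoids looking each one up again by name.

```python
import importlib.metadata
import inspect


def import_names(dist: importlib.metadata.Distribution) -> set[str]:
    """Top-level import names provided by an installed distribution (best effort)."""
    declared = dist.read_text("top_level.txt")  # only some build backends write this
    if declared:
        candidates = declared.split()
    else:
        candidates = [
            f.parts[0] if len(f.parts) > 1 else inspect.getmodulename(f)
            for f in dist.files or []
        ]
    # Drop bytecode caches, *.dist-info dirs, '..' and files that aren't modules (None)
    return {
        name
        for name in candidates
        if name and name.isidentifier() and name != "__pycache__"
    }


all_names = set()
for dist in importlib.metadata.distributions():
    all_names |= import_names(dist)
print(sorted(all_names))
```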
I’m quite interested in a comprehensive Python language test suite, similar to ruby/spec (not from the viewpoint of BDD). The number of Python compilers/interpreters is growing pretty quickly, and almost all of them have some quirks and differences from the reference implementation (CPython). And the only language reference is the CPython source code itself, not even the documentation. Not sure if this is the best channel for such a discussion, but maybe you folks know an existing suite or can point me to a community that would be interested in creating a new one?
The language reference is the documentation, not the source code. The CPython test suite is meant to apply to all implementations, and CPython-specific tests are meant to be marked explicitly
Definitely take a look at the https://github.com/python/cpython/tree/main/Lib/test directory and see if it suits your purpose. If not, maybe you can help adjust it.
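CPython's own suite marks implementation-specific tests with the test.support.cpython_only decorator. Since the test package isn't shipped with every installation, this sketch defines an equivalent with unittest.skipUnless; the test class and its contents are hypothetical. It can be run with python -m unittest.

```python
import sys
import unittest
import weakref

# Same idea as CPython's test.support.cpython_only, redefined here because
# the `test` package is not available in every Python installation
cpython_only = unittest.skipUnless(
    sys.implementation.name == "cpython", "tests a CPython implementation detail"
)


class DeletionTests(unittest.TestCase):
    def test_del_slice(self):
        # Language-level behaviour: should hold on every implementation
        lst = [1, 2, 3, 4]
        del lst[1:3]
        self.assertEqual(lst, [1, 4])

    @cpython_only
    def test_object_freed_immediately(self):
        # Implementation detail: reference counting frees the object at `del`
        class Obj:
            pass

        obj = Obj()
        ref = weakref.ref(obj)
        del obj
        self.assertIsNone(ref())
```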
How do i compare the first letter of two or more words and print the one that is first alphabetically ?
wrong channel, help questions go in #python-discussion or look at #❓|how-to-get-help
although just asking for the answer isn't really a helpful question to your learning
oh thanks
hmm, then how should I?
provide what code you have written to try to solve the problem, and also include what issue you ran into. Then people can help you understand why the issue happens.
Well, the issue is that I haven't thought of any way to do it. That's the only reason I came here to look for help
ok, continue in a different channel
try to solve part of the problem if you don't know how to solve the whole problem
hmm I will try I guess
would it be too much to ask for an MWE? i haven't been able to make it work
<@&831776746206265384>
!pban 1036695883217117225 14d We are not a chain mail server. Don't post useless spam here.
:incoming_envelope: :ok_hand: applied ban to @merry field until <t:1713005560:f> (14 days).
hello how can i setup flask on pycharm 2023 community edition
i downloaded the library and used the template code but it gave me a lot of errors
Please see #❓|how-to-get-help
This channel is about modifying Python itself
ok thanks
sorry I'm newbie lol
🤨
We are not going to help you with malware here, full stop.
I'm not sure how packaging malware into an exe relates to security research.
sir this is a wendys
unless it's packaging to then reverse it I suppose.
Am i good guy? : False
well that clears it up
we aren't going to condone this sort of vigilantism here
also this is the wrong channel
Unrelated to what this person is trying to do, yes, there is no shortage of valid reasons why turning malware into an exe relates to security research.
you here to troll?
no just a super sigma guy working on malware 🔥
Wasn't there a PEP about arbitrarily tagged strings, something like s[sql]"SELECT * FROM table"? Or did I dream that?
!pep 501
Thanks, that's the one I meant, but it's not doing what I want.
well... it is deferred 🙂
what did you want?
A string tagged with some arbitrary other string (e.g. a language tag). Such that editors can syntax-highlight my SQL string inside my Python code correctly.
I think copying Markdown syntax would be great:
conn.execute(
```sql
DROP TABLE students;
```
)
I think PyCharm had something like that where you put some comment on top and it highlights it?.. though only in Professional
If it's just for editors, I think picking a comment style like # syntax: sql might be better than adding a new syntax element
I think it would be useful if these strings had this metadata available at runtime, but yeah, this would already be an improvement.

# language=html for example
Hi, I'm not sure if I'm supposed to ask this here or in python-help, I have a question regarding formatting long key-value pairs in dictionaries according to PEP 8:
PEP 8 lays out guidelines on how to indent code as well as where to put the closing bracket in a construct, but how does this all apply to dictionaries?
Which one of the following (if any) would be correct?
```python
glossary = {
    "Operating System": ("An operating system (OS) is system software "
                         "that manages computer hardware and software "
                         "resources, and provides common services for "
                         "computer programs."),
}
```
```python
glossary = {
    "Operating System": ("An operating system (OS) is system software "
                         "that manages computer hardware and software "
                         "resources, and provides common services for "
                         "computer programs."
                         ),
}
```
Thanks in advance for any help ❤️
I'd suggest using Black to format your code and not worry about things like this anymore
It formats this as: ```py
glossary = {
    "Operating System": (
        "An operating system (OS) is system software "
        "that manages computer hardware and software "
        "resources, and provides common services for "
        "computer programs."
    ),
}
```
This data seems like it might belong in a file, or something
sounds like he's put it in a file ending in .py
that is true 🙂
I meant something like a separate data file that is formatted in a way that makes sense for mostly-text data
Okay. Thanks for the response. I agree, that output makes sense in terms of readability.
I've heard of black and other formatting solutions, but right now I'm just starting learning Python and I'm intentionally just working through a CLI with stuff like Vim so that I completely understand what I'm doing and if something doesn't run or I have a formatting issue I'm the only source for a problem, so I can hopefully learn.
I agree. I'm working through some questions in a textbook and it hasn't introduced reading from files yet, so I'm just doing it like this for now.
well, if you've learned about pip and virtualenvs, you can install Black with pip install black, and format a file with black myfile.py (or :!black % in vim)
even if you don't want to run Black, my formatting recommendation is still just "do what Black does". You can do that manually by imitating its style, it's just more work 🙂
I know of pip, but not virtualenvs. I'll take a look at it. Thanks for pointing me in the right direction.
I agree. I think getting Black set up for myself is a better way long-term to see examples of properly-formatted code.
there's only one gotcha with formatting with black/ruff, which is that it really does a horrible job with multiple context managers
as of python 3.10, you can insert parens around all of the context managers, and then it formats it nicely
but for whatever reason, black/ruff will not do this for you
so it's kind of a trick you have to know. doesn't come up often though
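A sketch of the parenthesized form described above (Python 3.10+). `contextlib.nullcontext` stands in for real context managers such as `open()`; `open_both` is a hypothetical name.

```python
from contextlib import nullcontext


def open_both():
    # One context manager per line inside parentheses, trailing comma allowed.
    with (
        nullcontext("config") as first,
        nullcontext("data") as second,
    ):
        return first, second
```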
Okay, thanks for letting me know. I'll make sure to look out for that.
!e ```py
import dis
dis.dis('a and b and c')```
is the optimizer meant to optimize out multiple returns like this?
@pliant tusk :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | 0 0 RESUME 0
002 |
003 | 1 2 LOAD_NAME 0 (a)
004 | 4 COPY 1
005 | 6 POP_JUMP_IF_FALSE 8 (to 24)
006 | 8 POP_TOP
007 | 10 LOAD_NAME 1 (b)
008 | 12 COPY 1
009 | 14 POP_JUMP_IF_FALSE 3 (to 22)
010 | 16 POP_TOP
011 | 18 LOAD_NAME 2 (c)
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/ZXIKAA5XMEIVSHJD4LRPFRVGHM
the last 3 instructions are all RETURN_VALUE
optimizer optimizes for code speed (pretty poorly), not for code size
code deduplication is not a big concern, but requires some work to be done in the compiler
i need help in RASA chat-bot api
ah makes sense
Code duplication is also occasionally helpful for the adaptive interpreter, since it allows the same bit of code to adapt in different ways (not that that's useful with RETURN_VALUE specifically).
```py
def add(x, y):
    if isinstance(x, int):
        return x + y
    else:
        return x + y
```
the if statement is just useless: both branches do the same thing, so it doesn't matter whether it goes into the if or the else
well yes that's what code duplication is
python does that in bytecode
Even with duplicate code like this, the compiler can't get rid of one because of tracing/profiling support. If you enable a debugger, coverage etc it needs to ensure correct line numbers are produced. If you disassemble a try: block, you'll see a NOP that had to be added for the same reason.
This sort of thing could theoretically be useful in Python 3.11+ with
!pep 659
Because the specializer can turn the first x + y into a specialized instruction for adding ints
In this case, it's unlikely to be a net positive because the cost of the isinstance() will dwarf the gain of specializing for int addition, but I can imagine this improving performance if you perform lots of additions in a function
seems like it ```pycon
>>> def add(x, y):
...     if isinstance(x, int):
...         return x + y
...     else:
...         return x + y
...
>>> for _ in range(1_000): add(5, 4) and None
...
>>> for _ in range(1_000): add("a", "b") and None
...
>>> from dis import dis
>>> dis(add, adaptive=True)
  1           0 RESUME                   0
  2           2 LOAD_GLOBAL_BUILTIN      1 (NULL + isinstance)
             12 LOAD_FAST                0 (x)
             14 LOAD_GLOBAL_BUILTIN      2 (int)
             24 CALL_NO_KW_ISINSTANCE    2
             32 POP_JUMP_IF_FALSE        5 (to 44)
  3          34 LOAD_FAST__LOAD_FAST     0 (x)
             36 LOAD_FAST                1 (y)
             38 BINARY_OP_ADD_INT        0 (+)
             42 RETURN_VALUE
  5     >>   44 LOAD_FAST__LOAD_FAST     0 (x)
             46 LOAD_FAST                1 (y)
             48 BINARY_OP_ADD_UNICODE    0 (+)
             52 RETURN_VALUE
```
Doesn't seem to be related to the channel topic.
I wonder if there's any tool(s) that can provide CPython-level stack trace. For example, l = [1,2,3] in Python may call PyList_New() in CPython, and I'd like to see PyList_New() in the stack trace reported by the tracer tool(s).
I found Py3.12 has support for Linux perf (https://docs.python.org/3/howto/perf_profiling.html), which basically gives what I wanted. But when I tried it on Ubuntu 22.04 with Python3.12.2, it didn't always work - symbols were hex numbers instead of human-readable strings. Not sure if this functionality isn't fully supported yet.
Thanks in advance.
How did you install Python 3.12 on Ubuntu 22.04? The deadsnakes PPA? Or pyenv?
You should get human readable strings if you have debug symbols available. You may be able to get that by installing a python3.12-dbg package, or possibly by setting DEBUGINFOD_URLS=https://debuginfod.elfutils.org/ as an environment variable
Lots of profilers can do this - off the top of my head, Austin, fil-profiler, py-spy, Scalene, and Memray can all do this. Disclaimer: I maintain Memray. I also maintain pystack, which is a non-profiler tool that can get you the interpreter's stack, including C frames, from a core file or a running process. But all of these need debug symbols to be available
Okay, I used deadsnakes PPA but didn't install debug symbols. With debug symbols, perf tool worked. Thx.
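Besides `-X perf` or `PYTHONPERFSUPPORT=1`, Python 3.12 can switch the perf trampoline on from inside the program with `sys.activate_stack_trampoline()`. A minimal sketch, assuming Linux and a build with perf support; `workload` and `run_with_perf_support` are hypothetical names, and the interpreter still needs debug symbols for readable C frames.

```python
import sys


def workload() -> int:
    # Some pure-Python work for perf to sample.
    return sum(i * i for i in range(100_000))


def run_with_perf_support(func):
    """Run func with the perf trampoline active when this interpreter supports it.

    Then record from a shell with something like:
        perf record -F 9999 -g -o perf.data python my_script.py
    """
    activate = getattr(sys, "activate_stack_trampoline", None)  # added in 3.12
    if activate is None:
        return func()
    try:
        activate("perf")
    except Exception:
        # Trampoline not available on this platform/build: run without it.
        return func()
    try:
        return func()
    finally:
        sys.deactivate_stack_trampoline()
```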
I found pystack pretty interesting. Just curious, how's that implemented? Maybe not the details, but the big ideas behind it.
I think python-level tracing is pretty clear, with those builtin profiling modules. But the C-level tracing seems hard
It can't use those - remember, it works on core files and arbitrary scripts 🙂
Much like gdb. A remote process's memory is read using either ptrace or process_vm_readv, a core file's memory is read by interpreting the ELF LOAD segments it contains. From there, the tool uses version specific implementation details of the CPython interpreter to determine the Python stack, GIL state, local variables, etc, and uses elfutils (libelf and libdw) to unwind and symbolize C stacks
And then uses more CPython implementation details to stitch the C stack and Python stack together, at least for modern CPython versions
I see. Lots of things within a few sentences
I'll explore it more if I get a chance. Thanks for the help!
The ELI5 version is that it works much like a C debugger like gdb or lldb does, with a bunch of CPython specific black magic sprinkled in
Hello. I was wondering if there's any reason why heapq functions couldn't be applied to array.array objects? Currently these functions require the heap to be represented as a list, although arrays implement essentially the same interface as lists so it should just work as far as I can tell. Someone was asking in #algos-and-data-structs.
i guess there is a hardcoded if not isinstance(arg, list): raise TypeError somewhere inside
iirc, heapq is implemented in python
you can try removing the check, and if it still works you can open issue/PR
It looks like it's just the C implementation of the module that only supports lists. If you block the import of the C parts, so it uses the python implementations of those methods, it appears to work fine with arrays.
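One way to reproduce that experiment: hide the `_heapq` accelerator while importing a fresh copy of `heapq`, so the pure-Python functions are used. This is a sketch that pokes at `sys.modules` (fine for experimenting, not something to ship); `import_pure_python_heapq` is a hypothetical name.

```python
import array
import importlib
import sys


def import_pure_python_heapq():
    """Return a fresh copy of heapq that cannot use the _heapq C accelerator."""
    saved_heapq = sys.modules.pop("heapq", None)
    saved_c = sys.modules.get("_heapq")
    sys.modules["_heapq"] = None  # a None entry makes "import _heapq" fail
    try:
        return importlib.import_module("heapq")
    finally:
        # Put the module cache back the way we found it.
        if saved_c is None:
            del sys.modules["_heapq"]
        else:
            sys.modules["_heapq"] = saved_c
        if saved_heapq is None:
            sys.modules.pop("heapq", None)
        else:
            sys.modules["heapq"] = saved_heapq


py_heapq = import_pure_python_heapq()
heap = array.array("i", [5, 3, 8, 1])
py_heapq.heapify(heap)      # the pure-Python version only needs indexing, append and pop
py_heapq.heappush(heap, 0)
smallest = py_heapq.heappop(heap)
```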
should heapq have its own heap class
probably, yeah.
having a different interface to do operations on heaps compared to other collections is awkward
!d queue.PriorityQueue exists but I've never seen people actually use it in place of heapq
class queue.PriorityQueue(maxsize=0)```
Constructor for a priority queue. *maxsize* is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If *maxsize* is less than or equal to zero, the queue size is infinite.
The lowest valued entries are retrieved first (the lowest valued entry is the one that would be returned by `min(entries)`). A typical pattern for entries is a tuple in the form: `(priority_number, data)`.
If the *data* elements are not comparable, the data can be wrapped in a class that ignores the data item and only compares the priority number:
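The doc embed is cut off before its example; the pattern it describes looks roughly like this, with a dataclass that orders only on the priority field. `PrioritizedItem` follows the docs' naming, and the task payloads are made up.

```python
from dataclasses import dataclass, field
from queue import PriorityQueue
from typing import Any


@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any = field(compare=False)  # excluded from ordering comparisons


q = PriorityQueue()
q.put(PrioritizedItem(2, {"task": "write docs"}))  # dicts themselves are not orderable
q.put(PrioritizedItem(1, {"task": "fix bug"}))
first = q.get()  # lowest priority number comes out first
```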
As you can see on the https://peps.python.org/pep-3124/ page the PEP is in a "deferred" state, so it wasn't implemented
Given that it was written 17 years ago, it's more or less dead
"generic functions" as described in that PEP are available as functools.singledispatch
typing.overload is only for static analysis purposes though
I'd probably just make two different functions
Or you could have a single function that accepts an integer or a date as an argument
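For the "integer or a date" case, `functools.singledispatch` picks the implementation from the type of the first argument. A minimal sketch; `describe` and its return strings are hypothetical.

```python
import datetime
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback for types without a registered implementation.
    raise TypeError(f"unsupported type: {type(value).__name__}")


@describe.register
def _(value: int):
    return f"day offset {value}"


@describe.register
def _(value: datetime.date):  # also used for datetime.datetime, a subclass of date
    return f"date {value.isoformat()}"
```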
!pep 744
```pycon
>>> b''.splitlines(str)
[]
>>> b''.splitlines(print)
[]
>>> b''.splitlines(...)
[]
```
bug?
it is not supposed to take these arguments, it should have raised TypeError for too many arguments
!d str.splitlines
str.splitlines(keepends=False)```
Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless *keepends* is given and true.
This method splits on the following line boundaries. In particular, the boundaries are a superset of [universal newlines](https://docs.python.org/3/glossary.html#term-universal-newlines)...
I guess it converts everything to a boolean
not great but probably not something we can change
that makes sense
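To see the truth-testing behaviour on non-empty input: on a CPython where the calls above succeed, any truthy object keeps the line endings and any falsy one drops them. Older versions that only accepted integers raise `TypeError` for the non-int cases instead. `lines` is a hypothetical wrapper.

```python
data = b"first\nsecond"


def lines(keepends):
    """Hypothetical wrapper: split `data`, passing keepends straight through."""
    return data.splitlines(keepends)

# lines(0)   -> [b'first', b'second']     (falsy: line endings dropped)
# lines(str) -> [b'first\n', b'second']   (truthy: line endings kept)
# lines(...) -> [b'first\n', b'second']
```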
Looks like it might have been one of the many functions changed in https://github.com/python/cpython/pull/15609. The discussion in that PR thread is kinda interesting
later discussed in https://discuss.python.org/t/boolean-arguments/21662/14
I’m not sure if that it’s a good idea. The int type has a __bool__() method. Does it mean that 0 and 1 are considered as boolean? Other examples: >>> (3.14).__bool__() # float True >>> (5j).__bool__() # complex True The numpy.bool_ does not inherit from Python built-in bool type. While I’m not surprised by flag=1 instead of flag=True, usin...
I was just experimenting with memory leaks in python; I wanted to learn how they work in python.
So I made a memory leak program: two functions that leverage two different methods to leak memory
Closure:
```py
def leak_memory():
    leak = []
    def inner():
        while True:
            leak.append("Bug")
    inner()
    return leak
leak_memory()
```
Global variables
```py
def leak_memory():
    global leak
    leak = []
    while True:
        leak.append("Bug")
leak_memory()
```
And I got some interesting results that I want to learn about.
First thing is that both of these methods take different times to fill up the memory, and from my testing, closure method is a little more than twice as fast as the global variable method.
For context on the next point: both of these methods use a negligible amount of CPU.
The second thing, which I'm more interested in, is that when the Linux kernel intervenes after the memory is full, the CPU usage peaks.
With the closure method the CPU usage maxes out at 100% and the system freezes for roughly 2 seconds.
But with the global variable method, the CPU usage caps out at about 84%, and the crash is almost instant.
So can someone explain this behaviour to me please? (if you are replying please ping me! and also thank you ♥️ )
What version are you running this on?
closure method is a little more than twice as fast as the global variable method.
That's likely because accessing a closure variable is faster than a global variable
accessing a global variables requires finding an entry by key in the globals() dictionary, while the location of a closure variable (also known as "cell variable") is more static, just like with local variables
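The difference is visible in the bytecode: the global version reads `leak` with `LOAD_GLOBAL` (a lookup in the globals/builtins dicts, modulo caching), the closure version with `LOAD_DEREF` (a cell read). A small sketch; the function names are hypothetical, and on 3.11+ the specializing interpreter makes `LOAD_GLOBAL` cheaper, so the gap there may be smaller than on 3.10.

```python
import dis

leak = []  # module-level list used by the "global" variant


def global_append():
    leak.append("Bug")  # `leak` resolved through the module's globals


def make_closure_append():
    cell_leak = []

    def inner():
        cell_leak.append("Bug")  # `cell_leak` read from a closure cell
    return inner


def load_opnames(func):
    """Return the set of LOAD_* opcode names in func's bytecode."""
    return {ins.opname for ins in dis.get_instructions(func) if ins.opname.startswith("LOAD_")}
```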
then why do they take so little CPU?
both of these methods take negligible amount of CPU usage
I dunno, takes up 100% of a CPU core for me as expected
3.10
while running, it doesn't take up nearly anything
It is when the kernel intervenes that CPU usage peaks
and can you explain the CPU usage quirk?
as I said, both eat 100% of a CPU core for me while they're running
while running?
Well that's strange
okay so pardon me, my initial observation was wrong. It does take up some CPU usage, 8.3% to be exact. Both methods take up exactly 8.3% CPU usage.
But throughout
There is no going above 8.3% ever
So my guess is that the system, i.e. the kernel, is using the rest of the CPU
does your system have 12 cores?
!e
print(1/12)
@grave jolt :white_check_mark: Your 3.12 eval job has completed with return code 0.
0.08333333333333333
Sorry it doesn't have 12 cores, but 12 threads
the CPU usage is for the entire CPU, not per thread
so 8.3% means 100% of a single thread
this wouldn't be called a memory leak, though. it's just using up all the memory
yes I know that, but I didn't quite get how you got to the next part
What OS are you on and how are you viewing the CPU consumption?
yeah pardon me for that
Debian, using system monitor
alright never mind, got it, I just understood the 1/12
If you use top, it will show you the "CPU usage" (in the %Cpu line), that's the whole CPU, all the 12 hyperthreads. But every process line lists CPU usage such that 100% is a single core
so a particularly hungry multithreaded program (like ffmpeg) will show as almost 1200% in the %CPU column
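The arithmetic behind the 8.3%: top's per-process column counts one logical CPU as 100%, while the summary line and some GUI monitors (evidently including the one used here) count the whole machine as 100%. A sketch of the conversion; the helper names are hypothetical.

```python
import os


def to_whole_machine_percent(per_process_percent: float, logical_cpus: int) -> float:
    """top's per-process %CPU (100% = one logical CPU) as a share of the whole machine."""
    return per_process_percent / logical_cpus


def to_per_process_percent(whole_machine_percent: float, logical_cpus: int) -> float:
    """The inverse: a whole-machine share expressed in top's per-process units."""
    return whole_machine_percent * logical_cpus


# Logical CPUs means hardware threads, e.g. 12 on the machine discussed above.
cpus = os.cpu_count() or 1
```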
Alright thanks for explaining that
But I don't understand why the system uses max CPU to terminate the program
like all threads
maybe use a profiling tool to figure out what's happening
maybe GC gets heavily triggered around the time it gets killed or something
I am sorry but how do I do that? Do you mean profiling the system or just my program?
profile the program with a C-level profiler like perf
sure
okay thank you
No, the interpreter hasn't lost track of any memory. It still knows about all of those objects, and is prepared to free them as soon as you re-enable the GC
OS always keeps track of any memory, and is prepared to free it as soon as you kill the process
so memory leaks are impossible 😉
OS is the best GC
not totally true - you can leak memory by leaking certain types of resources without closing them, like shared memory and named semaphores. The OS can automatically free "regular" memory allocations when your process dies, but not all memory allocations are scoped to your process's lifetime.
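A sketch of the shared-memory case with `multiprocessing.shared_memory`: `close()` only detaches the current process, and on POSIX the named segment stays around until `unlink()` is called. (multiprocessing's resource tracker may clean up segments it created when the interpreter exits, but segments created outside it get no such help.) `write_and_release` is a hypothetical name.

```python
from multiprocessing import shared_memory


def write_and_release(payload: bytes) -> bytes:
    """Hypothetical helper: create a named block, use it, then release it properly."""
    block = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        block.buf[: len(payload)] = payload
        return bytes(block.buf[: len(payload)])
    finally:
        block.close()   # unmaps the block from this process only
        block.unlink()  # removes the named segment itself
```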
Oops
:incoming_envelope: :ok_hand: applied timeout to @hazy patio until <t:1713128668:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).
The <@&831776746206265384> have been alerted for review.
Hi, could you please help me with the error it's giving me
#❓|how-to-get-help please
please check out the channel description and make sure you stick to the topics
ok
you don't really have a photo server
or screenshot server
not server
mb i meant channel
no, but you can send images in any of the offtopic channels.
hi
hi! i've got a tricky case of accessing internal attribute dictionary when self.__dict__ was overridden, how can I access it?
#1230932562499862578 message
one easy method is to use the default pickling methods (3.11+)
```py
class MyClass:
    def __init__(self) -> None:
        super(MyClass, self).__setattr__("attribute", False)

    __dict__ = {}

obj = MyClass()
print(obj.__dict__)  # {}
print(obj.__getstate__())  # {'attribute': False}
```
!e
print(1/12)
@craggy heath :white_check_mark: Your 3.12 eval job has completed with return code 0.
0.08333333333333333
!e
print(1/1112)
@craggy heath :white_check_mark: Your 3.12 eval job has completed with return code 0.
0.0008992805755395684
#bot-commands
:incoming_envelope: :ok_hand: applied timeout to @normal cliff until <t:1713708211:f> (10 minutes) (reason: burst spam - sent 8 messages).
The <@&831776746206265384> have been alerted for review.
Anyone know why async_generator.athrow().close() and async_generator.asend().close() are No-ops? I think they should throw a GeneratorExit into the async generator and raise RuntimeError if the underlying generator doesn't close
i guess x=async_generator.athrow(); x.close() closes x, not async_generator
But then the AsyncGenerator is broken
Because it's ag_running_async=1
Consider calling await anext(async_generator)
And then someone closing the current Coroutine
You'd expect a GeneratorExit thrown into the async generator
I think
Or worse someone throwing GeneratorExit into the coro
python/cpython#118130 wow
python sure is unquestionable about its loyalty to monty python
It doesn't seem to break as far as I can tell, it should just leave the asyncgen at the last yield it got to before being closed for further asend/athrow instances to continue.
can someone help me with an assignment I'm struggling on
Hello, please read #❓|how-to-get-help
I'm trying to do it but I'm stuck
?
and the help channel doesn't have what I need, so I'm kind of at my wits' end right now
How can I get a list of direct referents from the garbage collector? The gc module only lets you get all referents. I'm willing to write C to get only the direct ones.
Is inspect.getmembers() what I want?
sounds like an interesting problem. what are you working on?
get_referents() does what you want: https://docs.python.org/3/library/gc.html#gc.get_referents
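A quick check that `gc.get_referents()` only goes one level deep: the list nested inside `outer` is reported, but its contents are not. The order of the result follows the object's internal traversal and shouldn't be relied on.

```python
import gc

inner = ["nested"]
outer = [inner, "sibling"]

direct = gc.get_referents(outer)          # what `outer` itself points at
one_level_down = gc.get_referents(inner)  # what `inner` points at
```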
I just figured that out.
I'm trying to write a GUI for inspecting references.
@spark magnet
To do what I originally thought gc.get_referents() did, it's just:
```py
import gc

def find_all_reachable_objects(*objects) -> list:
    seen = set()
    stack = list(objects)
    reachable = []
    while stack:
        obj = stack.pop()
        if id(obj) not in seen:
            seen.add(id(obj))
            reachable.append(obj)
            stack.extend(gc.get_referents(obj))
    return reachable
```
sounds like a cool project. It could get into some tricky edge cases.
I am experiencing that :)
There are objects for which calling getattr() attempts to import cffi.
Functions also all hold a reference to globals()
So there's really no way to contain the scope without nailing every edge case, but it's unclear if nailing every edge case is even a good idea.
yes, and your code will have references to things.
The correct approach, I think, is bottom up rather than top down.
But then you have the issue of figuring out how to cut off every reference in your own tracing code. Still probably the better approach though.
If it's in the middle of awaits it breaks the async gen
Have you seen pyobjgraph?
I haven't. Just tried it out. Seems like exactly what I'm looking for.
Can anyone please suggest a data analysis course that is both rigorous and in-depth? I saw a few courses from Harvard, but they are not in depth and don't fully explain the ins and outs of the concepts
There's some discussion in PEP 584 (https://peps.python.org/pep-0584/#specification). The analogy is with list's + which also only accepts other lists
Hmm that sort of kicks the can doesn't it
Why did list do it then
Associativity reasons maybe? I'm not sure
Both lists and tuples support + and it would be strange if [1, 2] + (3, 4) == [1, 2, 3, 4] but (1, 2) + [3, 4] == (1, 2, 3, 4)
that particular example can be fixed by making the two types know about each other and coordinate on what the type of the returned object should be, but that doesn't scale to supporting arbitrary sequences
!e for an even more fun example: ```py
l = []
l += {1: 2}
print(l)
l = l + {3: 4}```
@raven ridge :x: Your 3.12 eval job has completed with return code 1.
001 | [1]
002 | Traceback (most recent call last):
003 | File "/home/main.py", line 4, in <module>
004 | l = l + {3: 4}
005 | ~~^~~~~~~~
006 | TypeError: can only concatenate list (not "dict") to list
would you want [] + {3: 4} to be [3]?
What about {3: 4} + []?
I think commutativity, rather than associativity, is probably the reason. It's weird for a + b to differ from b + a
Fair enough, thanks for the explanation, I do think that's reasonable
"a" + "b" is not equal to "b" + "a"
sure, true. + is used for both sequence concatenation and addition in Python. but at least the result of both of those is the same type
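PEP 584's dict operators follow the same split as list's `+` and `+=`: the in-place form accepts anything `dict.update()` accepts, while the binary form insists on another dict (Python 3.9+).

```python
d = {1: 2}

# |= works like dict.update(), so any iterable of key/value pairs is accepted
d |= [(3, 4)]

# | wants a dict on both sides, mirroring list's + (which wants two lists)
try:
    merged = d | [(5, 6)]
except TypeError:
    merged = None  # unsupported operand type(s) for |: 'dict' and 'list'
```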
you don't have the ability to do that
yeah ik
but i wrote @ everyone
and then
i got a dm
saying you shouldnt do that
and gave me a warning
ok so why are you talking about this in this channel?
bro i just picked a random channel to talk about it
next time, pick a more general channel instead of a topical channel
good idea
oh wait
so
i have a question
are you gonna ask a question about #internals-and-peps ?
there is no variable channel
so yes
is variable_name same as "variable name"
or
there aren't topical channels for the bare basics of the language
I'd be happy to continue in #python-discussion
I'm conducting this event interested register this
!rule ads
This is a Python server
!pep 667 was accepted, nice. No more PyFrame_LocalsToFast
Hey Everyone!
I just recently updated my windows.
Yesterday, I clicked "update and shutdown" option.
Today when I turned on my laptop, my data was lost as I reset my windows. My lock screen had the same wallpaper. Then I restarted my laptop and my Lock screen didn't have the same wallpaper.
ask at #ot1-perplexing-regexing
damn... #mailing-lists message
anyone have him added or friend w him?
This is off-topic for this channel. If you have some concern regarding this user, you can contact modmail.
Is there an explanation anywhere of why comprehensions are parsed and compiled as their own functions?
!pep 709
Excellent, thank ya ❤️
Though this mostly covers an implementation detail that changed in 3.12. The reason they use separate functions generally is so that they can have their own scopes. Carl's innovation in 3.12 allows the user-visible scoping behavior to stay the same while still preserving the separate scopes.
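Both halves of that can be checked directly: before 3.12 the compiler emits a nested `<listcomp>` code object, from 3.12 on the comprehension is inlined, and in both cases the loop variable stays out of the enclosing scope. A small sketch; `nested_code_names` is a hypothetical helper.

```python
import sys
import types


def nested_code_names(source: str) -> list:
    """Names of code objects nested directly inside the compiled `source`."""
    module_code = compile(source, "<example>", "exec")
    return [c.co_name for c in module_code.co_consts if isinstance(c, types.CodeType)]


nested = nested_code_names("squares = [n * n for n in range(5)]")
# Before 3.12: ['<listcomp>'] (a separate function); 3.12+: [] (inlined, PEP 709)

n = "outer"
squares = [n * n for n in range(5)]
# Either way the comprehension's `n` is its own variable, so n is still "outer"
```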
This was exactly the implementation detail that I was wondering about. It seems incredibly wasteful to create a frame for such a thing.
Has there ever been discussion of a str.isempty method that returns True for empty strings and space-only strings? It's trivial to get with not s.strip(), but isn't very self-documenting.
that sounds like you're describing str.isspace()
which of course is also not a very self-documenting name
that returns False for an empty string
I don't recall any discussion about this
good point
though I think that isspace is as self-documenting as isupper, islower, etc.
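If you wanted the proposed behaviour today, the `isspace()` gap for empty strings can be papered over with a hypothetical helper like this; as far as I know it agrees with `not s.strip()` because both use the same definition of whitespace.

```python
def is_blank(s: str) -> bool:
    """Hypothetical helper: True for "" and for strings made only of whitespace."""
    # "".isspace() is False, so the empty string needs its own check.
    return not s or s.isspace()
```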
half of sys module is in snake_case, half is in nosepcase
!traceback
that_isinconsistent
still better than php tho
oh man you won't like german
We've started adding underscores to env vars in 3.13: https://devguide.python.org/developer-workflow/stdlib/#adding-a-new-environment-variable
I think we should prefer using them for new functions too. But it's less clear when existing ones don't.
For example, 3.13 is adding os.path.isreserved() which isconsistent with the other os.path.is* like os.path.isfile
Should the new one set the future trend as is_reserved?
isconsistent
yep, seems like
Could you instead open a help thread?
Read #❓|how-to-get-help and everyone can help you in your own thread
or at least post images in #bot-commands or an ot channel
i just wanna get code how to add photos there
You can add photos in your own help thread. This channel is not meant for this
i did
Hi Guys, I am new to api development in python. could anyone please suggest some good resources where i can learn the concepts of api development with python
step 1. go to the correct channel
step 2. check pins
step 3. run !resources in #bot-commands
Does anyone know how to make an ESP for FiveM
i believe this is a typo, should be return plain_pager
https://github.com/python/cpython/blob/main/Lib/_pyrepl/pager.py#L26
Lib/_pyrepl/pager.py line 26
return plainpager```
send a PR 🙂
Any landing page idea
Probably off topic for this channel
Ok
You might like to open a help channel with #❓|how-to-get-help
!cban 1176127939088232498 shady advertisements
:incoming_envelope: :ok_hand: applied ban to @burnt flicker permanently.
given a frame frame, would you agree that frame.f_locals is frame.f_globals is an ultimate way of determining whether the frame refers to module-level code, not one inside a function for example? do you have an idea for a better way of examining that?
```py
def inside_f() -> None:
    print(locals() is globals())  # False

print(locals() is globals())  # True
inside_f()
```
EDIT: hm, actually, I want to also include cases like ```py
class Foo:
    print(locals() is globals())  # False (obvsly)
```
since they fire on module execution, so nvm
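A sketch of the `f_locals is f_globals` check wrapped as a helper. It's true for module bodies and for `exec()` given only a globals dict, and false for function and class bodies; `sys._getframe` is CPython-specific, and the helper names are hypothetical.

```python
import sys
from types import FrameType


def is_module_level(frame: FrameType) -> bool:
    """True when the frame's locals and globals are the same dict."""
    return frame.f_locals is frame.f_globals


def caller_is_module_level() -> bool:
    # Frame 1 is whoever called this function.
    return is_module_level(sys._getframe(1))


def inside_function() -> bool:
    return caller_is_module_level()


ns = {"check": caller_is_module_level}
exec("at_module = check()", ns)                      # module-style code
exec("class C:\n    in_class = check()", ns)         # class body
```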
also, two questions regarding the current type system
- unpacking unions of tuples
don't you think it should be possible to unpack same-size unions of tuples?
currently, it's not possible to do something like
```py
# sorry, I'm writing for 3.8+ here since OSS forces me to... sort-of
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # OptExcInfo is a union of tuple[None, None, None] and tuple[type[BaseException], BaseException, types.TracebackType]
    from _typeshed import OptExcInfo
    from typing_extensions import Unpack

class CM:
    def __exit__(self, *args: Unpack[OptExcInfo]) -> None:
        pass
```
- unpacking typevars
I found a use case for unpacking typevars when I want to annotate conditional signatures.
like here: https://github.com/bswck/configzen/blob/3c605308c544709b4a719012d728dcf47433201c/configzen/data.py#L93-L97 (yeah, I know "mypy" isn't the one to blame)
configzen/data.py lines 93 to 97
# Unpack[DataFormatOptionsType] cannot be used here,
# because this functionality is not supported by mypy yet.
# Override the **options annotation in your subclass of DataFormat with
# the subclass of DataFormatOptions corresponding to your subclass of DataFormat.
def configure(self, **options: Unpack[DataFormatOptions]) -> None:```
@feral island do you think it's worth a pep? "Extending typing.Unpack capabilities"?
the use cases are I think quite niche, but on a more advanced level would be very cool and I don't think they're very hard to implement...
and actually if there's a more useful thing that needs someone to pick up and contribute a solution, let me know...
Yes, I think this would need a PEP. The TypeVar part I have seen other requests for, so that may be useful for more people.
There's a lot, mypy in particular has a ton of features it hasn't implemented support for yet (e.g., LiteralString, @deprecated, PEP 695 syntax)
oh, great! I'll pick up some then. thanks for such a quick reach out!
though I'd caution that for mypy the limiting factor is often reviewer time. I'll try to review PRs especially related to support for new typing features, but in many parts of mypy I'm out of my depth
if it's welcome, I'll review when I have time
definitely!

LOAD_NAME is a useful opcode too
That sounds like you're doing something very advanced that needs a careful understanding of the bytecode (a debugger?)
It's going to be hard to suggest an alternative without knowing more about what you're trying to achieve
yes, loading a different global should work with changing co_names. Other changes to the bytecode would need more work. Possibly the dis module can help but it's going to get hairy
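A minimal sketch of the `co_names` approach via `CodeType.replace()`. Note that `co_names` is shared by global and attribute lookups, so this renames both; `get_len` and `with_swapped_global` are hypothetical names.

```python
import types


def get_len():
    return len  # compiled as a global/builtin lookup of the name "len"


def with_swapped_global(func, old: str, new: str):
    """Hypothetical helper: copy func so that the name `old` is looked up as `new`."""
    code = func.__code__
    names = tuple(new if name == old else name for name in code.co_names)
    return types.FunctionType(code.replace(co_names=names), func.__globals__,
                              func.__name__, func.__defaults__, func.__closure__)


get_abs = with_swapped_global(get_len, "len", "abs")
```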
you could technically abuse some other opcodes to load globals by abusing where the globals dict pointer is stored in a given frame, but it would be unstable
https://paste.pythondiscord.com/WW7Q this is an example from 3.11 of using LOAD_FAST to get the globals dict off the frame.
is there a way to keep track of status of pep 703?
There's discuss.python.org where some topics might be, and you can watch the pre-releases of 3.13. beta 1 came out this week.
oh ok - so it's targeted to 3.13...thx
3.13's version won't be at performance parity yet I believe
It won't be done in 3.13, it definitely won't be the default in 3.13. But there's progress being made.
https://github.com/python/cpython/issues/108219 is the top-level tracking issue
It'll be done in three phases, each taking at least one release: https://discuss.python.org/t/pep-703-making-the-global-interpreter-lock-optional-in-cpython-acceptance/37075
Maybe it would be helpful to link to the tracking issue from the pep?
this sounds like it runs counter to the server rules.
Most PEPs are regarded as historical and don't get updated (much)
Right, but most are also not multi-year projects.
When did async generators stop being provisional? They were released provisionally in 3.6
PEP 525, Asynchronous Generators (provisional)
Hello Everyone. I am new to python and coding. Could anyone suggest me a good source to learn python briefly and quickly. I also wanna learn pandas in python.
Welcome rthk 👋, this is not the right channel to ask questions about these things, but you can check out this https://www.pythondiscord.com/resources/ for resources we recommend
I'm only now learning that if you add an hour to a python datetime, you don't necessarily get a datetime an hour later 😢
Yeah
python's interpretation here is ultra weird, imho. it just increments the clock by whatever amount, as though DST didn't exist
So like, if you decide to add exactly 30 days (i.e. 30 * 24 * 60 * 60 seconds, not incrementing the date 30 times) as a timedelta to a typical zoned time in late February
what you get is probably not what you want; and the more you understand timezones, the less likely it is to actually be what you want (I would say)
it's incredibly unfortunate that python doesn't have separate classes for naive, and timezone aware, datetimes, imho
i dont understand timezones at all, so i think this is perfectly what i want 🙂
hah
it's just not going to be that time interval later, which is pretty bizarre. when people do stuff with timedeltas, that's generally what they expect
a pretty good writeup generally I think: https://dev.arie.bovenberg.net/blog/python-datetime-pitfalls/#1-incompatible-concepts-are-squeezed-into-one-class
It’s no secret that the Python datetime library has its quirks. Not only are there probably more than you think; third-party libraries don’t address most of them! I created a new library to explore what a better datetime library could look like.
can you elaborate? Are you talking about naive datetimes, or aware datetimes? Are you talking about adding a timedelta?
!e This works the way I'd expect: ```py
import datetime, zoneinfo
dt = datetime.datetime(2024, 11, 3, 1, tzinfo=zoneinfo.ZoneInfo("America/New_York"))
print(dt)
print(dt + datetime.timedelta(hours=1))
```
@raven ridge :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | 2024-11-03 01:00:00-04:00
002 | 2024-11-03 02:00:00-05:00
Not when you cross a DST change boundary
Oops sorry
Didn't see your last message
that's what you expected as well, then - right?
I thought that the jump occurs between 2 and 3
The simplest way really is just to add a day, on the Saturday before dst
You'll get the same time, but on Sunday, which is IMHO pretty much not what most people experienced with such libraries expect
the jump occurs at 2 AM exactly - so 1 hour after 1 AM EDT is 1 AM EST
Right
so, yeah, I agree with you - the behavior isn't what I expect, and somehow I never noticed that... 🤯
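To make the difference concrete, here is a minimal sketch contrasting wall-clock arithmetic (what `+` does on an aware datetime) with elapsed-time arithmetic done by round-tripping through UTC. It assumes Python 3.9+ and that the IANA tz database is available to `zoneinfo`:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")
start = datetime(2024, 11, 3, 1, tzinfo=NY)  # fold=0, so this is the first 1 AM (EDT, -04:00)

# Wall-clock arithmetic: the naive fields move forward, then the offset is recomputed.
wall = start + timedelta(hours=1)

# Elapsed-time arithmetic: convert to UTC, add, then convert back to the zone.
absolute = (start.astimezone(timezone.utc) + timedelta(hours=1)).astimezone(NY)

print(wall)      # 2024-11-03 02:00:00-05:00
print(absolute)  # 2024-11-03 01:00:00-05:00
print(wall.timestamp() - start.timestamp())      # 7200.0
print(absolute.timestamp() - start.timestamp())  # 3600.0
```

The UTC round trip is the usual way to get "an hour later" in real elapsed time; the result lands on the second 1 AM (EST).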
what happens when DST moves time 1 hour back?
let's assume today is a normal day, and tomorrow DST will start
what time will be after 23:59? 23:00 of the same day?
Exactly
When dst moves an hour back then you repeat the same hour twice
And you have a one hour range of times that are ambiguous
There's no way to unequivocally map them back to UTC
so there are two moments that are denoted using two indistinguishable times?
Yes
i can't express my emotions and feelings right now...
So, in New York time zone for example, in November there's a Sunday where you repeat 1am twice
no, they're distinguished by the timezone offset
at least for timezone-aware datetimes
eww
1 hour after 1 AM November 3rd EDT is 1 AM November 3rd EST
I mean, sure, "eww", but that's how DST works
The time zone offset isn't specified though anywhere. Usually the model is that the time zone offset is an output, not an input
it's an input via fold
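A small sketch of how `fold` (PEP 495) picks between the two occurrences of a repeated wall time, again assuming Python 3.9+ with the tz database available:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# 1:30 AM on 2024-11-03 happens twice in New York; fold selects which one.
first = datetime(2024, 11, 3, 1, 30, tzinfo=NY)           # fold=0: still EDT
second = datetime(2024, 11, 3, 1, 30, tzinfo=NY, fold=1)  # fold=1: already EST

print(first.tzname(), first.utcoffset() == timedelta(hours=-4))    # EDT True
print(second.tzname(), second.utcoffset() == timedelta(hours=-5))  # EST True
print(second.timestamp() - first.timestamp())                      # 3600.0
```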
sure
Anyhow I think personally this behavior is just dead wrong
There are 3 choices a datetime library can make here, and this is by far the worst one
C++ seems to just disallow arithmetic on zoned times, afaict
Which is probably my preferred choice, though I suspect a lot of developers' first reaction to that will be to complain about verbosity
As far as I can see the most reasonable mix of correctness and maturity is pendulum
Pendulum though repeats the really awful mistake of having a single type for both naive and aware datetimes
The best option is to not use datetime objects at all. Store the original text input you got from a user about their time and localize it on demand. Do any time shifting yourself. This avoids most of the problems, including the one where you stored a timestamp rather than the time they specified plus their timezone, and then, between when they gave you that input to schedule something and months later, a government decided "actually, DST is postponed a week", making the precalculated timestamp wrong.
Uh, for anyone reading, that is really terrible advice, especially "do the time shifting yourself"
It depends. One of the problems with time is that in different contexts, we mean different concepts by it. Sometimes it's "a globally unique point in time", sometimes "a human-specified label for an event time".
If it's the former, ideally you convert into UTC at the earliest possible moment, and convert to local time for display purposes only.
If it's the latter, then it depends.
For example, if I go on a transatlantic business trip, I want my online meeting times (which didn't change) to just be displayed in local time. But my alarm clock shouldn't be displayed in local time, but be reinterpreted in local time.
And for the last case @urban sandal mentioned he is absolutely right. Although I might save the entry as a combination of naive datetime (event time) and aware datetime (scheduling time) instead of a string in that case.
TL;DR: Time is complicated not because of Python design decisions, but because of the problem domain being complex.
python's design decisions are why people should do it themselves with understanding of the problem domain and an understanding of what is the right approach for their application. The design decisions rapidly fall apart the moment you are handling a human notion of time rather than a machine notion of time.
there would never be a single correct way to handle this problem that works for all kinds of applications, but there's also no way to fix it now: the presence of operators with expected behaviors, rather than functions that can be chosen for the kind of behavior needed ("should this be added absolutely, or added as natural wall time, and how should ambiguous timezones be handled?"), somewhat precludes fixing it. It isn't something that a single math operator can handle for all use cases.
Gonna add a clarification based on a dm I got about this: doing it yourself does not need to mean reinventing the wheel from first principles, but the moment you care enough about timezones and user perceptions of time to start looking into aware datetime objects, you should consider how much the tools you have already do what is right for your use case, and which behaviors if any you should create something else for.
It's also perfectly okay to decide "I'm okay with the behaviors other people picked for this" even if it isn't perfect. Design vs. time tradeoffs are a fair and valid concern
Nobody said time is complicated because of Python's design
Time is complicated, and datetime has some really bad design choices
Believe me, both I and the author of the linked article are aware of this one-paragraph spiel about different concepts in time. I'd suggest focusing on the specific points
And you said yourself that you wouldn't store as a string. Would you "shift yourself"?
For anyone else following along, this is the article quicknir linked earlier: https://dev.arie.bovenberg.net/blog/python-datetime-pitfalls/#1-incompatible-concepts-are-squeezed-into-one-class
Can anyone guide me how do I start my DSA in Python
Hello again, I've recently come across the option for enabling the experimental JIT compiler, but the parameters it accepts are making my head swim:
For configure.ac:
no (aka --disable-experimental-jit with no argument)
yes (aka --enable-experimental-jit with no argument)
yes-off
interpreter
interpreter-off
For Windows build.bat
--experimental-jit
--experimental-jit-off
--experimental-jit-interpreter
--experimental-jit-interpreter-off
configure.ac also says that interpreter-off is apparently a secret option as well. But that doesn't mean much to me, since I have no clue what it does. What does each option actually mean? Even more confusingly, when I run build.bat with --experimental-jit-off the build seems to try to build the JIT even when I specifically said not to include the JIT, I'm not sure whether that's a bug or whether it's just me being stupid
perhaps this will help? https://discuss.python.org/t/pep-744-jit-compilation/50756/42
experimental-jit-off is likely yes-off:
Build the JIT, but do not enable it by default. PYTHON_JIT=1 can be used to enable it at runtime.
ahhh, so I was being stupid after all
Thanks!
I'll admit I was completely thrown off by "JIT interpreter"
well, at least at first
I'm looking for core contributor reviews of my PR to expand pickle importing for PyModule_Create modules to solve a long-standing issue in PyTorch: https://github.com/python/cpython/pull/119152
I've created a discussion post as well: https://discuss.python.org/t/request-for-review-of-gh-87533-expand-pickle-importing-to-support-non-package-c-modules/53833
Looking forward to receiving feedback and hopefully solving this issue once and for all!
There have been recurring issues with PyModule_Create modules in PyTorch. When trying to serialize attributes of these C-modules, pickle fails to import the C-module because they are not a packages...
PR: Expand pickle importing to support non-package C-modules I am looking for reviewers for a patch I desire to how pickle loads non-package modules, like those created via PyModule_Create. Erroring while trying to serialize PyModule_Create attributes has been a recurring issue in PyTorch for at least 4 years: Functions in torch._C._nn and to...
I'm not a core dev, but I'm a bit confused why you're treating this as an issue with pickle, rather than an issue with import. Or, really, with pybind11 - it should be creating finders on sys.meta_path so that the import works - https://stackoverflow.com/a/52714500 per https://github.com/python/cpython/issues/87533#issuecomment-1093904563
It seems to me that teaching pickle to import modules that can't be normally imported is worse than fixing those modules to actually be importable
I understand your argument. I'll look into what a solution like this might look like. Thanks for the links!
It seems to me that the issue is that torch is creating things that are "modules" in one sense (instances of PyModule_Type) but not in another much more common sense (things that you can import from)
Instances of types.ModuleType are exactly analogous to this
I mean that modules are supposed to be used with import, but they've created modules that can't be. That's the weird thing here. Pickle is just a victim of that. It's doing a thing that's supposed to work, but it doesn't, because these aren't modules in the normal sense of the word
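A minimal sketch of "make the module importable instead of teaching pickle to work around it". It swaps the `sys.meta_path` finder idea for an even simpler fix in the same spirit: registering the submodule in `sys.modules` under its dotted name. `fakeext`, `fakeext._nn` and `relu` are hypothetical stand-ins for an extension whose submodule only exists as an attribute of a non-package parent:

```python
import pickle
import sys
import types

# Hypothetical stand-ins: a non-package parent module whose submodule
# exists only as an attribute, not as an entry in sys.modules.
parent = types.ModuleType("fakeext")
child = types.ModuleType("fakeext._nn")


def relu(x):
    return max(x, 0)


relu.__module__ = "fakeext._nn"  # pickle records functions by module name + qualname
relu.__qualname__ = "relu"
child.relu = relu
parent._nn = child
sys.modules["fakeext"] = parent


def round_trips(obj) -> bool:
    """Return True if obj survives a pickle round trip as the same object."""
    try:
        return pickle.loads(pickle.dumps(obj)) is obj
    except (pickle.PicklingError, ImportError, AttributeError):
        return False


# Fails: "fakeext" is not a package, so importing "fakeext._nn" raises.
before = round_trips(relu)

# Make the submodule importable under its dotted name; pickle's import now succeeds.
sys.modules["fakeext._nn"] = child
after = round_trips(relu)

print(before, after)  # False True
```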
I've been playing around with str.translate and it's getting weirdly high runtimes when a character is mapped to "". Any idea what's going on and why it's so much slower than the None case or replacements with 1-char strings? It's also not particularly fast when the replacement strings are longer, but that's somewhat expected
In [24]: a = "abc"*50_000
In [25]: %%timeit t = str.maketrans({"a": None})
...: a.translate(t)
76.4 µs ± 807 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [26]: %%timeit t = str.maketrans({"a": ""})
...: a.translate(t)
5.83 ms ± 43.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [27]: %%timeit t = str.maketrans({"a": "", "b": ""})
...: a.translate(t)
3.66 ms ± 6.28 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [28]: %%timeit t = str.maketrans({"a": "", "b": "d"})
...: a.translate(t)
3.88 ms ± 54.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [29]: %%timeit t = str.maketrans({"a": "e", "b": "d"})
...: a.translate(t)
58.6 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Seems like the "" case isn't optimized and is treated as a regular character-to-some-amount-of-characters translation. Might be helpful to add this special case here as well: https://github.com/python/cpython/blob/5adf78f546a5dc3f5b8eeaa209a2e8437ae96ac8/Objects/unicodeobject.c#L8692-L8695
Objects/unicodeobject.c lines 8692 to 8695
```c
if (item == Py_None) {
    /* deletion */
    translate[ch] = 0xfe;
}
```
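Until such a special case exists, a workaround consistent with the timings above is to map deleted characters to None rather than "". A minimal sketch; `deletions_as_none` is a hypothetical helper, and how much faster the None path is will depend on the CPython version:

```python
def deletions_as_none(table: dict) -> dict:
    """Rewrite "" replacements as None so str.translate treats them as deletions."""
    return {key: (None if value == "" else value) for key, value in table.items()}


a = "abc" * 50_000
empty_table = str.maketrans({"a": "", "b": "d"})  # {97: '', 98: 'd'}
none_table = deletions_as_none(empty_table)       # {97: None, 98: 'd'}

# Same output either way; only the code path inside translate() differs.
assert a.translate(empty_table) == a.translate(none_table) == "dc" * 50_000

# The three-argument form of maketrans produces None entries directly.
print("abc".translate(str.maketrans("b", "d", "a")))  # dc
```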
Cool
!rule
The rules and guidelines that apply to this community can be found on our rules page. We expect all members of the community to have read and understood these.
why is
```py
issubclass(ZeroDivisionError, ValueError)
```
False
!e print(ZeroDivisionError.mro())
:white_check_mark: Your 3.12 eval job has completed with return code 0.
[<class 'ZeroDivisionError'>, <class 'ArithmeticError'>, <class 'Exception'>, <class 'BaseException'>, <class 'object'>]
isn't ValueError for giving invalid arguments
so that covers division by zero as well?
it's impossible to exactly categorize every error. Is it a problem that ZeroDivisionError isn't a subclass of ValueError?
Yeah, it doesn't matter too much. If you want to handle an error from a certain operation, you need to go to the documentation and see what exceptions it raises.
by extension, TypeErrors, IndexErrors, and KeyErrors could also all be considered categories of errors that stem from an invalid value being passed to something somewhere
fair enough
cant you do except (ZeroDivisionError, ValueError):
we don't know what they needed to do
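For the common case of handling both bad input and division by zero, a minimal sketch along the lines of the `except (ZeroDivisionError, ValueError)` suggestion; `parse_ratio` is a hypothetical example function:

```python
from typing import Optional


def parse_ratio(text: str) -> Optional[float]:
    """Parse "num/den" into a float, or return None for bad input."""
    try:
        num, den = text.split("/")  # ValueError if there isn't exactly one "/"
        return int(num) / int(den)  # ValueError for bad ints, ZeroDivisionError for den == 0
    except (ValueError, ZeroDivisionError):
        return None


print(parse_ratio("3/4"), parse_ratio("1/0"), parse_ratio("x/2"))  # 0.75 None None
```

Since ZeroDivisionError derives from ArithmeticError, `except ArithmeticError` would also catch it (along with OverflowError and FloatingPointError).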
why are variables invalid as keys but valid as values?
!e ```py
sample = (0, 1, 2)
sample2 = {sample: "one", 1: 0, "henlo": {0: 1, 1: 0}}
match sample2:
    case {sample: work, 1: 0, "henlo": {0: 1, 1: 0}}:
        print(work)```
:x: Your 3.12 eval job has completed with return code 1.
001 | File "/home/main.py", line 6
002 | case {sample: work, 1: 0, "henlo": {0: 1, 1: 0}}:
003 | ^
004 | SyntaxError: invalid syntax
!e ```py
variadic = {0: 1, 1: 0}
sample = {0: 1, 1: 0, "henlo": {0: 1, 1: 0}}
match sample:
    case {0: 1, 1: 0, "henlo": z}:
        print("0:1, 1:0, henlo:", z)
```
:white_check_mark: Your 3.12 eval job has completed with return code 0.
0:1, 1:0, henlo: {0: 1, 1: 0}
You can find an explanation in the docs for the match statement: https://docs.python.org/3/reference/compound_stmts.html#grammar-token-python-grammar-key_value_pattern
If duplicate keys are detected in the mapping pattern, the pattern is considered invalid.
and
Key-value pairs are matched using the two-argument form of the mapping subject’s get() method.
.get definitely needs an existing key to work
I think krrt is asking why the first example isn't equivalent to ```py
case {(0, 1, 2): work, 1: 0, "henlo": {0: 1, 1: 0}}:
```
simply because it doesn't make sense at first glance why the key part should be a Load operation instead of a Store, like all the other cases where a single name is used
Yes, that's probably why it works like this. {x: x} would be especially confusing
Basically this
Since I was able to use z as a value
>>> import gc
>>> class X: ...
...
>>> x = X()
>>> gc.get_referents(x)
[X]
>>> x.a = 1
>>> x.b = 'f'
>>> gc.get_referents(x)
[1, 'f', X]
>>> x.__dict__
{'a': 1, 'b': 'f'}
>>> gc.get_referents(x)
[{'a': 1, 'b': 'f'}, X]
>>> x.__dict__ = {}
>>> gc.get_referents(x)
[{}, X]
>>> x.c = 42
>>> gc.get_referents(x)
[{'c': 42}, X]
what is going on?
why isn't x.__dict__ always mentioned in gc.get_referents(x) ?
why does accessing x.__dict__ make it appear in gc.get_referents(x) ?
instance dicts are lazily materialized. Not sure whether there's good documentation of this, but here's a related issue: https://github.com/python/cpython/issues/106485
>>> a = 10
>>> b = 3.1415
>>> c = True
>>> d = False
>>> print(f'{a =:>10}')
a = 10
>>> print(f'{b =:>10}')
b = 3.1415
>>> print(f'{c =:>10}')
c = 1
>>> print(f'{d =:>10}')
d = 0
>>> print(f'{c = }')
c = True
>>> print(f'{d = }')
d = False
why do booleans get converted to int here?
bool is a subclass of int, and you're applying int formatting
use !s before the colon
this will apply formatting as a string
and can do !r for repr
!e print(f"{True = !s:>10}")
:white_check_mark: Your 3.12 eval job has completed with return code 0.
True = True
sweet, good to know 😄
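A short sketch of the three cases side by side. bool has no `__format__` of its own, so it uses `int.__format__`, which formats the integer value whenever the format spec is non-empty:

```python
flag = True

as_int = f"{flag:>10}"     # non-empty spec: int.__format__ formats the integer 1
as_str = f"{flag!s:>10}"   # !s converts to "True" first, then pads it as a string
as_repr = f"{flag!r:>10}"  # !r goes through repr(); same text for bools
no_spec = f"{flag}"        # empty spec: int.__format__ falls back to str()

print(repr(as_int))   # '         1'
print(repr(as_str))   # '      True'
print(no_spec)        # True
```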
Has anyone here actually used name mangling? I've only seen it used by students whose instructors have misguided ideas about Python OOP. I've never actually seen it used "correctly".
Wdym? You've never used a leading underscore once? Or does that not count
no, only double leading underscores (and no trailing) does name mangling.
Hm then I guess I haven't
I thought a single leading underscore would cause it not to show up in dir
It does. Double underscores in a class body cause the name to be mangled, which provides some protection against reuse of the name in subclasses
And no, I never use it on purpose either
Til
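For reference, `dir()` does list single-underscore attributes (the leading underscore mostly matters for `from module import *` and for tooling). A small sketch showing that and what double-underscore mangling actually does; `Base` and `Child` are hypothetical classes:

```python
class Base:
    def __init__(self):
        self._single = "convention only"  # not mangled
        self.__double = "base"            # stored as _Base__double


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__double = "child"           # stored as _Child__double, so no clash


c = Child()
names = [n for n in dir(c) if n.endswith(("_single", "__double"))]
print(names)  # ['_Base__double', '_Child__double', '_single']
```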
I don't think I've ever written something meant to be subclassed publicly enough to worry about this. In other words, if I write something that's supposed to be subclassed, then I do the subclassing myself and therefore know not to make the mistake of overwriting
I see only two situations when you might want to use mangled names:
- if the codebase you manage has a pretty complicated inheritance tree, and there is a chance of accidentally overriding a name
- if your class is meant to be subclassed by users of your library
and you don't need mangled names in these cases:
- if your inheritance is that complicated, you are probably doing something wrong. And methods that you want to mangle can probably be free functions as well
- usually in situations like this you are supposed to override only a specified set of methods to provide the necessary behaviour, and the base class acts as convenient glue that makes the magic happen. Putting too many methods in such classes is probably not a good idea; you can extract everything else into a different class or free functions
Sometimes there is a name that should be private, must not be accidentally overridden (otherwise everything breaks), and should be accessible from several classes that need it.
This name can't rely on Python's name mangling, since that would make it inaccessible from those other classes, so one of the best solutions is to mangle it yourself, using the library name instead of the class name, like this: _mylib_crucial_method or this: __mylib_crucial_method__
this happens pretty often in many libraries:
- `__mypyc_XXX__` in mypyc-generated code
- `__array_XXX__` in numpy
- `__dataclass_XXX__` in dataclasses
- `__attrs_XXX__` in attrs
- `__ctypes_XXX__` in ctypes
Here's a great explanation of the purpose of name mangling and how it's supposed to be used.
https://www.youtube.com/watch?v=HTLu2DFOdTg&t=2035s
The whole talk is well worth watching, but this links directly to the part that talks about the rationale behind name mangling. (There's a minute or two of buildup.)
Raymond Hettinger
This is a short, but thorough tutorial on the Python's built-in toolset for creating classes. We look at commonly encountered challenges and how to solve them using Python.
Hi! May I ask a question about development of python itself?
#dev-contrib message
I'm not familiar with that area but it's unlikely that we'd change the default on all platforms just because WASI doesn't support something
That would be a compatibility break
More likely we'd simply document that WASI has different behavior
The current issue is that if we leave the default value as True, then in GitHub's CI/CD tests every test that uses os.link() will fail (and imo it should fail, because follow_symlinks=True is not supported), so should we really change lots of test code?
https://github.com/python/cpython/actions/runs/9325118442/job/25671590828?pr=119886
possibly we can make the default platform-dependent
but again, I don't know enough about this area to have a confident opinion
Usually the outcome of something like this is to skip the test on the exceptional platform I think?
Skip ALL os.link-related tests on WASI? That seems a little drastic.
And is leaving os.link(src, dst) as always an error on WASI a good choice? I don't think so.
If so, currently it only works on WASI if you write os.link(src, dst, follow_symlinks=False).
There's another option: implement the logic required on WASI to follow the link, and error if it can't be followed and following was requested (wasi doesn't support absolute symlinks last I checked)
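A minimal sketch of the "skip only on the exceptional platform" option: only the test that relies on the default `follow_symlinks=True` is skipped, while the `follow_symlinks=False` path is still exercised wherever `os.supports_follow_symlinks` says it works. It assumes WASI builds report `sys.platform == "wasi"`; the test class is hypothetical, not CPython's actual test suite:

```python
import os
import sys
import tempfile
import unittest

# Assumption: WASI builds of CPython report sys.platform == "wasi".
IS_WASI = sys.platform == "wasi"


class HardLinkTests(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.dir = tmp.name
        self.src = os.path.join(self.dir, "src")
        with open(self.src, "w") as f:
            f.write("data")

    @unittest.skipIf(IS_WASI, "os.link() cannot follow symlinks on WASI")
    def test_link_default_follow(self):
        dst = os.path.join(self.dir, "dst_follow")
        os.link(self.src, dst)  # default is follow_symlinks=True
        self.assertTrue(os.path.samefile(self.src, dst))

    @unittest.skipUnless(os.link in os.supports_follow_symlinks,
                         "platform cannot link without following symlinks")
    def test_link_no_follow(self):
        dst = os.path.join(self.dir, "dst_nofollow")
        os.link(self.src, dst, follow_symlinks=False)
        self.assertTrue(os.path.samefile(self.src, dst))
```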
!pastebin
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
how does the cpython parser's memoization work exactly?
Is there anyone we should want to nominate to the PSF board?
why is it called a roadmap if there is no road
Where does async fall under this? Its one of the topics im not familiar with at all in python. I've had no reason to learn it and wouldn't even know how to start.
No web framework, network automation, and the DSA part was more using those data structures than developing them. What does the DSA portion even mean. Like you've developed them or just used them.
This roadmap is... confusing
it doesn't seem on topic for this channel either way. And it doesn't seem particularly universally applicable. I've been a professional Python programmer for a decade and only have experience with about 3/4ths of these things
I've been avoiding regex for a decade and this roadmap won't stop me
The "Advanced" section also seems more miscellaneous than anything
also, putting expressions under "Advanced" seems odd?
putting "Exceptions" under "Basic" also seems odd to me
i guess they are mixed up with "Expressions" 😄
I mean, knowing how to read error messages is basic.
Been working with Python for last two decades
Bro, this is a learning path for when you're confused about what to learn first in Python. It covers all the topics: first go through the basics, then the advanced ones. Learn and understand each topic, then build new things. This approach can help beginners. 👍
Basic and advanced both are same
It just helps you know the order to learn things in, that's all
And others are packages, libraries, concepts etc..
yeah but this is not where to discuss roadmaps
go to #pedagogy
guys, i have learned the basics of python but due to my college exams i haven't practiced for two months, so how should i start over again? (i have my notes though)
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
this is not the best place to ask questions like this
use #python-discussion
ok cool
Good night,
I've created a virtual environment, installed pandas, and I'm trying to import it, but it's not working. Any insights? I'm just a newbie here.
probably you need to choose "my venv" in the editor; you are using Python 3.11.5. Try clicking it 🙂
came back just to that, just found it. LOL
many thanks @sage gust
No problem, bro 😉
hello, please let me know if this is the wrong channel. I am trying to make a tool that instruments all sources of external data in a program, writing them to an external database / log store. Examples are any time a socket is read, a file is read, the current time is read, current memory usage is read, etc. The goal is to be able to reproduce the state of a python program (especially a long-running one, such as a server) at any time from the smallest data footprint. That's why I only want to store external inputs into the system; intermediate pure computations don't need to be stored. At the same time, I need to make sure I capture every possible source of external data, which in some cases will be at module initialization and therefore at import time. What would be the best layer to do this instrumentation? The options I see are 1) at the python level, looking out for standard library functions that introduce external state, 2) at the python bytecode level, or 3) at the level of linux system calls. This would be a decorator, so I can work with the target function as an argument.
It's going to be very hard to do this fully reliably from the Python level, since there are many different ways an application could ultimately do IO. One option would be to look at audit events (https://docs.python.org/3/library/audit_events.html#audit-events). Using monitoring of system calls could also work.
This table contains all events raised by sys.audit() or PySys_Audit() calls throughout the CPython runtime and the standard library. These calls were added in 3.8 or later (see PEP 578). See sys.ad...
Thank you. I'll look at both these options. I imagine with either option though, getting the actual data read from the IO will be difficult right? At a glance, many of the audit events seem to track only metadata.
Yes, I think the audit API is designed to tell you that an event occurred but not necessarily what it returned
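A minimal sketch of observing those events from Python with `sys.addaudithook` (3.8+). It only records which paths were opened, which illustrates the point above: for `"open"` the hook receives `(path, mode, flags)`, not the data that was read. `record_open` and `opened_paths` are hypothetical names:

```python
import os
import sys
import tempfile

opened_paths = []


def record_open(event: str, args: tuple) -> None:
    # Hooks see every audit event, so keep this cheap and never raise.
    # For "open", args are (path, mode, flags): metadata only, no file contents.
    if event == "open":
        opened_paths.append(args[0])


sys.addaudithook(record_open)  # note: audit hooks cannot be removed once added

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("hello")
os.remove(path)

print(path in opened_paths)  # True
```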
I guess it depends on how robust you want this to be, but I don't think anything you do at the Python layer can really do this robustly.
Makes sense, appreciate the help.
if you want to see when the app has read the time, and what time it read, I don't think any of the 3 layers you proposed will do the trick. Looking at the Python source code or Python byte code, you'd have no idea if the Python interpreter itself read the clock (maybe as part of initializing a PRNG or something). Or if a native extension module like tensorflow did.
And reading the current time doesn't necessarily cause a system call on Linux. It makes a call into the vDSO. I maintain a profiler (Memray) that works by intercepting PLT entries pointing to functions it needs to detect calls to. I think that's the approach you'd need to take - you'd need to patch the executable and every loaded shared library to call your function instead of the vDSO to get the current time, and your function would call the vDSO and then note the result it got. You would also need to intercept the syscalls - they still exist, and something could choose to use them instead of using the vDSO.
you might also want to look into time travel debuggers for inspiration. Time travel debuggers need to detect IO so that they can replay it if you rewind and then redo
you might also be able to get something working using eBPF
Thank you, I'll look at memray too for some insight. Does the linked man page include every syscall with the vDSO specialization? There's not too many, and most of them are time related, so to get an initial MVP I could perhaps eat that loss in robustness by just sticking to syscalls.
yes, that is the complete list. In principle more can be added at any time, but in practice the things that have been added historically have been things that applications tend to call very, very often
as for the sys calls, I found ptrace which allows intercepting them, w/ access to both data and metadata. i'll see if the time travel debuggers also use that. thanks for all the guidance
ptrace is one way to intercept syscalls. The usual downside to it is that you can't be selective - it fires for every syscall, which slows the program down quite a lot. But if your goal is truly to see all IO that the program is doing, that seems like it's not a downside for you at all - there are practically no syscalls you would want to ignore. So yeah, ptrace sounds like it might be just the thing.
eBPF is another way to do it
also - do you know about strace?
if all you need is logging, you might be able to get literally everything you need to know using strace -o strace.out -s 1024 python yourprogram.py
@raven ridge I'll also take a look at eBPF and strace. Logging is one component, and it'd be the default (run asynchronously, with no strict schema) for all IO in calls such as data = sock.recv(64). However, if the user does data: Annotated[bytes, InDatabase()] = sock.recv(64), then I would create a relational DB table ahead of time with two fields: the timestamp of talking to the socket and the bytes data. Then I would write to it on each execution, and would actually want to block the declaration of data until after it's written, to get the same blocking behavior as if the database persistence was written imperatively. So ultimately I might settle on a mix of solutions
Does Py_BuildValue("s", str) (which calls PyUnicode_FromStringAndSize) intern the Python string? How can I convert a C string into a str that is guaranteed to be interned?
I have just discovered PyUnicode_InternFromString.
Is there an ABI stable way to create a class from inside a C extension?
!rule 5
5. Do not provide or request help on projects that may violate terms of service, or that may be deemed inappropriate, malicious, or illegal.
Hello, your message was removed for violating server rules.
has anyone worked with power apps and model driven apps before?? I need help
Wrong channel I believe
wrong channel
I'm curious, do the new copy and patch JIT and tier 2 Interpreter make use of the API introduced in PEP-523? https://peps.python.org/pep-0523/
I believe "no", but that's a question for @final geode if ever I saw one!
They don't, no
Why's that?
Nope! It piggybacks on PEP 659 (specializing interpreter). Tools using PEP 523 actually inhibit a lot of the optimizations we do, like tracing through function calls.
Does any tooling exist for running down refcount issues in C extensions?
Too many increfs or too few decrefs will lead to a leak. You can use gc.set_debug(gc.DEBUG_LEAK) or a memory profiler (like Memray, which I maintain, or valgrind) to try to figure out which objects are being leaked.
Too many decrefs or too few increfs will lead to objects being destroyed before you meant them to, and use-after-free bugs. That'll likely cause crashes, hopefully relatively soon after the problem occurred. You can catch some of those with the PYTHONDEVMODE=1 environment variable, or with python -X dev. You might catch more of them with PYTHONMALLOC=malloc+debug. Valgrind can help here. If you're on Linux, export MALLOC_CHECK_=3 can help, too.
these tools aren't going to get you any further than telling you which object's reference count was mismanaged, where it was first created, and where it was destroyed (if it was destroyed and you've got a use-after-free bug). Once you know which object's reference count was screwed up, you'll need to audit all your code that deals with that object to figure out which piece of code was responsible for screwing it up
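Once you suspect a particular object, a quick Python-level check is to call the suspect function many times and watch that object's refcount. A minimal sketch; `refcount_growth`, `well_behaved` and `leaky` are hypothetical, with the pure-Python `leaky` standing in for an extension function that does an extra `Py_INCREF`:

```python
import sys


def refcount_growth(func, obj, repeat: int = 1000) -> int:
    """Call func(obj) repeatedly and return how much obj's refcount grew.

    A correct function leaves the count unchanged; growth proportional to
    `repeat` suggests a reference that is taken and never released.
    """
    before = sys.getrefcount(obj)
    for _ in range(repeat):
        func(obj)
    return sys.getrefcount(obj) - before


_stash = []


def well_behaved(o):
    return len(o)


def leaky(o):
    _stash.append(o)  # mimics an extra Py_INCREF that is never undone


payload = [1, 2, 3]  # a fresh, non-immortal object
print(refcount_growth(well_behaved, payload))  # 0
print(refcount_growth(leaky, payload))         # 1000
```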
Memray is exactly what I was looking for, thanks.
sorry, you can ignore the nuitka column, but is there a reason 3.7 seems to be faster re: for loops vs other python versions?
the numbers are in seconds in runtime
i dont see anything that might cause this on "What's new in 3.8" page
like for loops in general are faster in 3.7?
sorry, might not be for loops in general
https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_concurrent_imap/run_benchmark.py#L13-L22
pyperformance/data-files/benchmarks/bm_concurrent_imap/run_benchmark.py lines 13 to 22
```py
def bench_mp_pool(p: int, n: int, chunk: int) -> None:
    with Pool(p) as pool:
        for _ in pool.imap(f, range(n), chunk):
            pass


def bench_thread_pool(c: int, n: int, chunk: int) -> None:
    with ThreadPool(c) as pool:
        for _ in pool.imap(f, range(n), chunk):
            pass
```
pyperformance/data-files/benchmarks/bm_concurrent_imap/run_benchmark.py lines 9 to 10
```py
def f(x: int) -> int:
    return x
```
the only obvious difference between 3.7 and future versions is that for uses SETUP_LOOP and POP_BLOCK, but 3.6 also has them and is still slower
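To compare what a plain `for` loop compiles to on each interpreter, a small sketch using `dis.get_instructions`; run it under each Python version you're benchmarking. The exact opcode list varies by version (SETUP_LOOP was removed in 3.8), so no particular output is asserted beyond FOR_ITER:

```python
import dis
import sys


def loop(items):
    for _ in items:
        pass


opnames = [ins.opname for ins in dis.get_instructions(loop)]
print(sys.version_info[:2], opnames)
```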
why isn’t _typeobject part of the limited abi?
we want to be able to change structs
"change" how? PyObject is part of the limited abi, though only ob_type and ob_refcnt are exposed. Couldn't PyTypeObject also be exposed, to allow direct access to its slots?
adding PyTypeObject to the limited API wouldn't make it impossible to add new fields to PyTypeObject in the future, it would just be a guarantee that some subset of the fields that currently exist would always exist
and that guarantee is already sort of being made via the slot IDs - PyTypeObject.tp_as_number.nb_int isn't part of the limited API, but the limited API is still guaranteeing that it will always exist because Py_nb_int is part of the limited API and defined to be used to get/set PyTypeObject.tp_as_number.nb_int
I guess the question boils down to: given that the type slots are guaranteed by the stable ABI to exist, would it be too onerous to keep their offsets within the structure stable?
I guess that would prevent moving them out of the struct for some reason. We’ve already had weird situations where we’ve needed to make some static type state (like the list of subclasses or weakrefs) per-interpreter, and having it be a member of the struct itself is problematic.
Granted, I’m not an expert on a lot of this C API stuff, but in general I don’t think “let’s do it because we can” is a great motivation for this sort of stuff. To turn the question around: would it be too onerous to keep the structure unstable?
The cost of keeping it unstable is that the way to do lots of things with the limited API is different from how they'd be done with the unstable API, which slows adoption of the limited API. There are lots of docs about how to do things with the unstable API, and most of those docs don't also say how to do things with the stable API, so every difference is a point of friction that makes it harder to learn to use the stable API
Eh. I’d argue that we should probably be moving away from getting and setting PyTypeObject members directly. Heap types have lots of advantages, and the few cases where touching type members is needed (tp_dealloc comes to mind) should probably have dedicated functional APIs to abstract them away.
I get what you’re saying though.
Perhaps it'd be good to take a pass over the examples in the C API docs to prefer using PyType_GetSlot over direct slot access where possible. That'd be another way to bring the two APIs closer: encourage using the limited API subset wherever possible
Yeah, even if only to call out that the limited API approach exists and provide an example.
When you step back and think about it, it's quite weird that the unstable ABI that changes yearly is documented as the default way to do things, and the stable ABI that lets you build forwards compatible extension modules is the poorly documented alternative...
Hello! I'd like to contribute to CPython. Where do I start? Are there newcomer-friendly tags in the issue tracker?
I've put together an introductory blog post on building CPython with the JIT (for POSIX systems), and looking at its guts with PYTHON_LLTRACE and -X pystats - would anyone be interested in taking a quick look and making sure I'm not saying anything outrageously wrong? @final geode
I've thrown a preview export of the post up on an S3 bucket for now 🙂
How to Build and Run CPython's New JIT Compiler
Awesome! I'll give it a read in a bit.
i'm trying to figure out whether PEP 234 cares if an iterator is destructive or not. i have an API that essentially iterates over records in a paginated table via REST, so what does the PEP expect about holding old pages in memory? Like, what is right, and how would one augment or change this behavior? e.g. if my iterator starts being destructive today when it wasn't yesterday... big change
What do you mean by destructive?
meaning if you can iterate the object twice or not, in 234 it points out file's are like this (or, to quote, "some iterators are destructive: they consume all the values and a second iterator cannot easily be created that iterates independently over the same values.")
ah
I guess it's more of a property of an iterable than of an iterator
In the case of a file, the file is an iterator, so it has to be non-reusable in iteration
(unless you call file.seek(0))
would that imply the expected behavior of all iterables is that you 'drop' previous values once consumed?
i'm mostly confusing myself, i think.
e.g. if you have a list foo, you can call iter(foo) as many times as you want to iterate over it. You can't do the same with (arbitrary) file objects
It's not really a property of the iterator itself
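The file-versus-list difference can be shown with `io.StringIO` standing in for a real open file (a substitution made only so the snippet is self-contained): the file object is its own iterator, so a second pass yields nothing until `seek(0)`, while each `iter()` on a list returns a fresh iterator.

```py
import io

f = io.StringIO("a\nb\n")   # stands in for an open text file
first = list(f)             # consumes the file object, which is its own iterator
second = list(f)            # already exhausted: nothing left
f.seek(0)                   # rewind the underlying stream
third = list(f)             # iteration starts over

print(first, second, third)  # ['a\n', 'b\n'] [] ['a\n', 'b\n']

records = ["a", "b"]
# A list is an iterable, not an iterator: every iter() call returns a new iterator.
print(list(records), list(records))            # ['a', 'b'] ['a', 'b']
print(iter(f) is f, iter(records) is records)  # True False
```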
fair, but if I have an object bar which is an iterator... or an iterable?
wdym?
If you're writing something from scratch, I'd probably make it non-rewindable unless you need that feature specifically. Users of your code can save the results of your iterator if they want to rewind them. For example with more_itertools.seekable
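more_itertools.seekable is the ready-made option; to show the underlying idea with only the standard library, here is a hypothetical minimal wrapper (`Replayable` is an illustrative name, not more_itertools' implementation) that caches items pulled from a one-shot source so callers can rewind without the source itself being rewindable:

```py
from collections.abc import Iterable, Iterator


class Replayable(Iterator):
    """Hypothetical rewindable wrapper: caches every item pulled from the
    source so that seek() can replay items that were already produced."""

    def __init__(self, source: Iterable) -> None:
        self._source = iter(source)
        self._cache = []
        self._pos = 0

    def __iter__(self) -> "Replayable":
        return self  # still a proper iterator: __iter__ returns self

    def __next__(self):
        if self._pos < len(self._cache):
            item = self._cache[self._pos]  # replay a cached item
        else:
            item = next(self._source)      # StopIteration propagates when done
            self._cache.append(item)
        self._pos += 1
        return item

    def seek(self, index: int) -> None:
        # This sketch only supports seeking (index >= 0) within already-cached items.
        self._pos = min(index, len(self._cache))


pages = Replayable(x * 10 for x in range(3))
print(next(pages), next(pages))  # 0 10
pages.seek(0)
print(list(pages))               # [0, 10, 20]
```

The trade-off is memory: everything consumed so far stays cached, which is exactly the cost a non-rewindable iterator avoids by default.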
say class Bar has __iter__(): blabla and __next__(): blabla -- so i could do for x in iter(Bar()) all day, but for x in Bar() only once?
an __iter__ of an iterator should return itself
i.e.
```py
def __iter__(self):
    return self
```
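For the paginated REST case, one common way to get a reusable iterable without any rewinding state is to make `__iter__` a generator method, so each `for` loop (or `iter()` call) gets its own fresh, one-shot iterator. A minimal sketch; `fetch_page`, the page size, and the fake data are hypothetical stand-ins for the real REST call:

```py
from collections.abc import Iterator


def fetch_page(page: int, size: int = 2) -> list:
    """Hypothetical stand-in for a REST call returning one page of records."""
    data = list(range(5))  # pretend the remote table holds 5 records
    return data[page * size:(page + 1) * size]


class Records:
    """An iterable (not an iterator): every __iter__ call starts from page 0."""

    def __iter__(self) -> Iterator[int]:
        page = 0
        while True:
            batch = fetch_page(page)
            if not batch:
                return           # no more pages: this iterator is exhausted
            yield from batch     # only the current page is held in memory
            page += 1


records = Records()
print(list(records))  # [0, 1, 2, 3, 4]
print(list(records))  # [0, 1, 2, 3, 4] (the second pass re-fetches the pages)
```

Each generator returned by `iter(records)` is itself destructive, which is what the iterator protocol expects; the reusability lives in `Records`, and old pages are dropped as soon as the generator moves on.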
maybe this is where i shot myself in the foot, then:
```py
def __iter__(self):
    self.__is_iter = True
    self.rewind()
    return self
```
Yeah that would be confusing
follow-up question: is doing with Bar() as b: (to get non-rewinding iteration) an offense against sensibilities?
what do you mean?
meaning it'd look something like
```py
def __iter__(self):
    if not self.is_in_with_statement:
        self.rewind()
    return self
```
The reason this would be confusing is that iter() is usually expected to be a no-op for an iterator. Suppose you have a function/class accepting an iterable, like map. It will definitely call iter() on the argument. So if you do e.g.
```py
import itertools

def deal(cards):
    it = iter(cards)
    burned_card = next(it)
    hand = itertools.islice(it, 5)
    print(burned_card, list(hand))
```
it would be really surprising if islice also included the "burned card" in the output, which would happen if cards.__iter__ did some rewinding
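To see the failure concretely: `itertools.islice` calls `iter()` on its argument, so an iterator whose `__iter__` rewinds hands the burned card right back. `RewindingDeck` is a hypothetical class written only for this demonstration (don't do this in real code), and `deal` is adapted to return its results instead of printing:

```py
import itertools


class RewindingDeck:
    """Hypothetical iterator whose __iter__ rewinds (the anti-pattern)."""

    def __init__(self, cards):
        self._cards = list(cards)
        self._pos = 0

    def __iter__(self):
        self._pos = 0  # the surprising part: every iter() call resets the position
        return self

    def __next__(self):
        if self._pos >= len(self._cards):
            raise StopIteration
        card = self._cards[self._pos]
        self._pos += 1
        return card


def deal(cards):
    it = iter(cards)
    burned_card = next(it)
    hand = list(itertools.islice(it, 5))  # islice calls iter(it) internally
    return burned_card, hand


print(deal(iter("ABCDEFG")))           # ('A', ['B', 'C', 'D', 'E', 'F'])
print(deal(RewindingDeck("ABCDEFG")))  # ('A', ['A', 'B', 'C', 'D', 'E'])
```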
yeah, seems like the true way forward is a major version release and breaking backward compat here.
If you're changing the behaviour, it is a breaking change yeah
Well, what is and isn't backwards compatible is a philosophical topic... but going from rewinding to non-rewinding and vice versa seems like it
yeah, it's just... do i honor the PEP or just have a quirk. Unless you know of some PEP on seekable/rewindable iterables...
the __iter__ of an iterator is required to just return self. See https://docs.python.org/3/glossary.html#term-iterator
"Iterators are required to have an __iter__() method that returns the iterator object"
Another interesting env var is PYTHON_OPT_DEBUG=4 (this ranges from 1 to 4), which tells you what the optimizer is doing
Ah, thank you! I think I had confused some of the uses of DPRINTF in, say, optimizer_analysis.c (which seems to rely on PYTHON_OPT_DEBUG) with the DPRINTF in ceval.c which relies on the global lltrace level... if I'm reading those right?
Yup, those are different DPRINTFs!
I'll clean that up 😅
I wonder if Python is going to follow in the footsteps of PHP in terms of JIT
PHP initially also emitted native code directly from its opcodes, passing it to the DynASM backend
but after a while they ripped out the code that directly emitted native instructions and moved to an IR framework more akin to traditional JITs
This has been a problem for me too in the past
there's also no good way to type that afaik
I remember reading this discussion about that topic: https://discuss.python.org/t/repeatableiterable-type/42106 (and the linked Stack Overflow question)
but it's not very comprehensive
the general consensus, i think, was that an Iterable is not promised to be repeatable (but it's okay for it to be), while a Collection usually is
Especially the statement
The whole point of Iterable is that you can get an iterator on it, and arguably it was a mistake to make Iterator support Iterable in the first place. It's not perfect, but is simple, which is usually more "pythonic" than a more technically correct annotation that is very complicated.
Which is very opinionated but I haven't found any more discussion about it unfortunately
maybe some typing experts have an opinion on that though
it feels like it's just open to interpretation
i think a collection pretty much always is, fwiw
not sure where this is written "formally"
I don't think any type checkers or linters care about it since it's not formal
it would be nice to be able to catch errors like that
Well, eventually a person needs to verify something 😛
a type checker (in python at least) also cannot verify that Mapping isn't mutating stuff
even though that's clearly against the spirit of the Mapping interface (which is why MutableMapping exists)
but surely no type would be so insidious as to syntactically satisfy Mapping while actually performing mutations - oh no, wait, defaultdict says hi 😂
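To make the defaultdict joke concrete: a function that only promises to read through the `Mapping` interface can still trigger a mutation, and a type checker accepts the call because `defaultdict` is a `dict` subclass. A small sketch (the function name `lookup` is illustrative):

```py
from collections import defaultdict
from collections.abc import Mapping


def lookup(m: Mapping[str, list], key: str) -> list:
    # Only reads through the read-only Mapping interface...
    return m[key]


d = defaultdict(list)
print("x" in d)                # False
lookup(d, "x")                 # ...yet defaultdict.__missing__ inserts the key
print("x" in d)                # True
print(isinstance(d, Mapping))  # True
```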
yeah yeah obviously there are always bypasses i just hoped there would be a way to signify the type of inp here
```py
def iterate_twice(inp: ...):
    for a in inp:
        pass
    for b in inp:
        pass
```
such that if inp is NOT a subclass of this type (for example, if it's just an Iterable), you would get a warning
i mean, i know there's no way to promise that it WILL work, i just hope there's a way to warn on some types that clearly won't always work
(and yes i know i can use tee for this)
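For reference, the `tee` route works even for one-shot inputs such as generators. A sketch, written as a generator like the later snippet in this thread; note that because the first copy is fully consumed before the second starts, `tee` ends up buffering the entire input, so `items = list(inp)` is equivalent and simpler for this access pattern:

```py
import itertools
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def iterate_twice(inp: Iterable[T]) -> Iterator[T]:
    # tee() splits one underlying iterator into two independent ones by
    # buffering items; here the whole input is buffered, since `first`
    # is exhausted before `second` is read at all.
    first, second = itertools.tee(inp, 2)
    yield from first
    yield from second


print(list(iterate_twice(x for x in range(3))))  # [0, 1, 2, 0, 1, 2]
```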
```py
def iterate_twice[T](inp: Iterable[T] & ~Iterator[T]):
    yield from inp
    yield from inp
```
🥺
well, negative bounds are sketchy, but also I would say that's just not sufficient guarantee anyway
i have been hearing about intersection types so much lately lol
Collection seems like the most reasonable solution
yeah collection works
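With `Collection` the annotation is at least checkable: `Collection` means Sized + Iterable + Container, so list, tuple, set, dict and range satisfy it, while a generator has no `__len__` and a type checker will flag passing one. It's still a convention rather than a guarantee of re-iterability. A sketch:

```py
from collections.abc import Collection, Iterator
from typing import TypeVar

T = TypeVar("T")


def iterate_twice(inp: Collection[T]) -> Iterator[T]:
    # Type checkers reject iterate_twice(x for x in ...) here, because
    # generators don't implement __len__ and so aren't Collections.
    yield from inp
    yield from inp


print(list(iterate_twice([1, 2])))    # [1, 2, 1, 2]
print(list(iterate_twice(range(2))))  # [0, 1, 0, 1]
```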
but the problem is that i want an error when i forgot to write Collection
and said iterable instead
Python's type checkers cannot check whether you used something more than once
you'd need rust's type checker for that 😛
i see
is this something that a tool like ruff could handle
or is this beyond static analysis?
well it's obviously not beyond static analysis generally speaking if there's already a language statically analyzing it
but you need to design the language around it from day 1, I would say
for it to work rigorously
yeah I meant 3rd-party-analysis
obviously, Ruff could have a heuristic that could catch common cases
yeah
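A toy version of such a heuristic, using only the standard `ast` module, could flag a parameter annotated as `Iterable`/`Iterator` that is looped over (or `yield from`-ed) more than once in the same function. This is a hypothetical sketch, not an existing Ruff rule; it ignores aliasing, keyword-only parameters, and many cases a real linter would need to handle:

```py
import ast

# Annotation names this toy heuristic treats as "possibly one-shot" (an assumption).
ONE_SHOT_HINTS = {"Iterable", "Iterator"}


def _annotation_name(node):
    """Return the bare name of an annotation such as Iterable[int] or typing.Iterable."""
    if isinstance(node, ast.Subscript):
        node = node.value            # Iterable[int] -> Iterable
    if isinstance(node, ast.Attribute):
        return node.attr             # typing.Iterable -> "Iterable"
    if isinstance(node, ast.Name):
        return node.id
    return None                      # no annotation, or something more complex


def _iterated_name(node):
    """Name consumed by a for loop or a `yield from`, if it is a bare name."""
    if isinstance(node, (ast.For, ast.AsyncFor)):
        node = node.iter
    elif isinstance(node, ast.YieldFrom):
        node = node.value
    else:
        return None
    return node.id if isinstance(node, ast.Name) else None


def find_double_iteration(source):
    """Yield (function, parameter) pairs where a parameter annotated as a
    possibly one-shot type is iterated more than once in the function."""
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Only positional-or-keyword parameters are checked in this sketch.
        suspects = {
            arg.arg
            for arg in func.args.args
            if _annotation_name(arg.annotation) in ONE_SHOT_HINTS
        }
        counts = {}
        for node in ast.walk(func):
            name = _iterated_name(node)
            if name in suspects:
                counts[name] = counts.get(name, 0) + 1
        for name, n in counts.items():
            if n > 1:
                yield func.name, name


code = '''
from collections.abc import Collection, Iterable

def bad(inp: Iterable[int]):
    for a in inp:
        pass
    for b in inp:
        pass

def also_bad(inp: Iterable[int]):
    yield from inp
    yield from inp

def ok(inp: Collection[int]):
    for a in inp:
        pass
    for b in inp:
        pass
'''
print(list(find_double_iteration(code)))  # [('bad', 'inp'), ('also_bad', 'inp')]
```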
