#internals-and-peps
but i think you're right, i don't know if there's a better way than hoping a linter would catch it
but maybe if an Iterator wasn't an Iterable this would be trivial? Since then it would be safe to assume that __next__ does have side effects but __iter__ doesn't? (see my next comment for a better explanation)
There are iterables which are sort of fundamentally single-use, for example a file
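A minimal sketch of that single-use behaviour. `io.StringIO` stands in for a real open file here (an assumption, so the snippet runs without touching the disk); it iterates line by line the same way a text file does.

```python
import io

# io.StringIO is a stand-in for an open text file (assumption for illustration).
fake_file = io.StringIO("first\nsecond\n")

# The first pass consumes every line.
first_pass = [line.rstrip("\n") for line in fake_file]

# The second pass starts where the first one stopped: at the end.
second_pass = [line.rstrip("\n") for line in fake_file]

print(first_pass)   # ['first', 'second']
print(second_pass)  # []
```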
julia has an iteration API that looks something like
```
new_iterator, value = next(old_iterator)
```
which is great, since you can make `next` a side-effect free function. But well, an iterable going over every line of a file can't really be reusable like that.
I think lists in java are not iterable but will return an iterable that is NOT themselves
meaning if you required (using intersection types or smthn) for something with __iter__ to not have __next__ then that would work, right?
I'm a bit lost, work for what?
i meant that a file would be an Iterable but a list wouldn't be
sorry, maybe im confusing how iterators work so this is non-sense
small sidenote, java lists are iterables and aren't iterators, just like Python. However, Java iterators are not iterables, unlike Python.
yes
i meant the opposite my bad
so a list would have __iter__ that returns a new object which has __next__, but a file would have only __next__
IG a for loop would then try to call __iter__, and if it fails, call __next__ directly?
yeah, exactly
honestly i would have expected a different type of exception to denote this, like StopIteration vs DeadIteration or something
collections.abc.Collection is probably the best type there. Or, you can update the code to work with any iterable: ```py
def iterate_twice(inp: ...):
    # Copy into a list so both loops see the same elements.
    inp = list(inp)
    for a in inp:
        pass
    for b in inp:
        pass
```
As things stand today, iterators that don't define __iter__ are fundamentally broken, in a way that breaks Python syntax
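A small sketch of what "broken" looks like in practice. `NextOnly` is a hypothetical class made up for this example: it has `__next__` but no `__iter__`, so `next()` works on it while a `for` loop refuses it.

```python
class NextOnly:
    """Hypothetical iterator that defines __next__ but not __iter__."""

    def __init__(self, limit):
        self.current = 0
        self.limit = limit

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current


obj = NextOnly(2)
first_value = next(obj)  # calling next() directly works
print(first_value)       # 1

error_message = None
try:
    # The for statement calls iter(obj) first, which needs __iter__.
    for value in obj:
        pass
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```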
would probably work better with tee, no? not that it really matters
There's a relevant note in the docs, https://docs.python.org/3/library/itertools.html#itertools.tee:
In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
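For comparison, a rough sketch of both ways to get two passes out of a one-shot source. The `numbers` generator is a made-up stand-in for whatever iterator you were handed. Note that the two loops here run one after the other, which is exactly the case the docs note is about.

```python
import itertools


def numbers():
    # Hypothetical one-shot source: a generator can only be consumed once.
    yield from range(3)


# Option 1: copy everything into a list, then iterate as often as needed.
items = list(numbers())
list_passes = ([n for n in items], [n for n in items])

# Option 2: tee() returns independent iterators that share one underlying source.
first, second = itertools.tee(numbers())
tee_passes = (list(first), list(second))

print(list_passes)
print(tee_passes)
```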
damn
cool
btw on that topic
is that something there's a PEP for? I remember reading about "...when we get negation types..." but couldn't find anything about it
There's some discussion here (which links to the discussion on intersection types too) https://github.com/python/typing/issues/801
Not sure what the current state of the proposal is
thanks!
well, mistake or not, the interpreter itself depends on that behavior in places. See https://github.com/python/cpython/pull/29170#issuecomment-957457007 for instance
yes, I meant to say that it's too late now since it would be breaking
just talking ideally
It's not clear to me that what you're proposing would be ideal, either
for line in file: is a nice pattern, and you wouldn't be able to do that if files didn't have a __iter__
which proposal is that?
Looks great! A few things I noticed while reading:
- "The internals of the pull request itself.": Due to a GitHub bug, the PR looks empty. You should probably just link to the actual commit for this: https://github.com/python/cpython/commit/f6d9e5926b6138994eaa60d1c36462e36105733d
- "the flags for the windows builds are the same": They aren't, but they're similar. They're spelled `--experimental-jit`, `--experimental-jit-off`, and `--experimental-jit-interpreter`
- "`--enable the-experimental-jit=yes-off`": This option should be `--enable-experimental-jit=yes-off`. Probably also worth calling out the `PYTHON_JIT=1` environment variable here (`PYTHON_JIT=0` will also turn the JIT off in the other modes).
- "`--enable-experimental-jit=interpreter`": Maybe worth clarifying that this runs the same code as the JIT, but without actually jitting anything. Also worth calling out that it doesn't require LLVM, it works anywhere, and it's quite a bit slower.
- I don't see any mention of needing to install LLVM. Probably worth calling this out and linking to the `Tools/jit/README.md` for instructions.
- "*Except in frames that are owned by the C Stack, i.e. C extensions calling into Python.": That's not really what "owned by the C stack" means. It's pretty subtle and not too important here, so I'd just leave it out (basically, these are "secret" shim frames that we sneak in whenever we re-enter the interpreter loop).
- "and creates an nice HTML summary": It creates a markdown summary, not HTML.
- "`--enable-experimental-jitjit`": Repeated "jit" here.
- For the deeper dive, might be worth directing interested people to the `jit_stencils.h` file now sitting in their build directory.
like @flat gazelle said, a for loop would try to call __iter__ and if it fails, call __next__
i think it's a mistake to conflate Iterable vs Iterator, with single pass vs multi pass iteration
I mean most languages have something similar to Iterable and Iterator, and the distinction is never related to single vs multi pass.
iterable vs iterator is just for things like for loops, and other things as well (e.g. flat_map type functions), to understand when a type can be iterated, without needing to implement the iterator API directly on the type
most languages don't really do great at expressing that something can only be used once - in Rust you can express this, and the way it gets expressed is basically by what the type of the argument is in the Iterable (IntoIterator) to Iterator transformation
Like, for a Vec<T> (the equivalent of python's list[T]), IntoIterator is implemented separately for:
- Vec<T> - yields an Iterator<T>
- &Vec<T> - yields an Iterator<&T>
- &mut Vec<T> - yields an Iterator<&mut T>
In the first case, it really is single pass as the transformation of the Vec into an Iterator consumes the Vec - you can't iterate a second time
I'm not really sure about rust, but I'll give the Java example. This is the first stackoverflow answer to Iterable vs Iterator in Java:
An Iterable is a simple representation of a series of elements that can be iterated over. It does not have any iteration state such as a "current element". Instead, it has one method that produces an Iterator.
An Iterator is the object with iteration state. It lets you check if it has more elements using hasNext() and move to the next element (if any) using next().
Typically, an Iterable should be able to produce any number of valid Iterators.
The last line explicitly mentions multi-pass, but even if it wasn't there, the distinction is that the Iterable does not have an iteration state. An object with an iteration state, by definition, will only be iterable once.
@halcyon trail ^
this definition also applies in Python
the last line says "typically" exactly because some Java iterables are consumed by iterating over them
which is exactly the same situation as in Python
not much of a Java dev, but yes, what godlygeek says is what I would expect
Isn't the difference is that in Python an Iterator extends Iterable? Meaning any typed way to create a distinction is hard?
it's true that in Python an iterator is an iterable. but other than that, the same definitions hold.
I'm not sure that iterators being iterables makes any difference. You're assuming that an iterable can be used to produce any number of iterators. That's not guaranteed by either language.
@thick hemlock i missed the start of this discussion, so I might be missing where you are headed.
the problem is basically this: what do you do when you have a class that holds some state, outside of the iteration state.
you want to be able to iterate over it - but iterating over it will "consume" the associated resources?
An example of this would be something like a socket - once you read the data it's gone, you don't even have the option to re-open it like a file.
I do think that makes a difference in relation to typing, no?
I mean, assuming these 2 interfaces weren't related, like in Java, in this function
def foo(it: Iterable):
    pass
it is, not guaranteed, but typically, safe to iterate it twice as it has no state.
however in this:
def bar(it: Iterator):
    pass
you can assume that once you exhaust it, you can't iterate it again
that's why both python and Java, as godly says, "occasionally" have classes that are Iterable, but can only be Iterated once.
you don't want to make them directly Iterators because someone may not intend to iterate over them immediately
so you wouldn't want to store that Iterator state
so a function that accepts something like a socket would require an Iterator, but a function that doesn't want to consume it after iterating will ask for an Iterable
but there's no way to not consume it - that's my point
so a socket would be Iterator, not an Iterable, since it must be consumed
@thick hemlock tbh, i think you are putting too much on the types here, and "Iterator extends Iterable" is not a Python-style sentence.
assuming these 2 interfaces weren't related, ... in this function ... it is, not guaranteed, but typically, safe to iterate it twice
OK - why would that be useful?
that's not desirable either - that means you always have to have all the state ready to perform the iteration inside the original class
can you elaborate? I don't think I follow
the point is that when you create an Iterator, you get to call a function that returns a different type, and can do any initialization you need
as an example you might want to create a buffer
that's not true, you can have some flag in your class to see if iteration has started and only then do the state initialization
the python statement is: an iterable has .__iter__() that must return an iterator. An iterator has .__next__()
you can lazily do the init but that has its own trade-offs as well
@thick hemlock there are no other guarantees. __iter__ might return the same object every time.
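To make that concrete, a quick sketch comparing what `iter()` hands back for a list versus for an iterator. The variable names are just for illustration.

```python
numbers = [1, 2, 3]
list_iterator = iter(numbers)

# A list's __iter__ builds a fresh iterator object on every call.
list_returns_itself = iter(numbers) is numbers
fresh_each_time = iter(numbers) is not iter(numbers)

# An iterator's __iter__ returns the very same object every time.
iterator_returns_itself = iter(list_iterator) is list_iterator

print(list_returns_itself)      # False
print(fresh_each_time)          # True
print(iterator_returns_itself)  # True
```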
laziness isn't free either
if you type the function as accepting an Iterator, you may or may not be able to iterate over it twice (but probably can't).
If you type the function as accepting an Iterable, you may or may not be able to iterate over it twice (but probably can).
Aren't those two things equivalent, as far as how generic code would need to deal with them? You can't assume that it's consumed in either case, you can't assume that it's repeatable in either case.
Like, if you really want this in the type system so bad, then the actual solution would be to have another interface: MultiIterable
MultiIterable has the same API as Iterable, i.e. you can get an Iterator from it
except it conceptually "promises" that you can iterate over it multiple times
but probably can
why
as it happens, MultiIterable already exists, and it's called Collection 🤷
because most iterables are collections
it's not exact technically since Collection also requires a length, but like in 99% of cases this is true
If you want to accept something in generic code that you are guaranteed to be able to iterate over multiple times
Collection[Whatever] is your best bet
@thick hemlock are you trying to guarantee re-iterability in the type system?
either that or simply accept an Iterator and construct a list from it
I'm aware there's no guarantees, it can do something completely wild and return a random set of elements on each call.
However, if the inheritance tree would be different, the iterator would have only __iter__, and not __next__. The actual iteration in a for loop will only happen using __next__ in this imaginary world, so it's much less likely for it to hold state about the current iteration if it doesn't have __next__
I wouldn't say guarantee, but annotate, yes
if the for loop iteration only used next then you'd need to manually call iter yourself
inheritance tree: python doesn't have an iterator type that inherits from iterable.
though that is also, not guaranteed, as noted
the current status quo is that, if you need to do two passes over the thing, you need to copy all of the elements into a temporary (regardless of whether the thing is an iterator or non-iterator iterable).
In the world you're proposing, that's all still true, right?
sorry, i'm confused
it depends what you mean by "guaranteed"
yes
so - what's the point, then, if it doesn't change anything about how you interact with the object?
short of deliberate obfuscation, it's hard to understand how someone will be constructing a Collection[Foo] that doesn't allow iterating more than once
that's about the collections.abc.Iterator abstract base class, not about iterators in general
Maybe that's the gap then? I'm not sure of the difference. I thought the ABCs define this tree?
the ABCs are just a convenient way for you to implement your own containers
I guess I'm a bit confused about using "probably" when writing generic code that should work with any iterable
or was that your point?
that was exactly my point
if it needs to work with any iterable, you need to cope with the possibility that the iterable is only able to be iterated once (like data streamed over a socket). That's true even if you only allow non-iterator iterables
my point about inheritance is that it's not relevant to this discussion. things are iterables or iterators based on the methods they have, no matter how they got them.
the overwhelming majority of iterators don't inherit from collections.abc.Iterator
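A short sketch of that point. `Countdown` is a hypothetical class that inherits from nothing but `object`, yet the `collections.abc` checks still recognise it purely from the methods it defines.

```python
from collections.abc import Iterable, Iterator


class Countdown:
    """Hypothetical iterator; note that it has no ABC in its bases."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining + 1


is_iterator = isinstance(Countdown(3), Iterator)
is_iterable = issubclass(Countdown, Iterable)
inherits_from_abc = Iterator in Countdown.__mro__
values = list(Countdown(3))

print(is_iterator, is_iterable, inherits_from_abc)  # True True False
print(values)                                       # [3, 2, 1]
```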
I guess mypy et al just "magic" that in
yeah my bad, inheritance is irrelevant there. I just want that the objects that have __iter__ will not have __next__, and vice versa
ah actually you'd use the ones from typing I suppose
they just use structural subtyping
or are the ones from typing deprecated now? I forget
but why do you want that? It sounds like it doesn't give you the guarantees you wanted.
I thought the stuff in typing predated protocols
yeah you're supposed to use the collections.abc.*
type checkers treat collections.abc.Iterator and friends as protocols
maybe it was once magic, I dunno. These days it's definitely not magic, though
it's hard to keep track - feels like every single python release has a massive breaking suggestion for "best practices" of typing
they aren't breaking: the old ways work for a long time
That deprecation was in 3.9, about 4 years ago
Honestly if I had ignored the python inhibition against "star imports" and made my team a small prelude of types, I'd be in better shape
does it not guarantee it in the same way that a Collection is "guaranteed" to give me what i want?
and there's another one for List becoming list
yeah it's the same pep585
i thought you wanted to know which iterables could be iterated more than once?
was Union to | there too
collections don't give you the guarantee you want in an even more important way: another thread may have modified the object in between your two iterations over it
I think the point is the same point I have when I make any annotation of a type which is a collection.abc, to communicate what sort of things I need to do with this container
Different pep 604 iirc
forbidding __iter__ on iterators won't tell you that a file can only be iterated once
I mean.... this feels a bit extreme - data races are another thing entirely.
by that reasoning even your single pass might not be okay because of a race condition.
If it only has __next__, then iterating it will mutate its state, which means that I can't just do the for loop again and expect it to work
assuming locks are used while the object is being mutated or iterated over, two passes are thread safe (not a data race) but not guaranteed to give you the same results
yes, that's an assumption. Nothing about the Collection API says anything about locks.
I just don't really see the point dragging threading matters into this.
sure, fine
that's ultimately not handled by the type system, either way
btw, even the collection API doesn't guarantee multiple passes are safe as far as i'm concerned
an open file is iterable. It has __iter__. How do I know if I can iterate it twice or not?
if you receive a Collection[Foo] and your function doesn't otherwise mutate it, it's reasonable to assume you can iterate over it twice, IMHO
they're proposing that a file should not have __iter__
and get the same results
assuming Foo is like, some simple piece of data or something
you can't! I'm suggesting (and I know this is a breaking change, which is why i said it's imaginary) that an open file will only be an Iterator, not an Iterable
i guess I am wrong: an open file is an iterator
hm?
@grave jolt so far I have "best practices" changes in two python versions - 3.8 and 3.9. I will find more...
in the current tree, being an Iterator means it's also an Iterable
ok, i understand now.
sorry - 3.9 and 3.10
@halcyon trail
I mean it depends what you mean by "guarantee" - as I said, in practice I think it is guaranteed, given the nature of a collection, unless someone is abusing it
import typing and import typing as t fans fr in shambles after the collections.abc.* thing
That doesn't solve the problem, though. Imagine that an iterable holds a reference to a file, and does `def __iter__(self): return self._the_file`. You're still consuming the file, even though the object provides a __iter__ and not a __next__, and it's still not going to work if you iterate over it twice.
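Here is a runnable sketch of that scenario. `Wrapper` is a hypothetical class, and `io.StringIO` again stands in for a real file (both are assumptions for illustration).

```python
import io


class Wrapper:
    """Hypothetical iterable: has __iter__, has no __next__."""

    def __init__(self, the_file):
        self._the_file = the_file

    def __iter__(self):
        # Hands out the underlying file, which is itself a one-shot iterator.
        return self._the_file


wrapped = Wrapper(io.StringIO("a\nb\n"))  # StringIO stands in for a file

has_next = hasattr(wrapped, "__next__")
first_pass = [line.strip() for line in wrapped]
second_pass = [line.strip() for line in wrapped]

print(has_next)     # False
print(first_pass)   # ['a', 'b']
print(second_pass)  # []
```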
but again - semantic requirements are up to people to agree on
yes, and I think that point holds true with my suggestion - someone can also abuse it, but making it harder will make code safer
Is accessing something via the Mapping API guaranteed not to mutate it?
In a sane world yes, but not in the python standard library
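One standard-library case that fits, as a sketch: `collections.defaultdict` (which comes up again later in this conversation). The `peek` helper is hypothetical; it only reads through the `Mapping` interface, yet the lookup inserts a key.

```python
from collections import defaultdict
from collections.abc import Mapping


def peek(mapping: Mapping[str, list[int]], key: str) -> list[int]:
    # Hypothetical helper: looks like a read-only access through Mapping.
    return mapping[key]


counts: defaultdict[str, list[int]] = defaultdict(list)

size_before = len(counts)
peek(counts, "missing")  # __getitem__ on a defaultdict creates the entry
size_after = len(counts)

print(size_before, size_after)  # 0 1
```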
well, there's just no way to enforce semantic requirements. "abusing it" means "people can write bad code"
good luck finding a language or a type system that can save you from that.
Haskell
can't have bad code if you don't have any
@thick hemlock it's already pretty hard to abuse Collection - it has a length, which means you have to already know exactly how many elements you have, which if you're retrieving it from a file/socket/database etc, you pretty much never do.
what sort of iterable would do this that doesn't represent the file itself?
wait so do you still disagree lol
think of something like importlib.resources.Resource, for instance
give me a sec to read on that as im not familiar.
it represents a blob of data that may or may not correspond to a file, but if it does, iterating over the thing would be done by iterating over the file
The doc says this:
The Resource type is defined as Union[str, os.PathLike].
ah, sorry, I meant importlib.resources.abc.Traversable (that happens to not be iterable, but imagine it was)
either way, the point is, that if you know that iterating your data will consume it, then you should only implement __next__ and not __iter__. This is a way for the person writing the class to communicate that
in the current state, even if it's consumable, you will implement __next__, and then __iter__ returning yourself, right?
yep, you're required to
if you can't promise that you won't consume the iterator, you should implement __next__
so everything generic that wraps an arbitrary iterable would implement __next__ and not __iter__
for readers jumping in the middle: this is a discussion about a change of behavior, im not saying this is the current state - I will try to say __new_next__ and __new_iter__ to make the distinction clearer as this is getting confusing
I'm just curious what about this is better than MultiIterable?
Like, as long as we're changing existing stuff - that seems strictly better
I don't feel like you really answered this question: #internals-and-peps message
yes, in this world __new_iter__ will be used only for a non-consumable and used to create something which is consumable,
I thought I answered here: #internals-and-peps message
basically this idea lets me annotate whether it's consumed or not after iteration
like, we all agree that by virtue of its API, an Iterator is always single pass.
so now the question is whether the Iterable promises that it can produce more than one Iterator that are in some sense "the same", i.e. each of them will yield the same sequence of results
the very obvious way to do that is to have a separate type MultiIterable
it has the same syntactic requirement as Iterable
but it doesn't communicate that, right? If the end state is that regardless of whether the object passed to you has __new_iter__ or __new_next__ you may or may not be able to iterate it twice, you haven't actually communicated any new information
but it has the semantic requirement on users, that you expect it to be able to produce multiple iterators
This seems strictly better than "implement next directly on it but not iter"
that sounds reasonable, though the caveat is that it won't be able to use structural typing, and each class would need to tag whether it can or can't be iterated twice
this new api will explicitly communicate that. you can't force anyone to do anything, as mentioned before, but it's an API you're expected to follow just as Collection is today
Agree, I'm not even sure this is a great idea, i'm just trying to get Wolf to see this is purely better than their suggestion.
okay wait let me think about the multiiterable thing
In practical terms, as I've said, Collection is already basically MultiIterable, and it adds a new piece of API (len) so you can tell them apart structurally
technically technically, you could imagine cases where you can iterate well multiple times but not have a length but this is getting super niche
(this is a great example of the downsides of structural typing though)
yes I guess that is the upside and downside
if you're talking about an API addition that won't break anything then obviously your idea is better
but at least in my mindset, it doesn't "naturally" happen, a.k.a someone can forget to mark it
with the API I propose as long as you do what the API says you should in __new_next__ and __new_iter__, you can annotate that purpose for everything
well, it does happen naturally in the sense of - if you can iterate something twice, that generally means you already have the elements in memory, which means you can have a length
so the length ends up being a market
*marker
like, yes, someone could somehow implement a Collection that doesn't really iterate the same way twice - but they can also do that with your new_next stuff too
can't stop people from writing bad code
The point is that if you want a reasonable type to accept into a function that ensures you can iterate multiple times -
def foo(x: ?):
    for e in x:
        if e.blub():
            return e
    print("No blub found, printing everything!")
    for e in x:
        print(e)
it's perfectly reasonable today to annotate this as x: Collection[Foo]
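A runnable variant of that function, as a sketch. The `blub()` check is swapped for an "is even" test on plain ints so the example is self-contained; that substitution and the name `first_even` are assumptions, not part of the original snippet.

```python
from collections.abc import Collection
from typing import Optional


def first_even(values: Collection[int]) -> Optional[int]:
    # First pass: look for a match.
    for value in values:
        if value % 2 == 0:
            return value
    # Second pass: only safe because a Collection is expected to be re-iterable.
    print("No even value found, printing everything!")
    for value in values:
        print(value)
    return None


found = first_even([1, 3, 4])     # lists are Collections
missing = first_even({1, 3})      # so are sets
also_missing = first_even((5,))   # and tuples
```

A type checker should reject a call like `first_even(iter([1, 2]))`, since a bare iterator has no `__len__` or `__contains__` and so is not a `Collection`.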
also, i can think of one concrete use case @raven ridge - if this is not annotatable then the person calling my function may call list() before passing it to be sure, and i, inside my function, will call list() again to be sure
at least, I wouldn't have a problem with this
basically the fact this is not annotatable means that there's no way for an API to say whether or not it will be consumed
it is annotatable though...
not if it doesnt have a length?
And how common is this? What are your examples of types that can be iterated multiple times but don't have a length?
Even if such a type exists - it still solves the original problem pretty well. If you keep annotating Collection[Foo] down the call stack, then worst case the very top caller will have to do list(weird_multi_iterable_type_without_len)
so you create a single redundant list, rather than doing it repeatedly 🤷
but again - I'd be curious where all these examples of multi iterable types that don't have length are
Multi-iterable as in, Iterables but not Iterators?
I guess it's not totally clear exactly what it means.
you could have something that's Iterable, that lets you iterate over a bunch of results from a database.
maybe technically you can get iterators from it each time - but each time you do, you're just doing a new query on the database.
so you can "multi iterate" over it, but you probably shouldn't do it light
*lightly
well honestly if it were more common I would've brought it up separately. I originally thought about this because a colleague asked me how to annotate it, and I didn't know so I looked it up - and then I realized that's not possible. I'm not sure what their use case was. The only reason I brought it up again was because someone asked a very similar question: #internals-and-peps message
the idea of Collection is basically - the only time it's really cheap to iterate multiple times is when it's already all in memory. And when that's the case - you have a length.
I am going to link that colleague this discussion and I will ask them what their use case was
if you don't have a length then you don't already have the elements and you're probably doing significant work to retrieve them. So creating a list and then multi-iterating over that is generally better anyhow.
I mean the simplest and most common explanation is just that your colleague wanted to iterate multiple times without overhead - but also wanted to be able to pass both, say, lists and sets
If you both call tuple() instead of list(), the second call would be free, fwiw
actually, if a module like psycopg2 (PostgreSQL wrapper), which has a mechanism for choosing the class of Cursor you want to receive (currently supports DictCursor) would have a way to export a RepeatableCursor which would specifically be repeatable, even while not knowing the length in advance
this would be useful because sometimes you want to iterate the result-set twice
good to know! is that because tuple is immutable so it doesn't need to bother copying if it's already a tuple?
i mean it sounds like you're talking about lazy caching, are you not?
I am
I mean yeah, there exists uses for such things - it just tends to be very niche
I think it can be useful in any IO consumption which certainly isn't niche
lazy caching for something like this could be very error prone - you could keep the RepeatableCursor around and try to re-iterate expecting fresh results, but get the cache
i mean, no?
not sure what you mean?
it's not useful for "any IO consumption" - it's only useful if you decide to do lazy caching
lazy caching of IO is relatively niche - there's a lot of issues around invalidation
yep, if list() is called on a list, it makes a new copy. If tuple() is called on a tuple, it just returns a new reference to the same tuple - since it's immutable, there's no point copying it, because both copies would be identical for as long as they both live
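A quick way to see this for yourself. The identity result for `tuple()` is how CPython behaves for an exact `tuple`; treat it as an implementation detail rather than a language guarantee.

```python
original_list = [1, 2, 3]
original_tuple = (1, 2, 3)

# list() always builds a new list, even from a list.
list_is_same_object = list(original_list) is original_list

# tuple() on an exact tuple hands back the same object (CPython behaviour).
tuple_is_same_object = tuple(original_tuple) is original_tuple

print(list_is_same_object)   # False
print(tuple_is_same_object)  # True on CPython
```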
I meant that lazy caching can be useful for any IO consumption, sorry
executing an SQL statement will give you some sort of cursor that is a snapshot of a results. It's just on the server until you ask for it
i think if you're in this ultra specific situation then you would simply annotate with the exact type
I suppose stuff like itertools.cycle or itertools.count could be an iterable but not an iterator, just like range. (I don't quite get why the two are so different).
x: RepeatableCursor and then you simply know that its doing caching for you
you won't get different results from re-iterating a cursor
Like, the point of things like Collection etc aren't that they're the only protocols you can have - they're just the most useful and most common
I would want that my linter would warn me if I tried to iterate twice on a non-repeatable cursor though
also to your point, if you're already doing laziness
then your RepeatableCursor can implement len just fine
it simply internally has a list, which it needs anyway
and if len is called and the list hasn't been populated yet, it populates it
and then returns the length of the list
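Roughly what that could look like, as a sketch only. `RepeatableCursor` here is a hypothetical wrapper around any one-shot iterable; it is not a psycopg2 class, and it fills its cache in one go on first use rather than row by row.

```python
from collections.abc import Iterable, Iterator
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class RepeatableCursor(Generic[T]):
    """Hypothetical caching wrapper; not part of any real database driver."""

    def __init__(self, source: Iterable[T]) -> None:
        self._source: Optional[Iterator[T]] = iter(source)
        self._cache: list[T] = []

    def _fill(self) -> None:
        # Drain the one-shot source into the internal list, at most once.
        if self._source is not None:
            self._cache.extend(self._source)
            self._source = None

    def __iter__(self) -> Iterator[T]:
        self._fill()
        return iter(self._cache)

    def __len__(self) -> int:
        # Populates the list first if nobody has iterated yet.
        self._fill()
        return len(self._cache)


rows = RepeatableCursor(row for row in [("a", 1), ("b", 2)])
row_count = len(rows)
first_pass = list(rows)
second_pass = list(rows)
```

Adding `__contains__` on top would make it satisfy `Collection` structurally.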
actually an SQL cursor would probably want to explicitly avoid implementing len, since it's much more expensive than doing COUNT on the SQL side
so this cursor would probably not encourage you to get the length by consuming the results, unless you actually need the results
i mean unless you expect its a common use case that someone creates a RepeatableCursor, and never uses it
it doesn't really matter
you could also simply cache the length separately, if you prefer
Even in this niche example it's not clear there's any actual problem here 🤷
This is getting too specific, but you could use part of it - like if you decide after the 3rd element it's enough
but again this example is very niche
to be clear, I did not at any point suggest this was a very common use case for me
.bm
the worst thing about the status quo is probably the simple fact that some Iterables are "consuming", and you can iterate them twice, and mypy will not complain about it
is there a mypy bot here
And if that's violating some rule here sorry :/, I explicitly didn't go with this to the discourse since I know it's probably not worth the implementation hassle for the use cases given the status quo
no it's fine, I'm not a mod but I don't think you violated any rules or anything like that
(I do think that is, generally, quite a bad thing)
it was all completely civil
good to know!
||@ Jelle||
LOL!
Oh here's another example. You might choose to make a linked list iterable, but not provide length via __len__.
interesting
yeah actually I do think, to generalize, a use case where you wouldn't want to annotate a Collection is one where a length is expensive
that can happen in a couple scenarios and data structures
Well, you can store a length in a linked list, I'm pretty sure deque does this
Although if you're manipulating nodes directly, it might not make sense to do that
(also if you're using the same class for lists with and without cycles)
!e ```py
from typing import Iterable


class example(list):
    def __len__(self) -> Iterable:
        return [1, 2, 3]

print(len(example())) # 3```
huh, is the parent return of int being enforced
:x: Your 3.12 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 8, in <module>
003 | print(len(example())) # 3
004 | ^^^^^^^^^^^^^^
005 | TypeError: 'list' object cannot be interpreted as an integer
in practical terms it almost always makes sense to store the length of a linked list
maybe not for some kind of exotic multithreaded skiplist or something, idk
as a good practice though, I think it's fair to ask to be able to annotate the bare minimum your function needs. You could also annotate a lot of thing as list when you don't actually need all that functionality, but you shouldn't
well, Collection is extremely close to the bare minimum here - there's very good concrete reasons why something that is cheaply multi pass iterable will basically always have length
but it does come partly back to the structural typing
I suppose you could do this with a "marker function" that does nothing, except satisfy the structural subtyper lol
at least from my POV, the reason the collections.abc tree is so big, is exactly to be able to annotate the bare minimum
If you're using some centralized store like deque, then yes. You should also just use a deque in that case.
But if you're manipulating linked list nodes directly (so a node is the list, kinda like Haskell's List), you won't be able to do that.
I mean stuff like LeetCode problems for rotating a linked list
well, some things you just can't annotate/statically analyze at all
true
see quicknir's above example of defaultdict potentially breaking the expectations of some users about Mapping or dict
and i do think as a community we should strive for that set of un-annotatable things to be as small as it can
well, unfortunately, Python is turing complete
Haskell/Agda/other languages with advanced type systems and immutability guarantees might do a bit more, but still
well, not small as it can be then, but, uncommon as it can be
If you formalize all the properties (expected by users informally) of a typical class like starlette.Request, you'll need to write some kind of research paper
if your linked list is simply defined as like Node, then sure, but that's not generally how people use linked lists
(outside of Haskell and leetcode maybe)
i actually worked on something related to that last month, that effort is called HAR files
I thought that just records a history of requests and responses
yes but for that you need to standardize what a request object looks like
btw Wolf, since we talked about that most common error, here's roughly how you scan a file line by line in Rust, similar to what we write in Python
let file = File::open("foo.txt")?;
let reader = BufReader::new(&file);
for line in reader.lines() {
println!("{}", line?);
}
// trying to use reader again would be a compiler error
but also I don't see how this is related lol
the mistake we talked about in python, where you can iterate a file twice and not get a complaint
damn! a compiler error? how does that work?
doesn't happen in Rust
how does rust know though
@thick hemlock so, this goes back to what I was saying about the signature of IntoIterator and such
what in the signature of intoiterator says that?
in this case, it's the signature of lines() - it takes "self" (the BufReader) by value
in Rust, taking by value means you "consumed" the thing
"moved" from it
yeah
how would partially consuming work then?
there is no partially consuming
in python terms, if I hit break while consuming a file, i can consume the rest
so, if you wrote for line in reader.lines() { .. } and then did a break, you wouldn't be able to consume the rest
you'd need to hang onto the iterator
ohh okay
feel like this would be possible as static error in ruff if __new_iter__ and __new_next__ existed
shortcut to PHD idea
to "consume the rest" in rust you'd need to do something like
let mut lines = reader.lines();
while let Some(line) = lines.next() {
    ... // break early
}
// lines is still usable here
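For comparison, here is a minimal Python sketch of the same situation. It uses io.StringIO as a stand-in for an open file (an assumption, so the snippet runs without a real foo.txt). Breaking out early leaves the rest consumable from the same object, and a further full pass raises nothing and simply yields no lines.

```python
import io

# io.StringIO stands in for an open text file; both are single-use iterators.
f = io.StringIO("a\nb\nc\n")

first = []
for line in f:
    first.append(line.strip())
    if len(first) == 1:
        break  # stop early; the file object remembers its position

# Unlike Rust's `reader.lines()`, the same object can keep being consumed.
rest = [line.strip() for line in f]

# A further pass raises no error, it just yields nothing.
again = [line.strip() for line in f]

print(first, rest, again)
```

The empty third pass is the silent mistake discussed above: nothing in the language flags that the iterator is already exhausted.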
I meant that the real code has classes with more complex interactions than just iterating over a thing. There are loads of expected properties that are hard to formalize. For example (github link):
- calling request.stream() from two different async tasks and using the iterator simultaneously is a really bad idea
- calling request.body() and request.json() in two different tasks is a bad idea because they both call request.stream()
starlette/requests.py line 199
class Request(HTTPConnection):```
all the static type checking has costs too; before I wrote that snippet I had to double check a couple of signatures
Actually this stuff would probably be mitigated with Rust's borrow checker or other linear typing construct. But you can imagine some other "business object" with a series of complex states
If you want you can model this in rust with type state pattern
All you really need is the ability to model "consume" and then you can have as many distinct static states as you want for some entity
Yeah, that's what I mean by linear typing
readJson : ( [L]RequestStarted ) -> ( JSON | NetworkError | JsonError, RequestConsumed )
yes, I know, I'm addressing the second part of what you said
it seemed like you were saying that X would be mitigated by the linear typing, but you can imagine something else with a series of complex states
which made it sound like linear typing wouldn't be sufficient to model a series of complex states
Yeah I don't understand what my point was
I think I meant more like, you might have runtime requirements that would not be practically modeled with a series of distinct states
Like if you have a bunch of continuous parameters (numbers/strings/something else that I haven't thought about) that need to align in some way
oh for sure
and it might be practical, but also just not worthwhile
type systems aren't the answer for everything
even the stuff i've shown in rust, it's cool but unless you're otherwise committed to avoiding GC I think the cost is pretty high
Yeah, like if you have a combination of states you can have stuff like ```rs
trait ThermalState {}
struct Cooling; impl ThermalState for Cooling {}
struct Heating; impl ThermalState for Heating {}
trait AlarmState {}
struct NoAlarms; impl AlarmState for NoAlarms {}
struct MaintenanceAlarm; impl AlarmState for MaintenanceAlarm {}
struct JustRun; impl AlarmState for JustRun {}
struct NuclearReactor<Alarm: AlarmState, Thermal: ThermalState> {
/* ... */
}
it might be worthwhile in a nuclear reactor, but not in a rubber ducky store
assuming none of the duckies are nuclear powered
Well, you can play games with combinations of traits in many languages
What is unique to rust here is
impl Cooling {
    fn start_heating(self) -> Heating { ... }
}
The point being that start_heating takes self by value, which forces the state into a new type
cooling will be unusable after you call start_heating on it
yep
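Python has no move semantics, so the closest analogue is a runtime approximation. A rough sketch, assuming one class per state plus a hypothetical `_consumed` flag that emulates the move (the class names come from the Rust snippet, everything else is illustrative):

```python
class Cooling:
    """Reactor state in which only `start_heating` is available."""

    def __init__(self) -> None:
        self._consumed = False  # hypothetical flag emulating a Rust move

    def start_heating(self) -> "Heating":
        if self._consumed:
            raise RuntimeError("this Cooling state was already consumed")
        self._consumed = True
        return Heating()


class Heating:
    """Reactor state reached from Cooling; it has no `start_heating`."""


cooling = Cooling()
heating = cooling.start_heating()

try:
    cooling.start_heating()  # Rust rejects this at compile time
except RuntimeError as exc:
    reused_error = str(exc)
    print(reused_error)
```

The difference from Rust is when the mistake surfaces: here the reuse is only caught when the line actually runs.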
I'm sure it's old hat to many of you, but I ran across this error message when building Python today and laughed out loud in astonishment. Very nifty Easter egg.
.xkcd 2200
That's a great one
a thousand times i've seen this and it's still great each time
Compile again with -D RANDALL_WAS_HERE in your CFLAGS!
that is awesome
Thank you for the awesome feedback, you and @round path both! I'm implementing the adjustments and giving it a final proof, that post should be live in the next couple of days.
I'm going to hold off on publishing until gh-120437 is fixed - building on linux with --enable-experimental-jit and --with-pydebug on the 3.13 branch breaks something.
The bug is triggered by running ./python -m ensurepip, but I would think it's a JIT issue and not an ensurepip issue, that's just where I happened to stumble on it.
Hi there! I was unable to find any existing proposal or idea discussion, maybe this is the place where I can find help or thoughts. I'm curious if there is any particular reason for raise being not an expression but statement. Why users can't write code like:
do_something(on_event=lambda e: raise SomeError)
valid_x = x if is_valid(x) else (raise ValueError("invalid"))
Maybe it would be possible to change this to the expression, or allow function-like syntax (like it was for print function years ago) - so one can do raise(something) to force expression behavior. What do you think?
Some languages in the wild already behave this way, e.g. Scala:
val x = if true then 1 else throw RuntimeException("boom")
// val x: Int = 1
I think this is one of the reasons raise isn't an expression: you would be able to create more complex expressions that potentially do an important action (raise an exception). Instead of that, write:
if is_valid(x):
    valid_x = x
else:
    raise ValueError("x")
or just
if not is_valid(x):
    raise ValueError("invalid")
If you like this style, you can always make your own function: ```py
def throw(exc):
    raise exc
Scala is a more "expression oriented" language so maybe it makes sense there
yeah, statements is something you can't compose...
Apart from what @grave jolt said, I think allowing this would make the grammar ambiguous, because raise without arguments is legal syntax. That's the reason yield in an expression often requires lots of parentheses.
You could do the same as yield and require (raise) though
@feral island > raise without arguments is legal syntax
That's a point
Thank you!
In Python 2 raise x, y, z was also legal (I think it's type, value, traceback), that would have made an expression form even harder. Probably still fixable by requiring parentheses though. Python 3 simplifies the syntax, but this may help explain why it wasn't made an expression when the syntax was first added.
This reminds me of https://peps.python.org/pep-0463/ but you could argue it's not related at all
EAFPers in shambles
rejected
why must the coredevs always thwart golfing
I still think exceptions are a bit more annoying to deal with than they should be
Though not sure what I'd do about it apart from better documenting what can be raised in the stdlib
never liked EAFP cause of the verbosity
What an odd thing to say in the rejection notice, the glossary prefers EAFP https://docs.python.org/3/glossary.html#term-EAFP
if raise was literally a function call, you'd also create a lot of cyclic references
Wouldn't the function just drop itself from the traceback, in order to maintain backwards compatibility with today's tracebacks? And wouldn't dropping the frame representing the raise() call itself resolve the circular references you're talking about? Or are you talking about something other than the local variables of the raise() function
why, for a function with this signature:
def foo(ham, *eggs): ...
is ham a valid keyword argument?
eggs wouldn't be able to receive any arguments when ham is given as a keyword argument ```py
foo(1, ham=2) # multiple values for 'ham'
foo(ham=1, 2) # positional argument follows keyword argument
It's kinda sorta available as a function call in (() for () in ()).throw(e)
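A small sketch of both workarounds for raise-as-expression mentioned in this thread: a helper function, and the empty-generator trick. `is_valid`, `validated` and `throw_via_generator` are hypothetical names used only for illustration.

```python
def throw(exc: BaseException):
    """Raise `exc`; being a function call, it can be used as an expression."""
    raise exc


def is_valid(x: int) -> bool:
    # Hypothetical validity check, only for illustration.
    return x >= 0


def validated(x: int) -> int:
    # The conditional expression from the original question, using throw().
    return x if is_valid(x) else throw(ValueError("invalid"))


def throw_via_generator(exc: BaseException):
    # An empty generator expression; calling .throw() on it raises `exc`
    # without needing a helper like throw() above.
    return (() for () in ()).throw(exc)


print(validated(3))
```

Both forms add an extra frame to the traceback, which the raise statement does not.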
It's not, this is (one of the reasons) why / was added
foo(ham=1) works though
you can consider eggs a funny kind of default argument that doesn't let you assign it directly :)
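To make the ham/eggs behaviour concrete, a short sketch. `bar` is a hypothetical variant that uses `/` (Python 3.8+) to make ham positional-only.

```python
def foo(ham, *eggs):
    return ham, eggs


def bar(ham, /, *eggs):
    # `/` makes `ham` positional-only, so it cannot be passed by keyword.
    return ham, eggs


print(foo(ham=1))    # keyword form works; eggs stays empty
print(foo(1, 2, 3))  # extra positionals land in eggs

try:
    foo(1, ham=2)    # ham is bound twice
except TypeError as exc:
    print("foo:", exc)

try:
    bar(ham=1)       # rejected because of the `/`
except TypeError as exc:
    print("bar:", exc)
```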
This is actually very interesting to me
I was also under the impression this is something Python "recommends"
But looking at your link I don't think this is recommended
EAFP and LBYL are both mentioned
I gotta say it's wild to see how hard exceptions are still working (not just in python) to solve the same issues for so long. Multiple exceptions, transporting across threads, using conveniently as expressions
I feel like non exception solutions have solved the propagation issue a lot better and more easily
Are you talking about Result like solutions? That have an .Ok or a .Err option?
Pretty much
Particularly in conjunction with a propagation operator
Can you explain how this propagation operator works in "Python terms"? I'm not too familiar with Rust
Sure
Say foo returns a result.
let x = foo()?;
Expands very roughly to
let f = foo();
if f.is_err() return f.err();
let x = f.ok();
I'm going out of my way to avoid pattern matching here
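In "Python terms", that expansion could be sketched roughly like this. Ok, Err, foo and double are hypothetical names for illustration and not an existing library's API; the point is only that Python needs an explicit check where Rust writes `?`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Ok:
    value: object


@dataclass
class Err:
    error: Exception


Result = Union[Ok, Err]


def foo(text: str) -> Result:
    # Hypothetical fallible function: parse an int without raising.
    try:
        return Ok(int(text))
    except ValueError as exc:
        return Err(exc)


def double(text: str) -> Result:
    f = foo(text)
    if isinstance(f, Err):
        return f   # what `?` does: hand the error back to the caller
    x = f.value    # otherwise unwrap the success value
    return Ok(x * 2)


print(double("21"))
print(double("oops"))
```

Every fallible call needs those two extra lines, which is the verbosity that a language-level operator removes.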
But that means you have to handle the (potential) error right where you call the function. Exceptions let you handle them where it makes sense.
How is that "handling"? It's propagating
The only difference is that propagating happens explicitly via a single character instead of implicitly
Exactly, but if I want to handle the error five layers up, I need to touch all code in-between (even if it's "just one character"). I personally prefer it if the code "in the middle" doesn't have to concern itself with every possible error code "further down", if it doesn't want to.
i think there is value in indicating which lines can possibly error
So assume I have a function that returns a result which is a string of the response of an SQL query, or an error that occurred during that.
I can think of a few problems, but the first is that there are many types of errors that can occur here. IO errors, DB errors, Transient errors, etc.
Ideally I would want different layers to handle each type of error. How can I do this here?
That is a downside but the benefit is that you can see error points when reading - it's a trade off. But meanwhile non exception methods easily handle all the other issues with exceptions I just mentioned. That's why I view exceptions as more dysfunctional overall
There's a lot of ways but the immediate answer is that Result can just do the same thing that exceptions do
Erase the type, and have handlers downcast to handle particular kinds of errors
can you use some pattern matching magic to only propagate on specific values?
You can do stuff like that too, but it's probably easier to start by looking at Result code that is more analogous to exceptions
in the case of rust, you'd use a dynamic dispatch pattern a la
fn handles_some_errors() -> anyhow::Result<()> {
    let db = connect_and_do_things(); // returns result with multiple errors
    if let Err(err) = &db {
        // downcast the type-erased error to the one kind we handle here
        if let Some(ClusterError { cluster }) = err.downcast_ref::<ClusterError>() {
            println!("damn that failed, {cluster} is down");
        }
    }
    // do other things with db
    Ok(())
}
Hmm okay I'll look at some code examples
for maximum ergonomics
Pretty much
anyhow::Result is a type alias that only needs one type parameter, for the result
It carries any error
I do dislike exceptions because they break the control flow
Yeah, so there's also that part of it, with Result you need ? to propagate. So it's visible. But I do think that's a trade off.
But things like handling multiple errors or using fallible things as expressions are just pure wins I think
Is something like returns's Result container something you've tried and had success with? Or are there missing language constructs that make it less viable as a 3rd party package?
https://github.com/dry-python/returns?tab=readme-ov-file#result-container
Yes, not having ?
? is a magic language level operator
Without something like ?, propagating Result is kind of verbose
And you propagate errors a lot more than you handle them
Stupid question. Is an iterator always an iterable? why/why not?
The glossary says:
Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted.
but I vaguely remember someone saying that it's not the case
There was a rather extensive discussion on this. An iterator should be an iterable, but certain language constructs will work even if an iterator is missing __iter__. Others won't.
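A small sketch of what "some constructs work, some don't" looks like in practice. Countdown and Box are hypothetical classes, with Countdown deliberately omitting __iter__.

```python
class Countdown:
    """An 'iterator' that defines __next__ but deliberately omits __iter__."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class Box:
    """An iterable whose __iter__ hands out the broken iterator."""

    def __iter__(self) -> Countdown:
        return Countdown(2)


print(next(Countdown(2)))  # next() only needs __next__
print(list(Box()))         # iterating the iterable works

try:
    for value in Countdown(2):  # the for statement calls iter() first
        pass
except TypeError as exc:
    print("for loop failed:", exc)
```

So the object is usable as long as something else produces it, but it cannot be passed directly to anything that calls iter() on it.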
Do you have a link to the discussion? I'm curious why it could be controversial
what a snarky canadian
Lol!
man i need help i don't understand this
!e shouldn't this raise a TabError instead? (using exec since discord converts tabs to spaces)
exec("""if True:
\N{TAB}1
\N{SPACE}2""")
Doing it in the opposite order (space first, tab second) gives a TabError
:x: Your 3.12 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 1, in <module>
003 | exec("""if True:
004 | File "<string>", line 3
005 | 2
006 | ^
007 | IndentationError: unindent does not match any outer indentation level
!e ```py
exec("""if True:
\N{SPACE}1
\N{TAB}2""")
:x: Your 3.12 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 1, in <module>
003 | exec("""if True:
004 | File "<string>", line 3
005 | 2
006 | TabError: inconsistent use of tabs and spaces in indentation
!e I'm guessing it would show that error if there was a matching outer indentation level:
exec("""if True:
\N{SPACE}if True:
\N{TAB}1
\N{SPACE}2""")
:x: Your 3.12 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 1, in <module>
003 | exec("""if True:
004 | File "<string>", line 3
005 | 1
006 | TabError: inconsistent use of tabs and spaces in indentation
Ah, well, it doesn't reach it in this case.
!e what if there's more spaces?
s, t = " \t"
exec(f"""if True:
{t}1
{s * 9}2""")
:x: Your 3.12 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 2, in <module>
003 | exec(f"""if True:
004 | File "<string>", line 3
005 | 2
006 | IndentationError: unexpected indent
I'm officially a cpython contributor
https://github.com/python/cpython/pull/114090
What happens if you use a fake name in the CLA?
you're a bad person
(it is basically honor system)
what if I just don't want to reveal my real name to the PSF
@elder blade asked the PSF about that a while ago and they said that the email and address are most important and using an internet alias for the name is fine.
I think that's what I did
(That advice was to a specific instance of someone asking rather than a general policy afaik though, so it may be worth asking the PSF first yourself if you want to use a fake name)
I signed with my pseudonym!
Almost leaked my home address sending the whole document, so the pseudonym will have to do
I suppose it's better to sign it under your pseudonym, because they will be able to know which signed CLA is which contributor... in hindsight though I am not sure I won anything writing "Bluenix" instead of my real name
Here's a reminder to everyone to always store copies of all documents you sign! It should be easy-as-cake to find if necessary
Knowing which CLA is which contributor is done by matching the email address in the git commits against the email address in the signed CLA
Hmm, if you sign using the CLA bot do you even have to enter your name and physical address now
You do not; I'm pretty sure I never provided a physical address.
this seemed off topic for the discourse, so if you don't mind me asking, @feral island, is there a specific reason you chose to do PEP 749 for the implementation tweaks instead of just halting acceptance of PEP 649 to when a draft PR was out?
or was it just too much work to draft a PR before an official acceptance? cause the current situation seems quite odd with PEP 749
I am not the author of PEP 649, so I don't feel comfortable editing it
oh I see
Most of what's in PEP 749 I'd feel comfortable changing myself in an implementation PR, but the creation of a new stdlib module and the proposal for the future of from __future__ import annotations feel important enough that they should be called out in a PEP
yeah no doubt the annotationlib thing should get proper approval
Is Larry also helping you implement 749? wondering if the question is relevant to him or maybe this is just a weird case with the author vs implementor
I haven't heard from him for some time
I do feel like the future of the future import should probably have been defined better in PEP 649
But overall I'm happy that you did formalize it since the discussion there seems productive
thanks for the answer!
I'm just ready for 3.8 and 3.9 to be deprecated so we can use | as optional without importing futures.
i honestly still like Optional[Foo] over Foo | None
a tab is equivalent to 8 spaces
so 1 space is technically an unindent
You live in a nice world if you can stop using versions once they become deprecated haha. I still have to (thankfully not often) work on projects in Python 2
(this was changed in 3.13) (i looked at the wrong file)
I meant for personal stuff
I prefer to import typing the least I can
fair enough lol. Sometimes even if I'm on a 3.10 project I just write Optional to simply avoid thinking about the version
why? also typing is still crazy ubiquitous, Sequence and Mapping should be two of the most common type annotations
Okay, to rephrase, I prefer to use typing's verbose forms the least I can. I don't actually mind the import time
What I meant to say is that I don't have anything specific against Optional I just prefer the "native" syntax in all places even if it's a bit less explicit.
same goes for the new typevars
idk, Optional[Foo] is very very marginally more verbose than Foo | None
and it's also just like the vastly more common way to do it across languages
this case takes this branch https://github.com/python/cpython/blob/main/Parser/lexer/lexer.c#L499
Parser/lexer/lexer.c line 499
else /* col < tok->indstack[tok->indent] */ {```
col (of the space indent) is 1, as opposed to the previous col (of the tab indent), which is 8
so the python lexer takes that as a dedent
It's hard to explain it, I don't know. It's like how using list, dict, and tuple as annotations is an improvement. It's hard to say why because as you state it's not verbosity
maybe just feeling "first-class"?
i guess, I mean i think it's a pretty different situation
in one case you just drop the capitalization and avoid an import
but the previous dedent level (before indenting) is 0, and 1 != 0 https://github.com/python/cpython/blob/main/Parser/lexer/lexer.c#L506
Parser/lexer/lexer.c line 506
if (col != tok->indstack[tok->indent]) {```
so it raises an IndentationError instead of TabError
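The two-column bookkeeping described here can be modelled in a few lines of Python. This is a simplified sketch and not CPython's actual code: the function names are made up, and the rule that the alternate column counts a tab as a single step is inferred from the col/altcol values quoted in this discussion.

```python
TABSIZE = 8     # a tab advances the primary column to the next multiple of 8
ALTTABSIZE = 1  # in the alternate column a tab counts as a single step


def columns(indent: str) -> tuple:
    """Return (col, altcol) for a string of leading whitespace."""
    col = altcol = 0
    for ch in indent:
        if ch == " ":
            col += 1
            altcol += 1
        elif ch == "\t":
            col = (col // TABSIZE + 1) * TABSIZE
            altcol = (altcol // ALTTABSIZE + 1) * ALTTABSIZE
    return col, altcol


def check(indents: list) -> str:
    """Simulate the indent stacks for successive lines of leading whitespace."""
    stack, altstack = [0], [0]
    for indent in indents:
        col, altcol = columns(indent)
        if col == stack[-1]:
            if altcol != altstack[-1]:
                return "TabError"
        elif col > stack[-1]:
            if altcol <= altstack[-1]:
                return "TabError"
            stack.append(col)
            altstack.append(altcol)
        else:
            # Dedent: pop levels, then require an exact match on both columns.
            while len(stack) > 1 and col < stack[-1]:
                stack.pop()
                altstack.pop()
            if col != stack[-1]:
                return "IndentationError"
            if altcol != altstack[-1]:
                return "TabError"
    return "ok"


print(check(["\t", " "]))      # tab then one space
print(check([" ", "\t"]))      # one space then tab
print(check(["\t", " " * 8]))  # tab then eight spaces
```

In this model the tab-then-space case fails on the primary column first, so the mismatch in the alternate column is never compared.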
Yeah obviously the list annotation has no downsides, but I was comparing it to show the fact that I really care about that change even though it doesn't mean anything.
unironically I'm not a huge fan of the list/dict thing - tuple I don't mind so much.
it basically gives a higher precedence to a type annotation that is less likely to be correct
!e ```py
s, t = " \t"
exec(f"""if True:
{t}1
{s * 8}2""")
:x: Your 3.12 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "/home/main.py", line 2, in <module>
003 | exec(f"""if True:
004 | File "<string>", line 3
005 | 2
006 | TabError: inconsistent use of tabs and spaces in indentation
wdym?
because of the Sequence thing?
yeah
Yeah that's unfortunate. Unfortunately making Iterable/Sequence into builtins probably isn't going to fly
so you're saying because the list import is "easier to reach" than other collections.abc's it makes lazy people make wrong annotations?
yes
this was already a problem before
honestly I made this mistake - I was new to python typing and I thought List was appropriate to use a lot, I didn't yet know about Sequence
in my experience, people who care about annotations usually try to find the best annotation whether it requires an import or not
people who don't will use a list annotation for most things anyway
(and that's okay)
my worldview on it is basically that if you don't really care about typing you shouldn't need to import typing
idk - I'm a living breathing counter-example
and if you care it's okay that you import it
are you? It makes sense to me that a new user (past you) wouldn't bother with Sequence
honestly the collection inheritance tower was a bit overwhelming to me when starting
so in a way I'm glad it's not all a builtin? Maybe only Iterable I would want because of generators
I'm not a new user
new to python typing specifically - sure
It's not that I didn't "bother" with Sequence - I just didn't really know about it
I'm familiar with the "tower" from other languages - e.g. Kotlin
that's fair
anyhow, list being first-class and Sequence not makes it "feel" like you should annotate with list a lot and Sequence more rarely but it's actually the reverse
it would be funny to do a survey of Optional[Foo] vs Foo | None on the python reddit or something
I agree, I just think there is value for common types to be "discovered" first as to create a gradual introduction to typing
I have been asking for the water cooler category on the discourse for a few weeks now
I hope eventually it will happen, this could really fit on it.
from a quick google skim or two it seems like Foo | None is more popular - or at least people think it's better because it's newer
who knows
I want @feral island 's opinion
I think that FastAPI recommends Foo | None in their docs
It's better because it's less confusing. Optional[] does not mean the argument is optional
I think whatever FastAPI recommends doing the opposite seems appealing
which is probably a very high traffic page
someone gave that argument. I think it's a pretty funny argument given multiple languages use Option or Optional. but I guess it is what it is.
I've definitely seen people write foo: Optional[int] = 0
if I create a dataclass like so:
@dataclass
class A:
param: Optional[int]
because the argument is optional
they still have to supply a value
it's not optional
but that's valid? I don't really understand
It's not what they meant
hard for me to tell I guess what you mean
they meant that the Optional annotates you don't need to pass a value
The parameter is an optional one that takes an int and doesn't have a reason to accept None. But people write foo: Optional[int] = 0
but that's not what it does
I don't find this confusing, nor do I think it's particularly compelling that this would confuse people - but obviously, everything will confuse someone
I'm sure someone has been confused by this, N > 1 times
Do you think this also isn't confusing?
no, I don't think it's confusing
I mean I would expect this annotation to mean "passing param is optional, you don't have to"
that's what it says
I wouldn't
it says that it's optional whether or not param contains an int
i think maybe this is a thing for people who are less used to static typing, primarily
To me it reads as "This dataclass accepts a param which is optional, and its type is an int"
I do understand that it actually reads "This dataclass accepts a param which is optionally an int", but it's confusing
like, most statically typed languages have names for this kind of thing that is either Option or something very similar to Option (like Maybe) - nobody in Haskell is getting confused that a function parameter of Maybe type, only maybe has to be passed.
i do wonder how often Optional appears without assigning a default value
these days you'd say param: int | None
which I like more
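A quick sketch of the two meanings being argued about. A is the dataclass from the example above; B is a hypothetical contrast with a default value and no None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class A:
    param: Optional[int]  # may be None, but must still be passed


@dataclass
class B:
    param: int = 0        # may be omitted, but None is not intended


try:
    A()                   # no default, so the argument is required
except TypeError as exc:
    print("A() failed:", exc)

print(A(None))
print(B())
```

Optional[int] only widens the accepted type. Whether the argument can be left out depends solely on the default value.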
Kotlin
!e here's all the cases where TabError can be raised: ```py
tab = '\t'
space = ' '
space_8 = space * 8
for i, example in enumerate((
    # col (8) == TOIS (8), altcol (8) != TOAIS (1)
    f"""if 1:
{tab}1
{space_8}2""",
    # col (8) > TOIS (1), altcol (1) == TOAIS (1)
    f"""if 1:
{space}1
{tab}2""",
    # col (8) > TOIS (2), altcol (1) < TOAIS (2)
    f"""if 1:
{space}if 2:
{space * 2}if 3:
{tab}4""",
    # col (8) < TOIS (16), altcol (8) != TOAIS (1)
    f"""if 1:
{tab}if 2:
{tab * 2}3
{space_8}4""",
)):
    try:
        exec(example)
    except TabError:
        print(f"TabError raised for example {i}")
    else:
        break
else:
    print("All examples raised TabError")
yeah, I think @halcyon trail is arguing against that though, no?
but "nullable" often differs from optional subtly
it's not 100% consistent, but often
mk wait
python's "optional" is really closer to "nullable" fwiw
:white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | TabError raised for example 0
002 | TabError raised for example 1
003 | TabError raised for example 2
004 | TabError raised for example 3
005 | All examples raised TabError
they're saying they like Optional better: #internals-and-peps message
That's why Optional is a bad name
that's probably actually the strongest reason
but only 1% of people know or care about that
Nullable isn't great either, as python doesn't have a Null, but Noneable is terrible
!e a weird case where TabError isn't raised ```py
tab = '\t'
space = ' '
exec(f"""if 1:
{tab}if 2:
{space * 8}{space}3
{tab}4""")
:warning: Your 3.12 eval job has completed with return code 0.
[No output]
Noneable?
this kind of debate is one of the reasons in the docs we're leaning to using more English to describe types instead of strict annotations.
@thick hemlock usually, optional types support nesting, nullable types do not.
Optional[Optional[Foo]] in python is meaningless - Union types in python are automatically collapsed
I think you're greatly underestimating
which is actually a big headache sometimes
we don't need a name for the concept, it's just a union with None
Uhm I'm underestimating what percentage of people know the difference between nullable and optional types? I don't think so.
Underestimating the people that annotate it wrong because they don't know the difference
that's totally different, I don't think my comment that you responded to was saying what you think it was
there is no possible way to resolve this. it's opinion/guess vs opinion/guess.
I was replying to ned
he was talking about something else anyway so it doesn't matter
sorry, my bad! misunderstood then
i dunno, that specific union is actually much more common than others
^
But it's not any different from others. You can still talk about "union with None"
I agree and I also don't know if it's actually needed in the stdlib, but a term for it would be useful for me
I mean in a purely mechanical sense, sure, it's not different. Practically it is - that's why you see distinctions around it in almost every language.
i often say "okay, so for this feature we would need to make this field optional" and what I actually mean is nullable
if you're using for example type annotations to help you deserialize something
it would be extremely common to have special handling for Optional[Foo]
"we would need to allow None for this field"
that's a good point, I've written code like that
I have too, and python really makes you suffer
because you can have a type annotation like Optional[Union[Foo, Bar]] - okay, you tell your framework how to deserialize Union[Foo, Bar], right?
And now it has the generic logic for Optional[T], it knows how to deserialize T - life is good, right?
sure, that works actually
not actually though because python just collapses your type
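The collapsing is easy to see at runtime with typing.get_args. A minimal sketch, where int and str stand in for the Foo and Bar of the example:

```python
from typing import Optional, Union, get_args

# Nested Optional collapses to a single level.
nested_is_same = Optional[Optional[int]] == Optional[int]

# Optional around a Union is flattened into one three-member Union, so the
# inner "Union[int, str]" layer is no longer visible to a framework.
flattened = get_args(Optional[Union[int, str]])

print(nested_is_same)
print(flattened)
```

A deserializer that dispatches on "an Optional wrapping some T" therefore never gets to see T as a unit.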
anyhow - it is what it is I guess. but I will take note that Foo | None is the idiomatic recommendation, plus what Ned said makes me reconsider
(plus recalling my own bad experiences)
we just have a mixed C++ and python codebase, and in C++ it's std::optional - so Optional comes very naturally
if we were to continue that earlier topic, are there more things in typing you wish you had as a builtin?
I personally would like Literal to be easier to write, but i don't really know how that would work
Maybe Any but that ship has sailed because of any()
but also maybe that's too permissive to even have as something that should be easy to reach
Yes, it runs into the same issue that you're making something easy that arguably shouldn't be easy
mhm
Any is sometimes useful for me when transitioning codebases without annotations
I'm not entirely sure I agree with that argument myself
It feels like it goes against "practicality beats purity". People often have to use Any in practice
I'm actually not a fan of Literal - like, values as types is no joke and I think people often misunderstand how it works
heck, python doesn't define clear rules for it so I'm not sure that I understand how it works in python - despite being very very familiar with how it works in another language (C++)
The only place I'd ever consider using literal in my own code is to type legacy - otherwise use an enum or something
I don't know, I feel like if you're using Any you're probably just being lazy about your type. I've seen people annotate a JSON-able dict as dict[str, Any] which is plain wrong
it's wrong but until relatively recently it's the only thing mypy really supported
i love the recursive types being supported
but even specifying here another level would be useful
there was a whole discussion here a while ago, where I brought this up, and people were saying that properly typing json as a recursive union didn't matter
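For reference, a recursive JSON alias can be sketched as below. JSON and depth are hypothetical names, and whether a given type checker accepts the recursive alias depends on the tool and its version.

```python
from typing import Dict, List, Union

# String forward references let the alias refer to itself.
JSON = Union[Dict[str, "JSON"], List["JSON"], str, int, float, bool, None]


def depth(value: JSON) -> int:
    """Return the nesting depth of a JSON-like value (scalars have depth 0)."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


print(depth({"a": [1, {"b": None}]}))
```

Compared with dict[str, Any], this keeps the checker involved when the nested values are used.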
I use it a lot when working with discriminators in pydantic. Are you aware of that use case or should I elaborate more?
and it goes with enums
This is the basic example in pydantic's docs:
class Cat(BaseModel):
pet_type: Literal['cat']
meows: int
class Dog(BaseModel):
pet_type: Literal['dog']
barks: float
class Lizard(BaseModel):
pet_type: Literal['reptile', 'lizard']
scales: bool
class Model(BaseModel):
pet: Union[Cat, Dog, Lizard] = Field(..., discriminator='pet_type')
n: int
discriminator is an OpenAPI thing that basically lets pydantic change its behavior from "try parsing as cat, try parsing as dog, try parsing as lizard" to "check the discriminator field and only parse as that"
which leads to much better error messages
this is just not a good way to do this - it's giving every single Cat a pet_type field which is entirely redundant
well, I guess it's not necessarily, because pydantic is kind of its own weird little world
but in the rest of python, that's what that would mean
You do need this pet_type though
you don't need every instance to have it - it's the property of the class
python instances already know their class
I mean, in Python you don't, but this is not relevant to just pydantic, any serialization of this to JSON will require this information
but the types above annotate as though every instance of Cat has it
my point is that you can achieve this in other ways - I've written generic code that serializes and deserializes Unions
This is also useful in non-discriminator use cases. Consider an enum for, let's say, an image processing type. I want my implementation to use that enum but to annotate it only supports 3 of the 5 processing types
so I would use a literal here to say that
I don't know any better way
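A minimal sketch of that idea, assuming a five-member enum. The member names and the `process` function are made up for illustration; only the name `ImageProcessingType` comes from this conversation. `Literal` accepts enum members, and since annotations are not enforced at runtime the function also checks the value itself:

```python
from enum import Enum
from typing import Literal, get_args


class ImageProcessingType(Enum):
    # Hypothetical members; the chat only says there are five of them.
    BLUR = "blur"
    SHARPEN = "sharpen"
    RESIZE = "resize"
    ROTATE = "rotate"
    DENOISE = "denoise"


# The three kinds this (hypothetical) implementation supports.
SupportedKind = Literal[
    ImageProcessingType.BLUR,
    ImageProcessingType.SHARPEN,
    ImageProcessingType.RESIZE,
]


def process(kind: SupportedKind) -> str:
    # A type checker can flag process(ImageProcessingType.ROTATE);
    # at runtime the annotation does nothing, so check explicitly as well.
    if kind not in get_args(SupportedKind):
        raise ValueError(f"unsupported processing type: {kind.name}")
    return f"processing with {kind.value}"


print(process(ImageProcessingType.BLUR))
```

The annotation documents the supported subset inline, which is the convenience being argued for here; the runtime check is optional and only there because callers may not run a type checker.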
sure, you could have this only appear at serialization but pydantic models are generally just used for that and not as internal entities in your app so it's fine
btw, what the rest of the python world would probably do in that example is something like pet_type: ClassVar[str] = "cat"
the point is that the Literal annotation doesn't actually give you anything, doesn't improve type checking in any way
it's just a string fed to pydantic internals for serialization/deserialization
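For comparison, here is a rough sketch of the `ClassVar` approach using plain dataclasses instead of pydantic. The registry and the `dump_pet` / `load_pet` helpers are hypothetical names for illustration, not any library's API. The tag lives on the class, and it only shows up in the serialized form:

```python
from dataclasses import asdict, dataclass
from typing import ClassVar


@dataclass
class Cat:
    pet_type: ClassVar[str] = "cat"  # class-level tag, not an instance field
    meows: int


@dataclass
class Dog:
    pet_type: ClassVar[str] = "dog"
    barks: float


# Hypothetical registry mapping the tag back to the class.
PET_CLASSES = {cls.pet_type: cls for cls in (Cat, Dog)}


def dump_pet(pet) -> dict:
    # asdict() skips ClassVar attributes, so add the tag explicitly.
    return {"pet_type": type(pet).pet_type, **asdict(pet)}


def load_pet(data: dict):
    fields = dict(data)  # copy so the caller's dict is not mutated
    cls = PET_CLASSES[fields.pop("pet_type")]
    return cls(**fields)


print(dump_pet(Cat(meows=3)))
print(load_pet({"pet_type": "dog", "barks": 1.5}))
```

This keeps the discriminator out of `__init__` and out of every instance, at the cost of writing the dispatch by hand.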
that is fair and I'm not gonna re-ignite the discussion about runtime usage of annotations vs type checking
what about this?
you just create a new enum, that has 3 of the 5 types, and a function to convert between them (or throw)
it's pretty much exactly the same thing, except the enum is being explicit about the fact that it's effectively a new type
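Roughly what that would look like, as a sketch. All member names, `FastProcessingType`, and `to_fast` are invented for illustration; the chat only mentions an enum with five processing types of which three are supported:

```python
from enum import Enum


class ImageProcessingType(Enum):
    # Hypothetical members of the full enum.
    BLUR = "blur"
    SHARPEN = "sharpen"
    RESIZE = "resize"
    ROTATE = "rotate"
    DENOISE = "denoise"


class FastProcessingType(Enum):
    # Hypothetical subset: the three kinds this implementation supports.
    BLUR = "blur"
    SHARPEN = "sharpen"
    RESIZE = "resize"


def to_fast(kind: ImageProcessingType) -> FastProcessingType:
    """Convert to the subset enum, or raise if the kind is unsupported."""
    try:
        return FastProcessingType(kind.value)
    except ValueError:
        raise ValueError(f"{kind.name} is not supported here") from None


print(to_fast(ImageProcessingType.SHARPEN))
```

The conversion happens once at the boundary, and everything past it can take `FastProcessingType` without further checks.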
The problem is that people who aren't very familiar with typing tend to fall into this trap - static checks good, so more static checks better, so they try to heavily use Literal for things like this
and then it doesn't always behave in ways they understand - there are things that may be obvious to a user, but which python's static type system cannot understand
what behaviors are hidden here? I've always viewed it as a pretty simple feature
people who have actually worked with statically typed languages, where these things need to be 100% correct (they're not just annotations after all) tend to understand that pulling values into a type system is no joke
Python is different in that though
you can't just treat it as it's statically typed
Yes, because the static type system isn't terribly well defined
so when you work with Literal in particular, mypy vs pylance vs whatever will very often give you different answers
I've seen it a lot
I know this is oversimplifying and ignoring a looot of use cases
but really all I wanted with this example is for someone to get a warning when they explicitly pass a value they shouldn't
I don't want to create a new type for this
you already did?
wdym?
Literal[1, 2, 3] is a type
sure - but it's still a type. So just say - you don't want to write a few lines out of band
i was about to say that in the end most people's reasoning is just to save a few characters
not very compelling
I don't know why you don't find that compelling
and it's less nice for the person using your function - they can get auto completion on both the enum, and the enumerators
idk, I feel like "optimize for the reader not the writer" is pretty widely accepted - that's what it's based on
saving a few characters to me just isn't a big deal. Having to consider the ramifications of a more poorly defined type system, less readability, less help for tooling - all these matter more
I feel like you have to take into account that the typing system is not like in static language. The person can simply annotate it as being the enum ImageProcessingType and not give any annotation to what values of it are supported. This is often the difference in PRs between telling them to "just add a literal annotation" which wouldn't get pushback, or to create a new enum for this which would.
I am not saving a few characters to codegolf
I'm saving characters to encourage people to use type hinting.
When it's simpler and shorter to annotate something - people will do it more. Easier annotations create better annotations.
I don't have an issue with people not type hinting
I have seen a lot of people avoid TypeVar because of this, and I'm so glad PEP 695 passed
It's still saving a few characters no matter how you slice it - there's much bigger obstacles to typing a codebase than this
If people choose not to type things because of this - they're going to have a lot of untyped code
I'm talking about situations where the codebase is typed but I want a new function to be typed better, if possible.
I would pass the PR if the annotation was simply ImageProcessingType and the first line would check the value and raise an exception if needed
but I would prefer if it specified the values
its not "typed" vs "untyped", it's how well it's typed
An enum is obviously typed at the call site. When passing "foo", it's very far from clear that it's one of 3 legal values
Also it matters what characters you're saving. That type being inline saves you from needing to find another name for the enum
which is often hard for these subsets
Anyhow I just don't think we'll agree, and I think the convo reinforces my previous beliefs.
Hard for the writer but good for the reader
I agree with the sentiment
it's just not always practical
something something practicality beats purity
Uhm you can disagree with me but please don't call it impractical
It's fully practical
I do it. It's very easy.
what do you mean? I didn't say it's impractical, I said it's not always practical
there are situations where I can do what you suggest
in others, not
There's no situation where you can't do it - it may take a couple extra lines, yes, but you can
Again, I think that's idealistic. Codebases are not either "typed" or "untyped". It's a spectrum
This is becoming bad faith so let's stop
Oh, sorry
I didn't mean for that to happen at all, I was actually okay with the discussion
sorry it felt like that :(
I did not mean at all to say anything in bad faith, sorry if any of my messages sounded like it
Have a good day/night!
There is an ergonomic issue with Literal regarding type inference.
If you write x = [Color.red, Color.green], x is inferred as list[Color]. But it would be silly to infer the type of ["red", "green"] as list[Literal["red", "green"]]
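To make the inference point concrete, a small sketch. The comments describe what checkers such as mypy typically infer, which is an assumption about tool behavior and not something this code verifies; `ColorName` is a made-up alias:

```python
from enum import Enum
from typing import Literal


class Color(Enum):
    red = "red"
    green = "green"


# Checkers infer list[Color] here without any annotation.
x = [Color.red, Color.green]

# Checkers typically widen this to list[str], not a list of literals.
y = ["red", "green"]

# To get the literal type you have to spell it out yourself.
ColorName = Literal["red", "green"]
z: list[ColorName] = ["red", "green"]

print(x, y, z)
```

At runtime `y` and `z` are the same list of strings; the difference only exists for the type checker.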
Literals are extremely popular in TypeScript and you can do e.g. ["a", "b"] as const and it will infer it as readonly ["a", "b"].
However, TypeScript has a lot of uses for literals outside of just enum values, so that probably warrants some extra tooling
That's a very good example
does python exponentiation use exponentiation by squaring for integer exponents
If the exponent is less than 61 bits long, it uses the left-to-right binary exponentiation algorithm (algorithm 14.79 in this book).
If the exponent is larger than that, it uses "left-to-right k-ary sliding window exponentiation" (algorithm 14.85 in the same book).
I'm not sure I totally understand the second algorithm, but the first algorithm is the exponentiation by squaring that you already know, except working from the most significant bit of the exponent down to the least significant.
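Essentially this: ```py
def pow(base, exp):
    # Left-to-right binary exponentiation for a non-negative integer exponent.
    if exp == 0:
        return 1  # bit_length() is 0 here, so the shift below would fail
    result = 1
    bit = 1 << (exp.bit_length() - 1)  # mask for the most significant bit
    while bit > 0:
        result *= result  # square once for every bit of the exponent
        if bit & exp:
            result *= base  # multiply when the current bit is set
        bit >>= 1
    return result
```
You can see for yourself here: https://github.com/python/cpython/blob/a86e6255c371e14cab8680dee979a7393b339ce5/Objects/longobject.c#L4848
Objects/longobject.c line 4848
`long_pow(PyObject *v, PyObject *w, PyObject *x)`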
why doesn't 'ba'[:10] cause an index error
slicing a sequence doesn't raise an error, instead it crops the range
It would be a huge pain to have to write user_input[:min(10, len(user_input))] instead
in other words, you almost never want it to error in the case when you have more elements
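A quick illustration of the difference between slicing and indexing. This uses only built-in behavior; `slice.indices()` is handy for seeing the bounds that a slice would really use on a sequence of a given length:

```python
s = "ba"

# Slicing clamps out-of-range bounds to the sequence length.
prefix = s[:10]

# slice.indices() returns the (start, stop, step) that would actually
# be used for a sequence of this length.
start, stop, step = slice(None, 10).indices(len(s))

# Indexing a single position does not clamp, it raises.
try:
    s[10]
    index_failed = False
except IndexError:
    index_failed = True

print(prefix, (start, stop, step), index_failed)  # ba (0, 2, 1) True
```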
:white_check_mark: Your 3.12 eval job has completed with return code 0.
0 4
(unlike for example list/string indexing, where JavaScript made a mistake in the early days, and returned undefined when the element doesn't exist at an index)
I mean, fwiw, I think it's perfectly reasonable to error here too
and some languages do
which
rust for example
Java I think
i probably prefer erroring tbh
if you want to say "I want up to index 10 or the length, whichever is smaller" - why not be explicit and say that
but I don't find it a hugely common use case
isn't python about explicitness
python is about having a mantra that says that it's explicit
Well, Rust is different in that it doesn't treat strings as "arrays of characters for some definition of 'character' (since we're using unicode)"
it's nothing to do with strings really - we can just talk about vectors as well
fn main() {
    let v = vec![1,2,3];
    let u = &v[..5];
}
this will panic in rust
yeah that is true
another reason why it makes sense in rust is that rust offers non-throwing/panicking API for slicing.
the same way that python has a .get for say dict, that does not throw
so, this behavior is useful in conjunction with that
fn main() {
    let v = vec![1,2,3];
    if let Some(u) = &v.get(..5) {
        println!("{:?}", u);
    }
    else {
        println!("Oops!");
    }
}
it's really convenient implicitly cutting off
this prints "Oops"
v.get and v[] have to be consistent with each other
if you implicitly cut off you would not be able to easily branch on whether the slice fit. It would also feel less consistent with regular indexing.
you can always have a different named function for that if you want - at least you can in rust, since things like ..5 are just a type you can pass anywhere
I think it makes more sense for the default behavior to "fail fast"
that's also part of the zen of python
Errors should never pass silently.
I guess that's the tradeoff that was made when slicing was first developed. Kinda hard to go one way or another for me
the problem I guess for me is that in python the "erroring" version is so awkward that nobody will ever write it- people will just say "yeah, i'm confident these strings have at least 10 characters"
I think you literally have to do
assert len(s) >= 10
x = s[:10]
in rust, the default errors, so you a) have no choice, and b) it's still a one liner s[..min(10, s.len())]
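If you did want the erroring version in Python without the two-liner, a small helper would do. `strict_prefix` is a made-up name, not something from the standard library, and this is only a sketch of the idea:

```python
def strict_prefix(seq, n):
    """Return seq[:n], but raise instead of silently returning fewer items."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if len(seq) < n:
        raise IndexError(f"need at least {n} items, got {len(seq)}")
    return seq[:n]


print(strict_prefix("hello world", 5))

try:
    strict_prefix("ba", 10)
except IndexError as exc:
    print("refused:", exc)
```

Unlike the `assert` version, this still checks when Python runs with `-O`, which strips assertions.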
With strings, it would be let s2: String = s.chars().take(10).collect(); in Rust
unless you want a slice of the first 10 bytes for some reason
s.chars()[..10]?
no, chars() is an iterator
so to use slice you need to turn it into list smh
You don't need to do that
You can just get the byte index of the nth character
String slices.
ah, and then split_at
that is also possible
s.split_at(s.char_indices().nth(10).unwrap_or(s.len())).0??
I'm confused
Just use char indices and then slice
Oh I misunderstood, for some reason I thought slicing a &str returns a &[u8]
Still, ```rs
{
    // char_indices() yields (byte_index, char) pairs, so keep only the index
    let split_pos = s.char_indices().nth(10).map(|(i, _)| i).unwrap_or(s.len());
    &s[..split_pos]
}
```
I'm actually not quite sure what people use string slicing for most of the time in Python
for slicing strings
For sure. You could write a convenience function. But orthogonally to the rest of the convo, rust emphasizes performance more so it makes sense to encourage a performant approach
Slicing utf 8 strings by character just isn't a cheap operation
So Python is hiding the cost somehow
python has 3 different representations for strings
all of them have O(1) access
If str has O(1) access then that just means that it's storing a map from characters to bytes
I.e. it's using extra space
Basically it's just storing that char_indices you get above on the fly
it doesn't tho?
#internals-and-peps message
Then I'm not sure what to say, obviously there's something going on
Utf-8 is a variable length encoding
How is Python getting you the nth character in O(1) time?
Uh the message you linked to literally says it wastes space
Just not the kind of waste I was suggesting
Basically it's not really a UTF-8 string
no it's not
it uses latin-1, ucs2 or ucs4 depending on the characters in the string
Yeah strings aren't stored as UTF-8 internally
Yeah I was a bit surprised by that
they're stored as either ascii, latin1, ucs2, or utf32 depending on the largest codepoint in them. and the utf-8 representation is cached the first time it's needed.
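You can observe the different widths from Python, at least on CPython. This is a sketch that relies on `sys.getsizeof` reflecting the internal layout, which is a CPython implementation detail and may not hold on other interpreters. `bytes_per_char` is a made-up helper:

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate storage per character by comparing two string sizes."""
    # The fixed object header cancels out in the subtraction.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


samples = {
    "ASCII 'a'": "a",
    "Latin-1 e-acute": "\xe9",
    "BMP euro sign": "\u20ac",
    "astral emoji": "\U0001f600",
}

widths = {label: bytes_per_char(ch) for label, ch in samples.items()}
for label, width in widths.items():
    print(f"{label}: {width} byte(s) per character")
```

Because every character in a given string has the same width, indexing is a plain offset calculation, which is where the O(1) access comes from.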
But anyhow, different trade offs
What rust is doing makes sense for rust. Most times you're doing slices it's after finding a character or something like that and you have the byte index
A recent suggestion to change this: https://github.com/faster-cpython/ideas/issues/684
Yeah so basically what he is proposing is closer to how I already assumed (incorrectly) it worked
Oh wow this is very recent
Does the file size matter for the python interpreter?
I mean is there a size x lines of code (or bytes) where it's better to have two files and one import the other rather than all code being on one file?
at some point it's going to run into issues but that's only at really huge sizes. You should think about this in terms of readability not in terms of what the interpreter supports
Personally I'd start thinking about splitting up a file at 1000 lines or so, or just if it feels like it's becoming hard to understand
I won't edit the file, it's generated
If it's generated code I wouldn't worry about it
cool, thanks
if it is generated you can run into another set of issues
python parser is written in C, and there are some assumptions about parsed code
one of them is that code is not nested very deeply (current limit is somewhere about 30, iirc)
if you violate these assumptions, parser can crash the process
my knowledge was a bit outdated
this (with 99 levels) works perfectly: ```py
if 1:
    if 1:
        if 1:
            if 1:
                if 1:
                    if 1:
                        if 1:
                            if 1:
                                ...  # a lot more lines
                                print(42)
```
with 100 levels it errors: `IndentationError: too many levels of indentation`
i guess it was changed several versions ago
it will not be nested very deeply, 10 levels maybe of blocks of ifs and classes and whatever
it's just long, not sophisticated
```py
for _ in [0]:
    for _ in [0]:
        for _ in [0]:
            for _ in [0]:
                for _ in [0]:
                    for _ in [0]:
                        for _ in [0]:
                            for _ in [0]:
                                for _ in [0]:
                                    for _ in [0]:
                                        for _ in [0]:
                                            for _ in [0]:
                                                for _ in [0]:
                                                    for _ in [0]:
                                                        for _ in [0]:
                                                            for _ in [0]:
                                                                for _ in [0]:
                                                                    for _ in [0]:
                                                                        for _ in [0]:
                                                                            for _ in [0]:
                                                                                print(42)
```
this works (20 levels)
if more - `SyntaxError: too many statically nested blocks`
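If you want to probe these limits on your own interpreter instead of trusting the numbers above, here is a small sketch that generates nested source and tries to compile it. `nested` and `compiles` are made-up helper names, and the exact thresholds are whatever your Python version enforces:

```python
def nested(header: str, depth: int) -> str:
    """Build source with `depth` nested blocks, one space of indent per level."""
    lines = [" " * level + header for level in range(depth)]
    lines.append(" " * depth + "pass")
    return "\n".join(lines) + "\n"


def compiles(source: str) -> bool:
    """Return True if the source compiles, False on a syntax-level error."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


for header in ("if 1:", "for _ in [0]:"):
    for depth in (10, 20, 21, 99, 100):
        print(f"{header!r} x {depth}: {compiles(nested(header, depth))}")
```

`compile()` only parses and compiles, so nothing in the generated code actually runs.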
I'd try to solve it another way long before 20 levels of that :}
But I get what you mean
if performance is important and you want to micro-optimize the code, there are other things you could do
for example: if you have a lot of try-except/with blocks in one function, and exceptions occur pretty often, you can split this function into several to make exception handling faster
this is because in recent versions "zero-cost try-except" was implemented, which stores all information about try-except blocks (and with blocks, because they are basically the same thing) in one table. If an exception occurs, this table is decoded to figure out where the code that handles this exception is. More try-except -> bigger table -> slower exception handling
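You can see the table grow by looking at `co_exceptiontable` on a code object. That attribute exists on CPython 3.11 and later, so the sketch below falls back to `None` elsewhere. The two functions are arbitrary examples, and this only shows the table size, not how much slower handling actually gets:

```python
def one_try(x):
    try:
        return 1 / x
    except ZeroDivisionError:
        return 0


def three_tries(x):
    try:
        a = 1 / x
    except ZeroDivisionError:
        a = 0
    try:
        b = 2 / x
    except ZeroDivisionError:
        b = 0
    try:
        c = 3 / x
    except ZeroDivisionError:
        c = 0
    return a + b + c


def exception_table_size(func):
    """Length in bytes of the exception table, or None if unavailable."""
    table = getattr(func.__code__, "co_exceptiontable", None)
    return None if table is None else len(table)


print("one_try:", exception_table_size(one_try))
print("three_tries:", exception_table_size(three_tries))
```

Whether splitting a function is worth it is something to measure with a profiler; the table size alone does not tell you that.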
it is not important now, and if/when it will be I'll write C or something else probably
I've been working on a fresh branch for superinstructions... suffice to say something's gone a little wrong
For some reason, with more than one instruction, it's not unrolling the switch that handles multiple ops. Basically:
```c
// -D_JIT_OPCODES={1,2} -D_NUM_UOPS=2
opcodes[] = _JIT_OPCODES;
for (int i = 0; i < _NUM_UOPS; i++) {
    uopcode = opcodes[i];
    switch (uopcode) {
#include "executor_cases.c.h"
    }
}
```
#include in a switch block
Tools/jit/template.c line 112
`#include "executor_cases.c.h"`
Does anyone know of a way to specify in argument clinic that an argument should be a mapping?
I know I can do a dictionary subclass check but as far as I can tell protocols are unsupported by argument clinic
Given that PyMapping_Check basically documents that it's impossible to check whether or not an arbitrary object is a mapping, I'm guessing the answer is "no", though I've never used argument clinic and I don't know for sure
!d PyMapping_Check
`int PyMapping_Check(PyObject *o)`
*Part of the [Stable ABI](https://docs.python.org/3/c-api/stable.html#stable).* Return `1` if the object provides the mapping protocol or supports slicing, and `0` otherwise. Note that it returns `1` for Python classes with a [`__getitem__()`](https://docs.python.org/3/reference/datamodel.html#object.__getitem__) method, since in general it is impossible to determine what type of keys the class supports. This function always succeeds.
The two C-implemented functions I could think of that accept a mapping (str.format_map and dict.update) don't use argument clinic
Ok! Sounds like I should just take object and do the check in the implementation
Is it even possible to determine whether an object is definitely a mapping before doing anything mapping-y with it?
Lots of objects have a __getitem__ (sequences, generic classes, probably something else)
Yeah that is true
With further restrictions
you could base the definition off of collections.abc.Mapping, which should make lists not mappings
https://docs.python.org/3/glossary.html#term-mapping
A container object that supports arbitrary key lookups and implements the methods specified in the collections.abc.Mapping or collections.abc.MutableMapping abstract base classes
Actually dict.update accepts some weird thing, not a mapping
In [19]: dict.update??
Docstring:
D.update([E, ]**F) -> None. Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
Type: method_descriptor
(Something with just getitem and keys is not a mapping according to a glossary)
But the documentation for dict() says that it wants a mapping (https://docs.python.org/3.10/library/stdtypes.html#dict); in fact that overload only requires keys() and __getitem__
These informally defined "protocols" tend to be very squishy
>>> class mymap(list):
...     def keys(self):
...         return range(len(self))
...
>>> dict(mymap([1, 2, 3]))
{0: 1, 1: 2, 2: 3}
No, it's even worse. dict.update documentation (which is... different from the docstring? what?) says that it wants a dictionary
https://docs.python.org/3.10/library/stdtypes.html#dict.update
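To show how loose this is in practice, here is a sketch with a class that is not a `Mapping` by the glossary definition but still works with both `dict()` and `dict.update`. `KeysAndGetitem` is a made-up class for illustration:

```python
from collections.abc import Mapping


class KeysAndGetitem:
    """Not a Mapping: it only has keys() and __getitem__."""

    def __init__(self, items):
        self._items = list(items)

    def keys(self):
        return range(len(self._items))

    def __getitem__(self, key):
        return self._items[key]


obj = KeysAndGetitem("abc")

as_dict = dict(obj)          # works: only keys() and __getitem__ are used
updated = {}
updated.update(obj)          # same duck-typed path as dict()
is_mapping = isinstance(obj, Mapping)

print(as_dict, updated, is_mapping)  # {0: 'a', 1: 'b', 2: 'c'} {0: 'a', 1: 'b', 2: 'c'} False
```

So a check against `collections.abc.Mapping` is stricter than what these functions actually accept.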
@lapis spade See #how-to-get-help and make a help channel in #1035199133436354600
OK
when did python3 drop int objects and just use c-long objects and make that better instead?
3.0
Hmm.. are there any more resources on using llvm-bolt with CPython other than the docs ? I am running into this issue on Ubuntu 24.04 and I can't find any good links for this error :/
BOLT-INFO: 62781 instructions were shortened
BOLT-INFO: removed 158 empty blocks
BOLT-INFO: UCE removed 918 blocks and 55709 bytes of code
BOLT-INFO: padding code to 0x1400000 to accommodate hot text
BOLT-ERROR: library not found: /usr/lib/libbolt_rt_instr.a
make[1]: *** [Makefile:854: profile-bolt-stamp] Error 1
make[1]: Leaving directory '/home/ichard26/Downloads/Python-3.12.4-bolted'
make: *** [Makefile:883: bolt-opt] Error 2
I installed llvm-bolt from apt.
wasn't there an idea to partially bring this back into python for faster-cpython
i havent seen any issue on that repo like that
ok nvm
they just did a different way to spell the current implementation on <=3.11

can features be removed after an (No new features beyond this point.) point?
they'd probably make it to the next minor release
removed
3.13.0 beta 1: Wednesday, 2024-05-08 (No new features beyond this point.)
like i see this, can these features be removed before an RC?
i mean a feature removal is sort of a "new feature"
no major changes to the minor version are expected to come after the beta freeze
immediately after beta of the current dev minor is released the main repo would shift versioning to the next minor release
some bug fixes could be backported but that's about it
Maybe if a feature is found to be too problematic to release it can be pushed back?
Shouldn't be something that regularly happens though
For example if it's found to be breaking a lot more use cases than expected
You should just say the specific situation though
Yes, features can be dropped or removed during the beta phase
so basically RCs are what can be reliably relied on for no more changes?
or really not set in stone until final release?
I think so people can release wheels during the RC phase and they'll continue to work (for possibly an extended definition of "work")
ah okay
sounds like a generally sensible motivation yeah
I'll read about it more though
is there a PEP or somewhere where this is formally laid out?
weird how that applies only to the ABI since other things can break as well
but there are probably good reasons for it
Not sure, can't find it now. https://devguide.python.org/developer-workflow/development-cycle/#release-candidate-rc has something but doesn't mention the ABI part
but https://discuss.python.org/t/python-3-12-0-release-candidate-1-released/31137 mentions no ABI changes
It's finally happening! Python 3.12.0 rc1 is here! As a reminder, until the final release of 3.12.0, the 3.12 branch is set up so that the Release Manager (me) has to merge the changes. Please add me (@Yhg1s on GitHub) to any changes you think should go into 3.12. This is the first release candidate of Python 3.12.0 This release, 3.12.0rc1, i...
thanks, i think something like that should be formally stated somewhere, i'll open a discuss thread on this
General-use tools and libraries (e.g. mypy or Black) should also be developed outside the python organization, unless core devs (as represented by the SC) specifically want to "bless" one implementation (as with e.g. typeshed, tzdata, or pythoncapi-compat).
https://devguide.python.org/developer-workflow/development-cycle/#release-candidate-rc
i'm confused by this, aren't repos inside the github.com/python and github.com/psf orgs effectively part of the python org?
no
are you confused by the fact that mypy is inside the python org despite this statement?
yes.
I guess that's mostly for historical reasons
Well Black is actually OK as the psf/ organisation is meant to house community projects anyway.
https://www.python.org/psf/github/, as linked on the org page for https://github.com/psf
I think the main reason for this policy was that Łukasz unilaterally put black in /python/
how was that received?
Haha :)
https://www.python.org/download/alternatives/
i can't seem to find the part to add a PR to this https://github.com/python/pythondotorg/tree/main/downloads
https://github.com/python/pythondotorg/blob/2a45f8cbb0254f7f0a62e5aba0e6dd2349686559/fixtures/sitetree_menus.json#L2006
is the closest i got
fixtures/sitetree_menus.json line 2006
`"title": "Alternative Implementations",`
I am not entirely understanding your question, but I can't find that page either
I'd probably forward you to https://discord.gg/MbVsKyrc, this doesn't really have anything to do with Python internals
Wow thanks embed
Better
this channel isn't necessarily just for the internals.
wait is this an official server?
yes
Basically
It's where the core team hangs out
But not everyone likes it
Some of the core team still only uses Discuss/mailing lists/etc
There's three Discord servers
The docs one and the PyPA ones are public
And then there's a private one for core devs
that's fair
I'll note that the PyPA and CPython project are independent of each other, although there's considerable overlap in core team membership.
Huh so the devguide docs for asan/ubsan seem rather (to put it lightly!) out of date: https://devguide.python.org/development-tools/clang/
I tried just setting -fsanitize=address and -fsanitize=memory, and when I build CPython I get a bunch of errors from frozen modules, which I assume is because they are immortals if I understand correctly?
Aha, I thought I remembered there being asan builders https://github.com/python/buildmaster-config/blob/main/master/custom/factories.py#L244
I should update the dev guide based on this
```py
from tkinter.filedialog import askopenfile
```
seems like i found a bug for 3.13t
python3.13t crashes entirely on this import
https://github.com/python/cpython/issues/118973 seems to be the closest issue, not sure if i should make my own
Free threaded I presume
free threaded yeah
I presume the crash is the same reporting an access violation?
i don't get that same behavior no, probably because i'm not using a debugger or something
ahh, maybe i should comment on that being the crasher
a case when Exception > BaseException
don't you think this should change?
oh, it did.
interesting.
i can't reproduce this on 3.12.
in 3.11 i can still reproduce this behavior.
thank you irit https://github.com/python/cpython/issues/77757
I'm wondering if that isn't a breaking change. maybe someone based their workflow on the fact that this allowed them to dodge base exceptions?
dodging base exceptions seems like "something happens? let it be. it is what it is."
hey, you're here as well
typehints fans are everywhere
!delete #type-hinting
deleted
amazing
how does object()+x manage to call x.__radd__ if there's no default implementation of object.__add__ which returns NotImplemented
there is a default impl for the comparison operators though
type(rhs).__radd__(rhs, lhs) is tried before type(lhs).__add__(lhs, rhs) iff type(rhs) is a strict subclass of type(lhs) that provides its own __radd__. Which, if type(lhs) is object itself and type(rhs) is not, will always be the case
This is generally true for all binary-operation dunders
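A small demonstration of the dispatch order. The class names are invented for illustration. It covers three cases: a subclass on the right that overrides `__radd__`, an unrelated type on the right, and plain `object()` on the left, where there is no `__add__` to try at all:

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Child(Base):
    def __radd__(self, other):  # overrides the reflected method
        return "Child.__radd__"


class Unrelated:
    def __radd__(self, other):
        return "Unrelated.__radd__"


# Right operand is a subclass with its own __radd__: it gets priority.
print(Base() + Child())        # Child.__radd__

# Right operand is unrelated: the left operand's __add__ runs first.
print(Base() + Unrelated())    # Base.__add__

# object defines no __add__, so Python goes straight to the reflected method.
print(object() + Unrelated())  # Unrelated.__radd__
```

The last case is the one asked about above: a missing `__add__` is treated the same way as one that returns `NotImplemented`.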
I think it is, but not 100% sure. I don't know why there's the default impl for those on object, but not for the other dunders
hmm its true for them too
Is there a way to enable PyStats builds for windows? I don't see it in the build.bat options
I don't particularly need it, but I'm wondering if it's something that I need to worry about while poking at PyStats
hm i think theres a default impl for gt/ge/lt/le because the same C function object_richcompare has to handle all the comparison ops including eq and ne 
Objects/typeobject.c line 6412
`object_richcompare(PyObject *self, PyObject *other, int op)`
yes, I think that's right. At the C level you must have either none or all of those
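This is easy to check from Python: `object` has all six rich comparison dunders, the ordering ones just return `NotImplemented`, while there is no arithmetic dunder like `__add__` at all:

```python
a, b = object(), object()

# The ordering methods exist on object but decline to answer.
lt_result = object.__lt__(a, b)

# Equality between two distinct plain objects also declines at this level;
# the == operator then falls back to an identity comparison.
eq_result = object.__eq__(a, b)

# Arithmetic dunders are simply absent from object.
has_add = hasattr(object, "__add__")

# With both sides declining, the < operator raises TypeError.
try:
    a < b
    ordering_failed = False
except TypeError:
    ordering_failed = True

print(lt_result, eq_result, has_add, ordering_failed)
```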
Greetings! I'd like to add a way to detect which REPL (basic or PyREPL) is in use, mostly to be able to test that the PYTHON_BASIC_REPL environment variable is working as it should. That's a very weak use case, could you help me come up with better ones?
I've described a couple of lame use cases on a Discourse thread[1], but I'm bad at coming up with interesting ones. Maybe it would help to conditionally enhance the REPL with readline/history recording if it was detected the basic one was in use?
Fair enough. Thank you for your time in considering this. For context, I imagined a scenario where the instructor would ask students to run a script to make sure their environment is properly configured (e.g. in a venv with the right name, with the right Python version, with the right directories to put code on etc.) and whether PyREPL is enabl...
Why would like to do this if you can't find strong use cases?
what's the motivation?
I added a unit test that only checks that PYTHON_BASIC_REPL is set on os.environ when passed as an env var, it should ideally check that the right REPL was chosen. And not checking that bothers me. Not a good reason, I know
A CPython unit test?
That sounds like good enough reason to put something in test.support within CPython at least
Not sure about making it a public API, that would require a use case external to our own test suite
Is there a discord server where I can unban myself
Yes, a CPython unit test.
also here's a possible way to tell the difference ```
% PYTHON_BASIC_REPL=1 ./python.exe
Python 3.14.0a0 (heads/pep649-inspect-dirty:6e078fd344, Jun 12 2024, 19:54:37) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys._getframe(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    sys._getframe(1)
    ~~~~~~~~~~~~~^^^
ValueError: call stack is not deep enough
>>> ^D
% ./python.exe
Python 3.14.0a0 (heads/pep649-inspect-dirty:6e078fd344, Jun 12 2024, 19:54:37) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys._getframe(1)
<frame at 0x102dc75b0, file '/Users/jelle/py/cpython/Lib/code.py', line 91, code runcode>
```
That's awesome, maybe good enough for test.support
If we're doing "cursed ways to detect whether the REPL is written in pure Python or not", here's my contender:
~/dev/cpython (main)⚡ % PYTHON_BASIC_REPL=1 ./python.exe
Python 3.14.0a0 (heads/main:ead676516d, Jun 25 2024, 10:30:23) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> len(inspect.stack()) > 1
False
>>> exit()
~/dev/cpython (main)⚡ % ./python.exe
Python 3.14.0a0 (heads/main:ead676516d, Jun 25 2024, 10:30:23) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> len(inspect.stack()) > 1
True
Oh, this actually might be a non-cursed way of figuring out whether the REPL is written in pure Python or not @glass mulch: pure-Python modules always have a __file__ dunder:
~/dev/cpython (main)⚡ % PYTHON_BASIC_REPL=1 ./python.exe
Python 3.14.0a0 (heads/main:ead676516d, Jun 25 2024, 10:30:23) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> hasattr(sys.modules["__main__"], "__file__")
False
>>> exit()
~/dev/cpython (main)⚡ % ./python.exe
Python 3.14.0a0 (heads/main:ead676516d, Jun 25 2024, 10:30:23) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> hasattr(sys.modules["__main__"], "__file__")
True
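Wrapped up as a helper, the check from the transcript above might look like the sketch below. `main_module_has_file` is a made-up name, not an existing `test.support` function, and the result is only a heuristic based on what was observed on a 3.14.0a0 development build:

```python
import sys


def main_module_has_file() -> bool:
    """Heuristic: True when the __main__ module has a __file__ attribute.

    In the transcript above this was True under the new REPL and False
    under the basic one; it is also True when running a normal script.
    """
    main = sys.modules.get("__main__")
    return main is not None and hasattr(main, "__file__")


print(main_module_has_file())
```

It only says something about the REPL when called from an interactive session, so a test would need to run it inside a spawned interpreter.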
Non-cursed is even better! Let me test something real quick...
That does solve the "how was Python initialized REPL-wise" part
Technically that only tells if the entry point is Python, not if the REPL is
it may work for the two possible REPLs that exist today, but definitely doesn't work in general for all possible REPLs
Fair enough, but that might be good enough in the context of CPython's test suite
yeah, definitely true - I didn't see that was the context, I assumed this was for a library
for CPython's test suite, it's definitely true that the set of possible REPLs is known
on that note I just love the new REPL
I have been missing that as a built-in feature for so long lol
is it actually better than ipython
I can't even remember the last time I used the "standard" python repl
haven't used it enough yet but gonna go ahead and say probably not
I use ipython where I can
but I use the REPL a lot on servers and generally a lot of things that aren't my main computer
so being builtin is a big advantage
i guess it depends what you're doing, what workflows you use, etc
you can throw micromamba on that server and have python, ipython, and all the third party packages that your heart desires inside of 2 minutes
those servers are not always connected to the internet
then yeah, that's annoying
we have servers like that as well, but micromamba is what my work uses for C++ and python environments, so we have our own conda-forge channel hosted, so that works.
also sometimes for troubleshooting I want to run a REPL in a docker container
why must the coredevs always thwart golfing