#internals-and-peps

wind shell
#

it did more loops

#

but it's faster

rose schooner
wind shell
#

the first did fewer loops

#

actually faster

#

so, the conversation was good but i've gotta go.

#

bye

rose schooner
#

just so it won't take too long to time something

dusk comet
#

Try it again with bigger initial fast_thing:
fast_thing=(1,)*1000

radiant garden
#

testing performance with your shell time is like measuring the temperature of tea by tossing the thing into lava and seeing how quickly it evaporates

#

but also starting the stopwatch the moment you toss the liquid

#

it tells you something, and it might even be related to what you want to know!
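A more controlled way to compare two snippets than timing the whole shell command is the standard-library `timeit` module. It repeats the statement many times inside one process, so interpreter startup isn't part of the measurement. This is a minimal sketch: the two functions are hypothetical stand-ins, since the loops being compared aren't shown here, and `data` just mirrors the `(1,)*1000`-sized input suggested above.

```python
import timeit

def fewer_loops(data):
    # hypothetical variant A: a single pass over the data
    return sum(x * 2 for x in data)

def more_loops(data):
    # hypothetical variant B: two passes (build a list, then sum it)
    doubled = [x * 2 for x in data]
    return sum(doubled)

data = tuple(range(1000))

for func in (fewer_loops, more_loops):
    # 5 rounds of 1000 calls each; the minimum is the least noisy estimate
    times = timeit.repeat(lambda: func(data), number=1000, repeat=5)
    print(f"{func.__name__}: {min(times):.4f} s per 1000 calls")
```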

quick trellis
#

attempting to redefine function with modified ast; should this be enough?
func.__code__ = compile(tree, '<preprocessed>', 'exec')

quick trellis
#

wait no that's dumb

#

or not
not sure if im losing info like this or what

#

maybe i need a way to compile only the FunctionDef node

#

is that possible

quick trellis
#

^ solution: compile(tree, '<preprocessed>', 'exec').co_consts[0]
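The reason the `co_consts` trick works is that compiling a module whose body is a single `def` produces a *module* code object. That code object executes the `def` statement, and the function's own code object is stored among the module's constants. Assigning the module code directly to `func.__code__` swaps in the wrong code object. Below is a minimal sketch of the full round trip. It assumes the source is available as a string (`inspect.getsource` can supply it for functions defined in files). The `greet` function and the transformer are hypothetical. The nested code object is looked up by type and name instead of by position, because the layout of `co_consts` is an implementation detail.

```python
import ast
import types

SOURCE = '''
def greet(name):
    return "Hello, " + name
'''

namespace = {}
exec(SOURCE, namespace)
greet = namespace["greet"]

class ReplaceGreeting(ast.NodeTransformer):
    """Hypothetical transform: swap the "Hello, " constant for "Hi, "."""
    def visit_Constant(self, node):
        if node.value == "Hello, ":
            return ast.copy_location(ast.Constant("Hi, "), node)
        return node

tree = ReplaceGreeting().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
module_code = compile(tree, "<preprocessed>", "exec")

# The function's code object is one of the module code's constants.
func_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType) and const.co_name == "greet"
)
greet.__code__ = func_code  # same number of free variables (none), so allowed
print(greet("world"))  # Hi, world
```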

gray galleon
#

does python have a frozen dict type

feral cedar
#

no

gray galleon
#

so if i have to store dicts in sets i have to convert it into an immutable type like tuple or frozen dataclass?

raven ridge
#

yes
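One common way to do that conversion for dicts whose values are themselves hashable is `frozenset(d.items())`. Like dict equality, it ignores insertion order. `tuple(sorted(d.items()))` also works when the keys are mutually comparable. A minimal sketch:

```python
records = [{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 3}]

# dicts are unhashable, so store an immutable snapshot of their items instead
seen = set()
for record in records:
    seen.add(frozenset(record.items()))

print(len(seen))  # 2: the first two dicts are equal, so their snapshots are too

# convert back to ordinary dicts when needed
restored = [dict(snapshot) for snapshot in seen]
```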

gray galleon
#

how about making a hashable dict subclass

raven ridge
#

I'm not sure - that might work
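A hashable dict subclass can work, with one big caveat that comes up later in this thread: if the dict is mutated while it is stored in a set or used as a key, lookups break. The hypothetical `HashableDict` below therefore also blocks the usual mutating methods. Treat it as a sketch rather than an airtight type. For example, `dict.__setitem__(d, k, v)` still bypasses the block, and every value must be hashable.

```python
class HashableDict(dict):
    """Hypothetical read-only dict subclass that can live in sets."""

    def __hash__(self):
        # hash by contents, ignoring order; requires hashable values
        return hash(frozenset(self.items()))

    def _read_only(self, *args, **kwargs):
        raise TypeError("HashableDict is read-only")

    # block the common mutating methods so the hash cannot go stale
    __setitem__ = __delitem__ = __ior__ = _read_only
    clear = pop = popitem = setdefault = update = _read_only


a = HashableDict(x=1, y=2)
b = HashableDict(y=2, x=1)
print(len({a, b}))  # 1: equal dicts, equal hashes

try:
    a["z"] = 3
except TypeError as exc:
    print(exc)  # HashableDict is read-only
```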

feral cedar
#

i think there's a frozendict that uses a HAMT on pypi

gray galleon
#

also it's weird to think that not all collections in python are recursive data structures
sets cannot contain sets
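A set can contain other sets as long as the inner ones are `frozenset`s. Only the mutable `set` type is unhashable. A minimal sketch:

```python
outer = {frozenset({1, 2}), frozenset({3})}
print(frozenset({2, 1}) in outer)  # True: frozensets compare by contents

nested = None
try:
    nested = {{1, 2}}  # a plain (mutable) set as an element of another set
except TypeError as exc:
    print("TypeError:", exc)
```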

feral cedar
median palm
gray galleon
unkempt rock
#

hello everyone
As a full-stack developer, I have 6 years of experience in Python
If you have any problem, just feel free to ask

gray galleon
#

is there a reason why reduce is tucked away in functools while map and filter aren’t?

feral island
gray galleon
#

aren’t map and filter just as rarely used as reduce, since comprehensions exist?

feral island
spark magnet
feral island
#

e.g. the mypy source code has about 11 uses of map(), none of filter and reduce

gray galleon
#

i guess comprehensions and genexprs become more unwieldy when you introduce multiple collections so there is still merit in using map?
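`map` accepts several iterables at once and walks them in parallel, stopping at the shortest one. With a named function, that can read more cleanly than a comprehension over `zip`. A minimal sketch using `operator.add`:

```python
import operator

prices = [10, 20, 30]
taxes = [1, 2, 3]

with_map = list(map(operator.add, prices, taxes))
with_zip = [p + t for p, t in zip(prices, taxes)]

print(with_map)              # [11, 22, 33]
print(with_map == with_zip)  # True
```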

feral island
#

if you want to map/filter using a named function, it's shorter to use map/filter instead of a comprehension

#

e.g. this one in mypy str_ver = ".".join(map(str, python_version))

#

the alternative would be something like ".".join(str(part) for part in python_version)

#

which is longer and requires you to come up with a variable name

gray galleon
#

str(_) for _ in python_version 😎

gray galleon
#

does python use chaining or open addressing for hash collisions
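The answer isn't preserved in this log, but CPython's `dict` and `set` both use open addressing. On a collision they probe other slots of the same table rather than chaining entries in per-bucket lists. The toy table below is a simplified sketch of the perturbed probe recurrence described in the comments of CPython's `Objects/dictobject.c` (sets use a variation that adds some linear probing). `ToyOpenAddressingTable` is a hypothetical name, and the sketch omits deletion, resizing, and the real dict's split index/entry arrays.

```python
class ToyOpenAddressingTable:
    """Hypothetical toy hash map using open addressing (no chaining)."""

    PERTURB_SHIFT = 5

    def __init__(self, capacity=8):
        # capacity must be a power of two so that `& mask` acts as modulo
        if capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.mask = capacity - 1
        self.slots = [None] * capacity  # each slot is None or (key, value)
        self.used = 0

    def _probe(self, key):
        # treat the hash as an unsigned 64-bit value, as the C code does
        perturb = hash(key) & 0xFFFF_FFFF_FFFF_FFFF
        i = perturb & self.mask
        while True:
            yield i
            perturb >>= self.PERTURB_SHIFT
            i = (5 * i + 1 + perturb) & self.mask

    def __setitem__(self, key, value):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                # keep at least a third of the slots empty so probing ends
                if (self.used + 1) * 3 > (self.mask + 1) * 2:
                    raise RuntimeError("toy table is full (no resizing)")
                self.slots[i] = (key, value)
                self.used += 1
                return
            if slot[0] is key or slot[0] == key:
                self.slots[i] = (key, value)  # overwrite an existing key
                return

    def __getitem__(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                raise KeyError(key)  # an empty slot ends the probe sequence
            if slot[0] is key or slot[0] == key:
                return slot[1]


table = ToyOpenAddressingTable()
for k in (0, 8, 16, 24, 32):  # all start probing at slot 0 in an 8-slot table
    table[k] = str(k)
print([table[k] for k in (0, 8, 16, 24, 32)])  # ['0', '8', '16', '24', '32']
```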

quick snow
fallen slateBOT
#
**PEP 603 - Adding a frozenmap type to collections**
Status

Draft

Created

12-Sep-2019

Type

Standards Track

native flame
#

that looks interesting

dusk comet
dusk comet
#

Also "oct" and "memoryview"

grave jolt
#

also id

dusk comet
#

id is useful for debugging

rose schooner
#

why shouldn't set be hashable? it's the only built-in iterable that automatically removes duplicates and stores only hashable elements

#

i get that it's mutable but that doesn't give a good enough reason as to why it shouldn't be hashable

#

can we not do dynamic hashing like
```py
# python pseudocode; added .last_hashed_size field
def set_hash(self):
    if self.hash == -1 or len(self) != self.last_hashed_size:
        ...  # hash the set
        self.last_hashed_size = len(self)
    return self.hash
```
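Invalidating the cached hash by size alone has a gap: a mutation that keeps the size the same (remove one element, add another) leaves a stale hash behind. Below is a minimal sketch of the pseudocode as a hypothetical `set` subclass, `CachedHashSet`. It assumes `hash(frozenset(self))` for the elided "hash the set" step.

```python
class CachedHashSet(set):
    """Hypothetical set subclass that caches its hash and recomputes it
    only when len() changes, as in the pseudocode above."""

    def __init__(self, *args):
        super().__init__(*args)
        self._hash = None
        self._last_hashed_size = -1

    def __hash__(self):
        if self._hash is None or len(self) != self._last_hashed_size:
            self._hash = hash(frozenset(self))  # assumption: hash by contents
            self._last_hashed_size = len(self)
        return self._hash


s = CachedHashSet({1, 2, 3})
before = hash(s)
s.discard(3)
s.add(4)  # same size, different contents
after = hash(s)

print(before == after)              # True: the cache was not invalidated
print(after == hash(frozenset(s)))  # False: it no longer matches the contents
```

Exact invalidation wouldn't fully fix this either. If the hash changes while the set is stored in another container, lookups still go to the wrong bucket, which is what the rest of the thread demonstrates.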

flat gazelle
#

there is frozenset

#

Hashing set by contents doesn't make sense for the same reason hashing a list by contents doesn't make sense

rose schooner
elder blade
#

That's a good thing, because now it is clear in the code that you'll need to reinsert it wherever it is stored

flat gazelle
#

it is more or less impossible to write a hash-based data structure that can handle the hash of its contained elements changing

#

the hash being stable is fairly important since that's what decides where in the data structure the thing goes

dusk comet
#

If the hash changes, you will get weird behaviour (segfault, memory leak, idk)

rose schooner
#

after considering the implementation of set and dict i've concluded that the worst that can probably happen is duplication of set-type elements

flat gazelle
#

I doubt you can segfault, but it will not actually work

quick snow
#

!e syntactic sugar omitted for brevity

from functools import reduce
from operator import xor
class Set:
    def __init__(self, elements):
        self.elements = set(elements)
    def __hash__(self):
        return reduce(xor, self.elements)

d = {}
s = Set((2,3,4))
d[s] = 42
s.elements.add(1)
print(d[s])
fallen slateBOT
#

@quick snow :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 13, in <module>
003 | KeyError: <__main__.Set object at 0x7f2ae91b05d0>
wind shell
#

I don't really know why PEP 8 specifies that top-level functions have to be separated by 2 blank lines. I don't like it

flat gazelle
#

!e

from functools import reduce
from operator import xor
class Set:
    def __init__(self, elements):
        self.elements = set(elements)
    def __hash__(self):
        return reduce(xor, self.elements)

d = {}
s = Set((2,3,4))
d[s] = 42
s.elements.add(1)
s2 = Set((1,2,3,4))
print(d[s2])
this also won't work
fallen slateBOT
#

@flat gazelle :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 14, in <module>
003 | KeyError: <__main__.Set object at 0x7f7cfdc80690>
dusk comet
dusk comet
rose schooner
halcyon trail
#

You won't be able to retrieve the object, which makes the data structure useless

rose schooner
#

wdym by "screw up the data structure"

#

is that the issue i said earlier

#

after considering the implementation of set and dict i've concluded that the worst that can probably happen is duplication of set-type elements

halcyon trail
#

If you mutate a key while it's in a dict

#

You can duplicate but obviously part of that is also that you won't ever find the original element

halcyon trail
#

Well that defeats the whole purpose of the data structure

flat gazelle
#

the only correct way to do this operation is to remove the element and rehash and reinsert it.

halcyon trail
#

Seems like a good enough reason to disallow mutable keys

flat gazelle
#

which you can already do with frozenset
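With `frozenset` keys, "mutating" a key becomes the explicit remove, rebuild, reinsert sequence described above, so the dict never holds an entry under a stale hash. A minimal sketch:

```python
d = {frozenset({2, 3, 4}): 42}

old_key = frozenset({2, 3, 4})
new_key = old_key | {1}  # builds a new frozenset; old_key is unchanged

# remove under the old hash, then reinsert under the new one
d[new_key] = d.pop(old_key)

print(d[frozenset({1, 2, 3, 4})])  # 42
print(old_key in d)                # False
```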

halcyon trail
#

Right

#

In languages with mutation control like C++ and rust

#

Mutable types can be keys

#

They just prevent mutation while it's in the dict

#

Python doesn't have that option

rose schooner
flat gazelle
#

that is, remove, mutate, readd

#

if you can't afford the copies

rose schooner
#

yeah that's probably faster

flat gazelle
#

gods that is a minefield upon thinking about it

#

good luck

halcyon trail
#

I think Java and Kotlin don't try to enforce this the way python does

#

I'm not sure if it's that big a minefield

#

But it is definitely a minefield. Only the size is in question 🙂

flat gazelle
feral island
#

(because those are going to look in the wrong hash table bucket)

halcyon trail
#

Right

halcyon trail
#

I guess crashing cpython with python code is always a bug

#

Hard for me to get used to the managed language mentality 😛

feral island
#

and sys.setrecursionlimit(100000000)
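The recursion limit is what normally turns runaway recursion into a catchable `RecursionError` rather than a crash. Raising it as far as that line does removes the guard, which is how such code can take down the interpreter. Here is a minimal sketch of the guarded behaviour with the default limit left alone:

```python
import sys

def recurse(depth):
    # no base case: only the recursion limit stops this
    return recurse(depth + 1)

def hits_recursion_limit():
    try:
        recurse(0)
    except RecursionError:
        return True
    return False

print(sys.getrecursionlimit())  # 1000 by default in CPython
print(hits_recursion_limit())   # True
```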

halcyon trail
#

I'm not familiar with code objects

#

For setrecursionlimit, sure; same with forcing enormous allocations or manually sending signals into the Python interpreter

feral island
#

enormous allocations should just fail with MemoryError

rose schooner
rose schooner
fallen slateBOT
#

@rose schooner :warning: Your 3.11 eval job has completed with return code 139 (SIGSEGV).

[No output]
halcyon trail
#

I don't see how you can avoid memory failures

#

By default, Linux doesn't tell you when you allocate more memory than is available

#

Allocate a lot of memory and the OS will just eventually kill your program

#

Windows is different

radiant garden
#

if creating a MemoryError fails with oom you're probably in trouble and are about to get sniped by the OS

#

But I'm not sure that counts as crashing as opposed to getting killed externally

flat gazelle
#

The way the Linux kernel works by default is by letting you allocate more memory than is available, and if you use too much of it, you get killed. To my knowledge you can't actually stop that from happening as a process

rose schooner
flat gazelle
#

The kernel won't tell you that you are overallocating and will die soon.

#

At least afaik

quick snow
#

You can definitely get MemoryErrors on Linux. malloc() can return NULL, and when the OOM killer gets you is a configuration thing IIRC

radiant garden
#

You'll raise it if you try to allocate something silly

halcyon trail
#

When does python raise MemoryError

#

Is it based on something other than malloc returning null

quick snow
halcyon trail
#

I don't disagree it can be useful

#

But not nearly that often I think

grave jolt
raven ridge
flat gazelle
#

indeed

#

you can get a memory error on linux by default via something silly like [0] * 2 ** 40
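A large-but-plausible request might fail immediately, or it might succeed and get the process OOM-killed later. Which happens depends on the overcommit settings discussed in this thread. A request that can never be satisfied fails deterministically, though. As far as I can tell, CPython's own size check rejects a list whose pointer array would exceed the largest possible allocation, so it raises `MemoryError` before any memory is touched. A minimal sketch; the item count is chosen for 64-bit builds.

```python
import sys

def attempt_allocation(n_items):
    """Try to build a list of n_items zeros and report what happened."""
    try:
        data = [0] * n_items
    except MemoryError:
        return "MemoryError"
    except OverflowError:
        # n_items does not even fit in a C Py_ssize_t
        return "OverflowError"
    return f"allocated {len(data)} items"

# 8 bytes per list slot on 64-bit builds, so this many slots would need
# more than sys.maxsize bytes: the request cannot possibly succeed.
too_many = sys.maxsize // 8 + 1
print(attempt_allocation(too_many))  # MemoryError
print(attempt_allocation(10))        # allocated 10 items
```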

brave ore
full parcel
#

Is there any function-related programming in Python? Please suggest the best online website

quick snow
raven ridge
#

it's usually disabled in favor of overcommitting, though, because fork makes it extremely easy for a child process to inherit many gigabytes of memory that it has no intention of actually using. In practice, overcommitting is usually the better call for a POSIX system. That rationale doesn't apply to Windows, though, because it has CreateProcess instead of fork+exec

gray galleon
#

but its still an option

gray galleon
gusty vessel
#

fwewe

#

few

tawdry pond
amber nexus
rose schooner
#

3.9 is like 2 years ago

amber nexus
#

For 10 bucks I think that's a very solid deal regardless, yeah the interpreter changes but a lot of the fundamentals are remaining the same

rose schooner
amber nexus
#

Different economies I suppose 👀

white wren
#

Hey

paper echo
#

i'm happy to spend $15 on an e-book or donate to someone making good blog content, but i don't really want to purchase a physical book that will go out of date soon

broken sluice
#

Is there a forum here to discuss small change proposals to cpython internals, or even python's requirements as a language? I have a change I want to have an online discussion about, the scope is quite small though so posting a PEP would seem like too much red tape for what it does

#

are there core devs present on discord?

quick snow
charred wagon
broken sluice
#

I'll make a public shared doc where I can keep track of all the counterpoints made and respond to each of them just once. Let's try that at least... I will post the link to the doc when I'm done writing it

broken sluice
#

https://docs.google.com/document/d/1et5x5HckTJhUQsz2lcC1avQrgDufXFnHMin7GlI5XPI/edit?usp=sharing
It is shared for viewing and commenting
I'm open for a discussion in here or comments on the doc... thanks!

halcyon trail
#

I would just figure out if there's a reason for that

#

The whole purpose of that variable is to make hashing deterministic for testing or reproducibility purposes

#

No obvious reason to exclude None

flat gazelle
#

for the same reason object() and function hashes are non-deterministic, it's just computed off of the memory address, rather than any deterministic value. Which I do agree is kind of dumb.

broken sluice
#

I do mention the possibility of making the hash not constant, but rather a deterministic function of the hash secret (which makes it so if you specify PYTHONHASHSEED, it will be constant across your runs).
That's arguably a better choice from a practical POV

halcyon trail
#

It didn't even occur to me they would apply that to None

#

I'm not really a fan of allowing hash-by-address to start with
Most languages don't allow it

dusk comet
#

im getting same results in every interpreter launch
im not using PYTHONHASHSEED
3.11

grave jolt
#

that's because None happens to be on the same address every time

broken sluice
#

Yes. The values are different on every run only on systems that apply ASLR

halcyon trail
#

It's not a coincidence

grave jolt
#

For me the values are different, I'm on Linux

halcyon trail
#

Those sessions are open at the same time it looks like

#

The binary only gets loaded into memory once when you run an executable multiple times

broken sluice
#

but ASLR is a pretty important infosec feature, so "disable it if you want your hashes to be stable" is a pretty bad argument

grave jolt
#

.wiki ASLR

neon troutBOT
#
Wikipedia Search Results

Address space layout randomization
Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In

IOS
Darwin 21. iOS 16 is based on Darwin 22. In iOS 6 the kernel is subject to ASLR, similar to that of OS X Mountain Lion. This makes exploit possibilities

halcyon trail
#

Yeah no argument

grave jolt
#

ah icic

dusk comet
#
>py -c "print(hash(None))"
-9223363242347854132

ok, now it is different

halcyon trail
#

Did you close all the sessions and relaunch?

grave jolt
#

store some random piece of data?

dusk comet
halcyon trail
#

You wouldn't; folks who use address hashing deserve what they get 😛

flat gazelle
#

I meant specifically for None, of course object() will have a non-deterministic hash no matter what

grave jolt
#

ah

#

I misunderstood then

flat gazelle
#

same with user-defined functions

dusk comet
#
>>> x = object()
>>> hex(id(x))
'0x189ab5d5070'
>>> hex(hash(x))
'0x189ab5d507'
>>> assert hash(x) == id(x) // 0x10
dusk comet
halcyon trail
#

Java has the same default hash implementation

#

Smh

broken sluice
#

id and object.__hash__ are both deterministically calculated from the object's memory address, but not in the same way

grave jolt
#

but that would break like... everything innit

flat gazelle
#

yeah, that's probably not on the table

#

considering all of CPython is built on passing pointers around

grave jolt
#

your table is very non-deterministic

broken sluice
#
Py_hash_t
_Py_HashPointerRaw(const void *p)
{
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    return (Py_hash_t)y;
}

and I think id simply returns the address itself, but I'm not sure
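The rotation in `_Py_HashPointerRaw` is easy to reproduce in Python. Doing so also confirms that `id()` returns the raw address in CPython, which is an implementation detail. This is a minimal, CPython-only sketch. `hash_pointer` is a hypothetical helper, and its final step mirrors the C-level rule that a hash of `-1` (reserved to signal errors) is replaced by `-2`.

```python
import sys

BITS = 64 if sys.maxsize > 2**32 else 32
MASK = (1 << BITS) - 1

def hash_pointer(address):
    """Hypothetical Python port of CPython's pointer-based default hash."""
    y = address & MASK
    # rotate right by 4 bits: the low bits of aligned addresses are mostly 0
    y = ((y >> 4) | (y << (BITS - 4))) & MASK
    # reinterpret the unsigned result as a signed Py_hash_t
    if y >= 1 << (BITS - 1):
        y -= 1 << BITS
    # -1 means "error" at the C level, so it is replaced by -2
    return -2 if y == -1 else y

objects = [object() for _ in range(5)]
print(all(hash_pointer(id(o)) == hash(o) for o in objects))  # True on CPython
```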

#

@grave jolt : if CPython moved objects in memory it would alter their id and hash under current implementation, thus break correctness

grave jolt
#

yeah that's what I meant

#

well, that would be the least of the problems

broken sluice
#

In languages where objects are allowed to move, something like id has to be stored inside the object (and even then, taking the initial address as the id is wrong, because something else could be allocated there after the original is moved away)

spark magnet
dusk comet
broken sluice
#

It uses the same hash as object, if tp_hash is 0, it will call Py_HashPointer (from what I can tell)

#

So assuming there is merit to my proposal, how am I supposed to actually make it? It got shot down horribly on the forum, maybe because my initial arguments for it were weaker.
On github, a core developer just closed my PR and the issue, and that was it.

I don't know what else can be done at this point

rose schooner
broken sluice
#

Yes

spark magnet
#

@broken sluice I think the general objection will be that hash() was never intended to be useful beyond the current process. If you need something like that, you need to implement it with the guarantees you want.

broken sluice
#

But if that is true, why does the PYTHONHASHSEED feature exist? and it is not deprecated ... and people rely on it for debug/research purposes

#

and as I said in the doc, you can't implement your own hashing strategy in Python

spark magnet
broken sluice
#

the builtin containers do not accept a hasher as an outside parameter

spark magnet
broken sluice
#

besides, is it really a counter argument?
I mean sure, Python isn't obliged to support this, but this seems like going out of its way needlessly to break something, at face value

spark magnet
broken sluice
#

Consider my example:

from dataclasses import dataclass
from typing import Optional

KeyType1 = tuple[int] | tuple[int, int]

@dataclass(frozen=True)
class KeyType2:
    foo_id: int
    bar_id: Optional[int]

does it make sense that one of these key types hashes deterministically and one doesn't?
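The underlying reason is that a frozen dataclass's generated `__hash__` hashes the tuple of its fields. A tuple's hash is built from its elements' hashes, so `hash(None)`, or any other address-based hash, feeds straight into the key's hash. A minimal sketch using the `KeyType2` shape from above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class KeyType2:
    foo_id: int
    bar_id: Optional[int]

key = KeyType2(foo_id=1, bar_id=None)
print(hash(key) == hash((1, None)))  # True: the dataclass hashes its field tuple
```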

#

well, the doc explains why I think that hashing a monostate variable to a constant is the "common sense" thing to do. That's why I said "going out of its way". But in terms of implementation you are right: it simply inherits from object

spark magnet
#

generally, changes to Python need to have real-world use cases rather than "it makes sense" arguments. Again, you might have them in the doc, I haven't read it.

feral island
halcyon trail
#

I'm a bit confused because this is such a standard technique

#

Regression tests and even unit tests commonly use a randomized seed for say rng which they log. If you have a failure you want to reproduce, you rerun it feeding the logged seed

#

Granted, things which use address hashing will not be reproducible regardless, but many things will be: anything that has a notion of value semantics, structural equality, etc.

#

And optional fits into that

#

(rather, None fits into that, but it's usually seen in an "optional" context)

spark magnet
#

just to be clear, I am not arguing against this proposal. I'm trying to help explain the core dev mindset, to help @broken sluice

halcyon trail
#

Sure, I'm just saying, the use case seems pretty clear, no?

#

Reproducibility always matters

#

Salting hashes makes sense for prod, in testing it's ok to salt hashes but you need to be able to reproduce any test run as closely as possible

#

I would have assumed that that's the purpose of PYTHONHASHSEED, rather than it being a stopgap

#

Sets for example still don't have defined iteration order, it's easy to write code that accidentally depends on set iteration order. Say you do and your tests usually pass, now one night they happen to fail. It's not great if you can't reproduce that quickly and easily

spark magnet
#

I'd be in favor of making the change

feral island
#

making only hash(None) reproducible isn't a very thorough solution for that problem though. Hashes of e.g. types and function objects are still nondeterministic

halcyon trail
#

Sure, you can't solve that problem completely if you've chosen to rely on identity-based hashing

#

But it's pretty debatable to say that None is an identity based type

#

To put it mildly

#

None being a Singleton is an implementation detail

spark magnet
#

it's not an implementation detail, it's important that x is None behave the right way.

#

but being a singleton is a good reason for it to have a fixed hash

halcyon trail
#

It's a monostate type

spark magnet
#

how is that different than singleton?

halcyon trail
#

Singleton is an implementation detail of being a monostate type, in most cases.

#

Python chose to make is None idiomatic

spark magnet
#

can you explain the difference between singleton and monostate?

halcyon trail
#

A monostate is a type with only one value

#

A Singleton is a type with only one instance

spark magnet
#

ok, do we agree that you can't have a Python where None isn't a singleton?

halcyon trail
#

Well, because of breaking is None checks, sure

#

Not for any other reason I can think of

#

If python just did == None which is generally what you see, then it would truly be an implementation detail

#

My main point I guess is that None usually shows up as a value as part of an implied Optional semantic

#

Since always having None isn't very interesting

#

And Optional[T] is something that has structural equality if T does, without fail

dusk comet
halcyon trail
#

And structural hashing, generally, as well

grave jolt
#

actually, why did is None become idiomatic? pithink

halcyon trail
#

Idk. The funny thing is that it was encouraged, but it's arguably pretty terrible

#

"is None" cannot be overloaded

#

But I don't think that argument is compelling

flat gazelle
#

there are types like numpy arrays which implement == elementwise

#

so probably for those
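NumPy arrays return an element-wise result for `==`, but the point holds for any class. `__eq__` can return something other than a plain `False`, so `x == None` is only as reliable as that method, while `is None` cannot be overridden. Here is a minimal sketch with a hypothetical `AlwaysEqual` class:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True
    __hash__ = object.__hash__  # stay hashable despite defining __eq__

x = AlwaysEqual()
print(x == None)  # True, even though x is not None
print(x is None)  # False: identity checks can't be overloaded
```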

halcyon trail
#

Fair point I suppose

#

I was going to say that performance reasons are the main reason to use identity rather than structural equality for None

broken sluice
# spark magnet how is that different than singleton?

can you explain the difference between singleton and monostate?

A monostate type is a type whose instances can have only a single possible state
In Python since all values are held by references, the only thing that makes sense is to store such a type in a singleton. But that's not generally the case in all other languages.

Regarding hashing, the common sense thing to do when hashing a monostate type is to return a constant. That's what I'm arguing, at least.
I am saying that the Optional "None" type (i.e. a disengaged Optional) is a monostate type, hence the proposal. In Python, None has other meanings too, but we are kind of stuck with None representing a disengaged optional

spark magnet
broken sluice
#

If it were practical to separate the Optional None from the "null reference" None (or whatever other meanings people ascribe to it), that would be a great solution; unfortunately it doesn't seem practical at this point

#

If Python implemented Optional[T] as a Union[T, Unit] and had Unit hash to a constant in the first place, we wouldn't be having this discussion right now
There was a choice to overload None which caused this complication, it didn't have to be this way...

and the concerns of "value types" are universal to programming, I don't think they become irrelevant just because we are writing in Python
even if Python puts less emphasis on these ideas
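Below is a minimal sketch of the kind of `Unit` type described here. It is hypothetical, not a standard-library type. All instances compare equal and hash to the same arbitrary constant, so it is a monostate type without being a singleton, and keys that contain it hash identically in every run.

```python
from dataclasses import dataclass
from typing import Union

class Unit:
    """Hypothetical monostate type: every instance is the same value."""
    __slots__ = ()
    _HASH = 0x5EED  # arbitrary fixed constant (an assumption)

    def __eq__(self, other):
        return isinstance(other, Unit)

    def __hash__(self):
        return Unit._HASH

    def __repr__(self):
        return "Unit()"

@dataclass(frozen=True)
class Key:
    foo_id: int
    bar_id: Union[int, Unit]

k1 = Key(1, Unit())
k2 = Key(1, Unit())
print(k1 == k2, hash(k1) == hash(k2))  # True True
print(hash(k1) == hash((1, 0x5EED)))   # True: no address involved
```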

halcyon trail
spark magnet
#

@broken sluice I don't know what Unit is in this case, and I don't understand what you mean by "overload None".

spark magnet
broken sluice
#

So maybe I can't convince Python devs if that's really the prevailing attitude

#

We say reproducibility is important sometimes, you say it never is, so there's a stalemate

halcyon trail
#

Just think the attitude is a bit unfortunate. No language is an island after all.

#

Also I don't think ned said reproducibility never matters? Maybe I missed it

broken sluice
#

denball did:
Hashes makes sense only in the same process. If you are saving hash and using it in other process, hashes are not obliged to mean anything useful

which is effectively the same thing, worded differently
if hashes are different, your second run won't be the same as the first. It is enough that you iterate over a set anywhere in the program, and the runs will diverge

dusk comet
broken sluice
#

I did that, but it's a lot of trouble for devs to maintain such code
having to say Union[T, Unit] instead of Optional[T] everywhere
having to convert None to Unit and back

#

and all that, for what? what is the gain from that approach over modifying None?

dusk comet
broken sluice
#

let's say you write an optional to JSON. Must convert from Unit to None
it gets read back into None, must convert back to Unit
it is a hassle
You're not being honest if you claim it is trivial to sanitize None out of an entire large program, and keep it that way
and again, is it a good idea to ask people to do that? only because they want reproducible runs?

spark magnet
broken sluice
#

You are right, but like I said, if people really want to think a certain way, nothing I say will ever convince them otherwise

#

and that's what it looks like to me :/

spark magnet
#

it might be that they need more help to see the advantages, or how little it costs to make the change.

broken sluice
#

so let's discuss the cost ... what is the actual cost?
I mean, apart from CR
Python devs will know more than I do about that for sure

spark magnet
#

it sounds to me like it would be a dozen lines for the change, and perhaps a dozen for a test.

halcyon trail
#

Yeah, that makes sense. Seems very low cost, even if the benefits aren't going to be seen as huge

halcyon trail
#

See some of the above examples I gave around testing

dusk comet
#

Why would you need hash(None)? Are you using it as key in dict? Item of set? Are you hashing tuple with None?

halcyon trail
#

A data class with an optional field?

broken sluice
#

Scroll up, there is an example where the hash of None injects non determinism into seemingly harmless key types

from dataclasses import dataclass
from typing import Optional

KeyType1 = tuple[int] | tuple[int, int]

@dataclass(frozen=True)
class KeyType2:
    foo_id: int
    bar_id: Optional[int]
halcyon trail
#

Being used as a key

#

Extremely common examples

broken sluice
#

Anyway I made a PR for this and it was closed. I think if I try to open another one, it might be viewed as trying to spam. and I can't re-open the closed PR AFAIK

#

Also, for the implementation, we need to choose between two options

  • None hashes to a constant
  • None hashes to a deterministic function of the hash secret, so it only stays constant if PYTHONHASHSEED is used
flat gazelle
#

I would go with constant tbh, PYTHONHASHSEED matters for hash collision attacks, which is impossible with None

#

get ready for the which constant bikeshedding

broken sluice
#

I put 0xbadcab1e in my PR ... but I don't mind changing that 🙂

halcyon trail
#

Also as to why it was closed , did you read the GitHub thread?

#

Looks like it was closed by a bot because you didn't update news (patch notes roughly speaking)

broken sluice
#

I did - rhettinger explained himself there

#

No, the bot did not close it; I did update the news, via blurb

#

and the bot had green status on my change
rhettinger then posted the following on the issue

Thanks for the suggestion but this doesn't make sense. The default hash for every object is its object id. There is nothing special about None in this regard. Also, hash randomization was added intentionally for strings and bytes - we're definitely not in business of trying to make hashes constant and we don't want people to come to rely on a particular hash order or value.

and closed my PR

halcyon trail
#

Ah sorry

#

That's incredibly lame, fwiw

#

I can respond fwiw not that I'm anybody

#

I can't seem to actually find rhettinger's comment

broken sluice
#

my opinion doesn't count for anything either
so I don't know, maybe we should leave it for someone who has the cred to push such a change

#

you don't see it there?

halcyon trail
#

Ah there we go

#

I can respond there, probably won't help. Maybe on a mailing list you need to raise this first? Happy to support you there as well

broken sluice
#

I could try that

spark magnet
broken sluice
#

I sent the mail already

spark magnet
broken sluice
#

Now also opened another thread in the forum, hopefully I don't get an infraction for that or something

feral island
#

so his opinion should have some weight

broken sluice
#

Fair point, however, it's one thing to think this change is wrong, another to say this: "There is nothing special about None in this regard".

He could say "catering to this use case is not worth the effort"
or "this will hinder planned changes x/y/z" or all kinds of other reasons. But saying something like that just makes it seem like he gave it only a moment's thought at best. Even if he in reality has extremely good reason for what he's saying

swift imp
#

Are they saying they want the hash of None to be constant across multiple sessions? I thought its hash was based on id, but it's a singleton, so that's constant for the lifetime of the program.

broken sluice
#

yes, multiple sessions.

swift imp
#

Why

#

I don't understand the issue

#

Or should I say the benefit

#

I'm not seeing it in the issue

broken sluice
#

You are debugging a program, or there is a unit test failure.
The test runs a lot of code. Somewhere the program computes a set of keys, then iterates over it and does more things inside the loop. Let's say the keys are frozen dataclasses with optionals in them. Because of the non-determinism of the hash, the sets are organized differently each run. So anything downstream from that point diverges every run. Now it might cause flaky tests, failure to reproduce problem cases even if you log the inputs, etc.

#

and all of that for what? No one ever told me who actually benefits from hash(None) changing every run

rose schooner
#

well this has been going on for the duration of my sleeping time

pliant tusk
broken sluice
#

In theory, yes. In practice, given the exact same operations history and same hash values, the set will be in a perfectly identical state each run, and thus will iterate its contents the same order. You don't know which order and don't care, but it will be the same

pliant tusk
#
  • my point is that code that relies on deterministic set order is already a bug
broken sluice
#

but the code doesn't rely on it

rose schooner
# broken sluice In theory, yes. In practice, given the exact same operations history and same ha...

i have no idea what you're talking about

py -c "print([*{'afsaf', 'blak', 'clf', 'hae', '01s'}])"
['clf', 'afsaf', '01s', 'blak', 'hae']

py -c "print([*{'afsaf', 'blak', 'clf', 'hae', '01s'}])"
['blak', 'hae', 'clf', 'afsaf', '01s']

py -c "print([*{'afsaf', 'blak', 'clf', 'hae', '01s'}])"
['hae', 'afsaf', 'blak', 'clf', '01s']

py -c "print([*{'afsaf', 'blak', 'clf', 'hae', '01s'}])"
['afsaf', 'blak', 'clf', 'hae', '01s']

broken sluice
#

yes, try that with PYTHONHASHSEED=constant
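Following that suggestion, the sketch below runs the same one-liner in fresh interpreters with `PYTHONHASHSEED` fixed to `0`. With the same seed, the string hashes are the same, so the set ends up with the same layout and iterates in the same order on every run. The choice of `0` is arbitrary.

```python
import os
import subprocess
import sys

CODE = "print([*{'afsaf', 'blak', 'clf', 'hae', '01s'}])"

def run_with_seed(seed):
    # a fresh interpreter picks its hash seed at startup from the environment
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    result = subprocess.run(
        [sys.executable, "-c", CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

runs = [run_with_seed(0) for _ in range(3)]
print(runs)
print(len(set(runs)) == 1)  # True: same seed, same hashes, same order
```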

#

your bug was that the hash values weren't the same

rose schooner
#

ok

broken sluice
#

read carefully the scenario I mentioned. I did not say the code needs to assume anything about the order in which it will read things from the set. Any order is legal

rose schooner
#

well i'm not too careful of a reader or too expert of a developer to understand what's even being talked about here

pliant tusk
#

because of the non-determinism of the hash, the sets are organized differently each run. So anything downstream from that point diverges every run. Now it might cause flaky tests,
^ that is stating a reliance on set order

broken sluice
#

I can just paste a small example then, few mins

rose schooner
broken sluice
#

code:

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Key:
    foo_id: int
    bar_id: Optional[int]


set_of_keys = {
    Key(foo_id=i, bar_id=None)
    for i in range(10)
}

for key in set_of_keys:
    # If we perform any downstream logic here based on keys, behavior will diverge
    # between subsequent runs, even though set_of_keys is a constant input
    print(key)

run 1:
Key(foo_id=9, bar_id=None)
Key(foo_id=4, bar_id=None)
Key(foo_id=7, bar_id=None)
Key(foo_id=2, bar_id=None)
Key(foo_id=0, bar_id=None)
Key(foo_id=3, bar_id=None)
Key(foo_id=8, bar_id=None)
Key(foo_id=5, bar_id=None)
Key(foo_id=6, bar_id=None)
Key(foo_id=1, bar_id=None)

run 2:
Key(foo_id=6, bar_id=None)
Key(foo_id=1, bar_id=None)
Key(foo_id=9, bar_id=None)
Key(foo_id=4, bar_id=None)
Key(foo_id=0, bar_id=None)
Key(foo_id=7, bar_id=None)
Key(foo_id=2, bar_id=None)
Key(foo_id=8, bar_id=None)
Key(foo_id=3, bar_id=None)
Key(foo_id=5, bar_id=None)

broken sluice
#

these things are not at odds with one another

pliant tusk
#

im saying that if the code is non-deterministic when running normally, it should be configured to enforce order explicitly when debugging, if that is what you want

broken sluice
#

imagine the set is a set of legal choices for some search algorithm
it might be correct to go over them all in any order
but maybe something goes wrong and you want the entire thing to take the exact same steps at each point
it generally makes life easier when you're debugging
I thought it's kind of obvious, but maybe not

pliant tusk
broken sluice
#

I know there are workarounds

#

very often people use sets to deduplicate for example

#

now sure they can do such a thing without sets

#

but it's very easy to fall into the trap

pliant tusk
#

function_to_produce_consistent_order(list(set(items)))

broken sluice
#

and again, for what purpose do you want to make our lives harder
what is the external cost ..

#

If you did list(set(items)) you broke it

#

if your items can be sorted, you can stick a sorted at the end and you're OK. But not all keys are comparable like that
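A minimal sketch of a dedupe that keeps first-seen order and needs the items to be hashable only, not comparable. `dedupe_keep_first` is a hypothetical helper name; it relies on dicts preserving insertion order, which the language guarantees since Python 3.7. The result is only as deterministic as the order of the input.

```python
def dedupe_keep_first(items):
    """Drop duplicates, keeping each item's first occurrence in input order.

    dict keys keep insertion order (guaranteed since Python 3.7), so this only
    needs hashable items; unlike sorted(set(items)), no ordering between items
    is required.
    """
    return list(dict.fromkeys(items))


print(dedupe_keep_first([3, 1, 3, 2, 1]))      # [3, 1, 2]
print(dedupe_keep_first([1j, None, 1j, "a"]))  # [1j, None, 'a'] (these can't be sorted together)
```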

#

and again, this is placing the burden on researchers who might not be too keen on preserving determinism

pliant tusk
#

set order is non-deterministic by design; it is not something that can or should be disabled

broken sluice
#

it has undefined order. No requirement on the order. That is not the same thing as non-deterministic

#

It's a crucial point

pliant tusk
#

how are those different?

broken sluice
#

non-determinism is a behavior, not a requirement

pliant tusk
#

If i run code in IronPython, PyPy, or CPython that uses sets, it can act differently

#

by design

broken sluice
#

try to create a set of ints, that will show non deterministic behavior, being fed the same data/operations

#

really try it

#

something like, starting from these fixed inputs and these fixed operations, I run it once and get one order, I then run it again and get another order
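One way to run that experiment: a sketch that executes the same fixed set operations on ints in fresh interpreter processes and compares the printed iteration order. The script contents are arbitrary; since int hashes are not salted, I'd expect identical output even when `PYTHONHASHSEED` differs between the runs.

```python
import os
import subprocess
import sys

# Fixed inputs and fixed operations, executed in a brand-new interpreter each time.
SCRIPT = """
s = set()
for x in (42, 7, 1000003, 9, 1, 2**40, -5):
    s.add(x)
s.discard(9)
s.add(17)
print(list(s))
"""


def order_in_fresh_interpreter(hash_seed):
    """Run SCRIPT in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # int hashes ignore the seed, so both runs should print the same order
    print(order_in_fresh_interpreter(1))
    print(order_in_fresh_interpreter(2))
```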

#

again, i'm not saying anything about what the order is, any requirement at all

#

I don't care if it's different in another Python runtime

#

If all you're doing is debugging something, you don't care about that

pliant tusk
#

tbh i dont care enough about this to do any of that, just if you are debugging something that loops through a set like that where a specific order of items breaks something, you should build a proper test harness for that code to test your possibilities

#

not depend on the language doing it for you when it explicitly says it doesnt

pliant tusk
broken sluice
#

if the tests fail it's an indication the code or the test is wrong
I agree that tests should be extensive and then they'll catch order dependency bugs
and you know what? maybe for UTs this is good enough

but what about running the entire thing on some fixed input? let's say you have an input where the program did something that doesn't make sense and broke on an assert
what if the odds of that happening again are 1:1000, and it takes 30 min of compute time to get to that point

#

I mean, reproducible behavior has value, of course you can get by without it. You can get by without a lot of things

pliant tusk
#

then you need better tests lmao

#
  • and better logs
broken sluice
#

look, if you think reproducible behavior has no value at all, the discussion can end there

pliant tusk
#

my point is that if you have code that is that complex, and has the potential to fail in odd edge cases, it should be possible to debug it post mortem without needing to rerun the code

broken sluice
#

I understand that, and maybe if I wrote all the code that I'm responsible for, the situation would be better. It's a lot of code that was written in a hurry; not everyone writing the operations research code we have is even an engineer.
It's a nightmare to debug, and non-determinism isn't helping

Yes, it would be nice if all of the complex behavior were broken down into small components and tested extremely well in isolation
In reality it isn't, though

you can just say, "well sucks to be you" but I am just asking who's actually benefitting from the non-determinism

#

if the answer is no one, then why have it

#

again, note that if one day someone wants to make sets iterate in a completely random order every time, they can: my change does not contain any contractual guarantee for sets. It can break at any time in the future and I am OK with that

pliant tusk
#

afaik, the non-determinism of sets is a speed optimization

broken sluice
#

no, I don't mean that (sets are still deterministic today!) I meant the non deterministic hash of None

#

again I could be wrong - show me a history of operations on a set that takes in only constant data with constant hashes and ends up iterating its data in a different order every run. That would be non-deterministic sets

pliant tusk
#

^ probably possible as hash(obj) relies on id(obj) which is the address of obj which is non-deterministic from python

broken sluice
#

but then it is not the set that is non-deterministic, it is the hash

#

that is the point

pliant tusk
#

thats like me saying "its not the door that opens, its the door knob"

broken sluice
#

No, it is not some philosophical statement...

pliant tusk
#

my point was that if a dependency of an operation is non-deterministic, then the operation is non-deterministic

#

the set inherits it

broken sluice
#

sets being non-deterministic means you can take things with fixed hashes, say construct the set {1, 2, 42} and iterate them, and let's say it returns: 2, 1, 42. Then you create another set exactly the same way and it iterates them in a different order

pliant tusk
#

but you can make sets out of things that do not have fixed hashes

#

so yea, a subset of sets can be deterministic, but all sets cannot be deterministic

broken sluice
#

right. I agree. then all bets are off.
but I usually avoid doing that. The researchers do too

pliant tusk
#

I am trying to point out that what you are asking for is not just a deterministic hash of None

broken sluice
#

they tend to use ints, enums and such in keys, and if we set PYTHONHASHSEED we are generally ok.
Except when they try to use Optional[int], and then we're not

pliant tusk
#

its a deterministic hash of everything

broken sluice
#

not everything.

#

that can't be done anyway.

#

what types are used as keys? tuples of ints, strs, maybe bool, enum, maybe even frozensets of those things

pliant tusk
#

!e If it is truly just `None`, then just do this locally and call it a day:
```py
from fishhook import hook

@hook(type(None))
def __hash__(self):
    return 0xdeadbeef

print(hex(hash(None)))
```

broken sluice
#

all can be hashed deterministically, if only optional None didn't screw us over

fallen slateBOT
#

@pliant tusk :white_check_mark: Your 3.11 eval job has completed with return code 0.

0xdeadbeef
broken sluice
#

I thought of that.
Won't that break the world though?

#

if there is any set or dict anywhere in the Python runtime that was already created based on the default hash of None we are screwed

pliant tusk
#

yea it would probably break things

#

you can also use LD_PRELOAD

#

but at that point just define your own class in place of None and use that

broken sluice
#

A C extension to patch the tp_hash slot?

pliant tusk
#

it would work, but again, its a bad idea

broken sluice
#

I guess I might do that. it is better than asking people to use non standard idioms for Optional

#

after all what they mean is Optional

#

not Union[T, SomeMadeUpSentinelJustSoICanHashItToZero]

pliant tusk
#

you can use a .pth file to hook Optional and swap out None with your custom class

broken sluice
#

but Optional isn't actually a class

#

there is no such type

#

It is, in fact, just T | None
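A quick check of that claim: `Optional[X]` is shorthand for `Union[X, None]`, and on Python 3.10+ the `X | None` spelling produces a `types.UnionType` with the same arguments. Nothing here is specific to the thread's code.

```python
import sys
import typing
from typing import Optional, Union

# Optional[int] is literally Union[int, None]; there's no separate "Optional" type
print(Optional[int] == Union[int, None])  # True
print(typing.get_args(Optional[int]))     # (<class 'int'>, <class 'NoneType'>)

if sys.version_info >= (3, 10):
    # PEP 604 syntax, evaluated only on versions that support it
    print(typing.get_args(int | None))    # (<class 'int'>, <class 'NoneType'>)
```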

#

I'm warming up to the C extension idea
after all nothing in the code will rely on it. we can always throw it away if we want

pliant tusk
#

then hook types.UnionType with fishhook and swap out the None there

broken sluice
#

I don't think static type checkers will understand that

#

but I'm not sure

pliant tusk
broken sluice
#

let's say your dataclass has a field that says
x: Optional[int] = None

A type checker will see that and consider it legit
and it is

how can your hook transform it into
x: Union[int, SentinelType] = Sentinel

#

seems really difficult to rewrite the program in such manner with no help from the programmer, in the general case at least

#

maybe with descriptors on all fields

#

actually
maybe the solution is to patch the hash function generated for dataclass

#

(but there's also NamedTuple...tuples, etc)

#

C extension is less headache I think

#

In fact, easiest is to just backport my PR into whatever version of Python I'm using

halcyon trail
#

Probably you missed it because it was discussed earlier on, but it's pretty similar to how test frameworks that work with RNG start with a singular seed that may default to being taken from e.g. an OS source of randomness, but can actually be fed in at the command line

#

The idea being that if your tests fail sporadically, you can look at a failing test, look at the logged seed, and feed it back in to reproduce it exactly
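A sketch of that pattern; the `TEST_SEED` variable and `make_test_rng` helper are hypothetical names, not any particular framework's API. `PYTHONHASHSEED` plays the same role for str/bytes hashing, which is exactly the knob that doesn't reach `hash(None)`.

```python
import os
import random
import secrets


def make_test_rng(env_var="TEST_SEED"):
    """Return (seed, rng) for a test run (hypothetical helper).

    Uses the seed from `env_var` if it is set, otherwise draws a fresh one
    from the OS, and always prints it so a failing run can be replayed.
    """
    raw = os.environ.get(env_var)
    seed = int(raw) if raw else secrets.randbits(32)
    print(f"{env_var}={seed}")  # log the seed so the run can be reproduced
    return seed, random.Random(seed)
```

A failing run's log line can then be fed back in, e.g. `TEST_SEED=1234 python run_tests.py`, to replay the same random choices.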

#

Nobody should be depending on set iteration order on purpose, the point is that if you accidentally depend on it subtly, when you get a test failing you want to reproduce it exactly

spark magnet
halcyon trail
#

Sadly I'm not surprised

#

Do you have the link handy?

spark magnet
halcyon trail
#

Hmm why is it gone?

spark magnet
#

¯\_(ツ)_/¯

#

i haven't read the whole thread, or know how to tell these things

halcyon trail
#

I mean, with respect, people who didn't seem to understand the main ideas started writing very long posts that were mostly irrelevant, and the conversation just got derailed

#

I think the original proposal needs to be stated in a more tightly focused way

spark magnet
#

could be

halcyon trail
#

Like the whole discussion of ordered set

#

That's totally irrelevant here. Not sure the original post by corak did a good enough job steering away from that

spark magnet
#

my recommendation generally is to try to say yes as much as you can, and avoid saying no. It will keep the discussion where you want it.

raven ridge
spark magnet
#

but i also know it can be very frustrating to try to convince them, and sometimes you just can't.

spark magnet
raven ridge
#

that still doesn't make it deterministic for most objects, just for those from a very small handful of types. So, yes, if you depended on iteration order of a set, and the only things in the set were either things that already have consistent hashes across runs or Nones, then this proposal would allow you to reproduce the behavior. But if the set contained any other element that has inconsistent hashes across runs, you've still got the same problem as you had before.

spark magnet
#

and None is the only non-deterministic hash there.

raven ridge
#

those might get you a long way, but they don't get you all the way. Is something that makes a test reproducible only in very specific circumstances really that valuable?

halcyon trail
#

Optional is a member of many value semantic data classes

spark magnet
halcyon trail
#

Hettinger is wrong when he says that None is no different than any other type. None is one of a handful of basic building blocks for creating types that people use

#

And it's used very extensively in types with value semantics

#

Just like strings and numeric types

spark magnet
#

you tell them I'm in favor! (this means nothing)

halcyon trail
#

Obviously if you have classes that use identity semantics as keys things will be non reproducible, but that's not very surprising, and using identity semantic classes as keys is a choice and the impossibility to reproduce results is just one of the downsides

#

Personally I never use identity semantics in hash keys

#

The question I guess is how to reopen this discussion without people getting so upset

#

I don't know why that thread escalated so quickly

rich cradle
#

i think conflating set iteration order and making a consistent hash for None blew it up pretty fast

spark magnet
halcyon trail
#

Yeah that's very unfortunate

#

Online discussions you have to be laser precise

#

Also from the get go it's better to suggest that None has a salted hash the same way strings do. It maximizes security at no real performance or reproducibility cost

spark magnet
raven ridge
#

they do, but that's an implementation detail... I think folks were bristling at the idea of promoting that from implementation detail to supported feature

halcyon trail
#

I think simply because there's almost no downside, you should salt it. For ints, the performance hit was significant

halcyon trail
spark magnet
raven ridge
#

Python didn't salt its hashes until very recently, and then they added salting only as a mitigation of a security vulnerability

halcyon trail
#

I'm genuinely surprised to see e.g. a cpython core dev refer to "unnamed use cases for reproducibility between runs"

halcyon trail
spark magnet
halcyon trail
#

Not sure what the motivation was, but it seems possible this need had already been recognized

raven ridge
halcyon trail
#

I mean presumably if anything else is ever salted it will be using that same value to seed the salt

#

I.e. that environmental variable will always allow for reproducibility of any value semantic hashing

raven ridge
#

you don't consider this value-semantic?
```py
class SomeClass:
    pass

SINGLETON = SomeClass()
hash(SINGLETON)
```

halcyon trail
#

It hasn't defined ==

#

Or hash

raven ridge
#

that's correct, but it supports both equality and hash

halcyon trail
#

A monostate Singleton is something of a degenerate case

#

That's a big part of why None creates confusion

raven ridge
#

I'd call it value-semantic because it does allow == - so if this degenerate case is value-semantic, then no, PYTHONHASHSEED does not make all value-semantic hashes reproducible.

halcyon trail
#

For a monostate singleton, all instances are equal and hash the same, so its value and reference semantics aren't really distinguishable
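To illustrate the distinction: a hypothetical monostate class where every instance compares equal and hashes to the same constant, unlike `SINGLETON` above, whose equality and hash come from identity. Here the instances are distinct objects, yet a set can't tell them apart.

```python
class Monostate:
    """All instances are interchangeable: equal to each other, same hash."""

    def __eq__(self, other):
        return isinstance(other, Monostate)

    def __hash__(self):
        return 1  # arbitrary constant; does not depend on id(self)


a, b = Monostate(), Monostate()
print(a is b, a == b, hash(a) == hash(b))  # False True True
print(len({a, b}))                         # 1
```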

#

I would say it's clearly a corner case but not very likely to come up much

#

It comes up a lot for None in practice because it's part of Optional

#

And Optional is a pretty common building block in value semantic types

raven ridge
#

well, that's probably the most convincing way to formulate this argument.

  • Reproducible hashes are useful for reproducing test failures (which is why pytest defaults to printing out PYTHONHASHSEED, for instance)
  • The non-reproducibility of hash(None) even when PYTHONHASHSEED is set is the only thing making the hashes of many simple dataclasses non-reproducible.
gray galleon
#

when will python have symbol type ||have i asked it before||

long isle
#

Help me

gray galleon
boreal umbra
#

@rich cradle I've always thought the evils of inheritance were overstated, but that's probably because I rarely actually make subclasses, and when I do, it's specifically to avoid boilerplate.

rich cradle
#

i've just never written code that really needs that kind of structure. i dunno why.

#

i tend to abuse things like protocols though. holdover from using typeclasses in haskell and rust.

#

the entire inheritance model seems strange to me, architecting shared behavior as a tree, but ¯\_(ツ)_/¯ i don't actually use it where possible

boreal umbra
#

well, that's why we have duck typing bing_shrug

rich cradle
#

well, now that i think about it, it's not even a tree, it's a... directed acyclic graph? maybe? which is even more wild.

swift imp
#

When you say protocol you mean dunders or structural protocols

rich cradle
#

whatever typing.Protocol is, so probably the latter

swift imp
#

How do you abuse that

rich cradle
#

i use it in places where inheritance probably would be more appropriate, that's all

swift imp
#

Oh

#

I mean isn't using a protocol vs an ABC like functional vs OOP

#

Just different paradigms

boreal umbra
#

it might be a different take on OOP. it might even be a different paradigm. but if it is, functional isn't the one

swift imp
#

Wait

#

Is typing.Protocol the one you subclass?

boreal umbra
#

!docs typing.Protocol

fallen slateBOT
#

class typing.Protocol(Generic)
Base class for protocol classes. Protocol classes are defined like this:

```py
class Proto(Protocol):
    def meth(self) -> int:
        ...
```
Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing), for example...
swift imp
#

And then you can type hint saying your callable takes in an instance matching that protocol
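A minimal sketch of that: a hypothetical `SupportsClose` protocol and a class that satisfies it without inheriting from it. `@runtime_checkable` is optional. It only enables a shallow `isinstance()` check that looks for the method's presence, not its signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsClose(Protocol):
    def close(self) -> None: ...


class Connection:  # note: does not inherit from SupportsClose
    def __init__(self):
        self.closed = False

    def close(self) -> None:
        self.closed = True


def shutdown(resource: SupportsClose) -> None:
    # a static checker accepts any argument whose type has a matching close()
    resource.close()


conn = Connection()
shutdown(conn)
print(conn.closed, isinstance(conn, SupportsClose))  # True True
```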

#

Yeah yeah ok

rich cradle
#

well it's different to some extent, but it's the most similar there is in python that i know of

#

well, i think it is. it's been months since i wrote sane python code. the past few months have been largely random things to test one of my projects.

swift imp
#

I don't think I write generic enough to really use protocols

#

And while I've used ABCs, it's honestly unnecessary

boreal umbra
#

I feel like ABCs are only there to appease people from languages that have them

#

it's more consistent with Python's philosophy to just... not instantiate the class.

halcyon trail
halcyon trail
#

That's very useful and a very easy mistake to make by accident

halcyon trail
#

Python protocols are more like Go's interfaces

rich cradle
#

yes

#

but i think they're the closest usable equivalent we can have in python, at least in my usage of them

halcyon trail
#

Idk, they are more or less close depending on what you look at

#

ABCs are explicit, like typeclasses

#

In that sense, ABCs are closer

rich cradle
#

right, hence my "usable"

#

you can't add ABCs to random stdlib types or things from other packages

#

...i think

halcyon trail
#

I think you can but I'm not sure if mypy recognizes it
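On that question: `ABCMeta.register()` lets you declare an existing class, including a builtin like `str`, as a "virtual subclass" of your ABC without editing it. The class names below are hypothetical. As far as I know, static checkers such as mypy don't follow `register()` calls, so this only helps at runtime.

```python
from abc import ABC, abstractmethod


class Greeter(ABC):
    @abstractmethod
    def greet(self) -> str: ...


class ThirdPartyGreeter:
    """Stand-in for a class from a package you can't edit."""

    def greet(self) -> str:
        return "hi"


class Textual(ABC):
    """Hypothetical marker ABC."""


# register() declares a "virtual subclass": isinstance/issubclass now agree,
# but nothing is added to the registered class or its MRO.
Greeter.register(ThirdPartyGreeter)
Textual.register(str)  # works for builtins too

print(isinstance(ThirdPartyGreeter(), Greeter))  # True
print(isinstance("abc", Textual))                # True
print(Greeter in ThirdPartyGreeter.__mro__)      # False
```

Note that `register()` doesn't check that `greet` actually exists; you're vouching for the class.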

raven ridge
#

I just see Python's Protocols as formalized duck-typing.

#

Protocol lets you describe what a duck looks like

rich cradle
#

that's exactly what they are. but i'm fine with that, personally.

halcyon trail
#

Fwiw in 95 percent of cases, ABCs are very easy to use.

#

It's compile time duck typing more or less

#

Which is what structural typing is

#

In that sense it matches well with python

#

But there's a fair amount of criticism (that I agree with) of just implicitly satisfying a constraint because you have the right API

rich cradle
#

i absolutely would prefer to tack on a ridiculously powerful type system to python. i just don't think it's fundamentally possible, and would break a lot of things.

halcyon trail
#

That's why structural typing is very rare, out of popular static languages mostly just Go uses it

#

It's not really about the power of the type system per se

rich cradle
#

isn't it? python has built a lot of its type system by shoving things into the class model, even if they don't necessarily fit.

halcyon trail
#

No? A specific choice of the type system isn't the same as it being more powerful

rich cradle
#

no, wait. ignore my previous statement. that was another thing i somewhat disagree with, but not what i meant to say.

halcyon trail
#

It's just worth trying to use ABCs if you havent. Using ABCs has very little to do with the inheritance rabbit hole

rich cradle
#

i think what i'm really getting at is "i want static typing, and i want language features that only make sense with it," but there's no chance in hell that's happening. that arguably wouldn't even be python anymore.

halcyon trail
#

IME most usages of polymorphism don't actually require the loose coupling provided by protocols

#

A class that implements an ABC is explicit about it, which is nice, and it also means you get errors early rather than later, as you would with protocols
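A small sketch of the "errors early" point: with an ABC, forgetting an abstract method fails when the class is instantiated, not when some caller later reaches for the method. `Shape` and `Square` are illustrative names.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float):
        self.side = side
    # forgot to implement area()


try:
    Square(2)
except TypeError as exc:
    # raised at construction time, before anyone calls .area()
    print("TypeError:", exc)
```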

#

Well, sure, but I'm talking about options available in Python

rich cradle
#

right. i probably shouldn't have even brought that up.

halcyon trail
#

I forget how

#

But almost sure it can be done

#

There ya go

#

Basically for many people this would be the ideal in a static type system

#

Explicit but non intrusive

#

ABCs are explicit and intrusive

#

Protocols are non-intrusive because they're implicit

#

Haskell type classes and rust traits have this nice explicit but non intrusive property

broken sluice
#

https://discuss.python.org/t/hash-none-mk-2/21465/16
if you care at all to write anything there
I think there's no use tbh
I've made my case, both sides are repeating the same arguments

radiant garden
#

Feeling odd deja vu here

quick snow
# broken sluice https://discuss.python.org/t/hash-none-mk-2/21465/16 if you care at all to write...

Have you looked into faking /dev/random? That should account for ASLR and anything else, fixing not just your specific usecase, but any hashes of arbitrary objects (I think): https://stackoverflow.com/a/26067735/1016216

broken sluice
#

The source of the non-determinism is the memory location of None (since that is what the hash function is based on). It is not due to input from RNGs

elder blade
#

!e print(0xBADCAB1E)

fallen slateBOT
#

@elder blade :white_check_mark: Your 3.11 eval job has completed with return code 0.

3135023902
elder blade
flat gazelle
rose schooner
dusk comet
elder blade
#

That would make more sense, but unfortunately the PR they submitted is weird and the behaviour they seem to want even weirder

broken sluice
#

I'm not sure which makes more sense than returning a constant, there are arguments and opinions both ways

#

it does seem to make a lot of sense to me for a monostate type to hash to a constant, but what do I know

halcyon trail
broken sluice
#

I couldn't quite figure it out of all the memes

halcyon trail
swift imp
#

The biggest argument against it is the false premise that set iteration order depends purely on hashes and not on the history of the set; Steve D provided that counterexample in the first reply of the OG thread

native flame
#

the history of the set doesnt change over multiple runs

swift imp
#

After thinking about it more, I get what the OP wants but the reason for their wanting it, is just wrong and could be found for any number of classes, even after fixing None

native flame
#

like what though

swift imp
#

Pick virtually any object whose hash is based on id and u r back to square one

native flame
#

the argument is that all other objects commonly used as dict keys dont do that

swift imp
#

That's weak imo

native flame
#

str, bool, float, int, tuples of those

swift imp
#

Str do not give constant hash

native flame
#

they do if you set the seed with the flag

feral island
#

type objects are reasonable choices as dict keys

#

and they also have non-reproducible hashes

swift imp
#

Exactly

native flame
#

fair

swift imp
#

I've used registry patterns where the key is the type

#

I don't want to beat a dead horse, I just think time would be better spent refactoring their need to iterate a set in specific order

#

Pretty sure help() or the REPL hashes types, into a set no less. I've gotten weird errors when I was messing up custom hash implementations and all of a sudden repr broke in the REPL

flat gazelle
swift imp
flat gazelle
#

None uses an id-based hash. Which is mostly going to be stable due to the way modern OSs work with memory. But it is nevertheless a non-deterministic value

halcyon trail
#

Just a misunderstanding

#

Because that was never a premise

#

The point is reproducibility, reproducibility just requires eliminating actual sources of randomness

#

Whether the set's history affects iteration order doesn't matter, because it's not magically randomized

#

None's hash is randomized. Just like strings' hash is. The latter provides a way to make it non-random. The former doesn't.

halcyon trail
#

The example should be kept in mind but to use it as a basis to reject improvements in reproducibility would IMHO be Nirvana fallacy

broken sluice
#

Set's history affects iteration order and that's perfectly fine, because the next time you run the program on the same input it will create the same set in the same way
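To make the "history matters, but deterministically" point concrete: a sketch that builds the same two-element set via two insertion orders. On CPython, 1 and 9 fall into the same bucket of the initial 8-slot table, so the iteration order can depend on which was added first. Replaying a given history, though, reproduces its order.

```python
def build(order):
    s = set()
    for x in order:
        s.add(x)
    return s


a = build([1, 9])
b = build([9, 1])
print(a == b)            # True: same elements
print(list(a), list(b))  # the two orders may differ, depending on insertion history

# replaying the same sequence of operations gives the same order again
print(list(build([9, 1])) == list(b))  # True
```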

#

It's a misconception they keep repeating over and over

#

the only thing the set itself can do to break reproducibility is if it reorganizes its internal structure in a manner that isn't deterministic by its input commands

#

for example, a set that has a thread that concurrently rehashes it or something

#

I don't know of a single programming language that offers only non-deterministic sets. It's a nightmare honestly, and concurrent hashmaps are used only for super high perf applications, that probably don't want to use Python anyway

gray galleon
#

is it me or @ is the most underused operator in python
its only use case is in numpy for matrix multiplications
even then dot gives the same functionality

umbral plume
#

also, its recommended to use @ over np.dot when possible, since then expressions appear to translate over to mathematical equations much clearer (plus a little boost in performance i think)

quick snow
gray galleon
#

unary @?
you mean decorators?

quick snow
#

Yes

#

I guess it's not an operator there

dusk comet
#

and __divmod__

quick snow
fallen slateBOT
#

Modules/_datetimemodule.c line 2173

/* Could optimize this (by returning self) if this isn't a
dapper lily
#

time for a PR

deft horizon
#

Makes me want to implement date @ time -> datetime, it's a bad idea but would be cute.

#

I'd also nominate @ as the function composition operator (by exact analogy to matrix multiplication!) but we don't even have functools.compose(), so.
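For the composition idea, here's a sketch that doesn't need fishhook: a small hypothetical wrapper class `Fn` whose `__matmul__` composes right to left, plus a `compose` helper built on `functools.reduce` (neither is in the stdlib, which indeed has no `functools.compose`).

```python
from functools import reduce


class Fn:
    """Hypothetical wrapper so that (f @ g)(x) == f(g(x))."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        # composition reads right to left, like f ∘ g
        return Fn(lambda *args, **kwargs: self(other(*args, **kwargs)))


def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))); needs at least one function."""
    return reduce(lambda f, g: Fn(f) @ g, funcs)


inc = Fn(lambda x: x + 1)
double = Fn(lambda x: x * 2)
print((inc @ double)(5))              # 11
print((double @ inc)(5))              # 12
print(compose(str, inc, double)(5))   # 11 (as a string)
```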

dusk comet
#

You can fishhook FunctionType.__matmul__

grave jolt
#

wait... unary @?

#

oh you mean decorator syntax?

paper echo
grave jolt
#

btw, is it used anywhere outside of numpy?

#

and the cursed emails thing

paper echo
#

yeah xarray has it

halcyon trail
#

My guess is that they probably don't want people randomly abusing operators just to get infix

paper echo
#

probably true, "here's a random operator have fun" would be pretty un-pythonic

halcyon trail
#

In 99 percent of cases if an operator isn't already familiar in a context then overloading it is the wrong call

#

(pun intended?)

grave jolt
#

/ for paths and urls was kinda strange tbh

#

but I think I got used to it

#

Haskell libraries used all kinds of custom operators with urls

halcyon trail
#

It's not really strange

#

It's the path separator character and it's what you type in to combine paths in bash

#

It's certainly less strange than + for strings.

#

C++ also uses operator /, just like python

#

Another reality here is that standard library just has more leeway. They can pick something semi reasonable and everyone will learn it pretty fast. For a third party library it's more annoying really to use operators in obscure ways

grave jolt
dusk comet
halcyon trail
grave jolt
#

something something monoid

halcyon trail
#

Yeah I've seen people make this argument before

#

It's ridiculous

#

Ironically, * was of course used for multiplying reals, integers etc long before matrices

#

Which is commutative

#
* was selected for matrix multiplication because it's conceptually similar to multiplication
sacred yew
flat gazelle
#

they could have quite literally picked any operator in existence, it's julia, they support all of latex.

halcyon trail
#

Mathematicians weren't slaves to the fact that they were using a previously commutative operator for a non commutative operation

flat gazelle
#

but eh, if someone wants to go all math nerd for string concat, sure

sacred yew
#

integer subtraction is noncommutative
guess * should be for subtraction then

halcyon trail
#

When you put higher value on formalism than concepts relative to mathematicians you know you're in bad shape

flat gazelle
#

well, subtraction is not associative

#

the convention that * is for associative operations is a fairly new one

dusk comet
# grave jolt

Following this logic, string repetition should be +, because it is commutative operation: 5+'abc' == 'abc'+5

rose schooner
#

power is right-associative and non-commutative though?

halcyon trail
#

I think intuitively it's pretty clear that conceptually string concatenation is like adding. Each item present in each of the arguments shows up exactly once in the final string

#

And with multiplication the things in the collection are multiplied

#

In other words, if z = x + y, then len(z) = len(x) + len(y)

#

And the same relationship holds for string multiplication

swift imp
quick snow
quick snow
#

How could they use *, that denotes a noncommutative operation, while scalar multiplication is commutative! Should have used + for multiplication, clearly

deft horizon
# swift imp Cannot agree more. I would like function composition too

In fairness, matrix multiplication is very common in the sciences! Fortunately it's also just a special case of function composition (where the functions are linear transformations), so that makes sense. And like (matrix) multiplication, function composition is often represented by an infix dot operator 🙂

dusk comet
#

__matmul__ is not always a matrix multiplication
__add__ is not always an addition
__truediv__ is not always a division (for example pathlib.Path)

So, "matmul" is not a bad name for that dunder. Different operators have different names (+ addition, - subtraction, * multiplication, / division, @ matrix multiplication), and the operator names are irrelevant to what the operators do. So I don't see it (matmul being the name for the operator) as a problem

#

There is also the z3 lib (iirc), which has placeholder variables: X+Y becomes not the result of an addition, but an expression object that can be evaluated at given X and Y

#

Also there is some lib (i forgot name), that have some "magic" var (iirc it is stored at lambda or phi symbol name), and var+1 becomes lambda x: x+1

#

So, you can do whatever you want with operators as long as you and your users understand what's happening

sacred yew
#

sympy?

tall surge
#

For the dunder methods __getattr__, __setattr__, and __delattr__, a difference that emerges between them is that __getattr__ is called only if looking up an attribute in an object's dictionary fails, whereas __setattr__ and __delattr__ are called regardless of whether the attribute is present in the object's dictionary. Why is __getattr__ handled differently from __setattr__ and __delattr__ rather than handling them all the same?

feral island
grave jolt
#

I guess this is a naming issue then 😄

#

__setattribute__ when

feral island
#

I think the default __getattr__ behavior is useful because you often want it as a fallback for attributes that aren't explicitly defined, while attributes that are defined normally can just use the normal system

#

Yeah the naming isn't great, probably a historical accident

tall surge
#

thank you!

radiant garden
quick snow
#

I like this asymmetry. When you want to customize item access, you define __setitem__/__getitem__. When you want to customize attribute access you define __setattr__/__getattr__, almost always. When you define __getattribute__, you have to be extremely careful, you almost never want it.

grave jolt
#

get and set are extremely ambiguous and overused words

#

well, in this context it's probably appropriate

static bluff
#

Question about Python dictionaries

#

I'm learning about hashing in my data structures class

#

I just learned about how, as the load factor of the hash table approaches 1, the time to find an unoccupied slot (or, equivalently, the time to perform an unsuccessful search) approaches linear time

#

Generally speaking. But I've heard for most of my python-using life that dictionaries are lightning fast, at least in Python terms, and that dictionary operations are in more or less constant time

#

How does Python handle this? Amortized resizing of the table?

feral island
#

Python automatically resizes dictionaries as they grow yes

#

Not too familiar with the details, but that should keep access times amortized constant

#

It is possible to get bad behavior if you have many keys that happen to hash to the same bucket

static bluff
#

XD There's no winning, is there?

feral island
#

It's fairly unlikely in practice. String hashes are randomized to avoid DoS attacks where many keys map to the same bucket

#

It's probably still possible with ints (which have very predictable hashes) but that's rarely relevant in practice
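
To illustrate how predictable int hashes are, here is a small sketch relying on a CPython implementation detail (documented under "Hashing of numeric types"): a non-negative int hashes to itself reduced modulo `sys.hash_info.modulus`, so every multiple of that modulus collides on hash 0. The dict still works, but each insert or lookup has to probe past the earlier colliding keys, so n such keys cost roughly O(n²) in total.

```python
import sys

# 2**61 - 1 on 64-bit CPython builds, 2**31 - 1 on 32-bit builds.
P = sys.hash_info.modulus

# Every multiple of the modulus reduces to the same hash value.
colliding = [k * P for k in range(1, 6)]
print([hash(k) for k in colliding])  # all zeros on CPython

# Still correct, just slower: each new key probes past the previous ones.
table = dict.fromkeys(colliding)
print(len(table))
```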

feral cedar
#

i think they get resized by a factor of 9/8 when they get 2/3 full or something like that

#

> To avoid slowing down lookups on a near-full table, we resize the table when it's USABLE_FRACTION (currently two-thirds) full.

load factor ^

> Currently set to used*3.

how much to expand by ^

i think 9/8 is for lists
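
The two quoted rules (resize at two-thirds full, grow to fit `used*3`) can be sketched as a toy open-addressing table. This is a simplification under stated assumptions, not CPython's implementation: it uses plain linear probing instead of CPython's perturbed probe sequence, supports no deletion, and keeps a single slot array rather than CPython's compact entries layout. The class name `ToyDict` is hypothetical.

```python
class ToyDict:
    """Toy open-addressing hash table that grows when it is 2/3 full."""

    def __init__(self):
        self.capacity = 8                  # power of two, so we can mask instead of mod
        self.slots = [None] * self.capacity
        self.used = 0

    def _usable(self):
        # Same idea as USABLE_FRACTION: at most 2/3 of the slots may be filled.
        return (self.capacity * 2) // 3

    def _find(self, key):
        # Linear probing; terminates because the table is never allowed to fill up.
        i = hash(key) & (self.capacity - 1)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) & (self.capacity - 1)
        return i

    def __setitem__(self, key, value):
        i = self._find(key)
        if self.slots[i] is None:          # new key, not an overwrite
            if self.used + 1 > self._usable():
                self._resize(self.used * 3)  # grow so that used*3 fits
                i = self._find(key)
            self.used += 1
        self.slots[i] = (key, value)

    def __getitem__(self, key):
        entry = self.slots[self._find(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _resize(self, minsize):
        old = [e for e in self.slots if e is not None]
        new_cap = 8
        while new_cap <= minsize:          # smallest power of two above minsize
            new_cap *= 2
        self.capacity = new_cap
        self.slots = [None] * new_cap
        for k, v in old:                   # rehash every entry into the bigger table
            self.slots[self._find(k)] = (k, v)


table = ToyDict()
for n in range(100):
    table[f"key{n}"] = n
print(table["key42"], table.used, table.capacity)  # 42 100 256
```

Because each resize leaves the table at most about one-third full, the O(n) cost of rehashing is spread over many cheap inserts, which is what keeps average access amortized constant.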

tacit hawk
#

Is the hash() of ints and floats constant? If so, is it stable or just an implementation detail?

feral cedar
#

it's an implementation detail. the only thing that must be satisfied is that equal ints/floats have the same hash

spark magnet
halcyon trail
#

Ints hash to themselves

#

Floats have to be compatible with that, I think

#

Although as a general rule of thumb you just don't want to use floats near hash tables

feral cedar
#

^ integral floats hash to an int equal to themselves, but non-integral floats hash to...something
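
A short sketch of the one rule that is actually guaranteed: numbers that compare equal hash equal, even across `int`, `float`, `Fraction`, and `Decimal`. The last lines show why floats are awkward keys in practice.

```python
from decimal import Decimal
from fractions import Fraction

# Equal numbers must hash equal, regardless of numeric type.
print(hash(1), hash(1.0), hash(Fraction(1)), hash(Decimal(1)))   # all equal

# Non-integral values follow the same rule: 0.5 == Fraction(1, 2) == Decimal("0.5").
print(hash(0.5) == hash(Fraction(1, 2)) == hash(Decimal("0.5")))  # True

# Values that merely look equal are different keys.
d = {0.1 + 0.2: "sum"}
print(0.3 in d)  # False, because 0.1 + 0.2 != 0.3
```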

halcyon trail
#

Statically typed languages almost all just disallow using floats as keys or for lookup

#

(at least by default)

raven ridge
fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

-2
spark magnet
#

mostly ints hash to themselves 🙂

halcyon trail
#

Interesting

#

Is this a general thing for negatives?

spark magnet
#

no, just -1

feral cedar
#

no, it's a funny implementation detail of the hash function, lol

halcyon trail
#

Please tell me you guys know why this happens

feral island
#

it's because returning -1 indicates an error

spark magnet
#

also, hash(i) == (i % 2**61) (I think)
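
That guess is close. In CPython the modulus is `sys.hash_info.modulus`, which is 2**61 - 1 (not 2**61) on 64-bit builds. A quick sketch checking this, plus the -1 special case discussed here (CPython behaviour, not a language guarantee):

```python
import sys

P = sys.hash_info.modulus   # 2**61 - 1 on 64-bit CPython builds
print(P == 2**61 - 1)       # True on a typical 64-bit build

# Non-negative ints reduce modulo P ...
for n in (0, 5, P - 1, P, P + 1, 2**61):
    assert hash(n) == n % P

# ... negative ints hash to minus the hash of their absolute value ...
assert hash(-5) == -5

# ... except -1, which is reserved as the C-level error value and becomes -2.
print(hash(-1), hash(-2))   # -2 -2
```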

halcyon trail
#

πŸ€¦β€β™‚οΈ

feral island
halcyon trail
#

Under what circumstance does hashing return an error

feral island
#

def __hash__(self): 1/0

raven ridge
halcyon trail
#

That doesn't answer my question though

rose schooner
raven ridge
#

I'd have to check the data model docs, but I doubt that's guaranteed...

halcyon trail
#

If the user hash throws, then I suppose you can just propagate it. I don't see any legitimate reason for it to throw though, so I'm surprised that they reserved a sentinel for this

rose schooner
halcyon trail
#

I've never seen hashing reserve an error channel in another language, I think

feral island
halcyon trail
#

Yes, it can, but you can also propagate that exception

feral island
halcyon trail
#

The point is that I don't see why it would be a use case to care about, so I don't see why we should do it any favors

raven ridge
#

efficiency

halcyon trail
#

Efficiency in the error path shouldn't be a consideration here

feral island
#

otherwise you'd have to call PyErr_Occurred() after every call to tp_hash

#

so the non-error path would be slow

halcyon trail
#

Is it more efficient in the happy path?

raven ridge
halcyon trail
#

Ugh

#

That makes me sad

raven ridge
#

why? It's an implementation detail

feral island
#

and it doesn't really affect you if you're working at the Python level

#

unless you go out of your way to check the value of hash(-1) or something

rose schooner
halcyon trail
raven ridge
#

Python code thinks that __hash__ returns a Python int. For efficiency, the actual C code doesn't use Python ints, it uses int64_t's. So there's always got to be some conversion happening to get from one to the other.

halcyon trail
feral island
#

check the code cereal just linked

raven ridge
#

if your __hash__ returns -1 as a Python int, the code that turns that into an int64_t (aka Py_hash_t) returns -2
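
You can observe both halves of this from pure Python on CPython. In the sketch below (class names are hypothetical), a `__hash__` that returns -1 comes back as -2, while a `__hash__` that raises has its exception propagated, which is what the -1 sentinel signals at the C level.

```python
class MinusOne:
    def __hash__(self):
        return -1          # an ordinary Python int...


class Broken:
    def __hash__(self):
        raise ValueError("no hash for you")


# ...but CPython remaps a -1 result to -2 so it can't be mistaken for an error.
print(hash(MinusOne()))    # -2

# A raised exception is reported upward (via the -1 sentinel in C) and re-raised here.
try:
    hash(Broken())
except ValueError as exc:
    message = str(exc)
print("propagated:", message)
```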

feral island
fallen slateBOT
#

Objects/typeobject.c line 7500

```c
wrap_hashfunc(PyObject *self, PyObject *args, void *wrapped)
```
raven ridge
#

I doubt that microoptimization saves very much, honestly - but it lets the error handling path be `if (hash_code == -1) { /* propagate exception */ }` instead of `if (hash_code == -1 && error_occurred()) { /* propagate exception */ }`

#

allowing -1 as a hash that isn't just a sentinel for an error would mean that everything that hashes to -1 needs an extra check to see if -1 is or isn't an error indicator.

#

I don't think it improves the performance of things that hash to the other 2^64-1 values, though - so maybe in practice it doesn't save too much.

halcyon trail
#

I wonder if any other mainstream GC language has something like this, and I just didn't know about it

raven ridge
#

my gut feeling is that this is probably a microoptimization that makes pretty little difference given today's branch predictors.

#

regardless, it's an implementation detail that's invisible to everyone except for those working at the C API level, so 🤷

halcyon trail
#

If there was no sentinel, then error_occurred would always have to be called, right? That's what was discussed previously

raven ridge
#

right

halcyon trail
#

error_occurred is expensive, it was alleged

raven ridge
#

but some functions in the C API do return -1 as both a sentinel and a real return value - if they return -1 you need to check the error-occurred function as well

halcyon trail
#

So the branch predictor wouldn't really save you

#

Yes, this sort of thing makes me wince

rose schooner
raven ridge
halcyon trail
#

I was comparing to not using a sentinel at all

raven ridge
halcyon trail
#

I still don't think I understand though, Jelle's example doesn't return -1, it just throws

#

So there must be some C code that catches that exception, and then returns -1

raven ridge
#

yes

#

and then other C code that sees that -1 and propagates the exception

rose schooner
halcyon trail
#

So this is written this way because it's old and/or meant to be efficient on older machines

#

On x86-64, returning a two-word trivial object has been basically free for ages

#

Or 32-bit, I suppose

raven ridge
raven ridge
halcyon trail
#

Yeah. But anyhow you can see why I wince, having a sentinel that overlaps the legit range for something never feels good

#

It's like the C functions that parse strings to int

#

0 to indicate an error

#

An error sentinel which is probably also the most common legitimate output 😛

raven ridge
feral island
halcyon trail
#

I guess that there's like zero chance of moving to C++?

#

For implementation details

raven ridge
#

they just moved to C99 😄

#

like, this year.

halcyon trail
#

That makes sense to me

#

I mean, when did MSVC start supporting C99?

#

Like 6 months ago 😛

raven ridge
#

yeah. not long ago.

halcyon trail
#

GCC did the C to C++ move gradually, so it's not unthinkable.

#

But definitely not easy

tacit hawk
raven ridge
#

definitely don't do that.

spark magnet
spark magnet
halcyon trail
#

If you outsmart the database, you become the database

#

It's like beating up the bouncer

gray galleon
#

is there a pep for symbol type in python?

flat gazelle
#

what would that entail? If you mean symbols in the style of Erlang et al., that's pretty much the sentinel objects PEP.

gray galleon
#

i mean interned names like in lisp and ruby
wait, python already has interned strings

#

it doesn’t look like symbols but nice

flat gazelle
#

yeah, python just uses strings in those places. Thanks to the interning, the comparison is fast enough even without the identity, though unlike symbols they aren't namespaced. But for sentinel-style stuff, you use None or object()
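
A small sketch of both idioms mentioned here. `sys.intern` gives symbol-like identity for strings, and a bare `object()` gives a marker that nothing else can be identical to. The `_MISSING` name and the `lookup` helper are hypothetical, chosen only for illustration.

```python
import sys

# Strings built at runtime are not automatically interned...
a = "".join(["sym", "bol"])
b = "".join(["sym", "bol"])
print(a == b, a is b)       # True, and usually False

# ...but sys.intern returns one canonical object per distinct value.
a, b = sys.intern(a), sys.intern(b)
print(a is b)               # True

# A unique sentinel: only identity comparison against it can ever succeed.
_MISSING = object()


def lookup(d, key, default=_MISSING):
    # Distinguishes "key maps to None" from "key is absent".
    value = d.get(key, _MISSING)
    if value is _MISSING:
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value
```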

long isle
grave jolt
unkempt rock
#

!pep 638

fallen slateBOT
#
**PEP 638 - Syntactic Macros**
Status: Draft
Created: 24-Sep-2020
Type: Standards Track
unkempt rock
#

This could be interesting to see in an interpreted language

boreal umbra
#

I thought that PEP was rejected a few years ago for fear that it would fracture the ecosystem

native flame
#

i kinda hate the idea

#

both for any and callable

feral cedar
#

that's kinda cursed tbh

quick snow
#

Eww

#

any could be synonymous with Union instead

#

Then we could also finally have all (for hypothetical Intersection)

#

any[str, int]. all[Indexable, Sized]

radiant garden
#

or an even worse alternative, all for Never

native flame
#

i dislike the idea of "reusing" functions for annotations at all tbh

radiant garden
#

if Any is any, BABAXD

quick snow
#

I promise I will start using type hints when you can do arithmetic with them. Sequence - str, ~int, ...

radiant garden
#

hell, might as well reuse iter for iterable

native flame
#

map for Mapping

gray galleon
radiant garden
#

serious answer: because different semantics

#

real answer: because fewer characters to type

feral cedar
native flame
feral cedar
#

@grave jolt

Whatever you have - a strnig, a number, a function, a chess piece - it's an object.
strnig 😔

gray galleon
radiant garden
#

rename TypeGuard to isinstance

feral cedar
grave jolt
grave jolt
quick snow
native flame
#

typecheckers will let you do anything with Any

#

it isnt treated like normal types

grave jolt
#

Yeah it's special

quick snow
#

I see

gray galleon
#

btw how do i make recursive types
smth like this
```py
from dataclasses import dataclass

@dataclass(frozen=True)
class LinkedList:
    first: object
    rest: LinkedList  # NameError: LinkedList isn't bound yet while the class body runs
```

feral island
gray galleon
#

hmm

feral cedar
feral island
#

(or use from __future__ import annotations, or PEP 649 in the future)
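
A minimal sketch of the quoted-annotation fix. Two assumptions beyond the original snippet: `rest` is made `Optional[...]` with a default of `None` so the list has an end, and `to_list` is a hypothetical helper added for the demo. Writing `from __future__ import annotations` at the top of the module would let you drop the quotes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkedList:
    first: object
    # The string annotation is not evaluated at class creation, so the self-reference is fine.
    rest: "Optional[LinkedList]" = None


def to_list(node: Optional[LinkedList]) -> list:
    # Walk the chain until the None terminator.
    items = []
    while node is not None:
        items.append(node.first)
        node = node.rest
    return items


xs = LinkedList(1, LinkedList(2, LinkedList(3)))
print(to_list(xs))  # [1, 2, 3]

# frozen=True (with the default eq=True) generates __hash__, so instances can live in sets.
print(xs in {LinkedList(1, LinkedList(2, LinkedList(3)))})  # True
```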

feral island
halcyon trail
#

It's pretty confusing in a way since Any is usually used as a name for the top type

#

and to me at least that's also what the name implies, verbally

feral cedar
#

yeah i think c++ does that

halcyon trail
#

C++ doesn't have a top type

#

it has a library type called any that can hold anything though

feral island
#

typescript's any is like Python Any

halcyon trail
#

are you sure? then there's a mistake on the Wikipedia page

halcyon trail
#

yeah you're right

#

does TS not have a proper top type?

#

unknown maybe?

#

yeah, seems like unknown is the top type

halcyon trail
#

those names feel backwards to me

feral island
#

unknown gives an error when you use it, but any gives no error when you access an arbitrary method

halcyon trail
#

Any to me feels like a known type, that could just happen to be anything
Unknown is a type we don't know anything about, and we're opting out of type checking

#

weird

#

at least in python object is a pretty typical top-type name

feral island
halcyon trail
#

Since there are no legal operations on unknown, it immediately errors when an unknown variable is mentioned in a context where its type hasn't been narrowed in some way

#

it's not really conceptually different though

#

Doing it this way is a little "extra" strict, because x = y is always a legal operation in Python and AFAIK JS

#

similarly, if you have a my_list: List[object], then my_list.append(y) is legal even if y is of type object

#

so error'ing immediately when it's mentioned is very odd. I'd have it as a warning.

#

but I can see the benefits practically
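
The Python analogue of the TypeScript split is roughly `object` (like `unknown`) versus `Any` (like `any`). This sketch only runs the code; the comments describe what a static checker such as mypy or pyright would report, and the function names are hypothetical.

```python
from typing import Any


def with_any(x: Any) -> int:
    # Any allows every operation, so a typo like x.uper() would only fail at runtime.
    return len(x.upper())


def with_object(x: object) -> int:
    # Calling x.upper() directly would be flagged: `object` has no attribute `upper`.
    # Narrowing with isinstance makes the access legal.
    if isinstance(x, str):
        return len(x.upper())
    return 0


items: list[object] = []
items.append("abc")   # fine: every value is an `object`
items.append(42)
print(with_any("abc"), [with_object(i) for i in items])  # 3 [3, 0]
```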

dusk comet
# native flame map for Mapping

compile for types.CodeType
isinstance for TypeGuard
getattr[cls, 'attr'] for declaring the same type as cls.attr (i guess there is a PEP or some proposal about that feature)
globals['varname'] - same as getattr, but using global vars instead of attrs of some type
locals['varname'] - same as globals, but using local vars
sorted for Iterable[T] where T is SupportsLT
sum for Iterable[T] where T is SupportsAdd
vars for dict[str, any] (commonly used for arbitrary namespaces, globals() and json dicts have this type, for example)
hash for Hashable
iter for Iterable
bool for Boolable
aiter for AsyncIterable
len for Sized
reversed for Reversible
open for SupportsFSPath
abs for SupportsAbs
round for SupportsRound
dir for SupportsDir
divmod for SupportsDivMod
format for SupportsFormat
pow for SupportsPow

#

x: property[int] for read-only var or attr (UPD: i realized it is almost equivalent to x: Final[int])

boreal umbra
# native flame i kinda hate the idea

I don't like it, either. any and typing.Any represent two different concepts that aren't even guaranteed to share the same word in every human language. And I think Python should limit its already preferential status for English.

#

I actually didn't know about the callable builtin. I wish it had been named is_callable.

feral cedar
#

why is it even a built-in

halcyon trail
#

it's a bit of a weird choice because Any is not an annotation you want to be using that often

#

I almost never use it. Usually it enters into it implicitly, because your code depends on first or third party code that was written without annotations, so Any is the "default" when annotations aren't present

grave jolt
#

and if it's not installed, it prints the mypy output

halcyon trail
#

the concept of built-ins is a bit strange to me; or maybe I'm just attaching more significance to it than it deserves, because of the name

grave jolt
#

on a real printer

#

hmm

halcyon trail
#

i prefer to just think of functions/classes that are imported by default

grave jolt
#

yeah

halcyon trail
#

seems to be how it works in Kotlin, Rust

#

is there any meaningful distinction though between that, and a python "built in" ?

grave jolt
#

Haskell has a "prelude" which is basically a library star-imported by default. IIRC you can even replace it with a different one

halcyon trail
#

i don't know if it's replaceable in kotlin or rust

grave jolt
#

I don't think Rust even has "built-ins"

#

ah, some macros

#

yeah it does, I misremember

halcyon trail
#

it does

#
```rust
fn main() {
    let x: Vec<_> = vec!(1,2,3);
}
```
#

a valid rust program

grave jolt
#

yeah yeah

#

it was a brain fart

feral island
grave jolt
#

yeah I don't think there's something sacred about builtins

halcyon trail
#

in kotlin/rust you can actually have far bigger preludes, because of the ability to scope things to classes without them being members

#

like, all of itertools for example, is iirc in the preludes of both Rust and Kotlin

#

in python this would be extremely annoying and people would rightfully complain

grave jolt
#

hmmmm

#

well, it's kinda different

#

unless I misunderstand you

halcyon trail
#

it's different because it's member-scoped

grave jolt
#

yeah

halcyon trail
#

yeah, that's what I said

grave jolt
#

oh, you mean traits and extension methods

halcyon trail
#

yes

#

you can have all of itertools in Kotlin or Rust as part of the prelude/"builtins", because they're just available via member-function syntax, and only on suitable types

#

groupBy in Kotlin is an extension on Iterable<T>, in python groupby is just a free function

#

so with the latter, you'd have issues with shadowing and such if you define any function, class, or variable called "groupby"

#

I have lost track of how many times I've tried naming a variable "input" in python

#

only to get yelled at by my IDE, sigh, and change it

grave jolt
#

remove the IDE check brainmon

#

I have used id liberally

#

and filter

halcyon trail
#

I mean yeah, 99% of the time it won't matter, but I have actually, a couple of times, hit a really confusing bug caused by shadowing

#

I think it's just best practice to not use those names

#

it starts off okay, then other programmers see that id is the idiomatic name for some variable that comes up in your business logic, and they start using it

#

soon there are local variables called id everywhere and then eventually something bad happens

#

it's annoying, but there's no real technical reason not to just pick a different name

#

student_id or foo_id or whatever

grave jolt
#

my usual fuckup scenario is when I format that id somewhere into a string

#

and then get something like Foo(id=<built-in function id>)
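
A minimal reproduction of that failure mode (the `Foo` class and `describe` method are hypothetical). A field named `id` is fine on its own, but inside a method a bare `id` never sees the class attribute and silently resolves to the builtin function.

```python
from dataclasses import dataclass


@dataclass
class Foo:
    id: int

    def describe(self) -> str:
        # Bug: should be self.id; the bare name falls through to builtins.id.
        return f"Foo(id={id})"


print(Foo(7).describe())  # Foo(id=<built-in function id>)
```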

halcyon trail
#

nice

grave jolt
#

another solution might be banning certain builtins altogether in a linter, like id

#

that would also catch such mistakes

halcyon trail
#

id would also be a good example of something that could just be an extension.
I actually never thought too much of that benefit of extensions; not polluting the global namespace. kinda cool.

grave jolt
#

I think id should've just been part of sys tbh

#

I don't think I've actually used it besides the REPL

#

I guess it's useful if you want to include the id in the __repr__, like the default __repr__ but with some extra stuff.

#

But that's a very niche use case, hence it could be moved to sys
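
For reference, the niche case looks something like this sketch: a `__repr__` shaped like the default `<module.Class object at 0x...>` with one extra field. `Widget` and `name` are hypothetical, and `id()` being the memory address is a CPython detail.

```python
class Widget:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Mimics object.__repr__'s layout, then appends one extra field.
        cls = type(self)
        return f"<{cls.__module__}.{cls.__qualname__} object at {id(self):#x} name={self.name!r}>"


w = Widget("knob")
print(repr(w))
```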

halcyon trail
#

yeah I agree

#

input is probably my personal worst offender. very useful variable name and I almost never use the function.

raven ridge
#

!e print(type("".split))

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

<class 'builtin_function_or_method'>
raven ridge
#

that's "builtin" in the first sense

#

almost as annoying as the fact that "package" means at least 2 different things in Python

grave jolt
#

coroutine 😔

raven ridge
#

yeah, that's another one.

#

it's moderately interesting that you can monkeypatch in new (or replacement) builtins in Python

#

!e ```py
import builtins
builtins.hello = lambda a: print(f"Hello, {a}!")
hello("World")
```

fallen slateBOT
#

@raven ridge :white_check_mark: Your 3.11 eval job has completed with return code 0.

Hello, World!
raven ridge
#

Sick of your linter complaining that you've shadowed builtins.id? Just del it! hyperlemon

halcyon trail
#

Idk about rust but in Kotlin you can shadow the prelude

#

No warning either

#

It's just a lot more explicit, since I don't believe it can be done by a local variable, only by an import

raven ridge
#

If you modify the builtins module in Python, all modules will see your change, since all modules share a reference to the same builtins module
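
A small sketch of the lookup order behind this. A name is resolved local, then enclosing, then global, and finally in the shared `builtins` module, so a module-level binding hides a builtin only for that module, and deleting it makes the builtin visible again.

```python
import builtins

print(id is builtins.id)          # True: nothing shadows it yet

id = "shadowed at module level"   # a global now hides the builtin in this module only
print(id)

del id                            # remove the global binding...
print(id is builtins.id)          # ...and lookup falls through to builtins again: True
```

Assigning `builtins.id = ...` instead would change what every module sees, since they all share the one `builtins` module.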

sacred yew
#

aside from the async function one

feral cedar
#

the function itself is called a coroutine, and the thing such a function returns is also called a coroutine

sacred yew
#

ah

feral cedar
#

same with generators. the "correct" name would be "coroutine function" and "generator function" but no one actually says that
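
The `inspect` module keeps the two meanings apart, which makes the distinction easy to see. In this sketch, `fetch` and `count` are hypothetical names: the function you define is the "coroutine function" or "generator function", and calling it produces the coroutine or generator object.

```python
import asyncio
import inspect


async def fetch():          # a coroutine function
    return 42


def count():                # a generator function
    yield 1


coro = fetch()              # calling it returns a coroutine object
gen = count()               # likewise, a generator object

print(inspect.iscoroutinefunction(fetch), inspect.iscoroutine(fetch))  # True False
print(inspect.iscoroutinefunction(coro), inspect.iscoroutine(coro))    # False True
print(inspect.isgeneratorfunction(count), inspect.isgenerator(gen))    # True True

print(asyncio.run(coro))    # 42: the object is what actually gets run
```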

sacred yew