#internals-and-peps
1 messages ยท Page 113 of 1
Heya my dudes- I posted in help-cake and I thought I might ask you guys to check it out. I know cross posting is frowned upon, but one person posted back saying that not a lot of people would have the vocab/experience to be able to comment
Which means it calls in you guys' wheelhouse
I guess
You know you guys have really inspired me. Shifting my syntax to better conform to norms is proving quite painful, and lord knows I have a tendency to make things way more complicated than they need to be
So I'm reeeeeeally trying to pull it in, in part because of my talks with you all
#software-architecture seems like the best place for that discussion.
Thanks
!e
class static_decorator:
def __init__(self, /, **kwargs):
self.kwargs = kwargs
def __call__(self, func):
for key, value in self.kwargs.items():
setattr(func, key, value)
return func
@static_decorator(num_calls=0)
def thingy():
thingy.num_calls += 1
for _ in range(10):
thingy()
print(thingy.num_calls)
@boreal umbra :white_check_mark: Your eval job has completed with return code 0.
10
The idea is to emulate static variables in a function. I've never actually wanted that, but someone on python-ideas does. (That person also wants Python to be C++ for some reason.) Does a widely-used thing like this already exist?
what does static mean inside a function in c++?
Could it be something type checkable but not enforced at runtime?
!e Is that really much better than, say, ```py
def thingy():
if "num_calls" not in thingy.dict:
thingy.num_calls = 0
thingy.num_calls += 1
for _ in range(10):
thingy()
print(thingy.num_calls)
@raven ridge :white_check_mark: Your eval job has completed with return code 0.
10
It's a variable with global lifetime but scoped to a function, basically.
It is initialized on first use, lives until the program dies, and is only accessible from within that function
Huh... why
Global but not global?
Definitely seems like something python doesn't need
shared across calls, but not shared across functions. Immortal and lazily initialized, but not globally scoped.
!e The few times I've needed this I've just done:
def thingy():
thingy.num_calls += 1
thingy.num_calls = 0
for _ in range(10):
thingy()
print(thingy.num_calls)
@raven ridge :white_check_mark: Your eval job has completed with return code 0.
10
I would agree that python doesn't really need it
the two major uses of static locals in C++ are:
- to have a lazily initialized global, that is also thread safe in its initialization.
- For certain compile time computation situations that simply are not applicable to python.
I'd use it static in Python occasionally if it existed. It'd be helpful for manual memoization, compared to the usual my_func(arg1, arg2, *, _cache={}) idiom, for instance. But it's not particularly hard to live without - you can achieve the same effect in lots of ways.
Well, a third use case for static locals in C++, which is usually not a great idea, is for caching
and cachingin python is usually achieved via decorators
functools.lru_cache does it by injecting a closure, for instance.
In C++, because it's not GC'ed, and because of the performance sensitivity, and because of the threading issues, having an "implicit" cache like this is very heavily discouraged
at least, among all the people I've talked to
If you want to have a cache most people will tell you to be explicit, write a class with operator ()
I think that depends a lot. I've written one-element static caches in library code in C++ that drastically improved the performance of some program. The overall design of the program was bad, but the cache in the lib was easier than rearchitecting the app.
Yeah, that's a good point, as always if you are constrained to be backwards compatible, then adding a cache to a function can be acceptable
I would definitely only use that sort of caching for POD stuff, though. Destructors firing after main can fail spectacularly.
well, your cache will probably lead to a destructor firing after main
in my case, nah - it was caching POD stuff. That's what I'm saying.
the individual elements can be POD, but you still have a data structure for the cache usually
unless it's a no argument function
but if hte function has arguments typically you have a dict and lookup on the dict using the arguments
well, we're getting wildly off topic at this point
I was about to say explain why that didn't apply in my case, but it's really neither here nor there ๐
fair enough
The thing is that python doesn't really believe in scoping anyway
so you may as well just have your function access a private global defined in the same module
"we're all adults here" and just not touch it anywhere else
prefix it with underscore
etc
locks are another example for statics in C++ - if you have a single function where all calls need to be serialized, making a lock that's local to that function is reasonable. Though as soon as you need to share it between multiple functions, it either needs to be a static class variable or a global
Eh, I mean, it's not completely terrible, but it's not how almost any good C++ programmer will recommend you write code, today
if you have to retrofit things with legacy, keep backwards compatibility, etc, that's another story
You can do that, but then you're forced to come up with a name for it - instead of ```py
def thingy():
thingy.num_calls += 1
thingy.num_calls = 0
you need to do something likepy
_thingy_num_calls = 0
def thingy():
global _thingy_num_calls
_thingy_num_calls += 1
Sure, that's not such a big deal
well - it's not better, either
For one time code I would do that over using a fancy decorator
if you want something reusable, sure, something like lru_cache is nice
I don't see any way in which the global is superior to just making it an instance attribute of the function.
it takes more code, and looks uglier to my sensibilities.
I see what you're saying, i think they're pretty much equivalent
Yeah - the decorator is just a way to move the variable assignment from below the function definition to above it. Which seems to be solving a non-problem to me.
I almost never use such things though. You're taking away a lot of control from users with no benefit really. Put the state in a class, and if you want you can provide a "convenience" global instance of that class
More testable, easier to read, and now the user has choices, at no extra cost, they could use the global instance and not manage thing.num_calls at all, or they could spin up two separate instances if they needed to, etc
At that point I suppose you could also provide "convenience" functions that use the global instance automatically; but it's a trivial wrapper at that point, and the key is that the user can spin up more instances if they want to
I more or less agree, but global state isn't something you should never have, just something to minimize. Sometimes it's the best solution to the problem.
I'm not saying you shouldn't have global state at all
I'm basically just saying you should "globalize" it at the last possible second, if that makes sense. Write a class that wraps up the state and the function calls, that can be used anywhere. Then, if you need it as a global, make it a global.
logging is a good example of this
logging.warning is just a trivial wrapper around logging.getLogger().warning
I totally understand what you're saying. And I mostly agree. ๐
I mean on the internet you really can't hope for more than that, can you ๐
testability is by far the best argument for that type of design. But, well - we don't live in a perfect world, and devs need to balance lots of different concerns. Backwards compatibility, ease of use, ease of documentation, difficulty of misuse, and a million other things.
so, if Python had static, I'd use it. Rarely. I find myself doing something in Python that I'd do with a static in C about once every year or so.
But not often enough that I think it's worth adding a language feature for, really.
I think in the last decade, I actually have not used static in C++ outside of for Meyer Singletons, or in conjunction with static
sorry, in conjunction with constexpr
the former is just not necessary in python (and as of C++17, rarely necessary even in C++). And the latter just isn't applicable.
u can just use function properties as statics in python I think
but yeah statics are not necessary and that's pretty gross
hey guy, i have a question, if AI can understand Human Language, can it create a software automaticlly based on human describe - Human Language. ?
there is a difference between converting spoken language to written language, and actually understanding language
AI doesn't really understand human languages yet
we've managed to teach it to convert certain words -> actions by basically brute-force teaching it, but the gap from that to understanding words and creating a program from it is very wide
My favorite example of why AI probably never will understand human language is sentences like:
When I dropped the bowling ball on the glass table, it shattered.
When I dropped the drinking glass on the sidewalk, it shattered.
in both of those sentences, any human reading them well have no trouble telling you what "it" refers to. An AI has absolutely no idea what "it" refers to in either of those sentences, because in order to figure out what "it" means you need to know that a glass table is more likely to break than a bowling ball, but a drinking glass is more likely to break than a sidewalk.
interpreting human languages requires massive amounts of context beyond what's present in the text.
An AI trained on lots of text will be able to tell the difference
Right, I've had training as a English as a second language teacher, and a lot of expressions we take for granted in the west are incomprehensible to someone learning English by the dictionary standard.
the more training the better able to tell apart context they are
but at some point text generation AIs plateau off
Train the model on two people from Glasgow speaking to each other ๐๐
that's just cruel
got it
do you guy think cyberware is coming ? , human will invent a small computer and connect them to the brain, it's not powerful like our computer but it can be used to do daily task like smartphone
i saw Elon Musk had a plan for that
to some degree it already exists. We have brain computer interfaces for helping paralyzed people, or the like
and i believe it will come true soon
there is research being done on brain-computer interfaces and some basic level of it already exists, but there are a lot of unknowns still and testing will likely be rather slow due to, you know, ethics with human reserach
i wonder what if we're not trying to text to that person , we're think that person weird and the device will sent our thinking to that person ๐
Like the computer dont know which thinking is order/command which is just thinking
I honestly don't think that's true. No matter what, when you train an AI on lots of text, it's limited to being able to understand a situation it has seen before and been specifically trained on. It can't draw novel inferences.
It's not that it's impossible to train an AI to interpret both of the sentences that I gave above correctly, it's that there are a nearly infinite number of similar situations, and you can't possibly train it on all of them
Sure, but there's an extent to which language models can also extend their capabilities beyond "this is in the dataset"
If I recall, GPT3 can perform some simple arithmetic that's not once included in its training data
Not that it's anything more than extrapolation, but it's still impressive that it can do that with relative accuracy
That's very impressive, but simple arithmetic has relatively constrained rules that can be inferred from the text. Learning to solve the "it broke" example requires you to know, for any arbitrary pair of objects, which is more fragile. Then replace "broke" with "melted" and you have to know which thing is more likely to melt in heat. Or replace it with "shrank" and you have to know which one is more likely to be made of a material that changes size. Etc, etc
I could argue that basic qualities like that are equally feasible to infer.
Especially given the vast swathes of context that they're given in (the total training) text.
well, I disagree, but I guess we'll see.
let's come back to this conversation in 10 years and see how much or little things have changed ๐
maybe when AI's put the I in AI
and are not basically glorified non-linear regressions ๐
language models are useful, but i'm not sure they come close to anything like what we consider "intelligence"
they're "intelligent" in the way a sea cucumber is intelligent
actually sea cucumbers are probably a lot more intelligent than bert
@raven ridge your examples are really excellent and they bring to mind an old discussion in philosophy and in more theoretical discussions of AI
whether intelligence is at all possible without "corporeal-ness" - having a body, participating in the world
understanding all these sentences, the context you need to understand them immediately, or to be able to take a step or two of open ended reasoning, can that ever originate from learning from any corpus, no matter how large, or can it only be gained by living, interacting with your environment
what do you define as โnovelโ?
the whole point of supervised learning is extrapolation
this is tantamount to saying that there is some mystical quality inherent to biological computation that cannot be replicated in code
certainly the degree of experience a human being is exposed to far outstrips the amount of data that even the most advanced models are trained on at present
No, I'm saying that's true of our current approach to AI. I think there are fundamental limitations to the "feed it a bunch of text and ask it to make inferences" approach.
Not that meat is better at it, but the way we train meat and the way we train computers are fundamentally very different.
it is
but I do not see why, in theory, it is impossible to learn from text alone
โlearnโ in the abovementioned sense of associating attributes with abstract objects
this, specifically
โฆwhich is also where you referred to AI in general, instead of any specific method of training
because this information can still be captured in text
An interesting way to look at this might be to compare learning a language with native fluency to driving safely. When a person learns to drive, it takes maybe 100 hours of practice or so, but we've had teams of brilliant engineers trying to build self driving cars for decades, and they're still terrible. Meanwhile, for a person to learn a language with native fluency takes years of practice. Maybe there will be some breakthrough, but with the current approaches we're taking, I doubt we'll have an AI that can have a conversation at native fluency in my lifetime
Is the live transcribe feature google has partially based on text prediction?
Or is it purely sound recognition
I would imagine it is
Text prediction is absolutely trivial compared to understanding text
Most CS students build a Markov chain generator in college.
true but also low quality predictions
Am doing some stuff with aws comprehend rn
Yeah, it's garbage, but it's massively easier than understanding text. 'I misheard someone, but they either said "the Arctic" or "the art hick"' - bet I know which one they said.
Was it said in a snooty art gallery in portland tho
๐ I don't have to know anything about what those words mean to know which of the two phrases that closely match the sound I heard was most likely to be right.
A corpus of text and frequency analysis can easily answer that question.
I don't bet, but if I did I would take you up on this ๐. Even though I largely agree with your earlier statement, but within your lifetime I think we'll get a breakthrough with text.
The main thing I'm thinking of is things like the sentiment neuron, there seems to be enough information in text that, by association, even if we could never mimic the efficiency of human learning, we could still end up teaching novel connections that honestly even we didn't anticipate.
To actually understand the world around you you have to develop something called an 'ontology'. Originally a philosophical term, an ontology is a web of concepts both real and abstract, and their attributes
For example, a concept node 'apple' would be connected to a concept node 'fruit' by a link of type 'is' (or 'subtype' or something). 'Apple' would also have attributes 'color' and 'sweetness', nodes in their own right
A model, really, of the world. An object oriented network of concepts linked to each other by different types of association. Hand-building one of these is possible but extremely time consuming even for smallscale world models. I've always wanted to build one for a gameworld inside a 4X type empire game
So the solution is to develop a method to 'grow' one of these through the act of teaching, just like a with a human. Unlike normal machine learning though, which requires massive datasets, this type of associative learning places much more weight with each encounter of a concept
We really just don't have the vocabulary yet to properly define the problem, but we do know its the direction we need to head. I'd imagine a good part of the 2030s will revolve (in the AI industry) around ways to build these webs. Speech (and written text by extension) is simply the medium by which new information is ingested
That'll be the decade, I'm guessing, where we start to see AI begin to manifest simple personalities and start to take on much more organic learning and behaviour patterns. Most people outside the programming sector seem to feel that AI will be by definition a rigid, highly mechanical/alien creature- but the reality is a general-intelligence must grow, think, and behave organically
I feel that, quite interestingly, this process starts bumping quite close to what we might call consciousness. Some process would be required to take in speech or text (the method by which real world information is transferred) and abstract it to a web of nodes and associations. When outputting information, the opposite would have to occur, where abstract concepts and links would have to be flattened and converted to natural language. If you take this process and run it in a loop- the AI constantly using first order and other logic to derive new links between existing concepts, you have (in my opinion) thought. If you formalize this process by converting each new associating to text form within the ai's mind, you have inner monologue
And with enough sophistication and a literal concept of the self (in the of a node) that can be used in logic, you have something beginning to resemble self awareness
Alright I'm done, sorry for the schpeel
I'll just point out that people have been predicting that artificial general intelligence is only 20 years away for about the last 50 years
True ^^ I'll believe it when I see it, also
More than 50 years, even:
AI pioneer Herbert A. Simon wrote in 1965: "machines will be capable, within twenty years, of doing any work a man can do."
AGI is like nuclear fusion
always round the corner after the next
I'm pretty sure we'll have cold fusion first, but maybe I'm just more optimistic of that because I'm more ignorant of physics ๐
Fusion maybe, but I highly doubt cold fusion
tbf humans can't infer what that means without context either either, it's a famous sentence and a very well-trained AI would be aware of it
"Without context"
one could argue there is some degree of a priori linguistics processing capability to human brains but yeah language is basically entirely context
it's the opposite
99% is unambiguously parsable
worst case the ai will tell you it's ambiguous, and will list all parses
well, not ai, just normal parser
There is alot more to understanding the world than parsing
the buffalo thing may be easier than average
i predict within the next 20 years, men can do any work a machine can do
Think about a programming language. Parsing just gives you a slightly less raw representation of what you're dealing with. Actually working with that information is... a whole different animal
let's hope
Lets hope, eh
our running time might be exponential though
Hey quick question
How do I skip a yield
IE, a given iteration results in no yield and I need to repeat the method, do i just call say 'yield theYieldingMethod()'?
def generateTokens(self, grammar:Grammar):
buffer = grammar[offset:]
tokentype, match = self.computeTokentype(buffer);
result, *results = self.computeCallbacks(tokentype, match);
if result is SUCCESS:
token = tokentype.tokenize(grammar, *result);
tokentype.advance(grammar, *result);
yield token;
if result is IGNORED:
self.advance(grammar, *result);
yield self.generateTokens(); #???????????
if result is FAILURE:
raise SyntaxError(*result);
In the event of result is IGNORED I need to skip to the next iteration. The method itself is just a generator and the tokens of course don't actually get computed until the generator is iterated
Yeeeeeeah I guess you're right
Just in theory though, I know for next time, whats the way to do this? I know its something to do with next()
there shouldn't be a way
i don't understand what you mean
I don;t understand a generator without any loops
it's yield from to switch to a different generator
A generator itself doesn't actually do anything until its iterated
So simply by iterating the generator, either in a for loop or by casting it to a another iterable type the iterator protocol will be activated
This will cause python to implicitly call 'next' on the generator until a StopIteration is raised
So a call to 'generateTokens' would return a generator object which, when I go to iterate through the tokens 'inside' the generator would repeatedly call the generator. The inside of the generator must be then use any logic it likes to yield some value or, if the iteration is complete, raise the StopIteration
I'm just not certain what to do in the event that there is nothing to yield on that particular run, but more runs are needed
Correct me if any of this is wrong, my peeps
^ Me, on some other planet, apparently
@static bluff A loop here seems appropriate, you can always break once you've reached SUCCESS or FAILURE, or continue if some result is IGNORED
Just curious though, what types are those?
They're list-like objects, 'singletons' that just get loaded up with data and returned from the various callbacks
I'm toying around with various ways of iterating each of a given tokentype's callbacks and actuating based on the result
Path("") is the path to your cwd
But that isn't suitable to this channel's topic, this channel is for a discussion about the Python language itself
The SUCCESS etc. constants?
Hello everyone,
I cut and copy my code from my local jupyter notebook to google colab. But I don't have the same result. The results are deeply different. How is it possible?
Hey guys will someone explain me why this error occurs it occur suddenly moment before everything is works fine the error is --> ValueError at /user/feed
The 'picture' attribute has no file associated with it.
You showed show full stack trace and it's actual error @crude bronze
And also to do this in general
it depends on what exactly you did... but this is off-topic for this channel. if it's data science related, ask in #data-science-and-ml . also if you can provide a minimal reproducible example (https://stackoverflow.com/help/minimal-reproducible-example) it would help a lot.
whispers: Are we all just not going to talk about the semicolons?
they know their programming style is weird, not much else to say... i wish they wouldn't, it's completely unnecessary visual clutter!
there are zero multi-line statements in python that semicolons are useful for disambiguating
\n and ; are i think pretty close to equivalent in python source code
not that they're useful beyond the comandline/repl but semicolons can only separate simple statements
for that reason alone it is nice that they exist
I love the Python REPL
I use it in the REPL most commonly if I want to import a bunch of things at once, and be able to rerun the whole command later if I restart the interpreter
from pathlib import Path; import datetime; from my_lib import foo
and then later in a new session I can just type f and press up and find the command
Has any one worked in SMTPlib ?
I use a script behind the PYTHONSTARTUP env var for the most common imports
yes. but it's not the question about the python itself, so it's not really suitable for this channel
Is it possible that you executed the cells in a particular order in the original notebook?
Also/alternatively, Colab should let you upload the ipynb file rather than cut and paste.
I'm also working hard on staying more within normal standards, and I'm getting a lot better. But the semicolons are completely harmless.
does anyone know how Python makes hashes change between sessions?
i know why they do it, but i am writing my own Hashtable in C and i was curious how they did it.
hash("foo")
879039098592663196
and in another session:
hash("foo")
1759215547526481629
or if someone could link to the portion of the source code where it happens, that would be nice too.
๐ค
Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure. By offsetting the hash with a random seed (set once at startup) attackers can no longer predict what keys will collide.
ah, so they simply offset the hash.
i did not know that attack was called tar-pitting, though.
thank you, Darr.
it's called salting the hash, fyi
another defense against these attacks is simply to use binary search tree based maps, instead
because they have 100% guaranteed log(N) behavior, so they're simply impervious to such attacks
okay so this might seem a bit silly, because i might not be getting it fully, but it is not as simple as adding a random number from a seed onto the hash, is it?
I think that's the basic idea
import random
original_hash + random.randint(1, 100)
probably, the random number should be prime
at least, if you are taking a simple modulus to get the hash in the range you need
yeah
i was never fully sure about the correlation between hashing and prime numbers.
djb2 uses 33 i think.
unsigned long
hash(unsigned char *str)
{
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
33 for what?
eh nevermind.
Ah, yeah so it uses 33 asa multiplier in this particular algorithm for accumulating the hash over a string
yuh.
33 of course is not prime ๐ so I'm not sure what that does for the theory
oh lmao sorry
it just really depends on the details of the algorithm, prime properties are often nice but aren't strictly necessary I'm sure in many cases
prime hash table sizes is also a fairly powerful, but expensive, safeguard against mediocre hash functions
i will keep note of that, thanks lol.
but if i had a really really simplistic hash function, like:
def hashString(value) -> int:
total_hash = 0
for char in value:
total_hash += ord(char)
return total_hash
foo and gon will produce a hash of 324
so obviously they collide.
i am not really seeing how adding the same(?) number onto it would stop anything?
well, the hash function itself needs to be really good.
oh. it doesn't "stop" anything. it's just essentially making your code effectively crawl to O(n) lookups/operations where you were expecting O(1). leading to DOS.
oh yeah i know, i was referring to how adding the same(?) random number would stop that attack.
oh. adding the same number won't stop the attack.
if it is not the same random number, i am kind of curious how you would find the hash during lookups.
since you would need a way to generate the same number that you used when adding it, no?
ah
er.. actually let me rephrase that. within a session, the number will be fixed
but between sessions, the number can/should change
i have no idea what im looking at ๐ฆ
er.. actually let me rephrase that. within a session, the number will be fixed
๐ค
A demonstration of the compiler automatically performing the same optimization without obfuscating the source code
ah nice
sometimes they are still necessary because the behavior isn't the same under overflow
I thought that might be the case here which is why I didn't comment
I'm actually not sure if that rewrite is valid
in this case I think it is but you can get caught surprisingly easily in these things in C/C++
in Russian this is called 'byte-fucking', not sure how to say it in English
mfw
bit twiddling
Reverse BitMagic oh wait
ive always heard it called bit fiddling
you win according to google ๐
actually, I'm not sure
there's far more google hits for bit fiddling, but perhaps that's for another reason
the official wikipedia article is called "bit manipulation" and it mentions bit twiddling (but not fiddling) as another name for it
So, idk, google vs wikipedia ๐
what the fuck
this should be the english name too
well, it's not literally about making love to machine words
so I thought it would be something different
english speakers use "fucking" this same way
not always, but it's a very general-purpose word
Hey guys, anyone tried using GDB in order to attach a Python process and debug it?
Without restarting the process
If anyone cares, I gave up on this and wrote my own parser ๐
A lexer and a parser
I suppose you could just call them a parser
A scannerless one, yeah
This would be pretty cool but I don't think it's supported
As far as I know, you could use GDB to look into cpython itself, but python programs I think need a debugger attached when the program starts
pdb works
I've seriously considered just running everything under a debugger before, but apparently there's non-trivial overhead to doing that
But you can't attach it to an existing python process, can you?
I thought you could, but I'm probably wrong
If it's possible, I'd love to know how
https://blog.jetbrains.com/pycharm/2015/02/feature-spotlight-python-debugger-and-attach-to-process/ it does appear to be possible
I had no idea, this is a great feature
๐
hi
hi
I was wondering why python's backend service needs to wrap with a WSGI http server like Gunicorn or uWSGI, but in comparison to javascript Node's express.js its self contained?
there's a gdb helper script that lets you do things like print Python variables and get a Python traceback while stepping through CPython code - which sort of gives you a hybrid behavior. It's still not really a Python debugger, but it makes debugging Python code at the CPython level possible.
separating the http server from the web server framework lets them be developed independently. Since both of those things are relatively specialized, being able to improve one - or build a replacement for one - without affecting the other is helpful.
and it makes it easy for them to be developed by different people at different times.
I hope this is the right channel... Feel free to redirect me somewhere more appropriate!
I have a three-part question:
- is there really no way to get the string representation of the parameters of an alias in pdb? It seems to me that should be defined here: https://github.com/python/cpython/blob/8b93f0e696d3fc60fd311c13d5238da73a35e3b3/Lib/pdb.py#L390-L413 but I don't see it.
- if no to 1.; would such a feature be added if proposed, or is that something that was left out on purpose?
- if yes to 2.; where could I propose such a feature? Can I just open a PR on GitHub?
Hello
Can i creat an os with python?
hi
I don't think so
because it is very high level language
ok
yeah,
No, Python is far too high level to viably do systems programming. To make an operating system, you should use a lower level language.
like assembly, c, c++
i use c++
You could probably do it in C++, but not Python
Guys
How could i get the url of an image
am i in the wrong channel?
probably actually
What?Assembly!!! It is too hard to make an os with Assembly ๐
A part of an OS will always be in assembly, but you can use other languages for most things other than that little bit
Why would they integrate assembly? Doesn't C compile to better and faster assembly than handwritten assembly?
it's not really so much about better, it's more just about if you need to access some kind of special register or address or something like that to do the low level implementation of say a device driver
then maybe there doesn't happen to be an intrinsic available for it. but even then, you'd typically just have some inline assembly in your C/C++ (I think)
So, curious. I understand that assembly code can/will exist for making an OS. But do we explicitly need to write something in assembly to make an OS, or we can be fine with writing purely C.
Ie. If I explicitly differentiate between us, the programmer, having to write assembly, vs just having assembly code that's generated, can an OS be properly written with just the latter.
i've been under the impression that any kernel will need a small amount of processor-specific assembly to get the rest of the kernel loaded up properly
does any 1 here has uploaded python package in pypi
why would len return the wrong length for a file (e.g., len(f.read())? i'm ultimately trying to figure out why i can't correctly compute the SHA1 hash for a handful of files.
# dist/Data/Scripts/Source/SKI_ConfigBase.psc
# - on disk : 756250ff67daae4860bb2de8d7b0ded7d3d2f830
# - on remote : 9a44ef1c95b1c1266c4a35783ac58702a94d0a73
# - size on disk : 13232 (len)
# - size on disk : 13734 (tell)
# - size on remote : 13734 (git)
Maybe it sees "\r\n" and changes them to "\n"
I think it wonโt do that if you open it in read-binary mode, "rb"
if i open the file in rb, none of the hashes are computed correctly, but len does return the right length then.
could you try decoding the content from bytes to normal text before you compute the hashes
trying that now and... it works!
๐
Maybe you could change the newline argument so it doesnโt translate them
in case anyone's interested in computing git hashes right:
- BAD:
mode='r',hashlib.sha1(('blob %s\x00%s' % (length, data)).encode('utf-8')) - GOOD:
mode='rb',hashlib.sha1(('blob %s\x00%s' % (length, data.decode())).encode())
open(file_name, mode="r", newline="")
can't do that: ValueError: binary mode doesn't take a newline argument
Sorry I meant r mode
still weird that len returns the wrong length for some files in r mode
ah, it sounds like calling len on a unicode string returns the char length of the encoded string, not its size in bytes.
Oh yeah. That makes more sense
you can just use a bytestring for that
hashlib.sha1(b"blob %s\x00%s" % (length, data)) should work if i remember correctly
just curious why the ancient % for string formatting
TIL, .format and interpolation are both not supported for bytes
that's... pretty weird
@paper echo :x: Your eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 1, in <module>
003 | AttributeError: 'bytes' object has no attribute 'format'
I read about some of the reasoning, the argument being that things could format to stuff that isn't allowed to go into bytes, but I don't see why that is less applicable to %
so, most likely % formattingin bytes is also broken, it just dates back to when all these str/bytes things in python were a mess generally
so I don't really love the python attitude here where they don't want to add f string or format syntax for bytes, but there's already a broken thing in it....
so now there's two reasons you still have to worry about the antique % string formatting in python, even though its two major revisions behind the currently recommended approach ๐
%s for data that's a byte string is weird anyway
so i think i agree
b = b'blob ' + str(length).encode() + b'\x00' + data
hashlib.sha1(b)
You could just add the encode essentially as part of the byte f-string literal format
b = fb'blob{length}\x00{data}
yeah i was going to say I sort of didn't expect it to do anything sane
yeah. i see no reason why that can't be the same in format and f-string
!e ```python
print( b'%s' % '\U0001F62E\U0000200D\U0001F4A8'.encode('utf-8') )
it's just awful that allegedly we're on f-strings, we still have to look at slightly older code with .format
@paper echo :white_check_mark: Your eval job has completed with return code 0.
b'\xf0\x9f\x98\xae\xe2\x80\x8d\xf0\x9f\x92\xa8'
and look at logging code that use %, and now apparently bytes code with % too
there are good reasons to use .format sometimes if you need to template things
and %s can be useful if you need to template things with {s, which is not common but i've had to do it before (for lemonbar, if you know what that is)
yeah, that is unfortunate as well
it's not like any of these three things are simple either, may as well have one 100% solution instead of 3 80% solutions, or what not
lest we forget that python also already has "template strings" that nobody uses!
I think there was a pep for delayed f-string formatting or something
Hm - because format() and f-string are implemented in terms of __format__, which returns a text string, not a binary string
you would need a whole different protocol for formatting as bytes instead of string, rather than being able to just reuse __format__
yeah, the pep mentioned this
seem sstraightforward enough to just add bformat though
provide it for whatever set of types makes sense, by default
def __bformat__(self):
return format(self).encode()
as the default implementation
I would probably just not even have a default implementation for most types
just let it error
provide reasonable implementations for a handful of things, like other bytes, integers, etc
i think that'd lead to a lot of boilerplate, no?
well people can define it on their own classes if they want
fair
what would it be on an int? would it be the string of 1s and 0s? the ascii integer characters?
there are many reasonable ways to go, but if people are still using % on bytes then that's really not a good situation
yeah, that is a good question too
!e ```python
print(b'%s' % 5)
@paper echo :x: Your eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 1, in <module>
003 | TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'int'
huh
makes sense in a way
__bformat__ could potentially be an interesting alternative to struct
depending on what you need to do
Ooh, integers are an excellent example of something you shouldn't have a default for. Should it be formatted 4 bytes wide? 8 bytes wide? big endian? Little endian?
it's a fair point
at its most conservative you could only have it work for bytes objects
The problem with that idea, really, is that it's not up to the type to decide how it's serialized, it's up to whoever is doing the serialization. And so making a __brformat__ that could be used as an alternative to struct means that you now have classes - data types - in your program that are specific to a particular binary representation of that object.
That seems really unwise.
well, it's unwise unless that's their specific purpose
which struct make already cover in python just fine, I'm not familiar with it
ctypes.BigEndianStructure is probably the better example of serving that purpose
yeah, I've never done such things in python
it's very standard practice in C++ though
to have structs declared with pragma packed, for example, whose only real purpose is to help you access bytes in a more type safe, less error prone way
(and more ergonomic)
in the BigEndianStructure case, having the weird base class indicates that the thing is meant for serialization, not for general purpose use. In the __bformat__ case, the only indication that this thing isn't meant for use as a value semantic data type representing your domain objects is a method buried somewhere in the middle of its implementation
unrelated question; most elegant way to have a function that receives a list of bools, and returns True only if the list ends with a contiguous chain of True
if there's no True, it returns False. If there's any False after the first True, it returns False. Else, return True. About to code up a for loop but maybe there's a nice itertool solution... seems like a tricky one though
itertools.groupby would work, but it'll be harder for your readers to understand than the for loop would.
and most likely slower, honestly.
if it is a list, the fastest way to do it is probably ```py
return lst == sorted(lst) and lst and lst[-1]
is it me or do this
and this mean different things
isnt first one just lst[-1]?
the second one imposes the additional constraint that if there are any False values they must also be contiguous.
no, that fails on an empty list
actually
that's what the lst and is handling in my suggestion.
is a length 0 chain also a chain?
there is no chain of trues in an empty list?
I would say it isnโt because
if it were then a length 0 chain of False would also be present @ the end
does it end with a chain of true? no, there is no chain hence it doesnt. thats the logic i follow. up to debate
but i get your point
right. And so an empty list needs to return False
Regardless - our interpretation of the first one is either return True or return lst and lst[-1] depending on whether or not an empty list is considered to end with a contiguous chain of True
is it ok to crosspost? i have posted a question to #data-science-and-ml, but this channel is more active, can i post here too?
It's not on topic for this channel, so please don't.
This channel is (supposed to be) about the Python language itself - its implementations, the grammar, and so forth.
granted the last question about the chain of trues wasn't really on topic either, but... ยฏ_(ใ)_/ยฏ
I'm pretty sure they mean the same thing
actually, nm
yes, the first description was innaccurate
I should have phrased it "all the Trues are in a contiguous chain in the end, and there is at least one of them" or something like that
something involving sorting and creating a new list is definitely only going to be fastest for very small lists, if ever
I'm betting it'll be faster up to a thousand or so - especially if the common case is that it's valid, and returning False is the uncommon case.
I guess it's easy enough to bench
even that aside, I find it rather awful, sorting a list of booleans
depending on the C-ism that True is 1 and False 0, or at least, that True > False
there's also ```py
num_trues = sum(lst)
assert num_trues and not sum(lst[:num_trues])
that's a nice one actually
!e ```py
print(issubclass(bool, int))
print(True + 0)
@raven ridge :white_check_mark: Your eval job has completed with return code 0.
001 | True
002 | 1
I know that that's the case
seems odd to call it a C-ism, then. It's how the Python language behaves.
I mean it was probably influenced by C in this regard
though I don't have concrete evidence
wait, why do you need to sort the list? isn't it only supposed to return True if the list as given ends with a contiguous chain of True?
is my impression as well
given that python is implemented in C, and C established false as 0, and true as 1, for decades before python existed
@silk pawn yes, you don't need to sort it. I don't love that solution either.
But sorting will move all the trues to the end
so x == sorted(x) basically checks if all the trues are already at the back
if this offends your sensibilities as a programmer then it's a sign that all is well ๐
and sorted is implemented entirely in C, which means that it's usually going to be faster than a Python for loop - at least for small N. And timsort is O(n) for already sorted input, I believe.
but if you just want to check if there are at least two Trues in a row at the end, wouldn't sorting give you a false impression?
if there was an is_sorted function then it would be a slightly less awful hack
but if you just want to check if there are at least two Trues in a row at the end, wouldn't sorting give you a false impression?
the condition isn't just that, I didn't explain it well the very first time
all of the Trues have to be at the end
ohhhhh ok
hmm, that criterion isn't actually perfect for what I need it for either
but I guess that can't be helped
what is it then
@raven ridge you're right, sorting is still faster at N = 1000. Bless python's heart.
@gleaming rover I'm parsing csv files with pandas. Very rarely, a csv file is cut off at the end
when this happens, the last row is a partial row. Surprisingly (to me), pandas just silently accepts this, and it will just fill in all the last entries with NA
at first I was dropping the last row when there was any NA but then I saw that there were actually real NA's in my data (rare)
so now I drop the row only if the last N are all NA, but now I realize that's not very good either
ah.
i was hoping to avoid looking at the file separately to count commas on the last line
but I'm not sure if there's any half decent way
def f(the_list):
list_iter = iter(the_list)
for b in list_iter:
if b:
break
else:
return False
return all(list_iter)```
it's pretty surprising that pandas doesn't evenhave an option to be strict about this
so you want to omit the last row if itโs partial?
yeah
there will never be any guaranteed way to tell if it's truly partial though, or if the last N columns just happen to be NA
it's actually ultra annoying
because, when you parse a csv pandas tries to infer the dtypes intelligently
surprisingly, it's hard to force pandas to do that on an in-memory dataframe
So, potentially I'd be reading the same data here three times
so the file ends with a line that has too few commas on it?
yeah
but only rarely, and I don't want to pessimize the common case
i mean honestly it's not that critical, but I'm annoyed by how clunky everything is
I'm guessing your normal NA values are encoded the same way (i.e. with empty strings)?
well, not really, with normal NA values you'd still have the right number of commas
yes, I mean the values themselves
e.g. 1,,2,3,4
yeah
then no, AFAIK
i guess the only thing to do, to not completely lose your mind about the idea of hitting disk three times is to open it as a buffer
and then you can parse in pandas, and then only if you have potential issues,read the file as text, see if the last line is partial
and then reparse excluding that line
parse with pandas, open as a text file and count last row commas, parse a second time (potentially)
why do you need to parse a second time
because pandas is intelligent about figuring dtypes and such when I parse
lets say I parse, and I have an integer column, default pandas behavior iirc will be to "upgrade" the integer column to float so that it can hold NA (which is just nan)
when I ignore the last row and re-parse, there won't be any NA's (potentially) and now it parses to integer as it should
ah, okay
that's a backward compatibility thing
yeah, there's no easy solution in this case
maybe in the future when IntegerArray becomes standard
you could make it part of data cleaning shrugs but that's ugly too
Should i attempt to implement algebraic effects with generators y/n
It would basically be an extension of curio, now that I think about it
It's a shame you can't re-enter a running function from an exception
it's not backwards compatibility
i just want pandas to infer the dtypes as intelligently as possible
i thought there would be a way to ask pandas to redo it once its' in memory but I haven't found a way
i've tried infer_dtypes and some other things
well, you could if you design the function to allow that - by yielding exceptions rather than allowing them to be raised...
@halcyon trail I think if you turn on the "python" engine it can do this
yeah, the python engine has some extra capabilities but it's so slow, it's literally just faster to read it 2-3 times with the C engine ๐
Yeah, that's what I would have to do, and the default handler for an exception could raise and crash, or you could at least have the option to install such a handler
Use data.table::fread in R, then re-save the file in a sensible format like parquet
I'm joking less than you might think I am
anything involving R is more of a joke than you think ๐
just not a very good one
fread is ludicrously fast and powerful
However I would encourage you to not try to read the source code
I can't recall fread specifically but my overall experience with R was definitely not that things were fast
it is
because IntegerArray can handle nullable integers
which, in this case, is what your data would be
It's in the third party data.table package
if i want to capture the traceback that would be shown in an exception were not caught, is it better to use sys.exc_info()[2], or sys.last_traceback or something else?
also - what is BaseException used for? even StopIteration inherits from Exception
err, I don't know what you mean by backwards compatibility. I don't have any backwards compatibility issues
i guess you mean that pandas behaves that way for backwards compat reasons
SystemExit, GeneratorExit, and KeyboardInterrupt. https://docs.python.org/3/library/exceptions.html?highlight=baseexception#exception-hierarchy
yep, generally catching those is a mistake.
yes
i wonder why GeneratorExit inherits from BaseException while StopIteration inherits from Exception - backward compatibility?
there is a new array type backing the series that handles nullable ints, which is what you want
but by default read_csv uses the older type which represents nullable ints as floats
i also didn't realize that async def coroutines supported send and throw!
and they still use StopIteration internally!
i had no idea
Yes, I think it's agreed to have been a mistake for StopIteration to inherit from Exception, but nothing to be done about it now.
really? there's code out there that depends on this?
I'm absolutely sure that there exists code that has an except Exception and expects that it will catch StopIteration, yeah.
indeed.
I don't understand what you meant with this question
i want to be able to re-raise an exception later using the original traceback, as with raise exc.with_traceback(tb)
try:
return foo()
except Exception as exc:
return (exc, sys.exc_info()[2])
is that right? or is there a more "modern" way to do this?
(not that you'd write code like this)
the traceback is attached to the exception
i thought that explicitly wasn't the case
!e ```python
def f():
raise ValueError()
def g():
try:
return f()
except ValueError as e:
return e
def h():
e = g()
raise e
h()
@paper echo :x: Your eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 14, in <module>
003 | File "<string>", line 12, in h
004 | File "<string>", line 6, in g
005 | File "<string>", line 2, in f
006 | ValueError
!e ```python
import sys
def f():
raise ValueError()
def g():
try:
return f()
except ValueError as e:
return e, sys.exc_info()[2]
def h():
e, tb = g()
raise e.with_traceback(tb)
h()
@paper echo :x: Your eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 16, in <module>
003 | File "<string>", line 14, in h
004 | File "<string>", line 8, in g
005 | File "<string>", line 4, in f
006 | ValueError
huh, it's the same. is there a way i can pretend like it was raised from g as it was originally, and not h?
you could construct a traceback for the frame in g and return that, I guess
i guess it's not that bad as-is, because you still get the original stack at the bottom
i won't worry about it for now
https://docs.python.org/3/library/exceptions.html#BaseException so somewhere in every BaseException instance, there's a traceback stored?
!e ```python
import sys
def f():
raise ValueError()
def g():
try:
return f()
except ValueError as e:
try:
raise e from None
except Exception as e:
return e
print( g() )
@paper echo :warning: Your eval job has completed with return code 0.
[No output]
!e ```py
import sys
def f():
raise ValueError()
def g():
try:
return f()
except ValueError as e:
try:
raise e from None
except Exception as e:
return e
def h():
e = g()
raise e
h()
@raven ridge :x: Your eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 19, in <module>
003 | File "<string>", line 17, in h
004 | File "<string>", line 11, in g
005 | File "<string>", line 8, in g
006 | File "<string>", line 4, in f
007 | ValueError
huh, that still has f in it - I wasn't expecting that...
in this case, ideally i'd want to preserve f and pretend that g and h don't exist
as if you'd just called f()
but it's not important since you still see f at the bottom
you can edit traceback objects, sorta, but, uh - ยฏ_(ใ)_/ยฏ
Lib/importlib/_bootstrap.py lines 232 to 233
# Frame stripping magic ###############################################
def _call_with_frames_removed(f, *args, **kwds):```
with frames removed? like, stack frames?
yes
it's for eliding frames from the middle of a stack
so that they don't show up in the traceback.
@gleaming rover the problem is that nullable ints don't have hardware support
And in general isn't broadly supported in software either since you'd need to agree on a convention, do explicit checks, etc
i imagine they're backed by arrow arrays
and/or they're backed by an array of "real" ints and a bitstring indicating which ones are missing
right, you get fast ints or nullable ints but not both
right. with floating point of course you get both
fwiw NaN is not really the same as "null"
but in pandas world they are equivalent
R got this right
right. I am curious now suddenly though if the exact NaN value is preserved or not
i have no idea how nan is actually defined, i know there are some rules e.g. it can never be equal to any other float
need to try this at some point but too lazy to do it now
it's not equal to anythin gelse, including itself
which is a common way to check for nan, since that's the only float with that property
NaN is a whole range of bit patterns, not just one value, and further subclassed into quiet and signalling nan
so e.g. where I work, we have a sepcific bit pattern that we use (in C++, not python) as our own "NA" value
I'm curious now if regular C functions on floating point preserve the exact bit pattern, or just give you some other NaN
implementation defined behavior, I think.
implementations are allowed to use the bit pattern to convey extra information about, e.g., the cause of the NaN. In practice, I don't think that's ever really used.
there are some real platforms that use signalling NaN's, but I don't think I've ever seen distinct quiet NaNs be used for anything.
it's definitely implementation defined
even merely copying around a NaN, it's not required to maintain the bit pattern
although that would be crazy
but I'm curious in practical terms whether it happens
this computation is happening directly at the processor level so it's really not so much about C's guarantees but the architecture's guarantees
looks like the exact bit pattern is preserved
#include <cstdint>
#include <cstring>
#include <iostream>
constexpr int64_t R_NA_REAL = 0x7FF80000000007A2L;
int main() {
double x;
std::memcpy(&x, &R_NA_REAL, sizeof(x));
auto z = x * 5.0;
auto result = std::memcmp(&z, &x, sizeof(x));
std::cerr << result;
}
prints 0
ye they are not in general but we were talking about pandas specifically in the context of typing, right
well I just meant that for that reason they are not necessarily super desirable, in any case
so I don't know if it's just backwards compat. I've seen discussions of this before and iirc the pandas devs weren't really excited by the "masking" kinds of solutions
but I can't say I've kept up with it
hi, if I write a function in a large file whats an easy way that I could run it through the command line without the rest of the file?
ye there are definitely performance concerns
all I'm saying is that in this specific case without the concern for backward compatibility read_csv should by default infer a nullable int for that column
is there a way to specify "any type except Foo" with annotations?
nope
what do you want to do?
i want "any instance of (a subclass of) Exception, but not an instance of UnhandledEffect", and UnhandledEffect is a subclass of Exception
no, that is not possible
i assume there is some good type theoretic reason why not
don't know what's the reason, but it certainly breaks LSP
Maybe you can ask on typing-sig
no, i think breaking LSP is bad and should probably not be allowed at the type level
maybe there's a way around the liskov violation
is there a guide on using asgi
however, there is also this case:
SpecialThing = ...
_Other = TypeVar('_Other')
@overload
def f(x: SpecialThing) -> int:
...
def f(x: _Other) -> _Other:
...
yep, that's possible ๐
is mypy smart enough to know that _Other should not be a SpecialThing?
(or any other type checker)
yeah, overloads are checked from top to bottom
is overload an builtin decorator
perfect, thanks
@maiden pier no, from typing import overload
so it comes with python but it's not "builtin" as such
oh
I think 'builtin' is a bit of a misnomer
the asgi framework docs are here https://asgi.readthedocs.io/en/latest/introduction.html
the entire standard library is built in to Python, it's just that some of it is in the global scope, and some is not
asgi is a specification; usually you use something like uvicorn/hypercorn the same way you would have used a wsgi server like gunicorn in the past
...and usually you don't write your own framework from scratch, you use something already available, like Starlette ๐
yeah, "built in" means at least 3 different things:
- The names available in every module, because they're in
builtins - The things that are distributed with the Python interpreter as part of the standard library
- A function that was imported from a compiled module
https://github.com/gwerbin/pyeffect well... i banged it together
i have no idea if this is a good idea
it seems kind of like a good idea
you can theoretically implement an event loop as an effect handler
you can probably also implement delimited continuations but im not smart enough for that
i also have no idea if these are actually "algebraic", but they definitely kind of superficially look like it
i think you could even implement something akin to structured concurrency (as in libdill) this way:
def f():
with (yield ConcurrentContext()) as ctx:
thread1 = yield ctx.Spawn(foo, 1, opt='a')
thread2 = yield ctx.Spawn(bar, 1, opt='a')
not sure how cancellation would work, need to ponder that one
what's the purpose of it?
allowing only a subset of side effects in a function?
is this like returns?
i'm not sure what returns does, but it's more general than that
it's my attempt to implement "algebraic effects" in python, which is something i've seen in a few research languages (including multicore ocaml). from a layman's perspective without much understanding of the very abstract math that went into the idea, it's basically a generalized exception handler, with "nice" theoretical properties
it's not quite the same as "real" algebraic effects in that the effect handler (in this implemenation) doesn't get access to the program's continuation, it can only return a value to be passed to the continuation, and the program is resumed exactly once
im not sure if it's possible to "fork" a generator, capturing its current state
if it is, then i can generalize it to more or less work like real continuations
evidently you can't
Consider ```py
def gen(socket):
while chunk := socket.recv(CHUNK_SIZE):
yield chunk
yep, that's on you if you want to fork that
Yeah, you certainly can't "fork" arbitrary generators, at least.
according to that post, it seems like it could be possible a the c api level
so i guess i'm stuck with "tail resumptive" effect handlers (i got this term from https://www.microsoft.com/en-us/research/uploads/prod/2020/07/evidently.pdf) which is usually what you want anyway
on the bright side it makes the code easier and i don't need to try to pass around some kind of continuation object
Maybe there could be a way to rewrite
def foo():
bar = yield baz(1, 2)
fizz = yield buzz(bar, 3)
if fizz > bar:
quack = yield aaa(fizz, bar)
duck = yield meow(quack)
else:
duck = "moo"
return duck
``` as ```py
def foo():
def __step_0(bar):
def __step_1(fizz):
if fizz > bar:
def __step_2(quack):
return meow(quack)
return aaa(fizz, bar).then(__step_2)
else:
return pure("moo")
return buzz(bar, 3).then(__step_1)
return baz(1, 2).then(__step_0)
``` at the AST level?
but I can imagine how horrible it would be
Let's just add a new keyword that would produce a function like that ๐
suspended, or sus for short
sus indeed ๐
suspend def foo():
bar = sus baz(1, 2)
fizz = sus buzz(bar, 3)
if fizz > bar:
quack = sus aaa(fizz, bar)
duck = sus meow(quack)
else:
duck = "moo"
return duck
computations in the Sus monad
they made more sense before I read the sandwich analogy
isn't this semantically identical to a generator, except it's forkable?
well, yeah, sort of
if I understand correctly what you mean
for example, this allows making more general monad comprehensions, e.g. on lists
monad comprehensions?
monad comprehensions are a somewhat unrelated thing
btw this article gives some practical use cases for these general effect handlers https://arxiv.org/pdf/1312.1399.pdf. e.g. you can set timeouts, implement rollbacks and restarts, et al
basically, make
suspend def foo(xs: Collection[int]):
x = sus x
y = sus y
z = sus z
if x*x + y*y == z*z and x < y:
return [(x, y, z)]
else:
return []
``` the same as ```py
suspend def foo(xs: Collection[int]):
return xs.flat_map(lambda x:
xs.flat_map(lambda y:
xs.flat_map(lambda z:
[(x, y, z)] if x*x + y*y == z*z and x < y else [])))
(just for fun, of course, it's a pretty silly addition to Python)
You can essentially make a forkable generator, as long as you go out of your way to build it in a very specific way, I think. You build it as a class with a __iter__ that returns self and a __next__ that maintains a state machine using instance attributes, and your fork clones the class instance
At least, I think so. ๐
generators can be getting future states from anywhere
Maybe really __aiter__ and __anext__ instead
u can't fork that without intentionally allowing it
Yes, I already mentioned that it's not possible in general
yes, but from the PoV of the generator
it is forkable
that's how most ppl feel lol
You can design a pure, forkable generator, I think. You can't fork arbitrary generators.
hm, now that is an interesting idea
Just like you can build a seekable stream, but can't seek in arbitrary streams
i think making a forkable generator might be some fun
the effect itself could contain some indication that forking is supported
A class with __aiter__ and __anext__ that stores its state in instance variables and acts as a state machine is the way to implement a coroutine using the C API
if an iterator only loads the next, say, 100 items at a time, would that be "semi-lazy" or there another word for it?
while yall talkin about generators
buffered iterator
ah makes sense
and I buffer the last 20 or so items of each buffer so loading the next buffer can be done while iterating through those
so it's a buffered buffered iterator...
Double buffering is a real term that means something different
ah
It's when you have two buffers that you continually swap between. Think graphics: you paint your new state into a buffer incrementally, and then swap it in to be displayed atomically. When you do that swap, you're exchanging it with the previous buffer, which you can now update as the basis for your next paint...
i see
import multiprocessing
from PIL import Image, UnidentifiedImageError
from requests import get
def _get_img(image_url):
"""Get image from url and return as Image"""
try:
return Image.open(get(image_url, stream=True).raw)
except UnidentifiedImageError:
return Image.new('RGB', (20, 20), '#ff0000')
def multi_gen(url_gen, chunksize=1, itersize=100, buffersize=20):
"""Image generator"""
buffer = []
for urls in url_gen:
n = 0
with multiprocessing.Pool() as pool:
images = pool.imap(_get_img, urls, chunksize=chunksize)
pool.close()
for image in buffer:
yield image
for image in images:
n += 1
if n < itersize - buffersize:
yield image
else:
pool.join()
buffer = images
break
this is the buffered buffered generator
it allows me to rapidly click through a list of 1,000 cat picture urls without a hitch
(url_gen is a list of lists of 100 urls)
this is probably a really dumb question but how can a stack implementation store more then 1 variable since you can only access the top one
for example:
a = 1
b = 4
the stack would look something like
4 (top)
1
so how would you access the a variable (value of 1) without deleting the value of b from stack???
if registers come into this, then i know there are a limited number of registers whereas you can define many more variables
after they're assigned, the values aren't stored on the stack. They're loaded in from the names and pushed back as they're accessed
what do you mean? can you show an example
so theres a "storage system" thats separate from the stack?
a = 1 pushes 1 on the stack and stores the top value behind the name in the corresponding namespace. Then when it's accessed somewhere the value that name references is pushed to the stack
!e ```
import dis
dis.dis("""a=1
print(a)""")```
@peak spoke :white_check_mark: Your eval job has completed with return code 0.
001 | 1 0 LOAD_CONST 0 (1)
002 | 2 STORE_NAME 0 (a)
003 |
004 | 2 4 LOAD_NAME 1 (print)
005 | 6 LOAD_NAME 0 (a)
006 | 8 CALL_FUNCTION 1
007 | 10 POP_TOP
008 | 12 LOAD_CONST 1 (None)
009 | 14 RETURN_VALUE
so LOAD_CONST will push it onto the stack and then STORE_NAME allows for the name/identifier to be store elsewhere?
then LOAD_NAME pushes it onto the stack?
Yes, for example on a module's namespace. You can see in the dis output that after the first line, a and print have to be loaded in again (and in the correct order)
is index 0 the top of the stack or bottom?
You can see what all the bytecode instructions do here https://docs.python.org/3/library/dis.html#opcode-NOP
assuming the bottom
You work with the top of the stack
which is index 0 then?
?
index 0 of what? The stack?
yeah
i may be misunderstanding the stack tho lol
it looks to be in that bytecode example that a is pushed onto the stack in the first bit, then in the second the "print" function thingy is pushed onto stack along with a and then print is called with the value of a, have i got that right?
If the stack is an array you append to then 0 would be the bottom value, but it'd all depend on how it's implemented. You could use a whole another data structure or just go about it the other way and (inefficiently) add items to the start of the array
Pretty much, you can ignore the last 3 instructions as they're irrelevant here. 1 is loaded in and stored; on the next line print is loaded along with with a's value after it and CALL_FUNCTION consumes the amount of items from the stack that was passed in as an argument to it + the function
most stacks used in this way also support deeper access, so you can for example access the value below the top value, which means you can quite often remove some locals entirely, python afaik never does this though
if its an array implementation cant you access all of the stack?
sometimes the stack is in hardware at least partially, so there could be some extra limitations
Question for the community at large about a 'best practice'.
I have a mongodb with a few somewhat complex collections. The document structure might be something like:
"current_split": 1,
"current_season": 9,
"seasons": [{
"season_number": 1,
"ranked_splits": [{
"split_number": 1,
"end_date": "2021-06-15",
"start_date": "2021-05-04"
}],
"battlepass_info": {
"start_date": "2021-05-04",
"max_battlepass": 110,
"goal_battlepass": 100,
"end_date": "2021-08-03"
}
I would like to access the data so that it's 'structured' in the code and known to PyCharm (for purposes of autocomplete for example:
if mypass.seasons[0].battlepass.start_date == "2021-05-03":
...
would autocomplete the properties (such as seasons and start_date)
The only way I know how to do that is to either use a namedtuple or a @dataclass to replicate the structure in its entirety. Are there any other ways? Am I missing something obvious? Are there tools that can do that for me based on the json? Seems kind of tedious.
@solid ermine I mean for me, I would definitely handly this by writing a dataclass explicitly
if you have a lot of different json file schemas you need to do this for up front, I'm sure there are tools that can help you generate the dataclasses the first time, but obviously if you always generate the dataclass dynamically based on the incoming json then it defeats the purpose
so you'd perhaps generate it once, look over it to verify everything seems proper, and then commit it, and maintain it like normal source code moving forward
if you use pydantic then it offers built in to-from json conversion for dataclasses (or its own dataclass-like thing)
Thanks for the response. Funny, I had been heading down the same path (auto generate dataclasses one time.. I found this: https://github.com/russbiggs/json2dataclass )
I'll take a peek at pydantic
yeah, i think generating them is fine as long as it's a one time thing, and you go over it. Although, honestly, unless you have many dozens of different schemas, it really just doesn't take that long to do
I almost find it a bit enjoyable, kind of calming ๐
Also there are things that you may know that a generator like this cannot tell, for example some fields may be present but you know from experience they are actually optional
so you may want to annotate the type as Optional. Or, it may be optional in the json but you want a default value in the dataclass
Good point
you may want to use inheritance to share fields between different comments because you know in practice a subset of their fields have to be identical
*different documents
attrs + desert
i don't personally like pydantic, it does too much at once for me. but it's a good library.
Thanks for attrs that looks really nice...
I like desert, I saw something similar with https://pypi.org/project/dict-to-dataclass/
but desert looks to be a bit more robust with better support
why those over dataclasses?
more features (e.g. supports slots, supports per-field validators), the underlying mechanism is straightforward and i understand it, and i like the api a bit better
dataclasses would work fine
I like the data validation aspect of attrs for sure.
there are also other desert equivalents like https://pypi.org/project/mashumaro and https://pypi.org/project/dataclass-factory
i do wish attrs had multi-field validation though
this would be on my attrs wishlist:
from math import sqrt
from typing import ClassVar
import attr
@attr.s()
class Point:
max_dist: ClassVar[float] = 50.0
x: float = attr.ib()
y: float = attr.ib()
size: float = attr.ib()
@attr.multivalidator('x', 'y')
def validate_xy(self, x, y):
x_attr, x_val = x
y_attr, y_val = y
distance = math.sqrt((x_val)**2 + (y_val)**2)
if distance > self.max_dist:
raise ValueError(f'Point is too far from the origin. Distance: {distance}')
as it stands currently, you'd have to write that logic in __attrs_post_init__
Oh nice. Yeah. I would use that for start and end dates for example. (Make sure start $lte end )
@paper echo the thing is that if you know from day one that you want to convert to-from json
then the fact that pydantic does that out of the box, and attrs does not, is a fairly substantial point
right, that's what desert does
ah, ok
2 libraries vs one that does both
yeah, I have not looked at desert so cannot comment
I do like attrs slightly better than pydantic where their features intersect
although I do feel like attrs is still a bit awkward to use for properties, if you have some kind of situation with a property that isn't simply conversion/validator
but tbh I've never needed that in practice
multi validators are interesting
oh, another nice thing with pydantic is that by default it verifies that the runtime type matches the static type
i would have expected attrs to have an option to just turn this on automatically
but it seems like you have to ask for it individually for every field, which can be annoying
they are unfortunately hypothetical ๐
What does the CPU have to do with the time? What time are you referring to?
class Foo(Generic[A]):
...
does A need to be a TypeVar?
for example, i want to restrict Foo[A] so that A must be a subclass of AnyIO
do i need to write
_AnyIO = TypeVar('_AnyIO', bound=typing.IO)
class Foo(Generic[_AnyIO]):
...
or can i write
class Foo(Generic[typing.IO]):
...
What are you trying to do? There's the built-in datetime library
@paper echo I mean if you don't have a typevar, then you aren't generic, right?
If you do class Foo(Generic[typing.IO]), then what class does Foo actually inherit from?
Sorry, i shouldn't say what class does Foo inherit from
what class is Foo concerned with, for lack of a better way to put it? When you write the first version, you have an actual type AnyIO, it is bounded above by typing.IO but it will be something more specific than that (in general)
you can use that type in signatures and so on
In the second example, there is no type variable, so there's nothing you can use in type signatures and such except for typing.IO itself. which means that your class is not generic really.
It does need to be a typevar yeah, because otherwise the generic[] is pointless, there's no variables to specialise later.
On the other hand doing class Foo(Mapping[str, bool]): is allowed.
Sure, Foo is not generic though
I mean, I wouldn't be surprised if python thinks it is in some sense ๐ but by a sane notion of what's generic, it isn't
Pyright/Pylance flags it as an error
(which is... correct)
yeah, makes perfect sense
I meant though in the example with class Foo(Mapping[str, bool])
Foo inherits Mapping, mapping inherits generic
it's not really "inheritance" bu tto python it possibly still is inheritance
so possibly, issubclass(Foo, Generic) may be true ๐คทโโ๏ธ
there's an is_generic function in typing.inspect, not sure if that got merged to typing or not
and how that would work
yeah, it thinks its generic
from typing import Mapping
import typing_inspect
class Foo(Mapping[int, int]):
pass
print(typing_inspect.is_generic_type(Foo))
prints true
๐ฆ
Python's static type system being a crime against humanity, nothing to see here
lol
lol it is a generic type though
well idk what inheriting from a generic type does
but that's all it does other than being a type obj
Foo isn't generic, it cannot be parametrized
pretty sure there's an issue for this already and the guy basically said "yeah, it's not really generic, but in python, it is"
Hey guys, this isn't a python question directly, but I am building an application in python, and I need to figure out a better solution of storing user credentials.
The application makes API requests, so the need to access the information per request is essential.
Any links to great resources would be much appreciated.
That would be more on topic in #software-architecture. This channel is intended for discussions about how the Python language itself functions, and future improvements it may have, and the different Python interpreters
thanks! just copied it over
yeah that's what I meant "this class has Generic in its (effective) MRO"
so I guess make a branch which follows your definition of generic
probably not going to make a branch... just define my own generic function that follows his suggestion of testing for args, if I need to
I'm having an interesting discussion about lexing (and parsers in general) over in another server, and I wanted to get you guys' opinions
The general rule, I'm told, is that a hard coded solution to handling your language's source code- lexing character by character and handling every case with hand written code, for example, runs faster and gives you more fine-tuned control
Whereas a generic, generator type approach lets you represent things in an easier to write and read abstracted form. This could make it easier to make modifications later, and easier to grapple with bigger concepts
What are you guys' thoughts/preferences?
Parser generators win by a fairly large margin in terms of extensibility, yeah, and you can always roll out your own generator that's specific to your language
So from a theoretical perspective, or when speed or absolute control are required, hand built is preferred; but the generative approach is often more pragmatic
And of course, the standard truisms 'do what feels natural', 'it depends', 'use the tool for the job' and 'its both, really' all apply
Even then, hand-built parsers are bound to use existing formalisms much like parser generators because they just work
Parsing is fascinating
I was thinking, maybe one day just for giggles, I'll write an AI to do it
I just got into it a few days back, I'm actually writing a PEG parsing library to study
๐ฎ
I mean, I'm a few weeks into the journey myself
I'd love to knock heads as we go if you're up for it
I'm implementing my own horrifying mutation of PEG (though its a lot less horrifying than I had thought it would be when I started)
Why is there no built-in function composition function in the stdlib (functools?)
after all, we have partial
i would love something for function composition
would be cool if you could'd pipe them
well, at least something like functools.compose would be nice, I guess
!e
class pipe:
def __init__(self, first, *rest):
self.first = first
self.rest = rest
def __call__(self, *args, **kwargs):
acc = self.first(*args, **kwargs)
for function in self.rest:
acc = function(acc)
return acc
def __repr__(self):
functions = [self.first, *self.rest]
return f"pipe({', '.join(map(repr, functions))})"
def double(x):
return x * 2
def add_one(x):
return x + 1
f = pipe(double, add_one)
print(f(5))
@grave jolt :white_check_mark: Your eval job has completed with return code 0.
11
(Why a class and not a function? Because pickling, of course)
you could always use the cursed lambdas, lambda x: add_one(double(x))
that wouldn't pickle :^)
use dill :^)
never heard of it
but yes, from the pypi page it seems like it does [de]serialize lambdas.
But you can't use dill instead of pickling for talking between processes, can you?
I mean, you can technically send byte strings back and forth, and then deserialize them, but that's no fun
send the source code instead and have the receiving process eval :p
pathos is basically a reimplementation of multiprocessing that uses dill
Does writing a rust module for python improve the python performance?
Depends on what the module is doing, and whether the overhead of communication is compensated for adequately or not. If I had to guess, it should usually be faster.
I wrote a yaml like file parser in python but looking to port into a rust library
I want to use the rust module across languages
I haven't tried compiling the python yet.
My advise is look at PyO3 and maturin, Makes life very simple ๐
I still want function composition with the at operator
How about with |
!e ```py
class PipeMe:
def init(self, value):
self.value = value
def rshift(self, f):
return PipeMe(f(self.value))
def _(f, *a, **kw):
return lambda x: f(x, *a, **kw)
x = (
PipeMe("a_qwe")
_ (str.upper)
_ (str.split, "_")
_ (list.getitem, 1)
)
print(x.value)
@lament sinew :white_check_mark: Your eval job has completed with return code 0.
QWE
that's actually mapping, not piping, but alas
Anybody knows a video or tutorial about how to create a simulation for image processing? With photon amount and exposure time and all of that?
I've seen talk about them trying to make decorating on assignments possible
I personally wish they would overload @
just seems natural
Has Python's regex syntax changed since 2.7?
it's gotten some new features, but not lost any backwards compatibility, I believe
like what? named groups?
(?aiLmsux-imsx:...) - the ability to set flags inline, at least
Changed in version 3.8: The
'\N{name}'escape sequence has been added. As in string literals, it expands to the named Unicode character (e.g.'\N{EM DASH}').
there's probably some others.
The biggest difference are the code examples that uses regex
My reason for asking immediately leaves the intended scope of this channel.
But if it is what I'm thinking about, it works perfectly fine with 3.x :D
the docs for 2.7 are still on python.org - you can just look through both the 2 and 3 docs and see if everything you need is in both
Is it git diff time
Guys I ahve a question, lets say I've been using the same way to code something. As an example I always use if statements, in what way would you recommend to me for me to have a better way than using if else statement. One of the ways I can think off is studying more. Of course, but do you have any other way in mind ? Like not using if else statement and using another way
To help my code be better
You could do that in 2.7
depends on the context
if statements are supposed to be used often
but it really depends on the specific usecase whether there's a better method
idk how much of a boost it'd give you in py but branchless programming can help with performance at low levels iirc
if a == 2:
return a
elif b == 2:
return b
# vs
return a * a == 2 + b * b == 2```
in python I wouldn't count on it, and in C/C++ compilers will often do optimizations like that for you if the semantics are truly identical (although sometimes there are subtle differences)
python isn't a low level language
and that might actually be slower
because of the dunder calls
if you really care about perf, you should be using another lang
also you can simplify that to
if a == 2 or b == 2:
return 2
and it doesn't matter since the c interpreter still has a bunch of branches anyways
plus those 2 statements aren't even equivalent - the 2nd one is wrong if both a and b are 2
2nd one is slower
In [1]: def c(a,b):
...: if a == 2:
...: return a
...: elif b == 2:
...: return b
...:
In [2]: def d(a,b):
...: return a * a == 2 + b * b == 2
...:
In [3]: %timeit c(3,2)
298 ns ยฑ 17 ns per loop (mean ยฑ std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit d(3,2)
382 ns ยฑ 26.7 ns per loop (mean ยฑ std. dev. of 7 runs, 1000000 loops each)
Ahh alright then thanks man for answering
Is there a standard for what type annotations to use for *args and **kwargs?
if I had
def func(*args: A): ...
is it just inferred that you can pass an arbitrary number of A instances?
Yes
Yes, in Python, when you typehint *args/**kwargs, you just give a type for the values
Likewise, hinting **kwargs: A assumes dict[str, A]
yeah
ParamSpec just can't come soon enough 
what is
I think many problems would be solved if args and kwargs typehints were like in TypeScript
like, kwargs that are common to many functions
!pep 612
Thanks, I'll be looking into this!
.
:incoming_envelope: :ok_hand: applied mute to @woeful moon until 2021-06-06 14:58 (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).
hi
adding a *_ doesn't help much imo and is considerably slower, just needs proper spacing; or using next instead if it's an iterator
@peak spoke this is surprising
Filling in the throwaway does take a lot of time so it's not exactly something that should be used for that. For one element iterables the "pistol" should be the fastest and is fairly readable when formatted properly, otherwise next(a)/next(iter(a)) is the best choice when assuming an iterator/iterable afaik
or [name] = it for the one elem
dont mind me, i confused micro and nano seconds so the first seemed fastest
lol
Oh yes that'd be very surprising when event an empty statement takes about 7ns to "run" for me through timeit
yep pistol is a slight bit faster for single element
what does
...
``` means in python
It's an ellipsis, and doesn't mean anything on its own. Numpy uses it in array slicing
ohk
It's a valid expression and is sometimes used as a placeholder, though
It's used in stub files, for example
is it useful in something?
You can use it for some special handling like the aforementioned array slicing in numpy. Otherwise it's just an object that you can reference with the ellipsis syntax which may make sense in a certain context
if you don't know why you want it, then you don't want it ๐
k
it's very little used
