#internals-and-peps
You're drawing a correlation between static analysis and static typing that I don't see - beyond that obviously a statically typed language makes static analysis of types trivial
I think you can statically figure out every expression in python as long as the inputs aren't doing certain things but it'd be very complex
because they are bolted on after the fact, and they are heuristic
static languages don't have metaclasses. That doesn't imply that metaclasses are the hardest thing to statically analyze.
You most certainly cannot.
with open("some.fifo") as fifo:
    eval(fifo.readline())
in a mathematical sense, no
since a FIFO is read-once, you can't even peek in the file to see what that would do.
the question was asking about static analysis in the context of optimization; to optimize based on static analysis you have to be 100% sure it's correct. If you have 100% certainty of something based on static analysis, then effectively at that point, it could be made part of the type system, since that's what types are.
@halcyon trail the 100% is why i mentioned getattr. I think you said getattr was fine as long as the data met a certain criterion. You can't assume that criterion.
If you have a static analyzer for C, sure, it's bolted on, so you still have a separate type system of C itself, and the conclusions of the static analyzing bounds checker, or what not. Effectively, if the latter had 100% certainty, it would be like having a second type system.
@spark magnet I mean I'm talking about python features, as they are typically used in day to day usage.
If you have 100% certainty of something based on static analysis, then effectively at that point, it could be made part of the type system, since that's what types are.
feel like this is not valid
hm but I'm not sure I need to think about it
i would say getattr usually doesn't use literal string names.
A lot of usages of getattr could in practice be done, no problem, with a compile time reflection system
for me the most common usage of getattr is reflecting over things usually
all of the members of a dataclass, for example
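A minimal sketch of the "reflecting over a dataclass" use of getattr mentioned above. The `Point` class and `as_pairs` helper are hypothetical names made up for illustration. The point is that the attribute names come from the class definition rather than from runtime input, which is the kind of getattr a compile-time reflection system could plausibly handle.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int


def as_pairs(obj):
    # Every name passed to getattr comes from the dataclass definition,
    # not from user input, so the set of possible names is known up front.
    return [(f.name, getattr(obj, f.name)) for f in fields(obj)]


pairs = as_pairs(Point(1, 2))
print(pairs)
```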
that's one use, sure.
you said i wasn't allowed to choose a hard case. I think you aren't allowed to choose an easy case 🙂
@gleaming rover other things that are tough to statically analyze: import hooks, .pth files, # coding: comments
Well, there's a distribution of hard and easy, is the point
with metaclasses it's all hard
with decorators, very often hard, not always
# coding: is hell
but really cool
a lot of decorators are ok because they only change implementation details of say a callable
in that case the situation vis-a-vis types is simple
Is it?
class MyMetaclass(type):
    pass

class A(metaclass=MyMetaclass):
    pass
but as soon as the decorator starts changing the API, it's a mess
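To make the two decorator cases concrete, here is a rough sketch. `logged` and `call_now` are hypothetical decorators invented for this example: the first only changes implementation details of a callable, the second replaces the decorated name with something that is not a callable at all.

```python
import functools

calls = []


def logged(func):
    # Implementation detail only: same arguments in, same return value out.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


def call_now(func):
    # API change: the decorated name is bound to the function's result.
    return func()


@logged
def add(a, b):
    return a + b


@call_now
def settings():
    return {"debug": True}


result = add(1, 2)
print(result, calls, settings)
```

A checker that treats `add` as "still a function taking two arguments" is right. One that assumes the same about `settings` is wrong, because `settings` is now a dict.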
.... I mean in real world use cases of metaclasses
hm
so you're picking hard cases.
No, I'm talking about what's typical in real world code
does it matter though
as long as there's a possibility of a hard case
you need to be prepared to deal with that
Right - which means that it isn't metaclasses themselves that make static analysis difficult, it's things that are done by the metaclass.
anyway I feel like
well, I dunno, it depends I guess what angle you are looking at it from
yes, this
that's what I was thinking
What are things that metaclasses can do that make static analysis harder?
when you asked "most dynamic", I thought about the features that on average do the most dynamic things
but if you could do only what you would normally be able to do in a language with a stronger type system and less reflection
I don't think metaclasses would really be problematic
metaclasses, again, involve executing completely arbitrary code, just to understand what the type looks like
did I mention that you can change the class of an instance in Python?
so they would be quite problematic
yes
but I'm saying
I guess it really all depends on how you want to compare. getattr means you'll need to know the value of a string at compile time. In theory, that can also involve executing arbitrary code.
So metaclasses and getattr are equally dynamic, in the black and white sense
!e
```py
class A:
    pass

class B:
    pass

a = A()
print(isinstance(a, A), isinstance(a, B))
a.__class__ = B
print(isinstance(a, A), isinstance(a, B))
```
@raven ridge :white_check_mark: Your eval job has completed with return code 0.
001 | True False
002 | False True
In practice computing strings at compile time can often be done to a great extent, if they don't literally depend on information given at runtime
with the metaclass though you are still running arbitrary code. at least, that is how I see it.
not really - metaclasses are only equally dynamic as getattr if they do dynamic things like setattr. Otherwise, they're less dynamic, as in my trivial example of a do-nothing metaclass.
they're not less dynamic, because like I said, you need to execute the code in the metaclass in order to understand the type, and the type is something you want to understand statically
why do you need to execute code in the metaclass in order to understand the type, instead of statically analyzing it?
The answer needs to be that the metaclass does something that's resistant to static analysis, and that's the thing that we're looking for - right?
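One possible illustration of a metaclass doing something that resists static analysis, in contrast to the do-nothing metaclass earlier. `AutoAccessors` and `User` are hypothetical; the metaclass adds methods whose names are computed from data in the class body, so the shape of the type is only known after its `__new__` has run.

```python
class AutoAccessors(type):
    """Hypothetical metaclass: adds a get_<name>() method per entry in `fields`."""

    def __new__(mcls, name, bases, namespace):
        for field in namespace.get("fields", ()):
            # The default argument pins the current field name for each lambda.
            namespace[f"get_{field}"] = lambda self, _f=field: getattr(self, _f)
        return super().__new__(mcls, name, bases, namespace)


class User(metaclass=AutoAccessors):
    fields = ("name", "email")

    def __init__(self, name, email):
        self.name = name
        self.email = email


u = User("ada", "ada@example.com")
print(u.get_name(), u.get_email())
```

Nothing in the body of `User` mentions `get_name`, so a tool has to either evaluate the metaclass or carry special knowledge about it.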
If you want to do that then you can also just bury the answer all the way at the bottom
the most dynamic feature of python is that everything is a dictionary, and you can modify those dictionaries freely
it's not really getattr, it's what getattr does, etc
keep pushing it down
yes, I agree with that.
except getattr is the lowest level for __slots__ classes, which don't have a __dict__
but I definitely agree that direct manipulation of __dict__ also makes static analysis very tough.
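A small sketch of the difference being described, using hypothetical `Plain` and `Slotted` classes: a regular instance can have attributes injected through its `__dict__`, while a `__slots__` instance has no `__dict__` and rejects names outside its slots.

```python
class Plain:
    pass


class Slotted:
    __slots__ = ("x",)


p = Plain()
p.__dict__["anything"] = 1  # attribute created by editing the instance dict

s = Slotted()
s.x = 2
has_dict = hasattr(s, "__dict__")  # slotted instances carry no per-instance dict

try:
    s.y = 3  # "y" is not a declared slot
except AttributeError:
    rejected = True
else:
    rejected = False

print(p.anything, has_dict, rejected)
```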
isn't it also true that things which are difficult for static analysis are also kinda difficult for humans to understand/reason about?
(in general)
like, I remember programming in C for a little bit
monad transformers 🥴 vs human emotions 👼
don't say the M-word plz 
hm. There's a correlation, definitely, but I don't think that's a rule.
try:
    from unittest import mock
except ImportError:
    import mock

my_mock = mock.MagicMock()
is reasonably dynamic, but not difficult for humans to reason about.
there's two different modules named mock with basically the same interface, which can be used interchangeably.
mock is a backport of unittest.mock to older interpreters - and it's still actively maintained, so you can actually import it in newer interpreters to get newer versions of mock than the stdlib shipped with.
it's more like unittest.mock is a periodic fork of mock, heh
yeah, not sure how static analysis tools will cope with that
I think I'm damaged by static typing.
I'm somehow unable to enjoy dynamic typing
I just built a library for work that, based on a config file or an environment variable, does essentially
if use_new_stuff:
    from .new_submodule import Thing1, Thing2
else:
    from .old_submodule import Thing1, Thing2

__all__ = ["Thing1", "Thing2"]
mypy is not happy.
whereas people have no trouble with it, because I've guaranteed that those things are interface compatible, even if they're not strictly speaking the same type.
are there any?
at this point
!d enum.Enum
class enum.Enum
Base class for creating enumerated constants. See section [Functional API](https://docs.python.org/3/library/enum.html#functional-api) for an alternate construction syntax.
yeah - metaclasses are always for metaprogramming, really
Yeah, you wouldn't write your own metaclass if you're making a web API or something. Unless you really care about job security.
@fallen slate uses a metaclass for getting info from config.yml iirc
the only reason to use a metaclass is that you want to do something that can't be done easily with just regular types.
yeah 🤔
the closest I've gotten to a real use case for metaclasses is using __init_subclass__
and there's fewer of those cases now than ever, thanks to things like __init.... yeah.
and __class_getitem__
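As a rough example of the kind of job that used to need a metaclass and can now be done with `__init_subclass__`: a subclass registry. `Plugin`, `Csv`, `Json`, and the `key` keyword are hypothetical names chosen for this sketch.

```python
class Plugin:
    registry = {}

    def __init_subclass__(cls, /, key=None, **kwargs):
        # Runs once for every subclass, with no custom metaclass involved.
        super().__init_subclass__(**kwargs)
        Plugin.registry[key or cls.__name__.lower()] = cls


class Csv(Plugin):
    pass


class Json(Plugin, key="js"):
    pass


print(Plugin.registry)
```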
well, metaclasses aren't the only solution to those things, other languages have come up with their own ways of creating schemas or enums
my favorite use case of metaclasses is when ppl say everything is an object in JAVA and C# and I can say "no, classes themselves aren't instantiated objects in those languages"
ah I thought in C# types pretended to be classes
not very familiar with it though
anyone know what gc.get_referents does?
!d gc.get_referents
gc.get_referents(*objs)
Return a list of objects directly referred to by any of the arguments. The referents returned are those objects visited by the arguments’ C-level [`tp_traverse`](https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse "PyTypeObject.tp_traverse") methods (if any), and may not be all objects actually directly reachable. [`tp_traverse`](https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse "PyTypeObject.tp_traverse") methods are supported only by objects that support garbage collection, and are only required to visit objects that may be involved in a cycle. So, for example, if an integer is directly reachable from an argument, that integer object may or may not appear in the result list.
Raises an [auditing event](https://docs.python.org/3/library/sys.html#auditing) `gc.get_referents` with argument `objs`.
basically, gets children of an object
i still don't understand referred to
but yeah metaclasses let me badger ppl who claim everything is an object in their language which is pretty good on its own
If an object a refers to object b, object b cannot be garbage collected before a is garbage collected
ok
now for some crazy stuff
Basically, a has b as its attribute, you could say.
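A tiny sketch of "a refers to b" using a list as the container, since list contents are visited by the list's traversal. The names `inner` and `outer` are just for illustration.

```python
import gc

inner = ["b"]
outer = [inner, "some text"]  # `outer` refers to `inner`

referents = gc.get_referents(outer)
# Compare by identity: we want the very same object, not an equal one.
found = any(obj is inner for obj in referents)
print(found)
```

As the docs quoted above note, the result is whatever the object's C-level traversal visits, so it should be treated as a debugging aid rather than a complete list of reachable objects.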
!e
```py
import gc
gc.get_referents(int.__dict__)[0]['uwu'] = lambda s: print('uwu')
(5).uwu()
```
@valid rose :white_check_mark: Your eval job has completed with return code 0.
uwu
#esoteric-python lol
!e
import gc
print(gc.get_referents(int.__dict__))
@grave jolt :white_check_mark: Your eval job has completed with return code 0.
[{'__repr__': <slot wrapper '__repr__' of 'int' objects>, '__hash__': <slot wrapper '__hash__' of 'int' objects>, '__getattribute__': <slot wrapper '__getattribute__' of 'int' objects>, '__lt__': <slot wrapper '__lt__' of 'int' objects>, '__le__': <slot wrapper '__le__' of 'int' objects>, '__eq__': <slot wrapper '__eq__' of 'int' objects>, '__ne__': <slot wrapper '__ne__' of 'int' objects>, '__gt__': <slot wrapper '__gt__' of 'int' objects>, '__ge__': <slot wrapper '__ge__' of 'int' objects>, '__add__': <slot wrapper '__add__' of 'int' objects>, '__radd__': <slot wrapper '__radd__' of 'int' objects>, '__sub__': <slot wrapper '__sub__' of 'int' objects>, '__rsub__': <slot wrapper '__rsub__' of 'int' objects>, '__mul__': <slot wrapper '__mul__' of 'int' objects>, '__rmul__': <slot wrapper '__rmul__' of 'int' objects>, '__mod__': <slot wrapper '__mod__' of 'int' objects>, '__rmod__': <slot wrapper '__rmod__' of 'int' objects>, '__divmod__': <slot wrapper '__divmod__' of 'int' objects>, '_
... (truncated - too long)
Full output: https://paste.pythondiscord.com/qotozigiyo.txt?noredirect
hmm, so what is a slot wrapper
ah, i c
a slot wrapper wraps a function defined in C
so by modifying this dict, we can add new methods heh?
not exactly
then why did the uwu example work?
int.__dict__ is a mappingproxy. It's a read-only mapping, and it refers to a dictionary (i.e. has it as an attribute)
yeah
gc.get_referents(int.__dict__) has that dict as the first element
but this time its mutable?
yes, that dict is mutable
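The read-only proxy behaviour can be seen without touching built-in types. This sketch uses a hypothetical `Thing` class: writing through `Thing.__dict__` fails, while ordinary attribute assignment on the class succeeds and shows up in the proxy, since the proxy is a live view of the underlying dict.

```python
import types


class Thing:
    pass


proxy = Thing.__dict__
is_proxy = isinstance(proxy, types.MappingProxyType)

try:
    proxy["greet"] = lambda self: "hi"  # mappingproxy does not support item assignment
except TypeError:
    write_blocked = True
else:
    write_blocked = False

# The supported route: set the attribute on the class itself.
Thing.greet = lambda self: "hi"
print(is_proxy, write_blocked, "greet" in proxy, Thing().greet())
```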
!e
```py
import gc
del gc.get_referents(int.__dict__)[0]['__repr__']
print(repr(5))
```
I am surprised that works actually without ctypes.pythonapi.PyType_Modified
that is pretty interesting
@valid rose :white_check_mark: Your eval job has completed with return code 0.
5
why doesnt it delete
you can't mess with dunders by editing the dict alone
interesting
because they are defined in slots of the struct in C
so, if i define __slots__ in my own classes? can people mess with my dunders
no that's different
only for c dunders eh?
can ctypes access libc?
yes
wait a sec, can i then use malloc and free in python?
technically, yes
hmm...
```py
>>> from ctypes.util import find_library
>>> from ctypes import CDLL
>>> libc = CDLL(find_library('c'))
>>> ret = libc.printf(b'%i\n', 1)
1
```
can, yes! Should, no.
ctypes is like writing an extension module, but worse and harder to maintain.
ctypes should really only be used to integrate with code that doesnt have a python interface
everything ctypes can do, Cython can do better.
ctypes doesnt require compiling tho
yes, and in exchange it gives you no compile time type safety. And it's not necessarily portable across machines.
i still prefer Cython, but for quick prototyping with a c lib i tend to use ctypes
CFFI has a mode where it doesn't require pre-compilation, too
I'd use that before ctypes, too
ctypes is stdlib
yes, but so are all sorts of other terrible modules you should never use in real world code
fair enough
woah
urllib.request comes to mind first, but there are plenty of other things in the stdlib that are substantially harder to use, less safe, and more error prone than the equivalent third party lib
So the other day one of you guys mentioned that having a reference to a class inside its own definition could "lead to the seeping out of the uninitialized class"
I guess I can sorta understand that- but what I want to know is why that would be untenable
setting that aside, metaclasses make it impossible, so it's kind of a moot point
Metaclasses, as they currently function
at least if we're talking about Python, and not a hypothetical Python-like language without metaclasses
right - metaclasses that take a namespace and turn it into a class.
with metaclasses as they exist today, you need to fill in a namespace first, before the class ever exists, because the namespace gets passed to the thing that makes the class.
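The ordering described here can be observed directly. In this sketch, `Tracing` is a hypothetical metaclass that records when it receives the namespace, and the class body records that its own name is not bound yet while the body runs.

```python
events = []


class Tracing(type):
    def __new__(mcls, name, bases, namespace):
        # By the time we get here the class body has already finished running.
        public = sorted(k for k in namespace if not k.startswith("__"))
        events.append(("namespace", public))
        cls = super().__new__(mcls, name, bases, namespace)
        events.append(("class created", cls.__name__))
        return cls


class Demo(metaclass=Tracing):
    try:
        Demo  # the name is not bound until the class statement completes
    except NameError:
        events.append(("body", "Demo is not defined yet"))
    answer = 42


for event in events:
    print(event)
```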
Right
I'm not trying to suggest that referencing a class in its own definition is a good idea, by the way. I'm just curious
setting that aside, in a hypothetical language without Python-like metaclasses, if the class existed before it was fully populated, you'd need to define semantics for what would happen if someone interacted with it in that intermediate state - what happens if you construct it? destroy it? call methods on it? Perform isinstance checks?
what happens if an exception occurs part way through defining the class, and so this class that you started to define never actually gets defined, despite something having obtained a reference to it?
maybe just make the definition recurse lol
or maybe you just punt and say that it's all undefined behavior to touch it before it's finished being built - but in that case, what's the point of exposing it?
I think "obtained a reference to it" is an important thing for me to take note of. In theory, if the class instantiation fails you just return an undefined or else throw an error that may or may not get caught, yada
But if you store a reference to the uninstantiated class somewhere outside itself and then it fails, what then, right?
before a metaclass is fully instantiated and you refer to the instance... treat the reference as a new class constructor and run it with any new args n kwargs? It's dumb but that is what springs to mind
and the control flow is what you'd expect from recursion
it makes little sense but it makes the most sense as far as I can tell
note that this case isn't fundamentally different from what happens if you access the class in the middle of defining it. Like what this pseudocode would do:
class C:
    __class__().say_hello()

    def say_hello(self):
        ...
even if the type existed, in that case __init__ hadn't yet been defined (nor even say_hello), and yet you're creating an instance of that type and calling a method on it.
yeah there's no way to instantiate the metaclass instance if it itself isn't instantiated
Well, I'm sure there's some way to do it
recursive definition
But its probably not a good idea
I think the reason I had originally brought it up was because I was discussing how one might implement privacy in a language which allows new methods to be attached to a class after the fact
there most certainly isn't. The metaclass can return an entirely different class depending on the values in the namespace. You can't have the class before the namespace is passed to the metaclass, because the metaclass can do something different depending on what's in the namespace it gets.
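A deliberately contrived sketch of "the metaclass can return an entirely different class depending on the namespace". `Chooser` and the `as_dict` flag are hypothetical; here the metaclass does not even return a class in one branch.

```python
class Chooser(type):
    def __new__(mcls, name, bases, namespace):
        if namespace.get("as_dict"):
            # Decided purely from the namespace contents: hand back a plain dict.
            return {
                k: v
                for k, v in namespace.items()
                if not k.startswith("__") and k != "as_dict"
            }
        return super().__new__(mcls, name, bases, namespace)


class Normal(metaclass=Chooser):
    value = 1


class Odd(metaclass=Chooser):
    as_dict = True
    value = 1


print(type(Normal), Odd)
```

Since the result depends on what the body put in the namespace, there is nothing that could sensibly be handed out as "the class" before the body has finished.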
You'd need to mark all of the methods which were defined directly inside the class as 'native', and define the method's 'owner' object as being the class
what makes those methods more important than others?
You'd want the methods originally defined within the class to have access to the class' (and its instances') private attributes, but prevent any user defined methods from being able to access them
if you can make class attributes private you can just have a different sort of attribute that's private
the question is just how to make them private
why is that what you'd want?
I'm not being entirely facetious - that would make class decorators far less useful, for instance, because they wouldn't be able to monkeypatch in new methods.
Well, I might be wrong here but, if it was as simple as just attaching a method to an object and, tada, you have access to the private attributes- well- whats the point?
likewise with @unittest.mock.patch for unit testing, etc.
there isn't one, that's why Python doesn't have access modifiers 😄
how would access to itself change that?
where do you want those private attributes to be accessed?
during construction only?
During construction, and within the scope of any methods defined directly within the class' definition space
but you also apparently want a liminal namespace for accessing "public" stuff
which also means getting rid of getattr and setattr
liminal?
it could be like javascript Symbols just inaccessible
liminal means in between
Symbols without special methods/functions for accessing them
because the contract getattr(obj, name) and setattr(obj, name, val) have no way of knowing if the caller of the function is allowed to call the function. They don't even know who the caller of the function is.
but you'd also need the in between components
the public interface
but yeah as godlygeek is gettin at, the hard part is actually building that private namespace and public interface to access it
thing.public #public attribute
thing:private #private attribute
thing::dunder #same privacy level as private, but a separate namespace, to avoid naming conflicts
Keep in mind- I'm not advocating OR disadvocating this syntax, its a work in progress
you need between public and private too
can't be altered but are accessible
liminal
Just make it accessible by a getter
so you wouldn't have getattr() and setattr() functions, right? Seems like you're agreeing with me.
the public getter has the private attribute exposed to it though
Well, this brings my original concept of the problem full circle. Everything needs to have an 'owner' attribute (or even an array of them, in the case of nested classes???)
getattr would check to which owner (function/module/class) the namespace in which the attribute is being requested belongs. If the owner is the class or a descendant of it, all good; otherwise, fail
so u can just define a setter if the public stuff has access to the private stuff
It'd be kinda pointless, but in theory yeah
it would need to know the owner of obj.name as well as the provenance of the caller
provenance?
the identity and history, I guess
Ahh, well, yeah more or less
origins
Which doesn't entirely disagree with Python's objective nature in my opinion
Obviously it would take proper planning and a thorough understanding of the problem, but it certainly seems doable to me
have you done much unit testing? Both in Python, and in a language with access modifiers?
I can't say I have
it's the best reason why access modifiers are a terrible idea
they don't do anything useful, they just get in the programmer's way.
Disagree
You're ruining my fun
Especially when languages have an internal access modifier or similar
they make all sorts of reasonable things that programmers want to do - like "test what happens if I make a database call through my class while the database handle is in an error state" - much more difficult.
So you can have something that is private to the outside but available to tests
Like giving the tests an all access pass
and they don't offer any benefits in exchange, because practically speaking, in every language with access modifiers, untrusted code is running in the same address space and can just choose to ignore the access modifiers.
That's really not true
yeah that doesn't sound right to me
what's a counterexample?
"just ignore" via complicated tricks with reflection usually
right.
And anything can be hacked
Yeah that's not the point
that's not "hacking"
Protect against Murphy, not Machiavelli
your code is running in the same process as the stuff that someone is trying to protect from your code. There's no trust barrier between the two things.
This just doesn't have any relationship to the software engineering realities of access control
I think I want a t-shirt that says this
Nobody claims it's a security measure
so we can agree on one thing that it's useless for, security.
what's a thing that it's not useless for? 😄
you would be surprised how often people claim otherwise.
that's just the name of the feature.
what's it good for?
why is "access control" a good thing? It doesn't aid security, but it does aid... ?
It dissuades people from screwing with the internals of the program
It aids prevention of people mucking around with internals of your code, encapsulation
Which, if you're designing for beginners for example, is a good thing
hm, why?
beginners have plenty of things that they're told "just don't do that", or that they don't understand.
I built a project that essentially replicates a javascript runtime within python. Pythonic control over the elements in a web page. Elements had a 'style' attribute
why is preventing people from mucking with the internals of your code desirable? It stops them from fixing bugs with monkeypatches, or from white box testing, or from printf debugging. What does it buy in exchange?
Now, you can use an underscore to denote privacy, but people are going to screw with it anyway. It's one thing for someone to try to change it immediately and get an error more or less right away, but if they change something deep in the internals of the language and then some day, maybe weeks or months down the way, start getting an error whose traceback may even have nothing to do with the modification you used
well, languages with private fields/methods usually have a way of circumventing that 🙂
I've never seen one that doesn't.
and any with any sort of C FFI immediately has a way of circumventing it.
Protecting, really protecting, the internals of a project makes it more resilient
hiring coders that aren't idiots is much better
or just name private attributes DO_NOT_USE_OR_YOURE_FIRED lol
const counter = () => {
  let x = 0;
  const increment = () => { x++; };
  const getValue = () => x;
  return { increment, getValue };
};
const ctr = counter()
Here you can't change the x variable from outside (apart from using increment)
For me, its not about absolute refusal of access, its about keeping the internals away from anyone who doesn't have the skills required to work with them
Yes, of course. What idiots high school students, passionate young people, and beginners of all types are. Such morons they are
that's not access modifiers, to be fair, just a closure with a language with insufficient introspection, heh
why does a project involving novices need to be resilient?
well, in Python you can change x 🙂
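For comparison, a rough Python port of the counter closure, showing how the "hidden" variable can still be reached through the function's closure cells. The names mirror the JavaScript example; writing to `cell_contents` relies on it being assignable, which has been the case since Python 3.7.

```python
def counter():
    x = 0

    def increment():
        nonlocal x
        x += 1

    def get_value():
        return x

    return increment, get_value


increment, get_value = counter()
increment()
before = get_value()

# get_value closes over exactly one variable (x), so it is cell 0.
get_value.__closure__[0].cell_contents = 100
after = get_value()

increment()  # increment shares the same cell, so it sees the change too
final = get_value()
print(before, after, final)
```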
you're not just stopping idiots from touching your internals, you're also stopping people who know exactly what they're doing from touching your internals.
which isn't necessarily a good tradeoff.
All things being equal, why not?
I have fixed bugs in production libraries through monkeypatches. It was the right call.
You're making it significantly harder to touch internals by accident, and very explicit when you do touch internals
it'll already be impossible to refactor or read, why would just hiding internals from them prevent them from making other mistakes that make the project untenable?
So people can see it in code review and ensure it's truly necessary
You've got two otherwise equivalent languages, one with privacy, one without. If you've decided privacy is something you want, go for it. If you've decided otherwise, go with the latter
Also, monkey patching and access control are two different things
except we don't have them we just have speculation about a theoretical language and can't even decide how to implement privacy
I'm convinced that "privacy" is a way to give security through obscurity (or perhaps correctness through obscurity?), and therefore isn't valuable.
I think maybe this comes down to schools of thought 😛 I'm going to exit the debate with a smile on my face, having learned a thing or two, not least of which: access modifiers are an impassioned issue
Nobody who knows anything about access control argues that it's related to security
that's absolutely not true.
Frankly anybody who brings it up is just showing their own misunderstanding
I can point you to recent threads on python-ideas with people arguing that it's related to security.
Then they don't understand it
I agree.
I'm sorry but it's that simple
I have never seen anyone argue it's for security tbh
Well that's what I said
if people can accidentally mess with the guts of something, despite widely accepted prefixed underscore syntax, and would, why are they part of the project?
it comes up surprisingly often.
The thing is that the underscore syntax + static enforcement would basically be access control
yeah true I just thought that as I typed that
In my understanding it's a way to separate the public API of a thing from its implementation details, and to enforce it at the language level (with an escape hatch, as always)
Just a primitive form and by convention
pycharm yells at you for accessing underscore prefixed stuff, just enforce that and you're gold
no need to edit the language
The fact is that most new statically typed languages being created today, if they aren't say very purely functional, continue to include access control
lol u have a lexer not a rewrite of python internals
if there's an escape hatch, what good does enforcing that at the language level do? How is it better than Python's gentleman's agreement about underscore?
Why is static enforcement useful?
What I mean to say is, I (and others) build languages for fun. 'No need to' implies tedium and unpleasantness
I don't know, I'm not a language designer!
Oh, another thing that comes to mind is preventing name collisions.
YESSSSSSSSS
Which is mostly solved with __ in Python, but I haven't seen many people use it
Not the end of the world, but a nice bonus
it also breaks when two classes in the inheritance chain have the same name 
which is, granted, something I have never seen
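A sketch of that collision. Name mangling only uses the class name, so two classes called `Widget` in the same inheritance chain end up sharing one mangled attribute. The `make_base` factory is a hypothetical device to get two same-named classes into one file.

```python
def make_base():
    class Widget:
        def __init__(self):
            self.__token = "base"  # stored as _Widget__token

        def base_token(self):
            return self.__token

    return Widget


Base = make_base()


class Widget(Base):
    def __init__(self):
        super().__init__()
        self.__token = "derived"  # also _Widget__token, so it overwrites the base value


w = Widget()
print(w.base_token(), vars(w))
```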
a way to access mangled attributes set in a subclass without it being ugly and hacky might be nice
tho that is kinda not what mangling is for
a way to access mangled attributes set in a subclass
❓w ❓h❓y❓
...the only reason I've done it is because my current CS prof thinks mangled means private and makes us use them
cuz he hates python
Why would you ever need to access a private variable of a subclass?
The cases where you need to worry about name collisions are when
- You own a class
- You support other people subclassing your class
- You want to extend your class by adding a private attribute after it's been subclassed
Name mangling should be used more than it is, but that's still not super common.
In languages with access modifiers there are 'protected' variables
idk what the people new to python did
You can't access a parent class's private variables/methods in those languages, AFAIK
if the subclass sets the attributes, the parent class shouldn't be touching them.
because the parent class should have no knowledge of its subclasses, generally, for Liskov reasons.
thinking back, maaaybe he meant for people to make more setters and getters
without explicitly saying so
getters and setters are generally considered an anti-pattern in Python.
the tests for the assignments had methods like "setThis" and "getThis"
yeah it's gross
I tried using property and .setter but the tests required those naming conventions
and he calls instance methods class methods
>_>
I really don't see the point in ever setting both a getter and a setter, unless you're doing some sort of logic with the value being passed to setter of course
That might be my ignorance talking though
yeah and you can just overload the attribute name in python
yeah. @property makes getters and setters unnecessary in Python, because you can evolve obj.attr = 42 to call a method in the future without callers needing to change their code.
so there's no need to set them for everything initially
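A minimal sketch of that evolution, with a hypothetical `Account` class. Suppose `balance` started life as a plain attribute; adding validation later with `@property` leaves every `obj.balance = ...` caller unchanged.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance  # goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        # Validation added after the fact; callers still write acct.balance = ...
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value


acct = Account(10)
acct.balance = 42

try:
    acct.balance = -1
except ValueError as exc:
    error = str(exc)

print(acct.balance, error)
```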
exactly
I'm just getting credits
ppl ask if they should take this class and I say "no, it's not Python"
the reason you need setters in Java is because there's no way to start with obj.attr = ... and add validation to it later without needing all of the callers to change
yeah different languages have different standard practices for a reason
yeah. Teaching Python as though it's Java is, unfortunately, very common, though.
it's a truer OOP language than Java! Everything is an object!
Quick question. Is it too early to start using the 3.10 beta?
I am using it on a project and it's fine
Pattern matching working okay?
yeah
except my IDE hates it
in the project itself I've only used type union syntax but that works great
pattern matching will be perfect for one part though
yeah, I get the impression everyone, certainly me, is really excited for it
it makes me wish python had an option for better recursion handling
I'm not, personally. I'm not convinced that the complexity it adds to the language is worth the convenience.
it's a whole weird DSL that looks like Python without behaving like Python, and is going to be annoying to teach...
and the presence or absence of a . in determining whether something is a load or a store makes me sad, still.
I'm sure I'll wind up using it, but I'm not excited for it.
the dottedness to determine a constant is weird indeed
It's the steady march of change, either way
but you can use guards lol
I like the guard/pattern match combination
idk if it's unique
does another language have that?
I wish that instead of guards there was a "go to next case" statement, to break out of one that matched and continue matching on the next one. Though that would have the tradeoff of removing the option to evaluate cases in parallel.
So, when you compile something into a code object, you can do so within a module's namespace or simply within its own little virtual space. What's a term to describe whatever thing, module or otherwise, in which the code is being compiled?
namespace
Rockin, thanks
yeah I'm glad they don't have fallthrough
that was always a weird aspect of switch case to me, especially because it seems seldom used
maybe if match case was a module though and they introduced module-level soft keywords 🤔
I'm betting all keywords going forward will be soft
cuz the new mini language being global is weird
yeah they should be
someone in another server linked a tweet where someone was complaining about for (var of of of){} in js I'm like wtf do you want it to do the only nasty part of that is your IDE not understanding it
(all the ofs were keyword colored)
thats typescript im pretty sure 😛
oh I mean var
i mean vanilla js is still pretty bad
well, it would make an imperative alternative to guard clauses. Instead of
case int(x) if 0 <= x <= 10:
    do_stuff()
case _:
    other_stuff()
you could do:
case int(x):
    if x < 0 or x > 10:
        try next
    do_stuff()
case _:
    other_stuff()
js is extremely janky haha
aren't guards what other FP langs do?
I like pattern matching a lot for statically typed languages
For python, the benefits just aren't as big
maybe they will add less overhead to recursive function calls >_>
So I'm less sure on whether I like it
somehow
Also python is starting to feel really kitchen sinky these days
I'm sure I'll find places to use it, but I don't find it... exciting. I have a begrudging acceptance of it. 🙂
it's definitely not even something I think should be taught
?
?
?
isn't that the worst of both worlds? A feature adding complexity to the language that most people don't know about or use?
Not necessarily
Depends on the feature
Some features are more for library authors
Like metaclasses
But in the case of pattern matching
I agree
yeah, that's a fair point.
Yes :-)
it gives me serious regex vibes. It's its own special DSL jammed into the language, and it sort of looks like the language, but it doesn't behave like the rest of the language.
it's only hard because of a barrier to entry
int(x) after case does something entirely different from what int(x) does everywhere else.
it's not as convoluted or powerless as regex tho
lol you can even define real functions in it so it's your dict with multiline lambdas
...and weird rules haha
Well... it's like regexes except that it matches arbitrarily nested objects of arbitrary types, instead of just textual strings. It's still pretty complex - at least, it's not intuitive, and it reuses syntax that means something entirely different in the rest of the language to mean a different thing
everywhere else in the language, int(x) takes an existing x and converts it to an int. Inside a case statement, int(x) takes an existing int and stores it to x
that's at least... weird.
i still haven't come to terms with how weird walrus operator looks in comprehensions tbh 🤣
I'm behind
and you can't know how it will behave on an arbitrary type without knowing if that type defines __match_args__
tbf that's all damn dunders
pattern matching also gets kinda weird in python because python has two orthogonal type systems, the dynamic one and the static one
If you have Union[Foo, Bar] is a pattern match that checks for Foo and Bar exhaustive?
does None even have defined behavior for __lt__? Who knows, it's got it though
Ooh, and int(x) and MyClass(x) do something entirely different in pattern matching
int(x) matches an int and stores it in x.
MyClass(x) matches a MyClass whose first match arg is x
hmm what happens if MyClass is an int subclass?
you don't get the special int behavior.
I guess int(x) would work
int(x) would work, but ClassName(x) would not
you instead would need to do ClassName() as x
that sorta makes sense tho as far as the magic of builtins is concerned
only builtins really are apparently themselves not just defined to be as such
unless you implement a non built in in C or whatever
"built in" has two different meanings. It's used to describe both things in the builtins module (possibly only things that are in it by default, possibly including things that you add to it dynamically), as well as things that are defined in C extension modules
so what does deque(x) do?
tho deque is an especially hacky builtin extension type object imo
As mentioned above, for the following built-in types the handling of positional subpatterns is different: bool, bytearray, bytes, dict, float, frozenset, int, list, set, str, and tuple.
those are the only ones that are handled specially.
is it according to a new slotted dunder?
no.
no... that can't ever be changed
it would be a backwards incompatible change to the language.
ah true
deque(x) has a defined meaning in 3.10 - they can't change what it means in a later version.
well, they can change the others to be slot dunders
without a change in functionality
well, sure - but why?
Like regex?
_>
that set of 11 types will forever need to be handled specially
whether that set is hardcoded in the parser or given a special dunder that nothing else uses seems like an implementation detail.
aren't they already?
can't mess with their methods without trickery
they just pretend to be classes
the same is true of complex, and that's not included in the list
it's an arbitrary subset of the builtin types
and this is normative - future versions of PyPy, for instance, will need to special case this same set of 11 types.
at least it's just 11 then haha
it is, but... well, hm. I can't help but feel that section of the PEP didn't get enough discussion
is complex really so much less special than frozenset or bytearray? All 3 are pretty special case types...
yeah lack of complex is weird
well, unfortunately, it can never be added 
I have module A containing class A and module B containing class B. Class A uses an instance of class B as an attribute, but class B requires knowledge of class A for type annotation. Whats the solution?
XD
from __future__ import annotations
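A minimal single-file sketch of why that helps. It uses a quoted annotation, which is what `from __future__ import annotations` effectively does to every annotation in a module. In the real two-module layout you would pair this with an import guarded by `typing.TYPE_CHECKING` (something like `from module_a import A`, where `module_a` is a hypothetical name), so the import only runs for type checkers and never at runtime.

```python
class B:
    # "A" is quoted, so it is stored as a string and never looked up when
    # the class body runs. A does not need to exist yet.
    def __init__(self, owner: "A") -> None:
        self.owner = owner


class A:
    def __init__(self) -> None:
        # B is only looked up when an A is actually created.
        self.b = B(self)


a = A()
print(a.b.owner is a)
print(B.__init__.__annotations__["owner"])
```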
Oh, and whats this about my new calling?
(but also, don't)
make complex behave like other builtins in pattern matching in a cpython fork
if you have a circular dependency in your types, it sounds pretty fishy - that seems to indicate something is factored wrong, more likely than not.
at least they're not passing self to the instantiation of the composed class any more
...I hope
if an A has a B as an attribute, but a B has a method that returns an A, that's... suspicious. It may not always be bad, but it's a thing that's more likely to be bad than good, I think.
Wise words as always my dudes
good luck!
maybe you wanna pass a strategy for dealing with class A to class B... but rly shouldn't the concerned methods just be in class A?
it can happen in cases where there really is a circular dependency - like a tree with multiple types of nodes, where any type could contain another type, perhaps...
it's not always gonna be wrong, but it warrants a closer look.
For the record, I just moved them both in to the same module
Spacing things out into multiple modules is, in my mind, important. But I've been known to take it too far
yeah drawing that line can be tough
My issue is that my lexer is already over 700 lines of code, comments included
And its working fine for everything except for strings, which are a rabbit hole
Fstrings need their own lexer and their own regular expression, which together will eat up a few hundred lines
then u got ur b strings and r strings
I'd really like to keep everything in one module, but I think it might be time to break things apart
Oh!
And, I want to implement arrow notation, those will require in the very least a good deal of coding within the normal lexer to handle, if not their own lexer and expression
Too much?
Abstract - Base class
Anonymous - Arrow functions
Interpolated - Fstrings
Namespace - Module level (or just plain namespace in the event of a 'compile()' without being provided a module to compile in)
Stringified - normal string
My favourite new trivia piece about JS:
Object.defineProperty(
  String.prototype,
  'onions',
  {
    set: () => { console.log("I don't like onions") }
  }
);
x = "foo";
x.onions = 2000;
console.log({'x.onions': x.onions});
When getting/setting properties or calling methods, primitives spawn a temporary boxed object
@static bluff https://medium.com/hackernoon/modifying-the-python-language-in-7-minutes-b94b0a99ce14 this might help
😛 With respect spoony, that article is like bringing a toothpick to a knife fight (read it before)
but have you whittled a toothpick yet?
it seems like you have a half built knife machine haha
I'm actually very pleased with my progress
yeah you seem to be chuggin along
😄
just might be useful to have a new operator actually implemented and tested
I mean- people have told me before that I need to walk before I can run, and I really really respect that position (and the people telling me that)
But I'm a trial by fire type. Its always been how I learn best
You should break things apart at boundaries between logical components, not arbitrarily based on size. If your lexer is a single logical component, there's nothing wrong with keeping it in a ten thousand line file
In principle I agree
Replied to the wrong message. Stupid mobile.
But I personally don't think its tenable to expect anyone to read through a document more than 2000 lines of code long. Theres just too much to keep track of
Fine for me sure because I wrote it- and maybe you too. But most of the people here in the advanced channel have the wits to be able to handle that. The most likely reason someone would be reading through the source would be to figure out how it works- very possibly starting from square one with no context to fall back on
I guess, I dunno. I'm trying to bring my coding style to within more standardized limits- keeping things united in their own modules for example
But my instincts are yelling at me to space things out at this point, and I'm not sure how thin or thick is 'normal'
if it's 2000 lines of code, it's 2000 lines of code. Reading through 2000 lines of code in a single file isn't necessarily worse than reading 2000 lines of code spread across 5 files.
if the boundaries are bad or arbitrary, reading the same 2000 lines of code can be much more difficult when it's spread across 5 files than all in one.
I'm not saying you shouldn't break things up into submodules, just that the criteria for where to split things should be based on the boundaries of logical components, and not on the size of the file
at least it's not spread across 5 npm packages
Really good advice
figuring out what should be a component and what shouldn't and where to draw the lines between them takes a lot of practice - you just need to read a lot of code, and see both good and bad divisions, before you can get any good at it.
as a professional programmer for many years, figuring out how to split things up so that the divisions make sense to other people, and so that things aren't too tightly coupled and don't have too many responsibilities, is still one of the parts of the job that takes the most effort for me.
but I have seen files of code that were ~40k lines where it wouldn't make any sense to divide them up - everything in them was closely related, they represented a single, large, logical component.
Any possible division would have been arbitrary, and wouldn't have aided comprehension.
Fair ^^
Well, I'm making the judgement call, at least for now, that to have multiple nearly identical (and visually confusing) components with different uses so close to each other is only going to cause confusion. And, its what my instincts are telling me 😛
What do you do Godly?
I work on the Python Infrastructure team at Bloomberg - a big news and financial analytics company. My team is responsible for the health of the Python ecosystem at Bloomberg, from maintaining patched interpreters and keeping up with CVEs to providing Python bindings for first party C++ libraries that the company already had (our backend for a long time was Fortran, then C++, and now it's switching slowly towards Python)
slowly in the sense that there's a lot of new Python code being written, but there's a huge codebase of existing C++ code, and some Fortran, still in active use
O.O
Thats amazing
I'm not normally one to gush but christ, its an honor
Thanks for keeping my job warm for me by the way 😉 I'll see you in ten years
eh, I'm just a developer. I'm a damn good developer, but there's a bunch of impressive people on my team. We've got a CPython core dev, and one of the maintainers of pip...
ur sayin my terminal will have 3.10 soon?!
I just got space invaders to run on the darn thing
XD
I saw someone tried to install doom on a home pregnancy test with a digital readout
We're already well off topic for this channel. Ping me in #career-advice if you want.
Would one of you fine folks be willing to take a look at my code? (I'm asking here, because I feel like I'm learning more talking to you guys than I've ever learned anywhere else)
Hey Guys!!! I am just trying web development in Python using flask. I just created everything, but when I try to register and login it says "CSRF tokens are missing". What should I do here?
you can share it here 🙂
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
I've been trying to conform to norms a bit better. There's nothing really impressive happening in this module, but I wanted to know if you guys noticed any bad habits (aside from the semicolons :P)
https://paste.pythondiscord.com/uqibamatug.py
Updated comments
Well, in Python nesting your classes like that does absolutely nothing, other than making you have to access it with a dotted name. It's probably easier to just put them at the top level above your main class. In Caret, you could use operator.add instead of the lambdas you're creating there - or really just directly add the three attributes, the iteration is a bit overkill. Why is the caret iterable in the first place anyway? Instead of unpacking to tuples, give it a as_tuple method, probably returning a NamedTuple.
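A rough sketch of what that suggestion could look like. The pasted module isn't reproduced here, so the three attribute names (`line`, `column`, `index`) and the `CaretTuple` name are guesses for illustration only.

```python
from typing import NamedTuple


class CaretTuple(NamedTuple):
    # Hypothetical field names; substitute whatever Caret really tracks.
    line: int
    column: int
    index: int


class Caret:
    def __init__(self, line: int = 0, column: int = 0, index: int = 0) -> None:
        self.line = line
        self.column = column
        self.index = index

    def __add__(self, other: "Caret") -> "Caret":
        if not isinstance(other, Caret):
            return NotImplemented
        # Add the three attributes directly, no iteration or lambdas needed.
        return Caret(
            self.line + other.line,
            self.column + other.column,
            self.index + other.index,
        )

    def as_tuple(self) -> CaretTuple:
        return CaretTuple(self.line, self.column, self.index)


total = Caret(1, 2, 3) + Caret(1, 1, 1)
print(total.as_tuple())
```

Callers that used to unpack the caret by iterating over it can unpack `caret.as_tuple()` instead, and also get named fields.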
I had originally intended to have not just addition, but all four basic operations in caret- the lambdas are just a relic from that
As for the the iteration overkill, yeah, you're probably right
hey all, I've got a python-adjacent question that I could use some help on. I've got a client-server system written in python that sends data over a socket by pickling it on one side and de-pickling it on the other. we use it for sending messages up to ~500kb in size. I wanted to rewrite the server portion of it in Rust, as there were some computations that could greatly benefit from rust, but now I'm stuck on server performance - since I can't pickle the Rust side of things, I've been sending it over as a JSON byte stream, which tanks the performance: the time is dominated by converting the vector arrays on the Rust side into a JSON string, and loading on the python end. fwiw, it's largely tuples of floats that get sent over.
I'm curious if anyone has any suggestions on something to look into that can get me back to the python-python pickling / unpickling performance. I imagine something like protobuf might get me closer, but I'm really not sure. I've tried to use some of the faster JSON parsers (ujson, orjson, hyperjson), but it's still ~100x slower than the pickle-pickle side of things. This makes me think JSON is not the correct answer.
TL;DR: efficient way to send large amounts of data from a Rust app to a python client?
something like messagepack or protobuf is likely to work
or even bson
generally JSON is fast enough
like serde (serde_json is orjson for python) can process hundreds of MB/s in terms of data throughput
also
I wouldnt recommend using it but https://docs.rs/serde-pickle/0.6.2/serde_pickle/ does exist for rust 😅
API documentation for the Rust serde_pickle crate.
so technically, yes you can pickle rust
and rust can unpickle python stuff
within reason
I wonder if you could think of having unix sockets open for comm and just stream the data from one to another, also stream the response back
sorry, my bad. I lost context you have already covered
socket wise yeah, unix sockets are gonna be the fastest method of transport though
but he's saying it's not fast enough?
I dont see how that can really happen though for 500kb messages
there isn't much reason to use json as an interchange format between two programs you control, unless you actually have some reason for wanting human readability
Well i mean you get the ease of protocol compatibility
you get the exact same thing with bson, or msgpack
writing or using a different serializing setup can make life harder between programming languages because of support
JSON is already hyper optimized for this stuff. Like I dont see how the format is the bottleneck here
Everyone uses json for communication over the web and its very common to use jsons for simple data between two programs. It's got its advantages.
Because it's been used for communications over sockets for decades, pretty much every language has a setup for handling the format and often most serializers and deserializers are massively optimized for handling it because of how much it's used
Primarily you're guaranteed everyone supports it
Yes but the implementations that follow it are what make it hyper optimized
the format itself isn't designed to be optimal in any performance sense, and there's only so much you can do to optimize performance of reading it
they try, yes, but it's still slow compared to a binary format
anyway, this is purely speculative? The guy said he benchmarked and the costs of sending via json are a big factor. We don't know what his timescale is.
Moving from json to bson or msgpack is very easy
But it should be fast enough for what they're dealing with, I find it hard to believe that the format is the bottleneck, considering that serde can process 400MB+ a sec per core
You don't know what they're dealing with though....
This feels like premature optimization to me to be honest. I mean, if you're sending 1gb worth of data don't use jsons I guess. But for small sizes, why should json not be used
oy
For what it's worth, in the grand scheme of things this decision won't really make or break your code regardless
in this case, they state that JSON serialization is too slow in rust. So either pick a faster JSON library or stop using JSON
Why not just actually give the person some useful advice, instead of telling him that he doesn't know what he's talking about, and that he's prematurely optimizing?
I swear, programmers
Because it is useful advice to tell them when it's premature
because you end up making sacrifices in the name of speed you dont need
You're making some pretty incredible assumptions here, that are totally unwarranted
Everything in the post indicates that he did reasonable due diligence, I have no idea why you two are still assuming that he's just wrong for wanting to move to a faster data format
recommending something like BSON has more negatives than benefits because sure, its faster to serialize and deserialize but IO is still by far the slowest thing in the equation and BSON takes up more space than JSON generally
hey, how about a memory mapped file dumped from rust and read in python using c extension? just match the structures and padding and it should fit, should work faster by eliminating the intermediate, no?
doesn't have to be a file as such. I just mentioned file for god know what reasons
it could be a stream
sending a file over a stream is non trivial
since a stream doesn't have a length
but the data likely does
except that he indicated that he does need the speed....
you literally know nothing about his domain
his timescale
the time is dominated by converting the vector arrays on the Rust side
seems like json is indeed the issue here
then the person should actually chime in and contribute some more to the discussion @naive apex
true
But in the world of performance, IO is the first and often biggest bottleneck
the bigger your encoding format, the slower your IO generally and the lower the performance
In rust's case Recommending another format isnt going to change much due to the serde backbone
Reading up, for what it's worth I'm simply taking part in the conversation from where I joined in, my comments were not intended for the op
You're saying these things like they're profound.... Trying other formats is definitely worthwhile, we have no idea what his bottleneck is
its just tuples of float. a stream of fixed size byte buffer followed by known delimiter isn't that hard to put together. specially when you control both the systems
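Something along these lines, as a sketch on the Python side only. It swaps the "known delimiter" for a small count header, since any byte value can legitimately show up inside a packed double. The function names and the little-endian `<II` header layout are choices made for this example, not anything from the original system.

```python
import struct


def encode_points(points):
    """Pack equal-length tuples of floats into one self-describing frame."""
    width = len(points[0]) if points else 0
    flat = [value for point in points for value in point]
    header = struct.pack("<II", len(points), width)  # tuple count, tuple width
    body = struct.pack(f"<{len(flat)}d", *flat)  # raw little-endian doubles
    return header + body


def decode_points(frame):
    """Inverse of encode_points: returns a list of float tuples."""
    count, width = struct.unpack_from("<II", frame, 0)
    flat = struct.unpack_from(f"<{count * width}d", frame, 8)
    return [flat[i * width:(i + 1) * width] for i in range(count)]


points = [(1.0, 2.5, -3.0), (0.1, 0.2, 0.3)]
frame = encode_points(points)
print(len(frame), decode_points(frame) == points)
```

The Rust side would need to write the same header and byte order; that half is not shown here.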
yes, generally, JSON should be fine
but in this case, it isn't, as, well, the person measured it
I only mentioned bson because it's exceptionally easy to try, if you're already using json
Yes but BSON is made for storage rather than transfer
certain, protobuf, or capn proto, etc, are better solutions, they're just a little more work to setup
It serializes to be a much bigger size than JSON because of metadata
I do find it odd, since I was sending 4G packed datasets with JSON over http realtime
it's not much bigger. It depends what you are storing.
If you are storing large arrays of floats, for example, it can be smaller
but well, I don't know what the exact specifics here are
Generally no, but sure we can put a pin in that for now 🙂
lets
amusingly, I just remembered that we moved some data coefficients for models from json to bson, I actually have an email from a coworker with sizes of some of these files:
Json: 375M
BSON: 134M
MSGPACK: 91M
Are you sure the JSON wasnt pretty formatted
I don't think so. but even so, it wouldn't explain a 3x in size. It's not very nested.
bearing in mind that with BSON you add a considerable amount of metadata per field
every field has a the type and key and the data itself
- you have the metadata for each document which is is another 2 bytes per doc for the size + the delimiters
So technically speaking i dont think it's ever possible to make BSON be smaller than JSON without compression
....
you're literally looking at a contradictory data point, first of all.
second of all, your comment makes me genuinely think that you don't realize that a float is smaller in binary than in text
Im aware of that though you have to have a considerably sized integer or float to make it lesser than the text representation accounting for metadata
it doesn't need to be "considerably sized" in the case of a float, it's just storing the full precision
anyway, facts are facts

Floats can be really huge
well,
JSON doesn't store floats, does it?
as in, it doesn't specify the precision or anything like that
they're just stored as their text representation, doesnt have any concept of floats no
so 3.1 is just stored as 3 bytes
not the full 16 8* bytes
well, if it's a f64 / double
you mean 8 bytes then?
my point was that if you encode it as a double, you might lose information
I do this way too often
(e.g. 3.1 cannot be represented as a finite binary fraction)
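A quick check of that point using only the standard library. It shows that the double nearest to 3.1 is not exactly 3.1, and also that the short text `3.1` and the 8-byte encoding both round-trip to the same double once the value is a Python float.

```python
import json
import struct
from decimal import Decimal

x = 3.1
exact = Decimal(x)  # the exact value of the double nearest to 3.1
as_text = json.dumps(x)  # shortest text that round-trips to the same double
as_double = struct.pack("<d", x)  # 8-byte IEEE 754 encoding

from_text = json.loads(as_text)
from_double = struct.unpack("<d", as_double)[0]

print(exact)
print(as_text, len(as_text), len(as_double))
print(from_text == x, from_double == x)
```

So the information loss happens when the decimal text is first parsed into a double, not when an existing double is written out in binary.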
well, usually you compute it as a double to start with
it depends where your numbers are coming from
i shouldn't say usually I suppose
but anyhow you can see a trivial example where json is larger than bson in two minutes:
d = {"hello": [random.random() for i in range(1000000)]}
for me this results in a 20 meg json file, and a 16 meg bson file
messagepack is only 8.6 megs though
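If you don't have bson or msgpack installed, the same effect is visible with just the standard library. This sketch compares JSON text against raw packed doubles (via `array`, standing in for a binary format) on a smaller list; the seed and the list size are arbitrary.

```python
import json
import random
from array import array

random.seed(0)
values = [random.random() for _ in range(1000)]

as_json = json.dumps(values).encode("utf-8")  # decimal text plus separators
as_doubles = array("d", values).tobytes()  # 8 bytes per value, no framing

restored = array("d")
restored.frombytes(as_doubles)

print(len(as_json), len(as_doubles))
print(list(restored) == values)
```

A random double needs around 17 to 19 characters of text plus a separator, which lines up with the roughly 20 bytes per value in the 20 meg JSON figure above.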
If you share that many data through json, you are probably doing something wrong
This is really inefficient due to the ascii serialisation
ideally yes, sometimes you don't have control though.
laughs in a 3-gigabyte SQL query
also you can wang that though a compression algo and life is much nicer, although you can do that with any binary format really
but if you do have control over both ends, and you care about perf, then yeah json is just a pretty bad choice
again sorta depends really
IO is still your slowest thing
sure if you have a big array of floats like that BSON will be smaller (although others will be even smaller) which will mean less data to transfer but if its a bunch of strings etc.. theres a good chance JSON will be smaller
I mean I just stuck that 20 meg JSON file it produced into gzip and got some 8.8MB output
gzipping is expensive
you're taking "IO is all that matters" as an article of faith at this point
yeah it is a pretty expensive compression
unzipping is expensive, parsing strings into floats is actually also quite expensive
actually that was Zlib not gzip sorry
At any rate, all these approaches in the end are much slower than approaches with schemas
yes
If i were doing something like this I'd definitely be using something more like protobuf from day 1
it also, most likely, saves you having to write some kind of reasonable dataclass/struct to hold the data on either side. when you are sending data between multiple languages protobuf-like approaches are hard to beat
20269613 Bytes in
9578680 Bytes Out from zlib
0.20867420000000003 s
9578692 Bytes Out from gzip
0.22591870000000014 s
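A sketch of how numbers like these can be reproduced with the standard library. The payload here is smaller than the 20 meg file, and the compression level is pinned to 6 for both calls, because `zlib.compress` and `gzip.compress` have different defaults. Timings will vary by machine.

```python
import gzip
import json
import random
import time
import zlib

random.seed(0)
payload = json.dumps(
    {"hello": [random.random() for _ in range(100_000)]}
).encode("utf-8")


def timed(compress):
    """Run one compression call and return (output, elapsed seconds)."""
    start = time.perf_counter()
    out = compress(payload)
    return out, time.perf_counter() - start


zlib_out, zlib_seconds = timed(lambda data: zlib.compress(data, 6))
gzip_out, gzip_seconds = timed(lambda data: gzip.compress(data, compresslevel=6))

print(len(payload), "bytes in")
print(len(zlib_out), "bytes out from zlib in", zlib_seconds, "s")
print(len(gzip_out), "bytes out from gzip in", gzip_seconds, "s")
```

The 12-byte gap in the figures above is what you would expect if both are the same DEFLATE stream: the gzip container adds 18 bytes of header and trailer, the zlib container adds 6.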
i've actually now just out of curiosity been trying to create something that will be smaller in json than messagepack, and have not been successful
messagepack must be fairly clever
In [46]: def get_random_string(length):
    ...:     # choose from all lowercase letters
    ...:     letters = string.ascii_lowercase
    ...:     return ''.join(random.choice(letters) for i in range(length))
    ...:
    ...:
In [47]: d = {get_random_string(10): get_random_string(10) for i in range(1000)}
this still creates a 28K json file and 22K msgpack file
http://indiegamr.com/cut-your-data-exchange-traffic-by-up-to-50-with-one-line-of-code-msgpack-vs-json/ has a good tear down on the difference it produces
{} ? 🙂
@grave jolt touche 🙂 didn't try it
What do you guys think about a built in 'regex' object, designed for building complex regular expressions pattern by pattern, and pretty printing them (plus some other functionality I guess)
Isn't that just a parser combinator library?
I mean, someone who is half decent with regular expressions would probably have an easier time just writing it, but having a sort of 'toolkit' where you can go command by command, providing only a minimal amount of actual pattern, might be helpful for some
Oh probably. I don't really know what that is, but it sounds about right
It would be nice to have some options for parsing in the stdlib, though I would prefer something that can also support irregular expressions
well, there's lark
Lark often feels like overkill. Sometimes, you just need to parse sexprs without needing 50+ lines
well, yes
there's also CSON and i think some other binary json-like thing
has no idea what you guys are talking about 0.0
that person should really add more context to the problem he’s solving with rust. I have more questions lol
Like, why not just use a c extension to process it within Python and forget the whole json business all together
it sounds like their application already has a client-server architecture @warm wadi
however there might be a more specific format that makes sense for their usage
e.g. if it's a 500 kb array, maybe they should use the numpy data format
They have it Python server Python client and now to do compute heavy stuff they are adding rust to it
who's to say that they're on the same machine, or even the same local network?
maybe there are other good reasons why they need or want client-server?
but like salt said, it's already client-server, we should assume it's client-server for a reason
But that’s not the problem or point at all. Read that post again. They are happy with pickle performance of Python on both sides. Now only on server side they want to use rust to improve computation performance
So it’d become Python client and rust server
All I’m curious about is why can’t they use a c extension on Python server to improve computation performance. Then they don’t have to fight with encoding decoding stuff
i read the post....
Then what part have I understood wrong?
a single python program is also simpler than a python client and python server
but that's not what they have. so there's probably a reason for that, right?
i understand
@halcyon trail they are saying, instead of rewriting the server in rust, they could use a c extension to do the computation within the python server
or i suppose rust with cffi (?)
or maybe hpy https://hpyproject.org/ because apparently theres a lot of overhead with cffi https://blog.ian.stapletoncordas.co/2018/01/making-python-faster-with-rust-and-cffi-or-not.html
What is HPy?
HPy provides a new API for extending Python in C. In other words, you use
#include <hpy.h> instead of #include <Python.h>.
What are the advantages of HPy?
Zero overhead on CPython: ex
okay, I misunderstood dave's post then, not the original post
i did too
To me personally, that actually seems worse, but maybe it's a matter of taste
it definitely does fix the "serialization is now really slow" problem
(You can address me as he/him, thanks 🙂 )
Oh, I just said "dave" there because it was getting confusing
not because of pronoun unsure-ity
that said, pickle is potentially very dangerous anyway - what if the client and server have different python versions? or any of the million other things that can go wrong with unpickling
people tend to overuse pronouns a lot in technical discussions, one of the first things I remember my boss drilling into me. The number of times that misunderstanding of "it" has cost 10 minutes...
Yeah, I mean, the "serialization is very slow" problem can be fixed in many ways, there's nothing that special about pickle
I actually think protobuf is a pretty nice and obvious solution here
In languages with grammatical gender, there's 2 to 3 times less confusion 🙂
I'm only half joking
it's a 2 for 1 value really, because protobuf gives you a) a serialization approach, b) an automatic translation/representation of data in both Rust and python at the same time
Writing extensions can still be quite a bit of work, and people tend to bring in libraries for that anyway if it's non-trivial, e.g. pybind11
but I guess it just depends
sometimes if code is really out of std library then simply changing to pypy gives ample performance boost. But again, hundreds of other things to care about for long term
@naive apex what kind of data is this? a big array of numbers? some kind of deeply nested dicts and lists?
!rule 9 @azure siren We don't allow requests or offers of paid work here.
Sorry
Maaaaaan
You guys are the best. I feel like I'm just drinking in knowledge talking with you all
So many smart people in here, definitely will come here to ask questions in the future. Thanks for having me.
Noob question, but can anyone point me towards some resources for learning Python VM bytecode?
Well, the dis module docs https://docs.python.org/3/library/dis.html#python-bytecode-instructions has a list of all the current bytecodes and their functionality, and the opcode module has a bunch of lists with the actual indexes. One key thing about the behaviour is that CPython uses a stack to hold all in use data - load instructions push to the top of the stack, then operators pop their inputs, and push the result.
You may also want to consult ceval.c, which implements all the bytecodes and the core eval loop.
https://github.com/python/cpython/blob/main/Python/ceval.c
I was reading the dis output, but without those docs it made no sense lol. Thanks for the response. I'll check over that in the morning.
Also for reference, the columns in dis are in order the line number, bytecode index, opcode name, opcode parameter (normally 0-255, with EXTENDED_ARG up to 4 bytes), then the decoded value of the parameter if useful (var name, constant value, etc).
The code object has a bunch of tuples the opcodes index into, like the constants array, the names array for global names looked up, etc.
Lines with a >> at the start are detected as the destination of a jump instruction.
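A small example to poke at, using a throwaway function (`add_one` is just a name for this sketch). `dis.dis` prints the columns described above, `dis.get_instructions` gives the same information as objects, and the code object's tuples are what the opcode arguments index into. Exact opcode names differ between CPython versions, so no output is shown.

```python
import dis


def add_one(a):
    b = a + 1
    return b


# Human-readable listing: line number, offset, opcode name, argument, decoded value.
dis.dis(add_one)

# The same data as Instruction objects.
instructions = list(dis.get_instructions(add_one))
for ins in instructions:
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)

# Tuples on the code object that the opcode arguments refer to.
code = add_one.__code__
print("constants:", code.co_consts)
print("local names:", code.co_varnames)
print("global/attribute names:", code.co_names)
```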
I don't think I'm ever going to understand this parser O.o
A bit more on EXTENDED_ARG: because internally the opcode arguments are stored in a signed 4 byte integer (-2,147,483,648 - 2,147,483,647), it is possible to set the opcode argument to a negative value using repeated EXTENDED_ARG opcodes in manually crafted bytecode
afaik, negative opcode arguments do not occur in generated bytecode
So dis doesn’t output the code in the same order it receives it?
It does output it in order, but it just displays the offsets and line counts so you can keep track.
I don't know how to add graphics yet. Anyway, I'm trying to make a game that is almost completely reliant on achievements: achievements are how you beat the game. Like I said, I don't know graphics yet, so it is going to be a text-based game. Any ideas for the game and a name for it?
This would be a question for #game-development, though all of us would probably say that if you aren't willing to learn even basic graphics, you aren't invested enough to pull off what you're wanting to do. Game development is hard. It takes a lot of time and a lot of work. If you aren't prepared to do that work, it means your heart isn't in it and you should find something that you do like to do
Not to be mean or anything, but it's the truth
I'd expect the description to be accessed by msg.embeds.description
If you need node.js help, you should ask in off-topic
Off-topic channels
There are three off-topic channels:
• #ot0-psvm’s-eternal-disapproval
• #ot1-perplexing-regexing
• #ot2-never-nester’s-nightmare
Their names change randomly every 24 hours, but you can always find them under the OFF-TOPIC/GENERAL category in the channel list.
Please read our off-topic etiquette before participating in conversations.
I'm starting to do the reading on PEGs and Python's new parser. I wanted to check my understanding as it stands, and ask a couple of questions of you guys
So, a parsing expression grammar is a set of rules written in a creole not unlike regular expressions, and is used to define the various patterns that constitute valid syntax. Unlike regular expressions, however, no 'standard' universal-across-languages procedure exists to apply the expression against text. Additionally, a parsing expression assumes whatever system is applying the rules is capable of recursion and other mechanics not available in regular expressions
To apply the expression, the expression is fed to a 'parser generator' (possibly alongside a 'metagrammar': an additional set of rules/specifications which tell the parser generator how to interpret the expression) which generates actual code capable of applying the patterns to text (or a stream of tokens)
Unlike Python's original pgen parser, a 'left-recursive pushdown parser with 1-token lookahead' (I have very little understanding of what that means), a PEG-enabled parser is capable of both infinite lookahead and infinite lookbehind (left-recursion???). Additionally, it is a 'recursive-descent' parser: one which checks a given alternative all the way to completion or failure, consuming input in the event of success and moving on to the next alternative in the event of failure without consuming input
The addition of an 'action' notation with a PEG enables an abstract syntax tree, as opposed to a concrete syntax tree, to be built directly within the parser. Use of 'memoization' (caching) and a few other tricks to save memory keep the parser running in linear time
Did I get anything wrong? Have I displayed any poor or partial comprehension of anything important?
*Addendum: left recursion and infinite look ahead/behind allows for significantly more readable and sensical patterns, with fewer 'hacks' and reliance on post-processing
*Addendum: actions specified within the grammar act not unlike callbacks, and are used to actually generate the nodes constituting the ast (???)
I believe a PEG is a grammar for expressing certain kinds of languages, and there are time-efficient parsing algorithms for parsing languages expressed as PEGs
Afaik the linear time algorithm you described is called "packrat parsing"
I guess because it memoizes a lot of stuff
That's my understanding also
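For anyone following along, the ordered-choice-plus-memoization idea can be sketched as a toy packrat parser. This is not how pegen is implemented; the two-rule grammar, make_parser, and the tuple-shaped tree are all assumptions made for illustration.

```python
from functools import lru_cache

def make_parser(tokens):
    """Build a memoized parser for the toy grammar:
        expr <- term '+' expr / term
        term <- NUMBER
    Each rule takes a position and returns (tree, next_position) or None.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def term(pos):
        if pos < len(tokens) and tokens[pos].isdigit():
            return int(tokens[pos]), pos + 1
        return None

    @lru_cache(maxsize=None)
    def expr(pos):
        # Alternative 1: term '+' expr
        first = term(pos)
        if first is not None:
            left, after_left = first
            if after_left < len(tokens) and tokens[after_left] == "+":
                rest = expr(after_left + 1)
                if rest is not None:
                    right, end = rest
                    return ("+", left, right), end
        # Alternative 1 failed, so retry from the same position with alternative 2.
        # term(pos) is answered from the cache instead of being parsed again.
        return term(pos)

    return expr

expr = make_parser(["1", "+", "2", "+", "3"])
print(expr(0))
```

Because every (rule, position) pair is computed at most once, the total work stays proportional to the input length times the number of rules, which is where the linear-time claim comes from.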
I guess the most important question—the possibly months-of-work-saving question is
Assuming no gigantic changes to the nature of the system, could I apply python's native parser generator to a modified version of Python's PEG and have the resulting parser work?
What do you mean by this?
Well, I'm building a language whose syntax is based off Python's. A few minor tweaks but the real differences come from implementation, not syntax
it is possible. Currently CPython's parser generator
outputs in 2 different languages, c and python
Could I take Python's PEG, modify it as needed, and feed the modified expression to the same parser generator python uses to create a working parser
🤤
and the C parser is pretty specific to CPython since it uses a lot of internal functions
but if you were to use the Python generator and port the grammar, then it would work
In fact, there is already a work in progress PR to do so
Being able to proceed rewriting only the grammar, and not the generator, would be a godsend
and i believe author of that PR is also working on a language that is based on python (a python superset tbmp)
Or use another parser generator, there's quite a number around.
Pegen is pretty cool tbh, especially with all the actions and its custom expansion forms like ','.something+ etc
So it sounds to me as though my way forward is to treat the parser generator itself as a black box for now and instead focus on having a thorough understanding of the language used to write the expression.
Write the expression, feed it to the generator, and I guess see what happens?
if you are planning to make small changes on the grammar, you can even use the parser as is.
As it stands now, the only difference is a few additional operators as well as multiline lambdas (through arrow notation)
The latter might be a bit tricky. Blocks within blocks
for cases like this, i go with tokens. For example if you'd like to add something like $name, then you can simply alter the tokenizer to recognize ($) and then manually edit the token stream to replace $<something> with the form of __name_<something> and after the parser creates the AST, go over all the identifiers and replace the custom forms with their own nodes
for the case of arrow functions, you could simply replace them as normal functions with a weird name, e.g __anon_uuid(<sig>): and handle it as my previous example
You're not wrong with regards to your approach, but it would be a missed learning opportunity for me to go that route
What I'd much rather do is focus on learning how the grammar works first and feed it through a working generator, and once I know the grammar is sound, backtrack and build my own generator
Having both as question marks would make things way too ambiguous
if your main purpose is learning, then I guess the proper way would be not caring too much about thoroughness (like how much of esoteric stuff that you could parse) but rather find a version of old python grammar (perhaps something 3.8<) and try to write a parser for it (or even parser generator, if you don't like hand written stuff)
once you get the theory, you could even fork the old 'pgen' to add backtracking. 🙂
Ahh, in my reading I've seen that the core devs feel its time to put pgen out to pasture
Too old
well pgen is gone (there is still a fork of it living under lib2to3) but it is deprecated
and will be gone soon
though I'd say an LL(1) parser is much more fundamental and simple than the other variants out there
I'm glad to hear I'm not completely misunderstanding the problem. I think I might be on the right track
I'm glad I took the time to write this all out. I was going to take a shower. But the process of building the parser seems much less like magic now that I've had a chance to put it all out in words
😄
it is indeed really fun to work on. If you are interested in going even deeper, I'd really recommend 'Parsing Techniques: A Practical Guide' for other different methodologies
Oh thank you!
It seems that if you're working on a bunch of folders of interrelated Python files that aren't part of a library (you haven't installed it with pip install -e), the best way to avoid import errors is to use python -m from the root folder of the project. Am I right in thinking this?
it depends on the particular structure, but I think yes, python -m is more likely to work than anything else
Yes, since then sys.path keeps your working directory.
so i'm not exactly clear how python bytecode is turned into instructions for the computer, but basically i was wondering whether, if python has, for example, two binary add opcodes in a row, it uses SIMD to execute them, since virtually every CPU supports that nowadays
i've been playing with simd in cython, and it's really cool, but i can't figure out how to check if python uses them
and if it doesn't, why not? guido had said that speeding up python is now a major goal
@silk pawn bytecode isn't turned into instructions, it is interpreted by a giant C switch statement.
@silk pawn and the add opcode has to figure out what "add" means for the object at the top of the stack.
let's say it determines that the object is a primitive int
Python/ceval.c line 1813
switch (opcode) {
i'm not sure what you mean by primitive. All values are objects, including ints
@silk pawn binary add: https://github.com/python/cpython/blob/main/Python/ceval.c#L2033-L2057
sorry wrong terminology, i meant like if it determines that it's like a basic add for a C int
god i can't phrase this
A python int is an object with a type and a refcount.
so if you do
x = 1
y = 2
z = 3
a = x + y
b = y + z
will python just call pynumber_add twice or is there a special thing to add two sets of pyobjects that can be determined to represent an integer
it will have two BINARY_ADD bytecodes, and will call PyNumber_Add twice
because in c, i believe you can do (some syntax omitted)
int x = 1
int y = 2
int z = 3
int a = x + y
int b = y + z
and if you use some simd stuff then it does the adding in one instruction
this is CPython we're talking about. Other implementations like PyPy could be smarter
Python is very different than C
yes i understand that much, but i'd think python could try to emulate this behavior to make it faster
what is the barrier to python doing this
i think the reason it doesnt is because python has very few guarantees about the type of an object
There's more going on in those last two statements than simply adding numbers. There's object allocation. And the ints could be multi-precision in the first place.
the statement a = x + y results in LOAD_NAME 'x' LOAD_NAME 'y' BINARY_ADD STORE_NAME 'a'
BINARY_ADD is generic, so it will work for any object that defines __add__
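A small sketch of what "generic" means here: the same compiled function handles ints, strings, and a user-defined class, because the add opcode only dispatches on the runtime types. Vec is a made-up class for illustration, and newer CPython versions name the opcode BINARY_OP rather than BINARY_ADD.

```python
import dis

class Vec:
    """Toy type with its own __add__ (hypothetical, for illustration)."""
    def __init__(self, x):
        self.x = x

    def __add__(self, other):
        return Vec(self.x + other.x)

def add(x, y):
    return x + y

# The bytecode is compiled once and knows nothing about the operand types.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# The same bytecode serves all of these calls.
print(add(1, 2), add("a", "b"), add(Vec(1), Vec(2)).x)
```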
ok yeah i forgot about object allocation, but could python detect if the int is multi precision and then fall back to the current way
the problem always boils down to the fact that Python doesn't know what type stuff is at compile time, I guess
@silk pawn did you see the comment in the BINARY_ADD switch case?
yes i'm just using this as an example
i saw the comment from victor about not micro optimizing
It's impossible for Python to know, unless everything's a constant.
And they're all local variables.
yeah
if you need to add fast…you have numpy
and if you need to add very fast…numpy + numba
all ints are multi-precision.
Interesting issue about subtyping and type narrowing
https://github.com/microsoft/pyright/issues/1899
from typing import Union, TypedDict
class Foo(TypedDict):
    x: int
class Foo2(Foo):
    y: int
class Bar(TypedDict):
    y: str
def f(foobar: Union[Foo, Bar]):
    if 'y' in foobar:
        print(foobar['y'].lower())
x: Foo2 = {'x': 1, 'y': 2}
f(x) # fails at runtime:
Python doesn't have a special type of int that wraps a native integer.
every int in CPython is represented as an array of base 2**30 digits.
I think int in cpython is arbitrary number of digits
yes, arbitrary number of 30-bit digits
I guess that is a way to look at it
it's the way the implementation thinks of it.
I would have still thought in terms of an array of binary digits that expands in multiples of 30
I wonder why 30
I guess it uses the remaining 2 bits for something, not immediately obvious what
not sure, it might be so that overflows of digit/digit ops don't become inconvenient.
That makes sense
It can just do 32 bit operations, and if the most significant bit gets set, set it back to zero and it knows to set the least significant bit in the next one
the code has a constant, PYLONG_BITS_IN_DIGIT, which is either 15 or 30
I wonder why go this route when a huge fraction of machines running python are 64 bit
perhaps because 2**30 * 2**30 will fit into a 64-bit int
From a comment in the code: Type 'digit' should be able to hold 2*PyLong_BASE-1, and type 'twodigits' should be an unsigned integer type able to hold all integers up to PyLong_BASE*PyLong_BASE-1.
Interesting stuff
and twodigits is uint64_t
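The digit size can be checked from Python itself through sys.int_info. The sketch below also models the base-2**30 layout in pure Python; to_digits is a hypothetical helper that only mimics the idea and does not read CPython's actual memory.

```python
import sys

BITS = sys.int_info.bits_per_digit  # reported by the running interpreter: 15 or 30

def to_digits(n, bits=BITS):
    """Split a non-negative int into little-endian digits of `bits` bits each."""
    mask = (1 << bits) - 1
    digits = []
    while n:
        digits.append(n & mask)
        n >>= bits
    return digits or [0]  # modelling choice: show zero as a single 0 digit

print(BITS)
print(to_digits(2**30, 30))
print(to_digits(2**64, 30))

# The product of two 30-bit digits always fits in an unsigned 64-bit integer,
# which matches the 'twodigits' requirement quoted above.
largest_digit = (1 << 30) - 1
print(largest_digit * largest_digit < 2**64)
```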
I wonder how it compares with arbitrary width integer implementations in C++
I suppose you could actually compare them directly, using the C code and not going via python
Hi there I'm new to the community and I need help with python for an assignment
#python-discussion might be a better place
got pinged....
there was a raid. it's over.
Another one, eh?
what is a "raid"
not a big discord user
some kind of attempt to do the discord equivalent of DDOS
yep. spam messages to disrupt conversation.
oof that's so sad
Thoughts on building a decorator to dynamically hint a method? Would I rebuild the method by a call to types.FunctionType and provide the new annotations as an argument?
why would you dynamically hint a method? 👀
I'm actually just getting to the point of asking myself that very question
🙂
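On the mechanics of the question: rebuilding the function with types.FunctionType should not be needed just to attach hints, since a function's __annotations__ is an ordinary mutable dict. A minimal sketch, where annotate and generate_tokens are hypothetical names:

```python
def annotate(**hints):
    """Hypothetical decorator: merge `hints` into the function's __annotations__."""
    def decorator(func):
        func.__annotations__.update(hints)
        return func
    return decorator

@annotate(whitespace=str, comment=str)
def generate_tokens(whitespace=None, comment=None):
    return whitespace or comment

# Visible to runtime introspection...
print(generate_tokens.__annotations__)
```

The hints exist only after the decorator has run, so a static checker that reads the source without executing it will not see them.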
Just, I find this rather unpleasant
def generateTokens(self, whitespace:str=None, comment:str=None, number:str=None,
string:str=None, keyword:str=None, operator:str=None, identifier:str=None,
**matchgroups:ExpressionToplevel.Matchgroups):
Well, the whole point of type hints is that they provide documentation, and that tools like mypy and pyright understand them. If you generate the typehints dynamically, you lose all of the benefits
What does that function do?
I'd much prefer (for my own language)
@annotate(matchgroups=ExpressionToplevel.Matchgroups)
@annotate(whitespace=str, comment=str, number=str, string=str)
@annotate(keyword=str, operator=str, identifier=str )
def generateTokens(self,whitespace, comment, number, string, keyword, operator, identifer):
Sorry that took so long to type. But it's only just dawning on me in a formal sense that the language doesn't care about hints; introspection tools do, and they do so by searching the source literal
that would look a lot nicer just by splitting it over lines:
def generateTokens(
    self,
    whitespace: str = None,
    comment: str = None,
    number: str = None,
    string: str = None,
    keyword: str = None,
    operator: str = None,
    identifier: str = None,
    **matchgroups: ExpressionToplevel.Matchgroups
):
Signatures have always been a bit of a sore spot for me. In a perfect world, I'd prefer
@annotate(matchgroups=ExpressionToplevel.Matchgroups)
@annotate(whitespace=str, comment=str, number=str, string=str)
@annotate(keyword=str, operator=str, identifier=str )
@default(whitespace=None, comment=None, number=None, key=Non)
@default(operator=None, identifier=None, matchg=None )
def generateTokens(self,whitespace, comment, number, string, keyword, operator, identifer):
That does indeed look better, but, my code is generally rather long- long lines I mean
what does this function do? why does it have so many parameters?
And hinting like that throws off the feng-shui for me (all of this is completely topical, and unimportant by the way)
Well as the name suggests, it generates tokens. Each one of those arguments is a capture group as returned by match.groups()
So it's either a string, assuming the group matched, or None
Jeeze, my spelling is bad today
Is it possible that more than one of the arguments is not None?
Generally most of them are
Sorry, no, I misread
Generally speaking, only one will be not-none
So you have a regex like (?P<foo>...)|(?P<bar>...)|..., right?
More or less
So if you don't have a bug, it's impossible to have more than one matched group?
In this specific case, yes. 'generateTokens' only takes capture groups corresponding to the main lexical categories- only one will ever match. Other similar methods can have multiple matches though, for example
def generateBaseX(self, number:str=None, integer:str=None, floatpoint:str=None,
                  exponent:str=None, complex:str=None,
                  **matchgroups:ExpressionToplevel.Matchgroups):
    """Generate a number token of integer, floatpoint, exponent, or complex.
    NOTE: Complex takes precedence over exponent, which takes precedence over other formats.
    NOTE: Exponents are floating point numbers by definition."""
    if complex:
        return self.token('NUMBER', 'COMPLEX', self.source.advance, number);
    if exponent:
        return self.token('NUMBER', 'EXPONENT', self.source.advance, number);
    if floatpoint:
        return self.token('NUMBER', 'FLOATPOINT', self.source.advance, number);
    if integer:
        return self.token('NUMBER', 'INTEGER', self.source.advance, number);
Fix you beautiful son of a bitch
👀
+2
Or just accept the match as the argument.
???
the re.Match object
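A sketch of what taking the match object could look like: with alternatives as named groups, match.lastgroup says which one matched, so the chain of keyword arguments is not needed. The NUMBER pattern and classify are made up for illustration and are much simpler than a real number lexer.

```python
import re

# Hypothetical pattern: alternatives ordered so complex wins over float, float over int.
NUMBER = re.compile(
    r"(?P<complex>\d+(?:\.\d+)?j)"
    r"|(?P<floatpoint>\d+\.\d+)"
    r"|(?P<integer>\d+)"
)

def classify(match):
    """Return (subtype, text) from a re.Match produced by NUMBER."""
    # lastgroup is the name of the last capturing group that took part in the match.
    return match.lastgroup.upper(), match.group()

for text in ("42", "3.14", "2j"):
    print(classify(NUMBER.match(text)))
```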
I could, though I generally prefer having a little more control than that. I might want to switch up how the object gets printed or else add some other functionality
Is python's parser process a single step? I've seen it separated in some contexts into 'syntax analysis' and 'semantic analysis', but those might be so closely interwoven that they could be executed in a single step
I remember a quick tokenizer made during a beazley talk:
!e
import re
from collections import namedtuple
tokens = [
    r'(?P<NUMBER>\d+)',
    r'(?P<PLUS>\+)',
    r'(?P<MINUS>-)',
    r'(?P<TIMES>\*)',
    r'(?P<DIVIDE>/)',
    r'(?P<WS>\s+)',
]
PARSER = re.compile('|'.join(tokens))
Token = namedtuple('Token', 'type value')
def tokenize(text):
    scan = PARSER.scanner(text)
    for match in iter(scan.match, None):
        if match.lastgroup != 'WS':
            yield Token(match.lastgroup, match.group())
print(*tokenize('2 + 3*4 - 5'))
@deft pagoda :white_check_mark: Your eval job has completed with return code 0.
Token(type='NUMBER', value='2') Token(type='PLUS', value='+') Token(type='NUMBER', value='3') Token(type='TIMES', value='*') Token(type='NUMBER', value='4') Token(type='MINUS', value='-') Token(type='NUMBER', value='5')
Can I get a hand my dudes? I've asked in the general and also in a channel, no bytes (get it?)
self.source = source if isinstance(source, str) else source.read().decode();
self.sourcelines = io.StringIO(self.source).readlines();
I'm taking in input as either a string or a file-like io object. I want to get both the source and its constituent lines in unicode form
This approach is better than my original, but still seems off
uh is there a reason why you cant just do self.sourcelines = self.source.split("\n")
I believe that different operating systems use different newline separators, no? Assuming that's true, I need to split the input using the same procedure a file-like object would
when reading from file, Python turns newlines to \n
Oh, well that certainly helps
str.splitlines also exists
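To show the difference between the two suggestions, here is a small sketch. read_source is a hypothetical helper that mirrors the str-or-file-like input described above; it assumes a binary file-like object whose bytes decode with the default UTF-8.

```python
import io

def read_source(source):
    """Return (text, lines) for either a str or a binary file-like object."""
    text = source if isinstance(source, str) else source.read().decode()
    return text, text.splitlines()

mixed = "a\r\nb\nc\r"
print(mixed.split("\n"))   # splitting on "\n" alone leaves the carriage returns behind
print(mixed.splitlines())  # splitlines handles \n, \r\n and \r

text, lines = read_source(io.BytesIO(b"x = 1\r\ny = 2\n"))
print(lines)
```

Decoding raw bytes does not translate newlines the way text-mode open() does, which is why splitlines is the safer choice in this sketch.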
we shall not speak of the ;
Now, should I actually care about decoding the input? For lexing purposes, does it matter?
why are you even splitting the source into lines?
hm, I guess it can be useful to store the line number for each token
In case of a syntax error I need to supply the full line of text on which the error resides. Keeping an array of the lines and referencing them by the lexer's line index seems to me the most logical route
About the decoding?
Uh who pinged?
I mean, yes, I'll want the string in unicode form so it can be matched against the regex, no?
who pinged me?
we had a raid.
nop, the parser is pretty unaware of any semantics applied. For example return 42 is a legal statement on its own, but when the semantics are applied in the later stages (compiler / symbol table) it becomes a SyntaxError.
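That split can be observed from Python: ast.parse only runs the parser, while compile() runs the later stages too. A minimal sketch:

```python
import ast

source = "return 42"

# Parsing alone succeeds: the grammar has no notion of "inside a function".
tree = ast.parse(source)
print(type(tree.body[0]).__name__)

# Compiling the same tree runs the later stages, which reject it.
try:
    compile(tree, "<example>", "exec")
except SyntaxError as exc:
    error_message = exc.msg
else:
    error_message = None
print(error_message)
```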
Is this normally achieved through if statements, or is there some sort of grammar applied?
