#internals-and-peps


raven ridge
#

think of things like static analyzers for C that try to detect buffer overflows - they're in a statically typed language, performing difficult sorts of static analysis, in ways that are essentially orthogonal to the type system

#

You're drawing a correlation between static analysis and static typing that I don't see - beyond that obviously a statically typed language makes static analysis of types trivial

acoustic crater
#

I think you can statically figure out every expression in python as long as the inputs aren't doing certain things but it'd be very complex

halcyon trail
#

because they are bolted on after the fact, and they are heuristic

spark magnet
#

static languages don't have metaclasses. That doesn't imply that metaclasses are the hardest thing to statically analyze.

raven ridge
halcyon trail
#

in a mathematical sense, no

raven ridge
#

since a FIFO is read-once, you can't even peek in the file to see what that would do.

halcyon trail
#

the question was asking about static analysis in the context of optimization; to optimize based on static analysis you have to be 100% sure it's correct. If you have 100% certainty of something based on static analysis, then effectively at that point, it could be made part of the type system, since that's what types are.

spark magnet
#

@halcyon trail the 100% is why i mentioned getattr. I think you said getattr was fine as long as the data met a certain criterion. You can't assume that criterion.

halcyon trail
#

If you have a static analyzer for C, sure, it's bolted on, so you still have a separate type system of C itself, and the conclusions of the static analyzing bounds checker, or what not. Effectively, if the latter had 100% certainty, it would be like having a second type system.

#

@spark magnet I mean I'm talking about python features, as they are typically used in day to day usage.

gleaming rover
#

If you have 100% certainty of something based on static analysis, then effectively at that point, it could be made part of the type system, since that's what types are.

#

feel like this is not valid

#

hm but I'm not sure I need to think about it

spark magnet
halcyon trail
#

A lot of usages of getattr could in practice be done, no problem, with a compile time reflection system

#

for me the most common usage of getattr is reflecting over things usually

#

all of the members of a dataclass, for example

spark magnet
#

that's one use, sure.

#

you said i wasn't allowed to choose a hard case. I think you aren't allowed to choose an easy case 🙂

raven ridge
#

@gleaming rover other things that are tough to statically analyze: import hooks, .pth files, # coding: comments

halcyon trail
#

Well, there's a distribution of hard and easy, is the point

#

with metaclasses it's all hard

#

with decorators, very often hard, not always

gleaming rover
#

but really cool

halcyon trail
#

a lot of decorators are ok because they only change implementation details of say a callable

#

in that case the situation vis-a-vis types is simple

raven ridge
halcyon trail
#

but as soon as the decorator starts changing the API, it's a mess

#

.... I mean in real world use cases of metaclasses

gleaming rover
#

hm

raven ridge
#

so you're picking hard cases.

halcyon trail
#

No, I'm talking about what's typical in real world code

gleaming rover
#

does it matter though

#

as long as there's a possibility of a hard case

#

you need to be prepared to deal with that

raven ridge
#

Right - which means that it isn't metaclasses themselves that make static analysis difficult, it's things that are done by the metaclass.

gleaming rover
#

anyway I feel like

halcyon trail
#

well, I dunno, it depends I guess what angle you are looking at it from

gleaming rover
#

that's what I was thinking

raven ridge
#

What are things that metaclasses can do that make static analysis harder?

gleaming rover
#

metaclasses are only hard because

#

you can do a LOT of things in Python

halcyon trail
#

when you asked "most dynamic", I thought about the features that on average do the most dynamic things

gleaming rover
#

but if you could do only what you would normally be able to do in a language with a stronger type system and less reflection

#

I don't think metaclasses would really be problematic

halcyon trail
#

metaclasses, again, involve executing completely arbitrary code, just to understand what the type looks like

raven ridge
#

did I mention that you can change the class of an instance in Python?

halcyon trail
#

so they would be quite problematic

gleaming rover
#

but I'm saying

halcyon trail
#

I guess it really all depends on how you want to compare. getattr means you'll need to know the value of a string at compile time. In theory, that can also involve executing arbitrary code.

#

So metaclasses and getattr are equally dynamic, in the black and white sense
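
A minimal sketch of the getattr point, with a hypothetical `Config` class and helper names made up for illustration: when the attribute name is a literal, an analyzer can treat the call like plain attribute access; when the name is only computed at runtime, it cannot know which attribute is read.

```python
class Config:
    # Hypothetical example class, not from the discussion.
    host = "localhost"
    port = 8080


def read_static(cfg):
    # The name is a literal, so this is equivalent to cfg.port
    # and could in principle be resolved ahead of time.
    return getattr(cfg, "port")


def read_dynamic(cfg, name):
    # The name is only known when the call happens, so nothing
    # can be concluded about the result without knowing `name`.
    return getattr(cfg, name)


cfg = Config()
static_value = read_static(cfg)
dynamic_value = read_dynamic(cfg, "ho" + "st")
```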

raven ridge
#

!e ```py
class A:
    pass

class B:
    pass

a = A()
print(isinstance(a, A), isinstance(a, B))
a.__class__ = B
print(isinstance(a, A), isinstance(a, B))
```

fallen slateBOT
#

@raven ridge :white_check_mark: Your eval job has completed with return code 0.

001 | True False
002 | False True
halcyon trail
#

In practice computing strings at compile time can often be done to a great extent, if they don't literally depend on information given at runtime

#

with the metaclass though you are still running arbitrary code. at least, that is how I see it.

raven ridge
halcyon trail
#

they're not less dynamic, because like I said, you need to execute the code in the metaclass in order to understand the type, and the type is something you want to understand statically

raven ridge
#

why do you need to execute code in the metaclass in order to understand the type, instead of statically analyzing it?

#

The answer needs to be that the metaclass does something that's resistant to static analysis, and that's the thing that we're looking for - right?

halcyon trail
#

If you want to do that then you can also just bury the answer all the way at the bottom

#

the most dynamic feature of python is that everything is a dictionary, and you can modify those dictionaries freely

#

it's not really getattr, it's what getattr does, etc

#

keep pushing it down

raven ridge
#

yes, I agree with that.

#

except getattr is the lowest level for __slots__ classes, which don't have a __dict__
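
A small sketch of the `__slots__` case, using a made-up `Point` class: instances have no `__dict__` to manipulate, so attribute access through getattr/setattr is the only route, and names outside the declared slots are rejected.

```python
class Point:
    # Hypothetical example class; __slots__ replaces the per-instance dict.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
has_dict = hasattr(p, "__dict__")

# Declared slots can still be read and written dynamically.
setattr(p, "x", 10)
x_value = getattr(p, "x")

# Undeclared names have nowhere to live.
try:
    p.z = 3
    extra_attribute = "allowed"
except AttributeError:
    extra_attribute = "rejected"
```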

#

but I definitely agree that direct manipulation of __dict__ also makes static analysis very tough.

grave jolt
#

isn't it also true that things which are difficult for static analysis are also kinda difficult for humans to understand/reason about?

#

(in general)

#

like, I remember programming in C for a little bit

gleaming rover
grave jolt
#

don't say the M-word plz lemon_pensive

raven ridge
#

hm. There's a correlation, definitely, but I don't think that's a rule.

try:
    from unittest import mock
except ImportError:
    import mock

my_mock = mock.MagicMock()

is reasonably dynamic, but not difficult for humans to reason about.

grave jolt
#

maybe

#

well, that's because there is an external explanation to a human

raven ridge
#

there's two different modules named mock with basically the same interface, which can be used interchangeably.

grave jolt
#

or I'm not sure what you're talking about

#

ah

raven ridge
#

mock is a backport of unittest.mock to older interpreters - and it's still actively maintained, so you can actually import it in newer interpreters to get newer versions of mock than the stdlib shipped with.

#

it's more like unittest.mock is a periodic fork of mock, heh

grave jolt
#

yeah, not sure how static analysis tools will cope with that

#

I think I'm damaged by static typing.

#

I'm somehow unable to enjoy dynamic typing

raven ridge
#

I just built a library for work that, based on a config file or an environment variable, does essentially

if use_new_stuff:
    from .new_submodule import Thing1, Thing2
else:
    from .old_submodule import Thing1, Thing2
__all__ = ["Thing1", "Thing2"]
#

mypy is not happy.

#

whereas people have no trouble with it, because I've guaranteed that those things are interface compatible, even if they're not strictly speaking the same type.

acoustic crater
#

at this point

grave jolt
fallen slateBOT
#

class enum.Enum
Base class for creating enumerated constants. See section [Functional API](https://docs.python.org/3/library/enum.html#functional-api) for an alternate construction syntax.
grave jolt
#

or, for example, pydantic.BaseModel

#

or the stuff in SQLAlchemy

acoustic crater
#

ah nice examples

#

so, as I should have guessed, metaprogramming

raven ridge
#

yeah - metaclasses are always for metaprogramming, really

grave jolt
#

Yeah, you wouldn't write your own metaclass if you're making a web API or something. Unless you really care about job security.

rich cradle
#

@fallen slate uses a metaclass for getting info from config.yml iirc

raven ridge
#

the only reason to use a metaclass is that you want to do something that can't be done easily with just regular types.

acoustic crater
#

the closest I've gotten to a real use case for metaclasses is using __init_subclass__

raven ridge
#

and there's fewer of those cases now than ever, thanks to things like __init.... yeah.
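
For illustration, a sketch of the kind of job `__init_subclass__` took over from metaclasses: registering subclasses as they are defined. The `Plugin` base class, the `key` keyword, and the exporter classes are all hypothetical names.

```python
class Plugin:
    # Hypothetical registry base class; no metaclass involved.
    registry = {}

    def __init_subclass__(cls, /, key=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once for every subclass, right after it is created.
        Plugin.registry[key or cls.__name__.lower()] = cls


class CsvExporter(Plugin, key="csv"):
    pass


class JsonExporter(Plugin):
    pass
```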

grave jolt
#

and __class_getitem__

#

well, metaclasses aren't the only solution to those things, other languages have come up with their own ways of creating schemas or enums

acoustic crater
#

my favorite use case of metaclasses is when ppl say everything is an object in JAVA and C# and I can say "no, classes themselves aren't instantiated objects in those languages"

grave jolt
#

well, in Java and C# not everything is an object anyway

#

they have primitives

acoustic crater
#

ah I thought in C# types pretended to be classes

#

not very familiar with it though

valid rose
#

anyone know what gc.get_referents does?

grave jolt
#

!d gc.get_referents

fallen slateBOT
#

gc.get_referents(*objs)
Return a list of objects directly referred to by any of the arguments. The referents returned are those objects visited by the arguments’ C-level [`tp_traverse`](https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse "PyTypeObject.tp_traverse") methods (if any), and may not be all objects actually directly reachable. [`tp_traverse`](https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse "PyTypeObject.tp_traverse") methods are supported only by objects that support garbage collection, and are only required to visit objects that may be involved in a cycle. So, for example, if an integer is directly reachable from an argument, that integer object may or may not appear in the result list.

Raises an [auditing event](https://docs.python.org/3/library/sys.html#auditing) `gc.get_referents` with argument `objs`.
grave jolt
#

basically, gets children of an object

valid rose
#

i still don't understand referred to

acoustic crater
#

but yeah metaclasses let me badger ppl who claim everything is an object in their language which is pretty good on its own

grave jolt
grave jolt
valid rose
#

!e ```py
import gc
gc.get_referents(int.__dict__)[0]['uwu'] = lambda s: print('uwu')

(5).uwu()
```

fallen slateBOT
#

@valid rose :white_check_mark: Your eval job has completed with return code 0.

uwu
valid rose
#

what is happening here

#

my mind is bending

acoustic crater
grave jolt
#

!e

import gc
print(gc.get_referents(int.__dict__))
fallen slateBOT
#

@grave jolt :white_check_mark: Your eval job has completed with return code 0.

[{'__repr__': <slot wrapper '__repr__' of 'int' objects>, '__hash__': <slot wrapper '__hash__' of 'int' objects>, '__getattribute__': <slot wrapper '__getattribute__' of 'int' objects>, '__lt__': <slot wrapper '__lt__' of 'int' objects>, '__le__': <slot wrapper '__le__' of 'int' objects>, '__eq__': <slot wrapper '__eq__' of 'int' objects>, '__ne__': <slot wrapper '__ne__' of 'int' objects>, '__gt__': <slot wrapper '__gt__' of 'int' objects>, '__ge__': <slot wrapper '__ge__' of 'int' objects>, '__add__': <slot wrapper '__add__' of 'int' objects>, '__radd__': <slot wrapper '__radd__' of 'int' objects>, '__sub__': <slot wrapper '__sub__' of 'int' objects>, '__rsub__': <slot wrapper '__rsub__' of 'int' objects>, '__mul__': <slot wrapper '__mul__' of 'int' objects>, '__rmul__': <slot wrapper '__rmul__' of 'int' objects>, '__mod__': <slot wrapper '__mod__' of 'int' objects>, '__rmod__': <slot wrapper '__rmod__' of 'int' objects>, '__divmod__': <slot wrapper '__divmod__' of 'int' objects>, '_
... (truncated - too long)

Full output: https://paste.pythondiscord.com/qotozigiyo.txt?noredirect

valid rose
#

hmm, so what is a slot wrapper

grave jolt
#

ah, i c

acoustic crater
#

a slot wrapper wraps a function defined in C

valid rose
#

so by modifying this dict, we can add new methods heh?

acoustic crater
#

not exactly

valid rose
grave jolt
acoustic crater
#

yeah

grave jolt
#

gc.get_referents(int.__dict__) has that dict as the first element

valid rose
grave jolt
#

yes, that dict is mutable
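
A short sketch of why the trick goes through `gc` at all, assuming CPython: `int.__dict__` itself is a read-only `mappingproxy`, and both writing through it and assigning an attribute on `int` are refused. Only the underlying dict that the proxy refers to is mutable.

```python
import types

proxy = int.__dict__
is_proxy = isinstance(proxy, types.MappingProxyType)

# Writing through the proxy is not supported.
try:
    proxy["uwu"] = lambda self: "uwu"
    proxy_write = "allowed"
except TypeError:
    proxy_write = "rejected"

# Normal attribute assignment on a built-in type is refused too.
try:
    int.uwu = lambda self: "uwu"
    attribute_write = "allowed"
except TypeError:
    attribute_write = "rejected"
```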

valid rose
#

!e ```py
import gc
del gc.get_referents(int.__dict__)[0]['__repr__']

print(repr(5))
```

acoustic crater
#

I am surprised that works actually without ctypes.pythonapi.PyType_Modified

#

that is pretty interesting

fallen slateBOT
#

@valid rose :white_check_mark: Your eval job has completed with return code 0.

5
valid rose
#

why doesnt it delete

acoustic crater
#

you can't mess with dunders by editing the dict alone

valid rose
#

interesting

acoustic crater
#

because they are defined in slots of the struct in C

valid rose
#

so, if i define __slots__ in my own classes? can people mess with my dunders

acoustic crater
#

no that's different

valid rose
#

only for c dunders eh?

acoustic crater
#

ya

#

fishhook module does it

valid rose
#

can ctypes access libc?

pliant tusk
#

yes

valid rose
#

wait a sec, can i then use malloc and free in python?

pliant tusk
#

technically, yes

valid rose
#

hmm...

pliant tusk
#
```py
>>> from ctypes.util import find_library
>>> from ctypes import CDLL
>>> libc = CDLL(find_library('c'))
>>> ret = libc.printf(b'%i\n', 1)
1
```
raven ridge
#

can, yes! Should, no.

#

ctypes is like writing an extension module, but worse and harder to maintain.

pliant tusk
#

ctypes should really only be used to integrate with code that doesnt have a python interface

raven ridge
#

everything ctypes can do, Cython can do better.

pliant tusk
#

ctypes doesnt require compiling tho

raven ridge
#

yes, and in exchange it gives you no compile time type safety. And it's not necessarily portable across machines.

pliant tusk
#

i still prefer Cython, but for quick prototyping with a c lib i tend to use ctypes

raven ridge
#

CFFI has a mode where it doesn't require pre-compilation, too

#

I'd use that before ctypes, too

pliant tusk
#

ctypes is stdlib

raven ridge
#

yes, but so are all sorts of other terrible modules you should never use in real world code

pliant tusk
#

fair enough

raven ridge
#

urllib.request comes to mind first, but there are plenty of other things in the stdlib that are substantially harder to use, less safe, and more error prone than the equivalent third party lib

static bluff
#

So the other day one of you guys mentioned that having a reference to a class inside its own definition could "lead to the seeping out of the uninitialized class"

#

I guess I can sorta understand that- but what I want to know is why that would be untenable

raven ridge
#

setting that aside, metaclasses make it impossible, so it's kind of a moot point

static bluff
#

Metaclasses, as they currently function

raven ridge
#

at least if we're talking about Python, and not a hypothetical Python-like language without metaclasses

#

right - metaclasses that take a namespace and turn it into a class.

#

with metaclasses as they exist today, you need to fill in a namespace first, before the class ever exists, because the namespace gets passed to the thing that makes the class.
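
The ordering described here can be sketched as follows. `TracingMeta`, `Widget`, and `SelfRef` are hypothetical names; the point is only that the namespace is prepared and filled before the metaclass is asked to build the class, so the class name is not bound while its own body runs.

```python
events = []


class TracingMeta(type):
    # Hypothetical metaclass that records when each step happens.
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        events.append("prepare")
        return {}

    def __new__(mcls, name, bases, namespace, **kwargs):
        public = sorted(k for k in namespace if not k.startswith("__"))
        events.append(("new", public))
        return super().__new__(mcls, name, bases, namespace)


class Widget(metaclass=TracingMeta):
    events.append("body")  # the body runs before the class exists
    size = 3

    def say_hello(self):
        return "hello"


# Referring to the class inside its own body fails: the name is not bound yet.
try:
    class SelfRef:
        me = SelfRef
    self_reference = "allowed"
except NameError:
    self_reference = "rejected"
```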

static bluff
#

Right

#

I'm not trying to suggest that referencing a class in its own definition is a good idea, by the way. I'm just curious

raven ridge
#

setting that aside, in a hypothetical language without Python-like metaclasses, if the class existed before it was fully populated, you'd need to define semantics for what would happen if someone interacted with it in that intermediate state - what happens if you construct it? destroy it? call methods on it? Perform isinstance checks?

#

what happens if an exception occurs part way through defining the class, and so this class that you started to define never actually gets defined, despite something having obtained a reference to it?

acoustic crater
#

maybe just make the definition recurse lol

raven ridge
#

or maybe you just punt and say that it's all undefined behavior to touch it before it's finished being built - but in that case, what's the point of exposing it?

static bluff
#

I think "obtained a reference to it" is an important thing for me to take note of. In theory, if the class instantiation fails you just return an undefined or else throw an error that may or may not get caught, yada

#

But if you store a reference to the uninstantiated class somewhere outside itself and then it fails, what then, right?

acoustic crater
#

before a metaclass is fully instantiated and you refer to the instance... treat the reference as a new class constructor and run it with any new args n kwargs? It's dumb but that is what springs to mind

#

and the control flow is what you'd expect from recursion

#

it makes little sense but it makes the most sense as far as I can tell

raven ridge
#

even if the type existed, in that case __init__ hadn't yet been defined (nor even say_hello), and yet you're creating an instance of that type and calling a method on it.

acoustic crater
#

yeah there's no way to instantiate the metaclass instance if it itself isn't instantiated

static bluff
#

Well, I'm sure there's some way to do it

acoustic crater
#

recursive definition

static bluff
#

But it's probably not a good idea

acoustic crater
#

is all I can think of

#

and yeah it's not haha

static bluff
#

I think the reason I had originally brought it up was because I was discussing how one might implement privacy in a language which allows new methods to be attached to a class after the fact

raven ridge
# static bluff Well, I'm sure there's *some* way to do it

there most certainly isn't. The metaclass can return an entirely different class depending on the values in the namespace. You can't have the class before the namespace is passed to the metaclass, because the metaclass can do something different depending on what's in the namespace it gets.
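
A sketch of that argument with a made-up metaclass: `PickyMeta` inspects the namespace and, depending on what it finds, returns either a normal class or something built a completely different way (here a `namedtuple`). Nothing about the result is known until the namespace is complete. The `as_tuple` and `fields` markers are invented for this example.

```python
import collections


class PickyMeta(type):
    # Hypothetical metaclass whose result depends on the namespace contents.
    def __new__(mcls, name, bases, namespace):
        if namespace.get("as_tuple"):
            # Return an unrelated class instead of an instance of PickyMeta.
            return collections.namedtuple(name, namespace.get("fields", ()))
        return super().__new__(mcls, name, bases, namespace)


class Point(metaclass=PickyMeta):
    as_tuple = True
    fields = ("x", "y")


class Ordinary(metaclass=PickyMeta):
    value = 1


point = Point(1, 2)
```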

static bluff
#

You'd need to mark all of the methods which were defined directly inside the class as 'native', and define the method's 'owner' object as being the class

raven ridge
#

what makes those methods more important than others?

static bluff
#

You'd want the methods originally defined within the class to have access to the class' (and its instances') private attributes, but prevent any user defined methods from being able to access them

acoustic crater
#

if you can make class attributes private you can just have a different sort of attribute that's private

#

the question is just how to make them private

raven ridge
#

I'm not being entirely facetious - that would make class decorators far less useful, for instance, because they wouldn't be able to monkeypatch in new methods.

static bluff
#

Well, I might be wrong here but, if it was as simple as just attaching a method to an object and, tada, you have access to the private attributes, well, what's the point?

raven ridge
#

likewise with @unittest.mock.patch for unit testing, etc.

raven ridge
acoustic crater
#

how would access to itself change that?

#

where do you want those private attributes to be accessed?

#

during construction only?

static bluff
#

During construction, and within the scope of any methods defined directly within the class' definition space

acoustic crater
#

so you want a whole private namespace

#

for classes

static bluff
#

Yes

#

In theory

acoustic crater
#

but you also apparently want a liminal namespace for accessing "public" stuff

raven ridge
#

which also means getting rid of getattr and setattr

static bluff
#

liminal?

acoustic crater
#

it could be like javascript Symbols just inaccessible

#

liminal means in between

#

Symbols without special methods/functions for accessing them

raven ridge
#

because the contract getattr(obj, name) and setattr(obj, name, val) have no way of knowing if the caller of the function is allowed to call the function. They don't even know who the caller of the function is.

acoustic crater
#

but you'd also need the in between components

#

the public interface

#

but yeah as godlygeek is gettin at, the hard part is actually building that private namespace and public interface to access it

static bluff
acoustic crater
#

you need between public and private too

#

can't be altered but are accessible

#

liminal

static bluff
raven ridge
acoustic crater
#

the public getter has the private attribute exposed to it though

static bluff
acoustic crater
#

so u can just define a setter if the public stuff has access to the private stuff

static bluff
raven ridge
static bluff
#

provenance?

raven ridge
#

the identity and history, I guess

static bluff
#

Ahh, well, yeah more or less

raven ridge
#

origins

static bluff
#

Which doesn't entirely disagree with Python's objective nature in my opinion

#

Obviously it would take proper planning and a thorough understanding of the problem, but it certainly seems doable to me

raven ridge
#

have you done much unit testing? Both in Python, and in a language with access modifiers?

static bluff
#

I can't say I have

raven ridge
#

it's the best reason why access modifiers are a terrible idea

#

they don't do anything useful, they just get in the programmer's way.

static bluff
#

psssssssst

#

Psssssssssssssssssst godly

halcyon trail
#

Disagree

static bluff
#

You're ruining my fun

halcyon trail
#

Especially when languages have an internal access modifier or similar

raven ridge
#

they make all sorts of reasonable things that programmers want to do - like "test what happens if I make a database call through my class while the database handle is in an error state" - much more difficult.

halcyon trail
#

So you can have something that is private to the outside but available to tests

static bluff
raven ridge
#

and they don't offer any benefits in exchange, because practically speaking, in every language with access modifiers, untrusted code is running in the same address space and can just choose to ignore the access modifiers.

halcyon trail
#

That's really not true

static bluff
#

yeah that doesn't sound right to me

raven ridge
#

what's a counterexample?

halcyon trail
#

"just ignore" via complicated tricks with reflection usually

raven ridge
#

right.

static bluff
#

And anything can be hacked

halcyon trail
#

Yeah that's not the point

raven ridge
#

that's not "hacking"

halcyon trail
#

Protect against Murphy, not Machiavelli

raven ridge
#

your code is running in the same process as the stuff that someone is trying to protect from your code. There's no trust barrier between the two things.

halcyon trail
#

This just doesn't have any relationship to the software engineering realities of access control

static bluff
halcyon trail
#

Nobody claims it's a security measure

raven ridge
#

so we can agree on one thing that it's useless for, security.

#

what's a thing that it's not useless for? 😄

halcyon trail
#

Literally nobody ever claimed otherwise

#

Access control

raven ridge
#

you would be surprised how often people claim otherwise.

raven ridge
#

what's it good for?

#

why is "access control" a good thing? It doesn't aid security, but it does aid... ?

static bluff
#

It dissuades people from screwing with the internals of the program

halcyon trail
#

It aids prevention of people mucking around with internals of your code, encapsulation

static bluff
#

Which, if you're designing for beginners for example, is a good thing

raven ridge
#

hm, why?

#

beginners have plenty of things that they're told "just don't do that", or that they don't understand.

static bluff
#

I built a project that essentially replicates a javascript runtime within python. Pythonic control over the elements in a web page. Elements had a 'style' attribute

raven ridge
static bluff
#

Now, you can use an underscore to denote privacy, but people are going to screw with it anyway. It's one thing for someone to try to change it immediately and get an error more or less right away, but if they change something deep in the internals of the language and then some day, maybe weeks or months down the way, start getting an error whose traceback may even have nothing to do with the modification you used

grave jolt
#

well, languages with private fields/methods usually have a way of circumventing that 🙂

raven ridge
#

I've never seen one that doesn't.

#

and any with any sort of C FFI immediately has a way of circumventing it.

static bluff
#

Protecting, really protecting the internals of a project makes it more resilient

acoustic crater
#

hiring coders that aren't idiots is much better

#

or just name private attributes DO_NOT_USE_OR_YOURE_FIRED lol

grave jolt
# raven ridge I've never seen one that doesn't.
const counter = () => {
    let x = 0;

    const increment = () => { x++; };
    const getValue = ()  => x;
    return { increment, getValue };
};

const ctr = counter()

Here you can't change the x variable from outside (apart from using increment)

static bluff
#

For me, it's not about absolute refusal of access, it's about keeping the internals away from anyone who doesn't have the skills required to work with them

static bluff
raven ridge
acoustic crater
#

why does a project involving novices need to be resilient?

grave jolt
#

well, in Python you can change x 🙂
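
A Python version of the same counter, as a rough sketch of what "you can change x" means: the closed-over variable lives in a cell object that is reachable through the function's `__closure__`, and `cell_contents` is writable (Python 3.7+). The function names mirror the JavaScript example.

```python
def counter():
    x = 0

    def increment():
        nonlocal x
        x += 1

    def get_value():
        return x

    return increment, get_value


increment, get_value = counter()
increment()
before = get_value()

# get_value closes over only x, so its first closure cell is x.
cell = get_value.__closure__[0]
cell.cell_contents = 100  # reach in from outside and overwrite x

increment()  # both functions share the same cell
after = get_value()
```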

raven ridge
#

you're not just stopping idiots from touching your internals, you're also stopping people who know exactly what they're doing from touching your internals.

#

which isn't necessarily a good tradeoff.

static bluff
raven ridge
#

I have fixed bugs in production libraries through monkeypatches. It was the right call.

halcyon trail
#

You're making it significantly harder to touch internals by accident, and very explicit when you do touch internals

acoustic crater
#

it'll already be impossible to refactor or read, why would just hiding internals from them prevent them from making other mistakes that make the project untenable?

halcyon trail
#

So people can see it in code review and ensure it's truly necessary

static bluff
#

You've got two otherwise equivalent languages, one with privacy, one without. If you've decided privacy is something you want, go for it. If you've decided otherwise, go with the latter

halcyon trail
#

Also, monkey patching and access control are two different things

acoustic crater
#

except we don't have them we just have speculation about a theoretical language and can't even decide how to implement privacy

raven ridge
#

I'm convinced that "privacy" is a way to give security through obscurity (or perhaps correctness through obscurity?), and therefore isn't valuable.

halcyon trail
#

Seems like this is more of an issue of dynamic vs static

#

No

static bluff
#

I think maybe this comes down to schools of thought 😛 I'm going to exit the debate with a smile on my face, having learned a thing or two, not least of which: access modifiers, an impassioned issue

halcyon trail
#

Nobody who knows anything about access control argues that it's related to security

raven ridge
#

that's absolutely not true.

halcyon trail
#

Frankly anybody who brings it up is just showing their own misunderstanding

raven ridge
#

I can point you to recent threads on python-ideas with people arguing that it's related to security.

halcyon trail
#

Then they don't understand it

raven ridge
#

I agree.

halcyon trail
#

I'm sorry but it's that simple

grave jolt
#

I have never seen anyone argue it's for security tbh

halcyon trail
#

Well that's what I said

acoustic crater
#

if people can accidentally mess with the guts of something, despite widely accepted prefixed underscore syntax, and would, why are they part of the project?

halcyon trail
#

The thing is that the underscore syntax + static enforcement would basically be access control

acoustic crater
#

yeah true I just thought that as I typed that

grave jolt
#

In my understanding it's a way to separate the public API of a thing from its implementation details, and to enforce it at the language level (with an escape hatch, as always)

halcyon trail
#

Just a primitive form and by convention

acoustic crater
#

pycharm yells at you for accessing underscore prefixed stuff, just enforce that and you're gold

#

no need to edit the language

halcyon trail
#

The fact is that most new statically typed languages being created today, if they aren't say very purely functional, continue to include access control

static bluff
#

XD

acoustic crater
#

lol u have a lexer not a rewrite of python internals

raven ridge
halcyon trail
#

Why is static enforcement useful?

static bluff
#

What I mean to say is, I (and others) build languages for fun. 'No need to' implies tedium and unpleasantness

grave jolt
#

Oh, another thing that comes to mind is preventing name collisions.

grave jolt
#

Which is mostly solved with __ in Python, but I haven't seen many people use it
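
A minimal sketch of the collision case that `__` name mangling addresses, with hypothetical `Base` and `Child` classes: both use an attribute spelled `__token`, yet each ends up with its own slot because the compiler rewrites the name per class.

```python
class Base:
    def __init__(self):
        self.__token = "base"  # stored as _Base__token

    def base_token(self):
        return self.__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"  # stored as _Child__token, no clash

    def child_token(self):
        return self.__token


c = Child()
stored_names = sorted(vars(c))
```

The mangled names are still ordinary attributes, so `c._Base__token` works from anywhere; this avoids accidental collisions rather than providing privacy.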

static bluff
#

Not the end of the world, but a nice bonus

grave jolt
#

which is, granted, something I have never seen

acoustic crater
#

a way to access mangled attributes set in a subclass without it being ugly and hacky might be nice

#

tho that is kinda not what mangling is for

grave jolt
#

a way to access mangled attributes set in a subclass
❓w ❓h❓y❓

acoustic crater
#

...the only reason I've done it is because my current CS prof thinks mangled means private and makes us use them

#

cuz he hates python

grave jolt
#

Why would you ever need to access a private variable of a subclass?

raven ridge
acoustic crater
#

for inherited methods that access that variable >_>

#

it makes no sense I know

grave jolt
#

In languages with access modifiers there are 'protected' variables

acoustic crater
#

idk what the people new to python did

grave jolt
#

You can't access a parent class's private variables/methods AFAIK in those languages

raven ridge
#

because the parent class should have no knowledge of its subclasses, generally, for Liskov reasons.

acoustic crater
#

thinking back, maaaybe he meant for people to make more setters and getters

#

without explicitly saying so

raven ridge
#

getters and setters are generally considered an anti-pattern in Python.

acoustic crater
#

the tests for the assignments had methods like "setThis" and "getThis"

#

yeah it's gross

#

I tried using property and .setter but the tests required those naming conventions

#

and he calls instance methods class methods

#

>_>

static bluff
#

I really don't see the point in ever setting both a getter and a setter, unless you're doing some sort of logic with the value being passed to setter of course

#

That might be my ignorance talking though

acoustic crater
#

yeah and you can just overload the attribute name in python

raven ridge
#

yeah. @property makes getters and setters unnecessary in Python, because you can evolve obj.attr = 42 to call a method in the future without callers needing to change their code.

acoustic crater
#

so there's no need to set them for everything initially

#

exactly

#

I'm just getting credits

#

ppl ask if they should take this class and I say "no, it's not Python"

raven ridge
#

the reason you need setters in Java is because there's no way to start with obj.attr = ... and add validation to it later without needing all of the callers to change

acoustic crater
#

yeah different languages have different standard practices for a reason

raven ridge
#

yeah. Teaching Python as though it's Java is, unfortunately, very common, though.

#

it's a truer OOP language than Java! Everything is an object!

acoustic crater
#

srsly

#

allows for functional programming but the functions are objects

static bluff
#

Quick question. Is it too early to start using the 3.10 beta?

acoustic crater
#

I am using it on a project and it's fine

static bluff
#

Pattern matching working okay?

acoustic crater
#

yeah

#

except my IDE hates it

#

in the project itself I've only used type union syntax but that works great

#

pattern matching will be perfect for one part though

static bluff
#

yeah, I get the impression everyone, certainly me, is really excited for it

acoustic crater
#

it makes me wish python had an option for better recursion handling

raven ridge
#

it's a whole weird DSL that looks like Python without behaving like Python, and is going to be annoying to teach...

#

and the presence or absence of a . in determining whether something is a load or a store makes me sad, still.

#

I'm sure I'll wind up using it, but I'm not excited for it.

acoustic crater
#

the dottedness to determine a constant is weird indeed

static bluff
#

It's the steady march of change, either way

acoustic crater
#

but you can use guards lol

#

I like the guard/pattern match combination

#

idk if it's unique

#

does another language have that?

raven ridge
#

I wish that instead of guards there was a "go to next case" statement, to break out of one that matched and continue matching on the next one. Though that would have the tradeoff of removing the option to evaluate cases in parallel.

static bluff
#

So, when you compile something into a code object, you can do so within a module's namespace or simply within its own little virtual space. What's a term to describe whatever thing, module or otherwise, in which the code is being compiled?

raven ridge
#

namespace

static bluff
#

Rockin, thanks

acoustic crater
#

yeah I'm glad they don't have fallthrough

#

that was always a weird aspect of switch case to me, especially because it seems seldom used

#

maybe if match case was a module though and they introduced module-level soft keywords 🤔

raven ridge
#

I'm betting all keywords going forward will be soft

acoustic crater
#

cuz the new mini language being global is weird

#

yeah they should be

#

someone in another server linked a tweet where someone was complaining about for (var of of of){} in js I'm like wtf do you want it to do the only nasty part of that is your IDE not understanding it

#

(all the ofs were keyword colored)

sacred yew
#

thats typescript im pretty sure 😛

acoustic crater
#

oh I mean var

sacred yew
#

i mean vanilla js is still pretty bad

acoustic crater
#

js is extremely janky haha

sacred yew
#

aren't guards what other FP langs do?

acoustic crater
#

afaik not directly in conjunction with pattern matching

#

but yes

halcyon trail
#

I like pattern matching a lot for statically typed languages

#

For python, the benefits just aren't as big

acoustic crater
#

maybe they will add less overhead to recursive function calls >_>

halcyon trail
#

So I'm less sure on whether I like it

acoustic crater
#

somehow

halcyon trail
#

Also python is starting to feel really kitchen sinky these days

raven ridge
#

I'm sure I'll find places to use it, but I don't find it... exciting. I have a begrudging acceptance of it. 🙂

acoustic crater
#

it's definitely not even something I think should be taught

unkempt rock
#

hi

#

Is here the pro daddy section ?

sacred yew
#

?

halcyon trail
#

?

unkempt rock
#

?

halcyon trail
#

Not necessarily

#

Depends on the feature

#

Some features are more for library authors

#

Like metaclasses

#

But in the case of pattern matching

#

I agree

raven ridge
#

yeah, that's a fair point.

halcyon trail
#

If it's so hard that most people use it, something is wrong

#

*don't use it

acoustic crater
#

is haskell wrong?

#

haha

halcyon trail
#

Yes :-)

raven ridge
#

it gives me serious regex vibes. It's its own special DSL jammed into the language, and it sort of looks like the language, but it doesn't behave like the rest of the language.

acoustic crater
#

it's only hard because of a barrier to entry

raven ridge
#

int(x) after case does something entirely different from what int(x) does everywhere else.

acoustic crater
#

it's not as convoluted or powerless as regex tho

#

lol you can even define real functions in it so it's your dict with multiline lambdas

#

...and weird rules haha

raven ridge
# acoustic crater it's not as convoluted or powerless as regex tho

Well... it's like regexes except that it matches arbitrarily nested objects of arbitrary types, instead of just textual strings. It's still pretty complex - at least, it's not intuitive, and it reuses syntax that means something entirely different in the rest of the language to mean a different thing

#

everywhere else in the language, int(x) takes an existing x and converts it to an int. Inside a case statement, int(x) takes an existing int and stores it to x

#

that's at least... weird.

acoustic crater
#

yeah

#

it seems like a lot of ppl were typing isinstance too much tho haha

halcyon trail
#

i still haven't come to terms with how weird walrus operator looks in comprehensions tbh 🤣

#

I'm behind

raven ridge
#

and you can't know how it will behave on an arbitrary type without knowing if that type defines __match_args__

acoustic crater
#

tbf that's all damn dunders

halcyon trail
#

pattern matching also gets kinda weird in python because python has two orthogonal type systems, the dynamic one and the static one

#

If you have Union[Foo, Bar] is a pattern match that checks for Foo and Bar exhaustive?

acoustic crater
#

does None even have defined behavior for __lt__? Who knows, it's got it though

raven ridge
#

Ooh, and int(x) and MyClass(x) do something entirely different in pattern matching

#

int(x) matches an int and stores it in x.
MyClass(x) matches a MyClass whose first match arg is x

acoustic crater
#

hmm what happens if MyClass is an int subclass?

raven ridge
#

you don't get the special int behavior.

acoustic crater
#

I guess int(x) would work

raven ridge
#

int(x) would work, but ClassName(x) would not

#

you instead would need to do ClassName() as x

acoustic crater
#

that sorta makes sense tho as far as the magic of builtins is concerned

#

only builtins really are apparently themselves not just defined to be as such

#

unless you implement a non built in in C or whatever

raven ridge
#

"built in" has two different meanings. It's used to describe both things in the builtins module (possibly only things that are in it by default, possibly including things that you add to it dynamically), as well as things that are defined in C extension modules

acoustic crater
#

so what does deque(x) do?

#

tho deque is an especially hacky builtin extension type object imo

raven ridge
#

As mentioned above, for the following built-in types the handling of positional subpatterns is different: bool, bytearray, bytes, dict, float, frozenset, int, list, set, str, and tuple.

#

those are the only ones that are handled specially.

acoustic crater
#

is it according to a new slotted dunder?

raven ridge
#

no.

acoustic crater
#

ah that is pretty weird

#

luckily can still be changed though

raven ridge
#

no... that can't ever be changed

#

it would be a backwards incompatible change to the language.

acoustic crater
#

ah true

raven ridge
#

deque(x) has a defined meaning in 3.10 - they can't change what it means in a later version.

acoustic crater
#

well, they can change the others to be slot dunders

#

without a change in functionality

raven ridge
#

well, sure - but why?

acoustic crater
#

idk

#

deque is weird in general tho

static bluff
#

_>

acoustic crater
#

behaves like it's defined in python

#

except for being a linked list

raven ridge
#

that set of 11 types will forever need to be handled specially

#

whether that set is hardcoded in the parser or given a special dunder that nothing else uses seems like an implementation detail.

acoustic crater
#

aren't they already?

#

can't mess with their methods without trickery

#

they just pretend to be classes

raven ridge
#

the same is true of complex, and that's not included in the list

#

it's an arbitrary subset of the builtin types

#

and this is normative - future versions of PyPy, for instance, will need to special case this same set of 11 types.

acoustic crater
#

at least it's just 11 then haha

raven ridge
#

it is, but... well, hm. I can't help but feel that section of the PEP didn't get enough discussion

#

is complex really so much less special than frozenset or bytearray? All 3 are pretty special case types...

acoustic crater
#

yeah lack of complex is weird

raven ridge
#

well, unfortunately, it can never be added

acoustic crater
#

long live the complexpy fork

#

skywalker u have a new calling

static bluff
#

I have module A containing class A and module B containing class B. Class A uses an instance of class B as an attribute, but class B requires knowledge of class A for type annotation. What's the solution?

acoustic crater
#

don't

#

lol

static bluff
#

XD

raven ridge
#

from __future__ import annotations
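
A sketch of the annotation-only import pattern this relies on. It uses a quoted annotation plus `typing.TYPE_CHECKING` so it runs as a standalone snippet; `module_a` is a hypothetical module name standing in for module A:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers follow this import; it is skipped at runtime,
    # so module B never actually imports module A and there is no cycle.
    from module_a import A


class B:
    def __init__(self, owner: "A") -> None:
        # The quoted annotation is never evaluated at runtime. With
        # `from __future__ import annotations` as the first statement of
        # the module, the quotes could be dropped.
        self.owner = owner
```

Either way the annotation stays a string at runtime, so `B` does not need `A` to exist when the module is imported.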

static bluff
#

Oh, and whats this about my new calling?

raven ridge
#

(but also, don't)

acoustic crater
#

make complex behave like other builtins in pattern matching in a cpython fork

raven ridge
#

if you have a circular dependency in your types, it sounds pretty fishy - that seems to indicate something is factored wrong, more likely than not.

acoustic crater
#

at least they're not passing self to the instantiation of the composed class any more

#

...I hope

raven ridge
#

if an A has a B as an attribute, but a B has a method that returns an A, that's... suspicious. It may not always be bad, but it's a thing that's more likely to be bad than good, I think.

static bluff
#

Wise words as always my dudes

acoustic crater
#

good luck!

#

maybe you wanna pass a strategy for dealing with class A to class B... but rly shouldn't the concerned methods just be in class A?

raven ridge
#

it can happen in cases where there really is a circular dependency - like a tree with multiple types of nodes, where any type could contain another type, perhaps...

#

it's not always gonna be wrong, but it warrants a closer look.

static bluff
#

For the record, I just moved them both in to the same module

#

Spacing things out into multiple modules is, in my mind, important. But I've been known to take it too far

acoustic crater
#

yeah drawing that line can be tough

static bluff
#

My issue is that my lexer is already over 700 lines of code, comments included

#

And its working fine for everything except for strings, which are a rabbit hole

#

Fstrings need their own lexer and their own regular expression, which together will eat up a few hundred lines

acoustic crater
#

then u got ur b strings and r strings

static bluff
#

I'd really like to keep everything in one module, but I think it might be time to break things apart

#

Oh!

#

And, I want to implement arrow notation, those will require at the very least a good deal of coding within the normal lexer to handle, if not their own lexer and expression

#

Too much?

#

Abstract - Base class
Anonymous - Arrow functions
Interpolated - Fstrings
Namespace - Module level (or just plain namespace in the event of a 'compile()' without being provided a module to compile in)
Stringified - normal string

grave jolt
# sacred yew i mean vanilla js is still pretty bad

My favourite new trivia piece about JS:

Object.defineProperty(
  String.prototype,
  'onions',
  {
    set: () => { console.log("I don't like onions") }
  }
);
  
x = "foo";
x.onions = 2000;
console.log({'x.onions': x.onions});

When getting/setting properties or calling methods, primitives spawn a temporary boxed object

acoustic crater
#

but have you whittled a toothpick yet?

#

it seems like you have a half built knife machine haha

static bluff
#

I'm actually very pleased with my progress

acoustic crater
#

yeah you seem to be chuggin along

static bluff
#

😄

acoustic crater
#

just might be useful to have a new operator actually implemented and tested

static bluff
#

I mean- people have told me before that I need to walk before I can run, and I really really respect that position (and the people telling me that)

#

But I'm a trial by fire type. Its always been how I learn best

raven ridge
# static bluff Oh!

You should break things apart at boundaries between logical components, not arbitrarily based on size. If your lexer is a single logical component, there's nothing wrong with keeping it in a ten thousand line file

static bluff
#

In principle I agree

raven ridge
#

Replied to the wrong message. Stupid mobile.

static bluff
#

But I personally don't think its tenable to expect anyone to read through a document more than 2000 lines of code long. Theres just too much to keep track of

#

Fine for me sure because I wrote it- and maybe you too. But most of the people here in the advanced channel have the wits to be able to handle that. The most likely reason someone would be reading through the source would be to figure out how it works- very possibly starting from square one with no context to fall back on

#

I guess, I dunno. I'm trying to bring my coding style to within more standardized limits- keeping things united in their own modules for example

#

But my instincts are yelling at me to space things out at this point, and I'm not sure how thin or thick is 'normal'

raven ridge
#

if it's 2000 lines of code, it's 2000 lines of code. Reading through 2000 lines of code in a single file isn't necessarily worse than reading 2000 lines of code spread across 5 files.

#

if the boundaries are bad or arbitrary, reading the same 2000 lines of code can be much more difficult when it's spread across 5 files than all in one.

#

I'm not saying you shouldn't break things up into submodules, just that the criteria for where to split things should be based on the boundaries of logical components, and not on the size of the file

grave jolt
#

at least it's not spread across 5 npm packages

raven ridge
#

figuring out what should be a component and what shouldn't and where to draw the lines between them takes a lot of practice - you just need to read a lot of code, and see both good and bad divisions, before you can get any good at it.

#

as a professional programmer for many years, figuring out how to split things up so that the divisions make sense to other people, and so that things aren't too tightly coupled and don't have too many responsibilities, is still one of the parts of the job that takes the most effort for me.

#

but I have seen files of code that were ~40k lines where it wouldn't make any sense to divide them up - everything in them was closely related, they represented a single, large, logical component.

#

Any possible division would have been arbitrary, and wouldn't have aided comprehension.

static bluff
#

Fair ^^

#

Well, I'm making the judgement call, at least for now, that to have multiple nearly identical (and visually confusing) components with different uses so close to each other is only going to cause confusion. And, its what my instincts are telling me 😛

#

What do you do Godly?

raven ridge
#

I work on the Python Infrastructure team at Bloomberg - a big news and financial analytics company. My team is responsible for the health of the Python ecosystem at Bloomberg, from maintaining patched interpreters and keeping up with CVEs to providing Python bindings for first party C++ libraries that the company already had (our backend for a long time was Fortran, then C++, and now it's switching slowly towards Python)

#

slowly in the sense that there's a lot of new Python code being written, but there's a huge codebase of existing C++ code, and some Fortran, still in active use

static bluff
#

O.O

#

Thats amazing

#

I'm not normally one to gush but christ, its an honor

#

Thanks for keeping my job warm for me by the way 😉 I'll see you in ten years

raven ridge
#

eh, I'm just a developer. I'm a damn good developer, but there's a bunch of impressive people on my team. We've got a CPython core dev, and one of the maintainers of pip...

static bluff
#

Still

#

Any advice for a whippersnapper with big dreams and quick fingers?

acoustic crater
#

ur sayin my terminal will have 3.10 soon?!

#

I just got space invaders to run on the darn thing

static bluff
#

XD

#

I saw someone tried to install doom on a home pregnancy test with a digital readout

static bluff
#

Would one of you fine folks be willing to take a look at my code? (I'm asking here, because I feel like I'm learning more talking to you guys than I've ever learned anywhere else)

lapis mist
#

Hey Guys!!! I am just trying web development in Python using flask. I just created everything, but when I try to register and login it says "CSRF tokens are missing". What should I do here?

static bluff
#

!paste

fallen slateBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

static bluff
#

I've been trying to conform to norms a bit better. There's nothing really impressive happening in this module, but I wanted to know if you guys noticed any bad habits (aside from the semicolons :P)

prime estuary
# static bluff https://paste.pythondiscord.com/uqibamatug.py Updated comments

Well, in Python nesting your classes like that does absolutely nothing, other than making you have to access it with a dotted name. It's probably easier to just put them at the top level above your main class. In Caret, you could use operator.add instead of the lambdas you're creating there - or really just directly add the three attributes, the iteration is a bit overkill. Why is the caret iterable in the first place anyway? Instead of unpacking to tuples, give it a as_tuple method, probably returning a NamedTuple.
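
A rough sketch of that suggestion. The pasted code isn't reproduced here, so the field names `index`, `line` and `column` are assumptions, as is the `CaretPosition` name:

```python
from typing import NamedTuple


class CaretPosition(NamedTuple):
    # Hypothetical field names; the real Caret's attributes may differ.
    index: int
    line: int
    column: int


class Caret:
    def __init__(self, index=0, line=0, column=0):
        self.index = index
        self.line = line
        self.column = column

    def __add__(self, other):
        # Add the three attributes directly; no lambdas or iteration needed.
        if not isinstance(other, Caret):
            return NotImplemented
        return Caret(self.index + other.index,
                     self.line + other.line,
                     self.column + other.column)

    def as_tuple(self):
        return CaretPosition(self.index, self.line, self.column)


total = Caret(1, 2, 3) + Caret(10, 20, 30)
```

Callers that used to unpack the caret can unpack `total.as_tuple()` instead, and also get named access such as `.line`.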

static bluff
#

I had originally intended to have not just addition, but all four basic operations in caret- the lambdas are just a relic from that

#

As for the the iteration overkill, yeah, you're probably right

naive apex
#

hey all, I've got a python-adjacent question that I could use some help on. I've got a client-server system written in python that sends data over a socket by pickling it on one side and de-pickling it on the other. we use it for sending messages up to ~500kb in size. I wanted to rewrite the server portion of it in Rust, as there were some computations that could greatly benefit from Rust, but now I'm stuck on server performance - since I can't pickle on the Rust side of things, I've been sending it over as a JSON byte stream, which tanks the performance: the time is dominated by converting the vector arrays on the Rust side into a JSON string, and loading on the python end. fwiw, it's largely tuples of floats that get sent over.

I'm curious if anyone has any suggestions on something to look into that can get me back to the python-python pickling / unpickling performance. I imagine something like protobuf might get me closer, but I'm really not sure. I've tried to use some of the faster JSON parsers (ujson, orjson, hyperjson), but it's still ~100x slower than the pickle-pickle side of things. This makes me think JSON is not the correct answer.

TL;DR: efficient way to send large amounts of data from a Rust app to a python client?

flat gazelle
#

something like messagepack or protobuf is likely to work

halcyon trail
#

or even bson

radiant fulcrum
#

generally JSON is fast enough

#

like serde (serde_json is orjson for python) can process hundreds of MB/s in terms of data throughput

#

also

#

so technically, yes you can pickle rust

#

and rust can unpickle python stuff

#

within reason

warm wadi
#

I wonder if you could think of having unix sockets open for comm and just stream the data from one to another, also stream the response back

#

sorry, my bad. I lost context you have already covered

radiant fulcrum
#

socket wise yeah, unix sockets are gonna be the fastest method of transport though

halcyon trail
#

but he's saying it's not fast enough?

radiant fulcrum
#

I don't see how that can really happen though for 500kb messages

halcyon trail
#

there isn't much reason to use json as an interchange format between two programs you control, unless you actually have some reason for wanting human readability

radiant fulcrum
#

Well I mean you get the ease of protocol compatibility

halcyon trail
#

you get the exact same thing with bson, or msgpack

radiant fulcrum
#

writing or using a different serializing setup can make life harder between programming languages because of support

#

JSON is already hyper optimized for this stuff. Like I don't see how the format is the bottleneck here

halcyon trail
#

err what

#

how is json hyper optimized for this stuff

visual shadow
#

Everyone uses json for communication over the web and its very common to use jsons for simple data between two programs. It's got its advantages.

radiant fulcrum
#

Because it's been used for communications over sockets for decades, pretty much every language has a setup for handling the format and often most serializers and deserializers are massively optimized for handling it because of how much it's used

visual shadow
#

Primarily you're guaranteed everyone supports it

halcyon trail
#

that doesn't mean it's "hyper optimized"

#

the data format is the data format

radiant fulcrum
#

Yes but the implementations that follow it are what make it hyper optimized

halcyon trail
#

the format itself isn't designed to be optimal in any performance sense, and there's only so much you can do to optimize performance of reading it

#

they try, yes, but it's still slow compared to a binary format

#

anyway, this is purely speculative? The guy said he benchmarked and the costs of sending via json are a big factor. We don't know what his timescale is.

#

Moving from json to bson or msgpack is very easy

radiant fulcrum
#

But it should be fast enough for what they're dealing with, I find it hard to believe that the format is the bottleneck considering that serde can process 400MB+ a sec per core

halcyon trail
#

You don't know what they're dealing with though....

visual shadow
#

This feels like premature optimization to me to be honest. I mean, if you're sending 1gb worth of data don't use jsons I guess. But for small sizes, why should json not be used

halcyon trail
#

oy

visual shadow
#

For what it's worth, in the grand scheme of things this decision won't really make or break your code regardless

flat gazelle
#

in this case, they state that JSON serialization is too slow in rust. So either pick a faster JSON library or stop using JSON

halcyon trail
#

Why not just actually give the person some useful advice, instead of telling him that he doesn't know what he's talking about, and that he's prematurely optimizing?

#

I swear, programmers

radiant fulcrum
#

Because it is useful advice to tell them when it's premature

#

because you end up making sacrifices in the name of speed you dont need

halcyon trail
#

You're making some pretty incredible assumptions here, that are totally unwarranted

#

Everything in the post indicates that he did reasonable due diligence, I have no idea why you two are still assuming that he's just wrong for wanting to move to a faster data format

radiant fulcrum
#

recommending something like BSON has more negatives than benefits because sure, it's faster to serialize and deserialize but IO is still by far the slowest thing in the equation and BSON takes up more space than JSON generally

warm wadi
#

hey, how about a memory mapped file dumped from rust and read in python using c extension? just match the structures and padding and it should fit, should work faster by eliminating the intermediate, no?

#

doesn't have to be a file as such. I just mentioned file for god knows what reason

#

it could be a stream

flat gazelle
#

sending a file over a stream is non trivial

#

since a stream doesn't have a length

#

but the data likely does

halcyon trail
#

except that he indicated that he does need the speed....

#

you literally know nothing about his domain

#

his timescale

flat gazelle
#

the time is dominated by converting the vector arrays on the Rust side
seems like json is indeed the issue here

warm wadi
#

then the person should actually chime in and contribute some more to the discussion @naive apex

radiant fulcrum
#

true

#

But in the world of performance, IO is the first and often biggest bottleneck

#

the bigger your encoding format, the slower your IO generally and the lower the performance

#

In Rust's case, recommending another format isn't going to change much due to the serde backbone

halcyon trail
#

You're saying these things like they're profound.... Trying other formats is definitely worthwhile, we have no idea what his bottleneck is

warm wadi
#

its just tuples of float. a stream of fixed size byte buffer followed by known delimiter isn't that hard to put together. specially when you control both the systems
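
A minimal sketch of the Python end of such a framing, using only the stdlib `struct` module. It swaps the delimiter for a length prefix, since arbitrary float bytes could collide with any delimiter, and the little-endian layout is an assumption the Rust side would have to mirror (for example by writing each `f64` with `to_le_bytes`):

```python
import struct


def encode_floats(values):
    """Frame floats as a 4-byte little-endian count, then 8 bytes per value."""
    values = tuple(values)
    header = struct.pack("<I", len(values))
    body = struct.pack(f"<{len(values)}d", *values)
    return header + body


def decode_floats(buffer):
    """Inverse of encode_floats; returns a tuple of Python floats."""
    (count,) = struct.unpack_from("<I", buffer, 0)
    return struct.unpack_from(f"<{count}d", buffer, 4)


payload = encode_floats((1.5, -2.25, 3.0))
```

The count up front also addresses the "a stream doesn't have a length" concern: the reader knows exactly how many bytes to pull off the socket.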

flat gazelle
#

yes, generally, JSON should be fine

#

but in this case, it isn't, as, well, the person measured it

halcyon trail
#

I only mentioned bson because it's exceptionally easy to try, if you're already using json

radiant fulcrum
#

Yes but BSON is made for storage rather than transfer

halcyon trail
#

certainly, protobuf, or Cap'n Proto, etc, are better solutions, they're just a little more work to set up

radiant fulcrum
#

It serializes to be a much bigger size than JSON because of metadata

flat gazelle
#

I do find it odd, since I was sending 4G packed datasets with JSON over http realtime

halcyon trail
#

it's not much bigger. It depends what you are storing.

#

If you are storing large arrays of floats, for example, it can be smaller

flat gazelle
#

but well, I don't know what the exact specifics here are

radiant fulcrum
halcyon trail
#

lets

#

amusingly, I just remembered that we moved some data coefficients for models from json to bson, I actually have an email from a coworker with sizes of some of these files:
Json: 375M
BSON: 134M
MSGPACK: 91M

radiant fulcrum
#

Are you sure the JSON wasn't pretty formatted

halcyon trail
#

I don't think so. but even so, it wouldn't explain a 3x in size. It's not very nested.

radiant fulcrum
#

bearing in mind that with BSON you add a considerable amount of metadata per field

#

every field has the type and key and the data itself

  • you have the metadata for each document, which is another 2 bytes per doc for the size + the delimiters
#

So technically speaking i dont think it's ever possible to make BSON be smaller than JSON without compression

halcyon trail
#

....

#

you're literally looking at a contradictory data point, first of all.

#

second of all, your comment makes me genuinely think that you don't realize that a float is smaller in binary than in text

radiant fulcrum
#

Im aware of that though you have to have a considerably sized integer or float to make it lesser than the text representation accounting for metadata

halcyon trail
#

it doesn't need to be "considerably sized" in the case of a float, it's just storing the full precision

#

anyway, facts are facts

undone hare
#

Floats can be really huge

grave jolt
#

well,

#

JSON doesn't store floats, does it?

#

as in, it doesn't specify the precision or anything like that

radiant fulcrum
#

they're just stored as their text representation, doesnt have any concept of floats no

#

so 3.1 is just stored as 3 bytes

#

not the full 16 8* bytes

grave jolt
#

16?

#

why 16?

radiant fulcrum
#

well, if it's a f64 / double

grave jolt
#

you mean 8 bytes then?

radiant fulcrum
#

fuck yes

grave jolt
#

my point was that if you encode it as a double, you might lose information

undone hare
#

I do this way too often

halcyon trail
#

well, usually you compute it as a double to start with

#

it depends where your numbers are coming from

#

i shouldn't say usually I suppose

#

but anyhow you can see a trivial example where json is larger than bson in two minutes:

import random

d = {"hello": [random.random() for i in range(1000000)]}
#

for me this results in a 20 meg json file, and a 16 meg bson file

#

messagepack is only 8.6 megs though
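
A stdlib-only sketch of the same effect. It does not use bson or msgpack; it just compares JSON text against raw packed doubles via `struct`, which is roughly the floor any binary format is working towards:

```python
import json
import random
import struct

random.seed(0)      # fixed seed so the run is repeatable
values = [random.random() for _ in range(1000)]

as_json = json.dumps(values).encode("utf-8")
as_binary = struct.pack(f"<{len(values)}d", *values)    # 8 bytes per double

print(len(as_json), "bytes as JSON text")
print(len(as_binary), "bytes as packed doubles")
```

A full-precision double takes well over 8 characters as decimal text, plus a separator, so the text form comes out larger for data like this.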

undone hare
#

If you share that many data through json, you are probably doing something wrong

#

This is really inefficient due to the ascii serialisation

halcyon trail
#

ideally yes, sometimes you don't have control though.

grave jolt
#

laughs in a 3-gigabyte SQL query

radiant fulcrum
#

also you can wang that though a compression algo and life is much nicer, although you can do that with any binary format really

halcyon trail
#

but if you do have control over both ends, and you care about perf, then yeah json is just a pretty bad choice

radiant fulcrum
#

again sorta depends really

#

IO is still your slowest thing

#

sure, if you have a big array of floats like that BSON will be smaller (although others will be even smaller), which will mean less data to transfer, but if it's a bunch of strings etc. there's a good chance JSON will be smaller

#

I mean I just stuck that 20 meg JSON file it produced into gzip and got some 8.8MB output

halcyon trail
#

gzipping is expensive

#

you're taking "IO is all that matters" as an article of faith at this point

radiant fulcrum
#

yeah it is a pretty expensive compression

halcyon trail
#

unzipping is expensive, parsing strings into floats is actually also quite expensive

radiant fulcrum
#

actually that was Zlib not gzip sorry

halcyon trail
#

At any rate, all these approaches in the end are much slower than approaches with schemas

radiant fulcrum
#

yes

halcyon trail
#

If i were doing something like this I'd definitely be using something more like protobuf from day 1

#

it also, most likely, saves you having to write some kind of reasonable dataclass/struct to hold the data on either side. when you are sending data between multiple languages protobuf-like approaches are hard to beat

radiant fulcrum
#
20269613 Bytes in
9578680 Bytes Out from zlib
0.20867420000000003 s
9578692 Bytes Out from gzip
0.22591870000000014 s
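
The script behind those numbers isn't shown; a measurement along these lines could be sketched as follows, assuming the same kind of random-float payload (scaled down here) and the stdlib `zlib` and `gzip` modules at their default levels:

```python
import gzip
import json
import random
import time
import zlib

random.seed(0)
raw = json.dumps(
    {"hello": [random.random() for _ in range(100_000)]}
).encode("utf-8")

for name, compress in (("zlib", zlib.compress), ("gzip", gzip.compress)):
    start = time.perf_counter()
    packed = compress(raw)
    elapsed = time.perf_counter() - start
    print(f"{len(raw)} bytes in, {len(packed)} bytes out from {name}, {elapsed:.3f} s")
```

Timings will vary by machine, and the decompression cost on the receiving end would need to be measured separately.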
halcyon trail
#

i've actually now just out of curiosity been trying to create something that will be smaller in json than messagepack, and have not been successful

#

messagepack must be fairly clever

#
import random
import string

def get_random_string(length):
    # choose from all lowercase letters
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for i in range(length))

d = {get_random_string(10): get_random_string(10) for i in range(1000)}
#

this still creates a 28K json file and 22K msgpack file

radiant fulcrum
halcyon trail
#

@grave jolt touche 🙂 didn't try it

static bluff
#

What do you guys think about a built in 'regex' object, designed for building complex regular expressions pattern by pattern, and pretty printing them (plus some other functionality I guess)

grave jolt
#

Isn't that just a parser combinator library?

static bluff
#

I mean, someone who is half decent with regular expressions would probably have an easier time just writing it, but having a sort of 'toolkit' where you can go command by command, providing only a minimal amount of actual pattern, might be helpful for some

#

Oh probably. I don't really know what that is, but it sounds about right

flat gazelle
#

It would be nice to have some options for parsing in the stdlib, though I would prefer something that can also support irregular expressions

grave jolt
#

well, there's lark

flat gazelle
#

Lark often feels like overkill. Sometimes, you just need to parse sexprs without needing 50+ lines

grave jolt
#

well, yes

paper echo
#

there's also CSON and i think some other binary json-like thing

static bluff
#

has no idea what you guys are talking about 0.0

paper echo
#

or cjson?

#

idk

#

i know neovim settled on msgpack as their message format

warm wadi
#

that person should really add more context to the problem he’s solving with rust. I have more questions lol

#

Like, why not just use a c extension to process it within Python and forget the whole json business all together

paper echo
#

it sounds like their application already has a client-server architecture @warm wadi

#

however there might be a more specific format that makes sense for their usage

#

e.g. if it's a 500 kb array, maybe they should use the numpy data format

warm wadi
paper echo
#

who's to say that they're on the same machine, or even the same local network?

#

maybe there are other good reasons why they need or want client-server?

halcyon trail
#

but like salt said, it's already client-server, we should assume it's client-server for a reason

warm wadi
#

But that’s not the problem or point at all. Read that post again. They are happy with pickle performance of Python on both sides. Now only on server side they want to use rust to improve computation performance

#

So it’d become Python client and rust server

#

All I’m curious about is why can’t they use a c extension on Python server to improve computation performance. Then they don’t have to fight with encoding decoding stuff

halcyon trail
#

i read the post....

warm wadi
#

Then what part have I understood wrong?

halcyon trail
#

a single python program is also simpler than a python client and python server

#

but that's not what they have. so there's probably a reason for that, right?

paper echo
#

i understand

#

@halcyon trail they are saying, instead of rewriting the server in rust, they could use a c extension to do the computation within the python server

#

or i suppose rust with cffi (?)

halcyon trail
#

okay, I misunderstood dave's post then, not the original post

paper echo
#

i did too

halcyon trail
#

To me personally, that actually seems worse, but maybe it's a matter of taste

paper echo
#

it definitely does fix the "serialization is now really slow" problem

warm wadi
#

(You can address me as he/him, thanks 🙂 )

halcyon trail
#

Oh, I just said "dave" there because it was getting confusing

#

not because of pronoun unsure-ity

paper echo
#

that said, pickle is potentially very dangerous anyway - what if the client and server have different python versions? or any of the million other things that can go wrong with unpickling
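
One of the things that can go wrong, sketched with a deliberately harmless payload: unpickling calls whatever callable the data names. The class name `Innocent` and the expression being evaluated are made up for this illustration.

```python
import pickle

class Innocent:
    # __reduce__ tells pickle how to rebuild the object: call the returned
    # callable with the returned arguments. A hostile payload can name any
    # importable callable; a harmless eval of arithmetic is used here.
    def __reduce__(self):
        return (eval, ("6 * 7",))

payload = pickle.dumps(Innocent())
result = pickle.loads(payload)

# The receiver did not get an Innocent instance back; it ran eval instead.
print(type(result).__name__, result)
```

This is why pickle is only reasonable when both ends are trusted, regardless of how fast it is.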

halcyon trail
#

people tend to overuse pronouns a lot in technical discussions, one of the first things I remember my boss drilling into me. The number of times that misunderstanding of "it" has cost 10 minutes...

#

Yeah, I mean, the "serialization is very slow" problem can be fixed in many ways, there's nothing that special about pickle

#

I actually think protobuf is a pretty nice and obvious solution here

grave jolt
#

I'm only half joking

halcyon trail
#

it's a 2 for 1 value really, because protobuf gives you a) a serialization approach, b) an automatic translation/representation of data in both Rust and python at the same time

#

Writing extensions can still be quite a bit of work, and people tend to bring in libraries for that anyway if it's non-trivial, e.g. pybind11

#

but I guess it just depends

warm wadi
#

sometimes, if the code really only uses the standard library, simply switching to PyPy gives an ample performance boost. But again, there are hundreds of other things to care about for the long term

paper echo
#

@naive apex what kind of data is this? a big array of numbers? some kind of deeply nested dicts and lists?

grave jolt
#

!rule 9 @azure siren We don't allow requests or offers of paid work here.

fallen slateBOT
#

9. Do not offer or ask for paid work of any kind.

azure siren
#

Sorry

static bluff
#

Maaaaaan

#

You guys are the best. I feel like I'm just drinking in knowledge talking with you all

sullen wolf
#

So many smart people in here, definitely will come here to ask questions in the future. Thanks for having me.

modern bough
#

Noob question, but can anyone point me towards some resources for learning Python VM bytecode?

prime estuary
# modern bough Noob question, but can anyone point me towards some resources for learning Pytho...

Well, the dis module docs https://docs.python.org/3/library/dis.html#python-bytecode-instructions has a list of all the current bytecodes and their functionality, and the opcode module has a bunch of lists with the actual indexes. One key thing about the behaviour is that CPython uses a stack to hold all in use data - load instructions push to the top of the stack, then operators pop their inputs, and push the result.

You may also want to consult ceval.c, which implements all the bytecodes and the core eval loop.
https://github.com/python/cpython/blob/main/Python/ceval.c
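
The push/pop behaviour can be sketched with a toy stack machine. This is not CPython's real instruction set: the opcode names and the `(opcode, argument)` tuple format are invented for the example.

```python
# A toy stack machine, not CPython's real instruction set: the opcode
# names and the (opcode, argument) tuple format are made up for illustration.
def run(program, variables):
    stack = []
    for opcode, arg in program:
        if opcode == "LOAD":        # push a variable's value
            stack.append(variables[arg])
        elif opcode == "CONST":     # push a literal
            stack.append(arg)
        elif opcode == "ADD":       # pop two inputs, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "STORE":     # pop the result into a variable
            variables[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return variables

# Roughly what `a = x + 1` would look like on this machine.
program = [("LOAD", "x"), ("CONST", 1), ("ADD", None), ("STORE", "a")]
print(run(program, {"x": 41}))
```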

modern bough
prime estuary
#

Also for reference, the columns in dis are in order the line number, bytecode index, opcode name, opcode parameter (normally 0-255, with EXTENDED_ARG up to 4 bytes), then the decoded value of the parameter if useful (var name, constant value, etc).

#

The code object has a bunch of tuples the opcodes index into, like the constants array, the names array for global names looked up, etc.

#

Lines with a >> at the start are detected as the destination of a jump instruction.
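
A small way to look at those columns and tuples yourself. The function `add_one` is just an example; the exact opcode names printed depend on the Python version, so no output is shown.

```python
import dis

def add_one(x):
    return x + 1

# Human-readable listing; exact opcode names vary between Python versions.
dis.dis(add_one)

# The same information as objects: offset, opcode name, raw argument,
# and the decoded form of the argument.
for ins in dis.get_instructions(add_one):
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)

# The tuples on the code object that opcode arguments index into.
code = add_one.__code__
print(code.co_consts, code.co_varnames, code.co_names)
```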

static bluff
#

I don't think I'm ever going to understand this parser O.o

pliant tusk
#

afaik, negative opcode arguments do not occur in generated bytecode

modern bough
prime estuary
#

It does output it in order, but it just displays the offsets and line counts so you can keep track.

flat gazelle
#

!pban @wintry herald spam

fallen slateBOT
#

failmail :ok_hand: applied purge ban to @wintry herald permanently.

velvet cradle
#

I don't know how to add graphics yet. Anyway, I'm trying to make a game that is almost completely reliant on achievements; achievements are how you beat the game, and like I said I don't know graphics yet, so it is going to be a text-based game. Any ideas for the game and the name for the game?

static bluff
#

Not to be mean or anything, but its the truth

real ruin
#

I'd expect the description to be accessed by msg.embeds.description

rich cradle
#

If you need node.js help, you should ask in off-topic

fallen slateBOT
static bluff
#

I'm starting to do the reading on PEGs and Python's new parser. I wanted to check my understanding as it stands, and ask a couple of questions of you guys

#

So, a parsing expression grammar is a set of rules written in a creole not unlike regular expressions, and is used to define the various patterns that constitute valid syntax. Unlike regular expressions however, no 'standard' universal-across-languages procedure exists to apply the expression against text. Additionally, a parsing expression assumes whatever system is applying the rules is capable of recursion and other mechanics not available in regular expressions

#

To apply the expression, the expression is fed to a 'parser generator' (possibly alongside a 'metagrammar': an additional set of rules/specifications which tell the parser generator how to interpret the expression) which generates actual code capable of applying the patterns to text (or a stream of tokens)

#

Unlike Python's original pgen parser, a 'left-recursive pushdown parser with 1-token lookahead' (I have very little understanding of what that means), a PEG-enabled parser is capable of both infinite lookahead and infinite lookbehind (left-recursion???). Additionally, it is a 'recursive-descent' parser: one which checks a given alternative all the way to completion or failure, consuming input in the event of success and moving on to the next alternative in the event of failure without consuming input

#

The addition of an 'action' notation with a PEG enables an abstract syntax tree, as opposed to a concrete syntax tree, to be built directly within the parser. Use of 'memoization' (caching) and a few other tricks to save memory keep the parser running at linear speed

#

Did I get anything wrong? Have I displayed any poor or partial comprehension of anything important?

#

*Addendum: left recursion and infinite look ahead/behind allows for significantly more readable and sensical patterns, with fewer 'hacks' and reliance on post-processing
*Addendum: actions specified within the grammar act not unlike callbacks, and are used to actually generate the nodes constituting the ast (???)

paper echo
#

I believe a PEG is a grammar for expressing certain kinds of languages, and there are time-efficient parsing algorithms for parsing languages expressed as PEGs

#

Afaik the linear time algorithm you described is called "packrat parsing"

#

I guess because it memoizes a lot of stuff
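
A minimal sketch of the idea, assuming a tiny made-up grammar (`expr <- atom ('+' atom)*`, `atom <- number / '(' expr ')'`). Each rule is a function of the input position, and `functools.lru_cache` does the memoizing. This is not how pegen is implemented; it only shows ordered choice, failure without consuming input, and per-position caching.

```python
from functools import lru_cache

def make_parser(text):
    # Each rule takes a position and returns (value, next_position) on
    # success or None on failure. Failure consumes no input because the
    # caller simply keeps its own position.

    @lru_cache(maxsize=None)          # the "packrat" part: memoize per position
    def number(pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    @lru_cache(maxsize=None)
    def atom(pos):
        # Ordered choice: atom <- number / '(' expr ')'
        result = number(pos)
        if result is not None:
            return result
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None:
                value, after = inner
                if after < len(text) and text[after] == ")":
                    return value, after + 1
        return None

    @lru_cache(maxsize=None)
    def expr(pos):
        # expr <- atom ('+' atom)*
        result = atom(pos)
        if result is None:
            return None
        total, pos = result
        while pos < len(text) and text[pos] == "+":
            nxt = atom(pos + 1)
            if nxt is None:
                break
            value, pos = nxt
            total += value
        return total, pos

    return expr

def parse(text):
    result = make_parser(text)(0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"cannot parse {text!r}")
    return result[0]

print(parse("1+(2+3)+40"))
```

The `int(...)` and `total += value` lines play the role of grammar actions: the result is built directly while parsing rather than from a separate concrete syntax tree.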

static bluff
#

That's my understanding also

#

I guess the most important question—the possibly months-of-work-saving question is

#

Assuming no gigantic changes to the nature of the system, could I apply python's native parser generator to a modified version of Python's PEG and have the resulting parser work?

static bluff
#

Well, I'm building a language whose syntax is based off Python's. A few minor tweaks but the real differences come from implementation, not syntax

true ridge
#

it is possible. Currently CPython's parser generator

#

outputs in 2 different languages, c and python

static bluff
#

Could I take Python's PEG, modify it as needed, and feed the modified expression to the same parser generator Python uses to create a working parser

#

🤤

true ridge
#

and the C parser is pretty specific to CPython since it uses a lot of internal functions

#

but if you were to use the Python generator and port the grammar, then it would work

#

In fact, there is already a work in progress PR to do so

static bluff
#

Being able to proceed rewriting only the grammar, and not the generator, would be a godsend

true ridge
#

and i believe author of that PR is also working on a language that is based on python (a python superset tbmp)

prime estuary
#

Or use another parser generator, there's quite a number around.

true ridge
#

Pegen is pretty cool tbh, especially with all the actions and its custom expansion forms like ','.something+ etc

static bluff
#

So it sounds to me as though my way forward is to treat the parser generator itself as a black box for now and instead focus on having a thorough understanding of the language used to write the expression.

#

Write the expression, feed it to the generator, and I guess see what happens?

true ridge
#

if you are planning to make small changes on the grammar, you can even use the parser as is.

static bluff
#

As it stands now, the only difference is a few additional operators as well as multiline lambdas (through arrow notation)

#

The latter might be a bit tricky. Blocks within blocks

true ridge
#

for cases like this, i go with tokens. For example if you'd like to add something like $name, then you can simply alter the tokenizer to recognize ($) and then manually edit the token stream to replace $<something> with the form of __name_<something> and after the parser creates the AST, go over all the identifiers and replace the custom forms with their own nodes
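
A simplified sketch of that approach. A regex stands in for the tokenizer change (an assumption made to keep the example short), and the function name `parse_with_dollar_names` is hypothetical.

```python
import ast
import re

PREFIX = "__name_"   # marker form taken from the message above

def parse_with_dollar_names(source):
    # Stand-in for editing the token stream: a regex rewrites $foo to
    # __name_foo. Unlike a real tokenizer pass, this would also rewrite
    # a "$foo" that appears inside a string literal or comment.
    rewritten = re.sub(r"\$([A-Za-z_]\w*)", PREFIX + r"\1", source)
    tree = ast.parse(rewritten)

    # After parsing, find the identifiers that carry the marker. A real
    # implementation would swap them for its own node type here.
    dollar_names = [
        node.id[len(PREFIX):]
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id.startswith(PREFIX)
    ]
    return tree, dollar_names

tree, found = parse_with_dollar_names("total = $price + tax")
print(found)
```

The point of the trick is that the stock parser never sees unfamiliar syntax, so the grammar itself does not have to change.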

true ridge
static bluff
#

You're not wrong with regards to your approach, but it would be a missed learning opportunity for me to go that route

#

What I'd much rather do is focus on learning how the grammar works first and feed it through a working generator, and once I know the grammar is sound, backtrack and build my own generator

#

Having both as question marks would make things way too ambiguous

true ridge
#

if your main purpose is learning, then I guess the proper way would be not caring too much about thoroughness (like how much of esoteric stuff that you could parse) but rather find a version of old python grammar (perhaps something 3.8<) and try to write a parser for it (or even parser generator, if you don't like hand written stuff)

true ridge
static bluff
#

Ahh, in my reading I've seen that the core devs feel its time to put pgen out to pasture

#

Too old

true ridge
#

well pgen is gone (there is still a fork of it living under lib2to3) but it is deprecated

#

and will be gone soon

#

though I'd say an LL(1) parser is much more fundamental and simpler than the other variants out there

static bluff
#

I'm glad to hear I'm not completely misunderstanding the problem. I think I might be on the right track

#

I'm glad I took the time to write this all out. I was going to take a shower. But the process of building the parser seems much less like magic now that I've had a chance to put it all out in words

#

😄

true ridge
#

it is indeed really fun to work on. If you are interested in going even deeper, I'd really recommend 'Parsing Techniques: A Practical Guide' for other different methodologies

static bluff
#

Oh thank you!

boreal umbra
#

It seems that if you're working on a bunch of folders of interrelated Python files that aren't part of a library (you haven't installed it with pip install -e), the best way to avoid import errors is to use python -m from the root folder of the project. Am I right in thinking this?

raven ridge
#

it depends on the particular structure, but I think yes, python -m is more likely to work than anything else

prime estuary
#

Yes, since then sys.path keeps your working directory.

silk pawn
#

so i'm not exactly clear how python bytecode is turned into instructions for the computer, but basically i was wondering if, if python has, for example, two binary add opcodes in a row, if it used SIMD to execute it, since virtually every CPU supports that nowadays

#

i've been playing with simd in cython, and it's really cool, but i can't figure out how to check if python uses them

#

and if it doesn't, why not? guido had said that speeding up python is now a major goal

spark magnet
#

@silk pawn bytecode isn't turned into machine instructions; it is interpreted by a giant C switch statement.

silk pawn
#

oh

#

can you point me to the big switch table on github

spark magnet
#

@silk pawn and the add opcode has to figure out what "add" means for the object at the top of the stack.

silk pawn
#

let's say it determines that the object is a primitive int

spark magnet
fallen slateBOT
#

Python/ceval.c line 1813

switch (opcode) {
spark magnet
silk pawn
#

sorry wrong terminology, i meant like if it determines that it's like a basic add for a C int

#

god i can't phrase this

spark magnet
silk pawn
spark magnet
#

it will have two BINARY_ADD bytecodes, and will call PyNumber_Add twice

silk pawn
#

because in c, i believe you can do (some syntax omitted)

int x = 1
int y = 2
int z = 3
int a = x + y
int b = y + z

and if you use some simd stuff then it does the adding in one instruction

spark magnet
#

this is CPython we're talking about. Other implementations like PyPy could be smarter

spark magnet
silk pawn
#

yes i understand that much, but i'd think python could try to emulate this behavior to make it faster

#

what is the barrier to python doing this

pliant tusk
#

i think the reason it doesnt is because python has very few guarantees about the type of an object

spark magnet
pliant tusk
#

the statement a = x + y results in LOAD_NAME 'x' LOAD_NAME 'y' BINARY_ADD STORE_NAME 'a'

#

BINARY_ADD is generic, so it will work for any object that defines __add__
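
To see the generic dispatch in action, here is a small sketch. The `Money` class is made up; the opcode is named BINARY_ADD on older CPython versions and BINARY_OP on 3.11 and later.

```python
import dis

class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

def add(x, y):
    return x + y

# The same compiled function handles ints, strings and user classes,
# because the add instruction dispatches on the runtime type.
print(add(1, 2), add("a", "b"), add(Money(150), Money(250)).cents)

# Show which binary opcode this interpreter version emitted.
print([ins.opname for ins in dis.get_instructions(add)
       if ins.opname.startswith("BINARY")])
```

Since the bytecode cannot know which of these cases it will get, there is nothing fixed-width for the interpreter to vectorize.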

silk pawn
grave jolt
#

the problem always boils down to the fact that Python doesn't know what type stuff is at compile time, I guess

spark magnet
#

@silk pawn did you see the comment in the BINARY_ADD switch case?

silk pawn
#

i saw the comment from victor about not micro optimizing

prime estuary
#

It's impossible for Python to know, unless everything's a constant.

#

And they're all local variables.

gleaming rover
#

yeah

#

if you need to add fast…you have numpy

#

and if you need to add very fast…numpy + numba

silk pawn
#

ok cool

#

thanks for answering, nedbat, gm, fix, chilaxan, teamspen

raven ridge
grave jolt
#

Interesting issue about subtyping and type narrowing
https://github.com/microsoft/pyright/issues/1899

from typing import Union, TypedDict

class Foo(TypedDict):
    x: int

class Foo2(Foo):
    y: int

class Bar(TypedDict):
    y: str

def f(foobar: Union[Foo, Bar]):
    if 'y' in foobar:
        print(foobar['y'].lower())

x: Foo2 = {'x': 1, 'y': 2} 

f(x)  # fails at runtime: AttributeError, because foobar['y'] is an int here
raven ridge
#

Python doesn't have a special type of int that wraps a native integer.

#

every int in CPython is represented as an array of base 2**30 digits.

halcyon trail
#

I think int in cpython is arbitrary number of digits

spark magnet
halcyon trail
#

I guess that is a way to look at it

spark magnet
#

it's the way the implementation thinks of it.

halcyon trail
#

I would have still thought in terms of an array of binary digits that expands in multiples of 30

#

I wonder why 30

#

I guess it uses the remaining 2 bits for something, not immediately obvious what

spark magnet
#

not sure, it might be so that overflows of digit/digit ops don't become inconvenient.

halcyon trail
#

That makes sense

#

It can just do 32 bit operations, and if the most significant bit gets set, set it back to zero and it knows to set the least significant bit in the next one

spark magnet
#

the code has a constant, PYLONG_BITS_IN_DIGIT, which is either 15 or 30

halcyon trail
#

I wonder why go this route when a huge fraction of machines running python are 64 bit

spark magnet
#

perhaps because 2**30 * 2**30 will fit into a 64-bit int

#

From a comment in the code: Type 'digit' should be able to hold 2*PyLong_BASE-1, and type 'twodigits' should be an unsigned integer type able to hold all integers up to PyLong_BASE*PyLong_BASE-1.

halcyon trail
#

Interesting stuff

spark magnet
#

and twodigits is uint64_t
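
The digit size is visible from Python through `sys.int_info`. The sketch below splits an int into digits of that size to mirror the idea; `to_digits` is a made-up helper and does not read CPython's actual internal array.

```python
import sys

bits = sys.int_info.bits_per_digit      # 30 on typical 64-bit builds, else 15
base = 1 << bits

def to_digits(n):
    # Split a non-negative int into base-2**bits digits, least significant
    # first. This mirrors the idea of CPython's layout; it does not read
    # the real internal array.
    digits = []
    while True:
        n, digit = divmod(n, base)
        digits.append(digit)
        if n == 0:
            return digits

n = 2**100 + 12345
digits = to_digits(n)
print(bits, digits)

# Reassemble to check the decomposition.
assert sum(d << (bits * i) for i, d in enumerate(digits)) == n
# The product of two digits always fits in an unsigned 64-bit integer.
assert (base - 1) * (base - 1) < 2**64
```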

halcyon trail
#

I wonder how it compares with arbitrary width integer implementations in C++

halcyon trail
#

I suppose you could actually compare them directly, using the C code and not going via python

cold dew
#

Hi there I'm new to the community and I need help with python for an assignment

versed fable
#

got pinged....

spark magnet
static bluff
#

Another one, eh?

halcyon trail
#

what is a "raid"

#

not a big discord user

#

some kind of attempt to do the discord equivalent of DDOS

raven ridge
#

yep. spam messages to disrupt conversation.

versed fable
static bluff
#

Thoughts on building a decorator to dynamically hint a method? Would I rebuild the method by a call to types.FunctionType and provide the new annotations as an argument?

grave jolt
static bluff
#

I'm actually just getting to the point of asking myself that very question

grave jolt
#

🙂

static bluff
#

Just, I find this rather unpleasant

    def generateTokens(self, whitespace:str=None, comment:str=None, number:str=None,
                       string:str=None, keyword:str=None, operator:str=None, identifier:str=None,
                       **matchgroups:ExpressionToplevel.Matchgroups):
grave jolt
#

Well, the whole point of type hints is that they provide documentation, and that tools like mypy and pyright understand them. If you generate the typehints dynamically, you lose all of the benefits

#

What does that function do?

static bluff
#

I'd much prefer (for my own language)

@annotate(matchgroups=ExpressionToplevel.Matchgroups)
@annotate(whitespace=str, comment=str, number=str, string=str)
@annotate(keyword=str, operator=str, identifier=str)
def generateTokens(self, whitespace, comment, number, string, keyword, operator, identifier):
#

Sorry that took so long to type. But it's only just dawning on me in a formal sense that the language doesn't care about hints; introspection tools do, and they do so by searching the source literal
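
For what it's worth, a decorator like that does not need `types.FunctionType`; updating `__annotations__` is enough at runtime. This is a sketch of the hypothetical `annotate` decorator from the discussion, with a cut-down, made-up `generate_tokens` function.

```python
import inspect

def annotate(**hints):
    # Hypothetical decorator from the discussion: attach annotations at
    # runtime instead of writing them in the signature.
    def decorator(func):
        func.__annotations__.update(hints)
        return func
    return decorator

@annotate(whitespace=str, comment=str)
@annotate(number=str)
def generate_tokens(whitespace=None, comment=None, number=None):
    return whitespace, comment, number

# Runtime introspection sees the hints...
print(generate_tokens.__annotations__)
print(inspect.signature(generate_tokens))
# ...but static checkers such as mypy or pyright read the source and do
# not execute decorators, so they would treat the parameters as unannotated.
```

So the trade-off raised above stands: the signature gets tidier, but the tools that give hints their value no longer see them.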

native flame
static bluff
#

Signatures have always been a bit of a sore spot for me. In a perfect world, I'd prefer

@annotate(matchgroups=ExpressionToplevel.Matchgroups)
@annotate(whitespace=str, comment=str, number=str, string=str)
@annotate(keyword=str, operator=str, identifier=str)
@default(whitespace=None, comment=None, number=None, key=None)
@default(operator=None, identifier=None, matchg=None)
def generateTokens(self, whitespace, comment, number, string, keyword, operator, identifier):
static bluff
grave jolt
#

what does this function do? why does it have so many parameters?

static bluff
#

And hinting like that throws off the feng-shui for me (all of this is completely topical, and unimportant by the way)

#

Well as the name suggests, it generates tokens. Each one of those arguments is a capture group as returned by match.groups()

#

So it's either a string, assuming the group matched, or None

#

Jeeze, my spelling is bad today

grave jolt
#

Is it possible that more than one of the arguments is not None?

static bluff
#

Generally most of them are

#

Sorry, no, I misread

#

Generally speaking, only one will be not-none

grave jolt
#

So you have a regex like (?P<foo>...)|(?P<bar>...)|..., right?

static bluff
#

More or less

grave jolt
#

So if you don't have a bug, it's impossible to have more than one matched group?

static bluff
#

In this specific case, yes. 'generateTokens' only takes capture groups corresponding to the main lexical categories- only one will ever match. Other similar methods can have multiple matches though, for example

    def generateBaseX(self, number:str=None, integer:str=None, floatpoint:str=None,
                      exponent:str=None, complex:str=None,
                      **matchgroups:ExpressionToplevel.Matchgroups):
        """Generate a number token of integer, floatpoint, exponent, or complex.
        NOTE: Complex takes precedence over exponent, which takes precedence over other formats.
        NOTE: Exponents are floating point numbers by definition."""

        if complex:
            return self.token('NUMBER', 'COMPLEX', self.source.advance, number);

        if exponent:
            return self.token('NUMBER', 'EXPONENT', self.source.advance, number);

        if floatpoint:
            return self.token('NUMBER', 'FLOATPOINT', self.source.advance, number);

        if integer:
            return self.token('NUMBER', 'INTEGER', self.source.advance, number);
grave jolt
#

Why not make a single object for the match result?

#

and then access its fields

static bluff
#

Fix you beautiful son of a bitch

grave jolt
#

👀

static bluff
#

+2

grave jolt
#

Or just accept the match as the argument.

static bluff
#

???

grave jolt
#

the re.Match object
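
A sketch of what accepting the match could look like. The regex and the `number_token` function are made up for illustration; they are not the lexer from the messages above.

```python
import re

# Hypothetical number pattern: one named group per alternative, ordered so
# that complex wins over exponent, which wins over the plain formats.
NUMBER = re.compile(
    r"(?P<complex>\d+(?:\.\d+)?[jJ])"
    r"|(?P<exponent>\d+(?:\.\d+)?[eE][+-]?\d+)"
    r"|(?P<floatpoint>\d+\.\d+)"
    r"|(?P<integer>\d+)"
)

def number_token(match):
    # With alternatives that each carry one named group, lastgroup is the
    # name of the alternative that matched, so no per-group parameters
    # are needed.
    return ("NUMBER", match.lastgroup.upper(), match.group())

for text in ["42", "3.14", "1e9", "2.5j"]:
    print(number_token(NUMBER.match(text)))
```

Wrapping the match in your own class is still possible later; the function signature just stops growing with every new capture group.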

static bluff
#

I could, though I generally prefer having a little more control than that. I might want to switch up how the object gets printed or else add some other functionality

static bluff
#

Is python's parser process a single step? I've seen it separated in some contexts into 'syntax analysis' and 'semantic analysis', but those might be so closely interwoven that they could be executed in a single step

deft pagoda
#

I remember a quick tokenizer made during a beazley talk:

#

!e

import re
from collections import namedtuple

tokens = [
    r'(?P<NUMBER>\d+)',
    r'(?P<PLUS>\+)',
    r'(?P<MINUS>-)',
    r'(?P<TIMES>\*)',
    r'(?P<DIVIDE>/)',
    r'(?P<WS>\s+)',
]

PARSER = re.compile('|'.join(tokens))
Token = namedtuple('Token', 'type value')

def tokenize(text):
    scan = PARSER.scanner(text)
    for match in iter(scan.match, None):
        if match.lastgroup != 'WS':
            yield Token(match.lastgroup, match.group())

print(*tokenize('2 + 3*4 - 5'))
fallen slateBOT
#

@deft pagoda :white_check_mark: Your eval job has completed with return code 0.

Token(type='NUMBER', value='2') Token(type='PLUS', value='+') Token(type='NUMBER', value='3') Token(type='TIMES', value='*') Token(type='NUMBER', value='4') Token(type='MINUS', value='-') Token(type='NUMBER', value='5')
static bluff
#

Can I get a hand my dudes? I've asked in the general and also in a channel, no bytes (get it?)

#
self.source = source if isinstance(source, str) else source.read().decode();
self.sourcelines = io.StringIO(self.source).readlines();
#

I'm taking in input as either a string or a file-like io object. I want to get both the source and its constituent lines in unicode form

#

This approach is better than my original, but still seems off

sacred yew
#

uh is there a reason why you cant just do self.sourcelines = self.source.split("\n")

static bluff
#

I believe that different operating systems use different newline separators, no? Assuming that's true, I need to split the input using the same procedure a file-like object would

grave jolt
#

when reading from file, Python turns newlines to \n

static bluff
#

Oh, well that certainly helps

native flame
#

str.splitlines also exists

grave jolt
#

we shall not speak of the ;

static bluff
#

SPLIT-lines. I KNEW there was a method for that

#

I thought it was readlines

native flame
#

readlines exists too

#

but its not a string method
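
A quick comparison of the options mentioned so far, on a made-up string with mixed line endings:

```python
import io

source = "first\r\nsecond\nthird\n"

# split("\n") keeps the '\r' and yields a trailing empty string.
print(source.split("\n"))
# splitlines() handles \n, \r\n and \r, and drops the line endings.
print(source.splitlines())
# keepends=True preserves the original endings.
print(source.splitlines(keepends=True))
# StringIO does not translate \r\n by default; it only splits on \n.
print(io.StringIO(source).readlines())
```

Note that `str.splitlines` also splits on a few less common separators such as form feed, which may or may not be what a lexer wants.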

static bluff
#

Now, should I actually care about decoding the input? For lexing purposes, does it matter?

grave jolt
#

why are you even splitting the source into lines?

#

hm, I guess it can be useful to store the line number for each token

static bluff
#

In case of a syntax error I need to supply the full line of text on which the error resides. Keeping an array of the lines and referencing them by the lexer's line index seems to me the most logical route
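
Something along these lines, as a sketch. The function name, the 0-based `line_index` and `column` arguments, and the output format are all assumptions about what such a lexer might track.

```python
def format_syntax_error(source, line_index, column, message):
    # line_index and column are 0-based positions tracked by a
    # hypothetical lexer; the output format is made up for this sketch.
    lines = source.splitlines()
    line = lines[line_index]
    pointer = " " * column + "^"
    return f"line {line_index + 1}: {message}\n    {line}\n    {pointer}"

source = "x = 1\ny = 2 $ 3\nz = 3\n"
print(format_syntax_error(source, 1, 6, "unexpected character '$'"))
```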

#

About the decoding?

pallid trout
#

Uh who pinged?

static bluff
#

I mean, yes, I'll want the string in unicode form so it can be matched against the regex, no?
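
That is right for `str` patterns: `re` refuses to mix a `str` pattern with `bytes` input. A short sketch, assuming the input is UTF-8:

```python
import re

pattern = re.compile(r"\d+")          # a str pattern
raw = b"123 abc"                      # bytes, e.g. from a file opened in binary mode

try:
    pattern.match(raw)
except TypeError as exc:
    print("bytes input rejected:", exc)

# Decoding first (UTF-8 assumed here) makes the input match a str pattern.
print(pattern.match(raw.decode("utf-8")).group())
```

Opening the file in text mode avoids the manual decode, since the file object then hands back `str` directly.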

unkempt rock
#

who pinged me?

grave jolt
true ridge
static bluff
#

Is this normally achieved through if statements, or is there some sort of grammar applied?