#Python vs gleam parallelism problem

untold shoal
Hi! So, I'm trying out this thing called Gleam. Have you ever heard of it before?

Anyway, I've heard a lot about the parallelism model in Erlang and how performant it is. So, as a Python developer, I was very interested in it and decided to check it out. I recreated an example from the homepage of the site in Python.

Python parallel task example:

import multiprocessing as mp

def spawn_task(i: int):
    print("Hello from python task", str(i))

def main():
    mp.Pool().map(spawn_task, range(200_001))

if __name__ == "__main__":
    main()

Gleam parallel tasks example:

import gleam/int
import gleam/io
import gleam/list
import gleam/otp/task

fn spawn_task(i) {
  task.async(fn() {
    let n = int.to_string(i)
    io.println("Hello from gleam task " <> n)
  })
}

pub fn main() {
  list.range(0, 200_000)
  |> list.map(spawn_task)
  |> list.each(task.await_forever)
}

And I got a very strange result. Python was about 7 times faster than Gleam. And yes, I built and exported the project to erlang_shipment, so this is a problem with the program, not the runtime.

So, now I have a question: what did I do wrong?

P.S. To be honest, I also wrote a Python script using the asyncio library in the same way. And yes, it runs slower, but it's still faster than Gleam.

Python asyncio example:

import asyncio

async def spawn_task(i: int):
    print("Hello from asyncio python task", str(i))

async def main():
    await asyncio.gather(*[spawn_task(i) for i in range(200_001)])

if __name__ == "__main__":
    asyncio.run(main())

calm prairie

Personally I've never found these kinds of non-representative benchmarks useful. But some differences:

  • It seems Pool().map() splits the data into chunks, so it's not running 200k separate tasks but rather far fewer tasks that each map multiple items.
  • The Gleam version will wait for the result of each task in order, so if there are later tasks that are done first, it will still wait for the earlier ones.
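
To put a rough number on the chunking point, here is a small sketch. It assumes CPython's current default for `Pool.map`, where the chunk size is roughly the item count divided by four times the number of workers (an implementation detail, not a documented guarantee). The helper names `approx_chunksize` and `approx_chunk_count` are made up for illustration.

```python
import math
import os


def approx_chunksize(n_items: int, n_workers: int) -> int:
    """Approximate Pool.map's default chunk size (assumed CPython detail)."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def approx_chunk_count(n_items: int, n_workers: int) -> int:
    """How many chunks would be handed to the worker processes."""
    if n_items == 0:
        return 0
    return math.ceil(n_items / approx_chunksize(n_items, n_workers))


if __name__ == "__main__":
    # Pool() sizes itself from the CPU count by default.
    workers = os.cpu_count() or 1
    n = 200_001
    print("workers:", workers)
    print("items per chunk:", approx_chunksize(n, workers))
    print("chunks dispatched:", approx_chunk_count(n, workers))
```

With 8 workers, for example, the 200,001 items would be grouped into 32 chunks of up to 6,251 items each, so the Python version dispatches a few dozen units of work to a handful of processes rather than 200,001 tasks.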

zenith phoenix

All three examples there are doing wildly different things
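
As one illustration of how different they are, the asyncio version never leaves a single thread: a coroutine with no `await` in its body runs from start to finish before the event loop moves on to the next one. A minimal sketch to check this, where `task`, `collect_thread_ids` and `count_threads` are hypothetical names:

```python
import asyncio
import threading


async def task(i: int, seen: set) -> None:
    # No await in the body, so this coroutine never yields to the loop.
    seen.add(threading.get_ident())


async def collect_thread_ids(n: int) -> set:
    seen = set()
    # Same shape as the asyncio example: gather n coroutines at once.
    await asyncio.gather(*[task(i, seen) for i in range(n)])
    return seen


def count_threads(n: int) -> int:
    """Number of distinct OS threads that ran the n coroutines."""
    return len(asyncio.run(collect_thread_ids(n)))


if __name__ == "__main__":
    print("distinct threads used:", count_threads(1_000))
```

This prints `distinct threads used: 1`, so the asyncio script is effectively a sequential loop with scheduling overhead, not a parallel program.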

I'm not really sure what you're trying to measure

In this specific example the bottleneck is going to be the printing

You're spinning up 200,000 processes and then they all fight over stdout, which is a single shared resource that cannot be used concurrently.
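
One way to take stdout out of the measurement is to have each task return its message and let the parent write everything in one go. The sketch below uses a thread pool as a lightweight stand-in for the processes in the examples above. `make_message` and `run_without_shared_stdout` are hypothetical names, and this is only meant to show the shape of the change, not a fair benchmark.

```python
import io
import sys
from concurrent.futures import ThreadPoolExecutor


def make_message(i: int) -> str:
    # The task only builds its string; it never touches stdout.
    return f"Hello from task {i}"


def run_without_shared_stdout(n: int, stream) -> int:
    """Run n tasks, then emit all of their output with a single write."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(make_message, i) for i in range(n)]
        # Results are collected in submission order.
        lines = [f.result() for f in futures]
    if lines:
        stream.write("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    run_without_shared_stdout(1_000, sys.stdout)
```

Passing an `io.StringIO()` instead of `sys.stdout` drops the terminal from the picture entirely, which makes it easier to see how much of the runtime is task creation rather than printing.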