#How unsafe is this?

70 messages · Page 1 of 1 (latest)

safe wyvern
#

I am writing a Rust extension for Python using PyO3 and I am planning to finalise my release for v1.0 which introduces the following async API.

// function signature of `future_into_py` as reference
// pub fn future_into_py<F, T>(py: Python, fut: F) -> PyResult<Bound<PyAny>>
// where
//    F: Future<Output = PyResult<T>> + Send + 'static,
//    T: for<'py> IntoPyObject<'py>

fn hash_async<'a>(&self, py: Python<'a>, bytes: &'a [u8]) -> PyResult<Bound<'a, PyAny>> {
    let seed = self.seed;
    let hasher = self.hasher;
    let bytes_static = unsafe { std::mem::transmute::<&'a [u8], &'static [u8]>(bytes) };

    future_into_py(py, async move { gxhash(hasher, bytes_static, seed) })
}

It uses std::mem::transmute to transmute a bytes array, owned by the Python interpreter, into a static lifetime so that it can be used in future_into_py which is essentially a wrapper for tokio::spawn. I understand that doing this is extremely dangerous but my rationale here is that bytes has the same lifetime as py, which is the Python interpreter. As a Rust extension for Python, should I even care what Tokio is doing with my bytes array if the Python interpreter no longer exists?

AFAIK, the only other way to make this work is to clone bytes which I am not willing to do because it is an order of magnitude slower than just passing the bytes array directly.

My question is:

  1. What are the possible implications for my Python users, if at all?
  2. Is there a better way to do this without affecting performance?
umbral estuary
#

I'm not familiar with pyo3, but this is almost certainly a use-after-free bug

#

if you want to perform the hashing asynchronously then you also have to incrementally read the bytes from the Python world, with the lock held.

#

which says it is an immutable block of bytes, so if your async task holds it, it can borrow &[u8] from that

safe wyvern
umbral estuary
#

sorry, let me clarify: I meant you would take the GIL repeatedly to read chunks of bytes

#

however, PyBytes looks like a much better option than doing any of that

#

so your code should be something like

fn hash_async<'a>(&self, py: Python<'a>, bytes: Py<PyBytes>) -> PyResult<Bound<'a, PyAny>> {
    let seed = self.seed;
    let hasher = self.hasher;

    future_into_py(py, async move {
        let slice = {
            let py = todo!("you need to get a new `Python` here somehow");
            bytes.as_bytes(py)
        };
        gxhash(hasher, slice, seed)
    })
}

I don't know how the part I marked todo! is supposed to work though

safe wyvern
#

Yeah, I am testing it right now. Praying it doesn't clone 😄

safe wyvern
#

as_bytes indeed does not copy.

pub(crate) fn as_bytes(self) -> &'a [u8] {
    unsafe {
        let buffer = ffi::PyBytes_AsString(self.as_ptr()) as *const u8;
        let length = ffi::PyBytes_Size(self.as_ptr()) as usize;
        debug_assert!(!buffer.is_null());
        std::slice::from_raw_parts(buffer, length)
    }
}
#

Anyways, it seems to work. Performance seem to have taken a 25% hit but it's not as bad as cloning.

safe wyvern
unborn cipher
#

its unclear why this function is async

#

depending on what gxhash is doing you'll probably be better off forgetting that async exists and using allow_threads instead

safe wyvern
#

Rayon would be more useful for something like a hash_batch function.

unborn cipher
#

I'm thinking it would be more useful to let the event loop run while you are hashing at the same time

safe wyvern
#

If you are hashing in a web server, do you want to let it block other unrelated requests from coming through?

unborn cipher
#

Assuming you're actually using multiple threads

safe wyvern
#

Python has no way of knowing that the event loop can continue without using async.

unborn cipher
#

But you are holding the gil the entire time.

safe wyvern
#

future_into_py drops the GIL.

unborn cipher
#

Things aren't happening at the same time, they're just taking turns

safe wyvern
unborn cipher
unborn cipher
#

Any time you have a Python token in scope, either explicitly or implicitly, that means no other python code gets to run

safe wyvern
#

I have a bunch of tests that shows that it does not block the event loop while hashing a 10 GB file and I have compared it with a non-async variant as well.

unborn cipher
#

are you not using pyo3_asyncio

safe wyvern
unborn cipher
#

yeah that's good, that you aren't using it

safe wyvern
unborn cipher
#

which the pyo3 async crate might end up doing? It's not entirely clear to me, I wasn't personally involved with it

safe wyvern
#

I think PyO3 implicitly drops the GIL if you aren’t using with_gil.

unborn cipher
#

gather works concurrently, not parallel

safe wyvern
#

As you know, gather essentially just runs the coroutines and waits for a callback.

#

It doesn’t know whats happening in the coroutine, especially if it’s calling non-Python code

unborn cipher
safe wyvern
unborn cipher
#

Also the gil is always held when a pyfunction is called, you can put a python token in the signature

unborn cipher
unborn cipher
#

i think you would be better off having a regular function (in which you call py.allow_threads and do the hashing in that), and use it with asyncio.to_thread

safe wyvern
unborn cipher
#

like rust fn hash<'a>(&self, py: Python<'a>, bytes: &'a [u8]) -> ... { let seed = self.seed; let hasher = self.hasher; py.allow_threads(||{ hash(bytes) }) }

#

Anyway, was fun diving into the async runtimes crate, im much less familiar with it

safe wyvern
#

Yeap

#

I ran a benchmark on both of these implementations

#

and I had the same performance on both of them

safe wyvern
# unborn cipher that's why I linked https://pyo3.rs/v0.25.0/parallelism.html

I ran PyStack on the program and as you can see (thread 1477088), there is no contention for the Python thread. I also found a bug(?) in pyo3_async_runtime. It seems that tokio is spawning more threads than necessary.

Anyways, I looked through the pyo3_async_runtime codebase, and like you said, they never drop the GIL anywhere. My guess is that any threads spawned outside of Python will not contend with the GIL and Python will somehow know to let other bytecode run?

unborn cipher
#

it looks like it ends up calling the spawn method on the runtime (like tokio), so it might get to run on a different thread

safe wyvern
#

And as expected, running all 4 hash units in parallel with asyncio.gather saturates 4 CPU cores

unborn cipher
#

id suggest offloading this to spawn_blocking maybe

safe wyvern
# unborn cipher id suggest offloading this to spawn_blocking maybe

That'd be nice but I'd have to reimplement all the annoying Python async context and cancellation stuff. I tried running 24 hash units (double my core count) in parallel to see if there would be thread starvation, but it seems tokio properly queues them without any extra intervention.

unborn cipher
#

you can do quite a bit of blocking things in tokio's task pool before things really degrade

safe wyvern
#

I'll think about it

#

If I am going to rewrite and use spawn_blocking, I might as well use Rayon instead.