#Help debugging deadlock

49 messages · Page 1 of 1 (latest)

lost pewter
#

I have a WebSocket server with async-tungstenite. After two clients connect, authenticate, and disconnect, the server deadlocks, times out, and fails to accept any new requests.

The relevant code is here: https://github.com/Nextflow-Cloud/easylink/blob/main/src/services/socket.rs

I already drop unused references, however that does not help. I spawn a thread for each new socket connection and log Socket connected but they fail to log the message.

plush egret
#

just sprinkle some printlns if you need to

#

anyway, there are some questionable design decisions here

#

for example, why pub heartbeat_tx: Arc<Mutex<Sender<()>>>,?

#

that mutex is useless

#

i think rust if let Some(client) = clients_moved.get(&id_moved) { let mut socket = client.socket.lock().await; socket.close().await.expect("Failed to close socket"); drop(socket); clients_moved.remove(&id_moved); } looks particularly gnarly, but i am not sure

lost pewter
#

I tried removing it before

#

Further debugging shows that on the close event the clients.remove is deadlocking

#

Interestingly enough, the first time a client disconnects, that doesn't run but the program does not deadlock, but the second time a client disconnects, the program deadlocks

plush egret
lost pewter
plush egret
#

Full error please

#

from cargo check

lost pewter
#
`std::sync::mpsc::Sender<()>` cannot be shared between threads safely
the trait `std::marker::Sync` is not implemented for `std::sync::mpsc::Sender<()>`
required for `Arc<std::sync::mpsc::Sender<()>>` to implement `std::marker::Send`
required for `dashmap::util::SharedValue<RpcClient>` to implement `std::marker::Send`
required because it appears within the type `(std::string::String, dashmap::util::SharedValue<RpcClient>)`
required for `hashbrown::raw::inner::RawTable<(std::string::String, dashmap::util::SharedValue<RpcClient>)>` to implement `std::marker::Send`
required for `lock_api::rwlock::RwLock<dashmap::lock::RawRwLock, hashbrown::map::HashMap<std::string::String, dashmap::util::SharedValue<RpcClient>, RandomState>>` to implement `std::marker::Sync`
required because it appears within the type `[lock_api::rwlock::RwLock<dashmap::lock::RawRwLock, hashbrown::map::HashMap<std::string::String, dashmap::util::SharedValue<RpcClient>, RandomState>>]`
required for `Unique<[lock_api::rwlock::RwLock<dashmap::lock::RawRwLock, hashbrown::map::HashMap<std::string::String, dashmap::util::SharedValue<RpcClient>, RandomState>>]>` to implement `std::marker::Sync`
required for `Arc<DashMap<std::string::String, RpcClient>>` to implement `std::marker::Send`
#

The full error is too long to send here

plush egret
#

ah right, that's std's channel

#

try using an async one, like tokios for example

#

anyway, that's probably not why you are deadlocking

#
if let Some(client) = clients_moved.get(&id_moved) {
     clients_moved.remove(&id_moved);
}
``` should deadlock on its own
#

that client_moved thing is a dashmap, and get creates a lock into the dashmap that will be held for the entire block

lost pewter
#

Interesting, so when remove is called the client is still held which means it deadlocks?

plush egret
#

yeah

#

you could do rust if let Some(client) = clients_moved.remove(&id_moved) { /// }

lost pewter
#

I see, that makes sense

#

So it would be like this

#
                if let Some((_, client)) = clients_moved.remove(&id_moved) {
                    let mut socket = client.socket.lock().await;
                    socket.close().await.expect("Failed to close socket");
                    drop(socket);
                }
plush egret
#

oh right it's key, value

#

yeah

lost pewter
#

Thank you

plush egret
#
let mut socket = client.socket.lock().await;
socket.close().await.expect("Failed to close socket");
drop(socket);``` in itself can be problematic because you're holding this asynchronous mutex across an await
lost pewter
#

How would that best be refactored

#

I do realise that holding locks over await may be bad but I don't see a clear method of fixing it

plush egret
#

(it's tokios docs, not async_std's, but the idea is pretty much the same)

lost pewter
#

Seems like that fixed the deadlock.

plush egret
#

in general you want to get rid of mutexes if you can. If you need one, try putting them in a struct and act through methods on it (that's pretty much what a channel is, by the way)

lost pewter
#

Also the first sentence of the paragraph:
"The feature that the async mutex offers over the blocking mutex is the ability to keep it locked across an .await point. "

plush egret
#

and if you can just use a channel over a mutex do that

#

for example, you can just create a task whose only responsibility is managing a connection, and you communicate with it through channel(s)

lost pewter
#

Ah I see how that would work

#

But it sounds like extra complexity I'd have to manage

plush egret
#

well, the complexity has to go somewhere

#

it's best to tuck it away in a good abstraction so you dont have to worry about it

lost pewter
#

I've moved the socket sending functionality into a separate task

#

The deadlocking just started happening again

#

Still on the close I believe

plush egret
#

i dont have any sus places this time. but it would help if you split up this function, it is very long