Hey, at work we (have to) use a (proprietary) library that previously allowed access from multiple threads. Now they changed it and we can't do that anymore. For over 10 years the program evolved around multi-threaded usage of this library, so it's a total fucking mess (I haven't been employed there for nearly that long, btw). I think the clean solution would be to encapsulate the whole API behind some thread-safe queue (with a single worker thread that accesses the API), but that would take too much time right now (likely many weeks). So I first thought about protecting all API accesses with a single mutex, just to make it work at least. In the past we had one thread waiting for messages from this library in a loop and other threads doing other stuff (like sending messages to it). The message reception function has a timeout, which previously was ~100ms, but to avoid holding the mutex for too long, I lowered it to the lowest possible value, 1ms. Sadly this leads to the receiving thread busy looping and locking the mutex constantly. Other threads therefore have to wait up to 50-70ms for this mutex. Previously our requirement was < 5ms for a single transaction, of which interaction with this library is only a small part, so this is not a good option. I think this is a classic priority inversion scenario, and what I wanted to do next was implement some sort of fifo mutex. I know it's hard to make suggestions without knowing much more, but does anyone see other (smarter) ways to work around this?
#Work around priority inversion problem
just so you know that at least someone is looking and people aren't ignoring your question... well, I don't have a great answer for you, sorry
I'd think that you could centralize the queue itself with a simple lock that is not time bound in really any way
then just put a timer/throttle on the actual thing that pops messages off the queue to send them to the API
in other words, I'm not sure why there needs to be a time throttle on the mutex locking
but, this is all a bit much to wrap my head around and understand, so yeah, don't have a great answer for you, maybe someone else does
It sounds like contention and not priority inversion to me. Contention is just many threads wanting a resource that only 1 thread can have. Priority inversion is when a high priority thread holds a resource that a low priority thread wants, thus effectively giving priority to the low priority thread.
You can't really do much about contention. When 10 threads each want to hold the mutex 20% of the time, that adds up to 200% of the available time, so it's just not possible no matter what you do. You can make an attempt at speeding the operation up somehow so everyone gets enough, but that is probably not possible.
You can give the thread waiting for messages a sleep so other threads get a chance to push their message. Just make sure the mutex implementation doesn't get "clever" and use a spinlock for waits shorter than x milliseconds, which would make it a busy loop with extra overhead.
Another option you could explore is to effectively make your own queue: a std::deque with a mutex that threads push messages into, and a thread that bulk-pushes those messages into the library every x milliseconds. It may or may not reduce contention.
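A rough sketch of that queue idea, in case it helps. `OutboundQueue` and its `send` callback are made-up names; `send` stands in for whatever the library's real send call is. As a swapped-in detail, the worker here is woken by a condition variable as soon as something is queued, instead of polling every x milliseconds:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical wrapper: producers enqueue messages, and a single worker
// thread is the only caller of `send` (a stand-in for the library call).
class OutboundQueue {
public:
    using SendFn = std::function<void(const std::string&)>;

    explicit OutboundQueue(SendFn send)
        : send_(std::move(send)), worker_([this] { run(); }) {}

    ~OutboundQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains remaining messages first
    }

    // Called from any thread; only holds the lock for the push itself.
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::deque<std::string> batch;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested, nothing left
                batch.swap(queue_);          // take everything in one go
            }
            // The lock is released here, so producers are never blocked
            // while the (possibly slow) library call runs.
            for (const auto& msg : batch) send_(msg);
        }
    }

    SendFn send_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

This only covers the sending side. If the library really allows just one thread, the same worker would presumably also have to do the receive polling between batches.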
all solutions that i can think of (there are a few) fail your "i just want to do this quickly" requirement.
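sounds like you got that backwards: priority inversion is when a low priority thread holds a resource that a high priority thread wants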
sleep() is ALWAYS a silly hack, don't. Use events, semaphores, or mutexes. Don't use sleep.
Well yeah, you want to just have the library call a callback and be done with it. I got the impression the library is garbage and a good solution is not possible.
Also I got the impression that part of the problem is the receiver thread locking the mutex all the time and starving the other threads. A sleep is something that can quickly help there. It's terrible, it adds unnecessary delay while still wasting resources, but it will work better than without and might be the quick fix hack that gets things to run within hours, not weeks.
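For what it's worth, the quick hack described above could look roughly like this. `receive_loop`, `ReceiveFn` and the `pause` parameter are invented for illustration; `receive` is a placeholder for the library's blocking receive call, and its return value is an assumption:

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Placeholder for the library's receive call, taking the timeout to use.
// The bool return ("a message was handled") is assumed, not the real API.
using ReceiveFn = std::function<bool(std::chrono::milliseconds)>;

void receive_loop(std::mutex& api_mutex, const ReceiveFn& receive,
                  const std::atomic<bool>& stop,
                  std::chrono::milliseconds pause) {
    while (!stop.load()) {
        {
            // Hold the API mutex only for the short receive call.
            std::lock_guard<std::mutex> lock(api_mutex);
            receive(std::chrono::milliseconds(1));
        }
        // Sleep with the mutex released so that sending threads can get in.
        std::this_thread::sleep_for(pause);
    }
}
```

The pause adds latency on the receive side, so it would need tuning against the < 5ms budget from the original question.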
Thanks for the replies. The fifo mutex actually does solve the issue for single transactions, so I do not think it's contention per se, because the problem is not that all threads wait for the lock; all of them wait except the busy-looping thread. Now the problem is of course that the reception thread will always be second in line and lots of time will be spent waiting for that 1ms timeout, severely limiting the throughput. I think this is not just a software problem, but an organizational one. We shouldn't use this broken pos library and it should never have changed to single-threaded access only. Before I do anything else, I'll ask THEM what they think we are supposed to be doing now. I feel like this whole API makes no sense anymore. Your suggestions were good to think about, thanks a lot for your help!
!solved