#How to parallel this better?
You can't improve something you haven't measured, so I think that would be the first thing to do. There are a lot of performance optimizations you can make, but generally two things stand out as low-hanging fruit. First, try to maintain cache locality: do as many operations on the same data as you can. Second, don't let multiple threads fight over access to the same data. Sometimes that means looking closer at data access. Cache lines are 64 bytes on most systems, so that's the unit that gets read and written when going to main memory. If two threads touch different memory locations that fall into the same cache line, and at least one of them writes, they still contend for that line (false sharing). Memory access that jumps around (pointer chasing) will have the worst performance.
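To make the cache-line point concrete, here is a minimal sketch, assuming each worker can accumulate into a local variable and publish its result once. The class name `LocalAccumulation` and the `work` function are hypothetical stand-ins, not anything from the original question.

```java
import java.util.stream.IntStream;

class LocalAccumulation {
    // Hypothetical stand-in for whatever per-item computation is being done.
    static long work(int i) {
        return (i * 31L) % 7;
    }

    // Each worker sums its own slice into a local variable and writes to the
    // shared array exactly once, so neighbouring slots of `totals` (which
    // likely share a 64-byte cache line) are not hammered in the inner loop.
    static long[] sumPerWorker(int workers, int itemsPerWorker) {
        long[] totals = new long[workers];
        IntStream.range(0, workers).parallel().forEach(w -> {
            long local = 0; // thread-private accumulator
            int start = w * itemsPerWorker;
            for (int i = start; i < start + itemsPerWorker; i++) {
                local += work(i);
            }
            totals[w] = local; // single write to shared memory per worker
        });
        return totals;
    }

    public static void main(String[] args) {
        long grand = 0;
        for (long t : sumPerWorker(8, 1_000_000)) {
            grand += t;
        }
        System.out.println("grand total = " + grand);
    }
}
```

The version to avoid would do `totals[w] += work(i)` inside the loop, which makes every worker write repeatedly to adjacent array slots.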
Maybe virtual threads would help, I would test it
virtual threads are for i/o bound tasks
Yes, but they said there are CPU usage drops, so if I'm not mistaken it might help load CPU
He'll have to provide more information but generally regular threads will be better once he figures out how to load it.
it drops because some of the threads end and find there's nothing left in the pool.
then the while loop restarts and the pool is ... refilled.
if i have 8 threads at the end of the list the 1st worker finishes, looks in the pool, sees it is done, and now sits idle until the other 7 finish. I'm sorry if I'm mangling terms, I trust you get what I mean.
Yea, you'll have to provide more about the computation as to why tasks aren't equal sized.
path tracing is non-deterministic. light enters the scene, bounces around, and monte carlo might end it in a few bounces or in 25.
why is that relevant?
So you can't take smaller chunks?
what are you talking about, "chunks"?
Seems like there is work to be done but it's just sitting in some other thread's queue doing nothing.
that's what I was saying from the start?
I think you'll have to show code. Like what is allPixels, is it an array? Is tracePixel accessing memory in that same array?
It may be intractable, but until you start MEASURING we can guess all day. Do 90% take one bounce? Do 50% take 25 bounces? That sort of thing.
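One way to get that kind of measurement is a histogram of bounce counts per pixel. The sketch below is only an illustration: `traceBounces` is a hypothetical stand-in that fakes Monte Carlo termination with a seeded random number generator, and the 0.7 survival probability is made up. A real tracer would return its actual bounce count instead.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class BounceHistogram {
    static final int MAX_BOUNCES = 25;

    // Hypothetical stand-in for the real tracer: reports how many bounces
    // a path took before it was terminated (or hit the cap).
    static int traceBounces(int pixel) {
        Random rng = new Random(pixel); // seeded per pixel so runs are repeatable
        int bounces = 1;
        while (bounces < MAX_BOUNCES && rng.nextDouble() < 0.7) {
            bounces++;
        }
        return bounces;
    }

    // Counts how many pixels needed 1 bounce, 2 bounces, ... up to the cap.
    static Map<Integer, Long> histogram(int pixelCount) {
        return IntStream.range(0, pixelCount)
                .parallel()
                .boxed()
                .collect(Collectors.groupingBy(
                        BounceHistogram::traceBounces,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        histogram(10_000).forEach((bounces, count) ->
                System.out.println(bounces + " bounces: " + count + " pixels"));
    }
}
```

If most pixels cluster at one end and a few sit at the cap, that skew is what makes fixed-size tasks finish at very different times.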
there's clearly a misunderstanding here. You're concerned about what's going on inside each thread while they're working and I'm concerned about scheduling between tasks.
i have 8 workers that are going to process a hundred pixels 50 times each. because i do a while() { parallel-all-pixels }, there's a moment at the end of the loop where some workers are done and some are not. i want to eliminate the gap and keep them working.
Yea so decrease that 100 pixels to 50 or 25 or whatever. But you have to measure something.
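A sketch of what smaller chunks without a per-pass barrier could look like, assuming the passes for one pixel do not depend on other pixels and a progressive per-pass preview is not needed. `NoBarrierRender`, `tracePixel`, and `render` are hypothetical names; the original code was never posted. Each task owns a small tile and runs every pass for it, so workers only run dry once, at the very end of the render.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class NoBarrierRender {
    // Hypothetical stand-in for one Monte Carlo sample of one pixel.
    static double tracePixel(int pixel, int pass) {
        return new Random(pixel * 1_000_003L + pass).nextDouble();
    }

    static double[] render(int pixelCount, int passes, int tileSize) {
        double[] image = new double[pixelCount];
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < pixelCount; start += tileSize) {
                final int from = start;
                final int to = Math.min(start + tileSize, pixelCount);
                // One task per tile; the tile runs all passes itself, so there
                // is no "wait for everyone" point between passes.
                pending.add(pool.submit(() -> {
                    for (int p = from; p < to; p++) {
                        double acc = 0;
                        for (int pass = 0; pass < passes; pass++) {
                            acc += tracePixel(p, pass);
                        }
                        image[p] = acc / passes; // tiles write disjoint pixels
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // the only barrier: end of the whole render
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("render interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("tile failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return image;
    }

    public static void main(String[] args) {
        double[] image = render(100, 50, 4);
        System.out.println("rendered " + image.length + " pixels");
    }
}
```

With 100 pixels and a tile size of 4 there are 25 tasks for the pool to hand out, so a worker that draws cheap tiles just picks up more of them.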
Have you tried more threads?
parallel() decides the threads for me.
omg. stahp.
My experience is that just because you have 8 cores on your CPU doesn't mean you'll see the best performance with exactly 8 threads. I tune the number of tasks per core; I just call it taskPerProcessor. You can try increasing that or reducing task size in general. But you'll get nowhere if you don't measure anything.
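A rough sketch of the `taskPerProcessor` idea, assuming the work is an index range that can be split evenly. `ChunkPlanner` and `plan` are hypothetical names. It only computes the half-open ranges; how they are submitted to a pool is left open.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkPlanner {
    // Splits [0, itemCount) into about processors * taskPerProcessor
    // contiguous half-open ranges {from, to} of near-equal size.
    static List<int[]> plan(int itemCount, int processors, int taskPerProcessor) {
        int taskCount = Math.max(1, Math.min(itemCount, processors * taskPerProcessor));
        List<int[]> ranges = new ArrayList<>();
        for (int t = 0; t < taskCount; t++) {
            // long arithmetic avoids overflow for large item counts
            int from = (int) ((long) t * itemCount / taskCount);
            int to = (int) ((long) (t + 1) * itemCount / taskCount);
            ranges.add(new int[] {from, to});
        }
        return ranges;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        for (int[] r : plan(100, processors, 4)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

The right value for `taskPerProcessor` is something to find by measuring: more tasks balance uneven work better but add scheduling overhead.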
why the fuck is the original message deleted?
ugh
seemed like something fun to figure out
@lost tendon did you solve it?
I dropped the subject because surly was making me crazy. it was easier to flip the table than to be rude.
You've done both now. Flipped the table and been rude.
The original question was literally this:
allPixels.stream().parallel().forEach(this::trace)
with no further information.