#How to parallel this better?


untold inletBOT
#

<@&987246717831381062> please have a look, thanks.

keen lion
#

You can't improve something that isn't measured, so that would be the first thing to do, I think. There are a lot of performance optimizations you can make, but generally two things stand out as low-hanging fruit. First, try to maintain cache locality: do as many operations on the same data as you can. Second, don't allow multiple threads to fight over access to the same data. Sometimes that means looking closer at data access. Cache lines are 64 bytes on most systems, so that's what gets read and written at a time when going to main memory. If two threads read/write different memory locations that fall into the same cache line, there will be trouble (false sharing). Memory access that jumps around (pointer chasing) will have the worst performance.
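
A minimal sketch of the "don't fight over the same cache line" point, assuming a plain sum-of-squares workload rather than the actual tracer. The class name `LocalAccumulation` and the method `sumSquares` are made up for illustration. Each worker accumulates into a local variable and writes to the shared array only once, so neighbouring slots of `partial` are not hammered in the hot loop.

```java
final class LocalAccumulation {
    // Hypothetical workload: sum of squares, split across `workers` threads.
    static long sumSquares(int[] data, int workers) {
        long[] partial = new long[workers];
        Thread[] threads = new Thread[workers];
        int chunk = (data.length + workers - 1) / workers; // ceiling division

        for (int w = 0; w < workers; w++) {
            final int id = w;
            final int from = Math.min(data.length, id * chunk);
            final int to = Math.min(data.length, from + chunk);
            threads[w] = new Thread(() -> {
                long local = 0; // stays in a register or on this thread's stack
                for (int i = from; i < to; i++) {
                    local += (long) data[i] * data[i];
                }
                partial[id] = local; // one write to shared memory per worker
            });
            threads[w].start();
        }

        long total = 0;
        for (int w = 0; w < workers; w++) {
            try {
                threads[w].join(); // join makes the worker's write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
            total += partial[w];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumSquares(new int[] {1, 2, 3, 4}, 3));
    }
}
```

The variant to avoid would be `partial[id] += ...` inside the inner loop: with 8-byte slots, up to eight workers would then be writing into the same 64-byte line on every iteration.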

verbal lynx
#

Maybe virtual threads would help, I would test it

keen lion
#

virtual threads are for i/o bound tasks

keen lion
#

He'll have to provide more information but generally regular threads will be better once he figures out how to load it.

lost tendon
#

it drops because some of the threads end and find there's nothing left in the pool.

#

then the while loop restarts and the pool is ... refilled.

#

if i have 8 threads, then at the end of the list the 1st worker finishes, looks in the pool, sees it is empty, and now sits idle until the other 7 finish. I'm sorry if I'm mangling terms, I trust you get what I mean.

keen lion
#

Yea, you'll have to provide more about the computation as to why tasks aren't equal sized.

lost tendon
#

path tracing is non-deterministic. light enters the scene, bounces around, and monte carlo might end it in a few bounces or in 25.

#

why is that relevant?

keen lion
#

So you can't take smaller chunks?

lost tendon
#

what are you talking about, "chunks"?

keen lion
#

Seems like there is work to be done but it's just sitting in some other thread's queue doing nothing.

lost tendon
#

that's what I was saying from the start?

keen lion
#

I think you'll have to show code. Like what is allPixels, is it an array? Is tracePixel accessing memory in that same array?

#

It may be intractable, but until you start MEASURING we can guess all day. Do 90% take one bounce? Do 50% take 25 bounces? That sort of thing.
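
One way that measurement could look, as a rough sketch only. The real tracer was never posted, so `traceAndCountBounces` is a hypothetical stand-in that ends a path at random with a made-up survival probability of 0.7 and a cap of 25 bounces. The part that would carry over is the histogram: count how many paths end after each number of bounces.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

final class BounceHistogram {
    static final int MAX_BOUNCES = 25; // assumed cap, taken from the "25" mentioned in the thread

    // Hypothetical stand-in for the real tracer: returns the number of bounces a path took.
    static int traceAndCountBounces(int pixel) {
        int bounces = 1;
        // Assumed termination rule: each bounce continues with probability 0.7.
        while (bounces < MAX_BOUNCES && ThreadLocalRandom.current().nextDouble() < 0.7) {
            bounces++;
        }
        return bounces;
    }

    // Traces every pixel once and returns counts indexed by bounce count.
    static long[] histogram(int pixelCount) {
        LongAdder[] buckets = new LongAdder[MAX_BOUNCES + 1];
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new LongAdder();
        }
        IntStream.range(0, pixelCount).parallel()
                .forEach(p -> buckets[traceAndCountBounces(p)].increment());

        long[] counts = new long[buckets.length];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = buckets[i].sum();
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] counts = histogram(100_000);
        for (int b = 1; b < counts.length; b++) {
            System.out.printf("%2d bounces: %d%n", b, counts[b]);
        }
    }
}
```

If the real distribution turns out heavily skewed, that would explain why equal-sized batches of pixels take very unequal time.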

lost tendon
#

there's clearly a misunderstanding here. You're concerned about what's going on inside each thread while they're working and I'm concerned about scheduling between tasks.

#

i have 8 workers that are going to process a hundred pixels 50 times each. because i do a while() { parallel-all-pixels }, there's a moment at the end of the loop where some workers are done and some are not. i want to eliminate the gap and keep them working.
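
A sketch of one way to remove that gap, under a loud assumption: the 50 passes are independent samples that are simply averaged per pixel, so pass N+1 does not need pass N to be finished. If that holds, every (pass, pixel) pair can be queued on a single pool up front, and the only wait is at the very end. `traceSample`, `render`, and the class name are hypothetical; the real tracer was not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.DoubleAdder;

final class FlattenedPasses {
    // Hypothetical stand-in for one path-traced sample of one pixel.
    static double traceSample(int pixel) {
        double radiance = 1.0;
        int bounces = 0;
        while (bounces < 25 && ThreadLocalRandom.current().nextDouble() < 0.7) {
            radiance *= 0.9; // made-up energy loss per bounce
            bounces++;
        }
        return radiance;
    }

    static double[] render(int pixelCount, int passes, int threads) {
        // Thread-safe accumulators, because two passes of one pixel may run at the same time.
        DoubleAdder[] sums = new DoubleAdder[pixelCount];
        for (int i = 0; i < pixelCount; i++) {
            sums[i] = new DoubleAdder();
        }

        // Queue all passes at once instead of looping over a parallel stream per pass.
        List<Callable<Void>> tasks = new ArrayList<>(pixelCount * passes);
        for (int pass = 0; pass < passes; pass++) {
            for (int p = 0; p < pixelCount; p++) {
                final int pixel = p;
                tasks.add(() -> {
                    sums[pixel].add(traceSample(pixel));
                    return null;
                });
            }
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            pool.invokeAll(tasks); // returns once every queued task has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("render interrupted", e);
        } finally {
            pool.shutdown();
        }

        double[] image = new double[pixelCount];
        for (int i = 0; i < pixelCount; i++) {
            image[i] = sums[i].sum() / passes;
        }
        return image;
    }

    public static void main(String[] args) {
        double[] image = render(100, 50, 8);
        System.out.println("first pixel: " + image[0]);
    }
}
```

A worker that finishes early just pulls the next task from the shared queue, whichever pass it belongs to. Workers still go idle at the tail of the whole render, but that happens once rather than 50 times.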

keen lion
#

Yea so decrease that 100 pixels to 50 or 25 or whatever. But you have to measure something.

#

Have you tried more threads?

lost tendon
#

parallel() decides the threads for me.

keen lion
#

Well don't use that, it's historically shit.

#

Executors

lost tendon
#

omg. stahp.

keen lion
#

In my experience, having 8 processors on your CPU doesn't mean you'll see the best performance with exactly 8 threads. I use a multiplier for that and just call it taskPerProcessor. You can try increasing that or reducing task size in general. But you'll get nowhere if you don't measure anything.
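
Roughly what that could look like, as a sketch with invented names: `tasksPerProcessor` is the multiplier mentioned above, and `split`, `render`, and the `tracePixel` callback are hypothetical. The pool has one thread per processor, but the pixel range is cut into several times as many chunks, so a thread that draws a cheap chunk can pick up another one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

final class ChunkedRender {
    // Splits [0, total) into at most `chunks` contiguous ranges of near-equal size.
    static List<int[]> split(int total, int chunks) {
        List<int[]> ranges = new ArrayList<>();
        int size = Math.max(1, (total + chunks - 1) / chunks); // ceiling division
        for (int from = 0; from < total; from += size) {
            ranges.add(new int[] {from, Math.min(total, from + size)});
        }
        return ranges;
    }

    // Runs tracePixel for every pixel index and returns the number of chunks used.
    static int render(int totalPixels, int tasksPerProcessor, IntConsumer tracePixel) {
        int processors = Runtime.getRuntime().availableProcessors();
        List<int[]> ranges = split(totalPixels, processors * tasksPerProcessor);

        ExecutorService pool = Executors.newFixedThreadPool(processors);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int[] r : ranges) {
                pending.add(pool.submit(() -> {
                    for (int p = r[0]; p < r[1]; p++) {
                        tracePixel.accept(p);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the chunk and surface any exception it threw
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("render interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return ranges.size();
    }

    public static void main(String[] args) {
        int chunks = render(100, 4, p -> { /* the real tracePixel(p) would go here */ });
        System.out.println("chunks used: " + chunks);
    }
}
```

Timing the whole render with `tasksPerProcessor` set to 1, 2, 4, and 8 would show whether smaller chunks actually help here or whether per-task overhead eats the gain.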

hoary umbra
#

why the fuck is the original message deleted?

#

ugh

#

seemed like something fun to figure out

knotty musk
#

@lost tendon did you solve it?

lost tendon
#

I dropped the subject because surly was making me crazy. it was easier to flip the table than to be rude.

keen lion
#

You've done both now. Flipped the table and been rude.

#

The original question was literally this:

allPixels.stream().parallel().forEach(this::trace)

#

with no further information.

knotty musk
#

Well, do not use parallel(); use the ExecutorService that fits your situation, configured with the thread type you want.