I ran your code through the profiler, and here's what I saw. About 75-80% of the time is spent managing the Executor, and only about 5% of that is time waiting on latch.await, so it's mostly task submission that causes the slowdown. I increased the iterations to 1 million to give it more to chew on, and the runtime still barely changes with the thread count. Originally it was 5 to 6 seconds regardless of the number of threads used; after bumping the iterations to 1 million, it was 53-58 seconds.
Looking at the profiler's timeline feature, which colors each thread green while it is active and red while it is waiting, almost all the work is being done in the first thread, with roughly logarithmically less in each additional thread. So the number of threads doesn't matter, because most of the work ends up on the first thread.
I removed the executor completely and ran single-threaded with 1 million iterations, and it took 22 seconds, very different from 53-58 seconds. So even a single-threaded executor is more than twice as slow as no executor at all (on my machine). It's no wonder that adding more threads just adds more overhead and gives zero performance gain.
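Conclusion: the task you are running, the row calc, is too fine-grained. Each task does so little work that the cost of submitting it to the executor outweighs the calculation itself.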
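
One common way to make the tasks coarser is to hand each thread a contiguous chunk of rows instead of submitting one task per row. The sketch below is only an illustration of that idea, not your actual code: calcRow is a hypothetical stand-in for your row calc, and the class and method names (ChunkedRows, sumChunked, sumSequential) are made up for the example. I have not profiled this version, so treat it as a starting point rather than a guaranteed speedup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedRows {

    // Hypothetical stand-in for the per-row calculation (assumption, not the original code).
    static long calcRow(int row) {
        long acc = 0;
        for (int i = 0; i < 100; i++) {
            acc += (long) row * i;
        }
        return acc;
    }

    // Baseline: no executor at all, every row computed on the calling thread.
    static long sumSequential(int rows) {
        long total = 0;
        for (int r = 0; r < rows; r++) {
            total += calcRow(r);
        }
        return total;
    }

    // Coarse-grained version: one task per chunk of rows, so the executor
    // handles at most `threads` submissions instead of `rows` submissions.
    static long sumChunked(int rows, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (rows + threads - 1) / threads; // ceiling division
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < rows; start += chunk) {
                final int from = start;
                final int to = Math.min(rows, start + chunk);
                tasks.add(() -> {
                    long partial = 0;
                    for (int r = from; r < to; r++) {
                        partial += calcRow(r);
                    }
                    return partial;
                });
            }
            long total = 0;
            // invokeAll blocks until every chunk has finished, so no latch is needed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int rows = 1_000_000;
        // Both approaches should produce the same total.
        System.out.println(sumSequential(rows) == sumChunked(rows, 4));
    }
}
```

With this shape the per-task overhead is paid once per chunk rather than once per row, which is the part the profiler showed dominating the runtime. Whether it actually beats the single-threaded loop depends on how expensive the real row calc is, so measure it on your machine.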