# Is my use of CompletableFuture and runAsync() thread safe?

fresh eagle
#
 private final Map<Integer, Long> sujetCount;
...

public void updateStatistical(CompactTriple t){
        if (t.hasAnyUnknownKey() || t.isSujetVariable() || t.isPredicatVariable()|| t.isObjetVariable())
            throw new IllegalArgumentException("Le paramètre t doit être composé seulement de littéraux connus");

        sujetCount.merge(t.sujet(), 1L, Long::sum);
...
  private boolean addTriple(CompactTriple triple) {
        AtomicBoolean wasAdded = new AtomicBoolean();

        CompletableFuture.allOf(
                CompletableFuture.runAsync(() -> wasAdded.set(spo.add(triple.toT3(OrdreT3.SPO)))),
                CompletableFuture.runAsync(() -> sop.add(triple.toT3(OrdreT3.SOP))),
                CompletableFuture.runAsync(() -> pso.add(triple.toT3(OrdreT3.PSO))),
                CompletableFuture.runAsync(() -> pos.add(triple.toT3(OrdreT3.POS))),
                CompletableFuture.runAsync(() -> osp.add(triple.toT3(OrdreT3.OSP))),
                // CompletableFuture.runAsync(() -> ops.add(triple.toT3(OrdreT3.OPS))),

                CompletableFuture.runAsync(() -> statisticalStore.updateStatistical(triple))
        ).join();

        return wasAdded.get();
    }

Is there any problem with this implementation and use of CompletableFuture.runAsync()?
Doing it in parallel gives me a performance gain, even though it's just for one Add (i am doing it 100k times).

lost hedge
#

no

#

not unless this map is thread safe

#

Map<Integer, Long> sujetCount

fresh eagle
#

Even if I call addTriple sequentially, it uses CompletableFuture for 7 async operations,
BUT I wait for every task to finish (.join()).
So there is no risk of a race condition in my updateStatistical, right?

remote solstice
#

These are all separate hashmaps running in their own thread I think so no problem.

#

This uses the fork join pool. You could use your own pool that is limited to number of cores. I suspect you're limited by cache thrashing.
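To illustrate the custom pool idea above: `CompletableFuture.runAsync` has an overload that takes an `Executor`, so the tasks run on a pool you size yourself instead of the default `ForkJoinPool.commonPool()`. This is only a minimal sketch; the class and method names (`FixedPoolSketch`, `runOnFixedPool`) are made up for illustration, and the counter stands in for the real index updates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolSketch {
    // Runs `tasks` independent jobs on a pool sized to the number of cores
    // and returns how many of them completed. Hypothetical helper.
    static int runOnFixedPool(int tasks) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        AtomicInteger done = new AtomicInteger();
        try {
            CompletableFuture<?>[] futures = new CompletableFuture<?>[tasks];
            for (int i = 0; i < tasks; i++) {
                // The second argument replaces the default common pool.
                futures[i] = CompletableFuture.runAsync(done::incrementAndGet, pool);
            }
            // Block until every task has finished.
            CompletableFuture.allOf(futures).join();
        } finally {
            // Pool threads are non-daemon; shut down so the JVM can exit.
            pool.shutdown();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runOnFixedPool(6));
    }
}
```

In a real loader you would probably create the pool once and reuse it for all 100k calls, rather than creating and shutting it down per call as this demo does.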

#

I forget what this project was about.

#

You have some RDF file? with triplets of strings, correct? You've mapped the strings to unique int ids and are just using the ids here?

#

Using ids to create these six permutations of subject, predicate, object maps?

fresh eagle
#

Yes, that's right!

#

I have a file of 100k triplets. After putting all of them in a list, I loop over the list to add them to my Hexastore

#

Calling addTriple sequentially

#

addTriple uses 7 threads, and I don't really understand the idea of a custom pool.
Is it better for performance, in that it reuses the same threads so we don't need to look for one each time we call addTriple?

plush acorn
#

You could potentially have a data race problem. The join would establish a happens-before order but you're probably trying to be too clever if you're trying to make use of it.

remote solstice
#

I don't know his machine specs but it's a reasonable amount of data. So I think he's just bumping up against cache thrashing issues by trying to run all threads at the same time instead of 3 and 4.

plush acorn
#

If each map is only read and written by one thread that might be fine but if one thread writes and one reads then it's still a data race.
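As a sketch of the safer option for the shared counter: if `sujetCount` may ever be touched by more than one thread at once, backing it with a `ConcurrentHashMap` makes `merge` atomic per key, so no increments are lost. The class name `CounterSketch` and the `demo` method are hypothetical; only the `sujetCount.merge(..., 1L, Long::sum)` call is taken from the code in the question.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class CounterSketch {
    // ConcurrentHashMap.merge is performed atomically for a given key.
    private final Map<Integer, Long> sujetCount = new ConcurrentHashMap<>();

    void increment(int sujet) {
        sujetCount.merge(sujet, 1L, Long::sum);
    }

    long count(int sujet) {
        return sujetCount.getOrDefault(sujet, 0L);
    }

    // Several tasks increment the same key concurrently; returns the final count.
    static long demo(int writers, int incrementsPerWriter) {
        CounterSketch counter = new CounterSketch();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[writers];
        for (int w = 0; w < writers; w++) {
            futures[w] = CompletableFuture.runAsync(() -> {
                for (int i = 0; i < incrementsPerWriter; i++) {
                    counter.increment(42);
                }
            });
        }
        CompletableFuture.allOf(futures).join();
        return counter.count(42);
    }
}
```

With a plain `HashMap` in place of the `ConcurrentHashMap`, the same demo could lose updates or corrupt the map, which is the data race described above.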

fresh eagle
#

wow

remote solstice
#

He's just storing data at the moment.

fresh eagle
#

If I make an Executor with exactly 7 threads and switch the counter storage to a ConcurrentHashMap, is that overkill?

remote solstice
#

Does your cpu go brrrr to 100% currently?

fresh eagle
#

I don't think so hahaha

remote solstice
#

Do you have a lot of ram?

fresh eagle
#

27% my task manager says

remote solstice
#

When all these 7 threads are running?

fresh eagle
#

32 GB of RAM

remote solstice
#

If you think it's fast enough and you're getting correct results... what is the issue?

fresh eagle
#

I need to write a report on my work and I just want to understand what I am doing

#

and learn whether it's the correct way

#

For instance, I don't really know whether TreeSet is faster than HashMap<Integer, HashMap<Integer, Set<Integer>>> for each index.

I want something very fast

remote solstice
#

I would probably opt to process all 100k triples in each thread instead of doling it out one at a time to each thread. There's unnecessary overhead.
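A rough sketch of that suggestion: start one task per index and let each task loop over the whole list, so the scheduling cost is paid once per index rather than once per triple. Everything here is an assumption for illustration: `Triple` is a hypothetical stand-in for `CompactTriple`, only three of the permutations are shown, and each index is a flat `HashSet` of id lists rather than the real structure.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

class BulkLoadSketch {
    // Hypothetical stand-in for CompactTriple: three int ids.
    static final class Triple {
        final int s, p, o;
        Triple(int s, int p, int o) { this.s = s; this.p = p; this.o = o; }
    }

    // Each index is a plain HashSet that is written by exactly one task.
    final Set<List<Integer>> spo = new HashSet<>();
    final Set<List<Integer>> pos = new HashSet<>();
    final Set<List<Integer>> osp = new HashSet<>();

    // The list must not be modified while the tasks are running.
    void addAll(List<Triple> triples) {
        CompletableFuture.allOf(
            CompletableFuture.runAsync(() -> {
                for (Triple t : triples) spo.add(List.of(t.s, t.p, t.o));
            }),
            CompletableFuture.runAsync(() -> {
                for (Triple t : triples) pos.add(List.of(t.p, t.o, t.s));
            }),
            CompletableFuture.runAsync(() -> {
                for (Triple t : triples) osp.add(List.of(t.o, t.s, t.p));
            })
        ).join(); // the caller reads the sets only after every task has finished
    }
}
```

Because no set is shared between tasks and the caller waits on `join()` before reading, the plain `HashSet`s are not accessed concurrently in this sketch.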

#

TreeSet is just a set that keeps its elements in sorted order
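A small sketch of the difference, with made-up names (`SetOrderSketch`, `sortedView`, `sameMembers`): a `TreeSet` iterates its elements in ascending order and pays O(log n) per add or lookup, while a `HashSet` holds the same members with expected O(1) operations and no guaranteed iteration order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderSketch {
    // Iterating a TreeSet yields the distinct ids in ascending order.
    static List<Integer> sortedView(List<Integer> ids) {
        return new ArrayList<>(new TreeSet<>(ids));
    }

    // Both set types hold the same members; only ordering and cost differ.
    static boolean sameMembers(List<Integer> ids) {
        Set<Integer> hash = new HashSet<>(ids);
        Set<Integer> tree = new TreeSet<>(ids);
        return hash.equals(tree);
    }
}
```

If the ids are only ever tested for membership or intersected, the ordering bought by `TreeSet` is probably not needed.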

#

I don't think we ever got to how the data is supposed to be used.

fresh eagle
#

I don't think I need to order things, but it seems very daunting to use HashMap<Integer, HashMap<Integer, Set<Integer>>> hahahaha

#

With this project and these kinds of questions, I am scared you all think of me as a crazy dev

remote solstice
#

I just call it 'soak time'. The amount of time needed to fully grasp the interactions of the data. So the queries return sets, and you do an intersection of the sets.

#

set1.retainAll(set2)
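One thing to keep in mind with `retainAll` is that it modifies the set it is called on. A minimal sketch that intersects two query results without touching the originals (the helper name `intersect` is hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

class IntersectionSketch {
    // Returns a new set with the elements common to a and b.
    // Copies the smaller input so the inputs themselves are left unchanged.
    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> smaller = a.size() <= b.size() ? a : b;
        Set<Integer> larger = (smaller == a) ? b : a;
        Set<Integer> result = new HashSet<>(smaller);
        result.retainAll(larger); // keeps only elements also present in `larger`
        return result;
    }
}
```

Copying the smaller set first keeps the copy cheap, and `retainAll` then performs one membership test in the larger set per copied element.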

lost hedge
#

with just EAV - not even EAVT

#

datomic gets away with just these indexes

remote solstice
#

He was following an academic paper, trying to implement their solution.

#

I didn't read it, but I'm sure he'll post it again for you.

fresh eagle
#

Subject Predicate Object

lost hedge
#

but it does ^ that

#

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied ...

#

the most famous database which uses datalog is datomic

#

which is like RDF except it also adds a time to each fact

#

which is the "time of assertion"

fresh eagle
#

ohhhhh okay !! Nice

fresh eagle
#

It's only intersections of sets

#

no order needed i think lol

remote solstice