# Best Way to Filter an Array
While you wait for help, here are some tips to improve your experience:
If nobody replies, that usually means your question was not asked clearly enough, so nobody feels confident answering it. Use the time to elaborate: provide details, context, more code, examples, and maybe some screenshots. With enough information, someone will very likely know the answer.
Don't forget to close your thread with the `/help-thread close` command once your question has been answered. Thanks!
Here is an AI-assisted attempt to answer your question 🤖. Maybe it helps! In any case, a human is on the way 👍. To continue talking to the AI, you can use `/chatgpt`.
The choice between managing threads yourself and using parallel streams (which also run on multiple threads, via the common ForkJoinPool) depends on the specific requirements of your filtering task. Here are some factors to consider:
- Task Complexity: A filtering task that is computationally intensive and splits easily into smaller independent subtasks is a good candidate for parallel execution. With explicit multithreading, you create the threads yourself and assign each one a portion of the array. A parallel stream does this splitting for you.
- Data Dependency: If the filtering task needs to access and modify shared data, you must ensure thread safety. This adds complexity and synchronization overhead whether you use explicit threads or parallel streams.
- Simplicity: Parallel streams provide a higher-level abstraction than explicit multithreading. They let you express the filtering logic declaratively and concisely, which makes the code easier to read and maintain.
- Performance Testing: Benchmark both approaches on representative data sets to determine which one performs better for your specific use case.
In general, parallel streams are often a good choice when your filtering task is simple and stateless and fits functional programming constructs. They are easy to write and spread the work across CPU cores for you. If the logic is complex, or you need fine-grained control over how the work is divided and scheduled, explicit multithreading (for example with an `ExecutorService`) gives you that control. Shared mutable data still has to be synchronized either way, and that overhead can cancel out the gains.
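The "Performance Testing" advice can be sketched with a crude timing harness that runs the same stateless filter sequentially and with a parallel stream. This is only a minimal sketch. The synthetic data, the predicate `isKept`, and the class and method names are hypothetical. A single `System.nanoTime()` measurement is easily skewed by JIT warm-up and GC, so a benchmarking harness such as JMH is the better tool for real numbers.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FilterTiming {
    // Sequential baseline: one thread walks the whole list.
    static List<String> filterSequential(List<String> data, Predicate<String> keep) {
        return data.stream().filter(keep).collect(Collectors.toList());
    }

    // parallelStream() splits the list across the common ForkJoinPool;
    // collect() still preserves encounter order for an ordered source like a List.
    static List<String> filterParallel(List<String> data, Predicate<String> keep) {
        return data.parallelStream().filter(keep).collect(Collectors.toList());
    }

    // Crude wall-clock timing of a single run, in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Hypothetical synthetic data and stateless predicate, for illustration only.
        List<String> data = IntStream.range(0, 1_000_000)
                .mapToObj(i -> "item" + i)
                .collect(Collectors.toList());
        Predicate<String> isKept = s -> s.hashCode() % 3 != 0;

        // A few warm-up rounds so the JIT compiles the hot paths before timing.
        for (int i = 0; i < 5; i++) {
            filterSequential(data, isKept);
            filterParallel(data, isKept);
        }

        long seq = timeMillis(() -> filterSequential(data, isKept));
        long par = timeMillis(() -> filterParallel(data, isKept));
        System.out.println("sequential: " + seq + " ms, parallel: " + par + " ms");
        System.out.println("same result: "
                + filterSequential(data, isKept).equals(filterParallel(data, isKept)));
    }
}
```

Whatever the timings show, both variants must return the same list; a speedup that changes the result is a bug.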
how large is the array? what's in the array? is the filter stateless? is the filter fast or slow?
it is a normal string array of about 2 million elements, and i need to filter it with a string array of 300 thousand elements, so i need to know how to filter it as quickly as possible
what do you mean by "filter with a string array of 300k elements"?
um, delete the elements of the 300k array that are contained in the 2m array
can u give a concrete example please? it's very unclear and vague what ur saying
I have an array A with 2 million elements and an array B with 300k elements. I want to delete from A every element that is also contained in B, and I want to know the fastest way to do it.
i just want to know how to do it
should i use multithreading?
u can't delete items from an array to begin with
arrays are fixed size
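Because a Java array's length is fixed, "deleting" elements from a `String[]` really means building a new, shorter array that holds the survivors. This is a minimal sketch, assuming plain `String[]` inputs. It uses a `HashSet` for the membership test. The class and method names and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ArrayRemoval {
    // Returns a new array with the elements of a that do not occur in b, in their original order.
    static String[] without(String[] a, String[] b) {
        Set<String> toRemove = new HashSet<>(Arrays.asList(b)); // O(1) average lookups
        String[] kept = new String[a.length];
        int n = 0;
        for (String s : a) {
            if (!toRemove.contains(s)) {
                kept[n++] = s; // compact the survivors to the front
            }
        }
        return Arrays.copyOf(kept, n); // trim to the number of survivors
    }

    public static void main(String[] args) {
        String[] a = {"apple", "banana", "cherry", "date"};
        String[] b = {"banana", "date"};
        System.out.println(Arrays.toString(without(a, b))); // [apple, cherry]
    }
}
```

The original array `a` is left unchanged; callers that want the "deleted" version must use the returned array.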
please give a concrete example
@velvet tusk can u tell where those arrays are coming from? is it a file, a database or whatever?
um from database
um i can do it with stream.filter
but it's not the best idea
private ArrayList<String> filter(ArrayList<String> arr1, ArrayList<String> arr2) {
    int t = arr1.size() * 10 / 100;
    var rs = new AtomicReference<ArrayList<String>>(new ArrayList<>());
    AtomicInteger ct = new AtomicInteger(0);
    for (int i = 0; i < arr1.size() / t; i++) {
        int fi = i;
        int z = fi + 1;
        Thread.startVirtualThread(() -> {
            for (int j = fi * t; j < t * z; j++) {
                if (!arr2.contains(arr1.get(j))) {
                    rs.get().add(arr1.get(j));
                }
            }
            ct.set(ct.get() + 1);
        });
    }
    while (true) {
        if (ct.get() == (arr1.size() / t)) {
            break;
        }
    }
    return rs.get();
}
i use this code with multithreading, but it sometimes loses data.
Specifically, i divide arr1 into 10 parts and create 10 threads. Each thread checks whether an element of arr1 is not in arr2 and, if so, adds it to the rs list, which holds the result.
i used stream.filter(); it is very simple to implement, but it is very slow
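The multithreaded `filter` method posted above has several problems that can each lose data or hang:

- `rs.get().add(...)` calls `ArrayList.add` from several threads at once. `ArrayList` is not thread-safe, and the `AtomicReference` only guards the reference, not the list. Concurrent adds can therefore overwrite each other or throw.
- `ct.set(ct.get() + 1)` is a non-atomic read-modify-write. Two threads can record only one increment between them, and then the busy-wait loop never ends. `ct.incrementAndGet()` would be atomic.
- `t` comes from integer division. When `arr1.size()` is not a multiple of `t`, the elements after the last full chunk are never checked. With 2,000,005 elements, for example, the last 5 are skipped. With fewer than 10 elements, `t` is 0 and `arr1.size() / t` throws.
- On top of that, `arr2.contains` scans the whole list for every element.

Here is a minimal corrected sketch, assuming `List<String>` inputs. Each task collects into its own local list, the code waits with `Future.get()` instead of spinning, and lookups go through a `HashSet`. The class name, method name, and `threads` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedFilter {
    // Keeps the elements of arr1 that are not in arr2; assumes threads >= 1.
    static List<String> filter(List<String> arr1, List<String> arr2, int threads)
            throws InterruptedException, ExecutionException {
        Set<String> exclude = new HashSet<>(arr2); // built once, only read afterwards, so safe to share
        int size = arr1.size();
        int chunk = Math.max(1, (size + threads - 1) / threads); // ceiling division: no element is skipped
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int from = 0; from < size; from += chunk) {
                int start = from;
                int end = Math.min(from + chunk, size);
                // Each task writes only to its own local list, so no synchronization is needed.
                parts.add(pool.submit(() -> {
                    List<String> local = new ArrayList<>();
                    for (int j = start; j < end; j++) {
                        String s = arr1.get(j);
                        if (!exclude.contains(s)) {
                            local.add(s);
                        }
                    }
                    return local;
                }));
            }
            // Future.get() blocks until each task is done (no busy-waiting) and keeps chunk order.
            List<String> result = new ArrayList<>();
            for (Future<List<String>> part : parts) {
                result.addAll(part.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> a = List.of("apple", "banana", "cherry", "date", "elderberry");
        List<String> b = List.of("banana", "date");
        System.out.println(filter(a, b, 4)); // [apple, cherry, elderberry]
    }
}
```

For CPU-bound work like this, `Runtime.getRuntime().availableProcessors()` is a sensible starting value for `threads`. Virtual threads bring no benefit here because the tasks never block.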
i am still waiting for answers to my questions
until they are answered, i can't help u
Also, ur code is not working with arrays. ur working with lists
that's day and night
contains in a list is super slow
u should convert the smaller to a hashset
and then run sth along the lines of
list.parallelStream()
    .filter(Predicate.not(set::contains))
    .toList()
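Here is a runnable version of the set-plus-parallel-stream suggestion. It is a minimal sketch, assuming both inputs are already loaded from the database as `List<String>` and that duplicates in the big list should be kept. The class and method names are hypothetical. Rough arithmetic shows why the set matters. `List.contains` scans linearly, so 2,000,000 lookups against a 300,000-element list can cost up to 2,000,000 × 300,000 = 6 × 10^11 string comparisons. A `HashSet` lookup is O(1) on average, so the whole filter comes down to about 2,000,000 hash lookups plus 300,000 insertions to build the set.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SetFilter {
    // Returns the elements of big that are not contained in small, keeping big's order.
    static List<String> filterOut(List<String> big, List<String> small) {
        Set<String> exclude = new HashSet<>(small); // hash the smaller list once
        return big.parallelStream()
                .filter(Predicate.not(exclude::contains)) // Predicate.not needs Java 11+
                .collect(Collectors.toList());            // or .toList() on Java 16+
    }

    public static void main(String[] args) {
        List<String> a = List.of("apple", "banana", "cherry", "date");
        List<String> b = List.of("banana", "date");
        System.out.println(filterOut(a, b)); // [apple, cherry]
    }
}
```

Once the lookup is O(1), the remaining work is cheap per element, so it is worth checking whether `stream()` is already fast enough before reaching for `parallelStream()`.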
the details depend on how u answer my questions
for example whether u need to modify it inline
or whether it's a one-shot task or u'll be doing these remove calls on the list multiple times
cause then u should do preparation work instead
and so on
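The two follow-up cases (editing the list in place vs. removing repeatedly) can be sketched as follows, assuming `List<String>` inputs. All names are hypothetical.

- **One-off, in place:** `Collection.removeIf` (Java 8+) removes every match. `ArrayList` overrides it to work in linear time rather than shifting the tail on each removal.
- **Repeated removals:** the preparation work could be to keep the big collection in a `LinkedHashSet`. Each removal then becomes an O(1) average hash operation instead of a linear search. This is an assumption that only holds if duplicates in the big collection don't matter, because a set keeps one copy of each.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InPlaceRemoval {
    // One-shot: modifies big in place, removing every element found in small.
    static void removeInPlace(List<String> big, Collection<String> small) {
        Set<String> exclude = new HashSet<>(small);
        big.removeIf(exclude::contains);
    }

    // Repeated use: the big collection is kept as an insertion-ordered set (duplicates collapse).
    static void removeRepeatedly(LinkedHashSet<String> bigSet, Collection<String> batch) {
        for (String s : batch) {
            bigSet.remove(s); // O(1) on average, no linear scan
        }
    }

    public static void main(String[] args) {
        List<String> big = new ArrayList<>(List.of("apple", "banana", "cherry", "date"));
        removeInPlace(big, List.of("banana", "date"));
        System.out.println(big); // [apple, cherry]

        LinkedHashSet<String> bigSet = new LinkedHashSet<>(List.of("apple", "banana", "cherry", "date"));
        removeRepeatedly(bigSet, List.of("banana"));
        removeRepeatedly(bigSet, List.of("date", "fig"));
        System.out.println(bigSet); // [apple, cherry]
    }
}
```

The explicit loop is used instead of `bigSet.removeAll(batch)` because `AbstractSet.removeAll` calls `batch.contains` for every set element when the batch is not smaller than the set. That is slow if `batch` is a `List`.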