# Load and process a file in multiple threads using C++ ifstream, but something goes wrong

I have one file that contains many lines; every line is the "info" of one video, and I need to process it. But the file is too large, so using a single thread to load and process it is a bad idea. So I'm trying to use maybe 6 threads to do that. Since my operations are read-only (get_line and process_line, one by one), my first idea is to load the file ==> load different chunks (with different offsets) of the file. Code: [posted as an image]. The code compiles and runs. However, my MAP size is not right. My file should contain 13,000,000 lines, but the MAP size is around 12,990,000. I am very sure there is no duplicate data here. Then why? Even though I eventually adopted one reader + two workers (processing lines) to do that, I still want to know why some lines are missing. Where is the mistake?
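The one-reader-plus-workers design mentioned in the question can be sketched as a producer/consumer queue. This is a minimal sketch, not the poster's code: `LineQueue`, `read_and_process`, and the `process_line` callback are hypothetical names, and `process_line` is assumed to be safe to call from several threads at once.

```cpp
#include <condition_variable>
#include <functional>
#include <istream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue shared by one producer and several consumers.
class LineQueue {
public:
    void push(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            lines_.push(std::move(line));
        }
        cv_.notify_one();
    }
    // Signals that no more lines will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a line is available; returns false once closed and drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return closed_ || !lines_.empty(); });
        if (lines_.empty()) return false;
        out = std::move(lines_.front());
        lines_.pop();
        return true;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::string> lines_;
    bool closed_ = false;
};

// The calling thread reads lines; `workers` threads run process_line on them.
void read_and_process(std::istream& in, unsigned workers,
                      const std::function<void(const std::string&)>& process_line) {
    LineQueue queue;
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            std::string line;
            while (queue.pop(line)) process_line(line);
        });
    }
    std::string line;
    while (std::getline(in, line)) queue.push(std::move(line));
    queue.close();
    // Every line has been processed once all workers are joined.
    for (auto& t : pool) t.join();
}
```

The queue here is unbounded, so if the workers fall behind, the lines waiting in memory can grow large; a bounded queue (the reader waits while it is full) would cap that.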
When your question is answered use !solved to mark the question as resolved.
Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.
I have read many posts on Stack Overflow, but none of them satisfied me 😂
- wtf is that `sysconf`? Use `std::thread::hardware_concurrency()` instead.
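`std::thread::hardware_concurrency()` is only a hint and may return 0 when the value is not computable, so a small wrapper with a fallback is a common pattern. A minimal sketch; `pick_thread_count` and the fallback of 4 are hypothetical, arbitrary choices.

```cpp
#include <thread>

// Hypothetical helper: returns the hardware hint, or `fallback` when the
// implementation reports 0 (value not computable).
unsigned pick_thread_count(unsigned fallback = 4) {
    const unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : fallback;
}
```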
!sc
@sonic tide
Screenshots of code are hard to read and prevent copying and pasting.
- what's `video2tagids_map_`?
- since you're chunking the file by raw bytes, it's possible that whatever you're looking for gets split between different threads. Have you considered that?
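One common way to make byte-offset chunking line-safe is to give each line to the chunk in which its first byte lies: a chunk skips the partial line at its start (unless the byte just before it is a newline) and keeps reading while the current line starts before its end offset. A minimal sketch of that rule; `read_chunk` and the `on_line` callback are hypothetical names. The file is opened in binary mode so that `tellg()` offsets match the byte offsets used for splitting (on CRLF files each line then keeps its trailing `\r`).

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <limits>
#include <string>

// Hypothetical helper: passes every line whose first byte lies in
// [begin, end) to on_line and returns how many lines it handled.
std::size_t read_chunk(const std::string& path, std::streamoff begin, std::streamoff end,
                       const std::function<void(const std::string&)>& on_line) {
    std::ifstream in(path, std::ios::binary);  // each thread opens its own stream
    std::size_t count = 0;
    if (!in) return count;
    if (begin > 0) {
        // Start one byte early: if that byte is '\n', a line starts exactly at
        // `begin`; otherwise skip the partial line, which the previous chunk owns.
        in.seekg(begin - 1);
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    std::string line;
    while (true) {
        const std::streamoff start = in.tellg();  // -1 once the stream hit EOF
        if (start < 0 || start >= end) break;     // this line belongs to the next chunk
        if (!std::getline(in, line)) break;
        on_line(line);
        ++count;
    }
    return count;
}
```

With this rule a line is read by exactly one chunk. Even a mishandled boundary only affects the few lines around each of the 5 internal boundaries of a 6-way split, which cannot by itself explain ~10,000 missing lines.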
@feral surge is our bot
@round furnace It stores every line's info: one line -> one record in that map. Question 2: suppose I have 6 threads, so I divide the file into six blocks. When I actually call getline in each thread, I guess only the line where the junction/split is located will have a problem (right?). I fixed this so it won't be a problem, and even if it does cause a problem, it should only affect those 6 rows, right?
@digital wind 😅oops
yes
Does it work if you set it to use just a single thread?
Like, this multithreaded code, but set `max_threads_number = 1`
I don't see you updating `video2tagids_map_` anywhere, though... if you are updating it from within the worker threads, then you should wait until they all finish (`std::future::wait` etc.) before you fetch the size
otherwise some threads may be slower and not yet finished when you retrieve the size
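Waiting for every task before reading the map's size could look like this with `std::async`. It is a sketch: `run_and_wait` is a hypothetical helper, and `task(i)` stands in for whatever the original code does with chunk `i`. Only after it returns is it safe to read the size of a map the tasks wrote to.

```cpp
#include <future>
#include <vector>

// Hypothetical helper: runs task(0) .. task(n - 1) concurrently and returns
// only after every one of them has finished.
template <typename Task>
void run_and_wait(unsigned n, Task task) {
    std::vector<std::future<void>> futures;
    futures.reserve(n);
    for (unsigned i = 0; i < n; ++i)
        futures.push_back(std::async(std::launch::async, task, i));
    // get() blocks until the task is done and rethrows any exception it threw.
    for (auto& f : futures) f.get();
}
```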
I would guess he's simply missing a few lines at the end due to a rounding error
Also, you should make sure that the map supports concurrent writes (just in case)
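Standard containers are not safe for concurrent writes, so unsynchronized inserts from several workers are a data race (undefined behavior). One simple option is to guard the map with a `std::mutex`. This sketch assumes a hypothetical map type (`std::unordered_map<std::string, std::vector<int>>`) and class name `Video2TagIds`; the real type of `video2tagids_map_` is not shown in the thread.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for video2tagids_map_: every access takes one mutex.
class Video2TagIds {
public:
    void insert(std::string video, std::vector<int> tag_ids) {
        std::lock_guard<std::mutex> lock(mu_);
        map_[std::move(video)] = std::move(tag_ids);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return map_.size();
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<std::string, std::vector<int>> map_;
};
```

Another option that avoids lock contention is to give each worker its own map and merge them after all threads are joined.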
yes, the single-threaded version works correctly
*Rounding error in this line: `size_t chunkSize = fileSize / max_threads_number;`
in his example it does say that he is missing ~10k lines :p
easily possible considering the max is 13 mil
`size_t endPos = (i == max_threads_number - 1) ? fileSize : (i + 1) * chunkSize;`
i thought that was the case too :p
but it seems to be ok
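The range computation quoted in the thread (`chunkSize = fileSize / max_threads_number`, with the last chunk ending at `fileSize`) can be checked in isolation. Integer division drops `fileSize % max_threads_number` bytes from the nominal chunk size, but because the last chunk ends at `fileSize`, those leftover bytes (fewer than `max_threads_number`) go to the last chunk instead of being lost. A sketch with a hypothetical `byte_ranges` helper that mirrors that logic:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the quoted code: n chunks of file_size / n
// bytes, except the last one, which ends at file_size. Requires n > 0.
std::vector<std::pair<std::size_t, std::size_t>> byte_ranges(std::size_t file_size,
                                                             std::size_t n) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t chunk = file_size / n;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t begin = i * chunk;
        const std::size_t end = (i == n - 1) ? file_size : (i + 1) * chunk;
        ranges.emplace_back(begin, end);  // half-open [begin, end)
    }
    return ranges;
}
```

For example, a 100-byte file split 6 ways gives a chunk size of 16 and the ranges [0,16), [16,32), [32,48), [48,64), [64,80), [80,100), so the last chunk is 20 bytes long and no byte is left out.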
I think you're right about that; it seems I did not wait until they all finished.
👍
This question is being automatically marked as stale.
If your question has been answered, type !solved.
If your question is not answered feel free to bump the post or re-ask.
Take a look at !howto ask for tips on improving your question.
!solved