#Korean recognition problem LZA
yeesh i don't blame the computer for misreading that, that is a subtle difference lol
Then, what is the reason for the error?
nothing special, it just had a hard time deciding which character it was at that resolution. do you have a screenshot of what it was seeing when it threw the error?
String Match Result -> 도구 파워: 불
It should have been 도구 파워: 볼
Thank you for reporting the text detection failure and giving the screenshot 
I have added this ambiguity case to the resource file. Once the program loads the new resource file, this issue should be resolved
You can wait for the next release to have the updated resource file
You can also add this on your own to disambiguate the two similar characters and use the program now. Just replace the content of Resources/Tesseract/CharacterReductions.json with this file
If you run into any new Korean text detection failures, please report them to us so we can fix them for all Korean users
Thank you for your response.😀
I keep getting an error.
It seems like the error occurs every time Tool Power: Ball Lv. 3 appears in the upper row.
Can you give the screenshot and log please?
This is the same error as before
Have you updated this json file?
Yes, replacing the file didn't fix it.
Are you running the program on Windows?
yes, Windows 11
Can you show your folder structure of this CharacterReductions.json? I want to make sure the file is at the correct location
@bold pawn can you help debug why the json file didn’t apply to the program?
I’m going to bed 😴
I think you need to reboot the program before changes take effect.
I've already tried rebooting my computer
Oh Gin, the character replacements are run after the KD normalization.
The Korean characters are split into their components before running the replacements. Therefore it won't match the replacements.
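To illustrate the mismatch described above, here is a minimal Python sketch. It assumes "KD normalization" means Unicode NFKD, and it uses a made-up one-entry replacement table (the real format of CharacterReductions.json is not shown in this thread, and the program itself is not necessarily written in Python):

```python
import unicodedata

# Hypothetical replacement table keyed by a precomposed Hangul syllable.
# The real CharacterReductions.json format is an unknown here.
REPLACEMENTS = {"불": "볼"}

def apply_replacements(text: str, table: dict[str, str]) -> str:
    """Replace every code point that appears as a key in `table`."""
    return "".join(table.get(ch, ch) for ch in text)

ocr_text = "도구 파워: 불"

# On the precomposed text, the key "불" is found and replaced.
before = apply_replacements(ocr_text, REPLACEMENTS)

# NFKD splits each Hangul syllable into its conjoining jamo, so the
# single code point "불" no longer exists in the string and nothing matches.
decomposed = unicodedata.normalize("NFKD", ocr_text)
after = apply_replacements(decomposed, REPLACEMENTS)

print(before)                      # replacement applied
print(after == decomposed)         # replacement silently skipped
print(len("불"), len(unicodedata.normalize("NFKD", "불")))
```

So a table written with whole syllables as keys has nothing to match once the text has been decomposed.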
Can we run replacement first?
yes with an extra round-trip conversion to/from UTF-32. But I wouldn't move it there. At least run it both before and after.
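A rough Python sketch of the "run it both before and after" idea, assuming NFKD is the normalization in question. Both tables and the `reduce_text` name are hypothetical. Python strings already iterate by code point, so the UTF-32 round trip mentioned above has no counterpart here:

```python
import unicodedata

# Hypothetical tables; the real resource file layout is not shown in this thread.
PRECOMPOSED_TABLE = {"불": "볼"}  # keys are whole Hangul syllables
DECOMPOSED_TABLE = {"l": "I"}     # keys are code points that NFKD leaves unchanged

def replace_chars(text: str, table: dict[str, str]) -> str:
    """Replace every code point that appears as a key in `table`."""
    return "".join(table.get(ch, ch) for ch in text)

def reduce_text(text: str) -> str:
    """Pass 1 on precomposed text, then NFKD, then pass 2 on decomposed text."""
    text = replace_chars(text, PRECOMPOSED_TABLE)
    text = unicodedata.normalize("NFKD", text)
    return replace_chars(text, DECOMPOSED_TABLE)

# Both spellings now reduce to the same string, so they compare equal.
print(reduce_text("도구 파워: 불") == reduce_text("도구 파워: 볼"))
```

The same reduction has to be applied to both the OCR output and the dictionary entries, otherwise the two sides stop lining up.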
But I think that's just masking a bigger underlying problem.
Those two texts in Korean are nearly identical and are almost completely matching. So the OCR is correct to return both as potential hits.
Which one is the correct one? Is it the bottom one? 
Here it looks like the one stroke misread just happens to match the first hit, thus giving it a closer score than the correct one.
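A small Python sketch of why the misread wins. The thread does not show the program's actual scoring, so plain edit distance over NFKD-decomposed text is an assumption used here only for illustration, and the two candidate strings are simply the pair quoted earlier:

```python
import unicodedata

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def score(ocr: str, candidate: str) -> int:
    """Lower is better: distance between the decomposed (jamo-level) strings."""
    return edit_distance(unicodedata.normalize("NFKD", ocr),
                         unicodedata.normalize("NFKD", candidate))

ocr_output = "도구 파워: 불"                      # what the OCR returned
candidates = ["도구 파워: 불", "도구 파워: 볼"]   # both are valid dictionary entries

scores = {c: score(ocr_output, c) for c in candidates}
best = min(scores, key=scores.get)
print(scores, best)
```

The wrong entry is an exact match and the intended one is a single vowel jamo away, so any distance-based ranking picks the wrong entry.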
Personally, I want to call this unfixable without sprite matching.
OCR is good for identifying the existence of text within a longer text. But it is terrible at distinguishing two nearly identical texts. And this is precisely what we're trying to do here.
Even an average person (as opposed to the machine) can struggle to tell the difference when the characters are small.
This problem is solvable. The two characters are very similar compared to their complexity, but so are many Chinese characters.
As long as two similar characters are not used similarly in pokemon games, character reduction should work.
For example, if the characters for two pokemon types are this close, no way we can solve this without a better OCR. But if one character describes ball in pokeball and another is part of the word “fire”, they are used differently and can be distinguished in a text dictionary matching scope
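One way to check that condition, as a hedged Python sketch: apply the reduction to every entry in a dictionary scope and see whether two entries collapse into the same string. The table, the scope contents, and the function names are illustrative assumptions, not the program's real data:

```python
# Hypothetical reduction: treat the two look-alike syllables as one.
REDUCTION = {"불": "볼"}

def reduce_text(text: str, table: dict[str, str]) -> str:
    """Replace every code point that appears as a key in `table`."""
    return "".join(table.get(ch, ch) for ch in text)

def find_collisions(entries: list[str], table: dict[str, str]) -> dict[str, list[str]]:
    """Group entries by reduced form; keep only groups with 2+ members."""
    groups: dict[str, list[str]] = {}
    for entry in entries:
        groups.setdefault(reduce_text(entry, table), []).append(entry)
    return {key: group for key, group in groups.items() if len(group) > 1}

# Illustrative scope where only one of the two syllables is ever used.
ball_scope = ["몬스터볼", "슈퍼볼"]
# Scope containing the two strings quoted in this thread.
power_scope = ["도구 파워: 불", "도구 파워: 볼"]

print(find_collisions(ball_scope, REDUCTION))   # no collisions: reduction is safe
print(find_collisions(power_scope, REDUCTION))  # collision: reduction is ambiguous
```

If a scope reports a collision, the reduction alone cannot tell the two entries apart there, and something else has to decide.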
We want to be respectful here. Korean, Japanese, and Chinese speakers have been reading complex characters since they were kids. They can easily tell the difference with the help of the context the words are in. Comments about someone's native language that do not align with what its native speakers think could be seen as disrespectful
I meant absolutely no disrespect, I meant a human can struggle since it's a small difference. That was all
Apologies for the poor wording.
I was effectively just saying "even a human could struggle to see that so i don't blame the computer for having a hard time"
But East Asians don’t struggle to read complex characters
humans can struggle, regardless of origin
small details can be hard to pick apart regardless of any linguistic origin
We have context of a word to help differentiate it
That isn't relevant to my point, my point is literally just that any human, regardless of origin or background at all, can struggle to see the single stroke difference of a small rendered character at these resolutions. Not that everyone will or must struggle, but that anyone CAN.
It was completely independent of any cultural assumption whatsoever...
What would you think if something you consider easily achievable because of your background were described as difficult by someone from another culture while solving a machine error, implying that we should all accept that machines won't work reliably for your culture?
Not at all, if anything it's exactly the opposite. It's impressive you can reliably tell the characters apart, I would not be able to. There is no assumption or implication at all, it was just a simple observation that the difference between the characters was objectively small and I wouldn't blame anyone for missing it at all.
Just because it's easily achievable for you doesn't mean it is for everyone.
Expressing the difficulty of such a task is no offense in itself. But saying it in the context of solving a machine error can be seen as inconsiderate to the other culture
I personally disagree, I believe that's your interpretation. I did not in any way, shape, or form intend to convey any disrespect or inconsideration. A machine originally written and designed to function with the comparatively visually simple characters of Western languages will logically have difficulty visually distinguishing characters it isn't intimately familiar with. So you adjust the machine accordingly, teach it. No implications at all.
Sorry if I came across wrong at all.
This reminds me of that Google AI image labeling incident 
AI labeled an image of a very dark-skinned person as a chimpanzee. I saw lighter-skinned people laughing in the comments: “It’s understandable AI would struggle telling them apart”. What would dark-skinned people think of those comments?
Strip away any assumptions and analyze it from a purely factual, logical angle. Without any other assumptions at all, based purely on color matching, it is understandable. In that context. To be clear this makes absolutely zero connections in any way between the person with dark skin and the chimpanzee, with the singular exception of "these colors look somewhat similar."
That's the angle I was trying to look at it from mostly, if that makes sense
I would have the same argument as this issue in English too, if you had a capital I and a lowercase l. Yes, with context, I can usually figure it out, but if context isn't enough, I might not be able to visually tell which is which alone. That makes no assumptions about my English skill, anyone else's english skill, or how the machine should work, it's just saying "those characters look really similar to each other."
Sorry if it seems like I’m accusing you of being disrespectful. I want to solve the machine issue for Korean users. Feel free to DM me if you'd like to continue the discussion
@chrome cipher Checking back on this, have you encountered the bug again on the latest program? I have made some changes to the code afterwards. It should have fixed this particular character. If you find any more bugs please inform me 