# How to split Devanagari bi-, tri- and tetra-consonantal conjuncts as a whole from a string

round sonnet

Are you intentionally printing two spaces in `print!("{} ", ...)`?

daring minnow

Oh, yes, it's just so I can make sense of the output, nothing else.

raw solstice

Maybe try a split by whitespace?

daring minnow

Splitting by whitespace would only give me the string word by word, not the consonant conjuncts.

raw solstice

Ah, I see

Could it be that there’s a zero-width joiner of some sort between those graphemes?

daring minnow

Could be

Though I haven't checked whether Devanagari conjuncts use a zero-width joiner. I've seen one in other scripts since I started working on this.
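One way to settle the zero-width-joiner question is to scan the string for U+200D (ZERO WIDTH JOINER) and U+200C (ZERO WIDTH NON-JOINER) directly. This is a minimal sketch; `joiner_positions` is a hypothetical helper name. For context, standard Unicode text encodes Devanagari conjuncts with the virama (U+094D) alone; ZWJ and ZWNJ are optional and only ask the renderer for a particular shape, such as a half form.

```rust
/// Returns the byte offset and a short name for every zero-width joiner
/// or non-joiner found in `s`.
fn joiner_positions(s: &str) -> Vec<(usize, &'static str)> {
    s.char_indices()
        .filter_map(|(i, c)| match c {
            '\u{200D}' => Some((i, "ZWJ")),
            '\u{200C}' => Some((i, "ZWNJ")),
            _ => None,
        })
        .collect()
}

fn main() {
    let hs = "हिन्दी क्त्र क्ष्ण्य असम के मुख्यमंत्री हिमंत";
    // An empty list means the conjuncts are not marked with joiners.
    println!("{:?}", joiner_positions(hs));

    // KA + VIRAMA + ZWJ + SSA: a ZWJ here explicitly requests a half form.
    println!("{:?}", joiner_positions("क्\u{200D}ष"));
}
```

If the first line prints `[]`, the joiners are not what glues these conjuncts together.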

raw solstice

If you iterate with `.chars()`, examine each char for something "strange" that would tell you whether there's a special character marking the ligature.

Or it could just be a font thing, in which case you'd have to know, for each grapheme, whether it merges with another one.
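The `.chars()` inspection suggested above can be sketched as follows: print each char next to its code point, so that any invisible or combining character (a joiner, a virama, a vowel sign) shows up on its own line. `code_points` is a hypothetical helper name; looking the values up in the Unicode Devanagari block (U+0900 to U+097F) then tells you what each one is.

```rust
/// Labels every char of `s` with its code point, e.g. "U+094D".
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    let word = "हिन्दी";
    // One line per char, so marks such as the virama (U+094D) stand out
    // even when the terminal draws them on top of a neighbouring letter.
    for (c, label) in word.chars().zip(code_points(word)) {
        println!("{label}\t{c}");
    }
}
```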

daring minnow

I tried `chars()` first, but most of the time it doesn't give me a conjunct like न्दी as a whole.

raw solstice

Input: हिन्दी क्त्र क्ष्ण्य असम के मुख्यमंत्री हिमंत

['ह', 'ि', 'न', '\u{94d}', 'द', 'ी', ' ', 'क', '\u{94d}', 'त', '\u{94d}', 'र', ' ', 'क', '\u{94d}', 'ष', '\u{94d}', 'ण', '\u{94d}', 'य', ' ', 'अ', 'स', 'म', ' ', 'क', '\u{947}', ' ', 'म', '\u{941}', 'ख', '\u{94d}', 'य', 'म', '\u{902}', 'त', '\u{94d}', 'र', 'ी', ' ', 'ह', 'ि', 'म', '\u{902}', 'त']
(for reference)

There seems to be a special class of characters that get ligatured with the next one.
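That special class is most likely the virama (U+094D, DEVANAGARI SIGN VIRAMA, also called halant): in the `chars()` output above, every conjunct is a consonant, then `'\u{94d}'`, then another consonant. A minimal sketch of splitting on that basis, assuming a "cluster" means a letter plus its following marks, extended whenever a virama is followed by another consonant. The helper names and the character ranges are assumptions covering common Devanagari letters only; the sketch ignores ZWJ/ZWNJ and other scripts.

```rust
/// DEVANAGARI SIGN VIRAMA (halant): joins the consonants on either side of it.
const VIRAMA: char = '\u{094D}';

/// Rough test for a Devanagari consonant: KA..HA plus the precomposed nukta letters QA..YYA.
fn is_consonant(c: char) -> bool {
    matches!(c, '\u{0915}'..='\u{0939}' | '\u{0958}'..='\u{095F}')
}

/// Rough test for a Devanagari mark that belongs to the letter before it:
/// candrabindu/anusvara/visarga, nukta, vowel signs (matras), virama, accents.
fn is_mark(c: char) -> bool {
    matches!(
        c,
        '\u{0900}'..='\u{0903}'
            | '\u{093A}'..='\u{093C}'
            | '\u{093E}'..='\u{094F}'
            | '\u{0951}'..='\u{0957}'
            | '\u{0962}'..='\u{0963}'
    )
}

/// Splits `s` into clusters: a letter, its marks, and any consonants
/// chained to it by a virama. Whitespace separates words and is dropped.
fn split_aksharas(s: &str) -> Vec<String> {
    let mut clusters: Vec<String> = Vec::new();
    let mut prev: Option<char> = None;
    for c in s.chars() {
        if c.is_whitespace() {
            prev = None; // a new word starts after the space
            continue;
        }
        // Attach marks to the previous letter, and attach a consonant that
        // directly follows a virama (this is what builds the conjunct).
        let attach = prev.is_some()
            && (is_mark(c) || (prev == Some(VIRAMA) && is_consonant(c)));
        if attach {
            if let Some(last) = clusters.last_mut() {
                last.push(c);
            }
        } else {
            clusters.push(c.to_string());
        }
        prev = Some(c);
    }
    clusters
}

fn main() {
    let hs = "हिन्दी क्त्र क्ष्ण्य असम के मुख्यमंत्री हिमंत";
    println!("{}", split_aksharas(hs).join(" | "));
}
```

On the sentence from this thread, this should print `हि | न्दी | क्त्र | क्ष्ण्य | अ | स | म | के | मु | ख्य | मं | त्री | हि | मं | त`. Also worth knowing: Unicode 15.1 added a rule (GB9c in UAX #29) that keeps virama-linked conjuncts in Devanagari and several other Indic scripts inside one extended grapheme cluster, so a grapheme segmenter implementing Unicode 15.1 or later may give the same result without hand-written ranges.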

daring minnow

Yes, I noticed that too. I was also trying to print the letters instead of the Unicode escapes shown there, but couldn't get it to work.
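The `'\u{94d}'` entries in the `chars()` output above come from `{:?}`: `Debug` formatting of a `char` escapes grapheme-extending marks such as the virama, while `Display` (`{}`) writes the character itself. A minimal sketch that prints the same kind of list with real letters; `show_chars` is a hypothetical name, and the bracketed style only imitates the `Debug` output for readability.

```rust
/// Formats the chars of `s` like a Debug list, but writes each char with
/// Display so combining marks appear as letters, not '\u{...}' escapes.
fn show_chars(s: &str) -> String {
    let items: Vec<String> = s.chars().map(|c| format!("'{c}'")).collect();
    format!("[{}]", items.join(", "))
}

fn main() {
    let hs = "हिन्दी क्त्र क्ष्ण्य असम के मुख्यमंत्री हिमंत";
    println!("{}", show_chars(hs));
}
```

A lone mark such as the virama will usually be drawn on the preceding quote character; if that is hard to read, printing it after U+25CC DOTTED CIRCLE is the usual convention.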

raw solstice

Like 'ि' or 'ी'

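I basically did:

```rs
fn main() {
    let hs = "हिन्दी क्त्र क्ष्ण्य असम के मुख्यमंत्री हिमंत";
    // Collect every Unicode scalar value (`char`) of the string.
    let hsi: Vec<_> = hs.chars().collect();
    // Debug formatting (`{:?}`) escapes some combining marks, e.g. the virama prints as '\u{94d}'.
    println!("{hsi:?}");
}
```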

raw solstice

Well, that is some tough shit

daring minnow

Yeah, true

I was trying to build it so that it splits the string the way I want, but no luck yet. :( :(

raw solstice

Hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm ferrisThonk