#About CJK punctuation compression and “flexible” spacing in Typst
I am looking for this because I want to extend Typst's native punctuation compression for CJK text into a comprehensive punctuation compression system that works like InDesign's CJK layout options.
For this I need a "flexible" horizontal space with a default amount plus a minimum and a maximum, which the layout engine can adjust to balance the paragraph content.
For this, we need more detail about what exactly you're doing. An MWE would be best, something showing the structure of what you're trying to space and balance.
In general though, you would do something like:
#h(1cm, weak:true)#h(1fr, weak:true)
This has a minimum size of 1cm, and a maximum size of 100% of the outer box.
As for setting the maximum, you must know the number of things going into the block (because you're about to put them there), and the width of 100% (which you do, because layout exists).
So you can either set the outer width to 100%, or num_items*max_width.
Here is an example for testing:
#let paragraph-example = [在概率论中,有一个很经典的问题叫作*生日悖论*,这个问题说的是,在一个至少多少人的群体里,才能保证存在两个生日相同的人的概率大于一半? 这个问题的答案是 23 人,但是人们会倾向于高估这个数字,这是因为混淆了「存在两个人同天生日」和「存在一个人和你同天生日」这两个事件。]
#paragraph-example
#show ",": sym.zwj + box(",")
#show "「": box("「") + sym.zwj
#show "」": sym.zwj + box("」")
#paragraph-example
This is a simple paragraph explaining what is called the Birthday Paradox in probability theory.
I would like every comma to take 0.5em of horizontal space by default, and to be able to take any amount from 0.5em to 1em, with the final amount determined by layout.
The opening and closing quotes 「」 should take 0.625em of space by default and can vary from 0.5em to 1em, again determined by layout.
I've found that box() (with no explicit size set) easily makes a CJK punctuation mark half-width, so if I need extra space before or after it, I can attach an extra h().
Software like InDesign has a specialized dialog for CJK punctuation compression options, and it works just as I have described.
And this is my example paragraph without and with punctuation compression:
I see. Not quite what I thought you meant, so different answer.
There might be an advanced way to do it, but I can fake it.
First, make the box width a parameter to be passed in.
I believe you're optimizing for "fewest full lines" right?
So you need to iterate from min to max width, measuring the resulting size (use layout to do it virtually)
That gives you a map of box widths to layout height. Filter for only the values right before a height transition (i.e. right before a line is added). Then use the one closest to the default width.
If no transitions, just use default.
I think this can be a show rule on par, to automate it.
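The selection step described above can be sketched in isolation, separate from any measuring. This is only a rough illustration under stated assumptions: the helper name `pick-width`, its parameters, and the sample data are all hypothetical, and widths are plain numbers (multiples of 1em) rather than lengths to keep the sketch simple.

```typst
// Hypothetical helper: `pick-width` and its argument names are not from the thread.
// `samples` is an array of (width, height) pairs ordered by increasing width.
#let pick-width(samples, default) = {
  // Collect every width that is immediately followed by a height change.
  let candidates = ()
  for i in range(samples.len() - 1) {
    if samples.at(i).at(1) != samples.at(i + 1).at(1) {
      candidates.push(samples.at(i).at(0))
    }
  }
  if candidates.len() == 0 {
    // No transition at all: keep the default width.
    default
  } else {
    // Otherwise take the transition width closest to the default.
    candidates.sorted(key: w => calc.abs(w - default)).first()
  }
}

// Made-up data: heights are line counts for each tried width.
#let samples = ((0.5, 3), (0.55, 3), (0.6, 4), (0.65, 4), (0.7, 5))
#pick-width(samples, 0.625)
```

With this made-up data the transitions sit after 0.55 and 0.65, and 0.65 is the one nearer to the default of 0.625.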
(sorry, can't actually test until tomorrow late)
This would be good, but manual layout might become extremely complicated. Also, apart from "fewest full lines", avoiding orphans and widows needs to be considered too.
And for different lines of the paragraph, the ideal amount of punctuation compression is different. Setting the same amount for all lines might not create the optimal effect.
So is it possible to utilize the layout engine of Typst?
just asking in general: did you see this resource https://forum.typst.app/t/chinese-layout-gap-analysis-clreq-gap-for-typst/4691 ? It has solutions and workarounds for many of the identified issues, though it may offer nothing new for you.
also, just in case: it's best to write minimal working examples in Typst with an explicit font and with both set text(lang: ..) and set text(region: ..), because the outcome for CJK text seems to depend on all of them
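As a minimal sketch of that advice, an example could start with a preamble like the one below. The font name and the language and region codes are assumptions chosen for Simplified Chinese; substitute whatever the document actually uses.

```typst
// Assumed values: replace the font with one installed on your system,
// and the language/region codes with the ones matching your text.
#set text(font: "Noto Serif CJK SC", lang: "zh", region: "cn")
```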
No, what I suggested is automatic. I guess I didn't explain well enough. Please hold for example.
#set page(width:10.8cm)
#let cjk-spacing-min = 0.5em
#let cjk-spacing-max = 1em
#let cjk-spacing-iter = 0.025em
#let cjk-spacing-default = 0.625em
#show ",": sym.zwj + box(width:cjk-spacing-default,",")
#show "「": box(width:cjk-spacing-default,"「") + sym.zwj
#show "」": sym.zwj + box(width:cjk-spacing-default,"」")
#let paragraph-example = [在概率论中,有一个很经典的问题叫作*生日悖论*,这个问题说的是,在一个至少多少人的群体里,才能保证存在两个生日相同的人的概率大于一半? 这个问题的答案是 23 人,但是人们会倾向于高估这个数字,这是因为混淆了「存在两个人同天生日」和「存在一个人和你同天生日」这两个事件。]
#paragraph-example
#let cjk-adaptive-spacing(it) = {
  layout(ctx => {
    let cjk-space = cjk-spacing-min      // width tried in this iteration
    let working-space = cjk-spacing-min  // width tried in the previous iteration
    let prev-height = none
    let best-space = none
    while cjk-space < cjk-spacing-max {
      // Measure the content virtually at the available width.
      let sz = measure(width: ctx.width, {
        show ",": sym.zwj + box(width: cjk-space, ",")
        show "「": box(width: cjk-space, "「") + sym.zwj
        show "」": sym.zwj + box(width: cjk-space, "」")
        it
      })
      if prev-height == none {
        prev-height = sz.height
      } else if prev-height != sz.height {
        // The height changed, so working-space is the last width before the transition.
        prev-height = sz.height
        if best-space == none or calc.abs(cjk-spacing-default - best-space) > calc.abs(cjk-spacing-default - working-space) {
          // Keep the transition width closest to the default.
          best-space = working-space
        }
      }
      working-space = cjk-space
      cjk-space = cjk-space + cjk-spacing-iter
    }
    // No height transition found: fall back to the default width.
    if best-space == none {
      best-space = cjk-spacing-default
    }
    show ",": sym.zwj + box(width: best-space, ",")
    show "「": box(width: best-space, "「") + sym.zwj
    show "」": sym.zwj + box(width: best-space, "」")
    it
  })
}
#cjk-adaptive-spacing(paragraph-example)
#show par: cjk-adaptive-spacing
#paragraph-example
So, there's an example, but there's something wrong with the show par rule. I think someone with a better idea of the internals could help.
Thanks! Trying to understand how this works.
The major problem I encountered is that, I want to add space after one punctuation, but the space is determined by the content afterwards
whether it is another punctuation, or non-punct text
but with the replacement show rule one cannot see what is after the punctuation
Also, judging whether two punctuation marks are directly adjacent is a headache
because they may sit in different elements and still be adjacent
such as the ?( in the second line of my text, where the ( is in a sub-element with a smaller font size
So how can I let one punct-box element know what is the punctuation right after it?
Then determine the needed amount of space afterwards
it sounds like you've cornered yourself into the same problem as the compiler :p
tbh , i'd personally try to implement this on the compiler if possible, at least you'd have much more information that way
but on the typst side, im not sure if thats possible without , like, redoing par layout by yourself...
only the parent can know whether two elements are siblings, at least arbitrarily speaking
one approach could indeed be to apply show rules on par for example, but yeah, im not sure how much easier that will make your job
especially given that not all text is in a par
I currently try to identify sibling puncts by setting a state:
show regex(cjk-punct-regex): it => context {
  in-cjk-punct.update(true)
  punct-box(it.text)
}
show regex("[^" + registered-puncts.get().keys().join() + "]+"): it => {
  in-cjk-punct.update(false)
  it
}
so when in-cjk-punct is already true before I set it to true, then it must be followed by another punctuation
I don't know whether using state is a good option, but I cannot think of a better one
operating on the par level, I would perhaps need to go over every element in the par, but this would be painstaking because you may encounter a context()
whose content you cannot know at that point
I found that putting punct in a box would reduce it to the minimum space possible, so if I add a h that is closely connected to the box then I can add space
yeah , in the compiler it can do realization which is the process of applying show rules to stuff, evaluating context and so on
it doesnt sound simple at all to do this in pure typst
with that said, i suppose state is one possible workaround, but it's definitely gonna be a bit painful in terms of layout iterations, especially if you have a ton of those
I think if I can know the next elembic punct-box element's info then I can determine the spacing, so what I am trying to do is to keep track of the previous punct-box element
Then if I can get the previous element (that one specific element) with e.data stored in my state, I can modify the previous element's field from the next element to apply spacing
So that's why I am asking whether e.set_ can be applied to modify one specific element
moving this here so i can stay in the channel
lol
so , im trying to understand what you are doing here
note that this set_ would only apply to a punct-box inside a punct-box
also you said you couldnt generate a label, but ill note that its the only way besides .with(), typst doesnt have reference equality
i.e. if all fields match then it is the same element. its impossible to tell elements apart that way
so a label is just a way to make one field differ.
with that said
i dont see a way other than using some counter for your punct-box
and then you store the counter value or something
ill note that elembic provides a counter by default
with e.counter(punct-box)
by default it steps by 1 on each element and is never reset
so thats something you can look into
probably the only way to assign a unique id to something
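To make the counter idea concrete, here is a rough sketch that swaps in Typst's built-in `counter` instead of elembic's `e.counter`, so it does not show the elembic API. The names `punct-id` and `tagged-punct` are hypothetical, and the id is printed as a superscript only to make it visible.

```typst
// Hypothetical names: `punct-id` and `tagged-punct` are illustrative only.
// This uses the built-in `counter`, not elembic's `e.counter`.
#let punct-id = counter("punct-id")

#let tagged-punct(mark) = {
  // Step once per punctuation mark; the counter is never reset here.
  punct-id.step()
  context {
    // The counter value at this location acts as a unique id for this mark.
    let id = punct-id.get().first()
    box(mark) + super(str(id))
  }
}

在概率论中#tagged-punct(",")有一个#tagged-punct("「")问题
```

Each call gets its own number because the counter advances before the value is read, which is the property a unique id needs.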
i guess thats all the help i can provide for the moment, it is a complex problem
well, i'll also note again that rules inside show_ only apply to children
you can try using a custom filter (e.filters.custom) in a cond-set , though the fact that you can use context in it is not really guaranteed...
but i guess you can anyways
¯\_(ツ)_/¯
well actually
i dont think this should be handled with fields to be honest
you should just add something to the element's display which does that logic
sounds infinitely simpler
that, or replace it with something else (e.g. none) in the show rule instead of changing a field
anyway, those would be the things i'd try
I think maybe I'll try my hand on labels, by generating labels like <_punct0> <_punct1>, ...
i think you will have to use the counter for this
but yeah
though at the moment it's not really trivial to append a label in a show rule if you're thinking about that
at least, it wont be detected by elembic
unless
you set labelable: true in your element's declare
then it might work
though
it has the caveat that it depends on figures
so it would break all of your text
😦
unless you box(the whole punct-box) in a show rule i guess
still facing the "layout did not converge" problem
I don't know why, but it happens once I call get-punct-spacing and it involves the current punct's sym field
Since I don't know the cause, I just temporarily wrote 1em
My method is: keep track of the previous elembic punct-box element's data
And then collect all the sibling puncts into a dictionary maintained in state
Then in the next show rule, find those elements and set their space-after fields
What a nightmare, I think I must temporarily give it up right now, or just apply fixed amount of space to all puncts
I have no information about where the non-convergence arises. With more detailed information I might be able to find a solution, but with just the warning and no further hints, what can I do?
Yeah this is what I meant here
Each update to a state requires one layout iteration to apply
If you have a bunch of updates, each followed by a get, this is what happens
Do this 5 times and it will surpass the limit of iterations (5), and you will get the warning
Basically it will update, then the value returned by the next .get changes in the next iteration, prompting its own update() to change; then in the next iteration the following .get changes, and so on
So unfortunately you can't do that
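A small sketch of the pattern being described, using only built-in `state`. The state keys and the chain length are made up for illustration. The first loop chains updates that each read the current value through `context`. The second loop passes a function to `update`, which needs no `context` and so does not create this kind of dependency.

```typst
#let s = state("chain", 0)

// Each update reads the value with `get` inside `context`, so each one
// depends on the result of the update before it and may need its own
// layout iteration to settle. A long enough chain can exceed the limit.
#for _ in range(3) {
  context s.update(s.get() + 1)
}

// A function-style update computes the new value from the old one directly.
#let t = state("direct", 0)
#for _ in range(3) {
  t.update(v => v + 1)
}
#context t.get()
```

The function form only helps when the new value depends on the state's own previous value. It does not by itself let one element see what comes after it, which is the harder problem in this thread.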
The only way will then be to act on the parent
That is, the paragraph
And then at that point I'd suggest you to propose a change in the compiler's paragraph layout instead of going any further
It's probably gonna be many times simpler there
Or at least, more correct
With that said, if you still want to try something else, you can take a look at what wrap-it does since it seems to do some hacks for manual par layout, maybe that helps you in some way
But if not then I don't see an alternative
I finally solved it (perhaps) with metadata