#Codex


storm whale
#

Is anyone working on multi character symbols currently? If not, I might have a look at doing it

midnight tangle
#

No one is working on it

past python
storm whale
lapis moth
#

yeah I was under the impression that all work on multi-character symbols was blocked on typst. If everyone is fine with implementing them on our side anyway, I can give it a go—seems like it'd be an easy change.

storm whale
lapis moth
#

cool

midnight tangle
#

Just to make sure you don't both start working on the same thing simultaneously, who wants to work on that?

lapis moth
#

I already did :P

midnight tangle
#

Yeah I just saw that, thanks!

lapis moth
#

The PR blocking graph is almost getting too long to keep it in my head now😂

midnight tangle
#

Hopefully once we merge those PRs the amount of meta PRs should reduce.

grizzled granite
#

Do we want to hardcode the variation selectors, or add them automatically at compile time to any symbol with both a text and emoji presentation?

#

It's presumably available in some Unicode data

midnight tangle
#

I think it would be better to have them in the .txt files. That way, we decide what happens for each symbol

#

Unless you mean for codepoints that have multiple possible variation sequences, and one is considered the default in case there is no variation selector. In this case, it may make sense to add the VS if not present. But maybe this should happen later, such as in Typst. Additionally, I'm not sure the notion of a default variant is formalized in a way that would let us do that

lapis moth
#

I think for our symbols, even for the ones that may have a default style, we should definitely add variation selectors both ways. Software robustness principle and all that

grizzled granite
#

Yes, we should add them to both. My point was that it's known which symbols have both a text and emoji presentation, so we wouldn't actually have to enter them manually in sym.txt and emoji.txt

#

And yes, there is technically a default presentation for each one
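For illustration, a minimal sketch of appending the selectors automatically instead of entering them by hand. It assumes a hypothetical `with_presentation` helper and a list of characters known to have both presentations (in practice that list could come from Unicode's `emoji-variation-sequences.txt`); none of these names are part of codex. VS15 (U+FE0E) requests text presentation and VS16 (U+FE0F) requests emoji presentation.

```rust
/// VARIATION SELECTOR-15: request text presentation.
const VS_TEXT: char = '\u{FE0E}';
/// VARIATION SELECTOR-16: request emoji presentation.
const VS_EMOJI: char = '\u{FE0F}';

/// Which presentation a symbol module wants (e.g. sym vs. emoji).
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Presentation {
    Text,
    Emoji,
}

/// Hypothetical helper: if `base` is a single character listed in `both`
/// (characters with both a text and an emoji presentation), append the
/// matching selector; otherwise return `base` unchanged.
pub fn with_presentation(base: &str, both: &[char], p: Presentation) -> String {
    let mut chars = base.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) if both.contains(&c) => {
            let vs = match p {
                Presentation::Text => VS_TEXT,
                Presentation::Emoji => VS_EMOJI,
            };
            let mut out = String::with_capacity(base.len() + vs.len_utf8());
            out.push(c);
            out.push(vs);
            out
        }
        // Multi-character strings (including ones that already carry a
        // selector) and characters without dual presentation are untouched.
        _ => base.to_string(),
    }
}
```

Generating the selectors this way would keep sym.txt and emoji.txt consistent with each other, at the cost of giving up per-symbol control.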

storm whale
#

With the multi character symbols, are we opening up the symbol constructor to take them? I think we should still enforce it being a single 'symbol', by checking it is a single grapheme cluster (at least to start with).

midnight tangle
#

Yes I agree

#

Allowing clusters instead of single codepoints only is really a fix rather than a new feature

lapis moth
#

wait but why? Wasn't the goal to open up symbols more? I mean I guess it can still happen later, but I also don't see a good reason to delay it...

midnight tangle
storm whale
#

Allowing arbitrary content definitely feels like going too far, I think, so it needs to stop somewhere

lapis moth
#

why would it be too far? Imo the sort of thing that classifies as a "symbol" is not something we can encapsulate in the type system, so allowing arbitrary content is the only way to allow everything that should be allowed

midnight tangle
grizzled granite
#

I don't understand why we're conflating these things. For the purposes of codex we do not require arbitrary content as symbols

lapis moth
#

No, but allowing multi-char symbols means that when Typst uses the updated codex, it has to support them as well. Changes to Typst's symbols have to be made anyway, so we might as well talk about those too

grizzled granite
#

My feeling is that allowing arbitrary content as symbols will complicate many things, and is better left as a possibility for the future

#

Right now it should just be a string corresponding to a single grapheme cluster

lapis moth
midnight tangle
#

This is less about complications in the source code than it is about design decisions for the user facing interface imo

storm whale
#
    Some(match c {
        '〈' => '⟨',
        '〉' => '⟩',
        '《' => '⟪',
        '》' => '⟫',
        _ => return None,
    })

Does anyone know if these CJK compatibility character mappings are actually needed?

#

Wait, it isn't a CJK thing? The single ones aren't, but the double ones are in the CJK block.

#

Oh wait I think there's a mistake here lol, there are CJK ones for the single, but those aren't the ones above! It's the "Left-Pointing Angle Bracket (U+2329)", which decomposes into the CJK "Left Angle Bracket (U+3008)", as opposed to the "Mathematical Left Angle Bracket (U+27E8)".

grizzled granite
storm whale
grizzled granite
#

It's still not clear to me to what extent some of the symbols are there merely for backwards compatibility, or if they're actively still used

lapis moth
midnight tangle
#

Solving #6441 (comment) is harder than I initially thought.
Consider the following case:

symbol
  @deprecated: reason 1
  .modifier
  @deprecated: reason 2
  .modifier.alt

Where should there be warnings in the following Typst code?

#let a = symbol.modifier
#let b = symbol.modifier.alt
#let c = a.alt
#

To summarize the problem: we probably want a single warning on each line. But I have no idea how this could be implemented.
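One way to pin down the expected behaviour is a toy model that resolves a whole modifier chain in one step and reports only the deprecation of the variant it lands on, so each chain yields at most one warning. This is a sketch under that assumption only: the real difficulty is that Typst evaluates `symbol.modifier.alt` as two separate field accesses, and `a.alt` only sees the already-resolved `a`. `Variant` and `resolve` are hypothetical names, and exact set matching stands in for codex's actual variant selection.

```rust
use std::collections::BTreeSet;

/// A toy symbol variant: its modifiers (a set, so order does not matter)
/// and an optional deprecation message attached to that exact variant.
pub struct Variant {
    pub modifiers: BTreeSet<&'static str>,
    pub deprecation: Option<&'static str>,
}

/// Resolve a full modifier chain at once.
/// Outer `None`: no variant has exactly these modifiers.
/// Inner value: the deprecation of the matched variant, if any.
pub fn resolve(variants: &[Variant], chain: &[&'static str]) -> Option<Option<&'static str>> {
    let wanted: BTreeSet<&'static str> = chain.iter().copied().collect();
    variants
        .iter()
        .find(|v| v.modifiers == wanted)
        .map(|v| v.deprecation)
}
```

Under this model, `symbol.modifier` reports reason 1, while both `symbol.modifier.alt` and `a.alt` report only reason 2.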

storm whale
# grizzled granite Does the spec address this?

I found it in the spec. "22.5.4 Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF":

...

Mathematical Brackets. The mathematical white square brackets, angle brackets, double angle brackets, and tortoise shell brackets encoded at U+27E6..U+27ED are intended for ordinary mathematical use of these particular bracket types. They are unambiguously narrow, for use in mathematical and scientific notation, and should be distinguished from the corresponding wide forms of white square brackets, angle brackets, and double angle brackets used in CJK typography. (See the discussion of the CJK Symbols and Punctuation block in Section 6.2, General Punctuation.) Note especially that the “bra” and “ket” angle brackets (U+2329 LEFT-POINTING ANGLE BRACKET and U+232A RIGHT-POINTING ANGLE BRACKET, respectively) are deprecated. Their use is strongly discouraged, because of their canonical equivalence to CJK angle brackets. This canonical equivalence is likely to result in unintended spacing problems if these characters are used in mathematical formulae.

The flattened parentheses encoded at U+27EE..U+27EF are additional, specifically-styled mathematical parentheses. Unlike the mathematical and CJK brackets just discussed, the flattened parentheses do not have corresponding wide CJK versions which they would need to be contrasted with.

...

#

So maybe one could justify mapping the "bra" and "ket" angle brackets (maybe not though as they are canonically equivalent to the CJK ones), but definitely the CJK ones shouldn't.
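A sketch of a mapping that follows the spec's distinction, using a hypothetical `math_angle` function in place of the snippet above: only the deprecated U+2329/U+232A "bra" and "ket" brackets are mapped to the mathematical angle brackets, and the CJK brackets (U+3008..U+300B) are left alone.

```rust
/// Map the deprecated bra/ket angle brackets to their mathematical
/// counterparts; everything else (including CJK brackets) is unmapped.
pub fn math_angle(c: char) -> Option<char> {
    Some(match c {
        // LEFT-POINTING ANGLE BRACKET -> MATHEMATICAL LEFT ANGLE BRACKET
        '\u{2329}' => '\u{27E8}',
        // RIGHT-POINTING ANGLE BRACKET -> MATHEMATICAL RIGHT ANGLE BRACKET
        '\u{232A}' => '\u{27E9}',
        // U+3008..U+300B (CJK angle and double angle brackets) are
        // deliberately not mapped.
        _ => return None,
    })
}
```

One caveat: U+2329 and U+232A have singleton canonical decompositions to U+3008 and U+3009, so if the text was NFC-normalized before reaching this point, they will already be the CJK brackets and the first two arms can never match.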

storm whale
midnight tangle
#

IIRC there was more to it but I may be wrong

grizzled granite
#

I don't think there's anything special there other than lack of variation selectors and perhaps an unfortunate fallback chain

#

Could probably be closed

storm whale
#

I finally got round to emailing the NewCM maintainer about the roundhand/chancery variation selectors. He got back to me and said he'd be interested in adding them but needs to read into it more.

grizzled granite
#

I thought someone already created a fork of it supporting the selectors

#

very possible I'm misremembering

#

I'm wondering if we need to find additional people interested in codex. Requiring 3 reviews when we have like 4 people total actively involved doesn't seem sustainable.

storm whale
grizzled granite
#

that's what I meant about sustainable

midnight tangle
grizzled granite
#

I just realized that 3 reviews in fact means reviews from all four of us, since the person who creates the PR can't review it

past python
#

if we don't have enough people, we can drop the 3 back to 2

grizzled granite
#

Maybe we just leave anything breaking open for longer then?

#

like wait a month or something

#

@midnight tangle

#

or is that too slow

midnight tangle
# grizzled granite or is that too slow

I don't think slowness is an issue. Symbol changes are often unrelated so breaking changes don't often block other things, and there is no point in merging PRs fast because Typst doesn't release a new version very often

grizzled granite
#

see the message above mine

midnight tangle
#

oh yeah sorry this is what I meant

#

I don't know exactly how it works in the rustc world, but essentially this would be similar to having a final comment period for breaking pull requests

grizzled granite
#

E.g. https://github.com/typst/codex/pull/59 has been open for two months with essential consensus. I'd just like confirmation that you guys are okay with the final change (add the currency sign, and use generic instead of general)

#

there is no real hurry, but it would also be nice to not have a scramble in order to get everything done right before 0.14

midnight tangle
#

In my case, the reason I did not respond with another confirmation is that this kind of PR is hard to review, in the sense that I either have to trust you (which defeats the purpose of reviews), or check everything, which takes a lot of time. And even then, I just don't have a strong opinion on the changes because I am not knowledgeable enough. Maybe sometimes we should accept that there will be no strong opinions, and (1) express our weak opinions and (2) treat weak opinions as strong when there is no strong opinion.

#

I don't like my phrasing for (2) but I hope my point gets across

grizzled granite
#

you did

midnight tangle
past python
#

I think it might be time to research the term translations that LaTeX uses instead of continuing to have translation PRs trickle in one by one, with new terms decided by a random user that speaks that language (and the inconsistencies that result from this). C.f. https://github.com/typst/typst/pull/6519 (I posted this here because it conceptually fits with the broader subject of the forge.)

storm whale
#

We partly ended up doing that for the "page" translations

grizzled granite
#

hm, I just realized that we didn't handle the peso breaking change in any way. Any previous use of peso as ₱ will become $

#

not that I think it will affect many users, but it's still unfortunate

midnight tangle
#

I'm sorry if this was answered already, but wouldn't it be better to just have peso be ₱. If users want $, they can already use dollar

grizzled granite
#

I guess we could revert the change to peso specifically

#

I think the motivation was that a bunch of countries use the peso (with a much bigger population than the Philippines), and the symbol even predates the dollar.

midnight tangle
#

Ok. I understand that the choice to map peso to ₱ might then be "controversial". At the same time, having two symbols mapping to the same character, and having to use a variant of one to access a different character, seems like a waste

grizzled granite
#

I don't have very strong feelings either way. I don't think there is a perfect solution

#

let's hear what @storm whale has to say

storm whale
#

I mean, pataca is so unlikely to collide with another symbol in the future that I don't think there's much harm in having it. I figure we want to try to cater to as many different languages as possible (especially for naming symbols that are specific to a country/language). Population-wise, it's most likely that a user will type peso and expect $ instead of ₱. The Philippines is the only country that uses the word peso and not the dollar sign $ for it.

#

I don't see much issue having two symbols mapping to the same character. At least, I think for these currencies there are valid reasons to have them - that is, they have multiple names depending on where they are used.

tall quail
past python
midnight tangle
#

@grizzled granite if you want to see something similar to Unicode, but much better organized, take a look at the SMuFL specification. For example: https://w3c.github.io/smufl/latest/tables/time-signatures.html
SMuFL is a standard that uses Unicode private use areas to encode musical characters in music fonts for use in various music notation software. I discovered their spec recently, and it is so great compared to Unicode's: recommended ligatures and variation sequences, implementation notes, and related blocks (which they call "groups") are all together in the same place.
I don't know if they provide all this information in a machine readable format, though.

grizzled granite
#

why couldn't we have that for unicode :/

past python
grizzled granite
#

That may change of course, but that's the reality right now

grizzled granite
#

what is currently missing for text/emoji selectors?

heady fulcrum
#

the hope is that I'll be able to find a flow for it, so as to be able to contribute even when I'm busier

midnight tangle
grizzled granite
midnight tangle
lapis moth
#

Additionally, there are the two points of the code where I don't know how to proceed (typst-docs and typst-ide)—I'm hoping for @past python to step in and say what should be done there.

midnight tangle
lapis moth
#

yeah adding a sentence to the docs is what I meant

midnight tangle
#

@tall quail In case you did not know, the way to formally approve PRs is through the "Review changes" button in the "Files changed" tab.

tall quail
#

damn, how is this silent thing supposed to work?

grizzled granite
#

we've moved away from using him for approval

midnight tangle
#

Breaking PRs no longer require approval from him

midnight tangle
#

I think it's too late now

#

You've already summoned him

tall quail
#

damn let's run for cover

#

@midnight tangle regarding your comment on #68 you mean between single and double brackets? Or indeed between brackets and quotes (but those are different families under different symbols?)

midnight tangle
#

brackets and quotes

#

My question is, if some random user types chevron, is it more likely that they want the quote or the bracket?

grizzled granite
#

bracket

#

for quotes they would presumably just use "

#

(smart quotes)

tall quail
#

It's also good for consistency: many other brackets have their own top-level symbol (paren, brace, bracket...)

#

Regarding #75 (dominos) I'm not sure the trick with submodules for order-dependent modifiers is a good idea

#

Breaking commutativity seems like a big deal (and the gender PR was careful to emulate it with duplicate definitions .male.female and .female.male). The user probably won't know or care about submodules vs. modifiers, so this could be confusing

grizzled granite
#

In any case we've discussed making order dependence a thing, but only to resolve ambiguous situations.

#

We should likely finish that discussion first

#

Not like dominos are hugely important

grizzled granite
#

I've come to really dislike stroked.

tall quail
#

though I would also prefer to make the symbol top-level instead of a stroked variant

grizzled granite
#

yeah but the adjective stroked means something else

grizzled granite
midnight tangle
#

Yeah either one PR per change, or remove the controversial changes from this one and open one PR for each of them

#

In other words, either one PR per change, or one PR per controversial change

#

That way we can merge what there is consensus on

grizzled granite
#

I think 41 PRs would be too much, but I can batch them into logical units

#

😉

midnight tangle
#

Yeah this is what I meant

tall quail
grizzled granite
tall quail
#

But I don't think this distinction is very meaningful. The point is that "stroked" is derived from "stroke" and has the correct meaning

#

(There are other meanings but I don't see why they would be more of a problem than for stroke)

midnight tangle
#

I think we should try not to have too many "active" open PRs at once, as I feel like it makes discussing them harder: you have to context switch between many different topics and concerns.
Or maybe it's just me struggling to remember everything in which case it's not that bad and I can just read the comment threads again if that lets us move faster.

grizzled granite
midnight tangle
#

We can keep them open

grizzled granite
grizzled granite
midnight tangle
#

Let's ask the author before closing it

grizzled granite
#

I just discovered that STIX Two Math has hundreds of otherwise missing symbols in the private use area

midnight tangle
#

Is there a list available?

grizzled granite
midnight tangle
grizzled granite
#

In particular the variants of the Greek and blackboard bold letters would be super great

#

Shame

grizzled granite
#

@midnight tangle I think .alt or .slant could work, but there really needs to be a more ergonomic way of using them. Having to separately redefine all of them sounds miserable.

midnight tangle
#

Maybe the solution would be to have some way to specify which version to use by default, when using just, e.g., lt.eq.

#

Ok I have an idea

grizzled granite
#

Yes that is essentially what I meant

#

There's also the question of what to do with the symbols that only have the slanted version...

#

Of which I think there's a couple at least?

midnight tangle
# midnight tangle Ok I have an idea

If we were to implement #6028, then lt.eq could return an element with a slanted property, which could be set using a set rule. The same element would be used for other symbols with possible slanted things.
The problem is that this would break the symbol -> str conversion

midnight tangle
grizzled granite
#

I think it would essentially have to be a property on eq, equiv and tilde if we wanted to go that route

midnight tangle
#

I gotta go now, but I think the solution to our problem is not in trying to find a short modifier, but rather in making the version to use by default configurable

lapis moth
#

I'd like that

grizzled granite
#

@pastel violet are you still intending to do anything with https://github.com/typst/codex/pull/9 ? No pressure, just wondering if we should leave it open or not. Of course, even if it's closed you are welcome to open another one.

pastel violet
#

Oh I forgot about that feature. It shouldn't be much work to adapt it to what we decided. I'll get it on my todo list 👍

grizzled granite
#

Maybe I misunderstood, @midnight tangle, but it seemed completely impossible to reach a consensus about the previous PR

midnight tangle
#

You mean on the .slant issue?

#

If so maybe you are right that it's better to add the remaining symbols now and figure out a better way to change the default later

grizzled granite
midnight tangle
#

@past python just as a heads-up, PR typst/codex#96 contains some minor documentation changes. Would you like to be notified about those minor meta PRs in the future, or only those that affect the public interface?

GitHub

These were slightly out of date after #92.
I've also taken the liberty to rephrase them a bit; Ever since #46, the crate-level docs have explained what a variant is, so we can just use that...

lapis moth
#

Let's discuss whether symbols should be required to be single grapheme clusters at typst#6489.

grizzled granite
#

I briefly looked at the code, and renaming it shouldn't be so hard, but I have no idea how deprecation would be handled

grizzled granite
#

I thought this was #1176478139757629563 , but apparently not

past python
grizzled granite
#

am I doing something wrong?

storm whale
#

Maybe rebase won't cause them to go stale? I've got no idea though. I'll reapprove

past python
#

Also, the merge commit could contain arbitrary further edits

#

Same goes for rebase, so I'd assume it behaves the same

grizzled granite
#

I guess, though it is a bit annoying

#

oh well

grizzled granite
midnight tangle
#

I think it is just missing a review, right?

grizzled granite
#

but I don't want to rush anyone

#

I suspect 0.14 is some way off anyway

midnight tangle
#

Maybe @lapis moth has more thoughts about it, as they were the only person to raise a concern about the change

grizzled granite
#

I think they already voiced their opinion

midnight tangle
#

I understand that asking for reviews here feels like you are rushing people, but most of the time we just forget about a PR. And if we don't have the time to review it we know it's okay

lapis moth
# grizzled granite I think they already voiced their opinion

pretty much, yeah. I still don't like the name and was personally mostly fine with angle.
In my personal math package, I've defined them as a function called ang, so we could also use that in theory, but I also understand if that's too unclear.

#

Not really sure if withholding an approval based on a personal opinion is warranted... I think we do have enough people to be able to merge this without me right?

lapis moth
#

@past python Just letting you know, if you don't already: I'm pretty sure #105 still needs your input since it's a substantial change to how symbols work, in codex and later also in typst (which is also already part of the discussion thread)

past python
midnight tangle
lapis moth
#

@midnight tangle I hope you don't mind that I edited your message in #113 to remove a typo

grizzled granite
#

Should the emoji flag sequences live in codex? I seem to recall @past python mentioning that they are problematic

#

Still, it's important to be able to use regional indicators I think

past python
#

could you elaborate what "regional indicators" are? I'm not up to speed right now.

grizzled granite
# past python could you elaborate what "regional indicators" are? I'm not up to speed right no...

The regional indicator symbols are a set of 26 alphabetic Unicode characters (A–Z) intended to be used to encode ISO 3166-1 alpha-2 two-letter country codes in a way that allows optional special treatment.
These were defined by October 2010 as part of the Unicode 6.0 support for emoji, as an alternative to encoding separate characters for each...

#

You have 26 of them, and you can combine two to correspond to a region

#

Most emoji fonts display that as the flag for that region
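For reference, the encoding is mechanical: each letter A to Z maps to U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A plus the letter's offset, so "FR" becomes U+1F1EB U+1F1F7. A minimal sketch with a hypothetical `region_sequence` helper; it only checks the shape of the input, not whether the code is a valid region in CLDR, which is the open question below.

```rust
/// REGIONAL INDICATOR SYMBOL LETTER A.
const RI_A: u32 = 0x1F1E6;

/// Build the regional indicator pair for a two-letter code such as "fr".
/// Only checks that the input is two ASCII letters; it does NOT validate
/// the code against CLDR / ISO 3166-1.
pub fn region_sequence(code: &str) -> Option<String> {
    let bytes = code.as_bytes();
    if bytes.len() != 2 || !bytes.iter().all(u8::is_ascii_alphabetic) {
        return None;
    }
    bytes
        .iter()
        // Case-insensitive: 'f' and 'F' both map to LETTER F.
        .map(|b| char::from_u32(RI_A + u32::from(b.to_ascii_uppercase() - b'A')))
        .collect()
}
```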

past python
#

ah okay just these, I see

grizzled granite
#

Now that we have multi character symbols we could have the sequences in codex

past python
#

my last position on this was that we (probably?) don't want to define all 26^2 combinations, so we'd need to decide which are actual countries, which I'd prefer to avoid doing

#

but I remember someone saying that Unicode might actually state which are valid?

#

idk whether I remember correctly

grizzled granite
#

It's just the ISO codes with a few modifications

past python
#

then I guess it's probably fine

midnight tangle
#

Alternatively, maybe we can find a way to have emoji.flag.xy be valid for any x and y lowercase Latin letter, without having to actually define all symbols?

grizzled granite
#

It's also not a good experience for users

#

It should only be valid for combinations in the CLDR

#

There's also the question of whether we want to use "flag". I think "region" would be better

midnight tangle
grizzled granite
#

They're very adamant about fonts not necessarily using flags

#

It can just be some other thing

midnight tangle
grizzled granite
#

I didn't mean manually adding them necessarily

#

But we should rely on the CLDR

#

I assume we're already using some crate with this information?

midnight tangle
grizzled granite
midnight tangle
#

If there is such a guarantee then I would be okay with only defining flags that correspond to defined country codes

past python
grizzled granite
#

Deprecated regions are included in the list of valid region sequences so that deprecations in the future do not invalidate previously valid emoji flag sequences.

#

Not sure why they are referred to as flag sequences here, when they're adamant in other places....

#

Right

#

Although a pair of REGIONAL INDICATOR symbols is referred to as an emoji_flag_sequence, it really represents a specific region, not a specific flag for that region. The actual flag displayed for the pair may be different on different platforms, for example for territories which do not have an official flag. The displayed flag may change over time as regions change their flags and platforms update their software.

#

So region.xy is likely the safest bet

midnight tangle
grizzled granite
#

Maybe

#

I see there are also sequences for subregions

lapis moth
grizzled granite
lapis moth
#

None that I know of, tho I may be ignorant

#

I haven't ever seen a Hamburg flag emoji for example

grizzled granite
#

The uk subregions seem to be widely supported

#

At least
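The UK subregion flags are emoji tag sequences rather than regional indicator pairs: U+1F3F4 WAVING BLACK FLAG, then the lowercase CLDR subdivision id (e.g. `gbeng` for England) with each character shifted into the Tags block (U+E0000 plus its ASCII value), then U+E007F CANCEL TAG. A sketch with a hypothetical `subregion_sequence` helper that does no validation against CLDR:

```rust
/// WAVING BLACK FLAG, the base of subdivision flag tag sequences.
const BLACK_FLAG: char = '\u{1F3F4}';
/// CANCEL TAG, which terminates the tag sequence.
const CANCEL_TAG: char = '\u{E007F}';

/// Build a subdivision flag tag sequence from a lowercase CLDR subdivision
/// id such as "gbeng". Each ASCII letter or digit becomes the TAG character
/// at U+E0000 + its ASCII value.
pub fn subregion_sequence(id: &str) -> Option<String> {
    if id.is_empty() || !id.bytes().all(|b| b.is_ascii_lowercase() || b.is_ascii_digit()) {
        return None;
    }
    let mut out = String::new();
    out.push(BLACK_FLAG);
    for b in id.bytes() {
        out.push(char::from_u32(0xE0000 + u32::from(b))?);
    }
    out.push(CANCEL_TAG);
    Some(out)
}
```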

lapis moth
#

okay, right, that's the one exception

#

UK gets special treatment for some reason😑

grizzled granite
#

Most likely because font support for everything else is lacking

#

Chicken or egg situation....

lapis moth
#

yeah

grizzled granite
#

this one has support for 73 subregions

past python
#

region is very undiscoverable and if I'd read it in a document, I wouldn't know what it means as a user

midnight tangle
#

Let me remind you of #114, which is almost ready to be merged, with the exception of an open question regarding a test: currently, I synchronously query the list of presentation sequences defined by Unicode from the internet every time the tests are run, and I'm not sure this is the best way. Notably, the tests may start failing if Unicode updates their list of variation sequences (although that can be solved easily by pinning a specific version of Unicode).

Note that the completeness of the added tests means you don't have to review each individual change to the symbol lists, as errors would have been caught by the tests.
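To avoid the network request, one option is to vendor a copy of Unicode's `emoji-variation-sequences.txt` for a pinned version and parse it locally. A sketch of such a parser, assuming the file's `code points ; style ; # comment` line layout (e.g. `0023 FE0E  ; text style;  # (1.1) NUMBER SIGN`); `Sequence` and `parse_variation_sequences` are hypothetical names, not existing codex code.

```rust
/// One parsed entry from emoji-variation-sequences.txt.
#[derive(Debug, PartialEq)]
pub struct Sequence {
    pub base: char,
    pub selector: char,
    pub emoji_style: bool,
}

/// Parse the data file's text. Comment and blank lines are skipped;
/// malformed lines produce an error so a test fails loudly.
pub fn parse_variation_sequences(text: &str) -> Result<Vec<Sequence>, String> {
    let mut out = Vec::new();
    for line in text.lines() {
        // Drop the trailing comment, then skip lines with no data.
        let data = line.split('#').next().unwrap_or("").trim();
        if data.is_empty() {
            continue;
        }
        let mut fields = data.split(';').map(str::trim);
        let codes = fields.next().ok_or("missing code points")?;
        let style = fields.next().ok_or("missing style")?;
        let cps: Vec<char> = codes
            .split_whitespace()
            .map(|h| u32::from_str_radix(h, 16).ok().and_then(char::from_u32))
            .collect::<Option<_>>()
            .ok_or_else(|| format!("bad code point in {line:?}"))?;
        let &[base, selector] = cps.as_slice() else {
            return Err(format!("expected two code points in {line:?}"));
        };
        let emoji_style = match style {
            "emoji style" => true,
            "text style" => false,
            other => return Err(format!("unknown style {other:?}")),
        };
        out.push(Sequence { base, selector, emoji_style });
    }
    Ok(out)
}
```

With the file checked into the repository, a test could load it with `include_str!`, so updating to a new Unicode version becomes an explicit change rather than a surprise test failure.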

midnight tangle
#

I would like to specify a proper license for the Proposal document. I was thinking of CC BY-SA. @pastel violet since you are the only person other than me who contributed to the document, would you be okay with that? Or would you prefer another license?

grizzled granite
#

I have no skin in this game, but I like CC-BY-SA.

#

Is this because of the benchmarking @midnight tangle ?

#

In that case you should just make sure that the "SA" part is compatible with what Laurenz wants it for

midnight tangle
#

This is what made me remember we did not have a license for that, but in general it is useful to provide documents under appropriate licenses

pastel violet
#

something something I hereby officially release all edits past and future made by myself to the aforementioned "Proposal Document" under the Creative Commons BY-SA license

midnight tangle
#

Thanks!

grizzled granite
#

@brisk geyser as in, any emoji which has two forms gets an emoji selector in emoji and text selector in sym

brisk geyser
#

ah okay, sounds reasonable, though as I mentioned from what I remember, the text variation selector can only be applied to certain emojis

#

but I might remember wrong

grizzled granite
#

I'm just saying there are some conflicting efforts here, so it would be a good idea to not immediately go ahead with a pr 🙂

midnight tangle
#

I opened a pull request to typst/codex to initiate a discussion on the implementation: #116.

grizzled granite
#

Would it be better to use "han" instead of Chinese?

#

It's shorter, and they're also used outside of china

midnight tangle
# grizzled granite Would it be better to use "han" instead of Chinese?

I know nothing about Chinese numerals, but Wikipedia does not seem to refer to it as Han numerals anywhere on the Chinese numerals page.

Chinese numerals are words and characters used to denote numbers in written Chinese.
Today, speakers of Chinese languages use three written numeral systems: the system of Arabic numerals used worldwide, and two indigenous systems. The more familiar indigenous system is based on Chinese characters that correspond to numerals in the spoken languag...

#

Also, does it make sense to convey upper/lower through case? (like I did for Latin vs. latin)

grizzled granite
#

ISO 15924, Codes for the representation of names of scripts, is an international standard defining codes for writing systems or scripts (a "set of graphic characters used for the written form of one or more languages"). Each script is given both a four-letter code and a numeric code.
Where possible the codes are derived from ISO 639-2, where the...

midnight tangle
#

Ok

grizzled granite
#

Maybe we could get some input from actual CJK users

midnight tangle
#

Yes that would be the best I think

#

For this PR, I think the main goals are to figure out naming conventions and assign a name to existing numeral systems. Adding the missing ones can be done later

grizzled granite
midnight tangle
#

I gotta go now. We can discuss more later

grizzled granite
lapis moth
grizzled granite
midnight tangle
grizzled granite
#

That would also work

#

I don't know what kind of parsing limitations there are

midnight tangle
#

Actually, the \vs15 you proposed is probably best, because the number inside the braces of Unicode escapes is hexadecimal, but VS numbers are usually expressed in decimal

lapis moth
#

I don't think of the braces in \u{} as denoting "hexadecimal", but rather just delimiting a number that can be arbitrarily zero-padded (i.e. \u{10} and \u{0010} are the same thing, whereas \u10 and \u0010 seem like different identifiers), so I'm personally in favor of \vs{...}. Tho ofc it's not necessary since these are only two-digit numbers at most, but I really like the \vs{emoji|text} idea, so it's still good for consistency.
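Whichever spelling wins, the escape has to resolve to a code point. A sketch, assuming a hypothetical `\vs{...}` argument that is either a decimal selector number or one of the `text`/`emoji` aliases: VS1 to VS16 are U+FE00..U+FE0F and VS17 to VS256 are U+E0100..U+E01EF. Parsing the number as decimal also makes zero-padding (`\vs{015}`) harmless.

```rust
/// Map a variation selector number to its code point:
/// VS1..=VS16 -> U+FE00..=U+FE0F, VS17..=VS256 -> U+E0100..=U+E01EF.
pub fn variation_selector(n: u32) -> Option<char> {
    match n {
        1..=16 => char::from_u32(0xFE00 + (n - 1)),
        17..=256 => char::from_u32(0xE0100 + (n - 17)),
        _ => None,
    }
}

/// Resolve the argument of a hypothetical `\vs{...}` escape.
pub fn parse_vs_arg(arg: &str) -> Option<char> {
    match arg {
        "text" => variation_selector(15),  // VS15
        "emoji" => variation_selector(16), // VS16
        _ => variation_selector(arg.parse().ok()?),
    }
}
```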

grizzled granite
lapis moth
grizzled granite
#

Well now I'm sad

#

Apparently the guy who maintained it died a month ago

past python
#

@grizzled granite It's so curious to me that it is you who opened https://github.com/typst/typst/issues/6355 even though you've previously often argued against new math shorthands. Does the situation feel different to you in math?

#

Shorthands are kinda tricky because basically any change is breaking.

#

Especially with -! and -?, I also wonder whether they encourage mixing writing and formatting.

grizzled granite
#

Regarding shorthands in math, I think I've been pretty consistent in saying that the main issue is that the rules for where they apply are too flexible, leading to ambiguity.

midnight tangle
#

@past python regarding presentation selectors, there is an issue: a font that can display a character X will always be able to display both X\vs{text} and X\vs{emoji}. I believe this is by design, as it means a cluster never becomes undisplayable just because the font doesn't support a specific presentation.
But for us, this means X\vs{emoji} won't fall back to the emoji font if X is supported by the current font. I'm not sure how to solve that.
I won't have the time to investigate that more sadly.

past python
#

yeah, that's annoying

lapis moth
#

otherwise I don't follow

storm whale
#

Would it even be possible to tell whether the shaper has selected a glyph using the variation selector or not?

grizzled granite
#

Surely there has to be a way to check whether a font supports a specific variation sequence

midnight tangle
# lapis moth otherwise I don't follow

The problem is that the main text font doesn't fall back to the emoji font because its non-variated glyphs (i.e., text presentation glyphs) can be used for emoji presentation sequences as well

grizzled granite
#

That's a stupid design

midnight tangle
grizzled granite
midnight tangle
#

I don't know if Unicode says anything about font fallback though

grizzled granite
#

I don't see why typst shouldn't be free to have that feature

grizzled granite
#

Or are you saying we would be in violation of something?

#

This would also be useful for scr

#

and other variation sequences in math

#

optional of course

lapis moth
#

I'm not too informed on the matter, but I presume the challenge would be a technical one, since there's probably no way to tell whether a variation sequence came out correctly

#

Best way I can think of would require explicit searching of the relevant font table (not sure which one is responsible for variation sequences) to see if the one we want to shape is included or not
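The table in question is the `cmap` subtable in format 14 (Unicode Variation Sequences). For each selector it lists base characters in a Default UVS table (the font's ordinary glyph is correct for the sequence) and a Non-Default UVS table (a specific glyph is used). A sketch of the check, modeling that data with plain hypothetical types rather than any real font library's API:

```rust
use std::collections::HashMap;

/// Simplified model of a font's cmap format 14 data for one selector.
pub struct UvsRecord {
    /// Base characters whose sequence uses the default glyph
    /// ("Default UVS"), as inclusive ranges.
    pub default_ranges: Vec<(char, char)>,
    /// Base characters mapped to a dedicated glyph id ("Non-Default UVS").
    pub non_default: HashMap<char, u16>,
}

#[derive(Debug, PartialEq)]
pub enum UvsSupport {
    /// The font's ordinary cmap glyph is the right one for this sequence.
    DefaultGlyph,
    /// The font has a dedicated glyph for this sequence.
    Glyph(u16),
    /// The font does not list the sequence: a candidate for fallback.
    NotListed,
}

/// Check whether `base` followed by this record's selector is listed.
/// `None` models a font with no format 14 entry for the selector.
pub fn lookup(record: Option<&UvsRecord>, base: char) -> UvsSupport {
    let Some(record) = record else {
        return UvsSupport::NotListed;
    };
    if let Some(&gid) = record.non_default.get(&base) {
        return UvsSupport::Glyph(gid);
    }
    if record.default_ranges.iter().any(|&(lo, hi)| (lo..=hi).contains(&base)) {
        return UvsSupport::DefaultGlyph;
    }
    UvsSupport::NotListed
}
```

A fallback policy could then move on to the next font when the result is `NotListed`. One trade-off to consider: many text fonts have no format 14 subtable at all, so such a rule would send every X\vs{emoji} to the fallback chain, which is arguably what we want here but may be too aggressive for other variation sequences.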

grizzled granite
#

@midnight tangle is there something like \vs for zwj? It would be useful for emoji zwj sequences

midnight tangle
#

Do you mean in our DSL?

#

If so, no. But I would say just copying and pasting the characters is probably easier

#

We could do that for variation selectors as well actually, not sure what's better

grizzled granite
#

Having invisible symbols everywhere isn't great

grizzled granite
#

Yeah

lapis moth
#

idk, I kinda partially disagree. Sure, for invisibles it's a good thing and this is why we have it to begin with, but for zwj sequences I'm not too keen on writing them out explicitly...

grizzled granite
#

I'm mostly talking about the exceptional zwj sequences for specific symbols, not gender and skin color

lapis moth
#

I think the distinction is between things that are (a) freestanding and invisible, (b) just modifications of an existing thing and (c) a completely separate thing that just happens to be made from smaller constituents, and I think escaping a and b is fine, but I dislike it for c.

grizzled granite
#

Those would be automatically generated presumably

#

It just doesn't seem maintainable to me

#

Also if you put in the actual zwj symbols your editor will render the resulting emoji instead of the constituents

lapis moth
#

To give an actual example, I wouldn't like having the pirate flag as 🏴\zwj☠\vs{emoji}
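For reference, the escaped form spells out four code points: U+1F3F4 WAVING BLACK FLAG, U+200D ZERO WIDTH JOINER, U+2620 SKULL AND CROSSBONES and U+FE0F (VS16). A minimal sketch of how such escapes could be expanded when the files are read; `expand_escapes` is a hypothetical helper, and `\zwj` is an assumed escape that the DSL does not currently have.

```rust
/// Expands the escapes from the pirate-flag example into code points.
/// `\vs{text}` and `\vs{emoji}` follow the notation used in this discussion;
/// `\zwj` is hypothetical (not part of the current DSL).
fn expand_escapes(input: &str) -> Result<String, String> {
    const ESCAPES: [(&str, char); 3] = [
        ("\\vs{text}", '\u{FE0E}'),
        ("\\vs{emoji}", '\u{FE0F}'),
        ("\\zwj", '\u{200D}'),
    ];
    let mut out = String::new();
    let mut rest = input;
    while let Some(pos) = rest.find('\\') {
        // Copy everything before the backslash verbatim.
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        // Find which known escape starts here, or report an error.
        let (after, c) = ESCAPES
            .iter()
            .find_map(|&(name, c)| rest.strip_prefix(name).map(|after| (after, c)))
            .ok_or_else(|| format!("unknown escape at `{rest}`"))?;
        out.push(c);
        rest = after;
    }
    out.push_str(rest);
    Ok(out)
}
```

Whichever way the source files are written, the expanded result is the same four-code-point string; the escapes only change what a maintainer sees in the editor.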

grizzled granite
#

I would like that

#

Is there even a text variation of it? Is the emoji variation selector necessary?

lapis moth
#

idk, my input method added it

tall quail
#

are we talking about how users could write these in Typst strings? or files internal to codex?

midnight tangle
#

Internal files

tall quail
#

ok and what's the alternative to 🏴\zwj☠? If you replace \zwj by an actual zero-width joiner it would just look like a pirate flag, so you don't see how it's made of different code points

#

or am I missing something?

#

it seems useful to have at least the option of showing how a symbol is built

midnight tangle
#

Disallowing invisible characters forces us to make sure we input the right thing

past python
#

As a heads-up: We are slowly preparing for an upcoming 0.14 release candidate and to reduce last-minute stress as much as possible, I'd like to already cut a codex release fairly soon. I merged the presentation selector PR just now. Is there anything else that should still land?

grizzled granite
#

Unicode 17 is supposedly going to be finalized today

midnight tangle
#

It adds some interesting arrows (some of which might be hard to name), but more importantly some emojis we might want to have in Typst 0.14

grizzled granite
#

I don't think waiting for 0.15 is a big deal. Font support is going to be lacking anyway.

grizzled granite
#

Up to @past python I guess

brisk geyser
#

doesn't this require support for unicode 17 in rustybuzz first?

grizzled granite
#

Is rustybuzz basically in maintenance mode because of harfrust?

brisk geyser
#

don't think anyone's gonna update it

brisk geyser
#

probably

brisk geyser
#

it's going to happen in some shape or form but I don't know when or how

#

(as in, if it's gonna replace usvg or if it will be a new crate, etc.)

brisk geyser
#

yeah because of that

#

i hope it doesn't come to that tho

past python
#

that would be sad for the ecosystem

#

and it wouldn't help anybody's compile time either if the original usvg is just unmaintained

brisk geyser
#

sad indeed

grizzled granite
#

I don't understand the deal with giving up the maintainer role, but then still insisting on making maintainer decisions

idle dew
#

what does new emoji have to do with the shaper? do shapers shape emoji?

grizzled granite
#

but don't quote me on that

idle dew
#

i can't find the data files in the rustybuzz repo

#

oh the python scripts

midnight tangle
#

@past python I won't have the time to implement better deprecation warnings for symbols. IIRC, @lapis moth's solution may make it possible to implement your dream warning so that may be a good starting point.

GitHub

This is the companion PR for typst/codex#114.
As mentioned on Discord, this PR won't change the appearance of emojis and symbols with the default font because emoji presentation sequences ...

past python
#

though I think my dream warning would also require changes in codex

#

since the message is different

midnight tangle
#

Oh right I didn't notice that

grizzled granite
#

Harfrust was just updated to Unicode 17

past python
#

@midnight tangle What do you think about deprecating modifiers instead of variants? I have a draft of that working. The diff on sym.txt would look like this: https://github.com/typst/codex/commit/ea0677a6f1319dd6ac4a36165ed94e0a8298d823

It is a tiny bit less flexible because you can't deprecate a single variant when all of its modifiers are also used in other non-deprecated variants, but in the existing deprecations I haven't found a case where that would be necessary. And the warning messages are much clearer (no duplicated warnings, the deprecation message & span are both related to the modifier, and the deprecation messages don't reference modifiers that you haven't actually applied). On sym.txt, it also leads to some deduplication.

On Typst, it would look like this:

warning: the `double` modifier is deprecated, use `stroked` instead
  ┌─ hi.typ:2:38
  │
2 │ $ bracket.stroked.l ast.small bracket.double.r $
  │                                       ^^^^^^

Instead of the current version:

warning: `bracket.l.double` is deprecated, use `bracket.l.stroked` instead
  ┌─ hi.typ:2:38
  │
2 │ $ bracket.stroked.l ast.small bracket.double.r $
  │                                       ^^^^^^

warning: `bracket.r.double` is deprecated, use `bracket.r.stroked` instead
  ┌─ hi.typ:2:45
  │
2 │ $ bracket.stroked.l ast.small bracket.double.r $
  │                                              ^

The duplication needs to go either way of course, but I think even if we would deduplicate the current version, it's a bit less clear than talking specifically about the modifier in the message.

lapis moth
#

Personally, I like the idea, but also think we should keep the option to deprecate variants around.

midnight tangle
#

The fact that it would not be possible to deprecate a single variant seems like something that may be limiting in the future. Essentially, this means deprecations can only be used to rename or remove entire modifiers instead of variants, as you noted. This is already a bit visible in the planck.reduce case, where the variant planck.reduce is renamed to planck, instead of the modifier itself being renamed (same with circle.nested being renamed to compose.o)

lapis moth
#

and we already have the code for it, so removing it and later noticing we do need it after all and have to bring it back would be a bit of a waste of work...

past python
#

in theory we could also extend it such that a modifier set can be deprecated, i.e. it triggers once all modifiers in the set are in the symbol

#

I dislike a bit how much of the API surface of codex would be just deprecation handling if we kept both

past python
#

And having various different deprecation mechanisms can also create confusion which one is the appropriate one

grizzled granite
#

You may want to split a modifier into two different ones for instance, since it may be used for two slightly different things and we suddenly need to make that distinction clear

#

We may also have introduced a symbol in the wrong place, or it shouldn't have been introduced in the first place

past python
#

the problem is that the error message needs to appear on some modifier in the end

#

and deprecations on variants don't have that information

grizzled granite
#

I guess we could've been better at writing the deprecation messages, but they quickly get very long

past python
#

essentially you only emit a warning when going from non-full coverage of the set to full coverage

midnight tangle
#

I mean the solution proposed here: https://github.com/typst/typst/pull/6441#issuecomment-3002011213
Which changes the implementation of warnings on Typst's side to only emit the warning the first time it occurs for variant deprecations.

GitHub

This is the companion PR to typst/codex#86.
The deprecation message is emitted when the symbol is modified. This means accessing the variant is what triggers the warning (i.e., the symbol does not ...

past python
# midnight tangle I mean the solution proposed here: https://github.com/typst/typst/pull/6441#issu...

My proposal is a bit different. I mean to change the code here: https://github.com/typst/typst/blob/721a7b18dd105e9255dc45f1af1510465e78bf02/crates/typst-library/src/foundations/symbol.rs#L157-L159

Currently, it warns every time a symbol is modified and the currently resolved variant is deprecated.

If instead we could deprecate a combination of modifiers like this:

@deprecated(chevron.l): `chevron.l` is deprecated, use ... instead
quote
  .double "
  .single '
  .chevron.l.double «
  .chevron.l.single ‹
  .chevron.r.double »
  .chevron.r.single ›
  .angle.l.double «

This would turn into a deprecation set S = { chevron, l }

And then in the code linked above, when we have current modifiers M and the new modifier m, we check whether the new modifier brings the set to full coverage, i.e. |S ∩ M| < |S| but |S ∩ (M ∪ {m})| = |S|.
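The coverage check can be sketched as follows, assuming modifiers are plain strings and the deprecation set S comes from an annotation like `@deprecated(chevron.l)`. `completes_deprecation` is a hypothetical helper, not existing Typst or Codex code.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: does applying modifier `m` to a symbol that already
/// has modifiers `current` (M) bring the deprecation set `deprecated` (S) to
/// full coverage? That is, |S ∩ M| < |S| and |S ∩ (M ∪ {m})| = |S|.
fn completes_deprecation<'a>(
    deprecated: &BTreeSet<&'a str>,
    current: &BTreeSet<&'a str>,
    m: &str,
) -> bool {
    // |S ∩ M|
    let before = deprecated.intersection(current).count();
    // |S ∩ (M ∪ {m})|
    let after = deprecated
        .iter()
        .filter(|&&s| current.contains(s) || s == m)
        .count();
    before < deprecated.len() && after == deprecated.len()
}
```

For `quote.chevron.l.double`, only the step that adds `l` returns true: adding `chevron` alone leaves S incomplete, and adding `double` afterwards finds S already covered, so the warning fires exactly once regardless of modifier order.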

#

Perhaps though, we could use the same approach with the existing deprecation warnings? I.e. check whether the symbol already resolved to the same variant as an indicator that the warning was already emitted?

#

That might be simplest!

#

The warning message would still be slightly less clean than directly deprecating a modifier (for that kind of deprecation), but I think if that works it would be satisfactory.

midnight tangle
#

This could still cause an issue in the following case:

foo
  @deprecated: ...
  .bar
  .bar.baz

Writing foo.bar.baz would cause a warning at foo.bar

past python
#

That seems kind of unavoidable

#

At least without major hacks

#

I.e. hacks that would make let x = foo.bar; x.baz behave differently from foo.bar.baz

#

I think we should just avoid such a thing

grizzled granite
#

I think this is a narrower view of what a modifier is than how they're used in practice

midnight tangle
#

Your proposal would only emit a single warning in the following case, with foo.bar.baz, where #1277628305142452306 message would emit two warnings.

foo
  @deprecated: `bar` is deprecated
  .bar
  @deprecated: `bar` is deprecated
  .bar.baz
#

Maybe it makes sense to be able to deprecate modifiers and variants independently

grizzled granite
#

Other delimiters still have double

grizzled granite
#

I mean, it was more general, but that's one example

past python
#

the exact wording could be different, but I'd like to avoid that bracket.double.r gives "warning: bracket.l.double is deprecated, use bracket.l.stroked instead" (note the difference between l and r) because bracket.double is the first warning thing and resolves to the l variant

#

that could be avoided by a different wording

#

because in essence l and r can have the same deprecation wording

past python
#

two deprecation warnings on one symbol would typically be confusing anyway I think, even if it would technically make sense

midnight tangle
#

Then I can't think of a case where this wouldn't work if we word warning messages carefully

past python
#

on a technical level, it could even be a bool that essentially says "this symbol already emitted a warning" instead of the set stuff above

#

not sure which would be cleaner, but that's just an implementation concern
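A minimal sketch of the bool variant, under simplifying assumptions: `Symbol`, `Variant` and `modify` are hypothetical stand-ins for the code in Typst's `symbol.rs`, and resolution simply picks the matching variant with the fewest modifiers.

```rust
/// Hypothetical, simplified stand-ins; not Typst's actual types.
struct Variant {
    modifiers: &'static [&'static str],
    value: char,
    deprecation: Option<&'static str>,
}

struct Symbol {
    variants: Vec<Variant>,
    applied: Vec<&'static str>,
    /// "This symbol already emitted a warning."
    warned: bool,
}

impl Symbol {
    fn new(variants: Vec<Variant>) -> Self {
        Self { variants, applied: Vec::new(), warned: false }
    }

    /// Simplified resolution: among the variants that have every applied
    /// modifier, pick the one with the fewest modifiers (earliest on ties).
    fn resolve(&self) -> Option<&Variant> {
        self.variants
            .iter()
            .filter(|v| self.applied.iter().all(|m| v.modifiers.contains(m)))
            .min_by_key(|v| v.modifiers.len())
    }

    /// Applies a modifier and returns a deprecation message at most once per
    /// symbol, however many modifiers resolve to deprecated variants.
    fn modify(&mut self, m: &'static str) -> Option<&'static str> {
        self.applied.push(m);
        // No match would be an error in Typst; the sketch just returns None.
        let deprecation = self.resolve()?.deprecation?;
        if self.warned {
            return None;
        }
        self.warned = true;
        Some(deprecation)
    }
}
```

Here `bracket.double.r` yields a single warning, emitted when `double` first resolves to the deprecated `l` variant, which is why a modifier-agnostic wording like "bracket.double is deprecated" fits. `foo.bar.baz` still warns at `foo.bar`, the unavoidable case mentioned earlier.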

past python
#

I think so too. I think the approach is definitely good enough for 0.14.

#

That's all I want 😄

grizzled granite
#

it's specifically these variants that are deprecated, not the modifier

#

obviously not the end of the world, but still

past python
#

could just be "bracket.double is deprecated, use bracket.stroked instead"

#

just omitting the irrelevant extra modifiers

midnight tangle
#

@past python to be clear, are you planning to fix the warnings yourself?

past python
#

It's a pretty short diff

midnight tangle
#

I'll take a look at it later today

past python
#

Also, we now have both angle.azimuth and angzarr. Is that intentional?

grizzled granite
#

It was only recently that it was discovered what the meaning of the symbol was

grizzled granite
#

Sorry for creating a mess!

lapis moth
#

Re #120: Has someone checked that all the emoji listed here are correct? Because my font is apparently totally scrambled for some reason😅

gilded roost
#

looks correct to me, your emoji font is probably outdated

lapis moth
#

Downgrading fixed it, apparently Noto Emoji 2.051 is totally bugged😅

past python
#

For codex by the way I think keeping the changelog up to date with changes would be much more helpful and worthwhile than for Typst as the structure is much more fixed. Creating the changelog was 1-2 hours of rather mindless work, so it's easy for mistakes to slip in.

#

Perhaps it's still not worth it, but if it's worthwhile in any of the Typst repositories, then here.

midnight tangle
#

We can try to maintain the changelog with each new PR in the future

lapis moth
#

yeah, shouldn't be a problem, I try to do this with a lot of my personal projects too.

lapis moth
#

Now that we're done with the 0.2 release for Typst 0.14, we can go back to the longer-standing open issues, so I'd particularly like some reviews on #112 and #93.

midnight tangle
#

We should probably not merge any major change until Typst 0.14 is released, just in case a bug fix needs to be made in Codex. But of course we can discuss future things without waiting!

lapis moth
#

Well, it hopefully won't be that long anyway😅
Not sure if you meant to include reviews in "discuss", but those can ofc also already be given and we can just wait with merging the PRs even if they end up already having enough approvals.

#

also #112 can be merged immediately since it doesn't touch the functionality.

#

Tho it might need @past python's approval. (Also I hope this doesn't ping. I recently had a moment where I got pinged even tho someone used @silent)

midnight tangle
#

Or more generally meta changes

lapis moth
#

Yeah and #112 is a meta change.

#

Adding contribution guidelines.

midnight tangle
#

Oh right sorry I confused the two PRs

lapis moth
#

I also just opened #121 to remove the deprecated stuff, so add that to the list.

tall quail
#

even after Typst 0.14 is released, there might be a 0.14.1 later anyway so we must have a way to deal with that

lapis moth
#

We can reopen #79, right?
And are there other PRs that we had closed due to the multi-char symbols thing?

lapis moth
#

I'll just go ahead and reopen them then

midnight tangle
#

Btw there is an emoji named unknown, which I suspect was added by accident at some point. It maps to U+1F9B3 Emoji Component White Hair (D83E DDB3 in UTF-16)
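For context, U+D83E on its own is not a character but a UTF-16 high surrogate; EMOJI COMPONENT WHITE HAIR is U+1F9B3, and its UTF-16 encoding starts with 0xD83E, which is a plausible source of that number. A quick check of the surrogate arithmetic (`surrogates` is just an illustrative helper):

```rust
/// Splits a supplementary-plane code point into its UTF-16 surrogate pair.
fn surrogates(cp: u32) -> Option<(u16, u16)> {
    if !(0x10000..=0x10FFFF).contains(&cp) {
        return None;
    }
    let v = cp - 0x10000;
    let high = 0xD800 + (v >> 10); // top 10 bits
    let low = 0xDC00 + (v & 0x3FF); // bottom 10 bits
    Some((high as u16, low as u16))
}
```

`surrogates(0x1F9B3)` gives `(0xD83E, 0xDDB3)`, matching what the standard library's `char::encode_utf16` produces.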

lapis moth
#

Probably good to get that deleted before 0.14.0

midnight tangle
#

I don't think it matters that much, more likely than not I'm the first person ever to notice it. I don't think the burden of making a new release is worth it

lapis moth
#

oh right the release, yeah that's fair. Do we just silently delete it the next version then?

midnight tangle
#

Doesn't have to be silent in the sense that we can write it in the changelog, but yeah it can probably wait

midnight tangle
#

Then yes probably

#

A deprecation probably wouldn't hurt either, but it's also pointless

lapis moth
#

lol

lapis moth
#

or, well, apparently they were part of a different repo called "symmie" before, which has since been deleted.

#

(kind of funny how we're now back to a similar structure. history really rhymes lol)

grizzled granite
#

There's a lot of old weirdness

grizzled granite
#

Oh you meant symmie was a typst thing

lapis moth
#

yep it was

past python
#

codex is a much better name though

midnight tangle
#

It's mostly a port to HTML from PDF (it was the last UTR to still use PDF), but still amazing to know that they are working on it

midnight tangle
#

Yeah I can't access it either

grizzled granite
#

Probably either not changed, or not public yet

grizzled granite
#

@midnight tangle bracket became bracke in your pr

midnight tangle
#

Oh right, thanks

grizzled granite
#

Since both paren and brace are 5 letters

#

Maybe unnecessary

midnight tangle
#

I prefer using full names when it's not unreasonable

grizzled granite
#

Oh right the full name is upper left or lower right etc

midnight tangle
#

Yes

grizzled granite
# midnight tangle Yes

I assume that mustache matches the default math classes (open/close), while the reversed is opposite?

grizzled granite
#

In addition to corners top and bottom we should probably have ones with diagonal corners as well?

midnight tangle
#

That couldn't be done with callable symbols though

grizzled granite
#

Aren't you able to use any delimiter on each side?

midnight tangle
#

What determines which function ends up being called is the underlying Unicode character of the called symbol

grizzled granite
#

Oh nm it's not a symbol of course

#

I want to get back to codex at some point, but I'm super busy these days

grizzled granite
#

If this isn't an acceptable name, there's pea. It's less natural, but at least doesn't have the unfortunate alternate meaning

heady fulcrum
#

Is something like weierp (or some variation thereof) already dismissed? It's not as consistent with ell, but avoids the unfortunate name

#

(pea feels indeed a tad less natural to me.)

heady fulcrum
#

why exactly?

#

because of shortening the weierstrass?

grizzled granite
#

it's not consistent with codex naming in general

heady fulcrum
#

the consistency in question is wrt ell or something else?

grizzled granite
#

it's also only (sometimes) called weierstrass p due to the somewhat obscure weierstrass elliptic function

#

but can be used for other things than that

#

pee or pea is a more neutral name that is more consistent with ell

heady fulcrum
#

okay, I agree

grizzled granite
#

I'd prefer pee, but I also understand why the team could be opposed to it

heady fulcrum
#

yeah, I think I'm with you here. the unfortunate clash does bother me a bit, but I do prefer it vs. pea

#

(which fwiw also has a clash, though it's not as crass)

lapis moth
#

Another way to be sort of consistent with ell would be epp :V

midnight tangle
#

Also, @grizzled granite, when making the currency symbols PR, did you consider all symbols from the Currency Symbols block? You mentioned purposefully avoiding symbols for currencies no longer in active use, but does that mean all currency symbols were considered?
Also, the Currency Symbols chart mentions other currency symbols that aren't part of the block, some of which are not present in Codex. Did you consider them for inclusion?
The reason I'm thinking about that is because Unicode 17 added the symbol for the Saudi riyal, which is in active use.
Of course there is no rush so don't feel obligated to look into it right now if you don't have the time.

midnight tangle
#

So I suppose it makes sense to add the Saudi riyal symbol as just riyal given that other riyal symbols are not in active use?

grizzled granite
#

I guess I can't completely rule out the possibility of missing some, but I don't think so

storm whale
#

if we're just adding a symbol for a new glyph it's surely fine to not wait for the update to harfrust?

#

@grizzled granite

grizzled granite
#

New symbols need support in the shaper, as far as I understand

#

Obviously it's not a problem as long as codex isn't bumped in typst

storm whale
#

idk, let me try it. the other issue will be font support

grizzled granite
#

Honestly I don't know, but my impression was that it wouldn't work without updating rustybuzz/harfrust

#

Haven't tried though

storm whale
#

I tried to shape it with an older version of hb-shape (before unicode 17) and I can't get it to work

#

Oh wait no it worked ahaha it's just that the name of the glyph in the font is "[namenotfound]"

grizzled granite
#

Moving to harfrust seems like a no-brainer anyway

storm whale
#

i'll merge the riyal codex pr

grizzled granite
#

no one seems interested in updating rustybuzz

storm whale
#

true, but as you say there is really not much discussion needed here 😅

#

also, should we merge the removal of deprecations now then?

#

oh crap forgot to update the changelog for the riyal

#

that's what i get for not waiting

storm whale
#

i'll open a PR updating the changelog

midnight tangle
#

I opened a new PR that adds a test for the validity of standardized variation sequences, similarly to how we already test for the validity of emoji variation sequences (i.e., presentation sequences).
https://github.com/typst/codex/pull/126

GitHub

This PR updates the testing infrastructure to also test that standardized variation sequences are valid. I also reorganized build.rs to separate the processing of Codex module files from the part t...

#

I also took the opportunity to reorganize build.rs a bit

#

Sadly, GitHub seems to be pretty bad at displaying diffs that mostly consist of changed indentation

storm whale
#

@midnight tangle Related to the serif union/intersection PR you just put up: just an FYI that earlier today I emailed the NewCM maintainer about the empty set variation sequence to fix https://github.com/typst/typst/issues/1528

midnight tangle
#

Nice! I added the serif variations first as they are mostly no-brainers, but I would like to add the empty set variant soon as well.

grizzled granite
#

Do any fonts support these yet?

midnight tangle
#

I haven't checked

midnight tangle
#

I've added all mathematical variation sequences to the document (apart from calligraphic letters). Almost all of them are from Mathematical Operators or Supplemental Mathematical Operators (the first two columns and the third column in the attached image, respectively).

midnight tangle
#

Yes this is indeed quite a breaking change for a lot of documents

grizzled granite
#

@midnight tangle did you know there is a proper way to do cancel/not for every symbol?

#

It's just broken in every font lol

midnight tangle
#

I knew there were some combining characters but I haven't looked into them much

grizzled granite
#

There are some very useful ones, but they seem to mostly be broken

#

I think a few work in noto sans math

midnight tangle
#

Since multi-codepoint symbols are kinda new, I only looked at variation sequences for now

grizzled granite
#

Actually the specific one I was talking about is in https://en.wikipedia.org/wiki/Combining_Diacritical_Marks instead

#

U+0338 (combining long solidus overlay)

#

Hm, I wasn't aware that U+20D2 is the one supposed to be used for negation

#

that's weird...

midnight tangle
#

Yeah and it's even weirder that the Combining Diacritical Marks for Symbols block contains a double slash variant

#

I'd like to find a list of accepted combinations of mathematical characters + combining marks. I'll look for it in UCD

grizzled granite
#

In noto sans math it's the diagonal one that is negation at least

#

which makes much more sense

#

Same in luciole

#

And NCm

#

I didn't actually think that one worked in ncm

midnight tangle
#

I guess? But it's so weird because there are obvious cases where it doesn't make sense

lapis moth
#

with combining marks you can get stuff like 1̝

grizzled granite
#

But it's weird that the spec is wrong about the uses of the vertical bar for negation

lapis moth
#

so yeah, definitely a "use them with whatever" kind of situation

midnight tangle
#

Then I think we better stick to single codepoints with optional variation sequences for now, and try to cover as much of that as possible, before starting to decide which combinations we want to add and under which names

grizzled granite
#

I was thinking more like making them callable symbols, like the accents

#

So not(eq) for instance
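A sketch of what a callable `not` could produce, assuming it targets U+0338 COMBINING LONG SOLIDUS OVERLAY; `negate` is a hypothetical helper. Where Unicode has a precomposed negation that is canonically equivalent to base + U+0338 (≠ for =, for example), the sketch returns that instead, since font support for the combining overlay is spotty. The table is deliberately tiny.

```rust
/// COMBINING LONG SOLIDUS OVERLAY, the diagonal negation stroke.
const NEGATION_STROKE: char = '\u{0338}';

/// Hypothetical helper behind a callable `not(..)`: prefer a precomposed
/// character (canonically equivalent to base + U+0338) when one exists,
/// otherwise append the combining overlay.
fn negate(base: char) -> String {
    let precomposed = match base {
        '=' => Some('\u{2260}'),        // NOT EQUAL TO
        '<' => Some('\u{226E}'),        // NOT LESS-THAN
        '>' => Some('\u{226F}'),        // NOT GREATER-THAN
        '\u{2208}' => Some('\u{2209}'), // ELEMENT OF -> NOT AN ELEMENT OF
        _ => None,
    };
    match precomposed {
        Some(c) => c.to_string(),
        None => [base, NEGATION_STROKE].iter().collect(),
    }
}
```

Anything outside the table, such as ⩽ (U+2A7D), comes back as a two-code-point string whose appearance depends entirely on the font.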

lapis moth
#

makes sense, tho the usual accents already have combining variants too

#

̃ for example

grizzled granite
#

arrow.l.r.double.long is a mouthful

tall quail
#

in Typst you'd just write <==> though

grizzled granite
#

I've just been extremely busy

#

I thought the unicode 17 PRs were blocked by moving to harfrust?

river berry
#

More of a concern is whether it stretches in an LR elem, i.e. $lr(n!)$

#

or $lr(! x !)$

#

Yeah, typst-layout/src/math/lr.rs::scale_if_delimiter() currently checks the math class, so it would act like a delimiter, although I have no clue if it would actually be able to stretch.

#

I guess that's the harm of using a spacing-related parameter to determine semantics. But I would expect Unicode to be more careful than that in their definitions tbh.

midnight tangle
#

The underlying issue is that they have no math class for expressing postfix unary operators. This makes sense because they aren't very common (apart from exponentiation, I can't think of any other one). But it creates this weird situation where ! simply doesn't really fit into any of the existing classes.

river berry
#

Also, I love how the new section starts "The math class property described here" despite "math class" not actually appearing elsewhere in the document (although it is kind of defined in section 5, it's very implicit)

#

Also 5.1 has a typo in the first sentence

The data file [Data] provides a classification of characters by primary their primary usage in mathematical notation.

midnight tangle
#

Though I'm unsure whether this is the kind of feedback they expect at this stage

river berry
#

I'll try to submit the grammar issue at least. Neat.

storm whale
#

And that in TeX it basically worked as a postfix unary operator

midnight tangle
#

Currently, the styling::to_style function in Codex does not handle accented letters properly. My interpretation of UTR #25, Section 6.5 Accented Characters, and especially the sentence below, is that we should first decompose any precomposed character, and only then apply the styles.

to achieve consistent results, a mathematical display system should transiently decompose any precomposed upright letters when used in mathematical expressions, and should use a single algorithm to place embellishments.
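A minimal sketch of decompose-then-style for italic Latin letters. `decompose`, `to_italic` and `style_italic` are hypothetical and not Codex's actual `to_style` API, and the decomposition table is a hand-written excerpt rather than the full Unicode data.

```rust
/// Tiny hand-written excerpt of canonical decompositions, for illustration
/// only; a real implementation would use the full Unicode decomposition data.
fn decompose(c: char) -> Vec<char> {
    match c {
        '\u{E9}' => vec!['e', '\u{301}'], // é -> e + COMBINING ACUTE ACCENT
        '\u{E8}' => vec!['e', '\u{300}'], // è -> e + COMBINING GRAVE ACCENT
        '\u{FC}' => vec!['u', '\u{308}'], // ü -> u + COMBINING DIAERESIS
        _ => vec![c],
    }
}

/// Maps a-z to MATHEMATICAL ITALIC SMALL A-Z. Italic small h is a hole in
/// that block; the character to use is U+210E PLANCK CONSTANT.
fn to_italic(c: char) -> char {
    match c {
        'h' => '\u{210E}',
        'a'..='z' => char::from_u32(0x1D44E + (c as u32 - 'a' as u32)).unwrap(),
        _ => c,
    }
}

/// Decompose first, then style each base letter; combining marks pass through.
fn style_italic(text: &str) -> String {
    text.chars().flat_map(decompose).map(to_italic).collect()
}
```

So é becomes U+1D452 MATHEMATICAL ITALIC SMALL E followed by U+0301, instead of staying upright because no precomposed italic é exists; the accent is then placed by the same mechanism as any other embellishment.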

midnight tangle
#

Yes. They are just regular accented letters using combining diacritics

#

Not supported by NewCMM as of now though

storm whale
#

I don't imagine many fonts would support it

midnight tangle
#

I'll probably send the maintainer an e-mail regarding the combining solidus thing (#133). I may include that as well.

storm whale
#

Can you open an issue for it in codex? I might be able to do it this weekend

rugged rover
#

Hi, I just opened a PR: https://github.com/typst/codex/pull/135, please let me know if you have any feedback or concerns!

I encountered the lack of these symbols while taking notes from an abstract algebra course, so these are definitely used.

Btw, I couldn't find where the |-> shorthand is implemented, because it would be nice to have |--> (currently available as mapsto.long but not as a shorthand), and <-| and <--| as well

grizzled granite
#

I think there's a moratorium on new shorthands until a better way of handling them can be found

midnight tangle
#

I'm not sure this is the way we want to implement it and expose it to the user, but to be fair I don't really know what else we could do that would be as convenient for the user

lapis moth
#

not sure what other aspects you're referring to, but at least the two-letter names are pretty much mandatory, given how country flags work in Unicode.

midnight tangle
#

In terms of implementation, we may want to avoid hardcoding every country code and instead generate them somehow
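Generation is mechanical because a flag is just two REGIONAL INDICATOR SYMBOL letters (U+1F1E6 to U+1F1FF), one per letter of the ISO 3166-1 alpha-2 code. A sketch with `flag` as a hypothetical name; it does not check that the code is an actual country, so a list of valid codes would still be needed to decide which names to expose.

```rust
/// REGIONAL INDICATOR SYMBOL LETTER A.
const REGIONAL_A: u32 = 0x1F1E6;

/// Builds the flag sequence for a two-letter code by mapping each ASCII
/// letter onto the regional indicator block. Returns None for anything that
/// is not exactly two ASCII letters.
fn flag(code: &str) -> Option<String> {
    let bytes = code.as_bytes();
    if bytes.len() != 2 || !bytes.iter().all(u8::is_ascii_alphabetic) {
        return None;
    }
    bytes
        .iter()
        .map(|b| char::from_u32(REGIONAL_A + (b.to_ascii_uppercase() - b'A') as u32))
        .collect()
}
```

For example, `flag("fr")` gives U+1F1EB U+1F1F7, which emoji fonts render as the French flag.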

midnight tangle
#

Regarding the .slant issue, as I have expressed before, I think a solution should be found that enables users to choose a default to use for the rest of the document. A similar issue exists with diagonal vs. vertical negation strokes (see UTR #25, Table 8).

A possible solution would be to only have names for non-slanted equal variants, and diagonal negation strokes, and have functions similar to those in styling.rs: one to convert symbols from horizontal equal bars to slanted equal bars, and another one to convert from diagonal negation stroke to vertical negation stroke.

Users should then be able to use the following show rule in their document: show math.equation: it => slanted-equal-bars(it) (temporary function name used for this example).

This solution does not require any new infrastructure.
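A sketch of the proposed conversion as a per-character mapping in the spirit of the styling.rs helpers; `slant_equal_bars` and `slant_all` are hypothetical names, and the table is deliberately incomplete (only ≤ and ≥ to ⩽ and ⩾).

```rust
/// Hypothetical mapping from horizontal equal bars to slanted ones.
/// Characters without a slanted counterpart are returned unchanged.
fn slant_equal_bars(c: char) -> char {
    match c {
        '\u{2264}' => '\u{2A7D}', // LESS-THAN OR EQUAL TO -> LESS-THAN OR SLANTED EQUAL TO
        '\u{2265}' => '\u{2A7E}', // GREATER-THAN OR EQUAL TO -> GREATER-THAN OR SLANTED EQUAL TO
        _ => c,
    }
}

/// Applies the mapping to every character, as a show rule would over an equation.
fn slant_all(text: &str) -> String {
    text.chars().map(slant_equal_bars).collect()
}
```

Because names only ever produce the horizontal form, the document-wide choice lives entirely in this one mapping, and a second mapping of the same shape could handle diagonal vs. vertical negation strokes.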

midnight tangle
#

In case anyone is interested, I recreated Symbols defined by unicode-math in Typst.
For now it is fully automatically generated, so it is not very well organized. It lists every Unicode character that has a math class, as well as all their variation sequences, together with the way to obtain them in Typst math.
In the future, I would like to add combining character sequences such as the ones listed in Section 14. Negations of Mathematical Symbols of the Codex Proposal document, which I also reworked to remove old proposals and focus on the symbol list.

Note that the Symbol List for sym document (ex-Codex Proposals document) and the new List of Mathematical Symbols Supported by Unicode document serve different purposes: the former is a working document to help find names for symbols in Codex (specifically for the sym module), while the latter aims to be an exhaustive list of all mathematical symbols that are supported by Unicode, which might be a useful resource for math font authors.

The new document : https://typst.app/project/rBazFFMoB6JF4abehfMFqL

lapis moth
#

@midnight tangle can you remind me of our current PR approval policy again? I want to add it to the contributing guidelines PR and can neither remember it fully nor find the last time you mentioned it😬

midnight tangle
#
  • Non-breaking PRs require two approvals
  • Breaking PRs require three
  • PRs that affect the public API require review by Laurenz
midnight tangle
#

Yes, but removing a deprecated item doesn't as long as a released version contains the deprecation

#

Or at least this is how we have been handling it for now I think

lapis moth
#

Sure, the point is formalizing all the unspoken conventions we've been holding in our heads.

#

Do we want to formalize releasing in lock-step with Typst? As in, two Codex releases between Typst releases shouldn't happen?

#

because that matters a bit for the deprecation thing.

midnight tangle
#

I'm not sure what you mean exactly.
As long as each version of Codex is used in a Typst version, everything should be fine. And since the community is not the one to decide when a new Codex version is released, what you're saying shouldn't ever happen

lapis moth
#

Yeah I guess that's good then. I was just trying to avoid bad edge cases😅

midnight tangle
#

The following two Codex PRs are missing a single review each and implement relatively small changes so they should be a quick and easy review when you have the time:

  • #123 trivially updates to Unicode 17. According to #1277628305142452306 message, it should be fine to merge.
  • #131 renames {gt,lt}.tri.* to {gt,lt}.closed.* for consistency with {chevron,paren,subset,supset}.closed.
GitHub

This simply updates all mentions of a Unicode version from 16 to 17.
Looking at the Unicode 17.0.0 changelog, this version does not make any change to presentation sequences, and I also don'...

GitHub

As suggested by @mkorje, I'm opening this PR to implement the change proposed in #128 to rename {gt,lt}.tri.* to {gt,lt}.closed.*.

grizzled granite
#

@midnight tangle I suspect the web app issue may apply to any symbol in the SMP

#

Same thing is happening for emojis, such as the Rocket emoji
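For reference, the SMP covers U+10000 to U+1FFFF, so every SMP character lies outside the Basic Multilingual Plane and takes two UTF-16 code units (a surrogate pair). Whether that is actually what trips up the web app is only a guess; this minimal Rust sketch (function names are hypothetical) just shows how to spot such characters, e.g. the Rocket emoji (U+1F680) versus ℵ (U+2135).

```rust
/// True if `c` lies outside the Basic Multilingual Plane (above U+FFFF).
/// SMP characters (U+10000 to U+1FFFF) are a subset of these.
fn is_beyond_bmp(c: char) -> bool {
    (c as u32) > 0xFFFF
}

/// Number of UTF-16 code units needed to encode `s`.
/// Characters beyond the BMP take two units each (a surrogate pair).
fn utf16_units(s: &str) -> usize {
    s.encode_utf16().count()
}
```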

midnight tangle
#

Seems likely indeed

midnight tangle
#

@storm whale upright(aleph) yields the non-symbol Aleph (and same for the other four Hebrew letters that have symbols). I think this is due to this line of code. Is this intended? Hebrew symbols aren't italic, so I wouldn't expect upright/italic to have an effect on them.

storm whale
midnight tangle
#

I see

#

Though using upright for that feels like a bit of a stretch to me

storm whale
#

That's fair, but there needed to be something and it kinda works. That's also what I'm doing with the Arabic

midnight tangle
#

What about simply using strings to get the regular characters? Out of the following fonts, only Libertinus Math supports the actual Hebrew letters:

  • New Computer Modern Math
  • New Computer Modern Sans Math
  • Noto Sans Math
  • STIX Two Math
  • Libertinus Math
  • Fira Math
storm whale
#

In the future that'll be the text font, and I think there should be some reasonably easy way to access it in the math font, notwithstanding the lack of font support

grizzled granite
midnight tangle
#

Done

tall quail
#

For ∨ if we want a "graphical" name different from "vee", according to https://en.wikipedia.org/wiki/Descending_wedge it's also called "vel"

The descending wedge symbol ∨ may represent:

Logical disjunction in propositional logic
Join in lattice theory
The wedge sum in topology
The V sign, a symbol representing peace among other things
The vertically reflected symbol, ∧, is a wedge, and often denotes a related or dual operator.
The ∨ symbol was introduced by Russell and Whitehe...

#

I've never seen that name in use though

midnight tangle
tall quail
#

What about symbols that have 2 or 3 well-established meanings?

midnight tangle
#

I don't think we had this situation before. That would probably be decided on a case-by-case basis

midnight tangle
past python
midnight tangle
#

There are currently two open PRs that add tests:

  • #126 for the validity of variation sequences.
  • #144 to ensure that NFC is used for all symbols.
    The first one shows quite a large diff on GitHub, but most of it is simply re-indentation, which GitHub apparently does not detect. RustRover and VSCodium both display the diff correctly.
GitHub

This PR updates the testing infrastructure to also test that standardized variation sequences are valid. I also reorganized build.rs to separate the processing of Codex module files from the part t...

GitHub

We probably want symbols to use NFC (i.e., the precomposed normalization form).
All the symbols currently part of Codex are already composed as much as possible, but sadly enforcing NFC has the eff...
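To make concrete what a variation sequence validity test checks, here is a minimal Rust sketch, not the code from #126: it parses lines in the format of Unicode's StandardizedVariants.txt (base, selector, description, comment) into a set of allowed pairs, then accepts a symbol only if it contains no variation selector or is exactly a listed base + selector pair. The sample line and function names are illustrative assumptions; real data would also need the emoji variation sequences and multi-character symbols.

```rust
use std::collections::HashSet;

/// A sample line written in the StandardizedVariants.txt format (assumed excerpt).
const SAMPLE: &str = "\
# A sample line in the StandardizedVariants.txt format
0030 FE00; short diagonal stroke form; # DIGIT ZERO
";

/// Parses "BASE SELECTOR; description; # name" lines into (base, selector) pairs.
/// Comment-only and blank lines are skipped.
fn parse_variants(data: &str) -> HashSet<(char, char)> {
    let mut set = HashSet::new();
    for line in data.lines() {
        // Drop the trailing comment, then keep the code point field before the first ';'.
        let line = line.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let field = line.split(';').next().unwrap_or("");
        let cps: Vec<char> = field
            .split_whitespace()
            .filter_map(|hex| u32::from_str_radix(hex, 16).ok())
            .filter_map(char::from_u32)
            .collect();
        if let [base, selector] = cps.as_slice() {
            set.insert((*base, *selector));
        }
    }
    set
}

/// Valid if the symbol has no variation selector (U+FE00 to U+FE0F) at all,
/// or is exactly one base character plus one selector forming a listed pair.
fn is_valid_symbol(symbol: &str, variants: &HashSet<(char, char)>) -> bool {
    let is_vs = |c: char| ('\u{FE00}'..='\u{FE0F}').contains(&c);
    let chars: Vec<char> = symbol.chars().collect();
    match chars.as_slice() {
        [base, vs] if is_vs(*vs) => variants.contains(&(*base, *vs)),
        other => !other.iter().any(|&c| is_vs(c)),
    }
}
```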

midnight tangle
#

I resurrected the plan to move numbering kinds to Codex (renamed to "numeral systems"): #145.

tall quail
#

Regarding https://github.com/typst/typst/issues/6283, it seems that having dots produce the centered version was already meant to be implemented in https://github.com/typst/typst/pull/747, but that implementation was wrong (it just changed the order of .h and .h.c without considering the number of modifiers).


GitHub

See https://en.wiktionary.org/wiki/·
dot is the operator whereas dot.c is the middle dot. This PR moves the current dot to dot.period and moves dot.op to dot.
Fixes #724
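To illustrate the point about the number of modifiers, here is a simplified, hypothetical model of modifier resolution in Rust (not Codex's actual implementation): a candidate must carry every requested modifier, and among the candidates the one with the fewest modifiers wins, with ties going to the earlier entry.

```rust
/// Splits a dotted modifier string such as "h.c" into its non-empty parts.
fn parts(modifiers: &str) -> impl Iterator<Item = &str> {
    modifiers.split('.').filter(|m| !m.is_empty())
}

/// Hypothetical resolver: keep candidates that contain every requested modifier,
/// prefer the one with the fewest total modifiers, and break ties by list order.
fn resolve<'a>(variants: &[(&str, &'a str)], requested: &str) -> Option<&'a str> {
    let mut best: Option<(usize, &'a str)> = None;
    for &(mods, value) in variants {
        if !parts(requested).all(|r| parts(mods).any(|m| m == r)) {
            continue;
        }
        let total = parts(mods).count();
        // Strict `<` keeps the earlier entry when two candidates tie.
        if best.map_or(true, |(best_total, _)| total < best_total) {
            best = Some((total, value));
        }
    }
    best.map(|(_, value)| value)
}

/// The dots variants, with .h.c listed first as in the reordering from #747.
const DOTS: [(&str, &str); 3] = [("h.c", "⋯"), ("h", "…"), ("v", "⋮")];
```

Under this model, listing .h.c before .h does not make bare dots produce ⋯, because .h has fewer modifiers; dots.c (or dots.h.c) is still needed to reach the centered version.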

#

What would be the right fix? Changing .h.c to .c and .h to .h.b?

midnight tangle
#

I think we would probably want to keep .h.c as is, but list it first so that it is the default when typing dots with no modifier (or even dots.h without .c or .b).

midnight tangle
#

@grizzled granite you have multiple open PRs on typst/codex that are blocked because they don't update the changelog. If you don't have the time to edit them yourself, are you fine with me pushing some commits to do that?

grizzled granite
midnight tangle
#

np

midnight tangle
#

I can't edit your original pull requests because I do not have the permission, so I'll just create new ones instead.

grizzled granite
#

I seem to recall Laurenz said he wasn't a fan of the usage of submodules at one point. Maybe I misremember

midnight tangle
#

I don't remember exactly but I think one of the reasons was that it wouldn't be supported by the symbol picker for now. For chess symbols, I think it is worth it though. It just makes so much sense in my opinion for it to be in a separate module. An argument could be made that it should be in a package though.

grizzled granite
#

I'm of the opinion that symbols shouldn't be gated behind packages.

grizzled granite
#

@midnight tangle gotta remember to update the changelog

tall quail
#

@lapis moth regarding https://github.com/typst/codex/pull/153 I guess you mean you don't like variants like line.feed, line.return, line.new? Would you rather have several small submodules? Or names like linefeed, linereturn, newline, textstart, fileseparator?

#

it's true that, as far as I can see, there are currently no "meaningless" symbol names like control.line or control.text would be

#

except maybe for dotless

lapis moth
#

the abbreviations you have now are good imo

tall quail
#

lol I clicked "view reviewed changes" from your comment and didn't realize it was an older version

#

it did seem weird that the names were reverted, that explains it 🙂

midnight tangle
storm whale
midnight tangle
#

Thanks!

grizzled granite
#

@storm whale your issue made me randomly stumble across this in pennstander

midnight tangle
#

Looks like you're gonna have to send someone an email

grizzled granite
midnight tangle
#

Issue tracker? Never heard of that

#

I found ten new issues with NewCM 8.0.0 so I'll have to send another email I guess

#

So many symbols have random pointless dots for some reason

#

Now that NewCMM supports all variation sequences, I opened #156 to add a name for the slashed zero variant.

grizzled granite
#

The inconsistency of using slant and not slanted, but slashed and not slash bothers me a bit. Is there a logical reason?

#

I don't particularly care which one we choose, but it seems a bit strange to use both

midnight tangle
tall quail
#

also, "slant" is an adjective too anyway 🙂

grizzled granite
grizzled granite
#

Ok fair enough

grizzled granite
midnight tangle
#

That is unusual

grizzled granite
#

@midnight tangle the dot situation seems inconsistent

#

and.dot is ⟑, but ⩑ exists

midnight tangle
#

See the Symbols with Decorations section of the document for my current feeling on the whole topic.
I am on my phone right now but I'll be available to discuss that later if you want

grizzled granite
midnight tangle
#

I sent it in the issue I think

midnight tangle
grizzled granite
midnight tangle
#

I tend to agree with your comment in the PR. I am also wondering whether some of those would be better as emoji. We could also have both, but for some of them I don't think it makes much sense to have them in sym (e.g., the proposed sym.keyboard.alarm, which I also don't think fits in the keyboard submodule).

I am working on a symbol-by-symbol review.

Also, full disclosure, I know the author IRL.

river berry
#

I also think it's a bit big, feels more like a button module to me than keyboard

#

although I know it also contains a buttons sub-module 😮‍💨

flat pagoda
#

While I am not surprised that such a thing exists, I was not aware of the invisible function application symbol being a thing: https://github.com/typst/typst/issues/8263 . Should this be included in Typst Codex? I would certainly start using it immediately.

GitHub

Description We should encourage best practices in the math docs for writing math that is accessible. For example, the use of intent and arg attributes (once #8262 is complete), and the use of invis...

grizzled granite
dense briar
#

I've added my thoughts on #8263. One thing to add is that AT testing will need to determine in what cases we apply intents, content MathML, or invisible operators. Furthermore, I'd find it too much of an imposition on the user to always use something like a function application operator while authoring; it does not set achievable standards for accessibility (and the operator might conflate the ability to mathematically evaluate the expression vs. accessibly announcing it)

midnight tangle
#

We are at 99 commits on typst/codex. Almost 100!

#

Somewhat relatedly, are we supposed to dismiss the security alerts ourselves or is Dependabot going to realize that we pushed a fix?

past python
midnight tangle
#

It didn't go away immediately after merging the PR but I don't see it anymore either now

formal phoenix
#

I am interested in trying my hand at adding flags. I read through issue #115 and PR #136, and it seems to me that the primary concern is validating that all countries are included and correctly mapped.

It’s annoying to map names to codes. I am currently just using Python and a table of ISO 3166 abbreviations to do the conversion, but inconsistent or multiple names mean my list of corrections is quite long. When writing the verification in Rust, I will hardcode a map of country names to codes and then verify that all of the flags listed in the file fetched from Unicode are included. This will fail if an extra country with no conversion is included, but I’m unsure whether this is rigorous enough for what you all are looking for.

The main problem with this is that there’s not really a good external source of truth for country codes that can be pulled in without adding a dependency. I would love to hear any thoughts on other ways to do this.

Edit: looking at it further, it seems I misunderstood the structure of the flag sequences; maybe this is easier than I thought?

midnight tangle
#

I don't know whether we only want to have tests for the validity of flag sequences, or whether we want a test for the correctness of the name -> flag mapping.

In the first case, we only need to consider the Unicode-provided file that lists all valid country flags.
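On the structure of flag sequences: a country flag is two Regional Indicator Symbols, one per letter of the ISO 3166-1 alpha-2 code (U+1F1E6 for A through U+1F1FF for Z). A minimal Rust sketch of the conversion in both directions follows (function names are hypothetical). A well-formed pair is not necessarily a valid flag; that is what checking against Unicode's list (e.g. the RGI_Emoji_Flag_Sequence entries in emoji-sequences.txt) would cover. Subdivision flags such as England's use tag sequences instead and are not handled here.

```rust
/// Converts an ISO 3166-1 alpha-2 code such as "FR" into its flag emoji:
/// each ASCII letter maps to U+1F1E6 + (letter - 'A').
/// Returns None unless the input is exactly two ASCII letters.
fn flag_from_code(code: &str) -> Option<String> {
    let bytes = code.as_bytes();
    if bytes.len() != 2 || !bytes.iter().all(u8::is_ascii_alphabetic) {
        return None;
    }
    bytes
        .iter()
        .map(|b| char::from_u32(0x1F1E6 + u32::from(b.to_ascii_uppercase() - b'A')))
        .collect()
}

/// Inverse direction: recovers the two-letter code if `flag` consists of
/// exactly two Regional Indicator Symbols, otherwise returns None.
fn code_from_flag(flag: &str) -> Option<String> {
    let chars: Vec<char> = flag.chars().collect();
    if chars.len() != 2 {
        return None;
    }
    chars
        .iter()
        .map(|&c| {
            let offset = (c as u32).checked_sub(0x1F1E6)?;
            if offset < 26 {
                char::from_u32('A' as u32 + offset)
            } else {
                None
            }
        })
        .collect()
}
```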

grizzled granite
#

U+1CEDF will be "SQUARE ROOT OF SQUARE ROOT OF SQUARE ROOT OF SQUARE ROOT" in Unicode 18.0

#

lol

midnight tangle
#

Ooh Unicode 18.0 just entered Beta period!

#

This wasn't in Alpha I think

#

Relevant symbols from Unicode 18.0 Alpha are listed in the document.

grizzled granite
#

Most of the mathematical symbols added seem to be super obscure historical ones

midnight tangle
#

Indeed, which is why I don't think we need to assign them a name for now

storm whale
#

why won't they standardise the existing VS for the calligraphic/script glyphs to lowercase Latin!! But they'll add a whole new alphabet as a new variation selector

#

but this time only for lowercase Latin

grizzled granite
storm whale
grizzled granite
#

Probably why no one has used it for hundreds of years

storm whale
grizzled granite
#

Why does discord think I want to react with 😫 (I think it's that one) every time I accidentally double tap something?

grizzled granite
#

I can guarantee you that they would use them if they could

storm whale
grizzled granite
#

Rather that than the sans serif and typewriter ones

storm whale
#

And surely something like bold(italic(sans(chi))) did not have a great deal of "evidence" for its use...

grizzled granite
#

But what possible harm could there be in allowing the variation selectors more broadly? The worst thing that can happen is that the font doesn't have that glyph and you fall back to the original one

storm whale
grizzled granite
#

Apparently you could turn it off. But why on earth is that the default

grizzled granite
storm whale
#

it's got 4 categories for calligraphy/script: upright, restrained, embellished, heavily sloped

grizzled granite
#

Not sure if there's a search engine for arXiv sources

#

That would be a good way

grizzled granite
storm whale
#

well it's something at least :p

grizzled granite
storm whale
#

I am working on that today

#

Hopefully I can get it all done

midnight tangle
#

It would be nice to merge #126. The test is not extremely important, but it can help detect mistakes earlier, which is always nice.
The diff looks very bad on GitHub, but most of it is re-indentation, so VSCode and RustRover display it correctly.
If no one from the community has the time to review it in the following days, I'll ping Laurenz to do it himself.

#

Also, the commit history looks very bad; that's not GitHub's fault, that's mine. Sorry for that.

midnight tangle
#

#171 changes the NumeralSystem type to use &str instead of char everywhere. This is required to merge the Arabic abjad numerals PR, and more generally a change that can be useful in the future.

midnight tangle
#

@river berry if you are interested you can ask to be given approval permission on typst/codex. One more person with the power to help PRs move forward is always nice

past python
#

@midnight tangle As people have already picked up on, we're nearing a release candidate for 0.15. Is codex in a releasable state or is there stuff that should still land?

On a side note, I noticed one minor issue with the changelog.

midnight tangle