#Codex
No one is working on it
I don't at all want to stop you from starting something new, but just a small reminder that this PR is almost ready to merge, just needs finishing touches: https://github.com/typst/codex/pull/20
Ah I almost forgot about that. I'll fix that up.
yeah I was under the impression that all work on multi-character symbols was blocked on typst. If everyone is fine with implementing them on our side anyway, I can give it a go—seems like it'd be an easy change.
I think the only "blocker" was math, but that's basically done.
cool
Just to make sure you don't both start working on the same thing simultaneously, who wants to work on that?
I already did :P
Yeah I just saw that, thanks!
The PR blocking graph is almost getting too long to keep it in my head now😂
Hopefully once we merge those PRs the amount of meta PRs should reduce.
I don't know how far away 0.14 is (cf #contributors ), but it would be nice to get https://github.com/typst/codex/pull/78 https://github.com/typst/codex/pull/62 https://github.com/typst/codex/pull/59 and https://github.com/typst/codex/pull/81 merged before then
Do we want to hardcode the variation selectors, or add them automatically to any symbol with both a text and emoji presentation at compile time?
It's presumably available in some Unicode data
I think it would be better to have them in the .txt files. That way, we decide what happens for each symbol
Unless you mean for codepoints that have multiple possible variation sequences, and one is considered the default in case there is no variation selector. In this case, it may make sense to add the VS if not present. But maybe this should happen later, such as in Typst. Additionally, I'm not sure the notion of a default variant is formalized in a way that would let us do that
I think for our symbols, even for the ones that may have a default style, we should definitely add variation selectors both ways. Software robustness principle and all that
Yes, we should add them to both. My point was that it's known which symbols have both a text and emoji presentation, so we wouldn't actually have to enter them manually in sym.txt and emoji.txt
And yes, there is technically a default presentation for each one
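A minimal sketch of the "add them automatically" option discussed above, assuming the set of code points with both presentations is already known from Unicode data. The helper name `presentation_pair` is hypothetical and not part of codex; only the two selector code points (U+FE0E and U+FE0F) are taken from Unicode.

```rust
/// VARIATION SELECTOR-15 requests text presentation.
const VS15_TEXT: char = '\u{FE0E}';
/// VARIATION SELECTOR-16 requests emoji presentation.
const VS16_EMOJI: char = '\u{FE0F}';

/// Hypothetical helper: returns the (text, emoji) presentation sequences
/// for `base`. The caller is assumed to have checked that Unicode defines
/// both presentations for `base`; this function does not check it.
fn presentation_pair(base: char) -> (String, String) {
    let mut text = String::from(base);
    text.push(VS15_TEXT);
    let mut emoji = String::from(base);
    emoji.push(VS16_EMOJI);
    (text, emoji)
}
```

With something like this, the text sequence would go to `sym` and the emoji sequence to `emoji`, whether the result is generated at compile time or written out in the .txt files.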
With the multi character symbols, are we opening up the symbol constructor to take them? I think we should still enforce it being a single 'symbol', by checking it is a single grapheme cluster (at least to start with).
Yes I agree
Allowing clusters instead of single codepoints only is really a fix rather than a new feature
wait but why? Wasn't the goal to open up symbols more? I mean I guess it can still happen later, but I also don't see a good reason to delay it...
Because then there is the question of how far we want to go, what we want to allow and what we don't.
https://github.com/typst/typst/issues/6028
Allowing arbitrary content definitely feels too far I think, so it needs to stop somewhere
why would it be too far? Imo the sort of thing that classifies as a "symbol" is not something we can encapsulate in the type system, so allowing arbitrary content is the only way to allow everything that should be allowed
So you wouldn't allow the usecase presented in the issue?
I don't understand why we're conflating these things. For the purposes of codex we do not require arbitrary content as symbols
No, but allowing multi-char symbols means that when typst uses the updated codex, it has to support them as well, so changes to typst symbols have to be made anyway, and we might as well talk about those too
My feeling is that allowing arbitrary content as symbols will complicate many things, and is better left as a possibility for the future
Right now it should just be a string corresponding to a single grapheme cluster
if you look at https://github.com/typst/typst/pull/6489, the complications are already there, and they would still be there even if it was limited to single grapheme clusters.
This is less about complications in the source code than it is about design decisions for the user facing interface imo
I think so. If it does exist, it shouldn't be under the existing symbol then I think
Some(match c {
'〈' => '⟨',
'〉' => '⟩',
'《' => '⟪',
'》' => '⟫',
_ => return None,
})
Does anyone know if these CJK compatibility character mappings are actually needed?
Wait it isn't a CJK thing? The single ones aren't, but the double ones are in the CJK block.
Oh wait I think there's a mistake here lol, there are CJK ones for the single, but those aren't the ones above! It's the "Left-Pointing Angle Bracket (U+2329)", which decomposes into the CJK "Left Angle Bracket (U+3008)", as opposed to the "Mathematical Left Angle Bracket (U+27E8)".
Does the spec address this?
I'm not sure. I'll have a look
It's still not clear to me to what extent some of the symbols are there merely for backwards compatibility, or if they're actively still used
Then I don't see what about "symbols can contain any content" is complicated
It's a design change, whereas allowing just clusters is closer to a bug fix. This is illustrated by the fact that not everyone thinks symbols should be able to hold any content (e.g., #1277628305142452306 message).
Solving #6441 (comment) is harder than I initially thought.
Consider the following case:
symbol
  @deprecated: reason 1
  .modifier
  @deprecated: reason 2
  .modifier.alt
Where should there be warnings in the following Typst code?
#let a = symbol.modifier
#let b = symbol.modifier.alt
#let c = a.alt
To summarize the problem: we probably want a single warning on each line. But I have no idea how this could be implemented.
I found it in the spec. "22.5.4 Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF":
...
Mathematical Brackets. The mathematical white square brackets, angle brackets, double angle brackets, and tortoise shell brackets encoded at U+27E6..U+27ED are intended for ordinary mathematical use of these particular bracket types. They are unambiguously narrow, for use in mathematical and scientific notation, and should be distinguished from the corresponding wide forms of white square brackets, angle brackets, and double angle brackets used in CJK typography. (See the discussion of the CJK Symbols and Punctuation block in Section 6.2, General Punctuation.) Note especially that the “bra” and “ket” angle brackets (U+2329 LEFT-POINTING ANGLE BRACKET and U+232A RIGHT-POINTING ANGLE BRACKET, respectively) are deprecated. Their use is strongly discouraged, because of their canonical equivalence to CJK angle brackets. This canonical equivalence is likely to result in unintended spacing problems if these characters are used in mathematical formulae.
The flattened parentheses encoded at U+27EE..U+27EF are additional, specifically-styled mathematical parentheses. Unlike the mathematical and CJK brackets just discussed, the flattened parentheses do not have corresponding wide CJK versions which they would need to be contrasted with.
...
So maybe one could justify mapping the "bra" and "ket" angle brackets (maybe not though, as they are canonically equivalent to the CJK ones), but the CJK ones definitely shouldn't be mapped.
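A sketch of what the narrower mapping suggested above could look like, written with explicit escapes so the three lookalike bracket families cannot be confused. The function name `map_deprecated_angle` is hypothetical, and whether Typst should do this at all is exactly the open question here.

```rust
/// Hypothetical variant of the mapping that only rewrites the deprecated
/// "bra"/"ket" brackets and leaves the CJK brackets untouched.
fn map_deprecated_angle(c: char) -> Option<char> {
    Some(match c {
        // U+2329 LEFT-POINTING ANGLE BRACKET -> U+27E8 MATHEMATICAL LEFT ANGLE BRACKET
        '\u{2329}' => '\u{27E8}',
        // U+232A RIGHT-POINTING ANGLE BRACKET -> U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET
        '\u{232A}' => '\u{27E9}',
        // Everything else, including the CJK U+3008..U+300B, is not mapped.
        _ => return None,
    })
}
```

One caveat: because U+2329 is canonically equivalent to U+3008, any normalization applied before this step would turn it into the CJK bracket, and the mapping would then no longer fire.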
https://github.com/typst/typst/issues/5401 Isn't this issue just due to codex currently not using emoji variation selectors? In which case it's a duplicate of one of the many issues open for this in Codex
IIRC there was more to it but I may be wrong
I don't think there's anything special there other than lack of variation selectors and perhaps an unfortunate fallback chain
Could probably be closed
I didn't read it too thoroughly so I'll take your word for it
I finally got round to emailing the NewCM maintainer about the roundhand/chancery variation selectors. He got back to me and said he'd be interested in adding it but needs to read into it more.
I thought someone already created a fork of it supporting the selectors
very possible I'm misremembering
I'm wondering if we need to find additional people interested in codex. Requiring 3 reviews when we have like 4 people total actively involved doesn't seem sustainable.
I made a terrible fork updating only one character to test my PR, maybe that was it? 😂
maybe
😅
Ah I haven't looked at reviewing stuff recently. I'll go through the open ones today (hopefully)
I think you're already doing more than enough
that's what I meant about sustainable
The main problem is that this adds a lot of latency. Also, I feel bad when I have to tell people to go review PRs because I don't want to give more work to contributors who might have IRL stuff going on (or just don't want to contribute for now).
I just realized that 3 reviews is in fact reviews from all four, since the person who creates the pr is out
if we don't have enough people, we can drop the 3 back to 2
Maybe we just leave anything breaking up for longer then?
like wait a month or something
@midnight tangle
or is that too slow
You mean, allow 3 and 2 reviews for resp. breaking and non-breaking changes, but keep non-breaking changes open for longer?
If so, I think this could be a good compromise. Also, always starting a discussion here when opening a breaking PR would be a good idea I think (we already somewhat do that)
I don't think slowness is an issue. Symbol changes are often unrelated so breaking changes don't often block other things, and there is no point in merging PRs fast because Typst doesn't release a new version very often
no, 2 reviews for everything, but leave breaking changes open for longer
see the message above mine
oh yeah sorry this is what I meant
I don't know exactly how it works in the rustc world, but essentially this would be similar to having a final comment period for breaking pull requests
E.g. https://github.com/typst/codex/pull/59 has been open for two months with essential consensus. I'd just like confirmation that you guys are okay with the final change (add the currency sign, and use generic instead of general)
there is no real hurry, but it would also be nice to not have a scramble in order to get everything done right before 0.14
In my case the reason why I did not respond with another confirmation is because this kind of PR is hard to review in the sense that I either have to trust you (which defeats the purpose of reviews), or check everything which takes a lot of time. And even then, I just don't have a strong opinion on the changes because I am not knowledgeable enough. Maybe sometimes we should accept that there will be no strong opinions, and (1) express our weak opinions and (2) treat weak opinions as strong when there is no strong opinion.
I don't like my phrasing for (2) but I hope my point gets across
you did
For symbol PRs (i.e., non-meta PRs), it is probably fine to not worry about Typst release cadence. It's often very much okay if some PR does not make it into the next Typst release. With the exception of symbol additions based on user request, and major breaking changes such as sect -> inter
I think it might be time to research the term translations that LaTeX uses instead of continuing to have translation PRs trickle in one by one, with new terms decided by a random user that speaks that language (and the inconsistencies that result from this). C.f. https://github.com/typst/typst/pull/6519 (I posted this here because it conceptually fits with the broader subject of the forge.)
We partly ended up doing that for the "page" translations
I have https://github.com/typst/typst/issues/2671 open, but it's more narrow in scope
hm, I just realized that we didn't handle the peso breaking change in any way. Any previous use of peso as ₱ will become $
not that I think it will affect many users, but it's still unfortunate
I'm sorry if this was answered already, but wouldn't it be better to just have peso be ₱. If users want $, they can already use dollar
That's what I had in my original version of the PR, but @storm whale argued that we should have peso and pataca as aliases for dollar and yuan for yen
I guess we could revert the change to peso specifically
I think the motivation was that a bunch of countries use the peso (with a much bigger population than the philippines), and the symbol even predates the dollar.
Ok. I understand that the choice to map peso to ₱ might then be "controversial". At the same time, having two symbols mapping to the same character, and having to use a variant of one to access a different character, seems like a waste
I don't have very strong feelings either way. I don't think there is a perfect solution
let's hear what @storm whale has to say
I mean pataca is so unlikely to collide with another symbol in the future, so I don't think there's much harm in having it. I figure we want to try to cater to as many different languages as possible (especially for naming symbols that are specific to a country/language). Population wise, it's most likely a user will type peso and expect $, instead of ₱. The Philippines is the only country that uses the word peso and not the dollar $ for it.
I don't see much issue having two symbols mapping to the same character. At least, I think for these currencies there are valid reasons to have them - that is, they have multiple names depending on where they are used.
I'd be happy to do some reviews if that helps
I'll give you permissions if you let me know your GitHub handle. Then you can also approve PRs.
it's @knuesel
@grizzled granite if you want to see something similar to Unicode, but much better organized, take a look at the SMuFL specification. For example: https://w3c.github.io/smufl/latest/tables/time-signatures.html
SMuFL is a standard that uses Unicode private use areas to encode musical characters in music fonts for use in various music notation software. I discovered their spec recently, and it is so great compared to Unicode's: recommended ligatures and variation sequences, as well as implementation notes, and also related blocks (which they call "groups"), are all together in the same place.
I don't know if they provide all this information in a machine readable format, though.
why couldn't we have that for unicode :/
I've sent an invite
More than four people have permissions by the way. I'm not sure who is still active. Current people with permissions are dccsillag, emilyyyylime, you, Malo, mkorje, and T0mstone. And now sijo.
Only the final four are actively involved
That may change of course, but that's the reality right now
what is currently missing for text/emoji selectors?
I'm currently not very active, but was meaning to try to get back into it (hopefully by ~next week, when I'll finally have some more time)
the hope is that I'll be able to find a flow for it, so as to be able to contribute even when I'm busier
I think nothing. But I would recommend not changing #78 and instead creating a separate PR for that, as it is pretty much ready to be merged.
https://github.com/typst/typst/pull/6489 is still open
No pressure!
Oh yeah that's right. But at least from Codex's side, there is full support for multi-codepoint symbols
Yeah do we want to require symbols to be a single grapheme cluster for now? Because if typst requires that, we'd also have to add it as a guarantee to codex. (Since such a guarantee wasn't part of #92)
Additionally, there are the two points of the code where I don't know how to proceed (typst-docs and typst-ide)—I'm hoping for @past python to step in and say what should be done there.
We control what symbols we add in Codex so there is no urgent need to change the code on the Codex side (apart from docs maybe)
yeah adding a sentence to the docs is what I meant
@tall quail In case you did not know, the way to formally approve PRs is through the "Review changes" button in the "Files changed" tab.
@silent yeah I was wondering if I should do formal approvals here: for the emojis there were already 3 and for the angle brackets I think the PR is marked only as waiting @past python 's approval
damn how is this silent thing supposed to work?
we've moved away from using him for approval
Breaking PRs no longer require approval from him
You need to use @silent at the start of your message I think
I think it's too late now
You've already summoned him
damn let's run for cover
@midnight tangle regarding your comment on #68 you mean between single and double brackets? Or indeed between brackets and quotes (but those are different families under different symbols?)
brackets and quotes
My question is, if some random user types chevron, is it more likely that they want the quote or the bracket?
It's also good for consistency: many other brackets have their own top-level symbol (paren, brace, bracket...)
Regarding #75 (dominos) I'm not sure the trick with submodules for order-dependent modifiers is a good idea
Breaking commutativity seems like a big deal (and the gender PR was careful to emulate it with duplicate definitions .male.female and .female.male). The user probably won't know or care about submodules vs modifiers so this could be confusing
It's not only for order-dependent modifiers
In any case we've discussed making order dependence a thing, but only to resolve ambiguous situations.
We should likely finish that discussion first
Not like dominos are hugely important
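To make the commutativity point concrete, here is a deliberately simplified sketch of order-independent modifier lookup. It is not how codex or Typst resolve variants: the `lookup` and `normalize` helpers and the flat `(modifiers, char)` table are assumptions for illustration, and the match is an exact comparison of modifier sets.

```rust
/// Splits a dotted modifier list and sorts it, so that order no longer matters.
fn normalize(mods: &str) -> Vec<&str> {
    let mut parts: Vec<&str> = mods.split('.').filter(|m| !m.is_empty()).collect();
    parts.sort_unstable();
    parts
}

/// Hypothetical lookup over a flat table of (modifier list, character) pairs.
/// Two modifier lists match when they contain the same modifiers in any order.
fn lookup(variants: &[(&str, char)], query: &str) -> Option<char> {
    let wanted = normalize(query);
    variants
        .iter()
        .find(|(mods, _)| normalize(mods) == wanted)
        .map(|&(_, c)| c)
}
```

Under this model a single `.male.female` entry also answers `.female.male`, which is the behaviour the duplicate definitions emulate. An order-dependent scheme would have to give that up, at least for the affected symbols.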
I've come to really dislike stroked.
not because of the discussion in https://github.com/typst/codex/pull/93 but because the word just isn't very good
it's at least accurate, and consistent with fill and stroke in the Typst API
though I would also prefer to make the symbol top-level instead of a stroked variant
yeah but the adjective stroked means something else
Would it be a good idea to separate https://github.com/typst/codex/pull/69 into smaller PRs?
Yeah either one PR per change, or remove the controversial changes from this one and open one PR for each of them
In other words, either one PR per change, or one PR per controversial change
That way we can merge what there is consensus on
Yeah this is what I meant
which meaning do you have in mind? I can only find https://en.wiktionary.org/wiki/stroked which is rather niche, I think "stroked" is generally understood as a past form of "stroke"
I'm not sure why people mention past tense. We're not talking about a tense of the verb stroke, but an adjectivized form of the noun stroke
technically a past participle:
a nonfinite verb form that has some of the characteristics and functions of both verbs and adjectives.[1] More narrowly, participle has been defined as "a word derived from a verb and used as an adjective, as in a laughing face".[2]
But I don't think this distinction is very meaningful. The point is that "stroked" is derived from "stroke" and has the correct meaning
(There are other meanings but I don't see why they would be more of a problem than for stroke)
I think we should try not to have too many "active" open PRs at once, as I feel like it makes discussing them harder: you have to context switch between many different topics and concerns.
Or maybe it's just me struggling to remember everything in which case it's not that bad and I can just read the comment threads again if that lets us move faster.
Okay, I can close them again. I wasn't planning to create more, but randomly came across some issues
We can keep them open
I'll close https://github.com/typst/codex/pull/109 for now at least, since it's not fixing an issue unlike the other ones.
We may want to close https://github.com/typst/codex/pull/9 too, considering there hasn't been activity for five months?
Let's ask the author before closing it
I just discovered that stix two math has hundreds of otherwise missing symbols in the private use area
Is there a list available?
Yeah, one moment
In particular the variants of the Greek and blackboard bold letters would be super great
Shame
@midnight tangle I think .alt or .slant could work, but there really needs to be a more ergonomic way of using them. Having to separately redefine all of them sounds miserable.
Maybe the solution would be to have some way to specify which version to use by default, when using just, e.g., lt.eq.
Ok I have an idea
Yes that is essentially what I meant
There's also the question of what to do with the symbols that only have the slanted version...
Of which I think there's a couple at least?
If we were to implement #6028, then lt.eq could return an element with a slanted property, which could be set using a set rule. The same element would be used for other symbols with possible slanted things.
The problem is that this would break the symbol -> str conversion
that's annoying
Now that I wrote it I no longer like this idea
I think it would essentially have to be a property on eq, equiv and tilde if we wanted to go that route
I gotta go now, but I think the solution to our problem is not in trying to find a short modifier, but rather in making the version to use by default configurable
I'd like that
@pastel violet are you still intending to do anything with https://github.com/typst/codex/pull/9 ? No pressure, just wondering if we should leave it open or not. Of course, even if it's closed you are welcome to open another one.
Oh I forgot about that feature. It shouldn't be much work to adapt it to what we decided. I'll get it on my todo list 👍
Maybe I misunderstood @midnight tangle , it seemed completely impossible to reach a consensus about the previous pr
You mean on the .slant issue?
If so maybe you are right that it's better to add the remaining symbols now and figure out a better way to change the default later
Yeah
@past python just as a heads-up, PR typst/codex#96 contains some minor documentation changes. Would you like to be notified about those minor meta PRs in the future, or only those that affect the public interface?
Let's discuss whether symbols should be required to be single grapheme clusters at typst#6489.
@storm whale https://github.com/typst/typst/issues/6583 I made an issue
I briefly looked at the code, and renaming it shouldn't be so hard, but I have no idea how deprecation would be handled
I thought this was #1176478139757629563 , but apparently not
No need to notify me for stuff like this.
why does just merging main into my branch make the reviews go stale? https://github.com/typst/codex/pull/100
am I doing something wrong?
Maybe rebase won't cause them to go stale? I've got no idea though. I'll reapprove
Technically, it could break the code, so perhaps GitHub is just being defensive here.
Also, the merge commit could contain arbitrary further edits
Same goes for rebase, so I'd assume it behaves the same
@midnight tangle do you have an opinion about what to do with https://github.com/typst/codex/pull/68? It's been open for almost two months.
I think it is just missing a review, right?
Depends on whether we want two or three, yeah
but I don't want to rush anyone
I suspect 0.14 is some way off anyway
Maybe @lapis moth has more thoughts about it, as they were the only person to raise a concern about the change
I think they already voiced their opinion
In general, I think just sending links to PRs that are missing reviews in this channel is ok I'd say. We are few people actively involved, and we know that no one's goal is to rush people.
I understand that asking for reviews here feels like you are rushing people, but most of the time we just forget about a PR. And if we don't have the time to review it we know it's okay
pretty much, yeah. I still don't like the name and was personally mostly fine with angle.
In my personal math package, I've defined them as a function called ang, so we could also use that in theory, but I also understand if that's too unclear.
Not really sure if withholding an approval based on a personal opinion is warranted... I think we do have enough people to be able to merge this without me right?
@past python Just letting you know in case you don't already: I'm pretty sure #105 still needs your input since it's a substantial change to how symbols work, in codex and later also in typst (which is also already part of the discussion thread)
Thanks. I hadn't seen it because I'm not watching the repository anymore.
I'd like to take a closer look at this at some point as well. I hope to find the time next week.
@midnight tangle I hope you don't mind that I edited your message in #113 to remove a typo
Should the emoji flag sequences live in codex? I seem to recall @past python mentioning that they are problematic
Still, it's important to be able to use regional indicators I think
could you elaborate what "regional indicators" are? I'm not up to speed right now.
The regional indicator symbols are a set of 26 alphabetic Unicode characters (A–Z) intended to be used to encode ISO 3166-1 alpha-2 two-letter country codes in a way that allows optional special treatment.
These were defined by October 2010 as part of the Unicode 6.0 support for emoji, as an alternative to encoding separate characters for each...
You have 26 of them, and you can combine two to correspond to a region
Most emoji fonts display that as the flag for that region
ah okay just these, I see
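For reference, a small sketch of how a two-letter code turns into a regional indicator pair. The helper name `region_sequence` is hypothetical; the only Unicode fact used is that the 26 regional indicator symbols are contiguous, starting at U+1F1E6 for the letter A. It does not check whether the code is an actual region.

```rust
/// Hypothetical helper: maps a two-letter ASCII code (e.g. "de") to the
/// corresponding pair of regional indicator symbols. Returns `None` for
/// anything that is not exactly two ASCII letters. It does NOT check that
/// the code names a valid region.
fn region_sequence(code: &str) -> Option<String> {
    // REGIONAL INDICATOR SYMBOL LETTER A; the other 25 letters follow it.
    const BASE: u32 = 0x1F1E6;
    if code.len() != 2 {
        return None;
    }
    code.chars()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                let offset = c.to_ascii_lowercase() as u32 - 'a' as u32;
                char::from_u32(BASE + offset)
            } else {
                None
            }
        })
        .collect()
}
```

For example, `region_sequence("de")` yields the two code points U+1F1E9 U+1F1EA, which most emoji fonts render as a single flag.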
Now that we have multi character symbols we could have the sequences in codex
my last position on this was that we (probably?) don't want to define all 26^2 combinations, so we'd need to decide which are actual countries, which I'd prefer to avoid doing
but I remember someone saying that Unicode might actually state which are valid?
idk whether I remember correctly
The CLDR specifies that
It's just the ISO codes with a few modifications
then I guess it's probably fine
Alternatively, maybe we can find a way to have emoji.flag.xy be valid for any x and y lowercase Latin letter, without having to actually define all symbols?
That's what Laurenz didn't want
It's also not a good experience for users
It should only be valid for combinations in the cldr
There's also the question of whether we want to use "flag". I think "region" would be better
I meant, without having to actually have them all be predefined variants. Instead, emoji.flag would be special cased to generate its variants on demand
They're very adamant about fonts not necessarily using flags
It can just be some other thing
The problem is that those evolve with time
Yeah sure
TIL
I didn't mean manually adding them necessarily
But we should rely on the cldr
I assume we're already using some crate with this information?
We don't check text.region for this exact reason IIRC
I think that there is a guarantee of backwards compatibility. It's just that some codes are marked as deprecated
If there is such a guarantee then I would be okay with only defining flags that correspond to defined country codes
Tbh, that might be a valid retro-fitted reason, but I don't think I explicitly thought that back when I didn't implement it
I honestly think that should be checked too, but that's a different question
Deprecated regions are included in the list of valid region sequences so that deprecations in the future do not invalidate previously valid emoji flag sequences.
Not sure why they are referred to as flag sequences here, when they're adamant in other places....
Right
Although a pair of REGIONAL INDICATOR symbols is referred to as an emoji_flag_sequence, it really represents a specific region, not a specific flag for that region. The actual flag displayed for the pair may be different on different platforms, for example for territories which do not have an official flag. The displayed flag may change over time as regions change their flags and platforms update their software.
So region.xy is likely the safest bet
Not sure about that, though. In practice, they are almost always flags. flag is also easier to discover than region
yeah but no fonts have them
Few or none?
None that I know of, tho I may be ignorant
I haven't ever seen a Hamburg flag emoji for example
🏴
The uk subregions seem to be widely supported
At least
The three uk subregions are the only ones marked as "recommended for general interchange"
Most likely because font support for everything else is lacking
Chicken or egg situation....
yeah
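For context, subdivision flags are encoded differently from the two-letter region pairs: as an emoji tag sequence made of U+1F3F4 WAVING BLACK FLAG, one tag character per letter or digit of the lowercased subdivision code (with the hyphen dropped), and a final U+E007F CANCEL TAG. A minimal sketch, with the helper name `subdivision_sequence` being hypothetical and no check that the code is a real subdivision:

```rust
/// Hypothetical helper: builds the emoji tag sequence for a subdivision
/// code such as "gbsct" (Scotland). Only lowercase ASCII letters and digits
/// are accepted; validity of the code itself is not checked.
fn subdivision_sequence(code: &str) -> Option<String> {
    if code.is_empty() {
        return None;
    }
    // The sequence starts with WAVING BLACK FLAG.
    let mut out = String::from('\u{1F3F4}');
    for c in code.chars() {
        if !(c.is_ascii_lowercase() || c.is_ascii_digit()) {
            return None;
        }
        // Tag characters mirror ASCII at an offset of 0xE0000.
        out.push(char::from_u32(0xE0000 + c as u32)?);
    }
    // The sequence ends with CANCEL TAG.
    out.push('\u{E007F}');
    Some(out)
}
```

A Hamburg flag would be built the same way from "dehh". Whether anything is displayed for it depends entirely on the font.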
https://www.babelstone.co.uk/Fonts/Flags.html @lapis moth
this one has support for 73 subregions
I would tend to agree
region is very undiscoverable and if I'd read it in a document, I wouldn't know what it means as a user
Let me remind you of #114, which is almost ready to be merged, with the exception of an open question regarding a test: currently, I synchronously query the list of presentation sequences defined by Unicode from the internet every time the tests are run, and I'm not sure this is the best way. Notably, the tests may start failing if Unicode updates their list of variation sequences (although that can be solved easily by pinning a specific version of Unicode).
Note that the completeness of the added tests means you don't have to review each individual change to the symbol lists, as errors would have been caught by the tests.
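One way to avoid hitting the network on every test run would be to check a pinned copy of the Unicode data file into the repository and parse it locally. The sketch below is not what #114 does. It assumes the data lines look like `0023 FE0E  ; text style;  # (1.1) NUMBER SIGN` (space-separated hex code points, a semicolon, the style, then a comment), and the function name `parse_sequence_line` is hypothetical.

```rust
/// Hypothetical parser for one line of a variation-sequence data file.
/// Returns the sequence as a string plus its style field, or `None` for
/// blank lines, comment-only lines, and lines that fail to parse.
fn parse_sequence_line(line: &str) -> Option<(String, String)> {
    // Drop the trailing comment, if any.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let mut fields = data.split(';').map(str::trim);
    // First field: space-separated hexadecimal code points.
    let sequence = fields
        .next()?
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok().and_then(char::from_u32))
        .collect::<Option<String>>()?;
    // Second field: the style, e.g. "text style" or "emoji style".
    let style = fields.next()?.to_string();
    Some((sequence, style))
}
```

Pinning a specific Unicode version this way would also address the concern that the tests start failing when the upstream list changes.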
I would like to specify a proper license for the Proposal document. I was thinking of CC BY-SA. @pastel violet since you are the only person other than me who contributed to the document, would you be okay with that? Or would you prefer another license?
I have no skin in this game, but I like CC-BY-SA.
Is this because of the benchmarking @midnight tangle ?
In that case you should just make sure that the "SA" part is compatible with what Laurenz wants it for
This is what made me remember we did not have a license for that, but in general it is useful to provide documents under appropriate licenses
something something I hereby officially release all edits past and future made by myself to the aforementioned "Proposal Document" under the Creative Commons BY-SA license
Thanks!
@brisk geyser as in, any emoji which has two forms gets an emoji selector in emoji and text selector in sym
ah okay, sounds reasonable, though as I mentioned from what I remember, the text variation selector can only be applied to certain emojis
but I might remember wrong
Yes that is correct
I'm just saying there are some conflicting efforts here, so it would be a good idea to not immediately go ahead with a pr 🙂
I opened a pull request to typst/codex to initiate a discussion on the implementation: #116.
Would it be better to use "han" instead of Chinese?
It's shorter, and they're also used outside of china
I know nothing about Chinese numerals, but Wikipedia does not seem to refer to it as Han numerals anywhere on the Chinese numerals page.
Chinese numerals are words and characters used to denote numbers in written Chinese.
Today, speakers of Chinese languages use three written numeral systems: the system of Arabic numerals used worldwide, and two indigenous systems. The more familiar indigenous system is based on Chinese characters that correspond to numerals in the spoken languag...
Also, does it make sense to convey upper/lower through case? (like I did for Latin vs. latin)
No, but the alphabet is known as han/hanzi/kanji/hanja (depending on language) afaik
ISO 15924, Codes for the representation of names of scripts, is an international standard defining codes for writing systems or scripts (a "set of graphic characters used for the written form of one or more languages"). Each script is given both a four-letter code and a numeric code.
Where possible the codes are derived from ISO 639-2, where the...
Ok
Maybe we could get some input from actual CJK users
Yes that would be the best I think
For this PR, I think the main goals are to figure out naming conventions and assign a name to existing numeral systems. Adding the missing ones can be done later
The ISO standard I linked may be useful. Though it's for scripts and not numbering directly
I haven't read it, but this Wikipedia page may be useful as well: https://en.m.wikipedia.org/wiki/List_of_numeral_systems
There are many different numeral systems, that is, writing systems for expressing numbers.
I gotta go now. We can discuss more later
I think that makes sense, but I don't know if it should be Chinese.simplified or chinese.Simplified?
-# it's not an alphabet, it's a logography
Fair enough, but you understood my point at least 😉
Another option I just thought of for the variation selectors is \vs{15} and \vs{16}, or \vs{text} and \vs{emoji}, which is more coherent with the \u{XXXX} syntax
Cf. https://github.com/typst/codex/pull/114
Actually the \vs15 you proposed is probably best, because the number inside the braces for Unicode escapes is hexadecimal, but VS numbers are usually expressed in decimal
I don't think of the braces in \u{} as denoting "hexadecimal", but rather just delimiting a number that can be arbitrarily zero-padded (i.e. \u{10} and \u{0010} are the same thing, whereas \u10 and \u0010 seem like different identifiers), so I'm personally in favor of \vs{...}. Tho ofc it's not necessary since these are only two-digit numbers at most, but I really like the \vs{emoji|text} idea, so it's still good for consistency.
I think the point was that the 10 in \u{10} is actually 16
Yes and part of my point was that that's a property of \u, not of {}
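To make the hex-versus-decimal point concrete, here is a minimal Python sketch of how the two escape styles could be resolved. The `\vs{...}` spelling, the `text`/`emoji` aliases, and both function names are hypothetical (they are only proposals in this discussion, not Codex's confirmed implementation). The code-point ranges are from Unicode: VS1 to VS16 are U+FE00 to U+FE0F, and VS17 to VS256 are U+E0100 to U+E01EF.

```python
# Aliases for the two presentation selectors (hypothetical spelling from the discussion).
VS_ALIASES = {"text": 15, "emoji": 16}


def unicode_escape(digits):
    # Payload of \u{...}: always read as hexadecimal, so zero-padding changes nothing.
    return chr(int(digits, 16))


def variation_selector(arg):
    # Payload of a hypothetical \vs{...}: a decimal VS number or one of the aliases.
    number = VS_ALIASES.get(arg)
    if number is None:
        number = int(arg, 10)
    if 1 <= number <= 16:
        return chr(0xFE00 + number - 1)
    if 17 <= number <= 256:
        return chr(0xE0100 + number - 17)
    raise ValueError("no variation selector numbered %d" % number)
```

With this reading, `\u{10}` and `\u{0010}` both give U+0010 (decimal 16), while `\vs{15}` and `\vs{text}` both give U+FE0E.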
Well now I'm sad
I've been using https://babelstone.co.uk/ for various things
Apparently the guy who maintained it died a month ago
@grizzled granite It's so curious to me that it is you who opened https://github.com/typst/typst/issues/6355 even though you've previously often argued against new math shorthands. Does the situation feel different to you in math?
Shorthands are kinda tricky because basically any change is breaking.
Especially with -! and -?, I also wonder whether they encourage mixing writing and formatting.
Yes, very different. The shorthands in text are used in a very regular manner. This one also felt unlikely to interfere with anything you would write naturally, and is a natural companion to -?
Regarding shorthands in math, I think I've been pretty consistent in saying that the main issue is that the rules for where they apply are too flexible, leading to ambiguity.
@past python regarding presentation selectors, there is an issue which is that a font that can display a character X will always be able to display both X\vs{text} and X\vs{emoji}. I believe this is by design, as it means a cluster can be unable to be displayed just because the font doesn't support a specific presentation.
But for us, this means X\vs{emoji} won't fall back to the emoji font if X is supported by the current font. I'm not sure how to solve that.
I won't have the time to investigate that more sadly.
yeah, that's annoying
That first sentence is missing a negation somewhere, right?
otherwise I don't follow
Would it even be possible to tell whether the shaper has selected a glyph using the variation selector or not?
Surely there has to be a way to check whether a font supports a specific variation sequence
The problem is that AFAIK every font that supports a character supports all its variation sequences by design of Unicode
The problem is that the main text font doesn't fall back to the emoji font because its non-variated glyphs (i.e., text presentation glyphs) can be used for emoji presentation sequences as well
But not in practice though
That's a stupid design
I disagree: it's often better to see something in the wrong presentation form than not seeing it
Well yes, but it should be possible to preferentially use a font that actually has the specific presentation
I don't know if Unicode says anything about font fallback though
I don't see why typst shouldn't be free to have that feature
Or are you saying we would be in violation of something?
This would also be useful for scr
and other variation sequences in math
optional of course
I'm not too informed on the matter, but I presume the challenge would be a technical one, since there's probably no way to tell whether a variation sequence came out correctly
Best way I can think of would require explicit searching of the relevant font table (not sure which one is responsible for variation sequences) to see if the one we want to shape is included or not
What I'm saying is that I don't know whether Unicode says anything about how to handle the situation where there is a list of fonts instead of a single one
Ideally though we'd want to avoid that ig (after all that's what the shaper is for)
https://docs.rs/rustybuzz/latest/rustybuzz/struct.UnicodeBuffer.html#method.set_not_found_variation_selector_glyph there is this in rustybuzz, so maybe we can just use this?
@midnight tangle is there something like \vs for zwj? It'll be useful for emoji zwj sequences
Do you mean in our DSL?
If so, no. But I would say just copying and pasting the characters is probably easier
We could do that for variation selectors as well actually, not sure what's better
But then it'll just look like one symbol
Yes
I think explicit is better
Maybe we should disallow non-printable characters in the DSL and force using explicit escapes
Having invisible symbols everywhere isn't great
Not sure whether "non-printable" is the right term, but you get the point
Yeah
idk, I kinda partially disagree. Sure, for invisibles it's a good thing and this is why we have it to begin with, but for zwj sequences I'm not too keen on writing them out explicitly...
I'm mostly talking about the exceptional zwj sequences for specific symbols, not gender and skin color
I think the distinction is between things that are (a) freestanding and invisible, (b) just modifications of an existing thing and (c) a completely separate thing that just happens to be made from smaller constituents, and I think escaping a and b is fine, but I dislike it for c.
Those would be automatically generated presumably
It just doesn't seem maintainable to me
Also if you put in the actual zwj symbols your editor will render the resulting emoji instead of the constituents
To give an actual example, I wouldn't like having the pirate flag as 🏴\zwj☠\vs{emoji}
I would like that
Is there even a text variation of it? Is the emoji variation selector necessary?
idk, my input method added it
are we talking about how users could write these in Typst strings? or files internal to codex?
Internal files
ok and what's the alternative to 🏴\zwj☠? If you replace \zwj by an actual zero-width joiner it would just look like a pirate flag so you don't see how it's made of different code points
or am I missing something?
it seems useful to have at least the option of showing how a symbol is built
This is pretty much it
Disallowing invisible characters forces us to make sure we input the right thing
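As an illustration of what "explicit" buys here, the Python sketch below spells out the invisible code points of the pirate flag sequence. The escape spellings (`\zwj`, `\vs{text}`, `\vs{emoji}`) and the helper names are hypothetical, taken from this discussion rather than from Codex itself. Treating every format character (general category Cf) as "invisible" is also just one possible definition.

```python
import unicodedata

# Hypothetical escape spellings, as proposed in the discussion.
NAMED_ESCAPES = {"\u200D": r"\zwj", "\uFE0E": r"\vs{text}", "\uFE0F": r"\vs{emoji}"}

# Pirate flag: WAVING BLACK FLAG, ZERO WIDTH JOINER, SKULL AND CROSSBONES, VARIATION SELECTOR-16.
PIRATE_FLAG = "\U0001F3F4\u200D\u2620\uFE0F"


def needs_escape(ch):
    # Format characters (category Cf, e.g. ZWJ) and the presentation selectors
    # render as nothing on their own, so they would hide in a source file.
    return unicodedata.category(ch) == "Cf" or ch in NAMED_ESCAPES


def to_escaped(cluster):
    # Spell out every invisible code point and leave visible ones untouched.
    parts = []
    for ch in cluster:
        if needs_escape(ch):
            parts.append(NAMED_ESCAPES.get(ch, "\\u{%04X}" % ord(ch)))
        else:
            parts.append(ch)
    return "".join(parts)
```

Here `to_escaped(PIRATE_FLAG)` gives the same spelling as the example quoted above, which shows the four constituents that an editor would otherwise render as a single emoji.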
As a heads-up: We are slowly preparing for an upcoming 0.14 release candidate and to reduce last-minute stress as much as possible, I'd like to already cut a codex release fairly soon. I merged the presentation selector PR just now. Is there anything else that should still land?
Thanks for the heads-up! I don't think anything important is ready to be merged for now
Unicode 17 is supposedly going to be finalized today
It adds some interesting arrows (some of which might be hard to name), but more importantly some emojis we might want to have in Typst 0.14
I don't think waiting for 0.15 is a big deal. Font support is going to be lacking anyway.
True, but if we can do it earlier it's probably better.
https://github.com/typst/codex/pull/85 can be used as a basis
Up to @past python I guess
doesnt this require support for unicode 17 in rustybuzz first?
I'm going to go with yes considering you're asking 😂
Is rustybuzz basically in maintenance mode because of harfrust?
dont think anyones gonna update it
im not sure honestly, maybe it doesnt
Ok so if an update is required for Unicode 17 we would need to move to harfrust first
probably
do you happen to know how it's looking regarding usvg moving to harfrust?
its going to happen in some shape or form but I dont know when and also how
(as in, if its gonna replace usvg or if it will be a new crate, etc.)
new crate because harfrust uses proc-macros and RazrFalcon doesn't want those or what else would be the reason to fork?
that would be sad for the ecosystem
and it wouldn't help anybody's compile time either if the original usvg is just unmaintained
sad indeed
I don't understand the deal with giving up the maintainer role, but then still insisting on making maintainer decisions
what does new emoji have to do with the shaper? do shapers shape emoji
I think the data files must be updated for it to be aware that a character actually exists
but don't quote me on that
Yeah I think so too
@past python I won't have the time to implement better deprecation warnings for symbols. IIRC, @lapis moth's solution may make it possible to implement your dream warning so that may be a good starting point.
okay
though I think my dream warning would also require changes in codex
since the message is different
Oh right I didn't notice that
Harfrust was just updated to Unicode 17
@midnight tangle What do you think about deprecating modifiers instead of variants? I have a draft of that working. The diff on sym.txt would look like this: https://github.com/typst/codex/commit/ea0677a6f1319dd6ac4a36165ed94e0a8298d823
It is a tiny bit less flexible because you can't deprecate a single variant when all of its modifiers are also used in other non-deprecated variants, but in the existing deprecations I haven't found a case where that would be necessary. And the warning messages are much clearer (no duplicated warnings, the deprecation message & span are both related to the modifier, and the deprecation messages don't reference modifiers that you haven't actually applied). On sym.txt, it also leads to some deduplication.
On Typst, it would look like this:
warning: the `double` modifier is deprecated, use `stroked` instead
  ┌─ hi.typ:2:38
  │
2 │ $ bracket.stroked.l ast.small bracket.double.r $
  │                                       ^^^^^^
Instead of the current version:
warning: `bracket.l.double` is deprecated, use `bracket.l.stroked` instead
  ┌─ hi.typ:2:38
  │
2 │ $ bracket.stroked.l ast.small bracket.double.r $
  │                                       ^^^^^^
warning: `bracket.r.double` is deprecated, use `bracket.r.stroked` instead
  ┌─ hi.typ:2:45
  │
2 │ $ bracket.stroked.l ast.small bracket.double.r $
  │                                              ^
The duplication needs to go either way of course, but I think even if we would deduplicate the current version, it's a bit less clear than talking specifically about the modifier in the message.
Personally, I like the idea, but also think we should keep the option to deprecate variants around.
The fact that it would not be possible to deprecate a single variant seems like something that may be limiting in the future. Essentially, this means deprecations can only be used to rename or remove entire modifiers instead of variants, as you noted. This is already a bit visible in the planck.reduce case, where the variant planck.reduce is renamed to planck, instead of the modifier itself being renamed (same with circle.nested being renamed to compose.o)
and we already have the code for it, so removing it and later noticing we do need it after all and have to bring it back would be a bit of a waste of work...
in theory we could also extend it such that a modifier set can be deprecated, i.e. it triggers once all modifiers in the set are in the symbol
I dislike a bit how much of the API surface of codex would be just deprecation handling if we kept both
This seems too inflexible, since there are other reasons for deprecation than just renaming a modifier
And having various different deprecation mechanisms can also create confusion which one is the appropriate one
You may want to split a modifier into two different ones for instance, since it may be used for two slightly different things and we suddenly need to make that distinction clear
We may also have introduced a symbol in the wrong place, or it shouldn't have been introduced in the first place
the problem is that the error message needs to appear on some modifier in the end
and deprecations on variants don't have that information
This seems almost equivalent to what we have now
I guess we could've been better at writing the deprecation messages, but they quickly get very long
it is not because you can decide to use only part of a set so it sidesteps the duplication problem
essentially you only emit a warning when going from non-full coverage of the set to full coverage
Isn't this what @lapis moth's solution is doing?
what do you mean with T0mstone's solution?
I mean the solution proposed here: https://github.com/typst/typst/pull/6441#issuecomment-3002011213
Which changes the implementation of warnings on Typst's side to only emit the warning the first time it occurs for variant deprecations.
My proposal is a bit different. I mean to change the code here: https://github.com/typst/typst/blob/721a7b18dd105e9255dc45f1af1510465e78bf02/crates/typst-library/src/foundations/symbol.rs#L157-L159
Currently, it warns every time a symbol is modified and the currently resolved variant is deprecated.
If, instead we could deprecate a combination of modifiers like this:
@deprecated(chevron.l): `chevron.l` is deprecated, use ... instead
quote
.double "
.single '
.chevron.l.double «
.chevron.l.single ‹
.chevron.r.double »
.chevron.r.single ›
.angle.l.double «
This would turn into a deprecation set S = { chevron, l }
And then in the code linked above, when we have current modifiers M and the new modifier m, we check whether the new modifier brings the set to full coverage, i.e. |S inter M| < |S| but |S inter (M union {m})| = |S|.
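The coverage condition can be sketched in a few lines of Python. This is only an illustration of the set check described above, not the Typst code. The function names are made up for the example, and `deprecated`, `current` and `new` stand for S, M and m.

```python
def completes_deprecation_set(deprecated, current, new):
    # True exactly when adding `new` takes the applied modifiers from
    # partial coverage of the deprecated set to full coverage.
    s = frozenset(deprecated)
    before = s & set(current)
    after = s & (set(current) | {new})
    return len(before) < len(s) and len(after) == len(s)


def warning_positions(deprecated, modifiers):
    # Indices in a modifier chain at which the warning would fire.
    positions = []
    for index, modifier in enumerate(modifiers):
        if completes_deprecation_set(deprecated, modifiers[:index], modifier):
            positions.append(index)
    return positions
```

For S = {chevron, l}, the chain `chevron.l.double` warns once, on `l`, and so does `l.chevron` (on `chevron`). Later modifiers never re-trigger the warning because coverage is already full.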
Perhaps though, we could use the same approach with the existing deprecation warnings? I.e. check whether the symbol already resolved to the same variant as an indicator that the warning was already emitted?
That might be simplest!
The warning message would still be slightly less clean than directly deprecating a modifier (for that kind of deprecation), but I think if that works it would be satisfactory.
This could still cause an issue in the following case:
foo
@deprecated: ...
.bar
.bar.baz
Writing foo.bar.baz would cause a warning at foo.bar
That seems kind of unavoidable
At least without major hacks
I.e. let x = foo.bar; x.baz behaving differently
I think we should just avoid such a thing
I think it might still make sense to change the deprecation messages in codex in the case where a modifier is deprecated. E.g. just changing "bracket.l.double is deprecated, use bracket.l.stroked instead" to "the double modifier is deprecated, use stroked instead". Then, we get the best of both worlds.
I think this is a narrower view of what a modifier is than how they're used in practice
What is "this"?
Your proposal would only emit a single warning in the following case, with foo.bar.baz, where the approach in the linked message would emit two warnings.
foo
@deprecated: `bar` is deprecated
.bar
@deprecated: `bar` is deprecated
.bar.baz
Maybe it makes sense to be able to deprecate modifiers and variants independently
Like, "double" isn't deprecated. The actual change is that it's moved to "stroked" for the sake of consistency
Other delimiters still have double
Okay, so this was in response to my linked message?
I mean, it was more general, but that's one example
the exact wording could be different, but I'd like to avoid that bracket.double.r gives "warning: bracket.l.double is deprecated, use bracket.l.stroked instead" (note the difference between l and r) because bracket.double is the first warning thing and resolves to the l variant
that could be avoided by a different wording
because in essence l and r can have the same deprecation wording
The approach in the linked message could also be adjusted to avoid warning when any previous modification already yielded a deprecation warning
two deprecation warnings on one symbol would typically be confusing anyway I think, even if it would technically make sense
Then I can't think of a case where this wouldn't work if we word warning messages carefully
on a technical level, it could even be a bool that essentially says "this symbol already emitted a warning" instead of the set stuff above
not sure which would be cleaner, but that's just an implementation concern
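The "bool" variant could look roughly like the Python sketch below. The resolution rule is a simplified assumption (pick the variant with the fewest modifiers that contains everything requested, with earlier definitions winning ties) and is not claimed to match Typst exactly. The `once` parameter and all names are hypothetical.

```python
def resolve(variants, modifiers):
    # Simplified model: among the variants containing all requested modifiers,
    # take the one with the fewest modifiers; earlier definitions win ties.
    wanted = set(modifiers)
    candidates = [v for v in variants if wanted <= set(v[0])]
    return min(candidates, key=lambda v: len(v[0])) if candidates else None


def warnings_for(variants, modifiers, once=True):
    # Apply modifiers one at a time, as `foo.bar.baz` does, and collect
    # (modifier, message) pairs; `once` is the "already warned" flag.
    warned = False
    emitted = []
    for count in range(1, len(modifiers) + 1):
        variant = resolve(variants, modifiers[:count])
        if variant is None:
            break
        message = variant[2]
        if message is not None and not (once and warned):
            emitted.append((modifiers[count - 1], message))
            warned = True
    return emitted


# Variants as (modifiers, value, deprecation message or None).
FOO = [
    (("bar",), "A", "`foo.bar` specifically is deprecated"),
    (("bar", "baz"), "B", "`baz` is deprecated, use `buzz` instead"),
    (("bar", "buzz"), "C", None),
]
```

With this model, `foo.bar.baz` produces a single warning, attached to `bar`. The flag removes duplicates, but whichever deprecation is hit first wins, and a non-deprecated `foo.bar.buzz` still warns while passing through `foo.bar`.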
how would you phrase the warning?
One possible case where this could be problematic is if an unimportant warning shadows a more important one, like in the following case with foo.bar.baz.
foo
@deprecated: `foo.bar` specifically is deprecated
.bar
@deprecated: `baz` is deprecated, use `buzz` instead
.bar.baz
.bar.buzz
I expect this kind of situation to be quite uncommon though.
I think so too. I think the approach is definitely good enough for 0.14.
That's all I want 😄
I don't know, but I fear that just saying "double" has been deprecated can be misleading
it's specifically these variants that are deprecated, not the modifier
obviously not the end of the world, but still
could just be "bracket.double is deprecated, use bracket.stroked instead"
just omitting the irrelevant extra modifiers
It might indeed be enough for 0.14, but I believe there are future changes that need more flexibility. I guess that's a future problem though
@past python to be clear, are you planning to fix the warnings yourself?
yes
Feel free to take a quick look checking whether this is how you've imagined it if you have time: https://github.com/typst/codex/pull/118
It's a pretty short diff
I'll take a look at it later today
@grizzled granite Your PR https://github.com/typst/codex/pull/68 seems to have reverted the change from sq to square made in your other PR https://github.com/typst/codex/pull/110. I assume this was not intentional?
Also, we now have both angle.azimuth and angzarr. Is that intentional?
Oops. No it was not
Yes it was
It was only recently that it was discovered what the meaning of the symbol was
Would you like to make a PR or should I handle it?
What do you prefer? I won't have time to get to it right away
Sorry for creating a mess!
If you don't have time, I'll just take care of it. No worries.
Re #120: Has someone checked that all the emoji listed here are correct? Because my font is apparently totally scrambled for some reason😅
looks correct to me, your emoji font is probably outdated
Downgrading fixed it, apparently Noto Emoji 2.051 is totally bugged😅
If I approved the PR that added them I should have checked at that point. Haven't checked the changelog though
yeah I only wanted to check that they were correct in the changelog.
this reminds me of bad LLM output 😛
For codex by the way I think keeping the changelog up to date with changes would be much more helpful and worthwhile than for Typst as the structure is much more fixed. Creating the changelog was 1-2 hours of rather mindless work, so it's easy for mistakes to slip in.
Perhaps it's still not worth it, but if it's worthwhile in any of the Typst repositories, then here.
We can try to maintain the changelog with each new PR in the future
yeah, shouldn't be a problem, I try to do this with a lot of my personal projects too.
We should probably not merge any major change until Typst 0.14 is released, just in case a bug fix needs to be made in Codex. But of course we can discuss future things without waiting!
Well, it hopefully won't be that long anyway😅
Not sure if you meant to include reviews in "discuss", but those can ofc also already be given and we can just wait with merging the PRs even if they end up already having enough approvals.
also #112 can be merged immediately since it doesn't touch the functionality.
Tho it might need @past python's approval. (Also I hope this doesn't ping. I recently had a moment where I got pinged even tho someone used @silent)
Yes this is what I meant
His approval is only needed for changes to the API IIRC
Or more generally meta changes
Oh right sorry I confused the two PRs
we can merge large changes on main and still fix bugs in a release branch...
even after Typst 0.14 is released, there might be a 0.14.1 later anyway so we must have a way to deal with that
Yes, but I think it's easier to wait a few days before merging PRs than to cherrypick some commits into a new branch
We can reopen #79, right?
And are there other PRs that we had closed due to the multi-char symbols thing?
I'll just go ahead and reopen them then
Btw there is an emoji named unknown, which I suspect was added by accident at some point. It maps to U+1F9B3 Emoji Component White Hair
Probably good to get that deleted before 0.14.0
I don't think it matters that much, more likely than not I'm the first person ever to notice it. I don't think the burden of making a new release is worth it
oh right the release, yeah that's fair. Do we just silently delete it the next version then?
Doesn't have to be silent in the sense that we can write it in the changelog, but yeah it can probably wait
By silent I meant without deprecation
Then yes probably
A deprecation probably wouldn't hurt either, but it's also pointless
For reference, emoji.unknown was added in the first commit so it probably came from Typst: https://github.com/typst/codex/blame/3819cb50153513b9b2e07d56dd7a5957cde3d230/src/modules/emoji.txt#L1405
I saw this coming from miles away
lol
Seems like it's been there since the beginning btw
or, well, apparently they were part of a different repo called "symmie" before, which has since been deleted.
(kind of funny how we're now back to a similar structure. history really rhymes lol)
That's mentioned in my PR
There's a lot of old weirdness
Similar structure?
separate crate for the symbol definitions
Oh you meant symmie was a typst thing
yep it was
codex is a much better name though
After more than 7 years of inactivity, Unicode recently published a draft for the next version of UTR #25 "Unicode Support for Mathematics", which among other things is the document that defines math classes.
https://www.unicode.org/reports/tr25/tr25-16d2.html
It's mostly a port to HTML from PDF (it was the last UTR to still use PDF), but still amazing to know that they are working on it
ermahgerd (edit: spelling)
The link to the math classes data file is broken for me
Yeah I can't access it either
There doesn't seem to be any data file associated with the draft: https://www.unicode.org/Public/math/
Probably either not changed, or not public yet
@midnight tangle bracket became bracke in your pr
Oh right, thanks
Maybe use brack for all of them?
Since both paren and brace are 5 letters
Maybe unnecessary
I prefer using full names when it's not unreasonable
Is mustache_rev something that's in use?
Oh right the full name is upper left or lower right etc
Yes
I assume that mustache matches the default math classes (open/close), while the reversed is opposite?
Yes, except the default math class is Relation, see https://github.com/typst/typst/issues/5764#issuecomment-2632435247
That's weird
But it should change in the next version of UTR #25 as per https://www.unicode.org/L2/L2023/23231.htm#177-C50
In addition to corners top and bottom we should probably have ones with diagonal corners as well?
That couldn't be done with callable symbols though
No?
Aren't you able to use any delimiter on each side?
What determines which function ends up being called is the underlying Unicode character of the called symbol
Oh nm it's not a symbol of course
I want to get back to codex at some point, but I'm super busy these days
We talked about it a while ago and never ended up doing it: I added pee for ℘.
https://github.com/typst/codex/pull/122
This was mentioned at this point: (linked message)
If this isn't an acceptable name, there's pea. It's less natural, but at least doesn't have the unfortunate alternate meaning
Is something like weierp (or some variation thereof) already dismissed? It's not as consistent with ell, but avoids the unfortunate name
(pea feels indeed a tad less natural to me.)
I don't think that's a very good name.
it's not consistent with codex naming in general
the consistency in question is wrt ell or something else?
it's also only (sometimes) called weierstrass p due to the somewhat obscure weierstrass elliptic function
but can be used for other things than that
pee or pea is a more neutral name that is more consistent with ell
okay, I agree
I'd prefer pee, but I also understand why the team could be opposed to it
yeah, I think I'm with you here. the unfortunate clash does bother me a bit, but I do prefer it vs. pea
(which fwiw also has a clash, though it's not as crass)
Another way to be sort of consistent with ell would be epp :V
I opened a PR to update to Unicode 17. The update doesn't require any action on our side, so this is a trivial change.
https://github.com/typst/codex/pull/123
Also, @grizzled granite, when making the currency symbols PR, did you consider all symbols from the Currency Symbols block? You mentioned purposefully avoiding symbols for currencies no longer in active use, but does that mean all currency symbols were considered?
Also, the Currency Symbols chart mentions other currency symbols that aren't part of the block, some of them are not present in Codex. Did you consider them for inclusion?
The reason I'm thinking about that is because Unicode 17 added the symbol for the Saudi riyal, which is in active use.
Of course there is no rush so don't feel obligated to look into it right now if you don't have the time.
Yes, I considered every currency symbol
So I suppose it makes sense to add the Saudi riyal symbol as just riyal given that other riyal symbols are not in active use?
I guess I can't completely rule out the possibility of missing some, but I don't think so
yes
if we're just adding a symbol for a new glyph it's surely fine to not wait for the update to harfrust?
@grizzled granite
But it's not actually going to work is it?
New symbols need support in the shaper, as far as I understand
Obviously it's not a problem as long as codex isn't bumped in typst
idk, let me try it. the other issue will be font support
there are some fonts on github, no idea about quality
Looking at the harfbuzz update to unicode 17 it appears nothing relevant really changed (?), except maybe this static const uint8_t _hb_ucd_u8[] thing. But I have no idea what that is. https://github.com/harfbuzz/harfbuzz/pull/5534/files
The difference would be in the unicode data files no?
yeah, but idk if there's a record of every character in there though
Honestly I don't know, but my impression was that it wouldn't work without updating rustybuzz/harfrust
Haven't tried though
I tried to shape it with an older version of hb-shape (before unicode 17) and I can't get it to work
Oh wait no it worked ahaha its just the name of the glyph in the font is "[namenotfound]"
It does work! Try compiling
#set text(font: "saudi_riyal")
#let riyal = symbol("\u{20C1}")
#riyal
with https://github.com/emran-alhaddad/Saudi-Riyal-Font?tab=readme-ov-file
Moving to harfrust seems like a no-brainer anyway
yes ofc, though I recall it was blocked on resvg?
yeah but now that Unicode 17 is out it seems silly to resist
i'll merge the riyal codex pr
no one seems interested in updating rustybuzz
Didn't we agree on waiting a while? I don't remember how long when it's not breaking. On the other hand this is not a very complex PR :p
true, but as you say there is really not much discussion needed here 😅
also, should we merge the removal of deprecations now then?
oh crap forgot to update the changelog for the riyal
thats what i get for not waiting
You can just blame @midnight tangle
ill open a PR updating the changelog
I opened a new PR that adds a test for the validity of standardized variation sequences, similarly to how we already test for the validity of emoji variation sequences (i.e., presentation sequences).
https://github.com/typst/codex/pull/126
This PR updates the testing infrastructure to also test that standardized variation sequences are valid. I also reorganized build.rs to separate the processing of Codex module files from the part t...
I also took the opportunity to reorganize build.rs a bit
Sadly, GitHub seems to be pretty bad at displaying diffs that mostly consist of changed indentation
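For readers unfamiliar with the data involved, here is a rough Python sketch of the kind of check such a test performs. It is not the code from the PR. The three sample lines follow the format of Unicode's StandardizedVariants.txt and are reproduced from memory, so a real test should read the published file. The helper names are made up, and the check only handles symbols made of a base character plus one of VS1 to VS14 (the emoji presentation selectors are covered by the existing test).

```python
# Sample lines in the StandardizedVariants.txt format (reproduced from memory).
SAMPLE = """\
2205 FE00; zero with long diagonal stroke overlay form; # EMPTY SET
2229 FE00; with serifs; # INTERSECTION
222A FE00; with serifs; # UNION
"""


def parse_sequences(text):
    # Collect each variation sequence as a tuple of code points.
    sequences = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        field = line.split(";", 1)[0]
        sequences.add(tuple(int(cp, 16) for cp in field.split()))
    return sequences


def is_valid_symbol(symbol, sequences):
    # A symbol without VS1..VS14 passes; one with such a selector must be listed.
    code_points = tuple(ord(ch) for ch in symbol)
    has_selector = any(0xFE00 <= cp <= 0xFE0D for cp in code_points)
    return not has_selector or code_points in sequences
```

A symbol such as U+2229 followed by U+FE00 passes, while a base character paired with a selector that the data file does not list is rejected.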
@midnight tangle Related to the serif union/intersection PR you just put up: just an FYI that earlier today I emailed the NewCM maintainer about the empty set variation sequence to fix https://github.com/typst/typst/issues/1528
Nice! I added the serif variations first as they are mostly no-brainers, but I would like to add the empty set variant soon as well.
Do any fonts support these yet?
I haven't checked
I've added all mathematical variation sequences to the document (apart from calligraphic letters). Almost all of them are from Mathematical Operators or Supplemental Mathematical Operators (resp., first two, and third column, in the attached image).
He said he has no objection to the change, but as it breaks backwards compatibility for TeX with the LM fonts he needs to ask around.
Yes this is indeed quite a breaking change for a lot of documents
@midnight tangle did you know there is a proper way to do cancel/not for every symbol?
It's just broken in every font lol
Combining Diacritical Marks for Symbols is a Unicode block containing arrows, dots, enclosures, and overlays for modifying symbol characters.
Its block name in Unicode 1.0 was simply Diacritical Marks for Symbols.
I knew there were some combining characters but I haven't looked into them much
There are some very useful ones, but they seem to mostly be broken
I think a few work in noto sans math
Since multi-codepoint symbols are kinda new, I only looked at variation sequences for now
The corresponding section of the Unicode Core Spec is section 7.9.4
Actually the specific one I was talking about is in https://en.wikipedia.org/wiki/Combining_Diacritical_Marks instead
Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character "Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actually separates characters that would otherwise be considered a single grapheme in a given context. Its b...
U+338
Hm, I wasn't aware of the fact that 20d2 is the one supposed to be used for negation
that's weird...
Yeah and its even weirder that the Combining Diacritical Marks for Symbols block contains a double slash variant
I'd like to find a list of accepted combinations of mathematical characters + combining marks. I'll look for it in UCD
In noto sans math it's the diagonal one that is negation at least
which makes much more sense
Same in luciole
And NCm
I didn't actually think that one worked in ncm
I think they're supposed to work with anything in theory
I guess? But it's so weird because there are obvious cases where it doesn't make sense
Would be up to the user to use it when it makes sense then
with combining marks you can get stuff like 1̝
But it's weird that the spec is wrong about the uses of the vertical bar for negation
I couldn't find anything of interest
Well, it's a possible glyph for negation
so yeah, definitely a "use them with whatever" kind of situation
Then I think we better stick to single codepoints with optional variation sequences for now, and try to cover as much of that as possible, before starting to decide which combinations we want to add and under which names
I was thinking more like making them callable symbols, like the accents
So not(eq) for instance
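A callable `not` could, in the simplest case, just append the combining mark. The Python sketch below is an illustration only: `negate` is a hypothetical stand-in, not an existing Typst or Codex function. It uses U+0338 COMBINING LONG SOLIDUS OVERLAY, the diagonal one mentioned above, and applies NFC normalization so that symbols with a precomposed negated form collapse to a single code point.

```python
import unicodedata

COMBINING_LONG_SOLIDUS_OVERLAY = "\u0338"


def negate(symbol):
    # Append the overlay, then let NFC pick a precomposed character if one exists.
    return unicodedata.normalize("NFC", symbol + COMBINING_LONG_SOLIDUS_OVERLAY)
```

For `=` this yields the single code point U+2260 (≠), and for ∈ it yields U+2209 (∉). For a symbol without a precomposed form, such as ⊕, the result stays a two-code-point sequence whose rendering depends on font support, which is where the breakage mentioned above shows up.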
These two PRs still really need reviews!
I'll do a round
arrow.l.r.double.long is a mouthful
in Typst you'd just write <==> though
True. I think I was thinking of the vertical one, but that one isn't quite as bad (there is no .long)
Just a friendly reminder that some pull requests are waiting on review if you have the time: https://github.com/typst/codex/pulls?q=is%3Apr+is%3Aopen+label%3A"waiting+on+reviews"
Also, remember that pull requests that contain breaking changes require a third review before merging them.
I've just been extremely busy
I thought the unicode 17 PRs were blocked by moving to harfrust?
Yeah no problem ofc if you don't have the time
We tested it a while ago and it all still works with the current shaper
I just noticed that this proposed update changes the class of the exclamation mark from Normal to Closing to reflect its use as a suffix operator. I believe this would make it the first Closing symbol that doesn't have a matching Opening one. Would this cause issues in Typst for delimiter matching? For example in (x!) we don't want the opening parenthesis to be closed by the exclamation mark.
That's terrifying and I hate it, but it luckily should not affect the syntax since ! now has its own syntaxkind (Bang) used internally in the parser.
More of a concern is whether it stretches in an LR elem, i.e. $lr(n!)$
or $lr(! x !)$
Yeah, typst-layout/src/math/lr.rs::scale_if_delimiter() currently checks the math class, so it would act like a delimiter, although I have no clue if it would actually be able to stretch.
I guess that's the harm of using a spacing related parameter to determine semantics. But I would expect unicode to be more careful than that in their definitions tbh.
The underlying issue is that they have no math class for expressing postfix unary operators. This makes sense because they aren't very common (apart from exponentiation, I can't think of any other one). But it creates this weird situation where ! simply doesn't really fit into any of the existing classes.
Also, I love how the new section starts "The math class property described here" despite "math class" not actually appearing elsewhere in the document (although it is kind of defined in section 5, but it's very implicit)
Also 5.1 has a typo in the first sentence
The data file [Data] provides a classification of characters by primary their primary usage in mathematical notation.
You have until the 29th of December to report it if you want: https://www.unicode.org/review/pri533/
Though I'm unsure whether this is the kind of feedback they expect at this stage
I'll try to submit the grammar issue at least. Neat.
I seem to recall someone opened an issue where our handling of unary operators after a character didn't match TeX?
And that in TeX it basically worked as a postfix unary operator
Currently, the styling::to_style function in Codex does not handle accented letters properly. My interpretation of UTR #25, Section 6.5 Accented Characters, and especially the sentence below, is that we should first decompose any precomposed character, and only then apply the styles.
to achieve consistent results, a mathematical display system should transiently decompose any precomposed upright letters when used in mathematical expressions, and should use a single algorithm to place embellishments.
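A rough sketch of the "decompose first, then style" reading, for illustration only. The function names and the tiny decomposition table are hypothetical and are not the actual styling::to_style code; a real implementation would use full canonical decomposition (NFD) rather than a hand-written table, and would cover all styles, not just bold.

```rust
/// Decomposes a few precomposed Latin letters into a base letter and a
/// combining mark. Hypothetical, illustrative table: a real implementation
/// would rely on full Unicode canonical decomposition instead.
fn decompose(c: char) -> (char, Option<char>) {
    match c {
        '\u{00E9}' => ('e', Some('\u{0301}')), // e with acute
        '\u{00E8}' => ('e', Some('\u{0300}')), // e with grave
        '\u{00FC}' => ('u', Some('\u{0308}')), // u with diaeresis
        '\u{00C9}' => ('E', Some('\u{0301}')), // E with acute
        _ => (c, None),
    }
}

/// Maps an ASCII letter to its Mathematical Bold counterpart and leaves
/// every other character untouched.
fn to_bold(c: char) -> char {
    let styled = match c {
        'A'..='Z' => 0x1D400 + (c as u32 - 'A' as u32),
        'a'..='z' => 0x1D41A + (c as u32 - 'a' as u32),
        _ => return c,
    };
    char::from_u32(styled).unwrap_or(c)
}

/// Styles the base letter only, then re-attaches the combining mark.
fn bold_with_accents(c: char) -> String {
    let (base, mark) = decompose(c);
    let mut out = String::new();
    out.push(to_bold(base));
    out.extend(mark);
    out
}
```

With this, a precomposed é becomes a bold e followed by U+0301, instead of being left unstyled because é itself has no bold counterpart.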
we need to still sort out accented letters so that they go through math accent layout instead of the shaper...
Why can't you use the shaper for accented letters in math?
Oh if these aren't meant to be "math accents" then I suppose it can
Yes, they are just regular accented letters using combining diacritics
Not supported by NewCMM as of now though
I don't imagine many fonts would support it
I'll probably send the maintainer an e-mail regarding the combining solidus thing (#133). I may include that as well.
Yes it seems a bit niche, but still useful. For example, I want to typeset algorithms with variable names that may be French words with accents
Can you open an issue for it in codex? I might be able to do it this weekend
I opened #134
Hi, I just opened a PR: https://github.com/typst/codex/pull/135, please let me know if you have any feedback or concerns!
I encountered the lack of these symbols while taking notes from an abstract algebra course, so these are definitely used.
Btw, I couldn't find where the |-> ligation is implemented, because it would be nice to have |--> (currently available as mapsto.long but not in the shorthand form), and <-| and <--| as well
I think there's a moratorium on new shorthands until a better way of handling them can be found
I'm not sure this is the way we want to implement it and expose it to the user, but to be fair I don't really know what else we could do that would be as convenient for the user
not sure what other aspects you're referring to, but at least the two-letter names are pretty much mandatory, given how country flags work in Unicode.
Prior discussion: #1277628305142452306 message
In terms of implementation, we may not want to hardcode every country code and instead generate them somehow
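For reference, generating a flag from its two-letter code is mechanical: each ASCII letter maps to the regional indicator symbol at the same offset from U+1F1E6. A minimal sketch (the function name is made up, and this only builds the sequence; it does not check that the code is actually assigned to a country):

```rust
/// Builds the flag sequence for a two-letter uppercase code by mapping each
/// letter to the regional indicator symbol at the same alphabetical offset.
/// Returns `None` if the input is not exactly two ASCII uppercase letters.
/// Hypothetical helper: it does not verify that the code is a real country.
fn flag_from_code(code: &str) -> Option<String> {
    let bytes = code.as_bytes();
    if bytes.len() != 2 || !bytes.iter().all(u8::is_ascii_uppercase) {
        return None;
    }
    bytes
        .iter()
        .map(|&b| char::from_u32(0x1F1E6 + u32::from(b - b'A')))
        .collect()
}
```

For example, flag_from_code("FR") gives U+1F1EB U+1F1F7, which fonts render as the French flag.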
Regarding the .slant issue, as I have expressed before, I think a solution should be found that enables users to choose a default to use for the rest of the document. A similar issue exists with diagonal vs. vertical negation strokes (see UTR #25, Table 8).
A possible solution would be to only have names for non-slanted equal variants, and diagonal negation strokes, and have functions similar to those in styling.rs: one to convert symbols from horizontal equal bars to slanted equal bars, and another one to convert from diagonal negation stroke to vertical negation stroke.
Users should then be able to use the following show rule in their document: show math.equation: it => slanted-equal-bars(it) (temporary function name used for this example).
This solution does not require any new infrastructure.
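To make the idea concrete, here is a minimal sketch of what such a conversion could look like on the Rust side. The function names are placeholders and the table only covers two pairs; which symbols would actually be included is undecided.

```rust
/// Maps a symbol with a horizontal equal bar to its slanted counterpart.
/// Placeholder table with two well-known pairs; characters without a
/// slanted variant are returned unchanged.
fn slanted_equal_bars(c: char) -> char {
    match c {
        '\u{2264}' => '\u{2A7D}', // less-than or equal to -> slanted form
        '\u{2265}' => '\u{2A7E}', // greater-than or equal to -> slanted form
        other => other,
    }
}

/// Applies the conversion to every character of a piece of math text.
fn slant_equation(text: &str) -> String {
    text.chars().map(slanted_equal_bars).collect()
}
```

A second function of the same shape could handle diagonal to vertical negation strokes.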
In case anyone is interested, I recreated Symbols defined by unicode-math in Typst.
For now it is fully automatically generated, so it is not very well organized. It lists every Unicode character that has a math class, as well as all their variation sequences, together with the way to obtain them in Typst math.
In the future, I would like to add combining character sequences such as the ones listed in Section 14. Negations of Mathematical Symbols of the Codex Proposal document, which I also reworked to remove old proposals and focus on the symbol list.
Note that the Symbol List for sym document (ex-Codex Proposals document) and the new List of Mathematical Symbols Supported by Unicode document serve different purposes: the former is a working document to help find names for symbols in Codex (specifically for the sym module), while the latter aims to be an exhaustive list of all mathematical symbols that are supported by Unicode, which might be a useful resource for math font authors.
The new document: https://typst.app/project/rBazFFMoB6JF4abehfMFqL
@midnight tangle can you remind me of our current PR approval policy again? I want to add it to the contributing guidelines PR and can neither remember it fully nor find the last time you mentioned it😬
- Non-breaking PRs require two approvals
- Breaking PRs require three
- PRs that affect the public API require review by Laurenz
And deprecations count as breaking, right?
Yes, but removing a deprecated item doesn't as long as a released version contains the deprecation
Or at least this is how we have been handling it for now I think
Sure, the point is formalizing all the unspoken conventions we've been holding in our heads.
Do we want to formalize releasing in lock-step with Typst? As in, two Codex releases between Typst releases shouldn't happen?
because that matters a bit for the deprecation thing.
I'm not sure what you mean exactly.
As long as each version of Codex is used in a Typst version, everything should be fine. And since the community is not the one to decide when a new Codex version is released, what you're saying shouldn't ever happen
What you wrote in "Add PR section" seems fine to me
Yeah I guess that's good then. I was just trying to avoid bad edge cases😅
The following two Codex PRs are missing a single review each and implement relatively small changes so they should be a quick and easy review when you have the time:
- #123 trivially updates to Unicode 17. According to #1277628305142452306 message, it should be fine to merge.
- #131 renames {gt,lt}.tri.* to {gt,lt}.closed.* for consistency with {chevron,paren,subset,supset}.closed.
This simply updates all mentions of a Unicode version from 16 to 17.
Looking at the Unicode 17.0.0 changelog, this version does not make any change to presentation sequences, and I also don'...
@midnight tangle I suspect the web app issue may apply to any symbol in the SMP
Same thing is happening for emojis, such as the Rocket emoji
Seems likely indeed
@storm whale upright(aleph) yields the non-symbol Aleph (and same for the other four Hebrew letters that have symbols). I think this is due to this line of code. Is this intended? Hebrew symbols aren't italic, so I wouldn't expect upright/italic to have an effect on them.
It's intentional so you have some way to obtain the normal char in math
That's fair, but there needed to be something and it kinda works. That's also what I'm doing with the arabic
What about simply using strings to get the regular characters? Out of the following fonts, only Libertinus Math supports the actual Hebrew letters:
- New Computer Modern Math
- New Computer Modern Sans Math
- Noto Sans Math
- STIX Two Math
- Libertinus Math
- Fira Math
In the future that'll be the text font, and I think there should be some reasonably easy way to access it in the math font, notwithstanding lack of font support
Perhaps rename the issue so it reflects that it's a broader issue?
Done
Regarding https://github.com/typst/codex/pull/26, is there a policy about choosing names based on semantics vs graphics?
For ∨ if we want a "graphical" name different from "vee", according to https://en.wikipedia.org/wiki/Descending_wedge it's also called "vel"
The descending wedge symbol ∨ may represent:
Logical disjunction in propositional logic
Join in lattice theory
The wedge sum in topology
The V sign, a symbol representing peace among other things
The vertically reflected symbol, ∧, is a wedge, and often denotes a related or dual operator.
The ∨ symbol was introduced by Russell and Whitehe...
I've never seen that name in use though
When a symbol has an established meaning, a semantic name is preferred. But for a lot of symbols, this is not the case so we use a descriptive name instead.
What about symbols that have 2 or 3 well established meanings?
I don't think we had this situation before. That would probably be decided on a case-by-case basis
Should we worry about this?
https://github.com/typst/codex/security/dependabot/1
Since the dependency is only used in the Unicode conformance test path and the specific error condition seems exceedingly hard to trigger on a 64 bit system, it does not seem particularly severe but upgrading the dependency would not hurt either
I opened #142.
There are currently two open PRs that add tests:
- #126 for the validity of variation sequences.
- #144 to ensure that NFC is used for all symbols.
The first one shows quite a large diff in GitHub, but most of it is simply changes to indentation that GitHub does not detect apparently. RustRover and VSCodium are both able to display the diff correctly.
This PR updates the testing infrastructure to also test that standardized variation sequences are valid. I also reorganized build.rs to separate the processing of Codex module files from the part t...
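For anyone unfamiliar with what such a check involves, here is a simplified sketch. It is not the code from the PR: the names are made up, the table is a four-entry excerpt rather than the full Unicode data, and it only handles "one base character plus one selector", so longer sequences that contain a selector are rejected.

```rust
/// Returns true for the variation selectors VS1 through VS16.
fn is_variation_selector(c: char) -> bool {
    ('\u{FE00}'..='\u{FE0F}').contains(&c)
}

/// Illustrative excerpt of standardized (base, selector) pairs. A real test
/// would load the complete list from the Unicode data files.
const STANDARDIZED: &[(char, char)] = &[
    ('\u{2229}', '\u{FE00}'), // intersection, with serifs
    ('\u{222A}', '\u{FE00}'), // union, with serifs
    ('\u{2268}', '\u{FE00}'), // less-than but not equal to, vertical stroke
    ('\u{2269}', '\u{FE00}'), // greater-than but not equal to, vertical stroke
];

/// A value without any selector passes trivially. Otherwise it must be
/// exactly one base character followed by one selector, and the pair must
/// appear in the table.
fn is_valid_symbol_value(value: &str) -> bool {
    let chars: Vec<char> = value.chars().collect();
    if !chars.iter().copied().any(is_variation_selector) {
        return true;
    }
    match chars.as_slice() {
        [base, vs] => STANDARDIZED.contains(&(*base, *vs)),
        _ => false,
    }
}
```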
I resurrected the plan to move numbering kinds to Codex (renamed to "numeral systems"): #145.
Regarding https://github.com/typst/typst/issues/6283, it seems that having dots resolve to the centered version was already meant to be implemented in https://github.com/typst/typst/pull/747, but that implementation was wrong (it just changed the order of .h and .h.c without considering the number of modifiers).
See https://en.wiktionary.org/wiki/·
dot is the operator whereas dot.c is the middle dot. This PR moves the current dot to dot.period and moves dot.op to dot.
Fixes #724
What would be the right fix? changing .h.c to .c and .h to .h.b ?
I think we would probably want to keep .h.c as is, but list it first so that it is the default when typing dots with no modifier (or even dots.h without .c or .b).
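A small sketch of why reordering alone may not change the default. It assumes the resolution rule is "among variants that contain all requested modifiers, prefer the one with the fewest modifiers, and break ties by listing order". The function names and the three-variant table are illustrative, not the actual Codex code or data.

```rust
/// Splits a dotted modifier string such as "h.c" into its modifiers.
fn modifiers(s: &str) -> Vec<&str> {
    s.split('.').filter(|m| !m.is_empty()).collect()
}

/// Picks a variant for the requested modifiers. Assumed rule: the variant
/// must contain every requested modifier; among those, fewer modifiers win,
/// and ties go to the variant listed first.
fn best_match<'a>(variants: &[(&str, &'a str)], requested: &str) -> Option<&'a str> {
    let wanted = modifiers(requested);
    variants
        .iter()
        .filter(|(mods, _)| {
            let have = modifiers(mods);
            wanted.iter().all(|m| have.contains(m))
        })
        // `min_by_key` returns the first element among equal minima.
        .min_by_key(|(mods, _)| modifiers(mods).len())
        .map(|&(_, value)| value)
}
```

Under that assumption, listing ("h.c", "⋯") before ("h", "…") still resolves a bare dots to "…", because the single-modifier variant wins before order is considered.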
@grizzled granite you have multiple open PRs on typst/codex that are blocked because they don't update the changelog. If you don't have the time to edit them yourself, are you fine with me pushing some commits to do that?
Sure, sorry, I've just not had the overhead to spend time on codex recently
np
I can't edit your original pull requests because I do not have the permission, so I'll just create new ones instead.
I seem to recall Laurenz said he wasn't a fan of the usage of submodules at one point. Maybe I misremember
I don't remember exactly but I think one of the reasons was that it wouldn't be supported by the symbol picker for now. For chess symbols, I think it is worth it though. It just makes so much sense in my opinion for it to be in a separate module. An argument could be made that it should be in a package though.
I'm of the opinion that symbols shouldn't be gated behind packages.
@midnight tangle gotta remember to update the changelog
@lapis moth regarding https://github.com/typst/codex/pull/153 I guess you mean you don't like variants like line.feed, line.return, line.new? Would you rather have several small submodules, or names like linefeed, linereturn, newline, textstart, fileseparator?
it's true there are currently no such "meaningless" symbol names as control.line or control.text would be as far as I can see
except maybe for dotless
yeah I'm not a fan of that one either😄
the abbreviations you have now are good imo
lol I clicked "view reviewed changes" from your comment and didn't realize it was an older version
it did seem weird that the names were reverted, that explains it 🙂
When someone has a minute to spare, #152 should be a very quick review.
oh I already approved that 😅
Thanks!
Looks like you're gonna have to send someone an email
Thankfully they have an issue tracker 🙂
Issue tracker? Never heard of that
I found ten new issues with NewCM 8.0.0 so I'll have to send another email I guess
So many symbols have random pointless dots for some reason
Now that NewCMM supports all variation sequences, I opened #156 to add a name for the slashed zero variant.
The inconsistency of using slant and not slanted, but slashed and not slash bothers me a bit. Is there a logical reason?
I don't particularly care which one we choose, but it seems a bit strange to use both
Modifiers that aren't shape names usually use an "adjectival" form, which for verbs translates to past participle.
This was explained here: https://github.com/typst/codex/issues/43#issuecomment-3037212874.
.slant is an acceptable deviation from that because we consider it an abbreviation because the symbol is so commonly used (I remember writing that, but I don't remember where).
also, "slant" is an adjective anyway 🙂
Is it? I can only think of interpretations as a noun or verb
yeah like slanted basically
Ok fair enough
The explanation for the two latter ones is that the replacement character in pennstander is apparently just a regular question mark
U+FFFD is the Unicode hex value of the character Replacement Character (�).
That is unusual
@midnight tangle the dot situation seems inconsistent
and.dot is ⟑, but ⩑ exists
See the Symbols with Decorations section of the document for my current feeling on the whole topic.
I am on my phone right now but I'll be available to discuss that later if you want
I lost the link to it, I'll see if I can find it
I sent it in the issue I think
Here it is
Second link
https://github.com/typst/codex/pull/164 that's a very large pr
I tend to agree with your comment in the PR. I am also wondering whether some of those would be better as emoji. We could also have both, but for some of them I don't think it makes much sense to have them in sym (e.g., the proposed sym.keyboard.alarm, which I also don't think fits in the keyboard submodule).
I am working on a symbol-by-symbol review.
Also, full disclosure, I know the author IRL.
I also think it's a bit big, feels more like a button module to me than keyboard
although I know it also contains a buttons sub-module 😮‍💨
While I am not surprised that such a thing exists, I was not aware of the invisible function application symbol being a thing: https://github.com/typst/typst/issues/8263 . Should this be included in Typst Codex? I would certainly take it into use immediately.
I've added my thoughts on #8263. One thing to add is that AT testing will need to determine in what cases we apply intents, content MathML, or invisible operators. Furthermore, I'd find it too much of an imposition on the user to always use something like a function application operator while authoring; it does not set achievable standards for accessibility (and the operator might conflate the ability to mathematically evaluate the expression vs. accessibly announcing it)
We are at 99 commits on typst/codex. Almost 100!
Somewhat relatedly, are we supposed to dismiss the security alerts ourselves or is Dependabot going to realize that we pushed a fix?
I don't see any active alerts on codex, so I suppose it automatically realizes. I don't think I usually actively dismiss them, but not sure.
It didn't go away immediately after merging the PR but I don't see it anymore either now
I am interested in trying my hand at adding flags. I read through issue #115 and PR #136 and it seems to me that the primary concern is validating that all countries are included and correctly mapped.
It’s annoying to map names to codes. I am currently just using Python and a table of ISO 3166 abbreviations to do the conversion, but inconsistent or multiple names mean my list of corrections is quite long. When writing the verification in Rust, I will hardcode a map of country names to codes and then verify that all of the flags listed in the file fetched from Unicode are included. This will fail if an extra country with no conversion is included, but I’m unsure if this is adequately rigorous for what you all are looking for.
the main problem with this is that there’s not really a good external truth for country codes that can be pulled in without adding a dependency. I would love to hear any thoughts on other ways to do this
Edit: looking at it further it seems I have misunderstood the structure of the flag sequences, this is maybe easier than I thought?
I don't know whether we only want to have tests for the validity of flag sequences, or whether we want a test for the correctness of the name -> flag mapping.
In the first case, we only need to consider the Unicode-provided file that lists all valid country flags.
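A sketch of what the validity-only test could look like. It assumes the Unicode file uses the usual data-file layout, with space-separated hex code points in the first semicolon-delimited field and comments after #. The function names are made up, and fetching the file is left out.

```rust
use std::collections::HashSet;

/// Parses the first field of a data line into the string it denotes.
/// Assumed layout: "1F1EB 1F1F7 ; <type> ; <description> # comment".
/// Comment-only lines, empty lines, and ranges like "231A..231B" give `None`.
fn parse_sequence_line(line: &str) -> Option<String> {
    let data = line.split('#').next()?.trim();
    let field = data.split(';').next()?.trim();
    if field.is_empty() {
        return None;
    }
    field
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok().and_then(char::from_u32))
        .collect()
}

/// Returns every symbol value that does not appear in the data file.
fn unlisted_flags<'a>(symbols: &[&'a str], data: &str) -> Vec<&'a str> {
    let valid: HashSet<String> = data.lines().filter_map(parse_sequence_line).collect();
    symbols
        .iter()
        .copied()
        .filter(|s| !valid.contains(*s))
        .collect()
}
```

The test would then simply assert that unlisted_flags returns an empty list for all flag symbols. Checking the name -> flag mapping would need a separate source of truth.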
1CEDF will be "SQUARE ROOT OF SQUARE ROOT OF SQUARE ROOT OF SQUARE ROOT" in unicode 18.0
lol
Ooh Unicode 18.0 just entered Beta period!
This wasn't in Alpha I think
Relevant symbols from Unicode 18.0 Alpha are listed in the document.
Most of the mathematical symbols added seem to be super obscure historical ones
Indeed, which is why I don't think we need to assign them a name for now
why won't they standardise the existing VS for the calligraphic/script glyphs to lower case latin!! But they'll add a whole new different alphabet as a new variation selector
but this time only for lower case latin
There are many design decisions about Unicode that don't make sense, but once they're made we're stuck with them
I thought it'd be a nested square root or smth, but no, it adds more squiggly lines at the left lol
It's not very readable 😛
Probably why no one has used it for hundreds of years
IIRC the reason it was excluded was "lack of evidence of use" or something? I am tempted to waste a couple hours putting evidence together and drafting a proposal lol
Why does discord think I want to react with 😫 (I think it's that one) every time I accidentally double tap something?
Mathematicians will eagerly use any alphabet you throw at them. The argument of lack of evidence of use makes no sense to me
I can guarantee you that they would use them if they could
Yeah and it's available in LaTeX though, right, with mathscr and mathcal, so surely it appears in the wild already
Rather that than the sans serif and typewriter ones
And surely something like bold(italic(sans(chi))) did not have a great deal of "evidence" for its use...
Not in all fonts I believe. Partially because (especially earlier versions) tex could only address a limited number of symbols
But what possible harm could there be in allowing the variation selectors more broadly? The worst thing that can happen is that the font doesn't have that glyph and you fall back to the original one
I found the following package: https://mirror.aarnet.edu.au/pub/CTAN/macros/latex/contrib/mathalpha/doc/mathalpha-doc.pdf. And it is quite comprehensive in listing all the different styles across packages.
Apparently you could turn it off. But why on earth is that the default
I think for a "watertight" case you would need examples of articles or books where scr and cal are used to distinguish two different things for lowercase letters
it's got 4 categories for calligraphy/script! upright, restrained, embellished, heavily sloped
Oh yes definitely.
I think that only applies to abstracts?
well it's something at least :p
On another note, what's the status of https://github.com/typst/typst/pull/8172?
It would be nice to merge #126. The test is not extremely important, but it can help detect mistakes earlier, which is always nice.
The diff looks very bad on GitHub, but most of it is re-indentation, so VSCode and RustRover display it correctly.
If no one from the community has the time to review it in the following days, I'll ping Laurenz to do it himself.
Also, the commit history looks very bad, that's not GitHub's fault, that's mine. Sorry for that.
#171 changes the NumeralSystem type to use &str instead of char everywhere. This is required to merge the Arabic abjad numerals PR, and more generally a change that can be useful in the future.
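To illustrate why string digits are more flexible than char digits, here is a hypothetical sketch of a positional numeral system whose digits are &str. It is not the actual NumeralSystem API, only an example of the idea: a "digit" that needs more than one code point (for instance a base letter plus a combining mark) cannot be stored in a char.

```rust
/// Formats `n` in a positional system whose digits are arbitrary strings,
/// least significant value first in `digits` (index 0 is the zero digit).
/// Hypothetical helper, not the Codex API. Returns an empty string if fewer
/// than two digits are given, since no positional base exists then.
fn positional(mut n: u64, digits: &[&str]) -> String {
    let base = digits.len() as u64;
    if base < 2 {
        return String::new();
    }
    let mut parts = Vec::new();
    loop {
        parts.push(digits[(n % base) as usize]);
        n /= base;
        if n == 0 {
            break;
        }
    }
    // Digits were collected from least to most significant.
    parts.iter().rev().copied().collect()
}
```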
@river berry if you are interested you can ask to be given approval permission on typst/codex. One more person with the power to help PRs move forward is always nice
I actually did so yesterday lol
@midnight tangle As people have already picked up on, we're nearing a release candidate for 0.15. Is codex in a releasable state or is there stuff that should still land?
On a side note, I noticed one minor issue with the changelog.
Unless I'm missing something, it should be in a releasable state. I'll make a PR to fix the changelog.