#Codex
And I'm not sure how many characters are encompassed by that. Do you remember which other snowflake symbols like the weierstrass P there are?
personally I really like the current dif
it's really semantic
or at least it feels to me
Only in the context of derivatives, not for integrals
I was about to say that
There is pee and ell and that's it I think
even in integrals it feels semantic to me, for some reason
as general infinitesimals
That's why dee would be good. It's how it's actually pronounced typically for both derivatives and integrals
Integrals are much more general, the d isn't necessarily connected to infinitesimals
Ok then I propose dee
I understand the reasoning, but for some reason dee isn't exactly clicking over here
If not then I'd prefer dif over del (which is not a good name for reasons previously discussed)
Is that not lowercase blackboard bold d?
I still think we should reserve the possibility of using them for lowercase symbols
In the future
to be fair, I often personally map doubled lowercase chars to uppercase calligraphic letters
it's kinda handy
hmm
I guess oo is infinity, so that ship has already sailed
Maybe part of the reason for why dee doesn't fit for me is that that's not how I (mentally) pronounce it (due to language reasons)
true
It's not how I pronounce it in my native Norwegian either, but the typst names are rooted in English
yeah
but for some reason this is sticking out more in my mind
trying to figure out why
also, incidentally
looks like dif doesn't show up in the symbol search despite being mentioned in the paragraph preceding it...
Because it's not a symbol
yeah, I thought so
but then we should remove the paragraph before the search
"The d in an integral's dx can be written as $dif x$..."
Anyways, back to dee
Was this mentioned already somewhere (e.g. forum)?
yeah diff was bad
Why? It's not a value of type symbol, but it's clearly a symbol from the user's pov
Maybe it should be included in the symbol list even if it's not technically a symbol
I agree. I personally think it should be included in the search and the paragraph kept
but we absolutely can't have the paragraph there and not have the symbol in the search, I'd say
well the symbols in the symbol list are all part of sym, so having one that isn't would require UI changes to show the sym. prefix for all the others
I intentionally didn't deprecate diff right away in https://github.com/typst/codex/pull/44 because there was some pushback and I wanted to at least have one more discussion about it. There's quite a bit of discussion about dif above here, but not a lot about diff. I really can't judge this as I'm not a mathematician, but is everybody on the same page that diff -> partial is the right way?
Not only does it align with LaTeX, which reduces friction for people who are transferring to Typst, the LaTeX choice of the symbol name is not terrible.
Regarding https://github.com/typst/typst/issues/5695, how do you feel about ang? It's short like abs, but it neither conflicts with angle nor is overly semantic (which is a problem when it has multiple semantic meanings)
we could potentially also rename all the angle brackets to be under ang as well, since they're actually fundamentally different from the angle symbols
The html names are lang and rang by the way
I feel like this introduces a subtle, arbitrary, difference between two names, which is of the same kind as dif/diff (and we are trying to change that).
But at the same time, it works, and solves the issue. So it's probably fine.
I still prefer angles though.
I find ang a bit cryptic (abs is just as short but it's a common abbreviation), I prefer angles or chevrons
In my personal code I use #let inner(u, v) = $lr(angle.l #u, #v angle.r)$
now, if we add this, whichever the name, we need to make sure that the user can add commas inside of the function call and that it will be processed properly
(which my inner helper fails at)
maybe it's just a matter of taking in varargs
as for the name, I personally lean more towards something like inner since it's very semantic. But maybe the bra-ket people won't appreciate that name as much (though, to be fair, I would imagine this function by itself wouldn't be the most useful to them, since it's always just the angle brackets...)
there's also the matter that sometimes angle brackets are used beyond inner products, e.g., I've seen people use would-be inner(v) to denote the 'sequence of vs' (where one would already have v_1, v_2, ....). That said, the usage as an inner product is by far the canonical one, as far as I am aware, and other usages (e.g. the sequence one) are kind of on-the-fly and thus still feel like they fit in with the name inner, in my opinion
agreed that ang is very cryptic. angles seems somewhat reasonable, chevrons I fear is going to be an unfamiliar term to much of the userbase, and comparatively harder to type.
We could also rename the angle symbol, since it's much more uncommon. So then angle could be the bracket function, angle.l and angle.r as they are today, and something like angle.geom (very tentative?) the current angle symbol.
as @grizzled granite mentioned in the issue these symbols are used for many things. I guess whether one usage seems canonical really depends on your field, so inner being very semantic counts against it. I'd prefer a non-semantic name that describes the typographical symbol (rather than its function in one field) so it makes sense in all contexts
yeah, I'd missed that comment. The use of angles as a tuple is also somewhat canonical, I now realise.
is there any math function besides lr that treats a comma as a symbol rather than an argument separator? Maybe it's best to keep this special behavior to a minimum
The others I still think are a bit on-the-fly. But I agree with the underlying message.
I'm not sure, but mat does some special handling iirc (mainly due to its custom handling of semicolons).
In light of this, I'd say I'm currently leaning towards renaming the current angle symbol and using angle or angles as this new angle bracket function.
the semicolon thing works the same for all functions in math (you can use it in your own functions)
wait, really??????
No you can't?
uhm, wait
Oh, I think I see it
this works, and x is then (1,) and y is (2,)
yeah, so it might be just lr then
yep
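A minimal sketch of the semicolon behavior described above, assuming a hypothetical user-defined function called pair (the name is made up here, and it has more than one letter so that math mode resolves it as a function instead of typesetting a single letter):

```typst
// Hypothetical helper, only for illustration: it prints the
// representation of whatever each parameter received.
#let pair(x, y) = [x is #repr(x), y is #repr(y)]

// Inside a math call, `;` closes a row, so each parameter receives
// an array holding the arguments written before that semicolon.
$ pair(1; 2) $
```

This matches the observation that x and y each end up as a one-element array rather than as a plain value.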
but I mean, it could be just a case of the function taking in varargs
#let angle(..args) = $lr(angle.l #args.join($, $) angle.r)$, or something like that
(maybe I'm missing a method call in the usage of the args)
#let angle(..args) = $lr(angle.l #args.pos().join($, $) angle.r)$
there we go, fixed
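A usage sketch of the variadic version, assuming a Typst version where angle.l and angle.r name the bracket symbols as they do in this discussion. The helper is called angled here (a hypothetical name, one of the candidates floated in this thread) so it does not collide with the existing angle symbol:

```typst
// Hypothetical helper; `angled` is not a Typst built-in.
// `..args` collects every positional argument, and `join` puts the
// commas back between them before `lr` scales the brackets.
#let angled(..args) = $lr(angle.l #args.pos().join($, $) angle.r)$

// Any number of comma-separated arguments is accepted.
$ angled(u, v) quad angled(x_1, x_2, x_3) $
```

The trade-off is the one already mentioned: the commas are re-inserted by the helper rather than passed through as typed.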
I guess that could work... Is that how lr does it?
it does feel like a hack
Ideally it would just be easy to input the characters directly, then you get the lr behavior for free:
?r
$ ⟨ x/y ⟩ $
The best solution could be to find shorthands for these characters. This would solve both the comma issue and the naming issue
but I can't find good shorthand ideas
I kinda like the idea of shorthands for angle brackets
but yeah, it's not obvious what would be a good shorthand
binom has special handling because of multinomial coefficients
interesting, it's documented to do the vararg thing: math.binom(content, ..content) -> content while lr is not: math.lr(size: relative,content) -> content
that makes sense semantically
the angle bracket function could do the same thing as binom I guess
maybe a better name than angles would be angled?
I'm not a fan of names longer than 4 letters for this (like in norm), or at worst 5.
Either choose a name that people will actually use when it's used repeatedly (like ang), or drop all pretense
That being said, if you put a gun to my head then angled is preferable to the other options mentioned
I'm unequivocally in favor of ang, since that's literally what I called it in my personal math package
Well, I guess there's a bit of a difference between "I like using it" and "I think this would be good for everyone". Maybe I'm not 100% on that second one🤔
#discussions message
How can I add slash.circle? Apparently there is backslash.circle, but nothing for slash. That's odd, as the latter is more popular.
I don't think it exists in unicode
Actually I'm wrong
?r $ \u{2298} \u{29b8} $
I don't know if they're more similar in other fonts, but that's the closest thing that exists
Yeah in stix two math they appear to be mirror images @steel chasm
You can create a feature request on the codex repository. It may possibly already be covered by one of @midnight tangle 's proposals though
It's not explicitly covered as a proposal in the form of an open issue, but the specific character is probably somewhere in my drafts. Regardless, if you want the symbol to be added in the future version of Typst instead of keeping it on the side to be added as part of a later work on circled symbols, you should open an issue or PR to typst/codex.
Why do alef, bet, dalet and gimel map to the Hebrew letter and not the mathematical one directly?
Like these are specifically for use as 'math symbols', and the whole Hebrew alphabet isn't mapped. I think then the unstyled variant in math should just be the mathematical glyph, not the Hebrew one
I think that's the way it's done for Greek also?
The idea is that they then work in regular text too
They are replaced by their math versions in math
Yeah but for Greek you actually use the upright letters in math, as opposed to the regular Hebrew alphabet?
As in, the upright math Greek alphabet is just the Greek alphabet in Unicode (use the same codepoints), unlike these four hebrew characters
Slightly unrelated, what is the letter shin used for? It is the odd one out of the Hebrew alphabet in codex (it has no math glyph)
I've traced back when it was added to before 0.1.0, where it appeared along with all the other symbols first added
I would expect it to be used in papers related to solving a heat equation with blood flow constraints in the human shin.
It's used sometimes in signal processing, IIRC
Thinking of the dirac comb, specifically
There's some disagreement on what letter is actually used for it. Most people say it's a cyrillic letter, but I've seen (and was originally taught) that shin is often used too
That would be the Cyrillic letter sha
Yeah, I agree that it's more usually represented by sha. But if I'm not mistaken there is a nontrivial amount of people that use shin for it
they are pretty similar
that's the only explanation I have for why shin is included in Typst 🤷
That makes no sense though, since sha is the only one that actually looks like a dirac comb 😅
You do have a point... but when I first met the Dirac comb, it actually was introduced to me as being the letter shin
Probably someone along the way wanted another hebrew letter in math 🙃
anyways, I don't have any strong feelings about it
btw, do we have sha in codex?
doesn't seem like it
Yeah, you're probably right about the shin explanation
also, tbf handwritten shin is a bit different from typeset shin (as most letters are)
when handwritten it's almost exactly equal to sha, it's just the middle stick that sometimes gets slightly slanted
so maybe that's why
We probably should
Or both to be on the safe side
I don't know if everyone uses lowercase or uppercase
sure
U+29E2 ⧢ is the Unicode character Shuffle Product.
Unicode never ceases to surprise me
We have that one already I believe
At least I am aware of its existence
But it is essentially a Cyrillic sha, except for use in math with slightly different semantics I guess
I don't think so unless it was added post 0.13
Ok then maybe I should stop saying stuff without checking first
I think we're all guilty of that
If shin isn't really used, I think it would be fine to map the other letters to their math glyphs instead then. On the other hand, if it were used frequently, I think it'd then be better not to so that it's consistent atm (and possibly in the future if more Hebrew letters are added)
what about the other cyrillic letters? I used cyrillic Be once because I had run out of variations of the letter B.
Afaik their names are phonetic, so especially for those like Be, where they're the same as the latin ones, we'd probably have to group them in a cyrillic module or sth
A (probably not very thorough) search on GitHub shows a single use of shin in Typst (as a symbol) and it's in someone's documentation for Typst listing the symbols (from 2 years ago)
Is that present in math fonts though?
In the Unicode spec chapter 22.2.3 only sha gets a specific mention
I think we should avoid adding all cyrillic and hebrew letters unless they have a specific use
can someone give me a TLDR why we want to replace alef, bet, etc. with the mathematical codepoints instead of keeping the hebrew ones and letting math layout take care of the auto-conversion like we do in other cases?
I don't necessarily see why we should do that if they're converted anyway
Neither New Computer Modern Math, Stix Two Math, nor Noto Sans Math contain shin, so I think removing it is a good idea though @storm whale @heady fulcrum
unicode-math also doesn't contain it
Stix Two Math and Noto Sans math do contain sha (both lower and upper case)
surprisingly New Computer Modern Math does not contain sha
May want to suggest that to the maintainer
@vapid osprey you've been in contact with him right?
i have but tbh i don't really like writing mails to him 😅
Okay, maybe I'll bite the bullet
I wish they had an issue tracker
sending patches is even worse since he doesn't really use git
I don't think this would be a good idea, because they should be easily accessible by default. Also, cf. #1277628305142452306 message
fair enough
that makes some sense, especially given that sha is more common. though maybe we should deprecate it first, then later remove it
Probably a good idea yes
I believe the only point in favor of this change is expressed in #1277628305142452306 message
It came up as I was doing the changes on the Typst side for that PR and saw that the Hebrew mappings were there, but they don't really fit in with any of the styles (and would be odd to not move to Codex along with the rest I think). Those four Hebrew math letters aren't really "styles" of their original Hebrew letters, but different glyphs with different meaning entirely. Comparing it to say Greek, the upright letters (which are actually used in math) are just the Greek alphabet in Unicode (uses the same codepoints). This is unlike these four Hebrew characters and their math codepoint which are the same letter (and the same style), just encoded differently because they have different meaning. There's also maybe the point that the math ones are LTR characters, but the Hebrew letters aren't, so mapping them would maybe be a bit weird.
I'm not sure how that's significantly different from greek
the italic greek symbols (which are the ones you'll see unless you call upright) are mathematical symbols
The fact that they map to the text symbols means you can use sym.gimel in text and get the correct letter instead of the mathematical symbol (which is unlikely to be present in a text font)
But the upright Greek are also mathematical symbols (especially so in ISO style), unlike the Hebrew letters
What do you mean? The regular upright Greek codepoints are the same in both text and math
In both cases therefore the Greek and Hebrew symbols point to the text versions, which get mapped to the math versions in math (though for Greek that mapping is trivial for upright)
It seems perfectly consistent to me
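A small sketch for comparing the two contexts, assuming a Typst version in which alef, bet, gimel, and dalet are defined as described in this thread. Which glyphs actually appear depends on the text and math fonts in use:

```typst
// In text, the symbols are said to resolve to the Hebrew letters.
Text: #sym.alef #sym.bet #sym.gimel #sym.dalet

// In math, layout is said to swap in the mathematical codepoints.
Math: $alef, bet, gimel, dalet$
```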
Ok, that's a fair point, though I still think it's a little different.
Though you actually can't get the Hebrew letter back in math, as it is not treated as a style
so that's kinda where I am coming from as well
You can if you specifically call the text function with the text font
Iirc
I don't think this was mentioned earlier, but unicode-math defines \aleph, \beth, \gimel, and \daleth, which are mapped to the symbols (not the Hebrew letters).
Yeah but if it's like the Greek alphabet you should be able to get it somehow in the math font (preferably without text function)
Oh interesting
Do the lowercase Greek map to the italic math symbol?
There is no, e.g., \alpha in unicode-math apparently
Instead, there is a name for each style variant
Why?
You can do it for Greek because they're the same codepoints
Upright that is
Why not? So with the Hebrew then since they're not the same codepoints what's the point in even mapping to the non mathematical ones then?
Completely unrelated: Openai trying to steal our thunder https://github.com/openai/codex
Two pull requests were opened to Codex.
I approved the first one because there seems to be an actual use for the symbols and the names don't conflict with anything. https://github.com/typst/codex/pull/57
Regarding the second one, I'm not sure how many currency symbols we should support, but cedi is a pretty unique name as well (Wikipedia redirects "Cedi" to the corresponding page, with no disambiguation) so it's probably fine. https://github.com/typst/codex/pull/58
adds U+2322 ⌢ (frown) and U+2323 ⌣ (smile). it is used in certain cohomology theories for cap and cup products. in latex, it is represented by \frown and \smile.
I mean we already have a fair few currency symbols and they are all quite distinct names (as in, can't really see them ever overlapping with something in the future), so I'd agree it is fine (I've approved it). With the other one, I think that's also fine, but might be good to leave it a little longer to give others a chance to comment (so I won't approve it atm)
Time for a cease and desist
Because someone using them in text will get the correct symbols instead of the mathematical versions
I don't see an issue with supporting any currency that is in Unicode.
Should we use the opportunity to add other currencies? At least the currencies in active use here https://www.compart.com/en/unicode/category/Sc
Given the PR was made by an "outsider", and there is some work to do to determine which currency symbols should be included, and under which name, I would say it's better to do it in a separate PR
I would feel bad asking them to do this work when they just wanted to add two symbols
Sure
I'm actually working through the list of currencies right now @midnight tangle
Then you can probably make a PR of your own if that's okay for you
I think some discussion is required, but eventually sure
Actually I also realized we should deprecate a currency symbol we currently have
Which one?
@midnight tangle do you know what the deal is with all the "fullwidth" symbols?
Without context I can't say for sure, but that sounds like the versions of latin notations meant to be used in CJK text, displayed in a square the same size as other CJK characters
but they shouldn't be explicitly added right?
I don't think they should be added to Codex. Most people will misuse them, and I would assume those who need them already have a way to insert them because they need to do it in other software
Armenian Dram Sign - dram (Currency of Armenia)
Afghani Sign - afghani (Currency of Afghanistan)
Bengali Rupee Sign - taka (Currency of Bangladesh)
Tamil Rupee Sign - rupee.tamil (Alternative symbol for Indian rupee in the Tamil alphabet)
Thai Currency Symbol Baht - baht (Currency of Thailand)
Khmer Currency Symbol Riel - riel (Currency of Cambodia)
Colon Sign - colon.currency (Currency of Costa Rica)
Naira Sign - naira (Currency of Nigeria)
Rupee sign - rupee (Currency of Mauritius, Nepal, Pakistan, Seychelles, Sri Lanka)
New Sheqel Sign - shekel (Currency of Israel)
Dong Sign - dong (Currency of Vietnam)
Kip Sign - kip (Currency of Laos)
Tugrik sign - tugrik (Currency of Mongolia)
Guarani Sign - guarani (Currency of Paraguay)
Hryvnia Sign - hryvnia (Currency of Ukraine)
Tenge Sign - tenge (Currency of Kazakhstan)
Manat Sign - manat (Currency of Azerbaijan)
Lari Sign - lari (Currency of Georgia)
Wancho Ngun Sign - rupee.wancho (Alternative symbol for Indian rupee in the Wancho alphabet)
No longer in use (or have never been used):
Bengali Rupee Mark
Bengali Ganda Mark
Euro-Currency Sign
Cruzeiro Sign
Lira Sign
Mill Sign
Peseta sign
Drachma Sign
German Penny Sign
Austral Sign
Livre Tournois
Spesmilo Sign
Nordic Mark Sign
Rial Sign
Tamil Sign Kaacu
Tamil Sign Panam
Tamil Sign Pon
Tamil Sign Varaakan
Proposed deprecated:
Gujarati Rupee Sign, see https://unicode.org/L2/L2009/09331-gujarati-rupee-sign-deprec.pdf and https://www.unicode.org/charts/nameslist/n_0A80.html
I am confuse:
Nko Dorome Sign
Nko Taman Sign
North Indic Rupee Mark
Small Dollar Sign
Currently in typst, but should be renamed:
Indian Rupee Sign - rupee.indian (Rupees are used in several countries, and this symbol is only used in India)
Currently included in Typst, but should be removed:
French Franc Sign (Apparently proposed but never actually adopted or used)
I think that should be everything @midnight tangle
I guess there is some argument to be made for having obsolete currencies available at some point, but that's less pressing
The Wancho language has ~50 000 speakers, and the written language is only 15 years old, but if it's notable enough to be added to unicode I don't see why typst couldn't contribute to keeping endangered languages alive
I was most unsure about the naming of colon, since there is an obvious conflict. I ended up with colon.currency
I don't think we want to start doing that, but I should mention that colón is an option. Otherwise colon.currency is probably fine.
you don't think we want to start doing what?
Unless we decide to put all those currencies under a currency submodule, but that would make them all longer to type so it's probably not a great idea
Using non-ASCII characters in stdlib identifiers
Ah yeah, I was just going to say that. I misread your message
Okay I found the proposal for the Nko symbols I was confused about
They are described as letters for the "dorome" and the "taman", but I can't find currencies under that name
How come this is just a comment? And not a line with @ at the beginning like the other deprecations?
I created a PR for the currency symbols @midnight tangle
hopefully I did it correctly
I've only created one once before
It looks right at first glance. I won't be able to properly review it for some time, but a pull request needs two approvals to be merged so it is fine
no hurry
@grizzled granite regarding your comment on the currency symbols PR ("I don't think there are any plans to have more categories than general symbols and emojis."), we have the ability as of now to create sym and emoji submodules, and are planning to do it for gender symbols (https://github.com/typst/codex/pull/2). This PR is the oldest still waiting for approval btw
Yeah I later saw that. But that's because they're serving a special purpose as modifiers no?
I mean that they're not symbols by themselves. They modify other symbols
Unless I misunderstood
Oh nevermind, I mixed two prs
Sorry
So what exactly is the reason that they are a submodule? @midnight tangle
Because it does not really make sense for gender alone to be a valid symbol. What would it be?
Essentially, unless there is a generic symbol for the concept of "gender", not making a submodule and instead having gender be a symbol by itself would have required choosing a default gender, which I kinda don't want to do
Okay, I understand
Though I still don't understand what that has to do with importing?
Because then you can do #import sym.gender: x
In the case of currencies, e.g., #import sym.currency: euro
Okay. That does make sense, though I suspect non-programmers will not do that
Currency is a very long word
Yes this is a valid concern, but a separate issue. This could be explained somewhere in the doc, but it is probably hard to explain properly (from the user's pov, the difference between submodules and symbols is barely noticeable)
Well it's not really separate, because it effectively means that anyone not using import will have to type a much longer name
To be clear, I meant #1277628305142452306 message as a response to #1277628305142452306 message, not "currency is a very long word"
With the exception of colon, the names are also weird enough that they shouldn't be interfering with anything.
And also, the day they do, we can probably figure out a (possibly breaking) solution, it's not like those currency symbols are gonna become the most used symbols in Typst
It looks reasonable, so I approved of it
What? 😅
I've been incredibly swamped with work, so I haven't had the overhead to look into the codex PRs. Now I'm done with teaching, so hopefully I can find some time
It's a long word to add before every single currency. If I want to use the euro symbol on text I'd have to type #sym.currency.euro
How do you guys feel about maintaining more information about each symbol we add? I feel like there's a lot of contextual information that could be useful to have available in the documentation. This could include references to other relevant symbols.
Or just #let euro = sym.currency.euro. Of course if this was defined in another file, that would result in a need to import that instead of sym.
But my two cents is that descriptive longer names for symbols are better. Importing is not that terrible.
In my opinion it should be avoided when it's not necessary. Over time, most users of typst will not be programmers. And it's nice to avoid boilerplate preamble
We should strive to make names actually usable by default when possible
Of course, but a part of usability is being intuitive. For example, the Unix habit of shortening program names by just removing all vowels is rather unintuitive, until you figure out the pattern.
I don't think Unix naming is something we should look to for inspiration
I agree. The most intuitive thing to do is to just call things by their real names, without any transformations applied. Unless of course a name is like 20 symbols long. That would be annoying (but only without autocompletion).
This should be done in the descriptions of the corresponding PRs I think
I'm specifically talking about information that could be available in the symbol list in the documentation
You mean in the user doc?
I don't think we can assume that everyone is using auto completion
Yes
Like adding information about what symbols are, how they are used?
Yes
That could make sense but it would be quite a lot of work
Indeed. I'm talking long term
I guess this information could be sourced from Unicode
Some, but it would be hidden in thousands of proposal documents
Although doing that automatically could be a bit hard, because the information that is easy to get programmatically is not always the most relevant
It doesn't have to be that detailed though, at least not as a start.
I wish Unicode did that themselves honestly: a place where each codepoint links to all relevant information from the spec, charts, and proposal docs
Maybe there is an internal database for members, but what I've found has just been all over the place
maybe someone needs to make this themselves and then they might adopt it
It would aid discoverability too, since it would improve the symbol search
I'm certainly not. Well, apart from the open-buffer autocompletion from Vim and/or a certain commit of the Helix editor. So I'd also be forced to write the long name(s) at least once, and I'm fine with that. They just help with discoverability when there is no autocompletion or documentation available.
Like this?
https://codepoints.net/
Codepoints is a site dedicated to Unicode and all things related to codepoints, characters, glyphs and internationalization.
Well, this is mainly just the technical information (apart from the Wikipedia integration). I would like links to relevant parts of the spec, not just data from the UCD. But sadly I don't think this is easy to automate, it would have to be done manually
How should these be encoded https://unicode.org/charts/PDF/U2800.pdf ?
The Unicode convention seems very strange. The top 6 are numbered 1-3 on the left and 4-6 on the right, then the bottom two are 7 and 8
Is it because regular braille only has 6 dots?
Answer: yes
I guess it would be done similar to the math styling PR?
If we decide to add them (which I'm unsure there is a need for, as people who use braille already use different input methods and probably don't want to display braille letters), the first thing I think about is that this is where the modifier system really shines! We could have braille, with .tl, .tr, .l, .r, .bl, and .br modifiers that could be combined in any way.
This would probably have to be automated in the build script though
The naming convention for the dots here seems to be a standard though
Then it's kinda unfortunate that we can't use single digits as identifiers
I guess not adding them would be the simplest option, but it feels like something that should exist at some point
Honestly I don't think so. I don't really know how much braille is used in the digital world, but braille dot combinations basically match to regular letters, so I don't know if there is a need to have them easily accessible in Typst. Also, this can be done by a package.
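As a rough sketch of what such a package-level helper could look like: the Braille Patterns block starts at U+2800, and dot n sets bit n-1 of the offset, which is why the numbering runs 1-3, 4-6, then 7 and 8. The function name braille and its call shape are hypothetical, not an existing Typst or Codex API:

```typst
// Hypothetical helper, not part of Typst or Codex.
// Takes the raised dots by their standard numbers (1 to 8).
#let braille(..dots) = {
  // Dot n contributes the value 2^(n-1); duplicates are dropped
  // first so that a repeated dot is not counted twice.
  let mask = dots.pos().dedup().map(n => calc.pow(2, n - 1)).sum(default: 0)
  str.from-unicode(0x2800 + mask)
}

// Dots 1, 2 and 5 give 1 + 2 + 16 = 19 = 0x13, i.e. U+2813.
#braille(1, 2, 5)
```

Passing numbers as arguments also sidesteps the problem that single digits cannot be used as identifiers or modifier names.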
We've probably discussed this before, but I'm not a big fan of the naming of dash()
Braille animations are used as a loading indicator at least by one Fedora package manager. I forget whether it was rpm-ostree or dnf. 😅
same here.. That's "overline" in unicode and latex right? was this deemed too long a name?
overline is already used for something else
It's overline in unicode, but I don't think it's set up for use in luatex/xetex. It's not listed in the unicode-math list
is the overline function synthesized?
I think that bar would be a better name than dash. It's short, descriptive, and should be intuitive for latex users.
The only pitfall is that it would not be the same as \bar in latex, since that corresponds to typst's macron
I think most people prefer the extensible variant though
So as I see it, there are two options:
- Rename dash to bar
- Rename macron to bar and find another, better name for dash
I've never heard of dashes referring to anything other than hyphens, en- and em-dashes
What do you mean by "synthesized"?
I agree dash is not a good name. I have never heard it being used in such a way either. I am personally in favor of option 1. for two reasons:
a) It's not that bad if we don't do exactly the same as LaTeX. bar will still do something similar, so it's fine.
b) "Macron" is the most precise name we can give to what is currently called macron.
I mean that it's not actually a unicode/font thing, but a line that typst creates
Then I think it is
But I have not checked
Also, some parameters (such as line thickness) may be obtained from font metrics
Has there been any discussion about how to deal with gender and skin color for human emojis?
Not that I know of. Regardless, this would require supporting arbitrary strings as symbols.
I am waiting for the modifier resolving PR to be merged to implement that
(to prevent conflicts)
I think adding all combinations manually would be unsustainable
Indeed
True, but it wouldn't have to be manually; We could do it in build.rs.
Not saying that that's how I'd want to do it, just pointing it out as an option.
does u+2215 get turned into u+002f ? The former is a division slash
fell into this rabbit hole due to https://github.com/typst/codex/pull/60
not that i know of
slash.circle is addressed in Section 3.4 of the document. I did not add it for now because it would be nice to resolve #34 first
Then we're missing the actual division slash too
It's in Section 3.13 of the document, along with a gazillion other slashes apparently
I'm still confused about the purpose of u+2215. Is it just a semantic thing?
I think it might cause some fonts to display the surrounding digits in sup-/sub-scripts
That would be the fraction slash no?
Oh yeah indeed
Then I think it's probably just a slash specifically for use as a division symbol, the same way "MINUS SIGN" is not the same as "HYPHEN MINUS"
But that would need to be confirmed
So "just a semantic thing" as you said earlier
- ofc fonts may display it differently
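To check the "does it get turned into U+002F" part, here's a small sketch using Python's standard unicodedata module (nothing codex-specific). It prints the official names and checks whether compatibility normalization changes the division slash.

```python
import unicodedata

# The slashes and dashes discussed above.
CANDIDATES = ["\u002F", "\u2215", "\u2044", "\u002D", "\u2212"]

for ch in CANDIDATES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# If U+2215 had a compatibility mapping to U+002F, NFKC would rewrite it.
division_slash = "\u2215"
survives_nfkc = unicodedata.normalize("NFKC", division_slash) == division_slash
print("U+2215 unchanged by NFKC:", survives_nfkc)
```

So at the Unicode level it stays a separate character; whether a given font draws it differently is another matter.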
Regarding https://github.com/typst/codex/issues/34
How do you feel about switching to circled, but also adding "o" as an alias for that modifier?
These are the kind of symbols that are used kind of often
I think those two ideas can be considered separately, but I like .o!
LaTeX kinda uses that, and .o does not already exist as a modifier AFAIK so it would not conflict with anything I believe
To be clear, in my mind, .o would be to access a circled version of a symbol, not any version that has a circle somewhere
Yes that's exactly what I meant
I guess it wouldn't even need to be an alias. It could be the canonical way
I was gonna say it would be good to keep .circled for consistency with .triangled and the other ones, but actually keeping .o only makes sense, the same way we already have .t, .r, etc..
It was mentioned that we have two different spellings for Hebrew letters. Is there any particular reason for that?
I don't see that as necessary to be honest. We should just pick a spelling and stick with it
alef seems to be the more modern romanization
Though that would be inconsistent with Greek, since if one follows romanization it would be alfa instead of alpha
So deprecate alef etc?
I don't think it's problematic to have multiple names for a single symbol. Both spellings exist, why only accept one of them? Let's accept whatever reasonable input users may try
Something I loved with Typst symbols when I started using Typst was that they were very easy to predict, and when I was using Typst for the first months, I would often be able to just guess the symbols I wanted to insert. Removing some common alternate spellings goes against that
Because the symbols are there for use in math
where one of the spellings is universal (aleph)
Do you have sources for only aleph being used in math?
The only reason why we have both is that they were added before we had the ability to deprecate
I'm biased as the author of the alias PR, but I think having two spellings is totally fine as long as neither one conflicts with something else
I mean, the letter isn't only used in math, but the reason why it's a symbol is that
that's why we only have 4 (well, 5 for some weird reason) hebrew letters
I meant sources for the "aleph" spelling being the only one used in mathematical contexts
I'm opposed to having different spellings unless there is a very clear reason for it
What kind of source are you looking for exactly? If you search for alef on arxiv for instance you'll just find a bunch of papers by an author named alef, while aleph gives tons of results. Even if you filter out the matches stemming from \aleph
Ok I'm fine with that as a "source"
I just wanted to be convinced that it was more than what you and other people in your field are used to
Still, I don't see a major issue with having alternate spellings
I don't use hebrew letters in my field
I mean, that's a slippery slope when it comes to transliteration. Just see the example of the mongolian currency recently
Well... I would not see an issue with adding all reasonable transliterations here as well
tugrik, tughrik, tugrug, togrog, tögrög
I don't recall the full context though
it's a maintenance burden and adds noise
Different names when they're fundamentally different? Sure
But spelling differences are unnecessary
@lapis moth what do you think about the semi-alternate proposal to https://github.com/typst/codex/issues/34 ? Using "triangled" and "squared" like you proposed, but the simpler modifier "o" for circled. Some circled symbols are extremely common, and there's already precedent for using "o" to refer to circular things. See "oo" for infinity
plus, it's cute!
Just as a note, although of course everyone's opinion is welcome, this proposal was extracted from the document so it was not initiated by @lapis moth
oh okay, I didn't mean to steal your glory!
by the way, I discovered some weirdness while investigating this
I just meant they don't necessarily have any strong opinion about this
like how "ast.circle" really should've been "ast.op.circle" and there was no corresponding "convolve.circle"
there's a couple of instances of this
we can rename it without being a breaking change
also "ast.small" is in the small form variants block, which are compatibility symbols for chinese
Maybe we should go over all the existing symbols and check for weirdness like that at some point, but that will take a lot of time
I'm going through the block now
"plus.small" is also a compatibility symbol
and "lt.small"
and "gt.small"
and "eq.small"
those are all of them
so not in.small?
no
weird
Should we have a deprecation period, or just remove them?
I don't think there's a formal deprecation mechanism for modifiers
No there's not
we really ought to pin the link to your proposal document
lol
as an aside, this page is the MVP https://www.babelstone.co.uk/Unicode/whatisit.html
can a modifier be repeated more than once?
it's really weird that unicode has ⦼ but not the uncircled one
No. Modifiers constitute a set
I.e., not ordered, and repeats are ignored (maybe warned or even cause an error in recent versions)
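A tiny sketch of what "modifiers constitute a set" means in practice. This is Python for illustration only and the function name is made up; it is not how Typst resolves variants, and it ignores the possible warning or error on repeats.

```python
def parse_symbol_path(path):
    """Split 'arrow.r.long' into a base name and an unordered modifier set."""
    name, *modifiers = path.split(".")
    # frozenset drops both ordering and duplicates.
    return name, frozenset(modifiers)


# Order does not matter, and a repeated modifier adds nothing.
print(parse_symbol_path("arrow.r.long") == parse_symbol_path("arrow.long.r"))
print(parse_symbol_path("arrow.r.r") == parse_symbol_path("arrow.r"))
```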
I like that one as well https://unicode.link/inspect
A page for inputting text to be inspected. Shows a breakdown of the codepoints in the string, and other information.
But at some point I was seriously considering making my own similar website, because most string inspector/character lists online do not show all the information I want AND relevant links to Unicode charts, etc..
Then I realized I do not have the time to do that...
I created a PR. I opted not to add any additional symbols right now, for the reasons mentioned in the issue
In particular, many of the circled symbols are variants of symbols we do not currently have at all. Such as the division slash and bullet operator
"Then I realized I do not have the time to do that..."
Story of my life
Is there a motivation behind some symbols being written with a hex code instead of the actual symbol?
in sym.txt
fwiw, I'm very in favor of .o. Seems very clear and also ergonomic
Isn't that only done for things that aren't visible, like spaces and control chars
no, brace.l, brace.r, dot.basic and others
Oh weird
With https://github.com/typst/codex/pull/63, do we want to sort out a way to deprecate modifiers or are we okay with just removing them?
I'm not sure to be honest. Deprecation seemed more relevant for instances where the symbol changes name, but here they're removed outright
The only deprecation mechanism for modifiers right now is a comment
Would it be hard to introduce something like the current thing we can do at the top level?
I'm clueless about rust
Today I learned that North Korea once requested that unicode add separate codepoints for korean letters that are used to spell out the names of kim il sung and kim jong il
what a loss
I have no idea, I haven't really read that code.
oh I thought you were the one that added it
No lol, I think it was @midnight tangle
basically the same person, your names both start with an m!
Not just m, but also ma!
for compatibility with this standard https://en.wikipedia.org/wiki/KPS_9566 which they apparently keep updating
KPS 9566 ("DPRK Standard Korean Graphic Character Set for Information Interchange") is a North Korean standard specifying a character encoding for the Chosŏn'gŭl (Hangul) writing system used for the Korean language. The edition of 1997 specified an ISO 2022-compliant 94×94 two-byte coded character set. Subsequent editions have added additiona...
they added kim jong un's letters in 2011
man there's a lot of lore on that page
Those characters have special meaning in the language we use
Oh right of course ahaha
Wait do we now know what Angzarr is!?
https://en.wikipedia.org/wiki/Angzarr
The angzarr (⍼) is an obscure typographical symbol representing azimuth, dating back to at least the mid 20th century, which became notorious during the first half of the 2020s for its obscurity and lack of a widely recognised meaning (compare ghost characters).
The name is from an abbreviation of its ISO 9573-13 name, "Angle with Down Zig-zag...
Seems like it
duh
we could consider adding an alias with the actual meaning at some point. But we need to figure out the whole angle bracket situation
since right now the whole angle thing is weird
yes
regarding https://github.com/typst/codex/pull/65 how does one handle symbols that can be both a regular glyph and an emoji?
this is the case for instance for ☾ I believe
We would first have to resolve https://github.com/typst/codex/issues/25, which I'm planning to do after the modifiers resolution PR is merged (to prevent conflicts) if I find the time
@heady fulcrum it seems like most math fonts either have all the cyrillic letters, or none
the default font (ncmm) doesn't have any
Ah. Ouch.
unlike for greek, there are no specific codepoints for math cyrillic, since they're very rarely used
with the exception of perhaps sha
how does latex do it? the font over there has sha?
or is it some package that does trickery?
lol, this question makes me think it's the latter https://tex.stackexchange.com/questions/124738/i-just-want-to-write-sha-without-ruining-everything
it doesn't seem like unicode-math defines sha at all
I think adding sha makes sense, though we should probably ask if the maintainer of ncmm could add it
not sure
is it just the uppercase letter that is used?
https://github.com/typst/codex/pull/67 @heady fulcrum
pretty sure it's just uppercase
oh well, I added both for good measure
I'm sure there are people out there who decide to use lowercase, but I really don't think it's major, so no need to support both I think
sounds safer tbh
I saw some references to https://en.wikipedia.org/wiki/El_(Cyrillic) having been used, but that seems very obscure
They've been quite responsive in my interactions with them. I could shoot them an email about it in a bit
never heard of it. also reminds me of the tav comment in the github thread
(the letters even look alike, again!)
I might have dreamt that up then, oops.
Actually NewCM has Cyrillic in the text font, so should just be a matter of copying the glyphs into the math fonts
I found the solution to the angle bracket situation
brokets /s
It's not ideal, but how about "anglebracket", with a predefined lr-function "ang"? Presumably using the delimiter without creating some form of definition will be rare
Currently we have all sorts of entries for angle brackets and angle (as in acute) smushed together
Someone suggested chevron
Perhaps more ambiguous, but it's not bad as long as searching for angle brackets brings them up
I'm really not a fan of chevron
it's a fairly complex word (in terms of writing as well as pronunciation) and is hardly discoverable
presumably not a lot of people are accessing delimiters by the symbol names directly
I'd lean a bit more in this direction. It's just ang that bothers me a bit
ahh, maybe I misunderstood what you were proposing then
Initially just moving the angle brackets from angle to chevron. Like chevron.l.double etc
the current situation where we have two entirely different families under angle is very awkward
the other family being the angle from geometry?
yes
I agree it's a bit awkward, but I'm still not convinced that it's worth moving the angle brackets to a more obscure name
(I remember the previous discussion on this only somewhat)
assuming the lsp and symbol list is set up correctly it should be discoverable regardless
it's not like it's an unknown name
disagree
it's a technically correct name but very few people call them chevrons, never mind think of 'chevron' when they see the symbols
I mean, do you call the bar a macron?
I'd say that's even more obscure
no, but I also think we should change that
that I agree
IMO it should be bar and the current bar should become something else
I think we discussed something like this a long while back, but I don't remember what came of it
I don't remember who was in the discussion, but we came to the conclusion that dash should be bar
I mean, if the 'entry-point' for this notation was an lr-like function, then I wouldn't be too bothered about a more obscure name
but this lr-like function does not exist yet
dash?
yes, but that's a separate discussion
that's the extensible version of the symbol
it's a terrible name
?r $ dash(x y z) $
hm, I see; I'd say I agree with that (dash -> bar)
though that's totally orthogonal
I think most people want the extensible version regardless, so I think it's best to give them that as bar, even if it doesn't technically match latex
It's a separate discussion, but I think it should be a prerequisite for us to change the name of the current angles to something more obscure
agree 100%
crazy idea, but maybe we could do something like bar.short for the current macron then? (In a world where bar is the current dash)
(not sure the parser would work well with this)
?ast $bar.short(a)$
```
Markup: 14 [
  Equation: 14 [
    Dollar: "$",
    Math: 12 [
      FuncCall: 12 [
        FieldAccess: 9 [
          MathIdent: "bar",
          Dot: ".",
          Ident: "short",
        ],
        Args: 3 [
          LeftParen: "(",
          MathText: "a",
          RightParen: ")",
        ],
      ],
    ],
    Dollar: "$",
  ],
]
```
I don't think we necessarily need to rename it
oops*
just figured that could be slightly more discoverable
since macron is such a weird name
The problem is that no one can agree what it should be called, because it's used for wildly different things
yeah
but if we don't have something better there, then I think it's worth having the geometry angle in the same hierarchy for the added discoverability
I thought ang was a relatively neutral term,
when I see ang for some reason I think angstrom
yeah yeah, I know
(don't have the keyboard to type that)
(oh wait I do, I just don't know how to do it 😅 )
I don't think we can make everyone happy, but we can try to make many people the least unhappy? xD
yeah
why not angles?
(I feel like we already discussed this option)
for what?
for the lr function
that sounds even more angle-like
hm, I suppose we could also do angles.l and angles.r for the symbols, though I'm not sure that's a good idea or even trivial with the current architecture
but it would be separate from angle
which is the issue, no?
I mean, people call these angle brackets
I find angles even worse than angle to be honest
I find it natural that angle would be in the name
really? why so?
because it's just a plural of the word angle, not the plural of angle bracket
ok I'm done with PRs for a while I think :p
chevron is a bit technical, but I think it's worth it to have clearly distinct names for different families of symbols
the "angle" will still be there in the full symbol name (Mathematical Left Angle Bracket), so it will still be fairly discoverable
e.g. searching "angle" in the symbol documentation page will find it
I agree. If there are three names for the same thing, you have to know all of them to really know Typst. Otherwise when you read someone else's code you'll be wondering "is this alef the same thing as my aleph?"
Also it's good practice to avoid user variables that shadow standard names. That gets harder if we populate the standard library with redundant names
@midnight tangle has a point regarding discoverability but this should rather be addressed in the tooling: the alternative names should be somewhere in the symbol metadata, and the documentation, tinymist, etc. should use that to help the user find the canonical name
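To make the tooling idea concrete, here's a minimal sketch of an alias table that documentation or an LSP could consult. It is in Python for illustration; the table contents and the function name are hypothetical and do not reflect codex's actual metadata format.

```python
# Hypothetical metadata: alternative spelling -> canonical symbol name.
ALTERNATIVE_NAMES = {
    "alef": "aleph",
    "bet": "beth",
}


def canonical_name(name):
    """Return the canonical name for a spelling, or the input if it has no alias."""
    return ALTERNATIVE_NAMES.get(name, name)


def suggestion(name):
    """Return a 'did you mean' hint for an alternative spelling, else None."""
    canonical = canonical_name(name)
    if canonical == name:
        return None
    return f"unknown symbol '{name}', did you mean '{canonical}'?"


print(suggestion("alef"))
print(suggestion("aleph"))
```

That way only one spelling exists in code, but someone typing the other one still gets pointed to it.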
this would have been different names for different things though
but not through the code, and the code should be the priority since it's the primary point of interaction
though this could be the way to resolve it. but we'd need to make sure that the typst webapp also does something similar
but again, I don't think it's worth having a more "organized name" if it hurts usability because it's less discoverable. And angle brackets are common, as we've already discussed, so I think they merit some care.
I think this was a separate comment about the Hebrew symbols
Although they're obviously related
oh yeah, you're right. which I then agree with
To me, it does not make sense for compose and circle.stroked.tiny to be aliases of each other because they have different semantics, so we may reasonably not want to have the same variants.
To me, aliases make sense for things that are different spellings of the same underlying concept. For different things that happen to be represented by the same symbol, not so much.
ah, so nabla and gradient are aliases, but since circle.stroked.tiny is only about the shape of the symbol, it's not considered the same thing. Makes sense👍
When making PRs that contain or plan breaking changes (i.e., remove or deprecate stuff), don't forget to add the breaking tag (if you have the permission). Also, remember that PRs containing breaking changes should be reviewed by Laurenz before being merged.
We can merge a PR that reverts it, but we can't unmerge it
It's not that big of a deal. Laurenz will see the changes at some point (when making the changelog if not before), and we can easily revert the PRs at this point if needed
I'll take a look at the merged breaking PRs when I get back to PR review
Completely forgot about that, sorry for merging some of those PRs too eagerly!
that's a paddlin
@storm whale I'm not sure why the accents are present as symbols in the first place? When is a standalone non-combining accent useful?
I have no idea. And the non combining accent decomposes into a space and the combining accent anyways
oh lol, that makes a surprising amount of sense
I think we're doing it for all of them. Grave for example is \u{0060} and not U+0300 (but 0060 doesn't seem to have any decomposition?)
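Here's a quick check of both claims with Python's standard unicodedata module (just a sanity check, not codex code). It compares the spacing acute accent U+00B4 with the grave accent U+0060.

```python
import unicodedata


def decomposition(ch):
    """Return the raw decomposition field from the Unicode database ('' if none)."""
    return unicodedata.decomposition(ch)


acute = decomposition("\u00B4")  # spacing acute accent
grave = decomposition("\u0060")  # grave accent

print("U+00B4:", repr(acute))
print("U+0060:", repr(grave))
```

U+00B4 reports a compatibility decomposition into a space plus the combining acute (0020 0301), while U+0060 reports nothing, which matches both observations.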
does anyone have opinions about the creative commons symbols?
I'm wondering if they can be cc.by, cc.nc, cc.nd, cc.sa, cc.zero and cc.public
or if we want to use longer names
Like, cc.attribution is fine, but cc.noderivativeworks is a mouthful
I was considering cc.not.derivative and cc.not.commercial
I think the abbreviations are fine
what about the public domain mark?
I was thinking pdm would be hard to decipher
so that's why I went with cc.public
Oh yeah public and zero I think are fine as they're short as well
What happened to no more PRs for a while? Aha
🫠
I stumbled into some unicode rabbit hole, and this felt fairly isolated and uncontroversial
That's how it starts...you'll have ten more open by the end of the day!
my god there's a lot of chess symbols in unicode
It is a mouthful, but at the same time having the symbol name be equal to the actual thing that it is referring to helps with discoverability.
Yes I had to comment them in the document because otherwise it was too annoying to scroll through them
The abbreviations are very common here, so I think this is fine
I've been thinking. Should we separate math symbols from those that are not math symbols?
Because for regular users this distinction is not evident
I've been wondering about that for some time, and it might make sense to at least have some sort of marker that a symbol is a math symbol. But I guess that's what the Unicode MathClass is for.
Could be used in the doc to have a more consistent font as well. For now, symbols in the symbol lists use the default website's font, with a fallback to NewCM Math. This means some symbols look quite ugly, such as → and ←, especially when next to, e.g., ↑.
I was thinking like sym and mathsym or something along those lines
with only the latter being imported in math by default
The thing is, is the distinction really useful? Any symbol can be a math symbol, and some symbols can be both math and non-math (e.g., punctuation marks)
why would we want to not import all symbols in math by default?
It is useful yes, because it tells you which symbols are likely to be present in a math font
Dice symbols are clearly not "math symbols", but I could see myself using them in math. Same for playing card suit symbols
You would still be able to use them, they would just not be present by default
I don't think this is a significant enough advantage, especially because in the end there is no guarantee that "math symbols" are present in a math font and that "non-math symbols" are not
There's also the fact that there's symbols that are more or less identical, with the exception that one is intended for math and one is not
When codex is growing you would get auto-completion suggestions for symbols that are extremely unlikely to be present in a math font, but seem like they should be
As an example https://www.compart.com/en/unicode/U+2E28
U+2E28 is the unicode hex value of the character Left Double Parenthesis. Char U+2E28, Encodings, HTML Entitys:⸨,⸨, UTF-8 (hex), UTF-16 (hex), UTF-32 (hex)
Regarding this specific symbol, maybe we shouldn't add it in the first place. In general, this seems like an autocompletion issue that could be tackled separately
Are math symbols the only ones that should be present in codex then?
There is no clearly defined goal: https://github.com/typst/codex/issues/5
I see no reason to arbitrarily limit symbols, unless they're clearly compatibility characters and such
But I would say we shouldn't add symbols that aren't useful, and this double parenthesis thing isn't useful imo because it's not meant for use in math, and whoever needs it in text probably has a better way of inputting it
But that's why I'm also saying that there needs to be some way to tell math symbols apart from non-math symbols
"and whoever needs it in text probably has a better way of inputting it"
How would they do that?
The real question is "who is this codepoint for?" If it is used in a specific language, people writing in this language probably have a way to input it outside of Typst which they can use in Typst as well
Unicode isn't particularly good at making clear what a symbol is supposed to be used for
without digging into proposals
I agree this is quite annoying
And they don't even make it easy to find the relevant proposal
In this case they seem to be editorial symbols for Latin scholars
(but who's to say they don't want to use typst?)
Anyway, I'm not proposing adding them now, I just wanted to provide an example
I think it would be better not to merge PRs immediately, to give everyone time to give their opinion. Ideally, I would wait a week after the PR was opened, but I understand that it can slow down development. At the very least, PRs should probably stay open for 24h.
Rust has a similar concept of a "final comment period" for RFCs
yeah I don't merge PRs even when I'm the second approver for this exact reason
Sorry, I'll keep that in mind in the future
there's no hurry, it just didn't occur to me
Is there some setting that could prevent the merge button to show up until a week has passed?
maybe using Actions (which i don't know anything about), but it's probably not that big of a deal as long as we just commonly remember not to merge PRs too eagerly
another reason for multi-character symbols: many ipa symbols consist of sequences with combining accents
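As an illustration of why these can't be single-codepoint symbols, here's a sketch with Python's standard unicodedata module. The nasalized open o is just an example I picked; it is not from any codex proposal.

```python
import unicodedata

# e + combining acute has a precomposed form, so NFC yields one codepoint.
e_acute = unicodedata.normalize("NFC", "e\u0301")

# IPA open o (U+0254) + combining tilde (U+0303) has no precomposed form,
# so it stays a two-codepoint sequence even after NFC.
open_o_tilde = unicodedata.normalize("NFC", "\u0254\u0303")

print(len(e_acute), len(open_o_tilde))
```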
https://marketplace.visualstudio.com/items?itemName=buttaiwan.cursor-charcode this is super useful, except it hasn't been updated in 5 years
so no unicode 14+
dash.wave (U+301c, 〜) and dash.wave.double (u+3030, 〰) are both cjk punctuation characters, which we've tried to avoid so far
the latter is also an emoji (〰️)
It seems like every batch of symbols I come across has at least one symbol in it that needs a text variation selector
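For reference, the text and emoji presentations are requested by appending a variation selector, which again gives a multi-codepoint string. A small Python sketch using only the standard unicodedata module (not codex code):

```python
import unicodedata

TEXT_PRESENTATION = "\uFE0E"   # VARIATION SELECTOR-15
EMOJI_PRESENTATION = "\uFE0F"  # VARIATION SELECTOR-16

wavy_dash = "\u3030"
as_text = wavy_dash + TEXT_PRESENTATION
as_emoji = wavy_dash + EMOJI_PRESENTATION

for label, value in [("text", as_text), ("emoji", as_emoji)]:
    names = [unicodedata.name(c) for c in value]
    print(label, names)
```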
Do we have all emojis apart from skin color variants and such?
the answer is no
Does anyone have a good way of sorting emoji.txt and sym.txt?
Not really. The thing is we kinda want to sort by meaning, but also by alphabetical order within some groups, so it's a bit messy at the moment
I feel like the best that can be done without putting unnecessary work into it is just reorganizing entries around the lines that are affected by a given PR when working on said PR
I had forgotten that about sym.txt, but emoji.txt isn't really sorted by meaning
it's like 99% alphabetical with some odd ones out
If it's easy to fix by hand I would just do that. Otherwise, idk
The ones I found would be easy, but if I were to actually make a PR I'd wanna make sure I caught all
Does anyone know why we have "kai"? it's essentially a greek ampersand, and would likely not be useful for anyone not writing greek text
and they presumably have better ways to enter it
@midnight tangle funnily you had this pr https://github.com/typst/typst/pull/5316 but typst did not have names for digamma
Since you are making PRs for Greek letters, Unicode defines U+00B5 µ MICRO SIGN, which is explicitly not equivalent to the Greek letter Mu: https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-7/#G12477
is that not similar to the other situations? Ångstrom etc
No
Specifically, it is explicitly not similar to the Omega / Ohm situation (see the link above)
in stix two math it looks exactly like upright mu
The ohm sign is canonically equivalent to the capital omega, and normalization would remove any distinction. Its use is therefore discouraged in favor of capital omega. The same equivalence does not exist between micro sign and mu, and use of either character as a micro sign is common. For Greek text, only the mu should be used.
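The difference the spec describes shows up directly in normalization. A short sketch with Python's standard unicodedata module (a sanity check only, nothing codex-specific):

```python
import unicodedata

OHM_SIGN, CAPITAL_OMEGA = "\u2126", "\u03A9"
ANGSTROM_SIGN, A_WITH_RING = "\u212B", "\u00C5"
MICRO_SIGN, SMALL_MU = "\u00B5", "\u03BC"

# Canonical equivalence: plain NFC already replaces the unit signs.
ohm_nfc = unicodedata.normalize("NFC", OHM_SIGN)
angstrom_nfc = unicodedata.normalize("NFC", ANGSTROM_SIGN)

# The micro sign only has a compatibility mapping: NFC keeps it,
# and only NFKC folds it into the Greek letter.
micro_nfc = unicodedata.normalize("NFC", MICRO_SIGN)
micro_nfkc = unicodedata.normalize("NFKC", MICRO_SIGN)

print(ohm_nfc == CAPITAL_OMEGA, angstrom_nfc == A_WITH_RING)
print(micro_nfc == MICRO_SIGN, micro_nfkc == SMALL_MU)
```

So ohm and angstrom disappear under ordinary normalization, while micro survives unless compatibility normalization is applied.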
but I see there are differences in serifs for new computer modern
It probably does in most fonts
Oooh interesting
I honestly suspect that may not even be intentional
Btw this character is present on all AZERTY (i.e., French) keyboards AFAIK
The visual difference?
Yeah
Above is NewCM, below is NewCM maths
Left is Mu, right is Micro
With fallback disabled
I agree it's probably not intentional
It's so annoying that NewCM does not have a public bug tracker where we could just report that instead of writing an e-mail
It might be intentional that micro is the same for the text and math fonts
It looks good near an upright "m" in both cases
So probably intentional actually
Looking at the list of symbols defined by Unicode math, it doesn't seem to include a name for micro
Maybe because it's not really a math symbol, although it is still a scientific symbol
To me it makes more sense to typeset units using the text font and not the math font
Above: Libertinus Serif, below: Libertinus Sans
Left: Mu, right: Micro
There is a slight difference between the Micros, but not between the Mus I think
At least they should be typeset upright, and in LaTeX this would be using the text font I believe?
You mean mathrm?
I don't know LaTeX but I think it was said multiple times that it uses the text font for upright math at least in some scenarios
That's the text font yes. Though I think you can typeset upright math with the math font at least in the Unicode engines
I remember someone (maybe even you actually) saying that LaTeX uses the text font more than they expected in math
i thought \mathrm would be upright math font while \text would be text font
No, mathrm is the text font.
See the top answer here https://tex.stackexchange.com/questions/717151/setting-font-face-for-mathrm-in-math-mode
By the way, epsilon and epsilon.alt are reversed relative to latex
I'm not sure if it's intentional or not
It is probably, because the LaTeX non-alt epsilon is a terrible choice
Dare I say, it was made by a 🌙 lunatic 🌙
I mean, that's subjective. It's not uncommon to use both in the same paper for different things
Anyway, I don't think it's a big deal necessarily. I just happened to notice it because of epsilon.alt having a reversed variant, but not epsilon
It'll definitely trip some people up
Isn't phi also reversed the same way. vs \phi and \varphi
Yeah
Varphi is so much prettier than phi 🥰
How much effort does the inclusion of multi character symbols require?
I wish I understood rust
Almost none from Codex's POV. Then, in Typst, they should work properly in text, but in math there is still some work to be done before they are supported.
https://github.com/typst/typst/pull/6336 might be enough, or at least going in the right direction for supporting that in math
The main reason I haven't implemented multi-character symbols in Codex yet is because multiple meta PRs are waiting to be merged, and I don't want to have to deal with conflicts
Meta pr?
PRs changing the thing that parses the symbol files
they have the meta tag on GitHub
This PR basically finishes support needed for math. Only thing left is then on the typst side to modify Symbol and SymbolElem to support multiple chars
I found a bug
I know we said to leave PRs open for a bit but with something like digamma that seems fairly uncontroversial?
uh oh a bug where
scratch that
or actually
I'm not sure
I can't find a font that has both
but the first symbol is an inverted L, and the last is a sans serif inverted L
the sans function should presumably be mapping it correctly
There's an inverted L?
yep
I am 99% certain it is not in any of the Mathematical alphabetic blocks in Unicode then
No, I believe they are more associated with IPA. But some are present in a number of math fonts
probably used for inverse functions and such
And they have sans equivalents?
Not all
U+2142 is the unicode hex value of the character Turned Sans-Serif Capital L. Char U+2142, Encodings, HTML Entitys:⅂,⅂, UTF-8 (hex), UTF-16 (hex), UTF-32 (hex)
U+A780 is the unicode hex value of the character Latin Capital Letter Turned L. Char U+A780, Encodings, HTML Entitys:Ꞁ,Ꞁ, UTF-8 (hex), UTF-16 (hex), UTF-32 (hex)
actually the sans serif one says math symbol there
oh the sans ones are in the letter like symbols block
yeah there are a few more
sans serif inverted G, L, Y and reversed L
presumably the corresponding non-sans serif symbols should map to them?
otherwise there's no way to add them as symbols unless we use a sans modifier, which is inconsistent
I'll create an issue
I can add them to my codex pr
ok, I'll create the issue regardless
I'll probably forget aha so should do
By the way, there are a few instances of inverted used in a different way in unicode than we do
ʁ, codepoint U+0281 LATIN LETTER SMALL CAPITAL INVERTED R in Unicode, is located in the block “IPA Extensions”. It belongs to the Latin script and is a Lowercase Letter.
In typst currently, inverted means rotated by 180 degrees, while here it is used in the sense of vertical mirroring
We use .rev instead right?
ah
actually I think the turned L is literally the only one that has a non-sans serif version @storm whale
why do you have to be so damn inconsistent unicode
In the days of printing with metal type sorts, it was common to rotate letters and digits 180° to create new symbols. This was a cheap way to extend the alphabet that didn't require purchasing or cutting custom sorts. The method was used for example with the Palaeotype alphabet, the International Phonetic Alphabet, the Fraser script, and for so...
the whole situation is a shitshow
there's a whole duplicated region for "fraser script" too
I gave up for now
There actually is a system behind TeX's decision of var vs not-var: The variant letters are those which you can draw without lifting the pen (or those which look more handwritten in the case of σ,ς and ρ, ϱ)
I think that was a good choice, it's annoying in latex having to write var to get the better looking and less confusing symbols (as the others can be confused with in and emptyset)
I cannot merge https://github.com/typst/codex/pull/70 and https://github.com/typst/codex/pull/81 due to lacking reviews by the way. I guess we should hold off on the second one even with the review from Laurenz, since it hasn't been a week?
We have a new naming problem, which fits well with codex I think: Numbering systems. We're considering to use the existing CSS counter style names, but they are a bit inconsistent. For example, I find it a bit odd that CSS spells it "greek-lower-modern", but "lower-roman" (lower once before and once after the writing system). There is a bit of preexisting discussion in https://github.com/typst/typst/pull/5622 and new work in https://github.com/typst/typst/pull/6379.
We were also discussing, in the case of adding many more numbering systems, to move them into a separate crate. Perhaps this crate could be codex! Essentially (already with the math styling), we could expand codex from just naming symbols to more generally supporting Unicode- and internationalization-related efforts in the Typst ecosystem.
Thoughts?
I think I agree with categorising this under Codex—we're kinda like a Working Group/Task Force for i18n in Typst ;). I think it could be easier to organise everyone's thoughts if we opened a new document under the Codex organisation where we can look at all the numbering systems at large
This could be a great opportunity to expand the Codex Goals document, which for now is not very informative regarding the goals of Codex to say the least
Yes! I revoked the share link now because a public write share link is a bit dangerous. Could you edit your message to the new read share link? I'm happy to grant access to other trusted people that aren't yet external project members. The existing people that clicked the write link (i.e. you and Emily) still have access because they became external project members.
Done
This reminds me of this half baked RFC/discussion I've had floating around since Oct 2023 😛
Dedicated template syntax
The what
I believe Typst would benefit from dedicated syntax to create templates. These would be objects that generate content. Where today standard library APIs have to take in crude function parameters (like numberings beyond strings) or lack in flexibility (like supplements)
The why
A very large portion of all use-cases of advanced numberings, outline entries, reference syntax, and re-usable formatting in general would greatly benefit from such a capability being convenient and consistent across the ecosystem. This can also apply to .display() interfaces such as state's, counter's
References:
- Issues: #2485
- Discussions: #2479, #2353, #2243
The how
This is why this discussion exists of course! Please share your opinions and ideas!
Let's start by considering some prior art; these are mostly programming languages with f-string adjacent syntax:
- "string {interpolation:format}": Rust, Python, C++
- "string sigil{interpolation}": JavaScript, Java
- "string sigilinterpolation": Perl, Most shells
These are not used much outside of string interpolation in the aforementioned works, but their potential could extend to arbitrary content in both normal and maths mode
This is very incomplete and not fully thought through but make of it what you will
Alright I wrote some stuff there now it's somebody else's turn >:(
but if we do decide to adopt numberings I think a design doc just for them will be of great use
How viable would it be to load codex dynamically? Breaking changes wouldn't be as much of a problem if people could fix a document to a particular version of codex
imo breaking changes have been few enough to not warrant further consideration. We're still in 0.x after all.
Breaking changes have been few because we've been avoiding breaking changes so far
Yes it's 0.x, but people do in fact use it. Having symbols stop working or change meaning in a document isn't ideal
As long as Typst is in beta, I prefer a breaking change to an ecosystem split
Also, we might as well invest time into making modifiers deprecable instead
That would be very useful in any case
Yes. I don't remember why I did not do that in the first place. It probably required changing too many things. But in retrospect this was a bad decision.
I will try to work on that when the existing meta PRs are merged (I don't want to deal with conflicts)
hasn't Typst already lost the beta labelling? Or are you just referring to 0.x
Yeah I was just referring to 0.x
I guess I should put this here as well https://typst.app/project/rFiEdiAaEJUIQn6Nuu38Bx
I created a new version https://typst.app/project/rlonk5duAqHoQqTNYpkygm which includes two additional more obscure blocks
also useful package @midnight tangle
thanks
there's also this really weird one https://en.wikipedia.org/wiki/Combining_Half_Marks
Combining Half Marks is a Unicode block containing diacritical combining characters for spanning multiple characters.
Re #46 (comment): Anyone have any good ideas for alternatives to _unchecked?
For insert_unchecked, you could maybe use insert_dotted?
I think we need to find a different solution for PRs with breaking changes as I don't have time to think myself into each one. Perhaps we should just ensure we're more sure, e.g. with 3 or 4 approvals?
A longer grace period could also work
Should also be less problematic as long as everything is deprecated properly. Worst case scenario there's always "undeprecation"...
Is there a way to ensure that on GitHub (forcing PRs with the breaking tag to have more approvals)?
It would be nice I think, just to prevent accidental merging
Also, would that include meta PRs as well, or do you still want to review those personally?
I don't think it's possible, but not sure
The meta PR volume is low enough to keep reviewing
I'll add that for most of the PRs with breaking changes it's only a small portion of the PR that is breaking. We could give a short description of the breaking change. You wouldn't need to look at the rest of it
But I do understand if you'd like to avoid non-meta PRs altogether
@midnight tangle should we just merge https://github.com/typst/codex/pull/56 then? it's been open for almost two months
I just approved the PR so that we can respect the new idea of requiring more than two approvals for breaking PRs. It's been open for quite a while (two months!?), so let's merge it!
how can I see a list of maintainers?
I know of you, mkorje and t0mstone
@heady fulcrum maybe?
on it
ah
ooops
I thought I was being asked to review it LOL
but yeah I'm also a maintainer
we already merged that one, but there are some other lingering ones. I'm just wondering who I can request reviews from
(even if I'm not that active...)
no problem
if you have anything specific, feel free to ping me
I'll just request reviews, but don't feel obligated to do all of them. It's just that I can't figure out a way to see who actually has maintainer status atm
https://github.com/typst/codex/pull/70 should be fine to merge as long as I get one approval, since laurenz has already signed off on it
is the deprecation note in the pr correct?
in particular, Laurenz mentions a t->top change, but the PR (including title) is about top->t, right?
either way is fine by me (iirc this would make things more consistent, right?), just wanna be sure which one was decided on
It's top -> t, I think he just wrote the wrong thing in his comment
yes, every other symbol uses t
@pastel violet maybe as well?
the full list: Malo, Enivex, mkorje, T0mstone, emilylime
regarding https://github.com/typst/codex/pull/59 , would rupee.generic be better than rupee.general?
I'm also wondering if it would be better to indeed include ¤ as currency after all, and make ₡ currency.colon instead of colon.currency.
though I think it would be more easily discoverable as colon.currency
I'm for the inclusion of currency, but not currency.colon, because that'd be inconsistent with all the other currencies not being under currency.
wdym
none of the currencies are under currency
Exactly, but with your proposed change, one of them would be
well it's going to be inconsistent regardless, since none of the other currencies are under colon either :p
True ig, but it feels less inconsistent to have colon just have a name collision. Otherwise I can imagine a new user being confused why the currencies are not under currency.
anyway if we're not moving colon then I see no reason to include currency. I've yet to see evidence that it's actually used.
I've used it in the documentation of one of my not-yet-public crates😛
but that's probably not the kind of evidence you're looking for
used for what?
It was for examples of monetary formatting. An excerpt:
/// The sign string comes before everything else.
///
/// Examples: `±¤1`, `±1¤`
generic feels slightly better. Idk if we have a convention for this or something
we don't yet
Maybe we should make a document that keeps track of some established conventions that we're already using...🤔
(in the codex repo, to be clear)
I think this is too fluid to be clearly established in a document. Kinda like how legal precedents may be compiled in books, but you can't just have a list with clear cut boundaries
Idk maybe this comparison is stupid
I'm talking about conventions, not clearly established rules. It would be more of a list of guidelines and a collection of precedents, so you don't have to keep track of that in your head, because idk about you, but I certainly can't remember every modifier that exists on my own and I don't want to have to read the entirety of sym.txt to check for inconsistencies every time there's a new change.
@midnight tangle I've done some thinking about the above/below and over/under thing
it's very relevant for bottom accents
What I ended up on is that under would be preferable for the reason that the short form b is already taken for bottom, while u is free.
that is, no modifier would default to above, while u would mean under
so macron and macron.u etc, and the same would apply to the operator decorations
does that make sense?
I think bottom accents and the dot above/below equal sign can use different modifiers
But I haven't thought about it a lot
Also, I'm not sure I love .u. Very short modifiers make sense for common symbols and when there is no ambiguity. .t, .b, .r, and .l are good because I knew what they meant when I saw them for the first time, as a user. .u, I fear not so much
But I may be wrong
I think .u would be very clear for diacritics at least. There aren't that many modifiers
yeah maybe
Although I guess why not use .b in this case?
there are some weird symbols where you both need to refer to a direction and position
Ok
U+0356 is the Unicode hex value of the character Combining Right Arrowhead and Up Arrowhead Below.
though this is obviously a worst case scenario...
that would be in unicode once we have tilde.u 🙂
Is it used in a language, or meant for use in math?
Yeah but sadly NewCM Math does not render the combination of subset.sq and combining tilde below correctly (even in text mode)
I'm not sure what the intended use is, but none of the math fonts I tested have it
Then I think we may want to not worry about it for a while
the distinction between math and text isn't that easy. there are many more diacritics available in math fonts than those that have been assigned a math class
Yeah but I mean I don't think anyone uses it in math, and math fonts don't support it, so why give it a name?
project is private
(note the names are just a rough draft, I just wanted to demonstrate font support)
oops
this is the correct link
codex is broader than just math though, no?
Yes, but the symbols we add should have a use. If this is only used in a specific script, users of that script already have a way to input it so they don't need us (it's not like they are gonna write every word by combining Codex symbols). If this is used in the IPA, then I guess there is an argument to be made, but for now we have left IPA symbols as a future work
I think this falls more in the IPA camp, I can't imagine any natural language using this
Who decides whether something has a use though? I don't have encyclopedic knowledge of every field, and it's not like unicode is very good at actually describing what symbols are used for
obviously some things are more pressing to add than others, but I still think we should have these kinds of things in mind when naming so we don't run into trouble later
When we can't determine how a given symbol is used, and its name is not trivial (i.e., we would have to spend some time to find a good name), we can just wait for someone to ask for us to add it and provide a use case
that's fair
my initial goal was adding names for everything in strix two math, which is a more manageable subset
at least almost everything
Cardinal directions are also available.
Would it make sense to use l,r,t,b for position, but n,s,e,w for direction?
Or l,r,u,d
Or the reverse, meaning l,r,t,b / l,r,u,d for direction but n,s,e,w for position
I'm not sure which would incur more breaking changes
I think I need to see some concrete use cases
Right now, I have the feeling that the distinction between l/r/t/b and n/s/e/w would be too subtle and feel like an inconsistency to non-power users
I was just thinking out loud. We should probably take our time to figure it out before adding additional symbols where this distinction is important
By the way, I'm not sure if there are any modifiers removed between 0.13.1 and when we settled on a method of deprecation, but we should probably retroactively add them if there are
yes, at least https://github.com/typst/codex/pull/63
naming would be so much simpler and more consistent if modifiers were order dependent..
For those interested, interesting discussion is happening in https://github.com/typst/codex/pull/89
Input would be welcome I think
I didn't mean to kick the hornets nest, but I think it's helpful to have fully fleshed out proposals to look at
What are you referring to?
Maybe, but as a user, I found it really good that modifiers are not ordered when I discovered Typst: you can just say what you want on your symbol, and don't have to remember in what order to specify things
Yeah honestly for a while I did not make PRs because I wanted to reach a consensus before, but your recent PRs have shown that consensus is easier to reach when we have actual changes to discuss in the form of a PR
the main drawback is that order is extremely useful to disambiguate
yeah
the only way we have to remedy that right now are submodules, but that's only for the top modifier (or can you have nested submodules?)
you can
But I don't think turning every symbol into a module would be good. Submodules are great for what we use them for currently I think (i.e., grouping multiple closely related distinct symbols)
I'm just thinking that it could be extremely useful to have the opportunity to use order to disambiguate in a few instances,
take ⪋ and ⪑
If we allowed the names lt.equiv.gt and lt.gt.equiv we wouldn't have to bend over backwards to come up with a solution
Maybe a solution could be to allow modifiers to be ordered, but if only a single order exists for a set of modifiers, then the variant can be accessed in any order
yes that was my idea essentially
This would also open up the possibility of repeated modifiers.
It would make things more weird and opaque for the user I guess, but I don't think this is very important. I would tend to think users don't care about exactly how symbols work. When they want a symbol, they search for its name, and if the name makes sense they will remember it
I'd argue that while it may be a bit weird in some circumstances, it will also be less weird in others
Like you said, I don't think most users even think about this
And if names tend to be consistent with each other, AND the variant resolution is permissive enough, they will be able to guess new names correctly, or fix it if they are wrong (e.g., I want ⪋ and I try lt.gt.eq, it displays ⪑, I can guess what to change to get what I want)
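The resolution rule discussed above (modifiers are ordered, but any order is accepted when only one ordering of a given set is declared) could look roughly like the Python sketch below. This is an illustration only, not how codex or Typst resolve variants today: `build_table` and `resolve` are hypothetical helpers, the `lt` variants are just the names floated in this thread, and matching on a subset of a variant's modifiers is left out.

```python
def build_table(variants):
    """Group variants by their modifiers, ignoring order.

    `variants` is a list of (modifiers, symbol) pairs, where `modifiers`
    is a tuple in the order in which the variant was declared.
    """
    table = {}
    for modifiers, symbol in variants:
        # Sorting gives an order-independent key that still keeps repeats,
        # so a name like eq.eq.eq would get its own entry.
        key = tuple(sorted(modifiers))
        table.setdefault(key, []).append((tuple(modifiers), symbol))
    return table


def resolve(table, requested):
    """Return the symbol for the requested modifiers, or None."""
    candidates = table.get(tuple(sorted(requested)), [])
    if len(candidates) == 1:
        # Only one ordering was declared for this set: accept any order.
        return candidates[0][1]
    # Several orderings were declared: the requested order must match exactly.
    for modifiers, symbol in candidates:
        if modifiers == tuple(requested):
            return symbol
    return None


# Hypothetical variants of `lt`, for illustration only.
LT = build_table([
    (("eq",), "≤"),
    (("eq", "not"), "≰"),
    (("equiv", "gt"), "⪋"),
    (("gt", "equiv"), "⪑"),
])
```

With this table, `resolve(LT, ("not", "eq"))` falls back to the single declared ordering, while `("equiv", "gt")` and `("gt", "equiv")` stay distinct, which is the disambiguation wanted for ⪋ and ⪑.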
your famed ⩶ can be eq.eq.eq (only half joking)
I used this "trick" to make the naming in https://github.com/typst/codex/pull/75 possible, but with a change to how modifiers work it wouldn't need to be a submodule (though I think in this instance it perhaps makes sense that it is)
Yeah in this situation I would argue using a submodule makes sense
At least it's not non-sensical
How hard would it be to make sym.txt writable in a hierarchical way (independent of whether we change how modifiers actually work)?
As in
I think
triangle
  .stroked
    .t △
    .b ▽
    .r ▷
    .l ◁
    .bl ◺
    .br ◿
    .tl ◸
    .tr ◹
    .small
      .t ▵
      .b ▿
      .r ▹
      .l ◃
    .rounded 🛆
    .nested ⟁
    .dot ◬
  .filled
    .t ▲
    .b ▼
    .r ▶
    .l ◀
    .bl ◣
    .br ◢
    .tl ◤
    .tr ◥
    .small
      .t ▴
      .b ▾
      .r ▸
      .l ◂
is more maintainable than
triangle
.stroked.t △
.stroked.b ▽
.stroked.r ▷
.stroked.l ◁
.stroked.bl ◺
.stroked.br ◿
.stroked.tl ◸
.stroked.tr ◹
.stroked.small.t ▵
.stroked.small.b ▿
.stroked.small.r ▹
.stroked.small.l ◃
.stroked.rounded 🛆
.stroked.nested ⟁
.stroked.dot ◬
.filled.t ▲
.filled.b ▼
.filled.r ▶
.filled.l ◀
.filled.bl ◣
.filled.br ◢
.filled.tl ◤
.filled.tr ◥
.filled.small.t ▴
.filled.small.b ▾
.filled.small.r ▸
.filled.small.l ◂
Probably not too hard
Although for now I think the parser could (but doesn't) not rely on whitespace
It doesn't have to be that specific syntax, it was just for illustration purposes
The roles are swapped, now I am the one to advocate for less whitespace dependence
I think the syntax you proposed makes sense
It is also more readable than having the same thing repeated everywhere
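To get a feel for how little parsing the hierarchical form would need, here is a rough Python sketch that expands a nested list into the flat dotted names. It assumes nesting is expressed by indentation with two spaces per level, and it ignores everything else the real sym.txt format may contain; `flatten` and `EXAMPLE` are made-up names for illustration, not part of codex.

```python
def flatten(source, indent=2):
    """Expand an indentation-nested variant list into flat dotted names."""
    flat = []
    stack = []  # name parts from the symbol name down to the current level
    for raw in source.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        # Keep only the parts above this level, then add this line's name.
        stack = stack[:depth] + [parts[0].lstrip(".")]
        if len(parts) > 1:
            # A line that carries a symbol is a complete variant.
            flat.append((".".join(stack), parts[1]))
    return flat


EXAMPLE = "\n".join([
    "triangle",
    "  .stroked",
    "    .t △",
    "    .small",
    "      .t ▵",
    "    .rounded 🛆",
    "  .filled",
    "    .t ▲",
])
```

Calling `flatten(EXAMPLE)` yields pairs such as `("triangle.stroked.small.t", "▵")`, i.e. the same names the flat listing spells out by hand.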
yeah this is my problem, it gets very distracting
if we don't want to depend on whitespace, it could also be something like
triangle
.stroked
..t
(etc)
i.e. one additional . per level
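The dot-per-level variant can be read without looking at whitespace at all, since the number of leading dots gives the nesting level. A minimal Python sketch, with `flatten_dotted` and `DOTTED` as hypothetical names and the syntax assumed to be exactly as outlined above:

```python
def flatten_dotted(source):
    """Expand a dot-depth variant list into flat dotted names."""
    flat = []
    stack = []  # name parts from the symbol name down to the current level
    for raw in source.splitlines():
        parts = raw.split()  # leading whitespace is ignored entirely
        if not parts:
            continue
        name = parts[0].lstrip(".")
        depth = len(parts[0]) - len(name)  # leading dots give the level
        stack = stack[:depth] + [name]
        if len(parts) > 1:
            flat.append((".".join(stack), parts[1]))
    return flat


DOTTED = "\n".join([
    "triangle",
    ".stroked",
    "..t △",
    "..small",
    "...t ▵",
    ".filled",
    "..t ▲",
])
```

Indenting the lines for readability would then be optional rather than significant.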
I don't think whitespace dependence is a problem in this case, especially for an internal syntax that we are the only ones to use
I don't particularly care either way, just pointing out options
We can leave out bullet.hole for now if you think it's problematic
@midnight tangle
I don't necessarily think it is problematic, but I'm not sure exactly what it is meant to be used for
https://catawbaindian.net/storage/app/media/Generator - Project Subapplication Supporting Documentation Checklist.pdf I found a document in the wild
Interesting. Then if it is used as a bullet point in lists, let's add it as well
btw submodules are not yet supported by the docs. how do we imagine them being shown?
One option is to just list symbols from nested modules the same way other symbols are listed (prefixing the name with the module of course)
I don't think the fact that it is a submodule needs to be user facing at all, except perhaps an indication that it can be imported?
This is a really interesting approach. I hadn't even considered that this is possible: https://github.com/typst/typst/pull/6448
This is interesting, but I'm not sure this use of modules is "right". Extending the functionalities of symbols seems like a cleaner option to me for Codex.
These aren't actually symbols though
You mean, Codex modules?
If so, indeed. I'm only saying that if we want to add symbols where the order of modifiers matters, we shouldn't do it through modules but rather by extending the functionalities of symbols
I'm confused. I was talking about the negative spaces
I think what Laurenz was saying is that a similar technique (i.e., having modules with content) could be used in Codex
I was under the impression that it was unconnected to the earlier conversation, but I guess only Laurenz knows
I don't think it was connected to earlier discussions in this channel
Oh I understood what you meant now
Anyway my initial point was that codex exclusively deals with actual Unicode symbols no? Which negative spaces are not
Indeed, spaces in the math module, and in particular the proposed negative spaces, do not use Unicode characters
I think I'm with you. In fact, I've been wondering whether it was a mistake to use modules at all in codex (e.g. for the gender stuff) and whether we should instead have added support for symbols without a default variant
The fact that I'll have to rerig the docs generator to somehow hide this from users is a symptom of the inconsistency
I think the way we use modules currently is fine, and I would expect some packages to use modules in similar ways
I dislike that it feels just like a symbol without being one
#73 defines a sym.chess module, which groups different symbols without itself being a symbol (or looking like one)
That feels different to me than the gender one for some reason
I agree that this one feels more strongly like it should be a submodule, but in the end the conclusion is that there are scenarios where submodules make sense (responding to #1277628305142452306 message)
Maybe it was a mistake to create so many PRs at the same time. Should I close the ones that are currently blocked?
Blocked PRs are kind of annoying because they clutter the PR page. I think you can close those that are blocked on multi-character symbols for now (or just remove the part of the PR that blocks it)
The bullet one is fine to keep open I would say
Also, IIRC Rename circle modifier for symbols that are circled to "o" #62 is pretty much ready to be merged and only missing a review. If someone has some time this week-end.
There are some more that are pretty much ready to be merged in addition to that one. https://github.com/typst/codex/pull/59, https://github.com/typst/codex/pull/78, https://github.com/typst/codex/pull/81, https://github.com/typst/codex/pull/90
doesn't this add deprecations so @past python would have to approve anyway?
ah, must've missed that
I can't find the message now, but it's in this thread somewhere
The current thing is:
- Non-breaking symbol changes: 2 community reviews
- Breaking symbol changes: 3 community reviews
- Meta changes: review from Laurenz
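Written down as a check, the three rules above could look like the Python sketch below. The names `REQUIRED_COMMUNITY_REVIEWS` and `reviews_satisfied` are made up for illustration, and the informal one-week wait mentioned elsewhere in this thread is not modeled.

```python
# Community approvals needed per kind of symbol change, as listed above.
REQUIRED_COMMUNITY_REVIEWS = {"non-breaking": 2, "breaking": 3}


def reviews_satisfied(kind, community_approvals=0, laurenz_approved=False):
    """Return True if a PR of the given kind has the reviews it needs."""
    if kind == "meta":
        # Meta changes need a review from Laurenz, whatever else they have.
        return laurenz_approved
    return community_approvals >= REQUIRED_COMMUNITY_REVIEWS[kind]
```

For example, a breaking PR with two community approvals would still be waiting for a third.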
I guess https://github.com/typst/codex/pull/90 doesn't need more reviews, I was just waiting for a week to pass. Should be fine now
they were already deprecated, just not documented
Yes it's non-breaking
I'm trying to remember which emoji makes https://github.com/typst/codex/pull/79 reliant on multi character
Good luck
it's the addition of "envelope"
I believe
so the question is if I just remove the top level symbol for now, or just close it
I'll just close it, there's no consensus anyway
@lapis moth regarding #51 (comment), I tend to think minimal modifier sets are less of a priority than multi-character symbols.
Since the codebase is quite small and all meta PRs end up conflicting with each other, it's probably better to do them one at a time. So if you want to work on something, implementing multi-character symbols would be a better contribution imo.
Of course this is just my opinion so if you are more motivated by minimal modifier sets, feel free to work on that either way.
For now, multi-character symbols would not be laid out properly in math AFAIK, but they could already be useful for emojis and other symbols that are not primarily intended for use in math.
There's also the question of what we discussed earlier, about order dependence.