#Rich Text - Uncreatively Named Text Handling Prototype Tool
1907 messages Β· Page 2 of 2 (latest)
These are the only ones I have:
I just keep 2 builds, debug with the debug and run benchmarks/profiling with release, luckily I don't use ue so I don't have any cases where debug is literally unusable
it should be a drop down list in the toolbar somewhere already, out of the box
random thing
someone mentioned obsidian here a while ago, so I tried it out and it's honestly pretty great
it's a solid markdown editor for starters but it also lets me keep a random ass pile of notes in pretty much the same way you'd want from talking to yourself (or others) on discord
would definitely recommend
it has ```language``` markdown support natively, instantly great
yeah obsidian is great
are these all root tasks π
(i.e. is this necessary for some other project or are these tasks just self contained so to say)
no they're all completely unrelated
they're just the main projects I find fun atm, they do sort of feed off each other in abstract ways
in the sense that my raytracer uses my vulkan abstraction and I plan to use this as my general purpose text handling lib in the future
Is notepad really able to render all that
That's honestly impressive
I never considered it as anything but an ASCII editor lol
what I ended up doing was using a normal distribution to generate word lengths from randomly picked unicode blocks (out of a subset of unicode blocks I know I have font info for), probably the best I can do without using a dictionary, which is too much of a pain in the ass
yeah it clearly has fallback font support too, of some extent
but I think hardcoded or using some windows registry default, I have my font set to noto sans but it wants to fallback to Segoe UI or its list of fallbacks when it wants
also this is laggy as fuck
this is 1MiB (1024 * 1024) bytes of UTF8 text
I just tried inserting some emoji (which, note to self, I need to add some block ranges for so I can test hitting that in my font cache too)
and it took a few seconds
n++ also struggles with it
resizing notepad is so slow too lol
generated randomized font size, weight, and style runs as well, and a debug html file dump

do you also support that wonky ass unicode overflow thingymajigg where the characters go crazy?
zalgo text? yeah
yep π
the thread image is a pic of zalgo text rendered with my thing
It's using the system's/window's text drawing APIs which are pretty bad but can render anything.
windows text rendering is not bad at all though
cleartype is like the best thing around
i find windows' text rendering superior
I'm not saying it's bad in quality, but it's very slow.
and Notepad does not do texture atlases or any kind of basic optimization.
how is that relevant?
with a lot of text on the screen you can see the double buffering from when it tries to redraw on the WM_PAINT message
I was giving a bit more info on this.
@molten galleon BTW, how do you handle texture atlases? Is it per font?
Just wondering about different methods
nope lazily allocated like I mentioned
which is the method roblox ships to all platforms with
What do you mean by that? I'm not too familiar with graphics terminology yet.
I start with an empty atlas, I request a glyph from my atlas, if it doesn't have it then it adds it
lazily [x] (just as general programming terminology) just means "only do it when someone asks for the result of it"
Ok, so that's the same method I do right now, but I'm confused on how to do multiple fonts and multiple atlases when the first atlas is completely filled up.
https://stackoverflow.com/questions/3980035/performance-of-win32-memory-mapped-files-vs-crt-fopen-fread
https://manybutfinite.com/post/page-cache-the-affair-between-memory-and-files/
Ended up doing a little digging into memory mapped files on wingdings because of the discussion the other day, this is some good info
I'm realizing that while memory mapping isn't the optimal thing for loading a file for a single read, like for a glTF or an image, it might be a lot better than trying to handroll some kind of LRU cache for font file memory, for a whole host of reasons:
- an LRU cache like MuPDF's works on variably sized data and just holds whole handles to file memory, its eviction scheme isn't actually true LRU but it tries to scavenge for objects of closer size to the new memory you need, more like an allocator
- evicting a file whole means that you might have to eat the cost of reading a whole file into memory multiple times during the course of a single text layouting operation
- font files have disproportionately hotter areas (the certain charmaps/tables you want), and trying to serve that need would basically cause you to reinvent the OS's paging, virtual memory, and file caching
preliminary benchmarks (before any fancy alterations to font loading) show I can layout 1MiB of text in about 1.8 seconds on release
could probably shave some time here and there if I wanted to go extra hard but the vast, vast, majority of the time is spent on harfbuzz actually computing the shaping
so I'd say my main goal now would be to just try to reduce overall memory usage without causing too massive of a regression, and make this thread safe (since rn you can't use the font cache from multiple threads), since that's about all you can do to improve runtime
but really, it seems the main way to speed up layout is to tailor it to how you're actually using the text
that function naming π
lol which ones? most of the functions in that profile are from hb/FT
yeah underscore mixed with capitalized snek
PascalCase and lower_case_snek
its all in there : )
oh yeah lol, just seemed right to name it like that since it narrows the benchmarks by category
so you can see how BM_Layout_MultiFont_LineBreak would fit in
(which is currently disabled due to epic font cache bugs)
yeah
fakk how do you make a multi-line-graph in excel
some random tut tells me my data needs to be kinda arranged like this
but how do I do that, if a pivot table didn't
fuckkkk how tf do I use excel
One time I tried making a macro for something I was frequently doing in Excel. I never succeeded and I still haven't mentally recovered from it
so I got it to treat string length as numbers, that's a start, it still thinks its all 1 dataset though
I just want it to show multiple lines for each test
ack
rather, 1 line for each test
I need to figure out how to squeeze it out of this because this is how google benchmark dumps the data to me
what a pain
you'd think pivot tables would be the way you aggregate it by run name
ok I've given up on getting a nice graph view for now, excel has won for now
https://learn.microsoft.com/en-us/globalization/fonts-layout/fonts#font-linking
surprised I didn't come across this before
and linux's fontconfig
it looks like I've been wandering down essentially the right path, minus the specifics and complexity of my fallback/linking scheme
got my FontRegistry API looking like how I'd want it, still some internal stuff to smooth over but things are looking good
looks like there was basically no performance change before/after either, including changing from loading font files into memory to just mapping them
but now building text layouts is safely multithreadable
i found something
which might be something for you
when you decide to bin vulkan and come back to the holy opengl land
or want to use gl for some dummy project
NV_path_rendering π
requires gl1.1 at least, and you can apparently feed novideo cards SVG and postscript path syntax
const char *svgPathString =
// star
"M100,180 L40,10 L190,120 L10,120 L160,10 z"
// heart
"M300 300 C 100 400,100 200,300 100,500 200,500 400,300 300Z";
glPathStringNV(pathObj, GL_PATH_FORMAT_SVG_NV,
(GLsizei)strlen(svgPathString), svgPathString);
probably compiles that into some draw list of sorts under the hood
const char *psPathString =
// star
"100 180 moveto"
" 40 10 lineto 190 120 lineto 10 120 lineto 160 10 lineto closepath"
// heart
" 300 300 moveto"
" 100 400 100 200 300 100 curveto"
" 500 200 500 400 300 300 curveto closepath";
glPathStringNV(pathObj, GL_PATH_FORMAT_PS_NV,
(GLsizei)strlen(psPathString), psPathString);
static const GLubyte pathCommands[10] =
{ GL_MOVE_TO_NV, GL_LINE_TO_NV, GL_LINE_TO_NV, GL_LINE_TO_NV,
GL_LINE_TO_NV, GL_CLOSE_PATH_NV,
'M', 'C', 'C', 'Z' }; // character aliases
static const GLshort pathCoords[12][2] =
{ {100, 180}, {40, 10}, {190, 120}, {10, 120}, {160, 10},
{300,300}, {100,400}, {100,200}, {300,100},
{500,200}, {500,400}, {300,300} };
glPathCommandsNV(pathObj, 10, pathCommands, 24, GL_SHORT, pathCoords);
turtle grafics en novideo
ideal to draw glyphs π
wow lmao
kinda makes me wonder too if this is/was a vector for exploits
because you have to think that at least at some point some browser might've been pasting raw <svg> strings into this
haha that was my first thought too when i saw that - forgot to post the sauce https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/NV/NV_path_rendering.txt
don't browsers just use bigboi graphics apis like Skia
don't think they would need anything like this in the first place prob
yes
ofc not, they use whatever libs are there already
I saw this during my research but it's kinda useless because only Nvidia GPUs support it lmao
ye
it might not be useless though
if your vector grafics editor you are making can detect nvidia, you could potentially use it and have hardware accellerated vetor grafics
yeah but if your other option doesn't use cpu prerendered in the first place, it's probably too dogshit to exist
and I don't think you can use it in place of a CPU prerender
nah this is cool, but for 99% of people that are trying to make their games accessible at all, i don't think this would be an option π
i wonder if you could build a resizing algo that efficiently resizes based on the vectors instead of having freetype generate them for you again
I'd wanna figure out how the driver implements it
good luck
I'm guessing there's a 'standard' set of techniques to rasterizing vectors tho
not like the whole linux community spends decades trying to figure that shit out and still fails
obviously
glCreateLists, glCompileLists, glDrawLists π
stuff like this is probably bundled as special shaders the driver people wrote
wait is that actually a thing
yes, fixed function stuff from before you were born probably
kinda like the BVH builder shaders that pixel writes
I bet mesa has an implementation of this ext
i looked it up and nothing showed up
π€
it's mentioned here but i can't find docs for it
ah its glNewList(... GL_COMPILE)/glCallList
my DSA brain turned it into glCreate...
its been a while but i have written code using those
23 years
it renders subsequent draw commands?
is that kind of like a vulkan command buffer?
iirc graphite posted a while back how he figured out how to do some ptr offset hacks into VkCommandBuffer on nvidia and write in NV device commands directly
oh ok
fun cursed stuff
oh like he just looked at the VkCommandBuffer internal code and got a pointer back from it? lol
idk how he figured it out
i'm not experienced but i don't there there are too many diverse ways to go about that
yeh but thats not necessary probably
ah you said it, fun cursed stuff, indeed
yeah not at all, its just a neat hack, and a look into how commandbuffers are actually implemented on nv drivers
a list of numbers with other numbers in between π
hehe reminds me of my super naive commandlist implementation https://github.com/deccer/Xacor/blob/master/Xacor.Graphics.Api.D3D11/D3D11CommandList.cs
have you ever made cheats for video games
same thing
haven't done anything more advanced than modifying values in cheat engine
well those values you were modifying were inside a struct of some sort
you figure out some amount of the layout of that struct somehow
VkCommandBuffer is always a real pointer, basically
and so you do ((nv_cmd_buffer*)myVkCb)->things
very simple
but also if graphite learned this by poking at nv driver with a disassembler, the resulting work is potentially not legally shippable
this also won't work in presence of any layers that implement VkCommandBuffer
This... is text! Let's figure out how to draw it.
Starring: BΓ©zier curves and (so many) floating point problems.
Source code: currently in early access on Patreon, but will be freely available on the 25th. If you'd like to support my work (and get early access to new projects) you can do so here:
https://www.patreon.com/SebastianLague
https://k...
pretty cool, I like how he goes from first principles and explores all the differentm ethods
sebastian can explain all the stuff he does so nicely
#wip message
noice
:frog_ancient:
I should redo my path rasterizer at some point, I'm not really sure I agree with lague's takeaways tbh
finding roots of even quadratics is a huge floating point pain
which he does demonstrate in the vids
in fact IIRC my solution to numeric robustness was to switch to iterative root finder
it was simply more robust against higher power coefficients being close to 0 than solving analytically
I was considering tessellating my stuff to linear segments
that stuff is very robust
or maybe I should do something else idk
and also I rather want path/vector textures
instead of just drawing text on-screen
though
that might be a meme use case
because there's no way to author that
and a much better solution for vector-looking memes on models is still SDF/threshold masking
because that's actually authorable in something like blender
NV_path_rendering my beloved π (jk)
wdym no way to author that, inkscape?
in blender
well I guess you could do that but idk it's not the most convenient looking thingy
I don't think you can use SVG as a texture in blender
oh I see what you mean, using svgs as texture maps for models
that'd be cool but you'd probably have to roll your own tooling via a plugin
that tooling would just be converting stuff to bitmap at like load time and it would suck a lot
I tried doing ktx2 images in blender that way once
Finally got around to removing all my optional git submodules and just pulling them if you have tests enabled and don't have them in tree
now the library is officially usable and not clutter
don't love synthetic bold, but it works
need to properly override the advance lookup
synthetic italic isn't nearly as bad
regular is the standard font weight
that's the one that's being scaled
most of the time you only ever see regular and bold
and (non-synthetic) smallcaps
slowly working through the remaining feature list
oh boy I am getting to the fun parts, so far I've been doing smallcaps, superscript, and subscript glyph substitutions by passing the relevant opentype font tags to the harfbuzz shaper, and it just does the substitution lookup from the gsub table if possible, but now I'm looking into how to synthesize them if they're not present and harfbuzz doesn't seem to provide too much for that
I think relying on opentype features is on the whole, the wrong way of going about it, and if I want synthetic substitution glyphs I'd have to do a pre-scan in the step where I generate font runs and have some kind of font instance for handling the substitutions via the hb nominal glyph function override
luckily this is a CSS feature https://developer.mozilla.org/en-US/docs/Web/CSS/font-synthesis so I am thinking the easiest way to figure out how to do this is to just make a debug build of chromium on linux, make a simple html file, and insert a debug break where they call harfbuzz stuff
if at all possible though I want to avoid the scenario where I need to parse the otf tables (or worse, write into them)
What is the input to this whole pipeline
Just font files?
Or is there some database that fonts only affect a subset of
there's font files, font family description files, and the formatting runs data for the target string
font family files describe the relationship between individual faces, which is mainly necessary for changing the weight, (style) italicness, and finding the correct sources for languages and symbols
and the formatting runs just describe the intervals of fonts, colors, italic, underline, stroke, etc. for the string, in my library I provide a generator that parses inline XML from your string for formatting, but your source could be anything, like syntax highlighting
LLVM ERROR: IO failure on output stream: No space left on device
fuck guess I'm not building chromium, only 450 files left
I could probably dig up an HDD and move the chromium dir there
in the absence of actually building chromium, I think I found where they do the smallcaps transformation
haven't traced entirely what goes on but it mimics what I assume I'd have to do, which is to split out smallcaps runs during my run segmentation stage
based on everything I've seen though, I think now is the time to start building custom harfbuzz font callbacks instead of relying on hb-ft.h
one of my original ideas was going to be to emit bitflags on the high bits of glyphs to signal if they need synthesis, I think it's actually ok since hb_codepoint_t is unsigned but truetype fonts max out at 16 bit glyph indices I think, but I am not confident enough in that to risk it, and it seems chromium doesn't, instead what they seem to do is split a string like Hello World to H ello W orld and mark those ello and orld ranges as handled by a font that always does a synthesis
though I'll have to see how to handle non-synthetic caps since from probing the hb buffer callback, it only does a glyph substitution if it actually has the smcp table and hits a possible substitution, so my get nominal glyph would have to look something like
if is OTF and has smcp table and smcp[chr]:
return glyphof(chr)
else
return glyphof(toupper(chr))
however I don't see any evidence of that in chromium, I might have to build it after all
been spending a few days fucking around with building chromium
whatever version I pulled from main and built (~9 hours) seems to crash on startup, with a mile deep stack trace I have no idea how to debug, so I had to get the git history and start changing tags (like an hour to pull and overnight to do the deltas)
switched to the latest tagged version and I'm rebuilding, only 37k files I need to rebuild
all to find out I probably already found all the references I need in the source code search, but I still want to have a working version to probe
I think I've placed too much trust in chromium, I finally did what I should have done to start and put together a simple test page to invoke synthetic smallcaps and subscript/superscript, and I realized that the reason I can't find these features in the source code is that chromium doesn't even support them
but my test page works in firefox, so time to start snooping through firefox instead
@molten galleon hello where do I license your path rasterizer
I haven't bothered looking through actual glyph rasterization methods yet
been too deep in font features
π
for now I just rasterize via freetype to an atlas, and have path rasterization for "later"
though this is my last major feature before I do that
so firefox does pretty much what I expected I'd need to do
for synthetic subscript and superscript it basically completely overrides the font settings and doesn't emit a subs/sups tag, just creates a copy of the base font with scaling and translation applied, there's no mixing of synthetic and whatever scant few letters are maybe provided as subscript/superscript, which I guess works out best overall
for smallcaps it looks like substitution is never done in hb_font_get_nominal_glyph, but as a part of the subfont-splitting stage of run generation, where it writes out a transformed string based on the actual capping rules (smcp, pcap, c2sc), gonna have to check what ICU provides me here since mozilla is full of a lot of handrolled unicode support, and of particular note is that Greek casing is apparently complex enough to warrant a lot of notes
gonna have to look further to see if they check tables/glyphs for availability, but I think due to how CSS font-synthesis tags work, deciding what gets synthesized is mainly user driven
so the chromium SmallCapsIterator just marks out runs of small-cappable chars by using u_hasBinaryProperty(UCHAR_CHANGES_WHEN_UPPERCASED) and tacking on u_getCombiningClass() to make sure combining marks join with their adjacent run
in the firefox impl they seem to use their custom unicode functions to lookup if a char has some kind of alternate capitalization mapping, but the main thing is that they run a toupper transformation using their custom uppercase transformer thing, which I think I'd have to do with ICU's transformers
Case mappings may produce strings of different length than the original.
For example, the German character U+00DF "Γ" small letter sharp s expands when uppercased to the sequence of two characters "SS". This also occurs where there is no precomposed character corresponding to a case mapping, such as with U+0149 "Ε" latin small letter n preceded by apostrophe.
ΗΉ
gonna have to test out how this applies to editable text, if it does at all, since this messes with any kind of logic that'd let me build a remapping between the u_strToUpper result and the original string
so true
unicode more like unicΓΆck
mozilla has a custom implementation of locale-aware str toupper that seems to obey turkish and greek casing rules but might just ignore funky stuff
hmm, so I actually don't know what kind of funky logic they use to actually decide whether to synthesize or not, because it seems to trip on the StraΓe, but if I force enable smallcap synthesis in the code I get der STRASSE just as I expected, and just as u_strToUpper gives me with lang="de", and based on how the selection cursor interacts with it, the 2 S's are treated logically as 1 character
I think I have enough info to start building my own solution, perhaps slightly better than firefox/chromium in these minor ways
I just want it to work well lmao
I like how smallcaps look, and from the frontend of the renderer they're just toupper(c) scaled down a few percent, so how hard could it be?
little did I know...
so one minor thing I failed to consider was that mozilla's TransformString does also fill a bitset with what character indices it performed a special case substitution on, and marks if the string needs a merge, which I can't do without having my own implementation
which means it's NIH time baby
as a bonus, mozilla and ICU's uppercase transformations are both UTF-16 native so rolling my own UTF-8 native one means I can continue to avoid having to convert there and back, which I've successfully avoided so far
finally piped everything through to get synthesis for subscript and superscript (here it's actually seamlessly combining with natural smallcaps, not synthetic yet)
so now I should have everything I need for synthetic smallcaps from the perspective of glyph transformations, just need to add the char substitutions
you could toss a little sneak peak in there as well "libFancyFont" with a superscript "tm" π
that does look neat
notepad flashes the cursor exactly 5 times after you make an input then holds solid
this is important information
Linux terminals do the same thing, it's to save having to redraw the screen since on an old terminal it would have been the only thing moving
Idk if there's any reason for it these days beyond historical
But I like it
honestly shocked that it more or less "just works"
just using a hardcoded substitution for now, but it just treats the 2 glyphs like 1 character, just like it's supposed to
although it does need to be improved a bit since clearly only the 2nd S glyph is being used for the bounds
but now synthetic smallcaps just work, all that's left is to implement the unicode case mapping algorithm
ah yes, parseable raw data where some comments are data and some are comments
both ICU and firefox seem to have some kind of mix of generated lookup tables and handwritten logic
oh boy, I found the icu::Edits class and icu::CaseMap::utf8ToUpper, no need to do any complex casemapping crap myself
I think now I've covered every core font styling feature I wanted to add
and only 2 layout features left for later, vertical text and tab alignment
so now it's time to fix some bugs and start on glyph rendering
what about angled text π
i do have some 45deg text in some of my excelsheeterinos hehe
like globally angled?
iirc there's a param in the GDI font api to set individual character spins
ye globally, as in the whole text in the cell is angled
that's on the UI layer to do, and the glyph renderer to make not look aliased as fugg
since you just get rect positions in the space of your parent textbox
I found what it was called again, escapement and orientation
orientation is the rotation of the baseline (so the angle of the string) and escapement is the rotation of each char
so to draw a string at an angle, you set orientation = escapement, otherwise your characters are slanted
and I assume the reason it does this is probably because your glyph in your atlas has to be rendered as (char = 'A', font = ..., escapement = 45) so that the final bitmap doesn't alias, which is maybe neat to have support for, but I think in a game engine you'd rather handle that with your glyph shaders
with an SDF or vector based glyph
hmm I feel sort of dumb, for a while I thought scaled fonts had some hidden internal scale factor, but I realize I knew how to calculate font metrics the whole time, given a ppem size (4/3 * pt size), it's just upem / ppem
@radiant spear feel free to pester DR about font isms too, he knows his shit most on this server i think π
gonna interrupt the regularly scheduled text content to post about sails so I don't clutter other people's threads further lol
that convo got me thinking about taking another crack at modeling sail force
and generally you have these 2 main forces
plus your air drag which is proportional to v_effectiveWind^2 in the direction of -v_effectiveWind
I realized just now you could probably get a good enough approximation of the lift using standard wing models and using the chord/sweep/twist/whatever other descriptors of standard wings you have to interpolate the models, and just account for the fact that when v_effectiveWind is parallel to t_sail, the force is 0 because the sail goes flat, and it flips if dot(v_effectiveWind, n_sail) < 0
this btw is why you see a peak of power somewhere around a broad reach, since you get the maximal combo of lift and reaction
@left junco if you were further curious about the model
there's some interesting behaviors that arise from it too, since in either case the power is delivered in the direction of the sail's normal, but your forward speed is proportional to dot(d_ship, n_sail), so with an "idealized hull" with no lateral drift, there is an optimal angle to maximize the force delivered in the forward direction, but that can change with the lift force since that's a function of effective wind, which depends on your speed, so there's a feedback loop
and you'd probably have to solve a differential equation to find its optimal stable state
I might have better luck figuring this out 'acquiring' a book on aircraft physics instead of trying to get one of the rare sailing oriented ones, dunno why I never thought to do that
back to text stuff, I've been coping with the fact that I can't seem to tune my MSDF generation by copying the windows menus for my debug vars
and it's a little compressed in the video so it takes away from it, but man my MSDF settings suck
I've been picking through the code in msdf-atlas-gen to match what he does
but I haven't quite got it
and I want to at least get a fair setting of MSDF before I compare it to Qin or direct curve rendering
so I tightened it up a tiny bit and double checked that my math matches msdf-atlas-gen, which it does, but MSDF is still fuzzy at large fonts and ugly at small fonts, I think this is down to how I'm rendering it perhaps
to render it I referenced this article which referenced in turn this github issue on msdfgen
MSDF is an efficient, modern way of drawing fonts at any size and this article looks at how you can implement it in your application!
which gave me the following code:
vec3 msdf = texture(u_texture, v_texCoord).rgb;
float sigDist = median(msdf.r, msdf.g, msdf.b);
float w = fwidth(sigDist);
float opacity = smoothstep(0.5 - w, 0.5 + w, sigDist);
outColor = vec4(v_color.rgb, v_color.a * opacity);
which is notably different from the example given on the msdfgen readme: https://github.com/Chlumsky/msdfgen?tab=readme-ov-file#using-a-multi-channel-distance-field
in that it lacks any kind of notion of my input scale or range
hmmmmmmmm
might need to open it as a generic GP question because I don't really know how antialiased SDF rendering works
hmmm I think I'm getting there except for this annoying artifact
a lot of experimentation and yet I think I had the best solution the whole time
I think small letters just suck with SDF fonts
float w = fwidth(sd);
float opacity = smoothstep(0.5 - w, 0.5 + w, sd);
seems to just work, and is especially good for upsampling
this article with its more primitive version seems to show it even works in 3D
wonder what I can even do for downsampling, I tried generating mipmaps (automatically) and that didn't really give me much
get high density display
if you have chunky pixels you need to do subpixel rendering which can get real headacheful with blending and stuff
but it's just evaluating the sdf at 3 points instead of 1
what filter did you use
I don't think pretending SDF is a signal and low pass filtering it is a good idea
but let me think for a few seconds
ah actually
right
whatever glGenerateTextureMipmaps uses, I thought about it for a second too and since SDFs discretely sample a lienar thing it seemed fine
it didn't even seem necessary actually
though it makes sense overall why small images suck, you completely miss the important samples that describe the edges
potentially
aliasing when letters are small is because you can't represent the edges or whatever
SDF can't help you there
well
you don't need normal images
you can just sprinkle some TAA at it
assuming you're drawing in 3D
also >using gl
cringe!
this is my shitty test program and not my main vulkan renderer so my goal is to just have as small of a renderer footprint as possible
ok
well anyway
in 3D I just wouldn't care and let TAA eat it
but if you do just switch to normal bitmap with full mip chain
well I guess there's also the sucky case of when you're doing a very anisotropic sample
yeah, it seems good enough overall for 3D but I guess the lesson here is you can't have MSDF be a 1 size fits all solution for 2D text and 3D
sdf just kinda sucks tbh
msdf too
msdf actually even worse because the algorithm is something crazy, slow and falls apart in some cases
yeah, I'm hoping the Qin paper that jasper mentioned might at least be a saner formulation of MSDF-like stuff
just rasterize paths as is
no need for this msdf bs
you can also modify your path rasterizer slightly and have it render sdf
that's what I will probably end up doing, but the Qin thing seems interesting to at least try, esp. if it ends up being better than MSDF
this way you can also generate bitmap representation at runtime + mips
since everyone seems to shill MSDF
and I don't think there's a publicly available implementation of it anywhere

I'll take a look once it downloads LMAO
the pdf I mean
but it's probably a nexttonothingburger
ah nvm found the pdf that opens quickly actually
just paste the link into scihub
ok I think I understand what it's doing
IIUC there's a list of lines per cell in the grid and they compute min or max { distances to all lines in the list }
or segments
something along these lines
hmm yea it's segments
section 5 goes in detail
yeah this is pretty weird and itneresting, this is the first time I actually inspected it closely
the way it was initiallly described to me was as some kind of much earlier proto-MSDF but with a saner distance function
the data representation seems a little funky for modern use though with the voronoi diagram and the indirection
why
there's some kind of per-texel indirection and linear search
so each glyph seems to point to an index in the line segment table and then it also looks at the next 4 neighbors
been rolling the Qin stuff around in my head for a bit and I'm starting to appreciate the elegance of it
FUCK
I have 2 projects, 1 that's just a tester for my lib and 1 that's my actual engine
the tester can link the library and run code, my real engine tells me it can't find libicudt
ld.lld: error: undefined symbol: icudt73_dat
>>> referenced by libicuuc.a(udata.cpp.obj):(.refptr.icudt73_dat)
and I even set up my tester to have a sub-library like the LibEngine in my main repo
this literally makes 0 sense
the raw compile commands even reference the same libs in the same order
are you on windows still?
yep
delete build/ (or whatever $(IntermediateDir) in vs perhaps?
with cmake?
I copied the libicudt.dll.a from my working project to the non-working project
yep
not even the actual dll, just the dll.a file
and that fixed the build
so for some reason, just my engine project is building a bad dll.a file
dll.a should just be the forward for the dll
very consistently
hmm do you have a rogue setting somewhere in your cmakelists perhaps
which switches BUILD_SHARED or something
ok I came back, made a new cmake build directory and config'd it, and that directory doesn't have this issue
and I did try doing ninja clean in my old one and rebuilding completely
so I guess my old build dir had some magic state in the cmake cache that told it to fuck my shit up
average saturday morning C++ session
I just made a build2 in case I had some kind of untracked junk in my build dir I wanted to copy over
which is essentially the same as deleting build/
ah
I don't think there's any particular setting that would've caused this, I don't have BUILD_SHARED_LIBS enabled in either, and my ICU setup is handrolled to build everything static except icudt.dll
so, asset manager = finally integrate LibRichText into my engine = set up a fresh text atlas using my new UploadPool = I don't have the desired subregion upload behavior
I think I understand why devsh has a synchronous mode for his upload pool
however there is no way I'm waiting on a semaphore to upload glyph subregions
it only kicks in when you run out of staging space
we don't actually wait per subregion/glyph
timeline semaphores are truly bae here
yeah that's the conclusion I ended up coming to
I added a "semi-synchronous" flush() operation to my upload pool where it will wait on intermediate submissions if it needs to wait to stage more upload jobs, but it won't wait on the last submission
so if there's only 1 submission required it doesn't wait at all
btw, if I issue my upload submissions w/ a final transition barrier and it's issued to the graphics queue, I don't have to wait on the semaphore in my main render submission because it's also on the same queue, right?
yes
epic
briefly derailing my own thread to come to what is perhaps the most obvious realization
displaying an ECS style level in editor is as simple as displaying the entities in their transform hierarchy in the explorer/tree view, and displaying the components in the properties pane, with each of the subheaders (Data, Selection, State, Text in the examples) just being representatives of the components
also >OpenTypeFeatures
I need to cook...
you can even use "Category" on the left side to "group" properties into categories or "components"
yeah, that's how it seems to be used here
but this is a pretty convenient way to hijack this scheme to display ECS instead of a polymorphic node tree
I came to this realization somewhat recently as well. Before, I had the component editor inline with the transform hierarchy and it was just a mess
It also helped to look at existing things that do something similar (blender, ue)
I didn't know blender was ECS-like in any way
it has pretty strongly typed nodes afaict
that's for my python scripts to do
behold the IDL for my UI classes
A phrase which my phone suggests to me in auto complete I might add lmao
By "components" I just mean editable properties
Idk how blender is structured internally
oh, yeah the explorer+properties dual-pane thing is pretty common once you start noticing it
I think it originates in VS actually, and it kinda makes sense that application developers would be most inspired by the tool they're using to make the app
"hmmm, I'm in the forms editor, but how should I make my application look?"
wpf makes this also super sexy
you design the control you want for a specific type, add it as a resource, and have a DataTemplateSelector select it based on the current object'type
its just a big switch block
Blender is more ecs base internally as the data model is mostly C and not cpp. The python api is a generated wrapper on top of the c data model.
that's pretty interesting, also it does look like blender is keen on visually displaying "has a" relationships in the tree view
I hadn't really thought too deeply about it
fuck I found a fun case θΆ³οΌ©οΌ«.R
the I and K are full width (so funky shift-jis compatability stuff), but they register as latin so it selects the latin face instead of the CJK face
I think I might just fix this by having my CJK face be the lowest priority general fallback for my latin face
ok that did it
that explains why CJK people when writing latin text always have ugly looking ms sans serif stretched, wide text
adding engine features vs. scrolling through famfamfam/fatcow icons to find the perfect ones
in my recent bikeshed rabbithole I started NIHing an LSP client to get syntax highlighting, only to find out LSPs don't feed you basic token info
so now I have an LSP client and need to roll some kind of tokenizer and syntax highlighting provider on top of that
and I think I was supposed to be working on asset libraries or something...
What's LSP
langauge service/server protocol
the thing which gives you syntax highlighting or intellisense in your vim or else where
Ah
it doesnt' give you syntax highlighting, but it does give you a lot, like go to declaration/definition, code lens, semantic tokens, etc.
https://github.com/Microsoft/language-server-protocol/issues/682
yeah, it could technically be a custom command on the server
but what I found in practice was that the document tokens request doesn't actually give you tokens with a fine enough grain on the range in text for highlighting
it seems like both document tokens and semantic tokens are for augmenting syntax highlighting in ways that simple regex passes from textmate grammars wouldn't
and the recommendations all basically point to flagging tokens with a preliminary highlighting pass then augmenting with semantic tokens
probably makes more sense to just regex the text you see right now in that editor view, and color that accordingly, rather than trying to color the whole 10mb worth of your current.cpp π
yeah, lsps support sending partial text and partial requests afaik
the asset system of yours will greatly benefit from it either way π
i will allow the side quest
@balmy gulch
ah it was this 
all this text talk reminds me I need to finish my articles

It would be amazing
. You are literally working on stuff that very very few people work on and most take it for granted. Information is scarce and often times mixed (just looking around, the name "font" is used in so many different context for so many different people).
https://github.com/forenoonwatch/forenoonwatch.github.io/blob/master/text/My Explorations in Text - Part 1.md
here is my rough draft of part1, too lazy to figure out hugo rn
tried to make it as short and direct as possible but have enough narrative structure that it wasn't just a dump of "WATCH OUT FOR THIS"
I'm kinda worried that it's perfectly clear to me but way too terse for anyone to actually find helpful lol
that screenshot of your ui thing which looks like a winforms knockoff, needs to go into a neat library eventually
so that we can make use of it too : >
I was checking in your library how you handle text line breaks, but I wasn't able to find it.
Any tips/pointer to make correct line breaks ?
I piggyback off of SheenBidi here, this is in my unpublished part2
wait, do you mean soft or hard linebreaks
I didn't even write that part in my blog yet, actually
Well, I was looking for both. I will start by looking how sheenbidi works and what it does.
I'm looking at the whole concept here, soft/hard line break as well as hyphenation.
basically everything is in build_layout_info_utf8.cpp
I rely on SheenBidi for hard breaks, (aka character-induced linebreaks from CR, LF, LSEP, PSEP), those are really easy to iterate manually because you just search for those 4 characters
but SheenBidi just does it, so I let it happen
for soft linebreaks, it's basically how you'd expect it to happen conceptually - you do all your shaping first, then you try to find the max run of characters such that width(run) <= textAreaWidth
but the trick is you use an icu::LineBreakIterator to find safepoints to break the line
so it's max(safe linebreak point) s.t. width <= textAreaWidth
Much much thanks π
What is s.t ?
max(safe linebreak point) s.t. width <= textAreaWidth
such that lol
Ah lol
So it could be done this way :
- Shape the whole text through harfbuzz
- Find the closest linebreak to textWidth thanks to ICU
- Discard everything beyond into a new buffer
- Re-Shape the line
- Validate or re-iterate from 2
- Go down a line through lineHeight found in the font metrics by scale
- Restart from 2 for the rest of the text
For soft line breaks.
Hard lines breaks can be a pre-pass to that
I think I get the gist of it. I will look into your build_layout_info_utf8.cpp file π 
you don't reshape the line, the shaping data is always valid, you literally just cut it
that's the contract that the icu::LineBreakIterator helps you fulfill - it tells you based on unicode data what is visually safe to cut, so you never end up making a slice in a point that'd require reshaping
Oh wow
Just wanted to say a huge thanks for the pointers. π
I finally got the text rendering at an acceptable level and I would not have succeed without your help.
briefly back on the horse
maybe I'm the last to find this out, but apparently in text boxes the tab width is entirely based on visual metrics
so a 4 space tab aligns to sizeof(' ') not the 4th printable character position
and CSS gives you two options for it, either supplying the integer space count or specifying the size outright
notepad (or whatever text box it derives from) seems to have some kind of particular curve to the tab width in px as a function of the font size, I can't tell where it derives from, but maybe I measured wrong with sharex
it seems generally that it's ~6-6.25 pixels per font pt, so a 12 pt font has a 72px tab, an 8pt font has a 50px tab, a 72pt font has a 450px tab, with various values in between
I should probably graph it
it's also approximately, but not quite, 19 spaces
so essentially the tab step is applied after shaping, during the point where you sum the advances, you make the advance of any encountered \t chars equal to alignas(tabWidth) cursorX, however the slightly complicating part is that if you have to softbreak the line, I think you have to readjust the size of the tabs
Everytime I see a post in this thread now, I expect greatness and good findings π
I believe if you have a soft break, in some cases that just considered part of the end of the line 1.
In fact it seems... harder to define ? I just tested in Notes. After a certain amount of tabs, it is displayed in line 2.
It seems to be if we are lower than maxWidth, allow one tab at maximum in the line (not visibly displayed) even if it gets bigger than maxWidth. If we are higher or equal maxWidth, 1 tab gets to the next line and is visibly displayed. This would currently break the display in my engine. 
the status of softbreaks as lines is its own thing
but in this context because tabs are a purely visual operation, softbreaks affect it because the tab length affects the length of each line
also 99% sure tabs will break your display by default because by default they resolve to a tofu
well not "break" but look ugly
got myself worried for a second, but it looks like L1.1 enforces the fact that a bidi shuffle won't affect having to recalculate the dynamic width of tabs
You could probably make a living off of this library lol
I'm going to make the much smarter decision and use this technology solely for textboxes in a game engine that will never ship a game
Based
also idk why I didn't think to do it before but I've also got the CSS spec open
Yeah the best implementation AFAIK is still CSS spec (so far that I found)
uh oh, codebase is calling in my technical debt
I can't properly accumulate line-space offsets to do tabs because of my shitty combination of bases I'm working in, I basically have some data buffers in whole-string space, paragraph space, logical run space, and some redundant representations of glyph widths
mainly because I started from ICU layoutex and I wanted to make sure I didn't fuck up the conversion to utf-8
but now I think it's refactor and rewrite time
I like how the CSS spec complexity is literally split up in levels
I wonder where the spec for text editing is, maybe it's in the html spec
https://www.w3.org/2021/06/web-editing-wg-charter.html
this might be something
I think I just don't know enough about how you define a text box in HTML+CSS to know who'd be responsible for specifying it, I found the html spec for input type="text" but it doesn't specify the actual input controls, and the CSS specs seem to only cover display
I guess the actual editing is left as an exercise to the implementer because individual stuff like the cursor is roughly spec'd but I guess it covers for onscreen vs. physical keyboard
or other funky input methods
hmm I really left a big stinky for myself, I didn't document the directionality any of my data is expected to be in so now I have to figure out what steps need what data in what order
my final layout info is strictly in (post-UBA) visual order, the data from shaping (which HB gives you in visual order in logical-run-space) I homogenize into logical order (following the order of the string) and then potentially reverse again into visual order based on the bidi level
the position data (float) stays forever in visual order without reversal but I currently generate a list of widths (in 26.6 fixed) in visual order that I want to get rid of but that might mean doing something funky like storing my secondary axis in visual order and passing it through to the end, but storing my primary axis in 26.6 fixed in logical order so I can do line breaks and then flipping it during the final run write
either way I need to document this mess...
did you know there's an hb-cplusplus.hh harfbuzz header with a unique_ptr for the buffers?
I've been obsessing over how to minimize the number of copies/reversals I need to do in total but maintain my logic, yet it seems like firefox calls hb_buffer_reverse to throw things into logical order... and I can't seem to tell what chromium does because they definitely don't call that, and I don't see them doing much directional branching
That is interesting!
I also need to set up some benchmarks for conditional reverse iteration and binary search
I think I remember from that andrei alexandrescu video that binary search is slower up to like 4 or 5 digit arrays due to the power of cache
could be completely missing the mark with that estimate though
hmm, this might be a perf neutral thing but I think I'm gonna give up on this logic, despite having gotten it to work partially successfully
I have decided to say fuck people who use RTL languages, you will pay the price of an O(n) reversal of 32 bytes/glyph
and by perf neutral I mean "so far it's been inconclusive in the noise of my profiling" which may or may not have shit methodology
I don't know if there is a lorem ipsum for RTL but that would be a good test. Like you, I expect that to be negligible(until it isn't). You are doing one search that is O(n) right ? That should be fine.
I just generate random mixture strings at roughly 50/50 distribution but realistically RTL langs are wayyy less common
the problem is basically I have 2 choices:
- convert everything to LTR order, perform logic on it, convert it back when necessary
- keep everything in native order but have a bunch of branching logic that looks like
if (RTL) { ... } else { ... }but only have to do reversals during the really rare case where logicalRunDir != visualRunDir
.mΟ
ΙΏodΙl ΙΖ¨Ι bi minΙ Ιillom ΙnΟ
ΙΏΙΖ¨Ιb ΙiΙiΚΚo iΟ
p ΙqlΟ
Ι ni ΙnΟ
Ζ¨ ,ΙnΙbioΙΏq non ΙΙΙΙbiqΟ
Ι ΙΙΙΙΙΙΙo ΙniΖ¨ ΙΏΟ
ΙΙqΙΙxΖ .ΙΏΟ
ΙΙiΙΏΙq ΙllΟ
n ΙΙiΟ±Ο
Κ Ο
Ι ΙΙΏolob mΟ
lliΙ ΙΖ¨Ζ¨Ι ΙilΙv ΙΙΙΙqΟ
lov ni ΙiΙΏΙbnΙΚΙΙΏqΙΙΏ ni ΙΏolob ΙΙΏΟ
ΙΏi ΙΙΟ
Ι Ζ¨iΟ
α§ .ΙΙΟ
pΙΖ¨noΙ obommoΙ ΙΙ xΙ qiΟ
pilΙ ΙΟ
iΖ¨in Ζ¨iΙΏodΙl oΙmΙllΟ
noiΙΙΙiΙΙΏΙxΙ bΟ
ΙΏΙΖ¨on Ζ¨iΟ
p ,mΙinΙv minim bΙ minΙ ΙU .ΙΟ
pilΙ ΙnΟ±Ιm ΙΙΏolob ΙΙ ΙΙΏodΙl ΙΟ
ΙnΟ
bibiΙni ΙΏoqmΙΙ bomΖ¨Ο
iΙ ob bΙΖ¨ ,ΙilΙ Ο±niΙΖ¨iqibΙ ΙΏΟ
ΙΙΙΙΙΖ¨noΙ ,ΙΙmΙ ΙiΖ¨ ΙΏolob mΟ
Ζ¨qi mΙΙΏoβ
``` there you go @outer dune
What.The.Hell
what even is that language
lorem upsom mirrored
Is that parsed RTL ? I 99% sure it's LTR mirrored
it looks like all LTR glyphs to me
the only RTL languages in use today are arabic and hebrew
this was just me do a little trolling, found a mirror text generator and pasted in lorem ipsum
then a bunch of ancient/obscure languages that are just technically listed
yeah there are only like 8 or so common languages which are RTL by default
I would go option 1. Make everything LTR and convert back if necessary. The other option will introduce a lot of conditionnal branching I believe.
Which might be a pain for maintenance later on
yeah I think I'm gonna stick with option 1 since that makes the code simplest
option 2 gives me literally double the code in certain places despite being technically slightly better
plus this is microscopic in terms of execution time, most of it comes from allocation (that I can improve)
and the rest comes from hb shaping, but that cost is basically immutable
I think the best data model for positions might just be to only store widths, in logical order
- the offset0 for each logical run
though the more I test the less I'm sure I need to care about that
just the widths, in aggregate
ah
basically, (and the more I investigate this the less I'm sure about it)
the concept of separating offset and advance is to homogenize layout math over vertical and horizontal text
and the aggregate offsetX + cursorX (where cursorX is the sum of all advances 0..n-1) is the only critical position for each glyph
you could pick the one right now which your belly tells to use
also the one which might be more readable (code/shader wise)
I'm picking widths because it makes tabs the easiest to implement
this is internal representation while I'm building the text, the final object stores positions (since that makes rendering the fastest/simplest)
because when you're dealing with tabs you have to basically apply realWidth = isTab ? alignUpToNext(lineWidthSoFar, tabWidth) : width
yeah
but the more I think about it, the LayoutEx code I referenced originally already takes the assumption that offset0 = 0 because otherwise accumulating widths wouldn't even get you the line width
and also a line split down the middle assumes that the offset/advance of the glyph - 1 doesn't matter
I think I'm gonna have to store them heterogenously based on what's the primary and secondary axis
because basically these widths are what line breaking (and your cursor position) care about
and the green arrows are the offset on your secondary axis, which I guess I could also store as widths but I definitely would have to store o0 separately in that case and it would effectively just be difference encoded
ok epic, having a hybrid width/position data model was definitely the way to go, actually simplified my code a bit
and now I have no redundant position/width representations in the data, at my goal of 16 bytes per glyph
which is like the theoretical minimum I can achieve uncompressed/unquantized, and doesn't count the memory required for stuff that doesn't scale directly with glyph/character count
and that raises some pretty interesting stuff to think about when you consider absolutely huge blocks of text, since they require at least 16x the memory to display (and I am being more efficient with my redundant data than chromium or firefox for sure)
so rendering absolutely fuckhuge text files (obviously) requires some pretty aggressive culling, and in turn some pretty efficient scans/heuristics to determine what you wanna display in the viewport
and I wonder what you'd do there with soft line breaks enabled, since just scanning the string for newlines up front wouldn't necessarily give you accurate positioning for what your viewport is looking at
isnt it just counting
you always know how many characters fit in your viewport
and the viewport shows a span of text
not necessarily because they all have different glyph sizes
and some aren't even visible
imagine the world's most adversarial 10GB logfile where the first GB is just several billion ZWNJ
so line 0 starts over 1gb into the file
ο·½
π«
and imagine some of these
notepad++ knows the right thing to do though
ah i forgot about those things
hmm
you do seek within the bytestream of your text still
when you scroll through a file for example
then you read the span, and inteprete it somehow, and afterwards you know all the characters which are supposed to be visible and whichnt
hmm this isnt trivial π
the problem there is that this is not just nontrivial, but requires A) dealing with variable sized utf8 encoded codepoints and B) is essentially data-driven since you need the unicode metadata on a codepoint's properties
the way I'd do it is just to populate it lazily, first by scanning the entire string for hard newlines (with SIMD + multithreading you could probably rip through several GB/second) and use that as the initial assumption, then actually try laying out whatever's currently supposed to be visible in your viewport
then use the softbreaks from that as an "adjustment"
but this also makes me wonder if I can use simdjson/simdutf8's code tricks + parsing UnicodeData.txt directly to do some kind of codegen for the theoretical most optimal text prescan possible, based on the actual unicode class queries I need
but that's a project for the far future
man, a true brainworm hole
yeah Big Textβ’οΈ is its own separate rabbithole on top of regular small string rendering
I'm gonna have to make a mesopotamian faction whose soldiers talk in cuneiform
the dispute over low quality copper has gone violent
replaced building vectors and iterating them with iterators for all of my logical run building, benchmark result: inconclusive
however I know in my heart that like 8 less std::vectors get allocated per string, and that is enough to bring me joy
need to probably run profiling with MSVC so I can get the cool heatmap on the code and see what it tells me beyond the obvious hb_shape and "why are you allocating a hb_buffer and icu::LineBreakIterator per string shaping?"
I see why no one bothers to make nontrivial optimizations to anything outside of shaping now, lmao
something minorly stupid and annoying though is that calling icu::Locale::getDefault() too frequently has a tangible cost
though other than that, the main thing that registers outside of hb_shape (as like 3% of total time lol) is the allocation costs of SheenBidi
actually this was a bad test, since the profiling grouped all sizes of string (up to 1MB) into the same calculation
for my smallest test (string size = 64) hb_shape is 'only' 71.81% of the execution time
but as expected this only magnifies allocation snafus, e.g. I don't do any reserve calls on my final layout struct, but I do for my temp build vectors, and as such there's a huge cost of allocation from appending positions and glyphs
the allocation for the icu::LineBreakIterator is also significantly more expensive than the SBParagraph and hb_buffer, but that's fine because I can factor it out, and seemingly using it is mostly fine
and interestingly some attrition from using UText in computing subfonts instead of a dedicated UTF-8/16 iterator
profiling even shorter strings basically tells me what I expected - initialization and allocation dominates even harder and small buffer optimization (and finally not being lazy and factoring out the reusable stuff) will get me basically the only meaningful wins I can conceivably get
however... will I eventually try to create some kind of simd version of sheenbidi just so I can say I beat it in benchmarks? perhaps
and hb_shape in turn is basically bottlenecked by however quickly I can read glyphs
so, harfbuzz is itself pretty impressive in its optimization
I'm happy to hear that I found the same issue. The bottleneck is also hb_shape on my end.
And I can't do much in there (or I don't want to look inside)
The other one I found was UText and LineBreakIterator that I heavily re-use when possible.
yeah, though luckily it's the LineBreakIterator allocation that's the majority of it, not even setText or preceding
finding sub-fonts is the only place I use a UText where I don't need to, just because I wanted to unify my code for handling UTF-16 for doing regression testing against ICU
but now that I'm confident I don't need to use ICU's stuff anymore I can just drop that
but UText is the best interface you have for the break iterators, and it's fine enough there
with the icu::LineBreakIterator creation factored out, hb_shape is about 80% of an 8 char string, with a scant few % (5 or less) in allocations for appending final data, get_sub_font, and SBAlgorithmCreateParagraph
still like 13% speedup just from getting rid of it lol
ack
yet another (several) annoyances I didn't account for
- SBParagraphDefaultLTR/RTL doesn't seem to correctly swap the run dir (I assume because these characters are strongly LTR), so to specify what I think are level overrides in ICU I need to explicitly specify base paragraph levels of 0/1 (equivalent to CSS
direction: ltr | rtl) - harfbuzz is actually smarter than I thought for my own good, and actually resolves glyph mirroring for me, however I think this ends up fine overall because the selection I posted above resolves to
llo>, which is correct
#1 is a tiny bit annoying because the UBA spec has no definition for what a "default" level is supposed to mean...
ah
big data model refactor is done, L4 character mirroring is confirmed to be a non-issue, directionality overrides work, and tabs now work, also fixed an issue with string-terminating newlines
though what I'm seeing here is that I need to revisit subfont selection because it's able to use Twemoji for the numbers and that gets on my nerves
can you render composite eomjis too?
like a female black vampire
i think thats vampire+darkskintone color + zwj + female sign
ah yeah
where is the female coloured vampire? π
glad I added clipboard support
data point magic
new knowledge has decided to reveal itself to me:
https://simoncozens.github.io/a-duffers-guide-to-fontconfig-and-harfbuzz/
https://gist.github.com/CrendKing/c162f5a16507d2163d58ee0cf542e695
also the spec for how CSS picks fonts https://www.w3.org/TR/css-fonts-4/#font-matching-algorithm
but this is more of the macro scale, front-end of font picking where you match a named family to some file or files, not necessarily to figure out what file can render a substring
somehow fonts feel as complex as operating systems, and browsers to me
AAA engines are nothing compared to that either hehe
making text "just work" is a huge part of browser magic
and OS magic, for that matter
this is epic, but how complex of text can you render without dying?
idk, hypothetically anything
my coverage is pretty good, I can do emoji, I have inline markup for underlines, italic, bold, smallcaps, subscript, superscript (with synthetic glyph gen for all those if they're not available in any font files)
emoji with any joiners, zalgo text
I render tabs now
the single biggest thing I'm missing atm is vertical text, and I've been working up to it in my latest refactor
nice
how do emojis work btw
are they like regular glyphs with color or something else
because if they are regular glyphs then they must have a whole lot of codepoints innit?
no I mean
the first 2 are U+1xxxx which means they're 4 bytes I think
points in the glyph
the actual unicode scalars you use to bezier the thingy
or is that not how this works
(my understanding is extremely limited)
oh yeah probably, atm I have freetype rasterize them precomposed because I've been too lazy to deal with too many actual rendering techniques
since I've been focusing mostly on layout
but you can also have freetype give you them as mono-color layers
smh can't believe you didn't make harfbuzz and freetype from scratch
(really you get the outline 1 layer at a time + a color and you have to rasterize it yourself either through freetype or some other way)
the thing I always say is that it's amazing how much you still have to do even if you grab all the libs you need from the start
I literally have 4 giant dependencies and there's a ton I need to do on top of that
freetype+harfbuzz+icu+sheenbidi (which I technically wouldn't need if I wasn't stubborn about making everything work natively in utf-8 without utf-16 conversion)
it gets a shade more annoying because I have to make sure the data I generate from layout is optimal and reconciles with being able to use it for text editing
stuff like, e.g. pango is pretty full-featured but read only afaik
just shits finalized pixels to a render target
a simple but long awaited change
Oh god thatβs horrible to implement
yeah, you have to split the string into first char and the rest, use the base metrics of the text (line height and ascender) to figure out the glyph metrics of the first char, shape the first char, then shape the rest of the text in 2 pieces, determining the split where it falls off the first block
luckily this is way, way out of scope
also hyphenated word breaking
straight up can't find what firefox uses for vertical line height (or I guess line width, as it'd be in this case), I suspect they might just use the horizontal metrics
meanwhile
in chromium:
But so so sexy
So it looks like what you're supposed to do, roughly speaking, is hit the TrueType vhea table for vertical ascent/descent https://learn.microsoft.com/en-us/typography/opentype/otspec180/vhea
and failing that, CJK fonts may provide a vertical advance via the OS/2 typographic ascender/descender, which itself is a little rabbit hole because it briefly covers the OTF concept of the "ideographic em-box", which luckily I can just skip the fancy stuff for because its secondary metrics are just em-width
https://learn.microsoft.com/en-us/typography/opentype/spec/recom#tad
https://learn.microsoft.com/en-us/typography/opentype/spec/baselinetags#ideoembox
whoever wrote the FreeType docs decided to copy paste this line for TT_HoriHeader and TT_VertHeader which sent me on a bit of a wild goose chase because this is actually true for horizontal metrics, and in CJK fonts the typographic ascender/descender affects the vertical ascent metrics, just not the secondary axis
What's the fallback on fonts that do not support the field ?
Can't you just assume "easily" assume a sized glyph which is offsetted by it's normal height * size increment ?
Tho, it will affect the following lines x offset... UGH. I don't even know what it gives in Kanji for Top to Bottom lines.
I'm thinking either you use the line height as the line width, or just the em-width
what I'm going to do is find a font that does have vertical metrics and see which value is generally closer
it breaks some renderers as expected
is that a browser
Yes, but it's a little bit of a hack, I'm using <span> instead of css ::first-letter
Works great for Latin caps
Won't work for those with y-bearing though
I think this is technically something I handle better than CSS can
at least if you use spans, because I think CSS can't use the spans as context for a line (but I may be wrong)
but the ascender and descender of my lines are max(ascender) and min(descender), so the large glyphs stay on the same baseline, at least
they look dumb, but don't collide
I'm checking but :initial-letter is not fully supported on all browser. ::first-letter is though.
Yeah, CSS prioritise line-space to be the same. I have the same result as you (bigger gaps between baseline) in my engine
At least for <span>, it seems
I'm getting convinced by that, but I have no usage in-game except in the context of an old letter
MS word also does this
Yeah, I think it's CSS being dumb (Firefox implementation) π€
you should try it in a chromium based browser and see what's different
Chrome 
Somebody stole someone else implementation
"Lemme copy your code, I will change it I swear"
here's a snippet from the firefox source
everyone is copying everyone else because there's no unified body of work for how text should be rendered
with the exceptions of tidbits explicitly in the unicode or CSS spec
which isn't much for either
I'm tempted to eventually at least add a path for people to be able to do this manually if they really try, maybe by adding some polymorphism to the layout builder
because mostly the biggest change is customizing the linebreak logic to break at a different width for the first n lines
This is amazingly funny, so now we know Pango, Firefox and Chrome are wrong IMO.
it's hard to say pango is wrong because idk what they did and didn't copy, I should honestly clone pango and snoop in it as well
I think a lot of linux distros use pango as the OS text display
at the very least, I believe gtk uses it and thus everything that uses gtk probably does too
Yes they do use pango in most (all?) Linux distro. I guess not a lot of people uses that part of it (if it's wrong in pango).
It's quite rare to see such usage anyway π€
pango calls writing mode gravity lol
so it looks like generally non-CJK fonts don't have vhea, but the vertical fallback for non-CJK fonts is the standard ascender/descender, not the emwidth or typographic one
I should actually measure what the browsers choose for CJK line widths, because based on this (the metrics from Noto Sans CJK jp) they spec vertical line height as = to emwidth
yet another minor annoyance about vertical text is that 0 on the Y causes the offset of the glyph to end up above the line, so you still need the normal ascent either way
am i stupid or is discord too stupid to render arabeeque?
i wanted this
it fails to render Ψ§ΩΩΨΩΩΩΩ
uh
wait
gm habibi
π¬π² habibi π
discord nick length is fixed : ( it cut off
it cut "the ruler" off hehe
Can you type something again
something again
ahahsdjalkjalksjdlkjaskljlkjdflksdjfklwjerlkwjlksdjfkljasdkljlkwjerkljsldkfjslkdfjlkjwerkljlksdjflkjsdfkjlkasjdlkjsdflksjdflkjsdflkjslkdjflksjdflkjslkdjflksjdflksjdflkjsldkfjlskjdf
π
mixed rtl and ltr is funny
this is what I see
and with the larger fontsize
It do
interesting
i also like how United Arab Emirates uses those more straight looking characters for their stuff
like some sort of monospace π
"Allah"
almost looks like some simple power tower in arabic heh
latin alphabet is so boring
do you guys understand the word sprachbund?
thats english apparently
for some reason linguistics attracts programmers
: )
It's because we are just typing and talking about text all day 
C++ is just another lingu to istic
ok time to wake this thread up for some new features
so I was thinking how in order to append the results of layouts together (e.g. if you wanted to insert another GUI in the middle of your text, like I do), you would need to provide an initial offset to the first line to join up where the last text block ended... so I thought, "why not allow the params to have an offset and count so you can implement drop caps", and then I thought "wow it'd be crazy and overengineered if I just let you pass a callback to dynamically resize the lines"
so that's what I did
so now drop-caps are totally possible
and then following that up, I added text truncation (i.e. stop laying out if you go out of bounds), which I needed for a different feature, but the combo of text truncation and line resizing means that you can actually relayout text around obstacles very dynamically
(imagine I actually put the big E here )
that little implementation is a bit buggy because I half assed it for a proof of concept
but the important part is that I can
insane stuff
We need to think up some more exotic typesetting behavior that games might benefit from
when I was putting this together I could only come up with 2 things it reminded me of: greeting cards (where you have text conforming around some clipart) and newspapers with images slapped between the text
but my actual usecase is driven primarily by 2 things, 1 is embedding
images in
chat without needing a whole-ass typeface for them, and the other is things like this
where you have dynamic controls in your text
one more feature in this series: dummy blocks
I considered this a while ago, but basically I reserved a special font family that's not the invalid family, which gets special-cased to get skipped in layout and rendering, and I can provide an extra width and height for
its function sort of overlaps with setting the line offsets/limits/truncation, but it's much more specifically the sticker/custom emoji feature, since it can behave like any other character with a little bit of bridging to the editor features
and of course the rects are drawn in afterwards, and can be anything (more UI controls, images, etc)
it also behaves a little bit more like a microsoft word image insertion into text, rather than a tool for morphing text around a first-class asset
So I tried out RmlUi for the first time just now, I have it scoped out for a secret project

