#Rich Text - Uncreatively Named Text Handling Prototype Tool

1907 messages Β· Page 2 of 2 (latest)

old harbor
#

rip

olive sapphire
#

These are the only ones I have:

molten galleon
#

I just keep 2 builds, debug with the debug and run benchmarks/profiling with release, luckily I don't use ue so I don't have any cases where debug is literally unusable

olive sapphire
#

Oh! I found it:

#

Wow, changing build types is so weird/complicated in VS.

old harbor
#

it should be a drop down list in the toolbar somewhere already, out of the box

molten galleon
#

random thing

#

someone mentioned obsidian here a while ago, so I tried it out and it's honestly pretty great

#

it's a solid markdown editor for starters but it also lets me keep a random ass pile of notes in pretty much the same way you'd want from talking to yourself (or others) on discord

#

would definitely recommend

#

it has ```language``` markdown support natively, instantly great

old harbor
#

yeah obsidian is great

nocturne hill
#

(i.e. is this necessary for some other project or are these tasks just self contained so to say)

molten galleon
#

no they're all completely unrelated

#

they're just the main projects I find fun atm, they do sort of feed off each other in abstract ways

#

in the sense that my raytracer uses my vulkan abstraction and I plan to use this as my general purpose text handling lib in the future

molten galleon
#

thinking up some ways to generate test data for benchmarking

left junco
#

Is notepad really able to render all that

#

That's honestly impressive

#

I never considered it as anything but an ASCII editor lol

molten galleon
#

what I ended up doing was using a normal distribution to generate word lengths from randomly picked unicode blocks (out of a subset of unicode blocks I know I have font info for), probably the best I can do without using a dictionary, which is too much of a pain in the ass

#

yeah it clearly has fallback font support too, of some extent

#

but I think hardcoded or using some windows registry default, I have my font set to noto sans but it wants to fallback to Segoe UI or its list of fallbacks when it wants

#

also this is laggy as fuck

#

this is 1MiB (1024 * 1024) bytes of UTF8 text

#

I just tried inserting some emoji (which, note to self, I need to add some block ranges for so I can test hitting that in my font cache too)

#

and it took a few seconds

#

n++ also struggles with it

#

resizing notepad is so slow too lol

molten galleon
#

generated randomized font size, weight, and style runs as well, and a debug html file dump

old harbor
#

do you also support that wonky ass unicode overflow thingymajigg where the characters go crazy?

molten galleon
#

zalgo text? yeah

old harbor
#

yep πŸ˜„

molten galleon
#

the thread image is a pic of zalgo text rendered with my thing

old harbor
#

ah

#

indeed πŸ˜„ i was looking at the thread pics earlier

olive sapphire
molten galleon
#

windows text rendering is not bad at all though

#

cleartype is like the best thing around

old harbor
#

i find windows' text rendering superior

olive sapphire
#

and Notepad does not do texture atlases or any kind of basic optimization.

old harbor
#

how is that relevant?

molten galleon
#

with a lot of text on the screen you can see the double buffering from when it tries to redraw on the WM_PAINT message

olive sapphire
#

@molten galleon BTW, how do you handle texture atlases? Is it per font?

#

Just wondering about different methods

molten galleon
#

nope lazily allocated like I mentioned

#

which is the method roblox ships to all platforms with

olive sapphire
#

What do you mean by that? I'm not too familiar with graphics terminology yet.

molten galleon
#

I start with an empty atlas, I request a glyph from my atlas, if it doesn't have it then it adds it

#

lazily [x] (just as general programming terminology) just means "only do it when someone asks for the result of it"

olive sapphire
molten galleon
#

I'm realizing that while memory mapping isn't the optimal thing for loading a file for a single read, like for a glTF or an image, it might be a lot better than trying to handroll some kind of LRU cache for font file memory, for a whole host of reasons:

  • an LRU cache like MuPDF's works on variably sized data and just holds whole handles to file memory, its eviction scheme isn't actually true LRU but it tries to scavenge for objects of closer size to the new memory you need, more like an allocator
  • evicting a file whole means that you might have to eat the cost of reading a whole file into memory multiple times during the course of a single text layouting operation
  • font files have disproportionately hotter areas (the certain charmaps/tables you want), and trying to serve that need would basically cause you to reinvent the OS's paging, virtual memory, and file caching
molten galleon
#

preliminary benchmarks (before any fancy alterations to font loading) show I can layout 1MiB of text in about 1.8 seconds on release

#

could probably shave some time here and there if I wanted to go extra hard but the vast, vast, majority of the time is spent on harfbuzz actually computing the shaping

#

so I'd say my main goal now would be to just try to reduce overall memory usage without causing too massive of a regression, and make this thread safe (since rn you can't use the font cache from multiple threads), since that's about all you can do to improve runtime

#

but really, it seems the main way to speed up layout is to tailor it to how you're actually using the text

old harbor
#

that function naming πŸ˜„

molten galleon
#

lol which ones? most of the functions in that profile are from hb/FT

old harbor
#

yeah underscore mixed with capitalized snek

#

PascalCase and lower_case_snek

#

its all in there : )

molten galleon
#

oh yeah lol, just seemed right to name it like that since it narrows the benchmarks by category

#

so you can see how BM_Layout_MultiFont_LineBreak would fit in

#

(which is currently disabled due to epic font cache bugs)

old harbor
#

yeah

molten galleon
#

fakk how do you make a multi-line-graph in excel

#

some random tut tells me my data needs to be kinda arranged like this

#

but how do I do that, if a pivot table didn't

#

fuckkkk how tf do I use excel

old harbor
#

shift+enter

#

or

#

even better

#

you remove the grid lines in that range πŸ˜›

livid patrol
#

One time I tried making a macro for something I was frequently doing in Excel. I never succeeded and I still haven't mentally recovered from it

molten galleon
#

so I got it to treat string length as numbers, that's a start, it still thinks its all 1 dataset though

#

I just want it to show multiple lines for each test

#

ack

#

rather, 1 line for each test

#

I need to figure out how to squeeze it out of this because this is how google benchmark dumps the data to me

#

what a pain

#

you'd think pivot tables would be the way you aggregate it by run name

molten galleon
#

ok I've given up on getting a nice graph view for now, excel has won for now

molten galleon
#

and linux's fontconfig

#

it looks like I've been wandering down essentially the right path, minus the specifics and complexity of my fallback/linking scheme

molten galleon
#

got my FontRegistry API looking like how I'd want it, still some internal stuff to smooth over but things are looking good

#

looks like there was basically no performance change before/after either, including changing from loading font files into memory to just mapping them

#

but now building text layouts is safely multithreadable

old harbor
#

i found something

#

which might be something for you

#

when you decide to bin vulkan and come back to the holy opengl land

#

or want to use gl for some dummy project

#

NV_path_rendering πŸ˜„

#

requires gl1.1 at least, and you can apparently feed novideo cards SVG and postscript path syntax

#
        const char *svgPathString =
          // star
          "M100,180 L40,10 L190,120 L10,120 L160,10 z"
          // heart
          "M300 300 C 100 400,100 200,300 100,500 200,500 400,300 300Z";
        glPathStringNV(pathObj, GL_PATH_FORMAT_SVG_NV,
                       (GLsizei)strlen(svgPathString), svgPathString);
#

probably compiles that into some draw list of sorts under the hood

#
        const char *psPathString =
          // star
          "100 180 moveto"
          " 40 10 lineto 190 120 lineto 10 120 lineto 160 10 lineto closepath"
          // heart
          " 300 300 moveto"
          " 100 400 100 200 300 100 curveto"
          " 500 200 500 400 300 300 curveto closepath";
        glPathStringNV(pathObj, GL_PATH_FORMAT_PS_NV,
                       (GLsizei)strlen(psPathString), psPathString);
#
        static const GLubyte pathCommands[10] =
          { GL_MOVE_TO_NV, GL_LINE_TO_NV, GL_LINE_TO_NV, GL_LINE_TO_NV,
            GL_LINE_TO_NV, GL_CLOSE_PATH_NV,
            'M', 'C', 'C', 'Z' };  // character aliases
        static const GLshort pathCoords[12][2] =
          { {100, 180}, {40, 10}, {190, 120}, {10, 120}, {160, 10},
            {300,300}, {100,400}, {100,200}, {300,100},
            {500,200}, {500,400}, {300,300} };
        glPathCommandsNV(pathObj, 10, pathCommands, 24, GL_SHORT, pathCoords);
#

turtle grafics en novideo

#

ideal to draw glyphs πŸ˜›

molten galleon
#

wow lmao

molten galleon
#

kinda makes me wonder too if this is/was a vector for exploits

#

because you have to think that at least at some point some browser might've been pasting raw <svg> strings into this

olive sapphire
#

don't browsers just use bigboi graphics apis like Skia

#

don't think they would need anything like this in the first place prob

old harbor
#

yes

old harbor
olive sapphire
#

I saw this during my research but it's kinda useless because only Nvidia GPUs support it lmao

old harbor
#

ye

#

it might not be useless though

#

if your vector grafics editor you are making can detect nvidia, you could potentially use it and have hardware accellerated vetor grafics

olive sapphire
#

yeah but if your other option doesn't use cpu prerendered in the first place, it's probably too dogshit to exist

#

and I don't think you can use it in place of a CPU prerender

old harbor
#

you dont know

#

and its there

#

use it or dont : ) doesnt mean its dogshit

olive sapphire
#

nah this is cool, but for 99% of people that are trying to make their games accessible at all, i don't think this would be an option πŸ˜”

#

i wonder if you could build a resizing algo that efficiently resizes based on the vectors instead of having freetype generate them for you again

molten galleon
#

I'd wanna figure out how the driver implements it

olive sapphire
#

good luck

molten galleon
#

I'm guessing there's a 'standard' set of techniques to rasterizing vectors tho

olive sapphire
#

not like the whole linux community spends decades trying to figure that shit out and still fails

old harbor
molten galleon
#

stuff like this is probably bundled as special shaders the driver people wrote

olive sapphire
old harbor
#

yes, fixed function stuff from before you were born probably

molten galleon
#

kinda like the BVH builder shaders that pixel writes

#

I bet mesa has an implementation of this ext

olive sapphire
#

i looked it up and nothing showed up

#

πŸ€”

#

it's mentioned here but i can't find docs for it

old harbor
#

ah its glNewList(... GL_COMPILE)/glCallList

#

my DSA brain turned it into glCreate...

#

its been a while but i have written code using those

#

23 years

olive sapphire
#

it renders subsequent draw commands?

#

is that kind of like a vulkan command buffer?

old harbor
#

no

#

much simpler

#

vulkan commandbuffer stuff is more like NV_command_list

molten galleon
#

iirc graphite posted a while back how he figured out how to do some ptr offset hacks into VkCommandBuffer on nvidia and write in NV device commands directly

olive sapphire
#

oh ok

molten galleon
#

fun cursed stuff

olive sapphire
molten galleon
#

idk how he figured it out

olive sapphire
#

i'm not experienced but i don't there there are too many diverse ways to go about that

old harbor
#

ah you said it, fun cursed stuff, indeed

molten galleon
#

yeah not at all, its just a neat hack, and a look into how commandbuffers are actually implemented on nv drivers

old harbor
#

a list of numbers with other numbers in between πŸ™‚

nocturne hill
#

same thing

molten galleon
#

haven't done anything more advanced than modifying values in cheat engine

nocturne hill
#

well those values you were modifying were inside a struct of some sort

#

you figure out some amount of the layout of that struct somehow

#

VkCommandBuffer is always a real pointer, basically

#

and so you do ((nv_cmd_buffer*)myVkCb)->things

#

very simple

#

but also if graphite learned this by poking at nv driver with a disassembler, the resulting work is potentially not legally shippable

#

this also won't work in presence of any layers that implement VkCommandBuffer

old harbor
molten galleon
#

pretty cool, I like how he goes from first principles and explores all the differentm ethods

balmy gulch
old harbor
#

sebastian can explain all the stuff he does so nicely

nocturne hill
old harbor
#

noice

nocturne hill
#

:frog_ancient:

#

I should redo my path rasterizer at some point, I'm not really sure I agree with lague's takeaways tbh

#

finding roots of even quadratics is a huge floating point pain

#

which he does demonstrate in the vids

#

in fact IIRC my solution to numeric robustness was to switch to iterative root finder

#

it was simply more robust against higher power coefficients being close to 0 than solving analytically

#

I was considering tessellating my stuff to linear segments

#

that stuff is very robust

#

or maybe I should do something else idk

#

and also I rather want path/vector textures

#

instead of just drawing text on-screen

#

though

#

that might be a meme use case

#

because there's no way to author that

#

and a much better solution for vector-looking memes on models is still SDF/threshold masking

#

because that's actually authorable in something like blender

old harbor
#

NV_path_rendering my beloved πŸ™‚ (jk)

molten galleon
#

wdym no way to author that, inkscape?

nocturne hill
#

well I guess you could do that but idk it's not the most convenient looking thingy

#

I don't think you can use SVG as a texture in blender

molten galleon
#

oh I see what you mean, using svgs as texture maps for models

#

that'd be cool but you'd probably have to roll your own tooling via a plugin

nocturne hill
#

that tooling would just be converting stuff to bitmap at like load time and it would suck a lot

#

I tried doing ktx2 images in blender that way once

molten galleon
#

Finally got around to removing all my optional git submodules and just pulling them if you have tests enabled and don't have them in tree

#

now the library is officially usable and not clutter

molten galleon
#

don't love synthetic bold, but it works

#

need to properly override the advance lookup

molten galleon
#

synthetic italic isn't nearly as bad

old harbor
#

i prefer the light and regular ones tbh

#

and medium for specific fonts

molten galleon
#

regular is the standard font weight

#

that's the one that's being scaled

#

most of the time you only ever see regular and bold

old harbor
#

yeah

#

i put Light on purpose for coding fonts sometimes

#

or imgui :3

molten galleon
#

and (non-synthetic) smallcaps

#

slowly working through the remaining feature list

molten galleon
#

oh boy I am getting to the fun parts, so far I've been doing smallcaps, superscript, and subscript glyph substitutions by passing the relevant opentype font tags to the harfbuzz shaper, and it just does the substitution lookup from the gsub table if possible, but now I'm looking into how to synthesize them if they're not present and harfbuzz doesn't seem to provide too much for that

#

I think relying on opentype features is on the whole, the wrong way of going about it, and if I want synthetic substitution glyphs I'd have to do a pre-scan in the step where I generate font runs and have some kind of font instance for handling the substitutions via the hb nominal glyph function override

#

if at all possible though I want to avoid the scenario where I need to parse the otf tables (or worse, write into them)

left junco
#

What is the input to this whole pipeline

#

Just font files?

#

Or is there some database that fonts only affect a subset of

molten galleon
#

there's font files, font family description files, and the formatting runs data for the target string

#

font family files describe the relationship between individual faces, which is mainly necessary for changing the weight, (style) italicness, and finding the correct sources for languages and symbols

#

and the formatting runs just describe the intervals of fonts, colors, italic, underline, stroke, etc. for the string, in my library I provide a generator that parses inline XML from your string for formatting, but your source could be anything, like syntax highlighting

molten galleon
#

LLVM ERROR: IO failure on output stream: No space left on device

#

fuck guess I'm not building chromium, only 450 files left

#

I could probably dig up an HDD and move the chromium dir there

molten galleon
#

in the absence of actually building chromium, I think I found where they do the smallcaps transformation

#

haven't traced entirely what goes on but it mimics what I assume I'd have to do, which is to split out smallcaps runs during my run segmentation stage

#

based on everything I've seen though, I think now is the time to start building custom harfbuzz font callbacks instead of relying on hb-ft.h

molten galleon
#

one of my original ideas was going to be to emit bitflags on the high bits of glyphs to signal if they need synthesis, I think it's actually ok since hb_codepoint_t is unsigned but truetype fonts max out at 16 bit glyph indices I think, but I am not confident enough in that to risk it, and it seems chromium doesn't, instead what they seem to do is split a string like Hello World to H ello W orld and mark those ello and orld ranges as handled by a font that always does a synthesis

#

though I'll have to see how to handle non-synthetic caps since from probing the hb buffer callback, it only does a glyph substitution if it actually has the smcp table and hits a possible substitution, so my get nominal glyph would have to look something like

if is OTF and has smcp table and smcp[chr]:
  return glyphof(chr)
else
  return glyphof(toupper(chr))
molten galleon
#

however I don't see any evidence of that in chromium, I might have to build it after all

molten galleon
#

been spending a few days fucking around with building chromium

#

whatever version I pulled from main and built (~9 hours) seems to crash on startup, with a mile deep stack trace I have no idea how to debug, so I had to get the git history and start changing tags (like an hour to pull and overnight to do the deltas)

#

switched to the latest tagged version and I'm rebuilding, only 37k files I need to rebuild

#

all to find out I probably already found all the references I need in the source code search, but I still want to have a working version to probe

molten galleon
#

I think I've placed too much trust in chromium, I finally did what I should have done to start and put together a simple test page to invoke synthetic smallcaps and subscript/superscript, and I realized that the reason I can't find these features in the source code is that chromium doesn't even support them

#

but my test page works in firefox, so time to start snooping through firefox instead

nocturne hill
#

@molten galleon hello where do I license your path rasterizer

molten galleon
#

I haven't bothered looking through actual glyph rasterization methods yet

#

been too deep in font features

nocturne hill
#

πŸ’€

molten galleon
#

for now I just rasterize via freetype to an atlas, and have path rasterization for "later"

#

though this is my last major feature before I do that

molten galleon
#

so firefox does pretty much what I expected I'd need to do

#

for synthetic subscript and superscript it basically completely overrides the font settings and doesn't emit a subs/sups tag, just creates a copy of the base font with scaling and translation applied, there's no mixing of synthetic and whatever scant few letters are maybe provided as subscript/superscript, which I guess works out best overall

#

for smallcaps it looks like substitution is never done in hb_font_get_nominal_glyph, but as a part of the subfont-splitting stage of run generation, where it writes out a transformed string based on the actual capping rules (smcp, pcap, c2sc), gonna have to check what ICU provides me here since mozilla is full of a lot of handrolled unicode support, and of particular note is that Greek casing is apparently complex enough to warrant a lot of notes

#

gonna have to look further to see if they check tables/glyphs for availability, but I think due to how CSS font-synthesis tags work, deciding what gets synthesized is mainly user driven

molten galleon
#

so the chromium SmallCapsIterator just marks out runs of small-cappable chars by using u_hasBinaryProperty(UCHAR_CHANGES_WHEN_UPPERCASED) and tacking on u_getCombiningClass() to make sure combining marks join with their adjacent run

#

in the firefox impl they seem to use their custom unicode functions to lookup if a char has some kind of alternate capitalization mapping, but the main thing is that they run a toupper transformation using their custom uppercase transformer thing, which I think I'd have to do with ICU's transformers

molten galleon
#

Case mappings may produce strings of different length than the original.
For example, the German character U+00DF "ß" small letter sharp s expands when uppercased to the sequence of two characters "SS". This also occurs where there is no precomposed character corresponding to a case mapping, such as with U+0149 "Ε‰" latin small letter n preceded by apostrophe.

nocturne hill
#

ΗΉ

molten galleon
#

gonna have to test out how this applies to editable text, if it does at all, since this messes with any kind of logic that'd let me build a remapping between the u_strToUpper result and the original string

#

so true

nocturne hill
#

unicode more like unicΓΆck

molten galleon
#

mozilla has a custom implementation of locale-aware str toupper that seems to obey turkish and greek casing rules but might just ignore funky stuff

molten galleon
#

hmm, so I actually don't know what kind of funky logic they use to actually decide whether to synthesize or not, because it seems to trip on the Straße, but if I force enable smallcap synthesis in the code I get der STRASSE just as I expected, and just as u_strToUpper gives me with lang="de", and based on how the selection cursor interacts with it, the 2 S's are treated logically as 1 character

#

I think I have enough info to start building my own solution, perhaps slightly better than firefox/chromium in these minor ways

old harbor
#

i have a huge respect for you putting up with this text stuff πŸ™‚

#

23:23:23.232

molten galleon
#

I just want it to work well lmao

#

I like how smallcaps look, and from the frontend of the renderer they're just toupper(c) scaled down a few percent, so how hard could it be?

#

little did I know...

molten galleon
#

so one minor thing I failed to consider was that mozilla's TransformString does also fill a bitset with what character indices it performed a special case substitution on, and marks if the string needs a merge, which I can't do without having my own implementation

#

which means it's NIH time baby

#

as a bonus, mozilla and ICU's uppercase transformations are both UTF-16 native so rolling my own UTF-8 native one means I can continue to avoid having to convert there and back, which I've successfully avoided so far

molten galleon
#

finally piped everything through to get synthesis for subscript and superscript (here it's actually seamlessly combining with natural smallcaps, not synthetic yet)

#

so now I should have everything I need for synthetic smallcaps from the perspective of glyph transformations, just need to add the char substitutions

old harbor
#

you could toss a little sneak peak in there as well "libFancyFont" with a superscript "tm" πŸ˜„

molten galleon
#

here's how it looks with the ℒ️ emoji

old harbor
#

that does look neat

molten galleon
#

this is important information

left junco
#

Linux terminals do the same thing, it's to save having to redraw the screen since on an old terminal it would have been the only thing moving

#

Idk if there's any reason for it these days beyond historical

#

But I like it

molten galleon
#

honestly shocked that it more or less "just works"

#

just using a hardcoded substitution for now, but it just treats the 2 glyphs like 1 character, just like it's supposed to

#

although it does need to be improved a bit since clearly only the 2nd S glyph is being used for the bounds

#

but now synthetic smallcaps just work, all that's left is to implement the unicode case mapping algorithm

molten galleon
#

ah yes, parseable raw data where some comments are data and some are comments

#

both ICU and firefox seem to have some kind of mix of generated lookup tables and handwritten logic

molten galleon
#

oh boy, I found the icu::Edits class and icu::CaseMap::utf8ToUpper, no need to do any complex casemapping crap myself

#

I think now I've covered every core font styling feature I wanted to add

#

and only 2 layout features left for later, vertical text and tab alignment

#

so now it's time to fix some bugs and start on glyph rendering

old harbor
#

what about angled text πŸ˜›

#

i do have some 45deg text in some of my excelsheeterinos hehe

molten galleon
#

like globally angled?

#

iirc there's a param in the GDI font api to set individual character spins

old harbor
#

ye globally, as in the whole text in the cell is angled

molten galleon
#

that's on the UI layer to do, and the glyph renderer to make not look aliased as fugg

#

since you just get rect positions in the space of your parent textbox

#

I found what it was called again, escapement and orientation

#

orientation is the rotation of the baseline (so the angle of the string) and escapement is the rotation of each char

#

so to draw a string at an angle, you set orientation = escapement, otherwise your characters are slanted

#

and I assume the reason it does this is probably because your glyph in your atlas has to be rendered as (char = 'A', font = ..., escapement = 45) so that the final bitmap doesn't alias, which is maybe neat to have support for, but I think in a game engine you'd rather handle that with your glyph shaders

#

with an SDF or vector based glyph

old harbor
#

tell that eve online players

#

with their excelsheets in game πŸ˜›

#

jk

molten galleon
#

hmm I feel sort of dumb, for a while I thought scaled fonts had some hidden internal scale factor, but I realize I knew how to calculate font metrics the whole time, given a ppem size (4/3 * pt size), it's just upem / ppem

old harbor
#

@radiant spear feel free to pester DR about font isms too, he knows his shit most on this server i think πŸ™‚

molten galleon
#

gonna interrupt the regularly scheduled text content to post about sails so I don't clutter other people's threads further lol

#

that convo got me thinking about taking another crack at modeling sail force

#

and generally you have these 2 main forces

#

plus your air drag which is proportional to v_effectiveWind^2 in the direction of -v_effectiveWind

#

I realized just now you could probably get a good enough approximation of the lift using standard wing models and using the chord/sweep/twist/whatever other descriptors of standard wings you have to interpolate the models, and just account for the fact that when v_effectiveWind is parallel to t_sail, the force is 0 because the sail goes flat, and it flips if dot(v_effectiveWind, n_sail) < 0

#

this btw is why you see a peak of power somewhere around a broad reach, since you get the maximal combo of lift and reaction

#

@left junco if you were further curious about the model

#

there's some interesting behaviors that arise from it too, since in either case the power is delivered in the direction of the sail's normal, but your forward speed is proportional to dot(d_ship, n_sail), so with an "idealized hull" with no lateral drift, there is an optimal angle to maximize the force delivered in the forward direction, but that can change with the lift force since that's a function of effective wind, which depends on your speed, so there's a feedback loop

#

and you'd probably have to solve a differential equation to find its optimal stable state

#

I might have better luck figuring this out 'acquiring' a book on aircraft physics instead of trying to get one of the rare sailing oriented ones, dunno why I never thought to do that

left junco
#

I'll read this in a bit

#

But yeah it could be helpful

molten galleon
#

back to text stuff, I've been coping with the fact that I can't seem to tune my MSDF generation by copying the windows menus for my debug vars

#

and it's a little compressed in the video so it takes away from it, but man my MSDF settings suck

#

I've been picking through the code in msdf-atlas-gen to match what he does

#

but I haven't quite got it

#

and I want to at least get a fair setting of MSDF before I compare it to Qin or direct curve rendering

molten galleon
#

so I tightened it up a tiny bit and double checked that my math matches msdf-atlas-gen, which it does, but MSDF is still fuzzy at large fonts and ugly at small fonts, I think this is down to how I'm rendering it perhaps

#

to render it I referenced this article which referenced in turn this github issue on msdfgen

Medium

MSDF is an efficient, modern way of drawing fonts at any size and this article looks at how you can implement it in your application!

GitHub

Hi Chlumsky, I noticed some aliasing artifacts when rendering fonts using msdfgen. By changing the shader a bit, I was able to improve the rendering quality: float sigDist = median( sample.r, sampl...

#

which gave me the following code:

vec3 msdf = texture(u_texture, v_texCoord).rgb;
float sigDist = median(msdf.r, msdf.g, msdf.b);
float w = fwidth(sigDist);
float opacity = smoothstep(0.5 - w, 0.5 + w, sigDist);
    
outColor = vec4(v_color.rgb, v_color.a * opacity);
#

hmmmmmmmm

#

might need to open it as a generic GP question because I don't really know how antialiased SDF rendering works

molten galleon
#

hmmm I think I'm getting there except for this annoying artifact

molten galleon
#

a lot of experimentation and yet I think I had the best solution the whole time

#

I think small letters just suck with SDF fonts

#
float w = fwidth(sd);
float opacity = smoothstep(0.5 - w, 0.5 + w, sd);

seems to just work, and is especially good for upsampling

#

this article with its more primitive version seems to show it even works in 3D

#

wonder what I can even do for downsampling, I tried generating mipmaps (automatically) and that didn't really give me much

nocturne hill
#

if you have chunky pixels you need to do subpixel rendering which can get real headacheful with blending and stuff

#

but it's just evaluating the sdf at 3 points instead of 1

nocturne hill
#

I don't think pretending SDF is a signal and low pass filtering it is a good idea

#

but let me think for a few seconds

#

ah actually

#

right

molten galleon
#

whatever glGenerateTextureMipmaps uses, I thought about it for a second too and since SDFs discretely sample a lienar thing it seemed fine

#

it didn't even seem necessary actually

nocturne hill
#

anyway

#

I don't think you can do anything at all with SDFs

molten galleon
#

though it makes sense overall why small images suck, you completely miss the important samples that describe the edges

#

potentially

nocturne hill
#

aliasing when letters are small is because you can't represent the edges or whatever

#

SDF can't help you there

#

well

#

you don't need normal images

#

you can just sprinkle some TAA at it

#

assuming you're drawing in 3D

#

also >using gl

#

cringe!

molten galleon
#

this is my shitty test program and not my main vulkan renderer so my goal is to just have as small of a renderer footprint as possible

nocturne hill
#

ok

#

well anyway

#

in 3D I just wouldn't care and let TAA eat it

#

but if you do just switch to normal bitmap with full mip chain

#

well I guess there's also the sucky case of when you're doing a very anisotropic sample

molten galleon
#

yeah, it seems good enough overall for 3D but I guess the lesson here is you can't have MSDF be a 1 size fits all solution for 2D text and 3D

nocturne hill
#

sdf just kinda sucks tbh

#

msdf too

#

msdf actually even worse because the algorithm is something crazy, slow and falls apart in some cases

molten galleon
#

yeah, I'm hoping the Qin paper that jasper mentioned might at least be a saner formulation of MSDF-like stuff

nocturne hill
#

just rasterize paths as is

#

no need for this msdf bs

#

you can also modify your path rasterizer slightly and have it render sdf

molten galleon
#

that's what I will probably end up doing, but the Qin thing seems interesting to at least try, esp. if it ends up being better than MSDF

nocturne hill
#

this way you can also generate bitmap representation at runtime + mips

molten galleon
#

since everyone seems to shill MSDF

#

and I don't think there's a publicly available implementation of it anywhere

nocturne hill
#

I'll take a look once it downloads LMAO

#

the pdf I mean

#

but it's probably a nexttonothingburger

#

ah nvm found the pdf that opens quickly actually

molten galleon
#

just paste the link into scihub

nocturne hill
#

ok I think I understand what it's doing

#

IIUC there's a list of lines per cell in the grid and they compute min or max { distances to all lines in the list }

#

or segments

#

something along these lines

#

hmm yea it's segments

#

section 5 goes in detail

molten galleon
#

yeah this is pretty weird and itneresting, this is the first time I actually inspected it closely

#

the way it was initiallly described to me was as some kind of much earlier proto-MSDF but with a saner distance function

#

the data representation seems a little funky for modern use though with the voronoi diagram and the indirection

nocturne hill
#

why

molten galleon
#

there's some kind of per-texel indirection and linear search

#

so each glyph seems to point to an index in the line segment table and then it also looks at the next 4 neighbors

molten galleon
#

been rolling the Qin stuff around in my head for a bit and I'm starting to appreciate the elegance of it

molten galleon
#

FUCK

#

I have 2 projects, 1 that's just a tester for my lib and 1 that's my actual engine

#

the tester can link the library and run code, my real engine tells me it can't find libicudt

#
ld.lld: error: undefined symbol: icudt73_dat
>>> referenced by libicuuc.a(udata.cpp.obj):(.refptr.icudt73_dat)
#

and I even set up my tester to have a sub-library like the LibEngine in my main repo

#

this literally makes 0 sense

#

the raw compile commands even reference the same libs in the same order

old harbor
#

are you on windows still?

molten galleon
#

yep

old harbor
#

delete build/ (or whatever $(IntermediateDir) in vs perhaps?

molten galleon
#

ok now I'm really confused

#

I'm building with gcc

old harbor
#

with cmake?

molten galleon
#

I copied the libicudt.dll.a from my working project to the non-working project

#

yep

#

not even the actual dll, just the dll.a file

#

and that fixed the build

#

so for some reason, just my engine project is building a bad dll.a file

old harbor
#

dll.a should just be the forward for the dll

molten galleon
#

very consistently

old harbor
#

hmm do you have a rogue setting somewhere in your cmakelists perhaps

#

which switches BUILD_SHARED or something

molten galleon
#

I shouldn't

#

hmm brb

#

I will get back to this

old harbor
#

: )

#

welcome back btw hehe

molten galleon
#

ok I came back, made a new cmake build directory and config'd it, and that directory doesn't have this issue

#

and I did try doing ninja clean in my old one and rebuilding completely

#

so I guess my old build dir had some magic state in the cmake cache that told it to fuck my shit up

#

average saturday morning C++ session

old harbor
#

: )

#

deleting build/ only didnt do the trick?

molten galleon
#

I just made a build2 in case I had some kind of untracked junk in my build dir I wanted to copy over

#

which is essentially the same as deleting build/

old harbor
#

ah

molten galleon
#

I don't think there's any particular setting that would've caused this, I don't have BUILD_SHARED_LIBS enabled in either, and my ICU setup is handrolled to build everything static except icudt.dll

molten galleon
#

so, asset manager = finally integrate LibRichText into my engine = set up a fresh text atlas using my new UploadPool = I don't have the desired subregion upload behavior

#

I think I understand why devsh has a synchronous mode for his upload pool

#

however there is no way I'm waiting on a semaphore to upload glyph subregions

wet drift
#

we don't actually wait per subregion/glyph

#

timeline semaphores are truly bae here

molten galleon
#

yeah that's the conclusion I ended up coming to

#

I added a "semi-synchronous" flush() operation to my upload pool where it will wait on intermediate submissions if it needs to wait to stage more upload jobs, but it won't wait on the last submission

#

so if there's only 1 submission required it doesn't wait at all

molten galleon
# wet drift timeline semaphores are truly bae here

btw, if I issue my upload submissions w/ a final transition barrier and it's issued to the graphics queue, I don't have to wait on the semaphore in my main render submission because it's also on the same queue, right?

wet drift
#

yes

molten galleon
#

epic

molten galleon
#

briefly derailing my own thread to come to what is perhaps the most obvious realization

#

displaying an ECS style level in editor is as simple as displaying the entities in their transform hierarchy in the explorer/tree view, and displaying the components in the properties pane, with each of the subheaders (Data, Selection, State, Text in the examples) just being representatives of the components

#

also >OpenTypeFeatures
I need to cook...

old harbor
#

: )

#

dotnet peddler spotted

molten galleon
#

weak, pathetic

#

I sleep

old harbor
#

you can even use "Category" on the left side to "group" properties into categories or "components"

molten galleon
#

yeah, that's how it seems to be used here

#

but this is a pretty convenient way to hijack this scheme to display ECS instead of a polymorphic node tree

old harbor
#

yeah

#

you cheapskate

livid patrol
#

It also helped to look at existing things that do something similar (blender, ue)

molten galleon
#

I didn't know blender was ECS-like in any way

#

it has pretty strongly typed nodes afaict

left junco
#

Now you just have to add all the serialization code bleakekw

#

Or the reflection rather

molten galleon
#

that's for my python scripts to do

left junco
#

Aha smort

#

I forgot you were codegen-pilled

molten galleon
#

behold the IDL for my UI classes

left junco
livid patrol
#

Idk how blender is structured internally

molten galleon
#

oh, yeah the explorer+properties dual-pane thing is pretty common once you start noticing it

#

I think it originates in VS actually, and it kinda makes sense that application developers would be most inspired by the tool they're using to make the app

#

"hmmm, I'm in the forms editor, but how should I make my application look?"

old harbor
#

wpf makes this also super sexy

#

you design the control you want for a specific type, add it as a resource, and have a DataTemplateSelector select it based on the current object'type

#

its just a big switch block

tropic shoal
#

Blender is more ecs base internally as the data model is mostly C and not cpp. The python api is a generated wrapper on top of the c data model.

molten galleon
#

that's pretty interesting, also it does look like blender is keen on visually displaying "has a" relationships in the tree view

#

I hadn't really thought too deeply about it

molten galleon
#

fuck I found a fun case θΆ³οΌ©οΌ«.R

#

the I and K are full width (so funky shift-jis compatability stuff), but they register as latin so it selects the latin face instead of the CJK face

#

I think I might just fix this by having my CJK face be the lowest priority general fallback for my latin face

#

ok that did it

old harbor
#

that explains why CJK people when writing latin text always have ugly looking ms sans serif stretched, wide text

molten galleon
#

adding engine features vs. scrolling through famfamfam/fatcow icons to find the perfect ones

old harbor
#

ah man that looks neat

#

famfamfam do be timeless

molten galleon
#

in my recent bikeshed rabbithole I started NIHing an LSP client to get syntax highlighting, only to find out LSPs don't feed you basic token info

#

so now I have an LSP client and need to roll some kind of tokenizer and syntax highlighting provider on top of that

#

and I think I was supposed to be working on asset libraries or something...

left junco
#

What's LSP

old harbor
#

langauge service/server protocol

#

the thing which gives you syntax highlighting or intellisense in your vim or else where

left junco
#

Ah

molten galleon
#

it doesnt' give you syntax highlighting, but it does give you a lot, like go to declaration/definition, code lens, semantic tokens, etc.

old harbor
#

it could though

#

LSPs usually run on a syntax tree

molten galleon
#

but what I found in practice was that the document tokens request doesn't actually give you tokens with a fine enough grain on the range in text for highlighting

#

it seems like both document tokens and semantic tokens are for augmenting syntax highlighting in ways that simple regex passes from textmate grammars wouldn't

#

and the recommendations all basically point to flagging tokens with a preliminary highlighting pass then augmenting with semantic tokens

old harbor
#

probably makes more sense to just regex the text you see right now in that editor view, and color that accordingly, rather than trying to color the whole 10mb worth of your current.cpp πŸ™‚

molten galleon
#

yeah, lsps support sending partial text and partial requests afaik

old harbor
#

the asset system of yours will greatly benefit from it either way πŸ˜›

#

i will allow the side quest

molten galleon
#

@balmy gulch

balmy gulch
#

ah it was this KEKW

molten galleon
#

all this text talk reminds me I need to finish my articles

old harbor
outer dune
#

It would be amazing forgelove . You are literally working on stuff that very very few people work on and most take it for granted. Information is scarce and often times mixed (just looking around, the name "font" is used in so many different context for so many different people).

molten galleon
#

tried to make it as short and direct as possible but have enough narrative structure that it wasn't just a dump of "WATCH OUT FOR THIS"

#

I'm kinda worried that it's perfectly clear to me but way too terse for anyone to actually find helpful lol

old harbor
#

that screenshot of your ui thing which looks like a winforms knockoff, needs to go into a neat library eventually

#

so that we can make use of it too : >

outer dune
#

I was checking in your library how you handle text line breaks, but I wasn't able to find it.
Any tips/pointer to make correct line breaks ?

molten galleon
#

I piggyback off of SheenBidi here, this is in my unpublished part2

molten galleon
#

I didn't even write that part in my blog yet, actually

outer dune
#

Well, I was looking for both. I will start by looking how sheenbidi works and what it does.
I'm looking at the whole concept here, soft/hard line break as well as hyphenation.

molten galleon
#

basically everything is in build_layout_info_utf8.cpp

#

I rely on SheenBidi for hard breaks, (aka character-induced linebreaks from CR, LF, LSEP, PSEP), those are really easy to iterate manually because you just search for those 4 characters

#

but SheenBidi just does it, so I let it happen

#

for soft linebreaks, it's basically how you'd expect it to happen conceptually - you do all your shaping first, then you try to find the max run of characters such that width(run) <= textAreaWidth

#

but the trick is you use an icu::LineBreakIterator to find safepoints to break the line

#

so it's max(safe linebreak point) s.t. width <= textAreaWidth

outer dune
#

Much much thanks πŸ™
What is s.t ?

max(safe linebreak point) s.t. width <= textAreaWidth

molten galleon
#

such that lol

outer dune
#

Ah lol

#

So it could be done this way :

  1. Shape the whole text through harfbuzz
  2. Find the closest linebreak to textWidth thanks to ICU
  3. Discard everything beyond into a new buffer
  4. Re-Shape the line
  5. Validate or re-iterate from 2
  6. Go down a line through lineHeight found in the font metrics by scale
  7. Restart from 2 for the rest of the text
#

For soft line breaks.
Hard lines breaks can be a pre-pass to that

#

I think I get the gist of it. I will look into your build_layout_info_utf8.cpp file πŸ™ forgelove

molten galleon
#

you don't reshape the line, the shaping data is always valid, you literally just cut it

#

that's the contract that the icu::LineBreakIterator helps you fulfill - it tells you based on unicode data what is visually safe to cut, so you never end up making a slice in a point that'd require reshaping

outer dune
#

Oh wow

outer dune
#

Just wanted to say a huge thanks for the pointers. πŸ™
I finally got the text rendering at an acceptable level and I would not have succeed without your help.

molten galleon
#

briefly back on the horse

#

maybe I'm the last to find this out, but apparently in text boxes the tab width is entirely based on visual metrics

#

so a 4 space tab aligns to sizeof(' ') not the 4th printable character position

#

and CSS gives you two options for it, either supplying the integer space count or specifying the size outright

#

notepad (or whatever text box it derives from) seems to have some kind of particular curve to the tab width in px as a function of the font size, I can't tell where it derives from, but maybe I measured wrong with sharex

#

it seems generally that it's ~6-6.25 pixels per font pt, so a 12 pt font has a 72px tab, an 8pt font has a 50px tab, a 72pt font has a 450px tab, with various values in between

#

I should probably graph it

#

it's also approximately, but not quite, 19 spaces

#

so essentially the tab step is applied after shaping, during the point where you sum the advances, you make the advance of any encountered \t chars equal to alignas(tabWidth) cursorX, however the slightly complicating part is that if you have to softbreak the line, I think you have to readjust the size of the tabs

outer dune
#

Everytime I see a post in this thread now, I expect greatness and good findings πŸ™
I believe if you have a soft break, in some cases that just considered part of the end of the line 1.
In fact it seems... harder to define ? I just tested in Notes. After a certain amount of tabs, it is displayed in line 2.

It seems to be if we are lower than maxWidth, allow one tab at maximum in the line (not visibly displayed) even if it gets bigger than maxWidth. If we are higher or equal maxWidth, 1 tab gets to the next line and is visibly displayed. This would currently break the display in my engine. facepalm

molten galleon
#

the status of softbreaks as lines is its own thing

#

but in this context because tabs are a purely visual operation, softbreaks affect it because the tab length affects the length of each line

#

also 99% sure tabs will break your display by default because by default they resolve to a tofu

#

well not "break" but look ugly

molten galleon
#

got myself worried for a second, but it looks like L1.1 enforces the fact that a bidi shuffle won't affect having to recalculate the dynamic width of tabs

left junco
#

You could probably make a living off of this library lol

molten galleon
#

I'm going to make the much smarter decision and use this technology solely for textboxes in a game engine that will never ship a game

left junco
#

Based

molten galleon
#

also idk why I didn't think to do it before but I've also got the CSS spec open

outer dune
#

Yeah the best implementation AFAIK is still CSS spec (so far that I found)

molten galleon
#

uh oh, codebase is calling in my technical debt

#

I can't properly accumulate line-space offsets to do tabs because of my shitty combination of bases I'm working in, I basically have some data buffers in whole-string space, paragraph space, logical run space, and some redundant representations of glyph widths

#

mainly because I started from ICU layoutex and I wanted to make sure I didn't fuck up the conversion to utf-8

#

but now I think it's refactor and rewrite time

molten galleon
#

I like how the CSS spec complexity is literally split up in levels

#

I wonder where the spec for text editing is, maybe it's in the html spec

#

I think I just don't know enough about how you define a text box in HTML+CSS to know who'd be responsible for specifying it, I found the html spec for input type="text" but it doesn't specify the actual input controls, and the CSS specs seem to only cover display

molten galleon
#

I guess the actual editing is left as an exercise to the implementer because individual stuff like the cursor is roughly spec'd but I guess it covers for onscreen vs. physical keyboard

#

or other funky input methods

molten galleon
#

hmm I really left a big stinky for myself, I didn't document the directionality any of my data is expected to be in so now I have to figure out what steps need what data in what order

#

my final layout info is strictly in (post-UBA) visual order, the data from shaping (which HB gives you in visual order in logical-run-space) I homogenize into logical order (following the order of the string) and then potentially reverse again into visual order based on the bidi level

#

the position data (float) stays forever in visual order without reversal but I currently generate a list of widths (in 26.6 fixed) in visual order that I want to get rid of but that might mean doing something funky like storing my secondary axis in visual order and passing it through to the end, but storing my primary axis in 26.6 fixed in logical order so I can do line breaks and then flipping it during the final run write

#

either way I need to document this mess...

molten galleon
#

did you know there's an hb-cplusplus.hh harfbuzz header with a unique_ptr for the buffers?

#

I've been obsessing over how to minimize the number of copies/reversals I need to do in total but maintain my logic, yet it seems like firefox calls hb_buffer_reverse to throw things into logical order... and I can't seem to tell what chromium does because they definitely don't call that, and I don't see them doing much directional branching

outer dune
#

That is interesting!

molten galleon
#

I also need to set up some benchmarks for conditional reverse iteration and binary search

#

I think I remember from that andrei alexandrescu video that binary search is slower up to like 4 or 5 digit arrays due to the power of cache

#

could be completely missing the mark with that estimate though

molten galleon
#

hmm, this might be a perf neutral thing but I think I'm gonna give up on this logic, despite having gotten it to work partially successfully

#

I have decided to say fuck people who use RTL languages, you will pay the price of an O(n) reversal of 32 bytes/glyph

#

and by perf neutral I mean "so far it's been inconclusive in the noise of my profiling" which may or may not have shit methodology

outer dune
#

I don't know if there is a lorem ipsum for RTL but that would be a good test. Like you, I expect that to be negligible(until it isn't). You are doing one search that is O(n) right ? That should be fine.

molten galleon
#

I just generate random mixture strings at roughly 50/50 distribution but realistically RTL langs are wayyy less common

#

the problem is basically I have 2 choices:

  1. convert everything to LTR order, perform logic on it, convert it back when necessary
  2. keep everything in native order but have a bunch of branching logic that looks like if (RTL) { ... } else { ... } but only have to do reversals during the really rare case where logicalRunDir != visualRunDir
old harbor
#
.mΟ…ΙΏodΙ’l Ɉƨɘ bi minΙ’ Ɉillom ɈnΟ…ΙΏΙ˜Ζ¨Ι˜b Ι’iΙ”iΚ‡Κ‡o iΟ…p Ι’qlΟ…Ι” ni ɈnΟ…Ζ¨ ,ɈnɘbioΙΏq non ΙˆΙ’ΙˆΙ’biqΟ…Ι” ΙˆΙ’Ι”Ι˜Ι’Ι”Ι”o ɈniΖ¨ ΙΏΟ…Ι˜ΙˆqΙ˜Ι”xƎ .ΙΏΟ…ΙˆΙ’iΙΏΙ’q Ι’llΟ…n ΙˆΙ’iΟ±Ο…Κ‡ Ο…Ι˜ ɘɿolob mΟ…lliΙ” ɘƨƨɘ Ɉilɘv Ι˜ΙˆΙ’ΙˆqΟ…lov ni Ɉiɿɘbnɘʜɘɿqɘɿ ni ΙΏolob Ι˜ΙΏΟ…ΙΏi Ι˜ΙˆΟ…Ι’ Ζ¨iΟ…α‚§ .ΙˆΙ’Ο…pɘƨnoΙ” obommoΙ” Ι’Ι˜ xɘ qiΟ…pilΙ’ ΙˆΟ… iΖ¨in Ζ¨iΙΏodΙ’l oΙ”mΙ’llΟ… noiΙˆΙ’ΙˆiΙ”ΙΏΙ˜xɘ bΟ…ΙΏΙˆΖ¨on Ζ¨iΟ…p ,mΙ’inɘv minim bΙ’ minɘ ɈU .Ι’Ο…pilΙ’ Ι’nΟ±Ι’m ɘɿolob Ɉɘ ɘɿodΙ’l ΙˆΟ… ɈnΟ…bibiΙ”ni ΙΏoqmɘɈ bomΖ¨Ο…iɘ ob bɘƨ ,Ɉilɘ Ο±niΙ”Ζ¨iqibΙ’ ΙΏΟ…ΙˆΙ˜ΙˆΙ”Ι˜Ζ¨noΙ” ,ɈɘmΙ’ ɈiΖ¨ ΙΏolob mΟ…Ζ¨qi mɘɿoβ…ƒ
``` there you go @outer dune
outer dune
#

What.The.Hell

molten galleon
#

what even is that language

old harbor
#

lorem upsom mirrored

outer dune
#

Is that parsed RTL ? I 99% sure it's LTR mirrored

molten galleon
#

it looks like all LTR glyphs to me

#

the only RTL languages in use today are arabic and hebrew

old harbor
#

this was just me do a little trolling, found a mirror text generator and pasted in lorem ipsum

molten galleon
#

then a bunch of ancient/obscure languages that are just technically listed

old harbor
#

yeah there are only like 8 or so common languages which are RTL by default

outer dune
#

I would go option 1. Make everything LTR and convert back if necessary. The other option will introduce a lot of conditionnal branching I believe.

#

Which might be a pain for maintenance later on

molten galleon
#

yeah I think I'm gonna stick with option 1 since that makes the code simplest

#

option 2 gives me literally double the code in certain places despite being technically slightly better

#

plus this is microscopic in terms of execution time, most of it comes from allocation (that I can improve)

#

and the rest comes from hb shaping, but that cost is basically immutable

molten galleon
#

I think the best data model for positions might just be to only store widths, in logical order

#
  • the offset0 for each logical run
#

though the more I test the less I'm sure I need to care about that

old harbor
#

then what do you need to care about?

#

if not widths and offsets

molten galleon
#

just the widths, in aggregate

old harbor
#

ah

molten galleon
#

basically, (and the more I investigate this the less I'm sure about it)

#

the concept of separating offset and advance is to homogenize layout math over vertical and horizontal text

#

and the aggregate offsetX + cursorX (where cursorX is the sum of all advances 0..n-1) is the only critical position for each glyph

old harbor
#

you could pick the one right now which your belly tells to use

#

also the one which might be more readable (code/shader wise)

molten galleon
#

I'm picking widths because it makes tabs the easiest to implement

#

this is internal representation while I'm building the text, the final object stores positions (since that makes rendering the fastest/simplest)

#

because when you're dealing with tabs you have to basically apply realWidth = isTab ? alignUpToNext(lineWidthSoFar, tabWidth) : width

old harbor
#

yeah

molten galleon
#

but the more I think about it, the LayoutEx code I referenced originally already takes the assumption that offset0 = 0 because otherwise accumulating widths wouldn't even get you the line width

#

and also a line split down the middle assumes that the offset/advance of the glyph - 1 doesn't matter

#

I think I'm gonna have to store them heterogenously based on what's the primary and secondary axis

#

because basically these widths are what line breaking (and your cursor position) care about

#

and the green arrows are the offset on your secondary axis, which I guess I could also store as widths but I definitely would have to store o0 separately in that case and it would effectively just be difference encoded

molten galleon
#

ok epic, having a hybrid width/position data model was definitely the way to go, actually simplified my code a bit

#

and now I have no redundant position/width representations in the data, at my goal of 16 bytes per glyph

#

which is like the theoretical minimum I can achieve uncompressed/unquantized, and doesn't count the memory required for stuff that doesn't scale directly with glyph/character count

#

and that raises some pretty interesting stuff to think about when you consider absolutely huge blocks of text, since they require at least 16x the memory to display (and I am being more efficient with my redundant data than chromium or firefox for sure)

#

so rendering absolutely fuckhuge text files (obviously) requires some pretty aggressive culling, and in turn some pretty efficient scans/heuristics to determine what you wanna display in the viewport

#

and I wonder what you'd do there with soft line breaks enabled, since just scanning the string for newlines up front wouldn't necessarily give you accurate positioning for what your viewport is looking at

old harbor
#

isnt it just counting

#

you always know how many characters fit in your viewport

#

and the viewport shows a span of text

molten galleon
#

and some aren't even visible

#

imagine the world's most adversarial 10GB logfile where the first GB is just several billion ZWNJ

#

so line 0 starts over 1gb into the file

#

ο·½

#

𒐫

#

and imagine some of these

#

notepad++ knows the right thing to do though

old harbor
#

ah i forgot about those things

#

hmm

#

you do seek within the bytestream of your text still

#

when you scroll through a file for example

#

then you read the span, and inteprete it somehow, and afterwards you know all the characters which are supposed to be visible and whichnt

#

hmm this isnt trivial πŸ™‚

molten galleon
#

the way I'd do it is just to populate it lazily, first by scanning the entire string for hard newlines (with SIMD + multithreading you could probably rip through several GB/second) and use that as the initial assumption, then actually try laying out whatever's currently supposed to be visible in your viewport

#

then use the softbreaks from that as an "adjustment"

#

but this also makes me wonder if I can use simdjson/simdutf8's code tricks + parsing UnicodeData.txt directly to do some kind of codegen for the theoretical most optimal text prescan possible, based on the actual unicode class queries I need

#

but that's a project for the far future

old harbor
#

man, a true brainworm hole

molten galleon
#

yeah Big Textℒ️ is its own separate rabbithole on top of regular small string rendering

left junco
#

I'm gonna have to make a mesopotamian faction whose soldiers talk in cuneiform

molten galleon
#

the dispute over low quality copper has gone violent

molten galleon
#

replaced building vectors and iterating them with iterators for all of my logical run building, benchmark result: inconclusive

#

however I know in my heart that like 8 less std::vectors get allocated per string, and that is enough to bring me joy

#

need to probably run profiling with MSVC so I can get the cool heatmap on the code and see what it tells me beyond the obvious hb_shape and "why are you allocating a hb_buffer and icu::LineBreakIterator per string shaping?"

molten galleon
#

I see why no one bothers to make nontrivial optimizations to anything outside of shaping now, lmao

#

something minorly stupid and annoying though is that calling icu::Locale::getDefault() too frequently has a tangible cost

#

though other than that, the main thing that registers outside of hb_shape (as like 3% of total time lol) is the allocation costs of SheenBidi

#

actually this was a bad test, since the profiling grouped all sizes of string (up to 1MB) into the same calculation

#

for my smallest test (string size = 64) hb_shape is 'only' 71.81% of the execution time

#

but as expected this only magnifies allocation snafus, e.g. I don't do any reserve calls on my final layout struct, but I do for my temp build vectors, and as such there's a huge cost of allocation from appending positions and glyphs

#

the allocation for the icu::LineBreakIterator is also significantly more expensive than the SBParagraph and hb_buffer, but that's fine because I can factor it out, and seemingly using it is mostly fine

#

and interestingly some attrition from using UText in computing subfonts instead of a dedicated UTF-8/16 iterator

molten galleon
#

profiling even shorter strings basically tells me what I expected - initialization and allocation dominates even harder and small buffer optimization (and finally not being lazy and factoring out the reusable stuff) will get me basically the only meaningful wins I can conceivably get

#

however... will I eventually try to create some kind of simd version of sheenbidi just so I can say I beat it in benchmarks? perhaps

#

and hb_shape in turn is basically bottlenecked by however quickly I can read glyphs

#

so, harfbuzz is itself pretty impressive in its optimization

outer dune
#

I'm happy to hear that I found the same issue. The bottleneck is also hb_shape on my end.

#

And I can't do much in there (or I don't want to look inside)

#

The other one I found was UText and LineBreakIterator that I heavily re-use when possible.

molten galleon
#

yeah, though luckily it's the LineBreakIterator allocation that's the majority of it, not even setText or preceding

#

finding sub-fonts is the only place I use a UText where I don't need to, just because I wanted to unify my code for handling UTF-16 for doing regression testing against ICU

#

but now that I'm confident I don't need to use ICU's stuff anymore I can just drop that

#

but UText is the best interface you have for the break iterators, and it's fine enough there

molten galleon
#

with the icu::LineBreakIterator creation factored out, hb_shape is about 80% of an 8 char string, with a scant few % (5 or less) in allocations for appending final data, get_sub_font, and SBAlgorithmCreateParagraph

#

still like 13% speedup just from getting rid of it lol

molten galleon
#

ack

#

yet another (several) annoyances I didn't account for

#
  1. SBParagraphDefaultLTR/RTL doesn't seem to correctly swap the run dir (I assume because these characters are strongly LTR), so to specify what I think are level overrides in ICU I need to explicitly specify base paragraph levels of 0/1 (equivalent to CSS direction: ltr | rtl)
  2. harfbuzz is actually smarter than I thought for my own good, and actually resolves glyph mirroring for me, however I think this ends up fine overall because the selection I posted above resolves to llo>, which is correct
#

#1 is a tiny bit annoying because the UBA spec has no definition for what a "default" level is supposed to mean...

#

ah

molten galleon
#

big data model refactor is done, L4 character mirroring is confirmed to be a non-issue, directionality overrides work, and tabs now work, also fixed an issue with string-terminating newlines

#

though what I'm seeing here is that I need to revisit subfont selection because it's able to use Twemoji for the numbers and that gets on my nerves

old harbor
#

can you render composite eomjis too?

#

like a female black vampire

#

i think thats vampire+darkskintone color + zwj + female sign

#

ah yeah

molten galleon
#

yeah I can

old harbor
#

where is the female coloured vampire? πŸ˜„

molten galleon
#

didn't realize you sent a specific emoji lol

old harbor
#

hehe

#

that is cool

molten galleon
#

glad I added clipboard support

old harbor
#

data point magic

molten galleon
#

but this is more of the macro scale, front-end of font picking where you match a named family to some file or files, not necessarily to figure out what file can render a substring

old harbor
#

somehow fonts feel as complex as operating systems, and browsers to me

#

AAA engines are nothing compared to that either hehe

molten galleon
#

making text "just work" is a huge part of browser magic

#

and OS magic, for that matter

balmy gulch
#

this is epic, but how complex of text can you render without dying?

molten galleon
#

idk, hypothetically anything

#

my coverage is pretty good, I can do emoji, I have inline markup for underlines, italic, bold, smallcaps, subscript, superscript (with synthetic glyph gen for all those if they're not available in any font files)

#

emoji with any joiners, zalgo text

#

I render tabs now

#

the single biggest thing I'm missing atm is vertical text, and I've been working up to it in my latest refactor

balmy gulch
#

nice

#

how do emojis work btw

#

are they like regular glyphs with color or something else

#

because if they are regular glyphs then they must have a whole lot of codepoints innit?

molten galleon
#

5

balmy gulch
#

no I mean

molten galleon
#

the first 2 are U+1xxxx which means they're 4 bytes I think

balmy gulch
#

points in the glyph

#

the actual unicode scalars you use to bezier the thingy

#

or is that not how this works

#

(my understanding is extremely limited)

molten galleon
#

oh yeah probably, atm I have freetype rasterize them precomposed because I've been too lazy to deal with too many actual rendering techniques

#

since I've been focusing mostly on layout

#

but you can also have freetype give you them as mono-color layers

balmy gulch
#

smh can't believe you didn't make harfbuzz and freetype from scratch

molten galleon
#

(really you get the outline 1 layer at a time + a color and you have to rasterize it yourself either through freetype or some other way)

#

the thing I always say is that it's amazing how much you still have to do even if you grab all the libs you need from the start

balmy gulch
#

but yeah cool stuff

#

yeah text is a lot of work

molten galleon
#

I literally have 4 giant dependencies and there's a ton I need to do on top of that

#

freetype+harfbuzz+icu+sheenbidi (which I technically wouldn't need if I wasn't stubborn about making everything work natively in utf-8 without utf-16 conversion)

#

it gets a shade more annoying because I have to make sure the data I generate from layout is optimal and reconciles with being able to use it for text editing

#

stuff like, e.g. pango is pretty full-featured but read only afaik

#

just shits finalized pixels to a render target

molten galleon
molten galleon
outer dune
#

Oh god that’s horrible to implement

molten galleon
#

yeah, you have to split the string into first char and the rest, use the base metrics of the text (line height and ascender) to figure out the glyph metrics of the first char, shape the first char, then shape the rest of the text in 2 pieces, determining the split where it falls off the first block

#

luckily this is way, way out of scope

#

also hyphenated word breaking

molten galleon
#

straight up can't find what firefox uses for vertical line height (or I guess line width, as it'd be in this case), I suspect they might just use the horizontal metrics

#

meanwhile

#

in chromium:

left junco
molten galleon
#

So it looks like what you're supposed to do, roughly speaking, is hit the TrueType vhea table for vertical ascent/descent https://learn.microsoft.com/en-us/typography/opentype/otspec180/vhea
and failing that, CJK fonts may provide a vertical advance via the OS/2 typographic ascender/descender, which itself is a little rabbit hole because it briefly covers the OTF concept of the "ideographic em-box", which luckily I can just skip the fancy stuff for because its secondary metrics are just em-width
https://learn.microsoft.com/en-us/typography/opentype/spec/recom#tad
https://learn.microsoft.com/en-us/typography/opentype/spec/baselinetags#ideoembox

#

whoever wrote the FreeType docs decided to copy paste this line for TT_HoriHeader and TT_VertHeader which sent me on a bit of a wild goose chase because this is actually true for horizontal metrics, and in CJK fonts the typographic ascender/descender affects the vertical ascent metrics, just not the secondary axis

outer dune
#

What's the fallback on fonts that do not support the field ?
Can't you just assume "easily" assume a sized glyph which is offsetted by it's normal height * size increment ?
Tho, it will affect the following lines x offset... UGH. I don't even know what it gives in Kanji for Top to Bottom lines.

molten galleon
#

I'm thinking either you use the line height as the line width, or just the em-width

#

what I'm going to do is find a font that does have vertical metrics and see which value is generally closer

outer dune
#

KEKW it breaks some renderers as expected

molten galleon
#

is that a browser

outer dune
#

Yes, but it's a little bit of a hack, I'm using <span> instead of css ::first-letter

#

Works great for Latin caps

#

Won't work for those with y-bearing though

molten galleon
#

at least if you use spans, because I think CSS can't use the spans as context for a line (but I may be wrong)

#

but the ascender and descender of my lines are max(ascender) and min(descender), so the large glyphs stay on the same baseline, at least

#

they look dumb, but don't collide

outer dune
#

I'm checking but :initial-letter is not fully supported on all browser. ::first-letter is though.

#

Yeah, CSS prioritise line-space to be the same. I have the same result as you (bigger gaps between baseline) in my engine

#

At least for <span>, it seems

outer dune
molten galleon
#

MS word also does this

outer dune
#

Yeah, I think it's CSS being dumb (Firefox implementation) πŸ€”

molten galleon
#

you should try it in a chromium based browser and see what's different

outer dune
#

Chrome KEKW

#

Somebody stole someone else implementation
"Lemme copy your code, I will change it I swear"

molten galleon
#

here's a snippet from the firefox source

#

everyone is copying everyone else because there's no unified body of work for how text should be rendered

#

with the exceptions of tidbits explicitly in the unicode or CSS spec

#

which isn't much for either

molten galleon
#

because mostly the biggest change is customizing the linebreak logic to break at a different width for the first n lines

outer dune
molten galleon
#

it's hard to say pango is wrong because idk what they did and didn't copy, I should honestly clone pango and snoop in it as well

#

I think a lot of linux distros use pango as the OS text display

#

at the very least, I believe gtk uses it and thus everything that uses gtk probably does too

outer dune
#

Yes they do use pango in most (all?) Linux distro. I guess not a lot of people uses that part of it (if it's wrong in pango).
It's quite rare to see such usage anyway πŸ€”

molten galleon
#

pango calls writing mode gravity lol

molten galleon
#

so it looks like generally non-CJK fonts don't have vhea, but the vertical fallback for non-CJK fonts is the standard ascender/descender, not the emwidth or typographic one

#

I should actually measure what the browsers choose for CJK line widths, because based on this (the metrics from Noto Sans CJK jp) they spec vertical line height as = to emwidth

molten galleon
#

yet another minor annoyance about vertical text is that 0 on the Y causes the offset of the glyph to end up above the line, so you still need the normal ascent either way

old harbor
#

am i stupid or is discord too stupid to render arabeeque?

#

i wanted this

#

it fails to render Ψ§Ω„Ω’Ψ­ΩŽΩƒΩŽΩ…

#

uh

#

wait

livid patrol
#

gm habibi

old harbor
#

πŸ‡¬πŸ‡² habibi πŸ™‚

#

discord nick length is fixed : ( it cut off

#

it cut "the ruler" off hehe

livid patrol
#

Can you type something again

old harbor
#

something again

livid patrol
#

You did it too quick

#

I'm trying to get the "x is typing" thing

#

Ty

old harbor
#

ahahsdjalkjalksjdlkjaskljlkjdflksdjfklwjerlkwjlksdjfkljasdkljlkwjerkljsldkfjslkdfjlkjwerkljlksdjflkjsdfkjlkasjdlkjsdflksjdflkjsdflkjslkdjflksjdflkjslkdjflksjdflksjdflkjsldkfjlskjdf

#

πŸ˜›

livid patrol
old harbor
#

ja it cut "the ruler" off

#

because the name is too long

livid patrol
#

mixed rtl and ltr is funny

old harbor
#

heh it do be

#

Abd al-Malik

#

that does look neat on the dark background

molten galleon
#

this is what I see

old harbor
#

and with the larger fontsize

livid patrol
#

It do

old harbor
#

yep i oh

molten galleon
#

ah neat its arabic combining characters

old harbor
#

interesting

#

i also like how United Arab Emirates uses those more straight looking characters for their stuff

#

like some sort of monospace πŸ™‚

#

"Allah"

#

almost looks like some simple power tower in arabic heh

#

latin alphabet is so boring

#

do you guys understand the word sprachbund?

#

thats english apparently

livid patrol
#

no

#

"a linguistic area"

old harbor
#

i should have studied languages

#

rather than doing computer bs πŸ™‚

molten galleon
#

for some reason linguistics attracts programmers

old harbor
#

: )

outer dune
#

It's because we are just typing and talking about text all day frogs

livid patrol
#

C++ is just another lingu to istic

molten galleon
#

ok time to wake this thread up for some new features

#

so I was thinking how in order to append the results of layouts together (e.g. if you wanted to insert another GUI in the middle of your text, like I do), you would need to provide an initial offset to the first line to join up where the last text block ended... so I thought, "why not allow the params to have an offset and count so you can implement drop caps", and then I thought "wow it'd be crazy and overengineered if I just let you pass a callback to dynamically resize the lines"

so that's what I did

#

so now drop-caps are totally possible

#

and then following that up, I added text truncation (i.e. stop laying out if you go out of bounds), which I needed for a different feature, but the combo of text truncation and line resizing means that you can actually relayout text around obstacles very dynamically

#

(imagine I actually put the big E here )

#

that little implementation is a bit buggy because I half assed it for a proof of concept

#

but the important part is that I can

left junco
#

insane stuff

#

We need to think up some more exotic typesetting behavior that games might benefit from

molten galleon
#

when I was putting this together I could only come up with 2 things it reminded me of: greeting cards (where you have text conforming around some clipart) and newspapers with images slapped between the text

#

but my actual usecase is driven primarily by 2 things, 1 is embedding frogapprove images in frognant chat without needing a whole-ass typeface for them, and the other is things like this

#

where you have dynamic controls in your text

molten galleon
#

one more feature in this series: dummy blocks

#

I considered this a while ago, but basically I reserved a special font family that's not the invalid family, which gets special-cased to get skipped in layout and rendering, and I can provide an extra width and height for

#

its function sort of overlaps with setting the line offsets/limits/truncation, but it's much more specifically the sticker/custom emoji feature, since it can behave like any other character with a little bit of bridging to the editor features

#

and of course the rects are drawn in afterwards, and can be anything (more UI controls, images, etc)

#

it also behaves a little bit more like a microsoft word image insertion into text, rather than a tool for morphing text around a first-class asset

molten galleon
#

So I tried out RmlUi for the first time just now, I have it scoped out for a secret project