#J40 discussion

ruby fable
#

so, after wasting one full day with build systems, my next topic is probably a proper render system

#

essentially I would need something similar to libjxl's render_pipeline 😮

unkempt knot
#

Do you want to be able to do streaming decode with progressive rendering and all that? Because if not, things are quite a bit simpler...

ruby fable
#

I already have lots of different "planes" (typed bitmaps, really) around, so I figured out that I already need some sort of pipeline anyway

#

for now, J40 decodes global modular channels (if any) and produces per-group XYB samples that are immediately converted to sRGB before being integrated into the aforementioned global modular channels

#

there are multiple issues with this architecture; to name a few: J40 in VarDCT always creates u16 planes for color channels, while ideally they should be f32

#

and sRGB conversion should happen a lot later

#

honestly speaking, the point of having a pipeline is that I can easily verify and adjust how planes are combined and converted

#

that's the main goal of the pipeline; progressive rendering is a nice bonus but ultimately a side effect 🙂

unkempt knot
#

I think you're probably right

#

@minor lark is the render pipeline architect in libjxl so he might have some tips...

ruby fable
#

I had the following in my multi-month-old notes:

render ops:
J40_OP_LOAD planeref (-- plane)
J40_OP_COPY x y sx sy (dst src -- dst)
J40_OP_IDCT coeffref (-- plane)
J40_OP_UNSQUEEZEH (avg residu -- merged)
J40_OP_UNSQUEEZEV (avg residu -- merged)
J40_OP_UNRCT 
#

as you can see this is a render graph encoded in forth-like IL
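
A Forth-like IL such as this can be sanity-checked by tracking stack depth alone. The following is a minimal sketch, assuming a hypothetical op enum and a hypothetical `check_program` helper that mirror the stack effects written in the notes above (`J40_OP_UNRCT` is left out because its effect is not stated); it is not J40's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op set mirroring the notes; not J40's actual API. */
enum op { OP_LOAD, OP_COPY, OP_IDCT, OP_UNSQUEEZEH, OP_UNSQUEEZEV };

/* Stack effect of each op: how many planes it pops and pushes. */
static const struct { int pops, pushes; } effect[] = {
    [OP_LOAD]       = {0, 1}, /* (-- plane) */
    [OP_COPY]       = {2, 1}, /* (dst src -- dst) */
    [OP_IDCT]       = {0, 1}, /* (-- plane) */
    [OP_UNSQUEEZEH] = {2, 1}, /* (avg residu -- merged) */
    [OP_UNSQUEEZEV] = {2, 1}, /* (avg residu -- merged) */
};

/* Returns the final stack depth, or -1 if some op would underflow. */
int check_program(const enum op *prog, size_t len) {
    assert(prog != NULL || len == 0);
    int depth = 0;
    for (size_t i = 0; i < len; ++i) {
        if (depth < effect[prog[i]].pops) return -1;
        depth += effect[prog[i]].pushes - effect[prog[i]].pops;
    }
    return depth;
}
```

A well-formed render program for a single output plane would end with a depth of 1; anything else points at a missing or superfluous op.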

#

I'm yet to fully figure out which ops would be needed and which ops can be axed

ruby fable
#

today in J40: fuzzing! which means that I have really... a lot of things to fix.

ruby fable
#

one unorthodox possibility is to use Rust as a testing infrastructure (!)

prime grail
ruby fable
#

oh thank you for noting that.

ruby fable
#

today in j40: now pretty much every fuzzer outcome is panicking at // TODO midbits can overflow! comment in j40__hybrid_int, which means I need to do something with this...

ruby fable
#

while not guaranteed, you can always file an issue.

#

to me the biggest blocker is that we don't have any freely available copy of ISO/IEC 18181-2

#

other aspects of container format can be inferred from libjxl code, but jbrd is complex

ruby fable
ruby fable
#

okay, I polished and pushed most commits in my working copy

#

you can see how I usually work, I start by tackling a particular task, solve any side tasks as needed, split commits for ease of reading and/or reviewing and push a bulk of commits.

#

(this is a bad strategy when that task turned out to be much larger than imagined though, I would have to detect this to avoid stalling)

#

yes, lots of edge cases!

#

it shouldn't affect valid images, except for the very last commit which finally limits the input to the Main level 5 (API pending).

ruby fable
#

today in j40: dealing with fallouts in MSVC.

ruby fable
#

you can run make and get dj40.exe if you have Visual Studio (!)

#

possibly, I think it should be done via CI and this commit is a part of preparation

ruby fable
#

okay, I've pushed all local commits (which took a while as I had tons of them in the queue)

#

and you can now fuzz this damn thing! well, not too long, my current corpus crashes after 500K iterations :p

#

(after fixing a lot of low-hanging fruits)

unkempt knot
#

Of course robustness is always good to have, but maybe it's not super critical for the main use cases of j40, which I imagine would be things like games that don't want to introduce a dependency on libjxl to decode game assets.

ruby fable
#

Yeah, but it is far less than ideal that just a few minutes of fuzzing finds yet another bug.

#

And fuzzing does help for finding leaks, which would be equally important even for trusted inputs as well.

unkempt knot
#

true

ruby fable
#

the latest bug is confirmed to be a swapped grows vs. gcolumns (i.e. groupwise height and width). it turns out that my quick test corpus doesn't have any image that has more than 1 group and whose grows and gcolumns differ... facepalm
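
For illustration, here is a minimal sketch of how a group grid is usually derived, assuming hypothetical names (`group_grid`, `make_grid`, `group_index`) and a group edge of 256 pixels, which is the usual default; J40's real code is organized differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types; group_dim is the group edge length in pixels. */
typedef struct { int32_t grows, gcolumns; } group_grid;

group_grid make_grid(int32_t width, int32_t height, int32_t group_dim) {
    assert(width > 0 && height > 0 && group_dim > 0);
    group_grid g;
    g.gcolumns = (width + group_dim - 1) / group_dim;  /* groups per row */
    g.grows = (height + group_dim - 1) / group_dim;    /* rows of groups */
    return g;
}

/* Raster-order index of the group containing pixel (x, y). */
int32_t group_index(group_grid g, int32_t x, int32_t y, int32_t group_dim) {
    int32_t gx = x / group_dim, gy = y / group_dim;
    assert(gx < g.gcolumns && gy < g.grows);
    return gy * g.gcolumns + gx;
}
```

A non-square multi-group image, for example 700x300 (3 columns by 2 rows of groups), is the kind of test input that exposes a swap, because the stride `gcolumns` then differs from `grows`.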

ruby fable
#

yet another bug:

-         for (i = 0; i < 4; ++i) J40__TRY(j40__modular_channel(st, &m, i, sidx2));
+         for (i = 0; i < m.num_channels; ++i) J40__TRY(j40__modular_channel(st, &m, i, sidx2));
#

basically modular subimages can have transformations and can have a different number of channels to decode

#

yet, yet another bug or something else: palette with nb_colours == 0, is it even possible????

#

ah it is possible, because every palette index will be synthetic, oh okay.

#

so it's just J40 not handling this edge case.

unkempt knot
#

yes, the default palette is quite useful: everything with index >= nb_colours maps to two color cubes, and everything with index < 0 maps to default delta palette entries

#

so you can actually encode a pretty nice image without specifying any custom palette colors

ruby fable
#

now I have to deal with the possibility that those zero-width images can be further transformed 😛

unkempt knot
#

oh right we still nominally insert a 0x0 palette channel in that case, just to keep the invariant that every palette transform adds one metachannel

#

empty channels do not end up getting encoded but you do need to take them into account for the channel indices

ruby fable
#

for example there can be 3 palette metachannels and they can undergo RCT... which should be valid but may need special handling

unkempt knot
#

squeeze can also introduce empty channels btw

ruby fable
#

yeah, a large enough number of transforms

#

and I think it will be hairier than other transforms

unkempt knot
#

yeah rct on channel-palettes could actually be useful, current encoder doesn't do that but it would help a bit for the case where you e.g. have a 10-bit image encoded nominally as a 16-bit one where only 1024 sample values actually get used; in that case the encoder will currently produce 3 channel palettes that are identical but still encoded 3 times; doing an RCT on that would reduce it to 1 channel with some entropy and 2 channels that are just zeroes.

#

so we kept that possibility open in the spec even if the current encoder isn't doing it yet
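
The effect on identical channels can be sketched with a simple "subtract the first channel" transform. This is an illustrative stand-in, not one of the exact RCT types from the spec, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward transform: keeps a, stores b and c as differences from a. */
void rct_forward(const int32_t *a, int32_t *b, int32_t *c, size_t n) {
    assert(a && b && c);
    for (size_t i = 0; i < n; ++i) {
        b[i] -= a[i];
        c[i] -= a[i];
    }
}

/* Inverse transform: restores b and c exactly. */
void rct_inverse(const int32_t *a, int32_t *b, int32_t *c, size_t n) {
    assert(a && b && c);
    for (size_t i = 0; i < n; ++i) {
        b[i] += a[i];
        c[i] += a[i];
    }
}

/* Returns 1 if every sample in the channel is zero. */
int all_zero(const int32_t *ch, size_t n) {
    for (size_t i = 0; i < n; ++i) if (ch[i] != 0) return 0;
    return 1;
}
```

With three identical palettes as input, `b` and `c` become all zeroes after `rct_forward`, so only `a` carries any entropy.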

#

but all the transforms (rct,palette and squeeze) have to operate either on metachannels or on real channels but not a mix of them

ruby fable
#

I came to realize that the fuzzing corpus from J40 can be used against other implementations including libjxl

#

I guess libjxl already has a lot of them though

unkempt knot
#

because that introduces too much weirdness: it's then no longer clear what channel is meta and what channel is nonmeta

#

yes, you could probably get a fuzzing corpus from libjxl to try it on j40 too

#

in libjxl we also have a funky way to make fuzzing more effective, which is to have a variant of the encoder/decoder where entropy coding is skipped and things are just raw bits that the fuzzer can flip directly (or at least that's how I remember it; @minor lark or @golden basin probably know more about this)

ruby fable
minor lark
#

(just for the corpus)

#

also removing the check for the final state

unkempt knot
#

ah right, huffman with 8-bit symbols so it's actually still a valid bitstream, just poorly compressed and symbols nicely byte-aligned, right?

ruby fable
#

Heh, I thought the thread is dead. Will reply after I get back home.

#

Spoiler: not that good I think 😉

#

I believe GCC is a bit more aggressive on autovectorization, which J40 heavily relies on

ruby fable
#

today in j40: after a few days of fuzzing I've reached the point where I need a radically different corpus or source code modification to continue fuzzing, which seems like the perfect moment to stop active fuzzing 🙂

#

so far fuzzing covered almost all modular code but not much vardct code, as expected

#

I tried to manually put known vardct images into the corpus but that didn't help much, possibly because those inputs are too large

unkempt knot
#

So what's next? Implementing the missing coding tools?

ruby fable
#

for now, finishing up restoration filters is the highest priority and that probably involves rendering

ruby fable
#

oh wow, it's based on ccgo, which is kinda unexpected

ruby fable
#

@boreal cairn will want to be pinged

boreal cairn
#

thanks! I gave my 2 cents to the issue.

#

I don't know enough about J40 or JXL in general to help with the real details, but happy to work out some of the architectural kinks with you if you wanna chat about it. Obviously as soon as there is code I'm happy to contribute!

#

I didn't get to work on my own implementation in a long time now, I will 100% make time for this sooner or later, so either way I'm gonna either contribute to J40 or roll my own thing, whatever happens first 🙂

ruby fable
#

back from a bit of vacation, playing too much Horizon: Zero Dawn (what year is this), thinking about rendering again

ruby fable
#

I would say that libjxl will be lightweight in this case, because you absolutely need NEON-specific optimizations that are in libjxl

unkempt knot
#

The ARM build of libjxl-dec is about 200kb iirc

unkempt knot
#

Maybe has debug symbols still in there or something?

tender heath
#

Freely suggested improvements to j40 makefile to increase portability and readability and brevity:

CFLAGS     = -O3 $(CFLAGS_WRN)
CFLAGS_DBG = -DJ40_DEBUG -g -Og $(CFLAGS_WRN)
CFLAGS_WRN = -W -Wall -Wconversion -Wc++-compat
LDFLAGS    = -lm
CLANG      = clang

dj40: dj40.c j40.h extra/stb_image_write.h
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ dj40.c

dj40-cxx: dj40.c j40.h extra/stb_image_write.h
	$(CC) -xc++ $(CFLAGS) $(LDFLAGS) -o $@ dj40.c

dj40-o0g: dj40.c j40.h extra/stb_image_write.h
	$(CC) $(CFLAGS_DBG) -fsanitize=address,undefined $(LDFLAGS) -o $@ dj40.c

j40-fuzz: extra/j40-fuzz.c j40.h
	$(CLANG) $(CFLAGS_DBG) -fsanitize=fuzzer,address,undefined $(LDFLAGS) -o $@ extra/j40-fuzz.c
#

It should be pretty much equivalent...

tender heath
#

Since it's an all-in-one-file lib the makefile is probably not that important anyways.

unkempt knot
#

@ruby fable any plans to resume work on j40 at some point?

ruby fable
ruby fable
#

@unkempt knot by the way, I think jxl-oxide already surpassed what I wanted to achieve by J40 and wonder if I should keep working on J40 as I originally intended

#

specifically there were two goals I had in mind: producing a complete reimplementation from the spec is one, more or less achieved by now (not by J40, of course)

#

the second goal was to provide a minimal ground for working with JPEG XL

#

the minimal ground here means, for example, a test suite completely independent from libjxl

#

I expected J40 would eventually need one anyway, and it might be easier to produce one from J40 than from libjxl

#

that's another goal, and I'm still unsure of the optimal way to achieve that

#

it was another reason I didn't have much progress recently, if I had some concrete plans maybe I could have tried to make one (I indeed had other side projects that were going very slowly but still steady in the same period), but I didn't have any actionable plan

#

so if libjxl had some (ideally concrete) ideas that might be hard to do themselves, maybe I can look at them instead

ruby fable
#

rethinking about J40 rn, mostly about how to restructure it to avoid known pains

#

(yes, I've got a new job since then and a heavy milestone has been passed, so I now have some peace in my mind.)

#

(still recovering from the mental health issues though, mine is not as critical as others' but nevertheless medications really helped, anyway)

#

one of the main PITA in J40 was the cleanup path, which is... uh... always painful in C to be frank

#

it greatly relates to the testing strategy as well, because there should be a guarantee that the cleanup fully restores the known-good state

#

I knew this from the beginning, but after months of hiatus with fresh eyes I feel it more acutely

#

I'm starting to think about a lightweight preprocessor (still written in C) that would help quite a bit, but not sure

#

my initial idea was to make the source code itself written in a non-portable C (i.e. allowing some GNU extensions) but it can be converted to a portable C with that preprocessor

#

but __attribute__((cleanup)) was something inferior compared to what I actually wanted, so... I'm not sure how to develop this idea further

fervent crescent
#

I always saw other projects implementing cleanup by prologue and epilogue macros.

fervent crescent
#

I don't think there are any other options to do it portably.

ruby fable
#

yes, the current J40 also does this, but it is increasingly harder to deal with new features compared to other languages like C++. (but I don't like to write C++.)

#

let me give a concrete example. I eventually want to support progressive decoding, i.e. the decoder can signal increasingly precise renders at any time.

#

if the language supports coroutines this is really an easy task, because you can park the decoder after the last available input and resume it whenever you want.

#

but it is really annoying and error-prone to write an equivalent state machine in C.

#

Brotli decoder is composed of tons of state machines with delicate invariants, and it is possible as you can see, but it is inhumane to be honest

#

and JPEG XL is many times more complex than Brotli (in fact, JPEG XL contains a big portion of Brotli)

#

so both J40 and libjxl are designed to roll back to the known-good state when the input is not enough.

#

for example, if the signature has been read but the image header cannot be fully read, you roll back to the end of the signature and next time you will try to re-read the image header.

#

this is suboptimal (if you supply a single byte every time, it will decode the same thing over and over) but practically easier to manage...
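
The rollback strategy described here can be sketched as follows. The `reader` struct, the 4-byte header, and the status codes are all hypothetical; neither J40 nor libjxl is structured exactly like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state. */
typedef struct {
    const uint8_t *buf; /* all input received so far */
    size_t len;         /* number of valid bytes in buf */
    size_t pos;         /* known-good position: everything before it is consumed */
    uint32_t header;    /* parsed result, valid once header_done is set */
    int header_done;
} reader;

enum { ST_NEED_MORE = 0, ST_OK = 1 };

/* Tries to parse a 4-byte big-endian header starting at r->pos.
 * On short input nothing is committed, so the caller can retry later. */
int read_header(reader *r) {
    assert(r && r->pos <= r->len);
    size_t p = r->pos; /* work on a local cursor, not on r->pos */
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        if (p >= r->len) return ST_NEED_MORE; /* implicit rollback */
        v = (v << 8) | r->buf[p++];
    }
    r->header = v;      /* commit only after the whole unit was parsed */
    r->pos = p;
    r->header_done = 1;
    return ST_OK;
}
```

The rollback is free here only because the unit allocates nothing and touches no other state; the hard part discussed in this thread is keeping that property once a unit allocates memory or updates several fields.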

#

...if it is easy to roll back. which is mostly the case in C++, while it is not even trivial in C.

#

so in C you can't have an easy coroutine nor an easy state rollback. what to do then? this is my current question.

#

many C projects can cope with macros because they generally have (or constrain themselves to have) a small number of exit conditions per each function.

#

for example:

#
int foo(state_t *st) {
    // precondition: st->a thru st->c are not initialized
    st->a = malloc(sizeof(*st->a)); // size of the pointee, not of the pointer
    if (!st->a) goto error;
    st->b = malloc(sizeof(*st->b));
    if (!st->b) goto free_a;
    st->c = malloc(sizeof(*st->c));
    if (!st->c) goto free_a_and_b;
    // postcondition when return value is 1: st->a thru st->c are all valid
    return 1;

free_a_and_b:
    free(st->b);
free_a:
    free(st->a);
error:
    // postcondition when return value is 0: st->a thru st->c are not initialized
    // and no memory has been leaked
    return 0;
}
#

this is a contrived example, people know this is really wordy so they actually use shortcuts, but you'd get my point

#

people generally write the following instead (and J40 does this as well):

int foo(state_t *st) {
    // ensure that it is always safe to jump to `error`
    st->a = NULL;
    st->b = NULL;
    st->c = NULL;

    st->a = malloc(sizeof(*st->a)); // size of the pointee, not of the pointer
    if (!st->a) goto error;
    st->b = malloc(sizeof(*st->b));
    if (!st->b) goto error;
    st->c = malloc(sizeof(*st->c));
    if (!st->c) goto error;
    return 1;

error:
    free(st->a); // safe to call for NULL
    free(st->b);
    free(st->c);
    return 0;
}
fervent crescent
#

How about using a callback cleanup mechanism?

Basically you introduce the concept of a callback structure, and a stack of those. Add callbacks to the structure and push it to the stack. Popping them from the stack would call those callback functions.

How does it sound?
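
A minimal sketch of such a cleanup stack, assuming a fixed capacity and hypothetical names throughout. Each callback is paired with a `void *` argument, so no nested functions are needed; `count_call` exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*cleanup_fn)(void *);

/* Hypothetical fixed-capacity stack of (callback, argument) pairs. */
typedef struct {
    struct { cleanup_fn fn; void *arg; } items[16];
    int count;
} cleanup_stack;

/* Registers fn(arg) to run later. Returns 0 if the stack is full. */
int cleanup_push(cleanup_stack *cs, cleanup_fn fn, void *arg) {
    assert(cs && fn);
    if (cs->count >= 16) return 0;
    cs->items[cs->count].fn = fn;
    cs->items[cs->count].arg = arg;
    cs->count++;
    return 1;
}

/* Runs callbacks in reverse registration order until `mark` entries remain. */
void cleanup_unwind(cleanup_stack *cs, int mark) {
    assert(cs && mark >= 0);
    while (cs->count > mark) {
        cs->count--;
        cs->items[cs->count].fn(cs->items[cs->count].arg);
    }
}

/* Example callback: counts how many times it was invoked. */
void count_call(void *arg) {
    ++*(int *)arg;
}
```

`free` already has the `void (*)(void *)` shape, so `cleanup_push(&cs, free, ptr)` works directly. A function would record `cs->count` on entry and call `cleanup_unwind` with that mark on its error path.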

ruby fable
#

first, C doesn't have a portable nested function (sadly). and second, I don't know how many of them are required :S

#

for now I'm considering an arena-based approach, trading strict memory usage for convenience.

#

but the arena will not fully solve my problem because I have so many states...
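
For reference, a bump arena of the kind being considered can be sketched like this. The names, the 4096-byte minimum block size, and the C11 features used (`max_align_t`, `_Alignof`) are assumptions of the sketch, not anything J40 currently contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump arena: allocations are never freed one by one. */
typedef struct arena_block {
    struct arena_block *next;
    size_t used, cap;
    max_align_t data[]; /* flexible array member keeps the payload aligned */
} arena_block;

typedef struct { arena_block *head; } arena;

/* Returns size bytes from the arena, or NULL on allocation failure. */
void *arena_alloc(arena *a, size_t size) {
    assert(a);
    const size_t align = _Alignof(max_align_t);
    if (size > (size_t)-1 / 2) return NULL;    /* avoid overflow below */
    size = (size + align - 1) / align * align; /* keep later allocations aligned */
    arena_block *b = a->head;
    if (!b || b->cap - b->used < size) {
        size_t cap = size > 4096 ? size : 4096;
        b = malloc(offsetof(arena_block, data) + cap);
        if (!b) return NULL;
        b->next = a->head;
        b->used = 0;
        b->cap = cap;
        a->head = b;
    }
    void *p = (char *)b->data + b->used;
    b->used += size;
    return p;
}

/* Releases every block at once; this is the whole cleanup path. */
void arena_free_all(arena *a) {
    assert(a);
    while (a->head) {
        arena_block *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}
```

This removes per-object `free` calls from error paths, but it does nothing for non-memory state such as a partially updated decoder struct, which is the remaining problem mentioned above.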

ruby fable
#

to be accurate, if there is a way to make a non-portable but working equivalent of Go defer, I'll seriously consider that

#

I'm okay with the non-portability if it i) can be mechanically translated later and ii) works as is with GCC and clang

#

that would comfortably cover pretty much every use case

fervent crescent
#

Plans to have it in C23 got cancelled.

#

It's effectively pushed back to C30.

ruby fable
fervent crescent
#

Your only non-portable option in that case would be GCC's cleanup attribute, which you did try.

But yes, make sure that the source file is machine-preprocessable to portable C code if you decide to go with that option.

broken flint
#

@ruby fable i know quite some time has passed since this discussion stopped, but I normally use this macro to implement a Go-like defer in C:

#define PPCAT2(n,x) n ## x
#define PPCAT(n,x) PPCAT2(n,x)
#define DEFER2(stmt, counter) \
    void PPCAT(__cleanup, counter) (int* u) { stmt; } \
    int PPCAT(__var, counter) __attribute__((unused, cleanup(PPCAT(__cleanup, counter ))));
#define DEFER(stmt) DEFER2(stmt, __COUNTER__)
#

which is based on __attribute__((cleanup)) which you mention above, so I'm not sure if this is something you already attempted or not

#

Sample usage:

void *buf = malloc(128);
DEFER(free(buf));
#

this works on GCC only though as clang doesn't implement nested functions in C
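
The cleanup attribute itself is accepted by both GCC and clang; only the nested function is GCC-specific. A less general variant that avoids nested functions uses one ordinary cleanup function per resource type. This is a sketch with hypothetical names (`release_buf`, `AUTO_BUF`, `sum_ones`), and `live_buffers` exists only to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

int live_buffers = 0; /* only here to make the cleanup observable */

/* Called automatically with a pointer to the annotated variable when it
 * goes out of scope (GCC and clang extension, not standard C). */
void release_buf(void *p) {
    free(*(void **)p); /* free(NULL) is a no-op */
    --live_buffers;
}
#define AUTO_BUF __attribute__((cleanup(release_buf)))

/* Fills a temporary buffer with ones and returns their sum, or -1 on
 * allocation failure. No explicit free on any return path. */
int sum_ones(size_t n) {
    AUTO_BUF unsigned char *buf = malloc(n ? n : 1);
    ++live_buffers;
    if (!buf) return -1; /* release_buf still runs here */
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        buf[i] = 1;
        sum += buf[i];
    }
    return sum;
}
```

Unlike `DEFER`, this cannot run an arbitrary statement, which is part of why the attribute feels weaker than Go's defer.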

#

I have also one question; is there a way to use cjxl to force to encode files that j40 will know how to decode? Or put in other terms, is there a way to disable features that j40 doesn't support, so that the current version of j40 can successfully be used without fear of generating images that can't be later decoded?

ruby fable
#

for j40-compatible images, any low enough level (I think it was up to -e 7?) works I think. see the README.

broken flint
#

Thanks, should I expect some runtime error for images using unsupported features, or just a corrupted picture?

ruby fable
#

modulo any bugs, it will probably reject them.

#

I haven't touched J40 for a long time now though, so there will be several bugs around...

#

I'm slowly investigating several paths to revive the project, but all paths depend on how to reliably write C code without making it too large, and that's a big problem

ruby fable
#

I will revive J40 if I have a way to write C code roughly equivalent to the current J40 code without exactly using barebones C

#

if Zig could have been translated to C I would have picked it, but AFAIK Zig-to-C is not supported

broken flint
#

in my humble opinion, you're already using several clever macro tricks in the current codebase to push the C boundaries

#

i don't think there's much more that can be done in the realm of C

#

i wish clang didn't decide to skip nested functions in C, that would have helped me greatly, but alas they went for that decision years ago and haven't looked back

#

that would solve defer, but then if you want to have more solid coroutines or stuff like that, I guess that's not something that really matches C

ruby fable
#

yeah, that is my current dilemma 😦

broken flint
#

i guess translation from a higher level language is one option as you said

ruby fable
#

(Wuffs is truly great and I would like it to be more widespread, but it doesn't fit my use case)

broken flint
#

yes i like wuffs too but i think it fits more into the realm of safety, not sure if that's also your focus

#

i haven't really evaluated that as a higher level language by itself

ruby fable
#

I'd like to ensure that my library is reasonably safe, but not necessarily in the formally verifiable way (which is... hard I know)

#

it doesn't look so but there are several guiding principles throughout the J40's code to avoid usual problems, like consistent coding styles

broken flint
#

yeah, though some fuzzing will bring you somewhere into a safe area

ruby fable
#

J40 has been surely fuzzed, but it will need an additional restructuring to make it fuzz-friendly

#

I think the current fuzzing attempt covers roughly a half of the entire code

broken flint
#

I will test decompression speed on my target platform for this project, which is a Nintendo 64; that'll give me a first number to see whether a full porting is viable or not

ruby fable
#

oh, that would be adventurous to be sure 😉

#

does libjxl compile on that platform after all?

broken flint
#

i haven't tried; I have been working on that platform for quite some time and I ported several modern formats like h264 and opus to it. I'm looking for a solution for lossy encoding, so i figured i'd start from jpegxl and move back in case of trouble 🙂

#

notice that porting doesn't mean only recompiling (that's the easy part), because optimizing for n64 means offloading part of the calculations to a DSP with SIMD instructions that must be programmed in assembly

#

so the work ends up being similar to a task like "adding arm+neon acceleration to a C codebase" or something like that

ruby fable
#

is it just out of curiosity or do you have some concrete goal like a homebrew game?

broken flint
ruby fable
#

does JPEG work? if it doesn't, there is not much chance for JPEG XL either (because it includes a baseline JPEG as a part of backward compatibility)

broken flint
#

yes, jpeg was also used by commercial games back in the day

ruby fable
#

oh, that's good to hear

broken flint
#

we are pushing the boundaries more than they were ever able to do, this is why i was targeting something more modern

ruby fable
#

I think a VarDCT subset of JPEG XL might be actually viable enough

#

that is, a single-frame lossy subset

#

(JPEG XL is designed for many more use cases, so you don't need the full library)

broken flint
#

yeah probably. BTW I have ported mpeg1 already so for jpeg actually i should have most of the blocks

#

the DSP with SIMD is fixed point though, i think VarDCT is defined with floating points?

#

H264 is luckily integer only, that helped quite a bit 🙂

#

and Opus reference implementation supports both floating and fixed, that also helped

ruby fable
#

yeah, J40 also assumes a working floating point impl

broken flint
#

there are floating points on the CPU; it's more about the parts that I want to offload to the DSP ... those would have to be converted to fixed point
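
As a rough illustration of what such a conversion involves, here is a sketch of Q15 fixed-point arithmetic, where a 16-bit integer `v` stands for `v / 32768`. The format choice and function names are assumptions; nothing here is specific to the N64 DSP or to how VarDCT would actually be ported.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: a 16-bit integer v represents the real number v / 32768. */
typedef int16_t q15;

/* Converts a float in [-1, 1) to Q15 with rounding and saturation. */
q15 q15_from_float(float x) {
    assert(x == x); /* NaN has no fixed-point equivalent */
    float scaled = x * 32768.0f + (x >= 0 ? 0.5f : -0.5f);
    if (scaled >= 32767.0f) return 32767;
    if (scaled <= -32768.0f) return -32768;
    return (q15)scaled;
}

/* Rounded Q15 product; the 32-bit intermediate cannot overflow. */
q15 q15_mul(q15 a, q15 b) {
    int32_t prod = (int32_t)a * b + (1 << 14); /* add half an LSB to round */
    prod >>= 15; /* assumes arithmetic shift for negative values */
    if (prod > 32767) prod = 32767; /* only reachable for (-1) * (-1) */
    return (q15)prod;
}
```

Transform constants such as DCT cosines would be precomputed with `q15_from_float`, and every multiply in the offloaded part then loses a little precision, which is what makes a float-defined codec harder to port than an integer-only one.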

unkempt knot
# ruby fable I think a VarDCT subset of JPEG XL might be actually viable enough

You can't have fully VarDCT-only since even VarDCT uses Modular for the LF image etc, but yes, we're thinking about a "lightweight" profile that would also have a hardware implementation. It would restrict the use of Modular to put constraints on the kind of MA trees and predictors that can be used, probably not have extra channels at all, definitely no splines and patches, etc. How it will look will depend on what the hardware folks are willing to implement, it's still too early to tell.

ruby fable
unkempt knot
#

There's a subset of Modular that is just no-context with uniform West prediction, that's basically what JPEG does. Probably we can define a somewhat larger subset of Modular that still gives some of the gains while still being simple enough for a hardware implementation, where you obviously don't want to deal with arbitrary MA trees and funky predictors like the self-correcting Weighted predictor.
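
West prediction on its own is simple enough to sketch in a few lines. The following assumes hypothetical function names and predicts the first sample of a row as 0, which is a simplification of what the spec does at image edges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replaces each sample with its difference from the left neighbor. */
void west_residuals(const int32_t *row, int32_t *res, size_t n) {
    assert(row && res);
    int32_t left = 0;
    for (size_t i = 0; i < n; ++i) {
        res[i] = row[i] - left;
        left = row[i];
    }
}

/* Inverse: rebuilds the row by accumulating residuals left to right. */
void west_reconstruct(const int32_t *res, int32_t *row, size_t n) {
    assert(res && row);
    int32_t left = 0;
    for (size_t i = 0; i < n; ++i) {
        left += res[i];
        row[i] = left;
    }
}
```

The decoder needs only one previous sample and no per-pixel decisions, which is why this subset is attractive for hardware.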

ruby fable
#

yeah, that might be a possible alternative

#

the point is that, I think N64 will need some specialized format and/or subset for a desired performance

broken flint
#

I think it's fine, we usually control both sides of the pipeline (encoding and decoding)

#

For h264 I compress videos only in baseline profile and I disable the in loop filter for instance as that creates performance issues

#

For opus I select Celt only (disable SILK) and fix a few internal parameters leaving a bit less flexibility to the encoder

#

So yes I'm ready to disable a few things at encoding time, that's not an issue for my use case

#

I'm just wondering if I should base the work on j40 or not given that its development is paused; I don't necessarily need it to be maintained and improved if what's there is sufficient, I'm just afraid of bugs

quasi carbon
#

@broken flint I think I can answer that, I had trouble until I settled for -e 4

#

(decodable with j40.h)

broken flint
#

@ruby fable i have a question on j40; can you please explain at the high level why j40__advance is designed as a coroutine? In what case is it necessary for it to yield leaving the work uncompleted?

ruby fable
broken flint
ruby fable
#

nope, it was never implemented to this day

#

I should mention that this approach is also similar to how libjxl does incremental parsing

#

(which inspired my design)

#

I already knew that a resumable coroutine would be necessary for incremental parsing in general, but C is too primitive to support that in a pleasant way, so I used that coroutine macro hack to retain a reasonable chunk of resumable routines
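
For readers unfamiliar with the trick, a switch-based coroutine can be sketched as below. These macros are simplified and hypothetical, written in the style of Simon Tatham's well-known "Coroutines in C" technique; J40's real macros differ. Locals do not survive a yield, so all state lives in the struct, and two yields must not share a source line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CORO_BEGIN(st) switch ((st)->resume_at) { case 0:
#define CORO_YIELD(st, ret) \
    do { (st)->resume_at = __LINE__; return (ret); case __LINE__:; } while (0)
#define CORO_END(st) } (st)->resume_at = -1

typedef struct {
    int resume_at;      /* 0 = not started, -1 = finished */
    const uint8_t *buf; /* input so far; the caller grows len over time */
    size_t len, pos;
    uint32_t value;
    int i;              /* loop counter kept here so it survives yields */
} parser;

enum { P_NEED_MORE, P_DONE };

/* Reads a 4-byte big-endian value, yielding whenever input runs out. */
int parse_u32(parser *p) {
    assert(p);
    CORO_BEGIN(p);
    p->value = 0;
    for (p->i = 0; p->i < 4; ++p->i) {
        while (p->pos >= p->len) CORO_YIELD(p, P_NEED_MORE);
        p->value = (p->value << 8) | p->buf[p->pos++];
    }
    CORO_END(p);
    return P_DONE;
}
```

Compared with re-reading from a known-good position, the routine resumes exactly where it stopped, at the cost of moving every live variable into the state struct.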

#

that said, in retrospect I think it wasn't enough because each individual routine does have to roll back perfectly on error, which was quite hard to do in general (especially in C!)

broken flint
#

i think it's complicated in general

#

a stackful coroutine would have a much easier life of course

ruby fable
#

if we are not using C... 😉

broken flint
#

yes but on the other hand, C is the standard for embedding and it is like that specifically because it is a simple language 🙂

#

so of course one has to make compromises here; in my case for instance i don't need incremental parsing so i will just remove that

jagged iron
#

@ruby fable Hello there how's things going with the library?

ruby fable
#

I'm currently trying to revive the project with some AI sprinkles, let me see whether it would work or not...