#Convenient Containers: Ergonomic C Generics

33 messages · Page 1 of 1 (latest)

mystic nymph
#

Hello Together C & C++ 🙂

I'd like to share my C generic data-structure library Convenient Containers (CC).

CC's main advantages are summarized in the Rationale section of its README. In short, using some novel techniques, the library is able to provide a range of fully typesafe data structures with a generic API agnostic to both container type and data types, without requiring the user to make any boilerplate pre-declarations for every container/data type combination. Hence, CC containers function much like containers in languages with native support for generics (such as C++):

#include <stdio.h>
#include "cc.h"

int main( void )
{
  vec( int ) our_vec;
  init( &our_vec );
  push( &our_vec, 5 );
  printf( "%d\n", *get( &our_vec, 0 ) );
  cleanup( &our_vec );

  map( int, float ) our_map;
  init( &our_map );
  insert( &our_map, 5, 0.5f );
  printf( "%f\n", *get( &our_map, 5 ) );
  cleanup( &our_map );
}

Besides ergonomics, performance is also important. CC's hash tables have performed very well in benchmarks both by me and by others. Its red-black trees have also benchmarked on par with their C++ STL counterparts.

Some of the techniques upon which CC relies are explained briefly in its FAQ and more thoroughly in a series of Reddit comments that I made back when the library was first released. I am working on articles to describe these techniques more systematically, the first of which I published earlier.

GitHub

A small, ergonomic generic container library. Contribute to JacksonAllan/CC development by creating an account on GitHub.

mystic nymph
#

PS: Sorry about the gigantic logo! I didn't realize my thumbnail image would also appear inside the post.

honest bloom
#

looks cool

#

only thing i would say is that on some unix platforms, gcc is also aliased to cc, so idk, might get a little confusing

hidden wind
#

I am curious as to how this would break if you were using MSVC with a C compiler version less than C23:

#ifdef __cplusplus
#define CC_TYPEOF_XP( xp ) std::decay<std::remove_reference<decltype( xp )>::type>::type
#define CC_TYPEOF_TY( ty ) std::decay<std::remove_reference<decltype( std::declval<ty>() )>::type>::type
#elif __STDC_VERSION__ >= 202311L // C23.
#define CC_TYPEOF_XP( xp ) typeof( xp )
#define CC_TYPEOF_TY( ty ) typeof( ty )
#else // GNU.
#define CC_TYPEOF_XP( xp ) __typeof__( xp )
#define CC_TYPEOF_TY( ty ) __typeof__( ty )
#endif
#

also the amount of macro template stuff I am seeing is hard to comprehend nooo

mystic nymph
#

As for all the macro code, I do my best to explain everything through lengthy in-code comments. However, the macros are undoubtedly pretty labyrinthine. That's why I want to consolidate the documentation of these techniques into one or two more blog posts/articles. To understand the code in general, we need to be familiar with three or four main techniques, so these articles would serve as a primer for anyone wanting to explore it.

#

At the moment they are only documented loosely in the Reddit comments and initial article that I linked to in the original post above (besides, of course, the comments inside the header itself).

hidden wind
#

Thank God they added that because it was annoying me

mystic nymph
#

My general philosophy with this library is that more complex code inside the library is justified if it simplifies the user's application-level code. This is, of course, debatable, and that's why I also like to link users to alternative libraries that I also consider to be good and that use a more traditional approach, e.g. STC, Klib, M*LIB.

#

Those libraries have sound designs, responsive developers, and did well in my own performance benchmarks.

mystic nymph
#

The biggest problem now is not MSVC but Clang, which compiles _Generic expressions 3x slower than e.g. GCC.

mystic nymph
# honest bloom looks cool

Good point. But I guess it's okay. The library has been out for a few years now, and no users have brought up this naming issue so far.

green token
#

Hey Jackson! I was actually debating whether or not I should send you an email about CC, so it's cool to see you here 😄 I've been toying with non-glib C (the GNOME ecosystem is very different from the broader C ecosystem), and was really inspired by what you did with CC. In a small project I worked on I ended up implementing a similar API: https://godbolt.org/z/TK8dPM5xf

This has some problems though. My naive vec macro isn't really usable as a type currently, as it uses an unnamed struct. You can't use those in function parameters as the struct definition is only valid for those types, and you can't assign a vec(int) to another vec(int) because they're technically different types in the first place. CC seems to get around this in your own macro setup. My vec(int) expands to:

  struct {
    int *items;
    int size;
    int cap;
  }

Yours expands to:

typeof(typeof(typeof(int) (*)(typeof(size_t) *)) (*)[1])

Which evaluates to:

int (*(*)[1])(unsigned long *)

How that type works is quite opaque to me 😅 Are you planning to make a part 2 to your article that explains how generic types like this work?

mystic nymph
#

Hi BrainBlasted!

This has some problems though. My naive vec macro isn't really usable as a type currently, as it uses an unnamed struct. You can't use those in function parameters as the struct definition is only valid for those types, and you can't assign a vec(int) to another vec(int) because they're technically different types in the first place. CC seems to get around this in your own macro setup.

Right, the need for the user to predeclare types – whether via a macro, via typedef in conjunction with a macro (as in your code), or by repeatedly including a pseudo-template header with different "arguments" defined as macros – is the primary problem that CC solves. At the highest level, that solution looks something like this:

#

_ _
• A container’s metadata (e.g. the size and capacity of a vector) is stored in the same heap allocation as the data it contains (e.g. the elements of a vector). So CC does use a struct (namely cc_vec_hdr_ty) that looks similar to your vector struct, but it stashes it immediately before the vector’s actual data, rather than supplying it directly to the user as his or her handle to the container.

• The user’s handle to a container is a pointer to the dynamic allocation storing a container’s metadata struct and actual data.

• The exact type of that pointer (which doesn’t matter when it comes to accessing the data and metadata, notwithstanding some alignment constraints) is chosen by the library in a way that allows it to later infer everything it needs to know about the container – namely the container type (e.g. vector, list, map, etc., which CC's code refers as the "container ID"), key type, and element type – just from that pointer at compile time. This is, for example, the int (*(*)[1])(unsigned long *) type that you’re seeing. Because this pointer is a "normal" C type, users can happily pass it in and out of functions etc. without typedefing it.

• API macros infer the container type from the container handle and dispatch to the relevant internal container function. They also typically infer the key and element types from the handle and pass information about those types – i.e. their size, alignment, and pointers to their associated hash, comparison, and destructor functions – into the selected function. To make this whole process easier and faster to compile, internal container functions that can be called by the one API macro – e.g.cc_vec_insert, cc_map_insert,cc_list_insert, etc. – all have the same signature. Hence, container functions often have many unused parameters, but because these functions are declared as static inline, the compiler can just optimize those parameters/arguments away (and hopefully inline the whole function too).

#

_ _
For an explanation of how exactly CC builds a container handle type and then infers container and data type information from it, see these two Reddit comments. Inside the library’s code, we’re looking at the this section.

#

_ _

Are you planning to make a part 2 to your article that explains how generic types like this work?

Yes! I think providing a more methodical documentation of how the library works will be important for its long-term success.

#

_ _

In a small project I worked on I ended up implementing a similar API

I had a look over the linked code. Looks good 🙂 Here’s a few comments:

#define vec_push(self, item)                                                   \
  if ((self)->size == (self)->cap) {                                           \
    (self)->cap = (self)->cap == 0 ? (self)->cap + 1 : (self)->cap * 2;        \
    (self)->items = realloc((self)->items, (self)->cap * item_size(self));     \
  }                                                                            \
                                                                               \
  (self)->items[(self)->size++] = item;

The conventional way to implement the put-all-container-logic-directly-into-macros approach is to wrap the code in each macro in a do{ … }while{ 0 } loop. This ensures that the macro invocation behaves like a statement. To see the problem, just imagine what bad things would happen if vec_push, as it currently appears, is called as follows: if( foo ) vec_push( &v, bar );. An even better solution is to figure out how to make all the code that lives inside a macro into one big parenthesized expression so that the macro invocation behaves like a function call (e.g. it can return a value and be used inside expressions). That’s what CC does.

// C23's `typeof` helps us immensely by allowing us to get types with `typeof`
// instead of having to know ahead of time in each macro expansion.
#define item_size(vec) sizeof(typeof(*(vec)->items))

It’s not clear to me why typeof is needed in this instance. sizeof(*(vec)->items) should give the same result.

#

_ _

#define vec_cleanup_destroy(self, destructor)                                  \
  ({                                                                           \
    for_each((self), item) { destructor(item); }                               \
    free((self)->items);                                                       \
  })

Here we have a statement expression, but it’s not clear to me why this GCC/Clang extension is needed here. By the way, if we allow ourselves to use statement expressions then we can do everything CC does with far fewer tricks and workarounds (but we lose MSVC support).

#

In any case, just let me know if you'd like me to elaborate further on any particular part of the library.

green token
#

Thanks for the comprehensive response! It will take me a bit to digest, but I really appreciate it. And thanks for the comments as well 😄

An even better solution is to figure out how to make all the code that lives inside a macro into one big parenthesized expression so that the macro invocation behaves like a function call (e.g. it can return a value and be used inside expressions). That’s what CC does.

I tried toying with that for a while, but couldn't come up with something that compiled after a bunch of iterations. When I revisted the code to try to wrap macros properly, I ended up feeling that the GCC/Clang extension for statement expressions was the most satisfying solution. That's primarily because I don't know all of the other tricks 😅 I'll read that do...while article for sure, in addition to your reddit comments.

It’s not clear to me why typeof is needed in this instance. sizeof(*(vec)->items) should give the same result.

I think it was required in an earlier iteration of these macros, but after restructuring the remaining typeof() wasn't actually required, so thanks for catching that 🙂

mystic nymph
#

I tried toying with that for a while, but couldn't come up with something that compiled after a bunch of iterations.

Your main friend in this regard is the comma operator. But it is possible to partially emulate statement expressions inside a macro in standard C if your container handle is a pointer (in my case) or a struct containing at least one pointer (in your case). This technique allows us to declare new variables in an expression, but not do e.g. loops (you need to dispatch to a function for that purpose). CC uses this technique, too. Let me know if you'd like me to elaborate.

#

By the way, you're aware that C23 relaxed struct tag rules to allow redefinitions of the same struct with the same tag and making all these definitions compatible, right? Using that feature, you can make your vector-declaration macro something like this:

#define vec(type, tag) \
  struct tag {         \
    type *items;       \
    int size;          \
    int cap;           \
  }                    \

In C32, this would allow you/the user to avoid typedefing the struct as long as they consistently supply the same tag argument to the macro.

green token
#

Oh, neat! It seems to produce warnings though: https://godbolt.org/z/Gfd3Khxcq

Would definitely appreciate some elaboration on how to use the comma operator if you have the time. A friend suggested it, but I couldn't figure out how to make my code fit :/

#

Or, not the comma operator actually - I am interested in that other trick you mentioned to emulate statement expressions

mystic nymph
#

Oh, neat! It seems to produce warnings though
It looks like Clang doesn't support this feature of C23 yet 😦
GCC does, though. Switching to the latest version of GCC causes your code to compile fine.

mystic nymph
#

I am interested in that other trick you mentioned to emulate statement expressions
The basic idea here is that we temporarily hijack a pointer that the user provided as an argument to the macro to point to a compound literal that contains our extra variables. I'll try to demonstrate with a simplified example:

#

This gets us extra variables and, if we have typeof, the ability to return one of them. But the drawbacks (besides the general craziness of this code) are that we still can't do loops, and ptr can never be made into a safe macro argument (i.e. it will always be subject to multiple evaluation).