#Length-prefixed string literals


dusty obsidian
#

Hi!
It's a common pattern to define "length-prefixed" string literals using compound literals in this way:

struct str {
  size_t len;
  const char* s;
};

#define str(literal) ((struct str){.len=sizeof(literal)-1, .s=literal})

const struct str hello = str("HELLO");

However, doing so stores the string length on the stack, while the literal itself stays in the .rodata section of the program. I was wondering if there's a way to store the length as a prefix right next to the string in .rodata, and be left with:

struct str {
  size_t len;
  const char s[];
};

// some magic
const struct str* hello = // ???

mystic tundra
dusty obsidian
#

In my understanding, the above code won't have the desired memory layout, it will be:

struct str {
  size_t len;
  const char* s;
};

That requires two indirections when used by a local function: the first to the str object (which is no longer on the stack) and the second through s.

The memory layout I want is just:

struct str {
  size_t len;
  char s[]; // string characters immediately follow the length in memory
};

That would require just one pointer into the .rodata section. But initializing a flexible array member is apparently not possible:

error: initialization of a flexible array member
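
One way to sidestep this error without a flexible array member is to give each literal its own struct type, with the array sized from the literal. A minimal sketch, assuming a hypothetical `STR_DEFINE` macro and hypothetical accessors `hello_len` and `hello_chars` (none of these come from this thread). Each string gets a distinct anonymous struct type, so this does not by itself produce a `struct str*`; it only shows the length and the characters living in one static const object, which compilers typically place in .rodata:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: defines one static const object that holds the
   length followed by the characters (including the terminating NUL). */
#define STR_DEFINE(name, literal)        \
    static const struct {                \
        size_t len;                      \
        char s[sizeof(literal)];         \
    } name = { sizeof(literal) - 1, literal }

STR_DEFINE(hello, "HELLO");

/* The object is used through its own type, so no pointer cast is needed. */
size_t hello_len(void) { return hello.len; }
const char *hello_chars(void) { return hello.s; }
```

Because the object is accessed through its real type, there is no cast and no aliasing question; the trade-off is that generic code taking a `struct str*` cannot accept it directly.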
mystic tundra
dusty obsidian
#

I see, this makes sense! I can probably make a macro that casts an anonymous struct compound literal back to str too 🤔 .
I'll try it out and post the result here, thanks!

mystic tundra
#

;compile -pedantic-errors

static int i=(int){0};
chrome skiffBOT
#
Compiler Output
<source>: In function 'main':
<source>:3:14: error: initializer element is not constant [-Wpedantic]
    3 | static int i=(int){0};
      |              ^
Build failed

dusty obsidian
#

hmm....let me try a few things

#

I think I got it! What do you think? Now all that's left is to make the macro:

struct str {
  size_t len;
  const char s[];
};

int main(void) {
    const struct str* hello = (const struct str*)&((static const struct {size_t len; char s[3];}){2, "ab"});
    printf("%s\n", hello->s); // pass the string as an argument, not as the format
}
#

This is it I think:

#define str(literal) ((const struct str*)&((static const struct {size_t len; char s[sizeof(literal)];}){sizeof(literal)-1,literal}))
#

Hopefully the compiler is able to deduplicate these static const struct if they are the same thing across different parts of the program.

#

Thanks for the help, @mystic tundra !

mystic tundra
#

you would need to create an actual str object, which is why I used the union before

dusty obsidian
#

Why is it UB?

mystic tundra
#
struct A{int m;};
struct B{int m;};
struct A a={42};
printf("%i\n",((struct B*)&a)->m);//UB
#

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
Section 6.5 "Expressions" Paragraph 7 C2X https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf#page=90

#

in general, when doing pointer casts (between different types) there is a good chance of violating this
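
A minimal sketch of one well-defined alternative to such a cast, reusing the `struct A` and `struct B` from the snippet above: copy the bytes with `memcpy` instead of reading the object through a pointer to the other type. The function name `b_from_a` is hypothetical, and the sketch assumes both structs have the same size, which the static assertion checks:

```c
#include <string.h>

struct A { int m; };
struct B { int m; };

_Static_assert(sizeof(struct A) == sizeof(struct B),
               "the byte copy below assumes identical sizes");

/* Copies the object representation of an A into a separate B object.
   No lvalue of type struct B ever refers to the A object, so the
   access rule quoted above is not violated. */
struct B b_from_a(const struct A *a) {
    struct B b;
    memcpy(&b, a, sizeof b);
    return b;
}
```

Compilers commonly optimize a small fixed-size `memcpy` like this into a plain load and store, so this usually costs nothing at runtime.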

dusty obsidian
#

Doesn't this fall under:

— a type compatible with the effective type of the object,

?
If not, would casting this string object to a str object also be UB then? I see this type of code all the time though

https://godbolt.org/z/Wf45f7Mcd (fixed minor mistake)

mystic tundra
#

"compatible type" basically means "same type" within a translation unit; the term mainly matters between translation units (for example, including the same header from two translation units results in the structures having compatible types)
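
A small sketch of that point, assuming a C11 compiler: `_Generic` requires that no two of its listed types be compatible, so the selection below only compiles because `struct A` and `struct B` are distinct, incompatible types despite their identical members. The names `TYPE_NAME`, `name_of_a` and `name_of_b` are hypothetical:

```c
struct A { int m; };
struct B { int m; };

/* Selects a string based on the static type of x. Listing both structs
   here would be a constraint violation if they were compatible types. */
#define TYPE_NAME(x) _Generic((x), struct A: "struct A", struct B: "struct B")

const char *name_of_a(void) { struct A a = {0}; return TYPE_NAME(a); }
const char *name_of_b(void) { struct B b = {0}; return TYPE_NAME(b); }
```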

mystic tundra
lethal wind