#General memory allocation


azure shadow
#

Hello,

Wanting to learn something a bit lower level than Java, I have recently been considering two languages: C and Zig. Here I would like to check whether my understanding of memory allocation in C is correct, and ask how to optimize a fictional program.

If I'm not mistaken, a C int occupies 32 bits, correct? Are all 32 bits allocated even if the value would fit in fewer bits, e.g. 5 in 3 bits?
This caught my attention because in Zig you can specify an arbitrary bit width for an integer variable. For the previous example one could use a u3, an unsigned integer with 3 bits of space.

Suppose a fictional program that simply puts in memory 1 billion integer variables with value 5. As 3 bits are needed to express this value, the program requires at least 3 billion bits, or around 358MB. In Zig it seems one could allocate exactly that, but in C it seems one cannot? After googling for Data Types in C it appears the least amount of memory a variable occupies would be 8 bits, and it seems to be possible to compile and run the following code:

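#include <stdio.h>

int main(void) {
  char test = 5;  /* char is one byte; it holds small integers just fine */
  printf("%d\n", test);  /* test is promoted to int, so %d is correct */
  return 0;
}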

With this approach the program would require around 954MB of memory, almost 3x more than in Zig. Also, it feels a bit odd to me to store a number in a variable of type char.
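For reference, the 358MB and 954MB figures work out if "MB" is read as MiB (1024 * 1024 bytes). A minimal sketch of that arithmetic follows; the helper names `packed_bytes` and `bytes_to_mib` are made up for illustration and are not from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store `count` values of `bits_per_value` bits each,
   tightly packed and rounded up to a whole byte. */
uint64_t packed_bytes(uint64_t count, uint64_t bits_per_value) {
    assert(bits_per_value >= 1 && bits_per_value <= 64);
    return (count * bits_per_value + 7) / 8;
}

/* Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes). */
double bytes_to_mib(uint64_t bytes) {
    return (double)bytes / (1024.0 * 1024.0);
}
```

With 1 billion values, 3 bits each comes to 375,000,000 bytes (about 357.6 MiB), and 8 bits each comes to 1,000,000,000 bytes (about 953.7 MiB).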

So my questions are: Have I made any wrong assumptions? Is there a way to optimize this fictional program in C so that it uses the same amount of memory as if it were written in Zig? Where could I read and learn about this?

Thank you.

snow umbraBOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.

dull moss
#

In C, we usually define types independent of the values that go in them, so when we say int, we mean to reserve storage for a signed integer (typically 4 bytes) without any prior knowledge of which values are actually in it.
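The size of a type is fixed by the platform, not by the value stored, and it can be checked with `sizeof`. A small sketch, with hypothetical helper names; the standard only guarantees that int has at least 16 bits, though 32 is what most desktop platforms use:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits of storage an int occupies on this platform. */
size_t int_storage_bits(void) {
    assert(CHAR_BIT >= 8);  /* the C standard guarantees at least 8 */
    return sizeof(int) * CHAR_BIT;
}

/* Same for signed char; sizeof(signed char) is 1 by definition. */
size_t schar_storage_bits(void) {
    return sizeof(signed char) * CHAR_BIT;
}
```

Storing 5 or storing 2000000000 in an int makes no difference to these numbers.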

#

It's for the most part impossible to have dynamic layout of structs based on their values "just work", as it would have to if we wanted field widths to be shortened to the smallest that's actually needed to store the value. If you want to save the 3x memory cost, the solution is to just make them signed chars

#

(also, are you sure that zig has bit-level integer allocation? this seems extremely suspicious)

#

(Also, it's not quite the case that variables can only take up a byte at minimum, we do have bitfields, but everything needs to have an address, and you can't have an object stored at 0x400100.5)
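A rough sketch of what bitfields look like, assuming a made-up struct `small_fields`. Several narrow fields share one storage unit, but you cannot take the address of a bitfield member or declare an array of bare bitfields, and the overall struct size is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Three fields packed into a single storage unit (8 bits of payload). */
struct small_fields {
    unsigned int a : 3;  /* holds 0..7 */
    unsigned int b : 3;  /* holds 0..7 */
    unsigned int c : 2;  /* holds 0..3 */
};

/* Build a struct after checking each value fits its field width. */
struct small_fields make_fields(unsigned a, unsigned b, unsigned c) {
    assert(a < 8 && b < 8 && c < 4);
    struct small_fields f = { a, b, c };
    return f;
}
```

Something like `&f.a` does not compile, which is the "everything needs an address" point in practice.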

austere belfry
dull moss
#

ah

#

Well we have _BitInt too :P

azure shadow
austere belfry
#

For example, a u30 will use 30 bits for the value, but it will actually use up 32 bits of space

azure shadow
#

Oh so my assumption of allocating less memory in Zig is actually untrue?

austere belfry
#

Zig does have a PackedIntArray structure in the standard library to pack them together, but by default in arrays and slices there will be padding
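The same packing idea can be done by hand in C with shifts and masks on a byte buffer. This is only a sketch under stated assumptions: the names `packed_alloc`, `packed_set`, and `packed_get` are hypothetical, the width is fixed at 3 bits, and it is not how Zig implements its packed array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_VALUE 3u

/* Zeroed storage for `count` 3-bit values, plus one spare byte so the
   two-byte window used below never reads past the end. */
uint8_t *packed_alloc(size_t count) {
    size_t bytes = (count * BITS_PER_VALUE + 7) / 8 + 1;
    return calloc(bytes, 1);
}

/* Store `value` (0..7) at position `index`. */
void packed_set(uint8_t *buf, size_t index, unsigned value) {
    assert(value < 8);
    size_t bit = index * BITS_PER_VALUE;
    size_t byte = bit / 8;
    unsigned shift = (unsigned)(bit % 8);
    /* A value may straddle two bytes, so work on a 16-bit window. */
    unsigned window = buf[byte] | ((unsigned)buf[byte + 1] << 8);
    window &= ~(7u << shift);   /* clear the old 3 bits */
    window |= value << shift;   /* write the new 3 bits */
    buf[byte] = (uint8_t)(window & 0xFFu);
    buf[byte + 1] = (uint8_t)(window >> 8);
}

/* Read the value stored at position `index`. */
unsigned packed_get(const uint8_t *buf, size_t index) {
    size_t bit = index * BITS_PER_VALUE;
    size_t byte = bit / 8;
    unsigned shift = (unsigned)(bit % 8);
    unsigned window = buf[byte] | ((unsigned)buf[byte + 1] << 8);
    return (window >> shift) & 7u;
}
```

The trade-off is that every access costs a few extra shift and mask operations, and individual elements have no address of their own.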

azure shadow
#

Do you recommend any specific study resources on the topic for C?

dull moss
#

learn-c.org only gets up to like intermediate topics (and that might be stretching it) but it's allegedly pretty solid

#

Practice is always the best way to understand things

austere belfry
#

On the topic of padding and alignment? My favorite article on the subject is this one https://web.archive.org/web/20201021053824/https://developer.ibm.com/technologies/systems/articles/pa-dalign/
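To see padding on your own machine, something like the following sketch can help. The struct `mixed` and the helper names are invented for illustration, and the exact numbers depend on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

struct mixed {
    char tag;    /* 1 byte */
    int value;   /* usually aligned to its own size */
    char flag;   /* 1 byte */
};

/* Bytes inside struct mixed that hold no member data. */
size_t mixed_padding(void) {
    size_t members = sizeof(char) + sizeof(int) + sizeof(char);
    assert(sizeof(struct mixed) >= members);
    return sizeof(struct mixed) - members;
}

/* Where `value` actually starts within the struct. */
size_t value_offset(void) {
    return offsetof(struct mixed, value);
}
```

On common 64-bit targets with a 4-byte int, `value` tends to start at offset 4 and the struct takes 12 bytes, so half of it is padding, but treat those numbers as typical rather than guaranteed.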

dull moss
#

ooh this is solid
link stolen and saved

#

except the alignment for reads doesn't matter much on x86 these days :P

azure shadow
#

Well

#

I would like to share that

#

This is the Powershell memory allocation running a Zig program I wrote that allocates 1 billion u3 variables with the value 5.

PowerShell by itself was using around 30MB of memory, so the program itself accounts for around 967MB, which is very close to the 954MB I had calculated earlier for 1 byte per variable. It seems each variable here got padded to 1 byte.

So it's the same amount of memory I was expecting from the C version.

snow umbraBOT
#

Thank you and let us know if you have any more questions!

This thread is now set to auto-hide after an hour of inactivity

azure shadow
#

I'm sorry. I forgot to send the C version memory usage before marking as solved.
But these are just proof that what raccoon said is correct, there is padding.