#std::mem::size_of weird behaviour on mac m1 chipset

58 messages · Page 1 of 1 (latest)

signal osprey
#

Hey 👋

When I run the following code on my mac I get: "size of InnerNode: 96", while the actual size of that struct should be 88

use std::mem::size_of;

pub struct InnerNode {
    pub tag: u8,
    pub padding: [u8; 3],
    pub prefix_len: u32,
    pub key: u128,
    pub children: [u32; 2],
    pub child_earliest_expiry: [u64; 2],
    pub reserved: [u8; 40],
}

fn main() {
    println!("size of InnerNode: {}", size_of::<InnerNode>());
}

If I run the same program on the rust playground I get the correct size - https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=df6a72a6f427b5cbd60b79ac935424b7

What gives? Is this an expected behaviour?

late bramble
#

is your mac an M1 by any chance? I can totally imagine that ARM has different alignment requirements to x86.
in this case, I suspect it comes down to the alignment of the u128.
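To see where the extra 8 bytes would come from, here is a small sketch that prints the alignment of `u128` next to the struct size. `FIELD_BYTES` and `round_up` are illustrative names that are not part of the original snippet. The fields add up to 88 bytes, and a type's size is always a multiple of its alignment, so an alignment of 16 pushes the total to 96 while an alignment of 8 leaves it at 88.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
pub struct InnerNode {
    pub tag: u8,
    pub padding: [u8; 3],
    pub prefix_len: u32,
    pub key: u128,
    pub children: [u32; 2],
    pub child_earliest_expiry: [u64; 2],
    pub reserved: [u8; 40],
}

/// Sum of the field sizes, ignoring any padding the compiler adds.
const FIELD_BYTES: usize = 1 + 3 + 4 + 16 + 8 + 16 + 40;

/// Rounds `n` up to the next multiple of `align` (which must be a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

fn main() {
    println!("align_of::<u128>()      = {}", align_of::<u128>());
    println!("align_of::<InnerNode>() = {}", align_of::<InnerNode>());
    println!("size_of::<InnerNode>()  = {}", size_of::<InnerNode>());
    println!(
        "field bytes rounded up  = {}",
        round_up(FIELD_BYTES, align_of::<InnerNode>())
    );
}
```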

signal osprey
#

Yup it's an M1 - so this type of behaviour is expected then?

late bramble
#

at least it's not unexpected

#

If anything, 88 bytes is the unexpected value.

signal osprey
#

thanks! that was helpful

#

do you have any tips while coding libs that need to support both? let's say a lib like openbook-v2, asserts the size of specific structs, the size of those structs is "hardcoded" into a const

#

what would be your approach to support both scenarios?

late bramble
#

In general, you shouldn't really care. Hardcoding the size of a struct like you just linked to seems really bad

#

it should instead be

const NODE_SIZE: usize = std::mem::size_of::<Node>();
#

which also makes the asserts pointless

#

oh it compares the size of two different structs? that's very strange
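A minimal sketch of that idea, assuming two node types that are meant to occupy the same number of bytes. `InnerNode` and `LeafNode` here are hypothetical stand-ins, not the real openbook-v2 definitions. The constant is derived from the type, and the comparison between the two structs becomes a compile-time check.

```rust
use std::mem::size_of;

// Hypothetical node types for illustration only; they are not the real
// openbook-v2 definitions. `repr(C)` keeps the declared field order.
#[allow(dead_code)]
#[repr(C)]
pub struct InnerNode {
    pub tag: u32,
    pub prefix_len: u32,
    pub children: [u32; 2],
}

#[allow(dead_code)]
#[repr(C)]
pub struct LeafNode {
    pub tag: u32,
    pub owner_slot: u32,
    pub value: [u8; 8],
}

/// Derived from the type, so it follows whatever layout the target uses.
pub const NODE_SIZE: usize = size_of::<InnerNode>();

// Compile-time check that both node kinds have the same size.
// If they ever diverge, the build fails instead of a runtime assert firing.
const _: () = assert!(size_of::<LeafNode>() == NODE_SIZE);

fn main() {
    println!("NODE_SIZE = {}", NODE_SIZE);
}
```

With this shape there is no hardcoded number to keep in sync by hand; the only thing asserted is the relationship between the two types.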

vital oriole
#

the only reason I can see you'd care about the size of the struct is if you're doing shenanigans with writing the struct to network/disk

#

in which case.. please be careful

#

if you are doing this, one way to solve it would be to also force align 1

#

or to use the types from something like zerocopy which are already align 1

vital oriole
#

you can

late bramble
#

For align, if the specified alignment is less than the alignment of the type without the align modifier, then the alignment is unaffected.

vital oriole
#

repr(packed)

late bramble
#

packed(1) true
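As a sketch of the packed approach, here is the struct from the question with `#[repr(C, packed)]` applied. `PackedInnerNode`, `key_of` and `sample_node` are illustrative names. With alignment 1 there is no padding, so the size is the plain sum of the fields regardless of how the target aligns `u128`.

```rust
use std::mem::{align_of, size_of};

// Same fields as the struct from the question, with a fixed field order (`C`)
// and no padding between or after fields (`packed`, which means packed(1)).
#[allow(dead_code)]
#[repr(C, packed)]
pub struct PackedInnerNode {
    pub tag: u8,
    pub padding: [u8; 3],
    pub prefix_len: u32,
    pub key: u128,
    pub children: [u32; 2],
    pub child_earliest_expiry: [u64; 2],
    pub reserved: [u8; 40],
}

/// Reads the key by value. Copying a packed field out is fine; taking a
/// reference such as `&node.key` is rejected by current compilers because
/// the field may be unaligned.
fn key_of(node: &PackedInnerNode) -> u128 {
    node.key
}

/// Builds a node with arbitrary example values.
fn sample_node() -> PackedInnerNode {
    PackedInnerNode {
        tag: 1,
        padding: [0; 3],
        prefix_len: 0,
        key: 42,
        children: [0; 2],
        child_earliest_expiry: [0; 2],
        reserved: [0; 40],
    }
}

fn main() {
    let node = sample_node();
    println!("size  = {}", size_of::<PackedInnerNode>());
    println!("align = {}", align_of::<PackedInnerNode>());
    println!("key   = {}", key_of(&node));
}
```

The trade-off is that every access to a multi-byte field may be an unaligned load, and fields cannot be borrowed directly.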

vital oriole
#

interesting that u128 has align 16 on m1

#

wonder why

late bramble
#

But even if you use both repr(C) and repr(packed), you still shouldn't hardcode the sizes

#

E.g. usize could still be different between platforms
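A short sketch of the `usize` point, using two hypothetical header types (`NativeHeader` and `FixedHeader` are made-up names). The first follows the target's pointer width, the second uses a fixed-width integer and is 8 bytes everywhere.

```rust
use std::mem::size_of;

// Hypothetical header whose size follows the target's pointer width:
// 8 bytes on 64-bit targets, 4 bytes on 32-bit targets.
#[allow(dead_code)]
#[repr(C)]
pub struct NativeHeader {
    pub len: usize,
}

// Hypothetical header with a fixed-width length field: 8 bytes on any target.
#[allow(dead_code)]
#[repr(C)]
pub struct FixedHeader {
    pub len: u64,
}

fn main() {
    println!("size_of::<usize>()        = {}", size_of::<usize>());
    println!("size_of::<NativeHeader>() = {}", size_of::<NativeHeader>());
    println!("size_of::<FixedHeader>()  = {}", size_of::<FixedHeader>());
}
```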

vital oriole
#

not really

late bramble
#

x86 is the strange one here

vital oriole
#

it's still doing 2 ops on u64 for add for instance

late bramble
#

i imagine there could easily be 128 bit mem loads that are aligned

vital oriole
#

alignment is needed because cpus do loads and stores to aligned addresses because registers are that size

#

if you don't have 128bit registers and instructions working on 128bit

#

you don't need align 16

late bramble
#

on topic

vital oriole
#

for instance

#

u64 on i686 has align 4

#

because it doesn't need more
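To illustrate how the alignment of `u64` feeds into struct size, here is a sketch with a hypothetical `repr(C)` struct named `Tagged`. The `expected_size` helper applies the `repr(C)` layout rules by hand for a given `u64` alignment: 12 bytes when the alignment is 4, 16 bytes when it is 8.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct: one byte followed by a u64, in declared order.
#[allow(dead_code)]
#[repr(C)]
pub struct Tagged {
    pub tag: u8,
    pub value: u64,
}

/// Rounds `n` up to the next multiple of `align` (which must be a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Size of `Tagged` under `repr(C)` rules for a given alignment of u64.
fn expected_size(u64_align: usize) -> usize {
    // `value` starts at the first suitably aligned offset after the 1-byte tag.
    let value_offset = round_up(1, u64_align);
    // The total is rounded up to the struct's alignment, which equals u64's here.
    round_up(value_offset + 8, u64_align)
}

fn main() {
    println!("align_of::<u64>()   = {}", align_of::<u64>());
    println!("size_of::<Tagged>() = {}", size_of::<Tagged>());
    println!("with u64 align 4    = {}", expected_size(4));
    println!("with u64 align 8    = {}", expected_size(8));
}
```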

late bramble
#

on a cpu architecture level, you probably want your alignment to reflect your memory interface, not your data type

vital oriole
#

that's just wasted memory

late bramble
#

if you have a 32 bit memory interface, any alignment greater than 32 bits doesn't make sense

vital oriole
#

what's a 32 bit memory interface

late bramble
#

if you have a 64 bit memory interface, you want your u64 aligned to 8 bytes for maximum performance

late bramble
#

(this is about cpu internal architecture, whatever that is)

vital oriole
#

well

#

virtually all cpus these days can read 512 bits at a time

#

align 64 wouldn't be so great

late bramble
#

yeah in which case it makes sense to require u128 to be aligned to 16 bytes

#

because then it's always within a single read

#

instead of possibly two separate reads
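The single-read argument can be checked with a little arithmetic. This sketch assumes a 64-byte read block (the 512 bits mentioned above); `straddles` and `count_straddling` are illustrative helpers. A 16-byte value placed at a 16-byte-aligned address never crosses a 64-byte boundary, while one placed at an 8-byte-aligned address sometimes does.

```rust
/// Returns true if the `size`-byte value starting at `addr` crosses a
/// boundary between two `block`-byte blocks.
fn straddles(addr: usize, size: usize, block: usize) -> bool {
    addr / block != (addr + size - 1) / block
}

/// Counts the start addresses in `0..limit`, stepping by `align`, at which a
/// `size`-byte value would span two blocks.
fn count_straddling(align: usize, size: usize, block: usize, limit: usize) -> usize {
    (0..limit)
        .step_by(align)
        .filter(|&addr| straddles(addr, size, block))
        .count()
}

fn main() {
    // Assumption: reads happen in 64-byte (512-bit) blocks.
    const BLOCK: usize = 64;
    const U128_SIZE: usize = 16;

    println!(
        "align 16: {} straddling placements",
        count_straddling(16, U128_SIZE, BLOCK, 1024)
    );
    println!(
        "align 8:  {} straddling placements",
        count_straddling(8, U128_SIZE, BLOCK, 1024)
    );
}
```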

vital oriole
#

that's a decent point

#

that's still assuming you're not using the data before and after the u128

#

which you should probably do

#

in which case the read from 2 cache lines is almost free on amd64