#Do I need to concern myself with endianness?

102 messages · Page 1 of 1 (latest)

sudden bramble
#

Hi everyone, I came across the concept of big endian and little endian while writing binary data to a file. It came up when I asked an LLM for feedback on a code snippet, and it flagged the issue quite aggressively.

As I understand it, ARM and x86/64 both use little endian. Do I need to concern myself with ensuring things are explicitly handled as big or little endian, or is that too 'in the weeds'?

It seems like if I'm targeting modern systems, I should be totally fine with just not caring and letting everything 'just work' with little endian... but I'm not 100% sure.

What are your thoughts?

dire gulch
sudden bramble
# dire gulch well I think it depends on what you are making.. If it concerns binary data and...

Specifically I'm writing wav files, which do require little endian. But the program I'm writing is only ever really designed to run on a standard modern pc which would be ARM or x86/64... so all the ints in my code would already be little endian.

So I figure it's pretty redundant to care in this case? But it's good to know that networking does use big endian where that would have to be a consideration I guess?

dire gulch
#

I mean it seems pretty straightforward - care if it matters

sudden bramble
#

Sweet. Thanks!

sterile thorn
#

the answer is no

#

at least not with machine endianness

indigo dawn
#

Only time I had to care about endian was when interfacing with emulation of some kind :P

sudden bramble
#

or if there's a technical requirement with whatever you're interfacing with

sterile thorn
#

the point is that the endianness of the architecture you're running on should be irrelevant for this.

sand wigeon
sterile thorn
#

what

#

the post is about the fact that if machine endianness matters to your code, you're almost certainly doing it wrong

sand wigeon
sterile thorn
#

that is how you're supposed to do it. this code is independent of machine endianness. it only concerns itself with the endianness of the data in the actual file format.

#

the problematic approach is the one where you're checking what architecture you're on to decide whether you need to swap bytes or not.

sand wigeon
#

as in

#
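uint32_t input;
stream.read(reinterpret_cast<char*>(&input), sizeof(input));
#if CPU_USES_LITTLE_ENDIAN

if ( file_is_big_endian ) // i.e. the audio file uses big endian
{
  // unsigned char avoids sign extension when the bytes are shifted
  auto* arr = reinterpret_cast<unsigned char*>(&input);
  // first byte in the file is the most significant one
  input = (uint32_t{arr[0]} << 24) | (uint32_t{arr[1]} << 16) | (uint32_t{arr[2]} << 8) | arr[3];
}
#else

if ( file_is_little_endian ) // i.e. the audio file uses little endian
{
  auto* arr = reinterpret_cast<unsigned char*>(&input);
  // first byte in the file is the least significant one
  input = (uint32_t{arr[3]} << 24) | (uint32_t{arr[2]} << 16) | (uint32_t{arr[1]} << 8) | arr[0];
}
#endif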

doesn't this perform better, assuming there's a 50/50 chance of getting an input file in either byte order?

#

because, when there's a match in the byte ordering, we do not run anything and proceed

sterile thorn
sudden bramble
# sterile thorn compilers are smart enough to know that

What's the advantage of making it truly independent besides being righteously correct? I don't think anyone disagrees with the idea that making it truly independent is the correct thing to do. But... how do you know if you need to swap endianness without checking what your system is giving you?

#

Like, if I'm assembling my wav file (or whatever) on a big endian processor, how does my software know if it needs to switch to little endian before writing?

#

To be clear, the data isn't coming from another source of known endianness, it is data created by the processor and being taken from memory and written into a file. Surely you'd have to check the machine endianness?

sterile thorn
sterile thorn
#

and no, you don't need to check the machine endianness. you just need to shift successive 8-bit values out of your number in the order the file format expects. that will work correctly irrespective of machine endianness.

#

machine endianness is simply not something that should matter to you unless you're writing a compiler backend or smth like that.

sudden bramble
#

Which therefore means there's no reason to check

#

because the language has got u covered

#

when doing the bit shifting

sterile thorn
#

if you shift the same int value by 12 bits to the left, you will always get the same result on any machine, no matter its endianness.

#

machine endianness is really only a problem for programmers who believe that machine endianness is a problem. you make machine endianness into a problem by writing your code in a way that depends on machine endianness. almost always, that's a completely unnecessary and self-inflicted result of thinking in terms of machine endianness rather than in terms of operations on actual numbers.

#

that's kinda what that article was trying to get at.

#

the lowest 8 bits of some number are always the lowest 8 bits of that number. machine endianness affects at which memory address you will find those 8 bits. but you can always find them at bit positions 0 to 7 in that number.
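A small sketch of the distinction being drawn here, with hypothetical function names: `low_byte` works on the value and gives the same answer everywhere, while `first_byte_in_memory` inspects the object's storage and is the only part whose answer depends on the machine.

```cpp
#include <cstdint>
#include <cstring>

// Lowest 8 bits of the value: bit positions 0 to 7, identical on every machine.
constexpr std::uint8_t low_byte(std::uint32_t v)
{
    return static_cast<std::uint8_t>(v & 0xFF);
}

// Byte stored at the lowest memory address of the object.
// This is the part that varies with machine endianness.
inline std::uint8_t first_byte_in_memory(std::uint32_t v)
{
    unsigned char bytes[sizeof(v)];
    std::memcpy(bytes, &v, sizeof(v));
    return bytes[0];
}

// Value-level operations are checked at compile time, independent of the target.
static_assert(low_byte(0x11223344u) == 0x44, "low byte is always bits 0..7");
static_assert((std::uint32_t{1} << 12) == 4096u, "shifts work on values, not memory");
```

On a little-endian machine `first_byte_in_memory(0x11223344u)` would be `0x44`, on a big-endian one `0x11`. Code that only uses shifts and masks never observes that difference.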

sudden bramble
#

what I have learnt is don't write types to files where bit/byte order matters, do it with bitshifts

sterile thorn
#

yup

sudden bramble
#

I love how this is my second day learning cpp, this language is really fun

sterile thorn
#

when you save data into a file, conceptually, you're taking your program's internal representation of that data and turning it into a sequence of bytes that encodes that information according to some scheme. and when you read the data from such a file, you're taking the sequence of bytes and decoding it into whatever internal representation your program uses.
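To make the encode/decode view concrete, here is a minimal sketch. The `Format` struct and the 6-byte layout (a 16-bit channel count followed by a 32-bit sample rate, both little-endian) are assumptions chosen for illustration, not the actual WAV header layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-program representation.
struct Format {
    std::uint16_t channels;
    std::uint32_t sample_rate;
};

// Encode: internal representation -> byte sequence (little-endian fields).
inline std::vector<unsigned char> encode(const Format& f)
{
    return {
        static_cast<unsigned char>(f.channels & 0xFF),
        static_cast<unsigned char>((f.channels >> 8) & 0xFF),
        static_cast<unsigned char>(f.sample_rate & 0xFF),
        static_cast<unsigned char>((f.sample_rate >> 8) & 0xFF),
        static_cast<unsigned char>((f.sample_rate >> 16) & 0xFF),
        static_cast<unsigned char>((f.sample_rate >> 24) & 0xFF),
    };
}

// Decode: byte sequence -> internal representation.
inline Format decode(const std::vector<unsigned char>& b)
{
    assert(b.size() >= 6); // the assumed layout needs 6 bytes
    Format f{};
    f.channels = static_cast<std::uint16_t>(b[0] | (b[1] << 8));
    f.sample_rate = static_cast<std::uint32_t>(b[2])
                  | (static_cast<std::uint32_t>(b[3]) << 8)
                  | (static_cast<std::uint32_t>(b[4]) << 16)
                  | (static_cast<std::uint32_t>(b[5]) << 24);
    return f;
}
```

Neither function refers to how the machine lays out `Format` in memory. The byte sequence is defined entirely by the scheme.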

sudden bramble
#

I like the level of control and depth

sterile thorn
sudden bramble
#

but I can see why people who aren't curious would find it difficult because this shit is dank as fuck

sudden bramble
#

so over the years I've picked up enough stuff that it's not too tricky to deep dive everything to understand it correctly

#

it's either I ask a lot of questions about small random things or I subject myself to reading learncpp front to back. and that sure as shit is never happening LOL

sterile thorn
#

k ^^

sand wigeon
sterile thorn
#

it's less expressive and inherently not portable, and for no reason whatsoever

sand wigeon
# sterile thorn it's less expressive and inherently not portable, and for no reason whatsoever

it's less expressive
meh that's subjective, different people find different C++ code snippets as expressive

not portable
ummm, okay fine, instead of a macro check, a runtime check with some global constant enum flag for endianness, then it will be portable

and for no reason whatsoever
as I said, the reason from what I see thus far is, when the endianness of the file and the CPU match, we don't do anything, which should mean faster code

sterile thorn
#

it will not be faster because compilers are smart enough to translate the endianness-independent version into optimal machine code

sand wigeon
indigo dawn
#

I had a manual byteswap implementation that compilers optimized to the bswap opcode. So I assume it's rather byte smart
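For reference, a manual byte swap of the kind described might look like the sketch below. The name `byteswap32` is hypothetical, and whether a given compiler turns it into a single swap instruction depends on the compiler and optimization level, so treat that part as something to verify rather than a guarantee.

```cpp
#include <cstdint>

// Hypothetical manual 32-bit byte swap written with masks and shifts.
constexpr std::uint32_t byteswap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);
}
```

Swapping twice gives back the original value, which makes for an easy sanity check.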

sand wigeon
#

and the OP snippet is, 50% smaller than mine

#

(bullshit about devs byte swapping the same variable 5 times is a developer mistake, I don't care about those)

sterile thorn
#

and less expressive is not really subjective here. one can observe very clearly that the endianness-independent version is pretty explicitly about the desired byte order that the format prescribes, and that byte order only. the version that depends on machine endianness otoh is dealing with the relationship between some machine endianness and the desired byte order. instead of simply writing 8-bit chunks in the desired order, we check in which way the desired order differs from the order we assume we have, and in what way we need to swap what we have to turn it into what we want. that's pretty objectively a lot more complicated to understand and reason about.

sterile thorn
# sand wigeon aha, so at the end of the day the performance of my snippet compared to the perf...

ultimately, you should just have a function like read_uint32_le() or smth, that takes a bunch of bytes and returns the corresponding integer value by deserializing the given bytes in the given order. then you don't really need to worry about it anymore. msvc is the only compiler that still fails to perform the optimization. so if you care about msvc support and debug performance, you'll probably want to implement it in terms of std::endian and std::byteswap. personally, I would prefer the simple and expressive implementation unless there's a strong technical reason to do smth else.
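A minimal sketch of what such a function could look like. Taking a `const unsigned char*` to at least four bytes is an assumption made here, as is the big-endian sibling `read_uint32_be`, which is added only for contrast.

```cpp
#include <cstdint>

// Decodes 4 bytes stored least significant byte first.
// `bytes` must point to at least 4 readable bytes (assumed precondition).
inline std::uint32_t read_uint32_le(const unsigned char* bytes)
{
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Decodes 4 bytes stored most significant byte first.
inline std::uint32_t read_uint32_be(const unsigned char* bytes)
{
    return (static_cast<std::uint32_t>(bytes[0]) << 24)
         | (static_cast<std::uint32_t>(bytes[1]) << 16)
         | (static_cast<std::uint32_t>(bytes[2]) << 8)
         |  static_cast<std::uint32_t>(bytes[3]);
}
```

The function name states the byte order of the data, and nothing in the body depends on the machine. The alternative mentioned above would instead compare `std::endian::native` (C++20) against the format's order and call `std::byteswap` (C++23) when they differ.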

#

it should be noted that little and big are technically not the only options, and machine endianness is not necessarily known at compile time, even though machine endianness usually isn't changed at runtime even on machines that would support that, such as ARM.
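Purely to illustrate that there are more than two possible outcomes, and not as a recommendation to branch on this in file I/O code, a runtime probe could be sketched as follows. The `ByteOrder` enum and `detect_byte_order` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical classification; `other` covers mixed layouts.
enum class ByteOrder { little, big, other };

// Inspects how a known 32-bit value is laid out in memory right now.
inline ByteOrder detect_byte_order()
{
    const std::uint32_t probe = 0x01020304u;
    unsigned char b[sizeof(probe)];
    std::memcpy(b, &probe, sizeof(probe));

    if (b[0] == 0x04 && b[1] == 0x03 && b[2] == 0x02 && b[3] == 0x01)
        return ByteOrder::little;
    if (b[0] == 0x01 && b[1] == 0x02 && b[2] == 0x03 && b[3] == 0x04)
        return ByteOrder::big;
    return ByteOrder::other; // neither pure little nor pure big endian
}
```

C++20's `std::endian` reflects the same idea at compile time: `std::endian::native` equals `std::endian::little`, `std::endian::big`, or neither of them on a mixed-endian platform.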

sand wigeon
#

ez

sand wigeon
sterile thorn
#

yes

sand wigeon
sterile thorn
#

user space code usually doesn't get to do that for obvious reasons

#

I forgot the details of how it's done exactly

#

iirc there were two different mechanisms, and it depends on the exact version of ARM

sand wigeon
#

hold up, the code of the operating system is shipped with a hardcoded requirement for endianness, little endian,
if an ARM cpu is switched to Big Endian, how is the operating system supposed to boot at all?

sterile thorn
sterile thorn
sand wigeon
sand wigeon
sterile thorn
#

the CPU can run in either mode, the os chooses what mode to run in

#

and yes, the CPU supports switching midair

sand wigeon
sterile thorn
#

bios doesn't really enter the picture here

#

the CPU can just use either endianness

#

the system can decide what it wants the CPU to do

#

you can in theory run in one endianness and then call some code that runs in another endianness. ofc that's usually gonna be a bad idea since all the memory contents will be wrong, except for stuff that's in the correct endianness. anyways, not really smth to worry about. you usually don't get to just switch the machine endianness when running on an os on ARM.

#

no sane OS allows that

#

it's just a capability that theoretically exists on some architectures

sand wigeon
#

TIL

sterile thorn
#

the point was mainly that this whole notion of endianness being either big or little, and known at compilation time, is not strictly correct

sand wigeon
sterile thorn
#

yes

sand wigeon
#

meaning you would need to check endianness at runtime, every time???

sterile thorn
#

in theory, that could happen, yes. it won't on any actual system ofc.

#

too much stuff would break

#

but endianness isn't necessarily a constant baked into the hardware

#

both ARM and Itanium at least had the ability to switch at runtime

#

iirc, the idea was that this would help with stuff like emulation

sand wigeon
indigo dawn
#

Isn't the endianness a built-in part of the CPU?

sand wigeon
indigo dawn
#

Huh