Type punning between (T*, size) and std::array<T, N> | Together C & C++ | Page 1

quick flame Feb 24, 2024, 2:47 AM

#

Hey all! Is type punning from a dynamically allocated array of structs T to a std::array<T, N> UB when N is known? For example:

```c++
#include <array>
#include <cstddef>
#include <iostream>

#define T_ARRAY_SIZE 10 // definitely a constant, but let's suppose it was a runtime value.

struct T { int data; };

int main() {
   T* tArray = new T[T_ARRAY_SIZE]; // create a C-style dynamically allocated array of T
   for (int i = 0; i < T_ARRAY_SIZE; i++) { // fill the array the C way
      tArray[i].data = i + 1;
   }

   // shady reinterpret_cast: not with T_ARRAY_SIZE but with 10, because we magically know it will be 10
   const std::array<T, 10>* const tStdArray = reinterpret_cast<std::array<T, 10>*>(tArray);
   for (std::size_t i = 0; i < tStdArray->size(); i++) { // print values (do some work) using the std::array API
      std::cout << (*tStdArray)[i].data << std::endl;
   }

   delete[] tArray;
   return 0;
}
```

It obviously works with x86-64 GCC (trunk) and clang (trunk) on godbolt.org (https://godbolt.org/z/YsaxKo1Gs). However, is this even legal in GCC, clang and MSVC? Would relying on this behavior be bad? Would the same piece of code work if `tArray` were to be obtained from an `std::vector<T>`? Please offer your insights!

P.S. I'm doing this because I have one piece of 3rd party code giving me a C style dynamic array of `T` and another piece of 3rd party code expecting an `std::array<T, N>` (N is template parameter).

long fieldBOT Feb 24, 2024, 2:47 AM

#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.

azure ivy Feb 24, 2024, 5:25 AM

#

yes, this is UB

#

there is no way to reinterpret a plain C array as an std::array
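
If the consuming code really does need a `std::array<T, N>`, one defined alternative to the cast is to copy the elements. This is only a minimal sketch: `to_std_array` is a hypothetical helper name, not something from either third-party library, and it assumes the caller guarantees that the buffer holds at least `N` elements.

```c++
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

struct T { int data; };

// Hypothetical helper: copies exactly N elements out of a C-style buffer into
// a std::array. The caller must guarantee that `src` points to at least N
// elements; nothing here can check that.
template <std::size_t N>
std::array<T, N> to_std_array(const T* src) {
    assert(src != nullptr);
    std::array<T, N> out{};
    std::copy_n(src, N, out.begin());  // element-wise copy, no type punning
    return out;
}
```

The price is a copy of `N` elements, so the result no longer aliases the original buffer; writes to one are not visible in the other.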

azure ivy Feb 24, 2024, 12:45 PM

#

you might want to look into std::span

quick flame Feb 24, 2024, 7:28 PM

#

azure ivy there is no way to reinterpret a plain C array as an std::array

Thank you!

#

!solved

long fieldBOT Feb 24, 2024, 7:28 PM

#

Thank you and let us know if you have any more questions!

This thread is now set to auto-hide after an hour of inactivity

naive vortex Feb 25, 2024, 6:46 AM

#

quick flame Hey all! Is type punning from a dynamically allocated array of structs `T` to a ...

you can use std::start_lifetime_as to reinterpret it as a std::array but that would permanently change it

#

I guess you can then change it back with std::start_lifetime_as_array

#

that's C++23 stuff though

#

and it's just a way to technically do it, you probably want to use std::span; yeah

quick flame Feb 25, 2024, 7:53 AM

#

naive vortex you can use `std::start_lifetime_as` to reinterpret it as a `std::array` but tha...

Hey eisie, could you teach me what those are and how to use them? I checked out the cppreference page but couldn't really understand their purpose.

P.S. I already handled my business with std::span so I'm only asking to learn and not to fudge my way around the problem.

naive vortex Feb 25, 2024, 8:09 AM

#

quick flame Hey eisie, could you teach me what those are and how to use them? I checked out ...

in summary, it's like reinterpret_cast but actually valid because it begins the lifetime of a new object in existing storage

#

i.e. you can use it to begin the lifetime of a float inside of an int (which ends the lifetime of the int)

#

or in other words, it's std::bit_cast but in-place, not in new storage

#

https://eel.is/c++draft/obj.lifetime#3 it's even specified to behave like std::bit_cast

quick flame Feb 25, 2024, 8:51 AM

#

ah so it constructs the specified type from the memory contents? very interesting. how can this be different from reinterpret_cast? is there an in-depth explanation as to how it works?

naive vortex Feb 25, 2024, 9:09 AM

#

quick flame ah so it constructs the specified type from the memory contents? very interestin...

it's different because it doesn't just access an object through a different type (which is UB); it actually starts the lifetime of a new type

#

like placement new but with no action at run-time

#

it's probably hard to understand if you don't yet intuit that storage, type of access, and the actual object are all independent

#

int x = 0; is storage for an int, there is an actual int inside, and using x would access that int inside

#

but all of these things are actually independent

#

you can put a float in that storage, you can access the int through an unsigned*, and there might not be an int inside at all
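
The "put a `float` in that storage" part can be sketched with ordinary placement new, which works before C++23. The sketch assumes `float` fits in the storage of an `int` (checked by the `static_assert`); `reuse_int_storage` is a hypothetical name.

```c++
#include <cassert>
#include <new>

static_assert(sizeof(float) == sizeof(int) && alignof(float) <= alignof(int),
              "sketch assumes a float fits in the storage of an int");

// Hypothetical demo: the storage stays the same, the object inside it changes.
float reuse_int_storage() {
    int x = 0;                        // storage for an int, with an int object inside
    float* f = new (&x) float(1.5f);  // reuses the storage: the int's lifetime ends
                                      // and a float object now lives there
    // Same address, different object.
    assert(static_cast<void*>(f) == static_cast<void*>(&x));
    // Reading through `f` is fine. Reading the value of `x` here would be UB,
    // because there is no int object in that storage any more.
    return *f;
}
```

Unlike `std::start_lifetime_as`, placement new initializes the new object, so the old bit pattern is not carried over.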

quick flame Feb 25, 2024, 9:44 AM

#

naive vortex you can put a `float` in that storage, you can access the `int` through an `unsi...

Hmm this seems to derail from the original question but please bear with me. I can't wrap my head around the notion of UB in this case.
```c++
int x = 0;

float* f = reinterpret_cast<float*>(&x); // UB here?
*f = 5.0f; // or UB here?
```
```c++
int x = 0;

unsigned* u = reinterpret_cast<unsigned*>(&x); // UB here?
unsigned u2 = u + 1; // or UB here?
```

Is the act of `reinterpret_cast`'ing UB or is it the act of accessing/modifying the punned memory?
Are there benign UBs? When I think about UB, I think about the quote _all bets are off_. The program itself is in an undefined state. There is no guarantee the compiler will produce code working as expected.

If that is the case, how is this function able to be called thousands of times every frame of a game?
```c++
float q_rsqrt(float number) {
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;                       // evil floating point bit level hacking
  i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration

  return y;
}
```

I see type punning there, isn't this UB too? If so, how can the game Quake run without any problems? Why does it not, at some point, start to behave unexpectedly? I guess because the presence of UB doesn't necessarily mean something unexpected will happen. It just means something unexpected _may_ happen.

--

Also, how does `std::start_lifetime_as` even guarantee the access/modification will not be UB? We established that it does its magic at compile time, so there are no destructor calls to the old object or constructor calls to the new type. Does the compiler simply see a call to `std::start_lifetime_as` and say:

**"Oh yeah, this guy is trying to use this piece of memory as this type of object, so I should not make any assumptions about this piece of memory while trying to optimize it and cause unexpected stuff to happen."** (Since it is generally the compiler's assumptions that cause unexpected stuff when there is UB.)

I have so many questions and I'm confused as heck.

naive vortex Feb 25, 2024, 9:45 AM

#

quick flame Hmm this seems to derail from the original question but please bear with me. I c...

the cast is fine; it's the access of int through a glvalue of type float which is UB

#

and yeah, q_rsqrt is littered with UB

#

if it runs fine it's because the implementation is lenient and doesn't optimize it out

#

and std::start_lifetime_as works because it actually starts the lifetime of a new object T which you can then access through T* just fine
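
On the `q_rsqrt` point: the same trick can be written without the aliasing violation by copying the bytes with `std::memcpy`. This is a sketch, not the original Quake code. It assumes a 32-bit IEEE-754 `float`, swaps `long` for `std::uint32_t`, and adds a positive-input precondition; `q_rsqrt_defined` is a hypothetical name.

```c++
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical rewrite of q_rsqrt that never reads a float through an
// integer pointer (or the other way round).
float q_rsqrt_defined(float number) {
    assert(number > 0.0f);  // the sketch only handles positive inputs
    const float threehalfs = 1.5F;
    const float x2 = number * 0.5F;

    std::uint32_t i;
    std::memcpy(&i, &number, sizeof i);   // copy the float's bytes into an integer
    i = 0x5f3759df - (i >> 1);            // same magic constant as the original
    float y;
    std::memcpy(&y, &i, sizeof y);        // copy the bytes back into a float
    y = y * (threehalfs - (x2 * y * y));  // 1st Newton iteration
    return y;
}
```

Compilers typically turn such fixed-size `memcpy` calls into plain register moves, so this version usually costs nothing extra. In C++20 the two copies could be replaced by `std::bit_cast`.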

azure ivy Feb 25, 2024, 9:47 AM

#

do note that start_lifetime_as here is implementation-defined at best, since it relies on all sorts of assumptions in order to work.

naive vortex Feb 25, 2024, 9:48 AM

#

unsigned u2 = u + 1; // or UB here?

and yeah, UB there

#

you can't do pointer arithmetic through the wrong kind of pointer

quick flame Feb 25, 2024, 9:49 AM

#

naive vortex you can't do pointer arithmetic through the wrong kind of pointer

ahh it should've been unsigned u2 = *u + 1; sorry

#

But yeah, I don't even have to access the memory. Even using the pointer to it is UB?

#

Damn

quick flame Feb 25, 2024, 9:50 AM

#

azure ivy do note that start_lifetime_as here is implementation-defined *at best*, since i...

ok this actually made it click. The compiler maintainers have to somehow implement this functionality. I shouldn't care how.

azure ivy Feb 25, 2024, 9:50 AM

#

uh, that's not really what I meant

naive vortex Feb 25, 2024, 9:50 AM

#

yeah, it's UB to do pointer arithmetic the wrong way https://eel.is/c++draft/expr.add#4.3

#

https://eel.is/c++draft/expr.add#6 this paragraph may be more relevant

azure ivy Feb 25, 2024, 9:52 AM

#

start_lifetime_as basically gives you a way to create an object on top of another object and, thus, "reinterpret" the bytes of the previous object as bytes of this new object. but since C++ makes basically no guarantees with regards to memory layout, the meaning of doing this is dubious at best.

#

also, the act of creating the object on top of the other object ends the lifetime of the original object, so that original object stops existing at that point, all pointers and references to it become invalid.

quick flame Feb 25, 2024, 9:53 AM

#

azure ivy also, the act of creating the object on top of the other object ends the lifetim...

and this all happens on compile time right?

azure ivy Feb 25, 2024, 9:53 AM

#

no

#

I mean, depends on what you mean by "at compile time"

#

start_lifetime_as won't emit any instructions if that's what you mean

quick flame Feb 25, 2024, 9:56 AM

#

yes that's what I meant

azure ivy Feb 25, 2024, 9:56 AM

#

at least in general

naive vortex Feb 25, 2024, 9:56 AM

#

azure ivy start_lifetime_as won't emit any instructions if that's what you mean

well, it might, but not a lot

azure ivy Feb 25, 2024, 9:57 AM

#

it's basically just an optimization barrier.

naive vortex Feb 25, 2024, 9:57 AM

#

class pointers can be differently sized from fundamental pointers

#

so you might need to zero-extend or truncate the pointer

azure ivy Feb 25, 2024, 9:57 AM

#

yeah well, aside from stuff like that.

#

in practice, on actual systems, it'll be a no-op.

quick flame Feb 25, 2024, 9:59 AM

#

azure ivy it's basically just an optimization barrier.

it stops optimizations by telling the compiler to not assume, right? Compiler assumes the program doesn't have UB so it optimizes whatever it can. It only cares about the optimized program's validity when the program has no UB. Because when there is UB, the standard doesn't guarantee a correct execution. Am I understanding this correctly?

#

Therefore, when I call std::start_lifetime_as I'm basically ordering the compiler to not optimize (previously) UB statements?

azure ivy Feb 25, 2024, 10:01 AM

#

I guess. it tells the compiler there is now an object of this other type.

#

it's not so much telling it not to optimize

naive vortex Feb 25, 2024, 10:01 AM

#

it comes with its own UB because if you std::start_lifetime_as and access it through the type it had before, that's now UB

#

int x = 0;
float* f = std::start_lifetime_as<float>(&x);
x = 1; // UB

quick flame Feb 25, 2024, 10:02 AM

#

naive vortex it comes with its own UB because if you `std::start_lifetime_as` and access it t...

that's not what I tried to say but thank you for clarifying that, I was about to ask the same thing.

#

wow this is wild.

azure ivy Feb 25, 2024, 10:03 AM

#

an "optimization barrier" doesn't tell the compiler to stop optimizing. it just establishes a certain correctness constraint that the optimizer must respect.

quick flame Feb 25, 2024, 10:03 AM

#

I understand now thank you.

azure ivy Feb 25, 2024, 10:05 AM

#

also, to answer another question: there's no such thing as "benign UB". that's a notion invented by people who refuse to learn about what UB is and insist on doing things the way they would like things to work as opposed to the way things actually work.

quick flame Feb 25, 2024, 10:06 AM

#

even if it is "benign" now, that doesn't mean it will be on the next compiler version. I think it is foolhardy to depend on UB working.

#

There is no guarantee

azure ivy Feb 25, 2024, 10:14 AM

#

the core problem is that people tend to think of UB as some kind of local phenomenon. they think that UB is the compiler going like "this line is UB so I'll take my pick of what to do there". they think that you can know what UB will "actually do" if you just know what "happens under the hood". but that's not at all how UB actually works. the compiler generally doesn't know that some line invokes UB. that's generally impossible for a compiler to know (see Rice's theorem), which is the reason why UB exists in the first place. what the compiler actually does is look at what conditions would result in UB being invoked, and then derive constraints from that regarding what conditions the generated machine code actually has to handle correctly. UB tells the compiler about what it can assume your code won't ever do.

#

UB isn't just unspecified behavior, it's the literal absence of behavior.

#

and it's not a localized phenomenon.

#

if your program invokes UB at any point, the entire program has UB.

#

UB doesn't mean that you don't know what the result of calling some function will be. it means that you don't know whether that function call or anything else your program ever does will even happen or not.

#

the assumptions the compiler derives using UB propagate both directions, not just downwards. and due to things like inlining, they can travel far and wide, and in completely unpredictable ways.

#

for example, if you dereference a pointer, then the compiler now knows that that pointer cannot be a null pointer, including before the point where the actual dereference happens. and it'll use that fact to optimize.

#

there's actually been a famous security bug in the Linux kernel where such a dereference caused a check to be optimized away, because the compiler knew that the pointer cannot be null because it was being dereferenced somewhere along the same control flow path.
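
The pattern described here looks roughly like the following. This is an illustrative sketch, not the actual kernel code; `Device` and the function names are invented for the example.

```c++
#include <cassert>

struct Device { int flags; };

// Problematic ordering: the pointer is dereferenced before it is checked.
// Since dereferencing a null pointer is UB, the compiler may assume `dev` is
// non-null from here on and is allowed to drop the check below entirely.
int read_flags_unchecked_first(Device* dev) {
    int flags = dev->flags;  // UB if dev is null
    if (dev == nullptr) {    // may be treated as always false and removed
        return -1;
    }
    return flags;
}

// Defined ordering: check first, dereference only on the non-null path.
int read_flags_checked_first(Device* dev) {
    if (dev == nullptr) {
        return -1;
    }
    return dev->flags;
}

// Hypothetical usage of the well-defined version.
void demo_null_check_order() {
    Device d{7};
    assert(read_flags_checked_first(&d) == 7);
    assert(read_flags_checked_first(nullptr) == -1);
}
```

Whether a given compiler actually removes the check in the first function depends on the compiler, its version, and the optimization flags, which is exactly why the result cannot be relied on.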

quick flame Feb 25, 2024, 10:25 AM

#

Well this has been a very informative wild ride. Started with a simple software design and implementation problem and ended with an indispensable lesson on UB, type punning and a standard library function previously unknown to me. I really appreciate both your efforts to educate me. So a big thank you to both of you, dot and eisie. I can't thank you enough for taking time out of your day to explain like I'm 5. Have a great day!

azure ivy Feb 25, 2024, 10:26 AM

#

here's a good talk on the matter: https://youtu.be/g7entxbQOCc?si=JxKbkwHQmYoD_uQ3

CppCon 2016: Michael Spencer “My Little Optimizer: Undefined Behavi...


#

and yes, not all compilers take advantage of UB in the same ways. especially ancient compilers don't do much of that. way back in the day, the idea that you can know what your compiler will do was a little more true than it is in this century. and hacks like the quake inverse square root were often the only way to get what you needed. but just because something was a good idea for John Carmack to do back in the 1990s doesn't mean it's something we should be doing in 2024. people insisting on "using UB" is actually a big part of why we can't have nice things. there's still so much crappy code out there that relies on this kind of stuff "working" that compilers nowadays actually have to actively hold back and literally turn off optimizations they could perform for all of us. because some of that code is unfortunately too important to be broken.
