#Type punning between (T*, size) and std::array<T, N>

quick flame
#

Hey all! Is type punning from a dynamically allocated array of structs T to a std::array<T, N> UB when N is known? For example:

```c++
#include <array>
#include <iostream>

#define T_ARRAY_SIZE 10 // definitely a constant, but let's suppose it was a runtime value.

struct T { int data; };

int main() {
   T* tArray = new T[T_ARRAY_SIZE]; // create a C-style dynamically allocated array of T
   for (int i = 0; i < T_ARRAY_SIZE; i++) { // fill array the C way
      tArray[i].data = i + 1;
   }

   // shady reinterpret, not with T_ARRAY_SIZE but with 10, because we magically know it will be 10
   const std::array<T, 10>* const tStdArray = reinterpret_cast<std::array<T, 10>*>(tArray);
   for (int i = 0; i < tStdArray->size(); i++) { // print values (do some work) using the std::array API
      std::cout << (*tStdArray)[i].data << std::endl;
   }

   delete[] tArray;
   return 0;
}
```

It obviously works with x86-64 GCC (trunk) and clang (trunk) on godbolt.org (https://godbolt.org/z/YsaxKo1Gs). However, is this even legal in GCC, clang and MSVC? Would relying on this behavior be bad? Would the same piece of code work if `tArray` were to be obtained from an `std::vector<T>`? Please offer your insights!

P.S. I'm doing this because I have one piece of 3rd party code giving me a C style dynamic array of `T` and another piece of 3rd party code expecting an `std::array<T, N>` (N is template parameter).

azure ivy
#

yes, this is UB

#

there is no way to reinterpret a plain C array as an std::array

azure ivy
#

you might want to look into std::span
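
If the consuming code can't be changed to take a std::span and really needs a `std::array<T, N>`, the well-defined route is to copy the elements instead of reinterpreting the pointer. A minimal sketch, assuming `T` is copyable and the caller knows at least `N` elements exist; `to_std_array` is a made-up helper name, not something from either 3rd party library:

```c++
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

struct T { int data; };

// Hypothetical helper: copies the first N elements of a C-style array
// into a std::array<T, N>. No reinterpret_cast is involved, so no aliasing UB.
template <std::size_t N>
std::array<T, N> to_std_array(const T* src, std::size_t count) {
    assert(src != nullptr && count >= N); // caller must supply at least N elements
    std::array<T, N> out{};
    std::copy_n(src, N, out.begin());
    return out;
}
```

With the code from the question that would be `auto arr = to_std_array<10>(tArray, T_ARRAY_SIZE);`, and the same call should work with `vec.data()` and `vec.size()` from a `std::vector<T>`. It costs one copy of N elements.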

naive vortex
#

I guess you can then change it back with std::start_lifetime_as_array

#

that's C++23 stuff though

#

and it's just a way to technically do it, you probably want to use std::span; yeah

quick flame
naive vortex
#

i.e. you can use it to begin the lifetime of a float inside of an int (which ends the lifetime of the int)

#

or in other words, it's std::bit_cast but in-place, not in new storage

quick flame
#

ah so it constructs the specified type from the memory contents? very interesting. how can this be different from reinterpret_cast? is there an in-depth explanation as to how it works?

naive vortex
#

like placement new but with no action at run-time

#

it's probably hard to understand if you don't yet intuit that storage, type of access, and the actual object are all independent

#

int x = 0; is storage for an int, there is an actual int inside, and using x would access that int inside

#

but all of these things are actually independent

#

you can put a float in that storage, you can access the int through an unsigned*, and there might not be an int inside at all
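
A small sketch of that independence, using placement new to put a float into storage that was declared as an int. It assumes a float fits within the size and alignment of an int (checked with a static_assert), and `reuse_int_storage` is just an illustrative name:

```c++
#include <cassert>
#include <new>

float reuse_int_storage() {
    static_assert(sizeof(float) <= sizeof(int) && alignof(float) <= alignof(int),
                  "sketch assumes a float fits in the storage of an int");
    int x = 0;                       // storage for an int, with an int object inside
    float* f = new (&x) float(2.5f); // ends the int's lifetime; a float lives there now
    // Same storage, different object:
    assert(static_cast<void*>(f) == static_cast<void*>(&x));
    // Reading or writing x from here on would be UB: there is no int there anymore.
    return *f;                       // OK: accessing the float through a float*
}
```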

quick flame
# naive vortex you can put a `float` in that storage, you can access the `int` through an `unsi...

Hmm this seems to derail from the original question but please bear with me. I can't wrap my head around the notion of UB in this case. ```
int x = 0;

float* f = reinterpret_cast<float*>(&x); // UB here?
*f = 5.0f; // or UB here?

int x = 0;

unsigned* u = reinterpret_cast<unsigned*>(&x); // UB here?
unsigned u2 = u + 1; // or UB here?
```

Is the act of `reinterpret_cast`'ing UB or is it the act of accessing/modifying the punned memory?
Are there benign UBs? When I think about UB, I think about the quote _all bets are off_. The program itself is in an undefined state. There is no guarantee the compiler will produce code working as expected.

If that is the case, how can this function be called thousands of times every frame of a game?```c++
float q_rsqrt(float number) {
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;                       // evil floating point bit level hacking
  i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration

  return y;
}```

I see type punning there, isn't this UB too? If so, how can the game Quake run without any problems? Why does it not, at some point, start to behave unexpectedly? I guess because the presence of UB doesn't necessarily mean something unexpected will happen. It just means something unexpected _may_ happen.

--

Also, how does `std::start_lifetime_as` even guarantee that the access/modification will not be UB? We established that it does its magic at compile time, so there are no destructor calls for the old object or constructor calls for the new type. Does the compiler simply see a call to `std::start_lifetime_as` and say:

**"Oh yeah, this guy is trying to use this piece of memory as this type of object, so I should not make any assumptions about this piece of memory while optimizing that could cause unexpected stuff to happen."** (Since it is generally the compiler's assumptions that cause unexpected stuff when there is UB.)

I have so many questions and I'm confused as heck.
naive vortex
#

and yeah, q_rsqrt is littered with UB

#

if it runs fine it's because the implementation is lenient and doesn't optimize it out

#

and std::start_lifetime_as works because it actually starts the lifetime of a new object T which you can then access through T* just fine
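
For reference, the punning in q_rsqrt can be written without UB by copying the bytes with std::memcpy instead of dereferencing a casted pointer (std::bit_cast does the same job in C++20). A sketch, assuming float is 32 bits wide; `q_rsqrt_defined` is an illustrative name and not the Quake source:

```c++
#include <cassert>
#include <cstdint>
#include <cstring>

float q_rsqrt_defined(float number) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    assert(number > 0.0F); // the bit trick is only meaningful for positive inputs

    const float x2 = number * 0.5F;
    std::uint32_t i;
    std::memcpy(&i, &number, sizeof i); // copy the float's bytes into an integer
    i = 0x5f3759df - (i >> 1);          // same magic constant as the original
    float y;
    std::memcpy(&y, &i, sizeof y);      // copy the bytes back into a float
    y = y * (1.5F - (x2 * y * y));      // 1st Newton iteration
    return y;
}
```

Compilers typically turn the two memcpy calls into plain register moves, so this should not be slower than the original.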

azure ivy
#

do note that start_lifetime_as here is implementation-defined at best, since it relies on all sorts of assumptions in order to work.

naive vortex
#
unsigned u2 = u + 1; // or UB here?

and yeah, UB there

#

you can't do pointer arithmetic through the wrong kind of pointer
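
To the question of whether the cast or the access is the problem: as far as I can tell from the aliasing rules, the reinterpret_cast on its own is fine, and the UB comes from what you do through the resulting pointer. Writing to an int through a `float*` is UB, and so is pointer arithmetic like `u + 1`. Two kinds of access are explicitly allowed though: through the signed/unsigned counterpart of the type, and through `unsigned char` (or `std::byte`) to look at the raw bytes. A sketch with made-up function names:

```c++
#include <cassert>

// Allowed: unsigned is the unsigned counterpart of int, so *u may access the int.
unsigned read_through_unsigned(int& x) {
    unsigned* u = reinterpret_cast<unsigned*>(&x);
    return *u + 1; // dereference first, then add 1 to the value (not to the pointer)
}

// Allowed: any object may be inspected through unsigned char.
unsigned char first_byte(const int& x) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&x);
    return *bytes;
}

// Small usage example for the two helpers above.
void check_allowed_accesses() {
    int v = 41;
    assert(read_through_unsigned(v) == 42u);
    int zero = 0;
    assert(first_byte(zero) == 0); // all bytes of a zero int are zero
}
```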

quick flame
#

But yeah, I don't even have to access the memory. Even using the pointer to it is UB?

#

Damn

quick flame
azure ivy
#

uh, that's not really what I meant

naive vortex
azure ivy
#

start_lifetime_as basically gives you a way to create an object on top of another object and, thus, "reinterpret" the bytes of the previous object as bytes of this new object. but since C++ makes basically no guarantees with regards to memory layout, the meaning of doing this is dubious at best.

#

also, the act of creating the object on top of the other object ends the lifetime of the original object, so that original object stops existing at that point, all pointers and references to it become invalid.

quick flame
azure ivy
#

no

#

I mean, depends on what you mean by "at compile time"

#

start_lifetime_as won't emit any instructions if that's what you mean

quick flame
#

yes that's what I meant

azure ivy
#

at least in general

naive vortex
azure ivy
#

it's basically just an optimization barrier.

naive vortex
#

class pointers can be differently sized from fundamental pointers

#

so you might need to zero-extend or truncate the pointer

azure ivy
#

yeah well, aside from stuff like that.

#

in practice, on actual systems, it'll be a no-op.

quick flame
# azure ivy it's basically just an optimization barrier.

it stops optimizations by telling the compiler to not assume, right? The compiler assumes the program doesn't have UB, so it optimizes whatever it can. It only cares about the optimized program's validity when the program has no UB, because when there is UB, the standard doesn't guarantee a correct execution. Am I understanding this correctly?

#

Therefore, when I call std::start_lifetime_as I'm basically ordering the compiler to not optimize (previously) UB statements?

azure ivy
#

I guess. it tells the compiler there is now an object of this other type.

#

it's not so much telling it not to optimize

naive vortex
#

it comes with its own UB because if you std::start_lifetime_as and access it through the type it had before, that's now UB

#
```c++
int x = 0;
float* f = std::start_lifetime_as<float>(&x);
x = 1; // UB
```
quick flame
#

wow this is wild.

azure ivy
#

an "optimization barrier" doesn't tell the compiler to stop optimizing. it just establishes a certain correctness constraint that the optimizer must respect.

quick flame
#

I understand now thank you.

azure ivy
#

also, to answer another question: there's no such thing as "benign UB". that's a notion invented by people who refuse to learn about what UB is and insist on doing things the way they would like things to work as opposed to the way things actually work.

quick flame
#

even if it is "benign" now, that doesn't mean it will be on the next compiler version. I think it is foolhardy to depend on UB working.

#

There is no guarantee

azure ivy
#

the core problem is that people tend to think of UB as some kind of local phenomenon. they think that UB is the compiler going like "this line is UB so I'll take my pick of what to do there". they think that you can know what UB will "actually do" if you just know what "happens under the hood". but that's not at all how UB actually works. the compiler generally doesn't know that some line invokes UB. that's generally impossible for a compiler to know (see rice's theorem), which is the reason why UB exists in the first place. what the compiler actually does is look at what conditions would result in UB being invoked, and then derive constraints from that regarding what conditions the generated machine code actually has to handle correctly. UB tells the compiler about what it can assume your code won't ever do.

#

UB isn't just unspecified behavior, it's the literal absence of behavior.

#

and it's not a localized phenomenon.

#

if your program invokes UB at any point, the entire program has UB.

#

UB doesn't mean that you don't know what the result of calling some function will be. it means that you don't know whether that function call or anything else your program ever does will even happen or not.

#

the assumptions the compiler derives using UB propagate both directions, not just downwards. and due to things like inlining, they can travel far and wide, and in completely unpredictable ways.

#

for example, if you dereference a pointer, then the compiler now knows that that pointer cannot be a null pointer, including before the point where the actual dereference happens. and it'll use that fact to optimize.

#

there's actually been a famous security bug in the linux kernel where such a dereference caused a check to be optimized away, because the compiler knew that the pointer cannot be null because it was being dereferenced somewhere along the same control flow path.
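
The shape of that bug, as a sketch. This is not the actual kernel code; `Device` and all three functions are invented for illustration:

```c++
#include <cassert>

struct Device { int flags; };

// Buggy shape: the dereference comes before the check. Since dereferencing a
// null pointer is UB, the compiler may assume dev != nullptr from here on and
// is allowed to delete the check below as dead code.
int read_flags_buggy(Device* dev) {
    int flags = dev->flags;
    if (dev == nullptr)
        return -1;
    return flags;
}

// Fixed shape: check first, dereference only on the path where dev is valid.
int read_flags_fixed(Device* dev) {
    if (dev == nullptr)
        return -1;
    return dev->flags;
}

// Usage example. The buggy version is only ever called with a valid pointer,
// because calling it with nullptr would be UB.
void check_read_flags() {
    Device d{7};
    assert(read_flags_buggy(&d) == 7);
    assert(read_flags_fixed(&d) == 7);
    assert(read_flags_fixed(nullptr) == -1);
}
```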

quick flame
#

Well this has been a very informative wild ride. Started with a simple software design and implementation problem and ended with an indispensable lesson on UB, type punning and a standard library function previously unknown to me. I really appreciate both your efforts to educate me. So a big thank you to both of you, dot and eisie. I can't thank you enough for taking time out of your day to explain like I'm 5. Have a great day!

azure ivy
#

here's a good talk on the matter: https://youtu.be/g7entxbQOCc?si=JxKbkwHQmYoD_uQ3

http://CppCon.org

Presentation Slides, PDFs, Source Code and other presenter materials are available at: https://github.com/cppcon/cppcon2016

Compiler exploitation of undefined behavior has been a topic of recent discussion in the programming community. This talk will explore the magic of Undefined Behavior, Covering how and why modern optim...

#

and yes, not all compilers take advantage of UB in the same ways. especially ancient compilers don't do much of that. way back in the day, the idea that you can know what your compiler will do was a little more true than it is in this century. and hacks like the quake inverse square root were often the only way to get what you needed. but just because something was a good idea for John Carmack to do back in the 1990s doesn't mean it's something we should be doing in 2024. people insisting on "using UB" is actually a big part of why we can't have nice things. there's still so much crappy code out there that relies on this kind of stuff "working" that compilers nowadays actually have to actively hold back and literally turn off optimizations they could perform for all of us. because some of that code is unfortunately too important to be broken.