Question about IEEE-754 floating point standard | Together Java | Page 1

karmic trench Apr 29, 2025, 1:32 PM

#

Mainly talking about 32-bit floats. I will try to explain how I understand it, and I would appreciate it
if someone told me **if I understand it correctly** (bold to emphasise the actual question).
The first bit (fb) is the sign: 0 = positive and 1 = negative, i.e. (-1)^fb.

The next 8 bits (exp) are the exponent. They store an 8-bit number, but the actual exponent of the value is that number minus the bias, 2^(exp length - 1) - 1 = 2^7 - 1 = 127. So far: (-1)^fb * 2^exp, with exp being the actual exponent.

The last 23 bits (mn) are the mantissa, which is like the base of a number in scientific notation (which is basically what a float is). The leading digit of the base, like the 1 in 1.23, is implied because it is always 1: in binary the only other value it could take is 0, but if it were 0, the exponent would have to be 1 bigger and the mantissa would be, for example, 0.1234... instead of 1.234, resulting in a loss of accuracy or 1 more bit used. So the base is 1.(mn digits). For example, the mantissa 11000... (everything else 0) contributes 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75, so the base is 1.75 in decimal.
So converting the number from binary to decimal floating point would be (-1)^fb * 2^exp * base, where exp is the actual (unbiased) exponent and base is 1 plus the mantissa converted to decimal in the way I explained.

slim solarBOT Apr 29, 2025, 1:32 PM

#

<@&987246746478460948> please have a look, thanks.

#

Here is an AI-assisted attempt to answer your question 🤖. Maybe it helps! In any case, a human is on the way 👍. To continue talking to the AI, you can use /chatgpt.

#

TJ-Bot

The IEEE-754 floating point standard is a widely used binary format for representing real numbers. A 32-bit float is divided into three parts: the sign bit, the exponent, and the mantissa.

The sign bit determines whether the number is positive or negative. The exponent represents the magnitude of the number, while the mantissa holds the significant digits (the precision) of the number.

A 32-bit float uses 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa. The exponent is stored with a bias of 127 so that both positive and negative exponents can be represented.

The formula to calculate a float value from its components is as follows:

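(-1)^sign * (1 + mantissa) * 2^(exponent - bias)

Here mantissa is the 23-bit field read as a binary fraction in [0, 1), exponent is the stored 8-bit value, and bias is 127. This holds for normal numbers; the stored exponents 0 and 255 are reserved for zero and subnormals, and for infinity and NaN.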
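As a rough illustration, the formula can be tried out in Java by pulling the three fields out of the raw bits and rebuilding the value by hand. This is only a sketch: the class and method names (`FloatDecodeSketch`, `decodeNormal`) are made up for this example, and it deliberately handles normal numbers only.

```java
final class FloatDecodeSketch {
    static final int BIAS = 127; // exponent bias for 32-bit floats

    // Rebuilds the value of a *normal* float from its raw bits.
    // Hypothetical helper for illustration, not a library method.
    static double decodeNormal(int bits) {
        int sign = bits >>> 31;                    // 1 sign bit
        int storedExponent = (bits >>> 23) & 0xFF; // 8 exponent bits, biased
        int mantissaBits = bits & 0x7FFFFF;        // 23 mantissa bits

        // Stored exponents 0 and 255 mark zero/subnormals and infinity/NaN.
        if (storedExponent == 0 || storedExponent == 0xFF) {
            throw new IllegalArgumentException("not a normal float");
        }

        // Read the mantissa bits as a binary fraction in [0, 1).
        double fraction = mantissaBits / (double) (1 << 23);

        // (1 + mantissa) * 2^(exponent - bias); scalb scales by a power of two.
        double magnitude = Math.scalb(1.0 + fraction, storedExponent - BIAS);
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        // -3.5 = -(1.75 * 2^1): sign 1, stored exponent 128, mantissa 1100...0
        int bits = Float.floatToIntBits(-3.5f);
        System.out.println(decodeNormal(bits)); // -3.5
    }
}
```

The example value was picked so that the mantissa is 11000... (fraction 0.75, base 1.75), which makes the result easy to check by hand.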

It's important to note that not all real numbers can be represented exactly in this format due to limitations on precision and range. This can lead to rounding errors and loss of precision in calculations involving floats.
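A small Java sketch can make the precision limit visible. It prints the exact decimal value that `0.1f` actually stores, and shows an integer that a float cannot hold. The class and method names here (`FloatRoundingSketch`, `exactValue`) are invented for the example.

```java
import java.math.BigDecimal;

final class FloatRoundingSketch {
    // Exact decimal expansion of the value a float really stores.
    // Hypothetical helper for illustration.
    static String exactValue(float f) {
        return new BigDecimal(f).toString();
    }

    public static void main(String[] args) {
        // 0.1 has no finite binary expansion, so the nearest float is stored.
        System.out.println(exactValue(0.1f)); // 0.100000001490116119384765625

        // 24 significant bits cannot hold 2^24 + 1, so it rounds to 2^24.
        System.out.println((float) 16_777_217 == 16_777_216f); // true
    }
}
```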

Overall, your understanding seems correct based on your explanation. If you have any specific questions or want further clarification on any aspect of IEEE-754 floats, feel free to ask!

dull cloak Apr 29, 2025, 1:44 PM

#

yeah, pretty much spot on 🙂

#

I don't know if the mantissa calculation is exact, but everything else sounds right

karmic trench Apr 29, 2025, 1:45 PM

#

thanks, I have a few more questions

dull cloak Apr 29, 2025, 1:45 PM

#

shoot

karmic trench Apr 29, 2025, 1:47 PM

#

why does this converter do this? when converting, do I have to shift the mantissa by exp digits before converting it to decimal? another converter does something else, so I'm not sure which is correct

#

wait I might be dumb let me try something

dull cloak Apr 29, 2025, 1:49 PM

#

please use https://float.exposed/

Float Exposed

Floating point format explorer – binary representations of common floating point formats.

#

way better tool

#

well, it gives less info

#

But the dude made an amazing explainer https://ciechanow.ski/exposing-floating-point/

Exposing Floating Point – Bartosz Ciechanowski

In depth explanation of floating point format.

karmic trench Apr 29, 2025, 1:58 PM

#

I understand it now thanks

#

(I kept messing up the mantissa calculation when doing it semi manually)

#

let me think if I have any more questions and I will close the thread

dull cloak Apr 29, 2025, 2:17 PM

#

no worries 🙂

#

also, look at bfloat16, really funny

#

"we don't need the rest of this mantissa" yeet

#

They just chopped it in half and threw away the other part 😄
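For anyone curious, the "chopping" can be sketched in Java: bfloat16 keeps the top 16 bits of a 32-bit float (1 sign bit, 8 exponent bits, 7 mantissa bits). The snippet below simply zeroes the lower 16 bits. That plain truncation is an assumption made for simplicity, since real conversions often round instead, and the names `BFloat16Sketch` and `truncateToBFloat16` are made up here.

```java
final class BFloat16Sketch {
    // Keeps sign, exponent and the top 7 mantissa bits; drops the low 16 bits.
    // Hypothetical helper: truncates instead of rounding, for simplicity.
    static float truncateToBFloat16(float f) {
        int bits = Float.floatToRawIntBits(f);
        return Float.intBitsToFloat(bits & 0xFFFF0000);
    }

    public static void main(String[] args) {
        // Same exponent range as a float, but far fewer significant digits.
        System.out.println(truncateToBFloat16(3.14159f)); // 3.140625
    }
}
```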

karmic trench Apr 29, 2025, 2:18 PM

#

yeah 💀

#

okay I don't have any more questions, thanks for helping!
