When can a conversion from Integer to Single lose precision?
I was reading an article from Microsoft about Widening Conversions and Option Strict On when I got to this part:
The following conversions may lose precision:
- Integer to Single
- Long to Single or Double
- Decimal to Single or Double
However, these conversions do not lose information or magnitude.
But according to another article about data types,
the Integer type can store values from -2,147,483,648 to 2,147,483,647, and
the Single type can store values from
- 1.401298E-45 to 3.4028235E+38 for positive numbers,
- and -3.4028235E+38 to -1.401298E-45 for negative numbers.
So Single can store many more numbers than Integer. I can't see in what situation a conversion from Integer to Single could lose precision. Could someone explain, please?
Tags: vb.net floating-point
asked Jun 1 at 23:13 by Vinicius V; edited Jun 3 at 14:59 by muru
5 Answers
"Single can store many more numbers than Integer"
No, it can't. Both Single and Integer are 32 bits wide, which means that both can store the exact same amount of numbers, namely 2^32 = 4294967296 distinct numbers.
Since the range of Single is clearly larger than that, it is immediately obvious (because of the Pigeonhole Principle) that it cannot possibly represent all numbers within that range.
And since the range of Integer is exactly the same size as the maximum amount of numbers that both Integer and Single can represent, but Single can also represent numbers outside of that range, it is clear that it cannot possibly represent all numbers inside the range of Integer.
If there are some Integer values that cannot be represented in Single, then converting from Integer to Single must be capable of losing information.
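The collisions this argument predicts are easy to observe. Here is a minimal sketch in Python rather than VB.NET (my choice, not the asker's). It emulates Single by round-tripping a value through the 32-bit float format of the standard struct module; to_single is a hypothetical helper name, and the window starting at 2^30 is just an arbitrary sample.

```python
import struct

def to_single(n: int) -> float:
    """Round an integer to the nearest IEEE 754 binary32 value (emulating Single)."""
    return struct.unpack("f", struct.pack("f", float(n)))[0]

# 256 consecutive Integer values starting at 2**30 ...
window = range(2**30, 2**30 + 256)
# ... collapse onto far fewer distinct Single values.
images = {to_single(n) for n in window}

print(len(window), "integers map to", len(images), "distinct Single values")
```

Whenever several integers share one image, the conversion cannot be undone for all of them, which is exactly the loss of precision the article warns about.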
Comments:
+1 for this great explanation of why that has to be the case, even though the question was actually when ("in what situation") it happens... – doubleYou, Jun 2 at 11:25
I'd +1, but the wording in that last sentence feels a bit off. If the ints are small, then the mapping is locally injective, so there is no loss. There must be values where the conversion is lossy, but the conversion itself doesn't imply loss of information. – VisualMelon, Jun 2 at 12:34
@doubleYou: 4261412864 of the 4294967296 Integers (99.2%) cannot be represented as Single, so "when" is "pretty much always". – Jörg W Mittag, Jun 2 at 17:50
en.wikipedia.org/wiki/Single-precision_floating-point_format explains the limitations nicely for IEEE754 binary32. Integers in [-16777216, 16777216] (2^24 = the significand width) can be exactly represented. Larger numbers are rounded to the nearest multiple of 2, 4, 8, ... depending on how large they are. – Peter Cordes, Jun 3 at 5:04
“which means that both can store the exact same amount of numbers”: It does not mean that. It would only mean that if both types have the exact same number of ways of storing each number. And this isn’t the case; for instance, Single has two ways of storing zero. So Single can in fact represent fewer distinct numbers than Integer. – Konrad Rudolph, Jun 3 at 15:01
Floating point types (such as Single and Double) are represented in memory by a sign, a mantissa and an exponent. Think of it as scientific notation:
Sign*Mantissa*Base^Exponent
As you may expect, they use base 2. There are further tweaks: special encodings for infinity and NaN, an offset applied to the exponent, and a shorthand for the mantissa (I will come back to both of those). See the IEEE 754 standard, which covers the representation and its operations, for more details.
For our purposes we can imagine it as a binary number, the "mantissa", plus an "exponent" that tells you where to put the binary point (the base-2 counterpart of the decimal separator).
In the case of Single, we have 1 bit for the sign, 8 for the exponent and 23 for the mantissa.
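To make the 1 + 8 + 23 split concrete, here is a small sketch in Python (my choice of language, not part of the original answer). It reinterprets the 32 bits of a value as an unsigned integer using the standard struct module and masks out the three fields. single_fields is a hypothetical helper name. The exponent field is shown as stored, that is, with the IEEE 754 binary32 offset of 127 still included.

```python
import struct

def single_fields(x: float) -> tuple[int, int, int]:
    """Split the binary32 encoding of x into (sign, exponent field, mantissa field)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with an offset
    mantissa = bits & 0x7FFFFF        # 23 stored bits
    return sign, exponent, mantissa

for value in (1.0, -2.0, 16777216.0):
    print(value, single_fields(value))
```

Note that the mantissa field comes out as 0 for all three sample values: they are powers of two, so the only significant bit is the leading 1.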
Now, the mantissa is stored starting from its most significant digit. Leading zeroes are not relevant, and given that we are working in binary, we know that the most significant digit is a 1※. Since we know that, we do not have to store it. Thanks to that shorthand, the effective width of the mantissa is 24 bits.
※: Unless the number we are storing is zero. For that, all the bits are set to zero. However, interpreted under the description I gave, that bit pattern would still carry the implicit 1, multiplied by 1 (2 to the power of the exponent 0), so it would not be zero. To fix that, an exponent field of zero is a special value. There are also special exponent values for storing infinity and NaN.
As for the exponent offset: aside from keeping clear of the special values, it allows the binary point to be placed before the start of the mantissa or after its end, without needing a separate sign for the exponent.
This means that for large numbers, the floating point type will put the binary point beyond the end of the mantissa.
Remember that the mantissa is a 24 bit number. It can never represent a 25 bit number, because it does not have that extra bit. Thus, Single cannot distinguish between 2^24 and 2^24+1 (these are the first 25 bit numbers, and they differ only in the last bit, which Single does not store).
Thus, the range of consecutive integers that Single represents exactly is -2^24 to 2^24. Trying to add 1 to 2^24 results in 2^24 (because as far as the type is concerned, 2^24 and 2^24+1 are the same value). This is why there is a loss of information when converting from Integer to Single. It is also why a loop that counts with a Single or a Double can turn into an infinite loop without you noticing.
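The stalled counter can be reproduced with a short sketch. It is in Python, not VB.NET, and emulates Single by rounding every intermediate result through the 32-bit float format of the standard struct module. to_single and count_up are hypothetical names. The guard that detects the stall exists only so the demonstration terminates.

```python
import struct

def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value (emulating Single)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def count_up(start: int, limit: int):
    """Add 1 repeatedly in emulated Single arithmetic.

    Returns (last value, stalled), where stalled is True if adding 1
    stopped changing the value before limit was reached.
    """
    x = to_single(float(start))
    while x < limit:
        nxt = to_single(x + 1.0)   # exact sum, rounded back to Single
        if nxt == x:               # the increment was rounded away
            return x, True
        x = nxt
    return x, False

print(count_up(16777210, 16777220))
```

Without the nxt == x check, the while loop would never reach its limit, because the counter stops moving once it arrives at 2^24.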
Comment:
This isn't a perfect explanation of the implicit leading 1 bit in the significand. It's implied by the biased-exponent field being non-zero. Subnormals (aka denormals) including +-0.0 have a leading 0 bit of their significand. I guess you could simplify to just consider 0.0 a totally special case, but 0.0 actually follows the same encoding rules as other subnormals. – Peter Cordes, Jun 3 at 5:11
Here is an actual example of when converting from Integer to Single may lose precision:
The Single type can store all integers from -16777216 to 16777216 (inclusive), but it cannot store all integers outside of this range. For example, it cannot store the number 16777217. For that matter, it cannot store any odd number greater than 16777216.
We can use Windows PowerShell to see what happens if we convert an Integer to a Single and back:
PS C:\Users\tanne> [int][float]16777213
16777213
PS C:\Users\tanne> [int][float]16777214
16777214
PS C:\Users\tanne> [int][float]16777215
16777215
PS C:\Users\tanne> [int][float]16777216
16777216
PS C:\Users\tanne> [int][float]16777217
16777216
PS C:\Users\tanne> [int][float]16777218
16777218
PS C:\Users\tanne> [int][float]16777219
16777220
Notice that 16777217 got rounded down to 16777216, and 16777219 got rounded up to 16777220.
Comment:
And with increasing magnitude, the distance between nearest representable floats keeps growing as powers of 2. en.wikipedia.org/wiki/… – Peter Cordes, Jun 3 at 5:05
Floating point types are similar to "scientific notation" in physics. The number is split up into a sign bit, an exponent (multiplier) and a mantissa (significant digits). So as the magnitude of the value increases the step size also increases.
Single precision floating point has 23 mantissa bits, but there is an "implicit 1", so the mantissa is effectively 24 bits. Therefore all integers with a magnitude up to 2^24 can be represented exactly in single precision floating point.
Above that, successively fewer numbers can be represented.
- From 2^24 to 2^25 only even numbers can be represented.
- From 2^25 to 2^26 only multiples of 4 can be represented.
- From 2^26 to 2^27 only multiples of 8 can be represented.
- From 2^27 to 2^28 only multiples of 16 can be represented.
- From 2^28 to 2^29 only multiples of 32 can be represented.
- From 2^29 to 2^30 only multiples of 64 can be represented.
- From 2^30 to 2^31 only multiples of 128 can be represented.
So of the 2^32 possible 32 bit signed integer values only 2 * (2^24 + 7 * 2^23) = 9 * 2^24 can be represented in single precision floating point. That is 3.515625% of the total.
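The list and the final count can be checked with a little integer arithmetic. This is a sketch in Python (my choice of language). step_size, is_exact and count_exact_int32 are hypothetical helper names, and the code simply encodes the rule from the list above (24 significant bits) instead of measuring a real Single.

```python
def step_size(n: int) -> int:
    """Spacing of the Single values near integer n (1 while |n| < 2**24)."""
    magnitude = abs(n)
    if magnitude < 2**24:
        return 1
    return 2 ** (magnitude.bit_length() - 24)

def is_exact(n: int) -> bool:
    """True if n lands exactly on a representable Single value."""
    return n % step_size(n) == 0

def count_exact_int32() -> int:
    """Count the 32-bit signed integers that convert to Single without rounding."""
    non_negative = 2**24              # 0 .. 2**24 - 1: every integer is exact
    for k in range(24, 31):           # seven ranges [2**k, 2**(k+1))
        non_negative += 2**23         # each holds 2**23 exact values
    # The negative side mirrors this: zero is not repeated, but -2**31 is added.
    return non_negative + (non_negative - 1) + 1

total = count_exact_int32()
print(total, total / 2**32)
print([n for n in range(2**26, 2**26 + 32) if is_exact(n)])
```

The second print lists the exact values in a small window just above 2^26, where only multiples of 8 survive.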
Single precision floats have 24 bits of precision. Anything over that is rounded to the nearest 24-bit number. It might be easier to understand in decimal scientific notation, but keep in mind actual floats use binary.
Say you have 5 decimal digits of memory. You can choose to use those like a regular unsigned int, allowing you to have any number between 0 and 99999. If you want to be able to represent larger numbers, you can use scientific notation and allocate two of the digits to the exponent, so you can now represent anything between 0 and 9.99 x 10^99.
However, you can now represent every integer exactly only up to three digits, that is, up to 999 (1000 still fits as 1.00 x 10^3, but 1001 does not). If you tried to represent 12345, you can get 1.23 x 10^4 or 1.24 x 10^4, but you can't represent any of the numbers in between, because you don't have enough digits available.
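The decimal toy format can be tried out directly. This is a sketch in Python using the standard decimal module with a precision of 3 significant digits. TOY and to_toy are hypothetical names, the two exponent digits are not modelled, and round-half-even is my assumption for the rounding rule.

```python
from decimal import Context, ROUND_HALF_EVEN

# Toy format: 3 significant decimal digits (the exponent digits are not modelled).
TOY = Context(prec=3, rounding=ROUND_HALF_EVEN)

def to_toy(n: int) -> int:
    """Round n to 3 significant digits and return the integer actually stored."""
    return int(TOY.create_decimal(n))

for n in (999, 1000, 1001, 12345):
    print(n, "->", to_toy(n))

# Search for the first integer that does not survive the conversion.
first_gap = next(n for n in range(1, 20000) if to_toy(n) != n)
print("first integer that is not stored exactly:", first_gap)
```

Single behaves the same way, only with 24 binary digits in place of 3 decimal ones.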
Comment:
Using decimal digits is a nice idea that makes it easier to understand, but the last paragraph is a bit misleading: actually you can represent numbers higher than 999, and your example shows it: 12300 would be 1.23 x 10^4. What you mean is that starting from that number there are gaps. Would you mind rephrasing it a bit? – Fabio Turati, Jun 3 at 14:04
First answer (the pigeonhole argument): answered Jun 2 at 1:06 by Jörg W Mittag; edited Jun 4 at 15:00 by Mason Wheeler
3
+1 for this great explanation of why that has to be the case, even though the question was actually when ("in what situation") it happens...
– doubleYou
Jun 2 at 11:25
1
I'd +1, but the wording in that last sentence feels a bit off. If the ints are small, then the mapping is locally injective, so there is no loss. There must be values where the conversion is lossy, but the conversion itself doesn't imply lose of information.
– VisualMelon
Jun 2 at 12:34
20
@doubleYou: 4261412864 of the 4294967296Integer
s (99.2%) cannot be represented asSingle
, so "when" is "pretty much always".
– Jörg W Mittag
Jun 2 at 17:50
10
en.wikipedia.org/wiki/Single-precision_floating-point_format explains the limitations nicely for IEEE754 binary32. Integers in[-16777216,16777216]
(2^24 = the significand width) can be exactly represented. Larger numbers are rounded to the nearest multiple of 2, 4, 8, ... depending on how large they are.
– Peter Cordes
Jun 3 at 5:04
14
“which means that both can store the exact same amount of numbers” — It does not mean that. It would only mean that if both types have the exact same number of ways of storing each number. And this isn’t the case; for instance,Single
has two ways of storing zero. SoSingle
can in fact represent fewer distinct numbers thanInteger
.
– Konrad Rudolph
Jun 3 at 15:01
|
show 5 more comments
3
+1 for this great explanation of why that has to be the case, even though the question was actually when ("in what situation") it happens...
– doubleYou
Jun 2 at 11:25
1
I'd +1, but the wording in that last sentence feels a bit off. If the ints are small, then the mapping is locally injective, so there is no loss. There must be values where the conversion is lossy, but the conversion itself doesn't imply lose of information.
– VisualMelon
Jun 2 at 12:34
20
@doubleYou: 4261412864 of the 4294967296Integer
s (99.2%) cannot be represented asSingle
, so "when" is "pretty much always".
– Jörg W Mittag
Jun 2 at 17:50
10
en.wikipedia.org/wiki/Single-precision_floating-point_format explains the limitations nicely for IEEE754 binary32. Integers in[-16777216,16777216]
(2^24 = the significand width) can be exactly represented. Larger numbers are rounded to the nearest multiple of 2, 4, 8, ... depending on how large they are.
– Peter Cordes
Jun 3 at 5:04
14
“which means that both can store the exact same amount of numbers” — It does not mean that. It would only mean that if both types have the exact same number of ways of storing each number. And this isn’t the case; for instance,Single
has two ways of storing zero. SoSingle
can in fact represent fewer distinct numbers thanInteger
.
– Konrad Rudolph
Jun 3 at 15:01
3
3
+1 for this great explanation of why that has to be the case, even though the question was actually when ("in what situation") it happens...
– doubleYou
Jun 2 at 11:25
+1 for this great explanation of why that has to be the case, even though the question was actually when ("in what situation") it happens...
– doubleYou
Jun 2 at 11:25
1
1
I'd +1, but the wording in that last sentence feels a bit off. If the ints are small, then the mapping is locally injective, so there is no loss. There must be values where the conversion is lossy, but the conversion itself doesn't imply lose of information.
– VisualMelon
Jun 2 at 12:34
I'd +1, but the wording in that last sentence feels a bit off. If the ints are small, then the mapping is locally injective, so there is no loss. There must be values where the conversion is lossy, but the conversion itself doesn't imply lose of information.
– VisualMelon
Jun 2 at 12:34
20
20
@doubleYou: 4261412864 of the 4294967296
Integer
s (99.2%) cannot be represented as Single
, so "when" is "pretty much always".– Jörg W Mittag
Jun 2 at 17:50
@doubleYou: 4261412864 of the 4294967296
Integer
s (99.2%) cannot be represented as Single
, so "when" is "pretty much always".– Jörg W Mittag
Jun 2 at 17:50
10
10
en.wikipedia.org/wiki/Single-precision_floating-point_format explains the limitations nicely for IEEE754 binary32. Integers in
[-16777216,16777216]
(2^24 = the significand width) can be exactly represented. Larger numbers are rounded to the nearest multiple of 2, 4, 8, ... depending on how large they are.– Peter Cordes
Jun 3 at 5:04
en.wikipedia.org/wiki/Single-precision_floating-point_format explains the limitations nicely for IEEE754 binary32. Integers in
[-16777216,16777216]
(2^24 = the significand width) can be exactly represented. Larger numbers are rounded to the nearest multiple of 2, 4, 8, ... depending on how large they are.– Peter Cordes
Jun 3 at 5:04
14
14
“which means that both can store the exact same amount of numbers” — It does not mean that. It would only mean that if both types have the exact same number of ways of storing each number. And this isn’t the case; for instance,
Single
has two ways of storing zero. So Single
can in fact represent fewer distinct numbers than Integer
.– Konrad Rudolph
Jun 3 at 15:01
“which means that both can store the exact same amount of numbers” — It does not mean that. It would only mean that if both types have the exact same number of ways of storing each number. And this isn’t the case; for instance,
Single
has two ways of storing zero. So Single
can in fact represent fewer distinct numbers than Integer
.– Konrad Rudolph
Jun 3 at 15:01
|
show 5 more comments
Floating point types (such as Single and Double) are represented in memory by a sign, a mantissa and an exponent. Think of it as scientific notation:
Sign*Mantissa*Base^Exponent
They - as you may expect - use base 2. There are other tweaks that allow for representing infinity and NaN, and the exponent is offset (will come back to that), and a shorthand for the mantissa (will come back to that too). Look for the standard IEEE 754 which covers its representation and operations for more details.
For our purposes we can imagine it as a binary number "mantissa", and an "exponent" that tells you where to put the decimal separator.
In the case of Single, we have 1 bit for he sign, 8 for the exponent and 23 for the mantissa.
Now, the thing is, we will store the mantissa from the most significant digit. Remember that all zeroes to the left are not relevant. And giving that we are working in binary, we know that the most significant digit is a 1※. Well, since we know that, we do not have to store it. Thanks to that shorthand, the effective range of the mantissa is 24 bits.
※: Unless the number we are storing is zero. For that we will have all the bits set to zero. However, if we try to interpret that under the description I gave, you would have a 2^24 (the implicit 1) multiplied by 1 (2 to the power of the exponent 0). So, to fix it, exponent zero is a special value. There are also special values to store infinity and NaN in the exponent.
As per the exponent offset - aside from avoiding the special values - having it offset allows to place the decimal point before the start of the mantissa or after its end, without the need to have a sign for the exponent.
This means that for large numbers, the floating point type will put the decimal point beyond the end of the mantissa.
Remember that the mantissa is a 24 bit number. It will never represent a 25 bit number... it does not have that extra bit. Thus, the single cannot distinguish between 2^24 and 2^24+1 (these are the first 25 bit numbers, and they differ on the last bit, which is not represented in the single).
Thus, for integers the range of the single is -2^24 to 2^24. And trying to add 1 to 2^24 will result in 2^24 (because as far as the type is concerned, 2^24 and 2^24+1 are the same value). Try it Online. This is why there is a loss of information when converting from integer to single. And this is also why a loop that uses a single or a double could actually be an infinite loop without you noticing.
This isn't a perfect explanation of the implicit leading1
bit in the significand. It's implied by the biased-exponent field being non-zero. Subnormals (aka denormals) including+-0.0
have a leading0
bit of their significand. I guess you could simplify to just consider0.0
a totally special case, but0.0
actually follows the same encoding rules as other subnormals.
– Peter Cordes
Jun 3 at 5:11
add a comment |
Floating point types (such as Single and Double) are represented in memory by a sign, a mantissa and an exponent. Think of it as scientific notation:
Sign*Mantissa*Base^Exponent
They - as you may expect - use base 2. There are other tweaks that allow for representing infinity and NaN, and the exponent is offset (will come back to that), and a shorthand for the mantissa (will come back to that too). Look for the standard IEEE 754 which covers its representation and operations for more details.
For our purposes we can imagine it as a binary number "mantissa", and an "exponent" that tells you where to put the decimal separator.
In the case of Single, we have 1 bit for he sign, 8 for the exponent and 23 for the mantissa.
Now, the thing is, we will store the mantissa from the most significant digit. Remember that all zeroes to the left are not relevant. And giving that we are working in binary, we know that the most significant digit is a 1※. Well, since we know that, we do not have to store it. Thanks to that shorthand, the effective range of the mantissa is 24 bits.
※: Unless the number we are storing is zero. For that we will have all the bits set to zero. However, if we try to interpret that under the description I gave, you would have a 2^24 (the implicit 1) multiplied by 1 (2 to the power of the exponent 0). So, to fix it, exponent zero is a special value. There are also special values to store infinity and NaN in the exponent.
As per the exponent offset - aside from avoiding the special values - having it offset allows to place the decimal point before the start of the mantissa or after its end, without the need to have a sign for the exponent.
This means that for large numbers, the floating point type will put the decimal point beyond the end of the mantissa.
Remember that the mantissa is a 24 bit number. It will never represent a 25 bit number... it does not have that extra bit. Thus, the single cannot distinguish between 2^24 and 2^24+1 (these are the first 25 bit numbers, and they differ on the last bit, which is not represented in the single).
Thus, for integers the range of the single is -2^24 to 2^24. And trying to add 1 to 2^24 will result in 2^24 (because as far as the type is concerned, 2^24 and 2^24+1 are the same value). Try it Online. This is why there is a loss of information when converting from integer to single. And this is also why a loop that uses a single or a double could actually be an infinite loop without you noticing.
This isn't a perfect explanation of the implicit leading 1 bit in the significand. It's implied by the biased-exponent field being non-zero. Subnormals (aka denormals), including +-0.0, have a leading 0 bit of their significand. I guess you could simplify to just consider 0.0 a totally special case, but 0.0 actually follows the same encoding rules as other subnormals.
– Peter Cordes
Jun 3 at 5:11
edited Jun 2 at 11:48 by Glorfindel
answered Jun 2 at 10:49 by Theraot
Here is an actual example of when converting from Integer to Single may lose precision:
The Single type can store all integers from -16777216 to 16777216 (inclusive), but it cannot store all integers outside of this range. For example, it cannot store the number 16777217. For that matter, it cannot store any odd number greater than 16777216.
We can use Windows PowerShell to see what happens if we convert an Integer to a Single and back:
PS C:\Users\tanne> [int][float]16777213
16777213
PS C:\Users\tanne> [int][float]16777214
16777214
PS C:\Users\tanne> [int][float]16777215
16777215
PS C:\Users\tanne> [int][float]16777216
16777216
PS C:\Users\tanne> [int][float]16777217
16777216
PS C:\Users\tanne> [int][float]16777218
16777218
PS C:\Users\tanne> [int][float]16777219
16777220
Notice that 16777217 got rounded down to 16777216, and 16777219 got rounded up to 16777220.
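For readers without PowerShell, the same round trip can be approximated in Python. This is only a sketch: round_trip is a hypothetical helper, and it assumes struct's "f" format rounds to the nearest single-precision value with ties going to the even mantissa, which is what the transcript above shows.

```python
import struct

def round_trip(n):
    """Convert an integer to single precision and back to an integer."""
    as_single = struct.unpack("f", struct.pack("f", float(n)))[0]
    return int(as_single)

for n in range(16777213, 16777220):
    result = round_trip(n)
    note = "" if result == n else "  <- changed"
    print(n, "->", result, note)
```

Both changed values are ties between two neighbouring representable numbers, so the direction of rounding is decided by which neighbour has the even mantissa.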
And with increasing magnitude, the distance between nearest representable floats keeps growing as powers of 2. en.wikipedia.org/wiki/…
– Peter Cordes
Jun 3 at 5:05
answered Jun 2 at 20:13 by Tanner Swett
Floating point types are similar to "scientific notation" in physics. The number is split up into a sign bit, an exponent (multiplier) and a mantissa (significant digits). So as the magnitude of the value increases the step size also increases.
Single precision floating point has 23 mantissa bits, but there is an "implicit 1", so the mantissa is effectively 24 bits. Therefore all integers with a magnitude up to 2^24 can be represented exactly in single precision floating point.
Above that, successively fewer numbers can be represented.
- From 2^24 to 2^25 only even numbers can be represented.
- From 2^25 to 2^26 only multiples of 4 can be represented.
- From 2^26 to 2^27 only multiples of 8 can be represented.
- From 2^27 to 2^28 only multiples of 16 can be represented.
- From 2^28 to 2^29 only multiples of 32 can be represented.
- From 2^29 to 2^30 only multiples of 64 can be represented.
- From 2^30 to 2^31 only multiples of 128 can be represented.
So of the 2^32 possible 32-bit signed integer values, only 2 * (2^24 + 7 * 2^23) = 9 * 2^24 can be represented in single precision floating point. That is 3.515625% of the total.
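The count above can be checked with a short Python sketch. It tallies the bands arithmetically rather than testing all 2^32 values, and spot-checks the spacing with a round trip through struct. Both function names are invented for this illustration.

```python
import struct

def is_exact_in_single(n):
    """True if the integer n survives a round trip through single precision."""
    return int(struct.unpack("f", struct.pack("f", float(n)))[0]) == n

def count_representable_int32():
    """Count the 32-bit signed integers that single precision stores exactly."""
    count = 2 ** 24                 # 0 .. 2^24 - 1: every integer is exact
    for k in range(24, 31):         # bands [2^k, 2^(k+1)) for k = 24 .. 30
        step = 2 ** (k - 23)        # spacing of representable values in the band
        count += 2 ** k // step     # 2^23 values per band
    return 2 * count                # negatives mirror this (-2^31 pairs up with 0)

total = count_representable_int32()
print(total, total == 9 * 2 ** 24)
print(100 * total / 2 ** 32, "percent of all 32-bit signed integers")
print(is_exact_in_single(2 ** 25 + 4), is_exact_in_single(2 ** 25 + 2))
```

Every band contributes the same 2^23 values because the step doubles each time the band width doubles.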
answered Jun 3 at 3:25 by Peter Green
Single precision floats have 24 bits of precision. Anything over that is rounded to the nearest 24-bit number. It might be easier to understand in decimal scientific notation, but keep in mind actual floats use binary.
Say you have 5 decimal digits of memory. You can choose to use those like a regular unsigned int, allowing you to have any number between 0 and 99999. If you want to be able to represent larger numbers, you can use scientific notation and allocate two of the digits to the exponent, so you can now represent anything between 0 and 9.99 x 10^99.
However, only the integers up to 999 are all still represented exactly; above that, gaps appear. If you try to represent 12345, you can get 1.23 x 10^4 or 1.24 x 10^4, but you can't represent any of the numbers in between, because you don't have enough digits available.
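The decimal analogy can be tried out with Python's decimal module. In this sketch a context with 3 significant digits plays the role of the 3-digit mantissa. The helper name store is made up, and round-half-even is an assumption chosen to mirror how binary floats usually round. The two-digit exponent limit is not modelled.

```python
from decimal import Context, ROUND_HALF_EVEN

# Three significant digits, standing in for the three mantissa digits above.
three_digits = Context(prec=3, rounding=ROUND_HALF_EVEN)

def store(n):
    """Round an integer to what a 3-significant-digit format could hold."""
    return three_digits.create_decimal(n)

for n in (999, 1001, 12300, 12345, 12399):
    stored = store(n)
    print(n, "->", stored, "=", int(stored))
```

Values such as 12300 go through unchanged, while 12345 lands on 1.23E+4, its nearest representable neighbour.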
Using decimal digits is a nice idea that makes it easier to understand, but the last paragraph is a bit misleading: actually you can represent numbers higher than 999, and your example shows it: 12300 would be 1.23 x 10^4. What you mean is that starting from that number there are gaps. Would you mind rephrasing it a bit?
– Fabio Turati
Jun 3 at 14:04
answered Jun 2 at 2:32 by Karl Bielefeldt